Data Analysis| Project 02: Call Center Data Analysis With SQL

Data Detective Mahadi
6 Dec 202217:35

Summary

TLDRThis video script outlines a comprehensive data analysis process on a database table with 12 columns. It begins with visual inspection, noting issues like incorrect date formats and spaces in column names. The script guides through SQL commands to correct date formats, alter data types, and rename columns. It addresses nullifying blank values and checks for duplicates. The analysis includes calculating call percentages by reason, identifying peak call days, and assessing call durations. It concludes with customer sentiment analysis and service level assessments for call centers, providing actionable insights for improvement.

Takeaways

  • πŸ“Š The script involves working with a database table, focusing on data issues and their resolutions.
  • πŸ—“οΈ There's an issue with the date format in the 'call timestamp' column, which needs to be corrected from 'month date year' to 'year month day'.
  • 🚫 The 11th column has a name with spaces, which needs to be renamed for consistency.
  • πŸ”„ The 'C Series score' is incorrectly identified as a 'capital deliber' data type and needs to be changed to 'integer'.
  • πŸ”§ The 'current timestamp' is identified as a 'debt' type, which should be corrected.
  • πŸ› οΈ SQL safe updates are disabled for making changes to the table, a practice for safety during modifications.
  • πŸ”„ The 'timestamp' column's date format is changed using the STR_TO_DATE function in SQL.
  • πŸ”„ Data types are corrected for 'call timestamp' and 'C Series score' columns to ensure accuracy.
  • πŸ”§ The 11th column, initially named with spaces, is renamed to 'CD in a minute' to avoid issues.
  • βœ… Blank values in the 'C Series score' column are replaced with 'NULL' to maintain data integrity.
  • πŸ” The script checks for duplicate values in the dataset and finds none, indicating data is ready for analysis.
  • πŸ“Š Various data analysis steps are performed, including calculating percentages of call reasons, identifying peak call days, and analyzing call durations.
  • πŸ“ˆ The analysis reveals that 'billing question' is the most common call reason with the highest percentage.
  • πŸ“‰ Friday is identified as the day with the most calls, while Sunday has the least.
  • πŸ“Š The minimum, maximum, and average call durations are calculated, providing insights into call handling times.
  • πŸ”Ž Customer sentiment analysis shows a higher percentage of negative sentiments compared to positive.
  • πŸ“Š The final analysis examines call center performance by counting calls within and above service levels for different response times.

Q & A

  • What was the issue with the date format in the database?

    -The date format under the 'call timestamp' column was in the format of month-date-year, but it needed to be in the format of year-month-day.

  • How was the date format corrected in the database?

    -The date format was corrected by using the STR_TO_DATE function in SQL to convert the 'call timestamp' column to the desired format.

  • What was the problem with the 11th column in the database?

    -The name of the 11th column contained spaces, which needed to be renamed for consistency and proper SQL practices.

  • How were the data types issues in the 'C Series score' and 'call timestamp' columns addressed?

    -The data types were corrected by altering the table to change the 'C Series score' to an integer and the 'call timestamp' to a date type.

  • What was the issue with the values in the fourth column?

    -There were blanking values in the fourth column, which were replaced with NULL to accurately represent missing data.

  • How were duplicates checked in the database?

    -Duplicates were checked by counting the total number of rows and comparing it with the count of unique 'ID' and 'customer name' values.

  • What was the most common reason for calls according to the analysis?

    -The most common reason for calls was 'billing question', with a percentage of 71.2%.

  • Which day of the week had the most calls?

    -Friday had the most calls, while Sunday had the least.

  • What were the minimum, maximum, and average call durations?

    -The minimum call duration was 5 minutes, the maximum was 45 minutes, and the average call duration was 25 minutes.

  • What was the sentiment analysis result for the 'sentiment' column?

    -The sentiment analysis showed that negative sentiments were higher compared to positive sentiments, with positive sentiments being very low.

  • How were calls within or above the service level checked for each call center?

    -Calls within or above the service level were checked by grouping and counting the calls based on 'call center' and 'response time', then ordering the results by 'call center' and 'counts'.

Outlines

00:00

πŸ” Database and Data Table Inspection

The speaker begins by introducing the database and data table, highlighting the need to examine the 'Title, First' column for the first 10 rows. They note issues such as incorrect date formats in the 'call timestamp' column, which should be in 'year-month-day' order, and a column name with spaces that needs renaming. Additionally, there are blank values in the fourth column that require attention. The speaker proceeds to demonstrate how to check data types and how to correct the identified issues by altering the table structure and data types.

05:07

πŸ› οΈ Correcting Data Types and Renaming Columns

In this segment, the speaker focuses on correcting data types and renaming a column. They explain that the 'C Series score' should be an integer and the 'current timestamp' should be a date type. The speaker demonstrates how to modify these data types using SQL commands. They also rename the 11th column from 'duration in minutes' to 'CD in a minute' after ensuring that all data types are correct. The speaker then addresses the issue of blank values in the 'C series score' column, replacing zeros with null values, and checks for duplicates, confirming there are none.

10:11

πŸ“Š Analyzing Call Data and Customer Sentiments

The speaker moves on to data analysis, starting with calculating the number of calls made for each reason and the corresponding percentages. They demonstrate how to use SQL to group and calculate these metrics. The analysis continues with identifying the day with the most calls, revealing that Friday has the highest number. The speaker also calculates the minimum, maximum, and average call duration. Lastly, they analyze customer sentiment, finding a higher number of negative sentiments compared to positive ones, and check the number of calls written below or above the service level for each call center.

15:13

πŸ“ Conclusion and Project Details

In the final paragraph, the speaker concludes the data analysis and provides a brief overview of the findings. They mention that the project details and a download link for the internet will be available in the description below the video. The speaker thanks the viewers for watching and teases the next project, inviting them to join for further insights.

Mindmap

Keywords

πŸ’‘Database

A database is an organized collection of data, typically stored and accessed electronically. In the context of the video, the database is the central repository for the data being analyzed. The script mentions checking the database and its tables, indicating that the database is the foundation for the data manipulation and analysis activities described.

πŸ’‘Data Table

A data table is a structured set of data, often in the form of rows and columns, that represents the contents of a database. The video script refers to a 'data table' multiple times, emphasizing the importance of organizing and examining the data within this structure to identify issues and perform subsequent analysis.

πŸ’‘Column

In the context of databases and data tables, a column represents a vertical arrangement of data values, where each column typically contains values of a similar data type. The script discusses various issues related to columns, such as incorrect date formats and the need to rename a column, highlighting the significance of columns in structuring and interpreting data.

πŸ’‘Data Type

Data types define the kind of data that can be stored in a database. They include integers, strings, dates, and more. The video script mentions the need to check and correct data types, such as converting a timestamp from a string to a date type, which is crucial for accurate data handling and analysis.

πŸ’‘SQL

SQL (Structured Query Language) is a standard language for managing and manipulating relational databases. The script includes SQL commands for updating tables and changing data formats, demonstrating the use of SQL for interacting with the database and performing data operations.

πŸ’‘Timestamp

A timestamp is a data element that specifies the date and time. In the video script, there's a mention of a 'call timestamp' that needs to be corrected from a month-date-year format to a year-month-date format, illustrating the importance of accurate timestamp formatting for data analysis.

πŸ’‘Data Analysis

Data analysis involves inspecting, cleaning, transforming, and modeling data to extract useful information, draw conclusions, and support decision-making. The video script outlines a series of steps for data analysis, including calculating call percentages and durations, which are typical tasks in analyzing and interpreting data.

πŸ’‘Null Values

Null values represent missing or unknown data in a database. The script discusses replacing zero values with null to accurately represent missing data, which is an important step in data cleaning to ensure the integrity of the analysis.

πŸ’‘Duplicate Values

Duplicate values are instances where the same data appears more than once in a dataset. The video script includes a step to check for and eliminate duplicate values, which is crucial for ensuring the uniqueness and reliability of the data for analysis.

πŸ’‘Customer Segments

Customer segments refer to dividing a customer base into groups based on certain characteristics. In the script, analyzing customer segments based on 'sentiment' is mentioned, which involves categorizing customers according to their expressed feelings or opinions, a common practice in customer relationship management and marketing.

πŸ’‘Service Level

Service level in a business context often refers to the standard of service provided to customers. The script discusses analyzing calls 'below or above the service level,' which likely refers to evaluating customer service interactions against predefined quality standards.

Highlights

Introduction to the database and data table with 12 columns.

Identification of issues with date format and column naming.

Plan to change the date format from month-date-year to year-month-date.

Need to rename a column that contains spaces.

Addressing blanking values in the fourth column.

Checking data types and identifying mismatches.

Conversion of data types for 'C Series score' and 'Call timestamp'.

Disabling SQL safe updates for data manipulation.

Updating the 'Call timestamp' column to the correct date format.

Enabling SQL safe updates after the update.

Fixing data type issues for 'Call timestamp' and 'C Series score'.

Renaming the 11th column from 'Duration in minutes' to 'CD in a minute'.

Replacing zero values in the 'C Series score' column with null.

Checking for duplicate values and confirming the data's uniqueness.

Analyzing call reasons and calculating the percentage of calls per reason.

Identifying the day with the most calls and the least.

Calculating minimum, maximum, and average call duration.

Analyzing customer sentiment with a focus on negative versus positive sentiments.

Checking the number of calls written below or above the service level for each call center.

Conclusion of data analysis and introduction to the next project.

Transcripts

play00:08

this is our database and this is our

play00:11

entire table so let's check our Title

play00:15

First how it looks like

play00:20

will be the results to 10 rows

play00:25

[Music]

play00:27

let's check it now

play00:29

[Music]

play00:33

okay this is our data table and we have

play00:35

12 column here

play00:36

which is some problems here like the

play00:40

date format under call timestamp is

play00:42

month date in the year but it is SQL we

play00:46

need your first then month and end it we

play00:50

need to change that

play00:51

and in 11th column the name of the

play00:54

column contain spaces we need to renamed

play00:56

it

play00:58

and in the fourth column there are some

play01:01

blanking values

play01:03

we need to fix that too

play01:07

so now in Step number two let's check

play01:10

the data types now

play01:12

explained then we'll write in the base

play01:16

name Dot

play01:19

the data table name

play01:24

let's check the result

play01:29

okay uh the problem we found are the

play01:32

C Series score is the number but

play01:35

this is um

play01:37

a capital deliber we returns this data

play01:39

type and in the current timestamp we

play01:43

know this is a debt but the data is in

play01:46

capital we need to change that to

play01:50

okay so let's convert the bit format

play01:53

first

play01:54

so our step number three is change the

play01:58

date format to do this first of all we

play02:02

need to disable the SQL save updates

play02:07

this is optional but uh this is safety

play02:14

okay well

play02:15

save updates

play02:18

we go to zero

play02:24

now we update our table table name

play02:29

and we set the column name

play02:33

call timestamp

play02:35

equal to

play02:38

scr to date

play02:42

parenthesis

play02:44

the column name again call timestamp

play02:47

and the format we have

play02:50

month

play02:51

slash

play02:53

okay

play02:54

slash here

play02:56

okay fine now enable the SQL safe

play03:00

updates again

play03:04

copy this code

play03:06

and change w01

play03:10

now let's check this value

play03:13

okay we have an error here

play03:18

now it will work

play03:20

[Music]

play03:24

yeah it worked

play03:26

[Music]

play03:28

let's check our turntable again

play03:30

[Music]

play03:38

we'll meet the results to open rows

play03:41

let's check it

play03:44

[Music]

play03:50

yes the date formula is now year month

play03:52

and

play03:54

day

play03:57

now step number four we are going to fix

play03:59

the types issues we have to change the

play04:03

data types as a date for call times step

play04:07

and scissor score has integer field so

play04:11

all third tables or table name then

play04:15

modify

play04:17

or modify column

play04:24

and column name B is called timestamp

play04:26

[Music]

play04:28

we put the data type date okay another

play04:33

row now alter table

play04:36

table name

play04:40

then modify column

play04:47

name is

play04:49

CZ score and we put the intercept

play04:54

to visit

play04:56

let's check

play05:06

okay the buildup IS successfully updated

play05:10

now let's check the details again

play05:13

explain

play05:15

our database name

play05:17

dot or data table name

play05:22

[Music]

play05:25

this is the updated types

play05:29

all the types are now correct

play05:32

now step number five let's rename the

play05:35

11th column name which was called

play05:37

duration in minutes first of all check

play05:40

the data table again

play05:45

foreign

play05:47

[Music]

play05:55

in minutes we are going to change this

play05:57

name

play05:59

[Music]

play06:01

so we alter our table the name CC data

play06:05

now change

play06:08

the column

play06:13

[Music]

play06:16

called division

play06:19

Vietnamese and the new name of this

play06:23

column will be CD in a minute

play06:27

okay the other type will be integer to

play06:30

visit and not know

play06:35

[Music]

play06:37

okay go

play06:41

[Music]

play06:46

the column ahead is successfully updated

play06:49

let's check it

play06:54

foreign

play06:55

[Music]

play07:20

values in the fourth column

play07:23

we will replace those zero values

play07:27

with null

play07:34

so again let's disable the SQL save

play07:38

updates

play07:48

let's go to save up here is equal to

play07:50

zero

play07:52

Now update

play07:54

our data table

play07:57

and set

play07:58

our column name c series called equal to

play08:01

null

play08:04

where

play08:06

she said score

play08:08

is equal to zero

play08:11

now again enable the SQL save updates

play08:13

copy this code

play08:15

and change the value

play08:17

1.

play08:22

[Music]

play08:25

okay successfully updated let's check it

play08:28

now

play08:33

[Music]

play08:41

[Music]

play08:43

all the blank values are now replaced

play08:45

with null

play08:47

okay now we are going to check for

play08:49

duplicates value

play08:55

[Music]

play08:57

let's count select count everything

play09:01

as row number

play09:03

[Music]

play09:04

from alternative

play09:10

[Music]

play09:16

so we have

play09:18

32

play09:19

941 total low number now let's check the

play09:24

unique hello number for ID and customer

play09:27

name

play09:28

[Music]

play09:37

select count distinct ID

play09:45

and count for

play09:48

distinct customer name

play09:54

[Music]

play09:57

let's check it

play10:06

okay both of them have 32 941 rows so we

play10:11

don't have any duplicate values here

play10:13

so our data is ready for the analysis

play10:16

now so our next analysis steps is

play10:20

uh The Operators

play10:23

call the first one for various reasons

play10:25

so let's count how many calls they made

play10:29

for each reason

play10:31

we need to calculate the percentage too

play10:34

so select reason count

play10:36

everything and we round up the

play10:39

percentage so round

play10:44

[Music]

play10:49

count

play10:51

foreign

play10:57

divided by

play11:01

[Music]

play11:10

[Music]

play11:23

into 100

play11:26

and we round up

play11:28

with one digit

play11:30

let's fix the parenthesis

play11:35

as percentage

play11:38

from

play11:42

and we will Group by

play11:43

provision

play11:44

[Music]

play11:47

okay let's check it now

play11:53

so we got the highest percentage for

play11:56

billing question reason

play11:58

and it's 71.2 now reset number nine we

play12:03

find out which guy has most calls

play12:07

[Music]

play12:13

the select day name

play12:17

the column name

play12:20

as day of call

play12:30

now count everything

play12:34

as number of calls

play12:44

from evil CC data and we will Group by

play12:50

1 that means day name this will day of

play12:56

and we order by

play12:59

the count flow

play13:01

that means number of calls

play13:08

let's check the output

play13:14

so we clearly see Friday has most of the

play13:16

polls

play13:18

and the Sunday has less amount of code

play13:22

last number 10

play13:24

we will find out the minimum maximum and

play13:27

average call duration

play13:30

so select

play13:31

[Music]

play13:33

minimum

play13:35

the club name is Siri I mean

play13:38

as maximum sorry minimum division

play13:46

minimum duration

play13:51

Max function

play13:54

for the column CDMA

play13:57

as maximum duration

play14:00

[Music]

play14:04

now average

play14:07

we round the average value so run

play14:10

function

play14:13

and then average

play14:16

for the column

play14:22

and for the Roundup value one

play14:25

and as average integration

play14:28

[Music]

play14:35

from our data table

play14:42

okay check the result

play14:47

we got minimum division is 5 minute

play14:50

maximum deviation is 45

play14:53

and average duration is 25 minute

play14:56

now we are going to analyze the

play14:58

customers segments for the sentiment

play15:00

column

play15:04

select sentiment

play15:08

and count everything

play15:12

from our data table

play15:16

[Music]

play15:18

Group by sentiment

play15:21

foreign

play15:26

[Music]

play15:34

okay let's take it

play15:41

okay we found the negative sentiments

play15:43

are high comparing to the positive

play15:45

sentiments

play15:47

well positive sentiments is very low

play15:50

now step number 12 and last step

play15:54

checking how many calls are written

play15:57

below or above the service for each

play15:59

columns

play16:00

so select call center

play16:06

then response time

play16:12

count everything

play16:16

as counts

play16:22

from conductable CC guitar

play16:27

now we Group by

play16:29

[Music]

play16:30

one and two that means call center and

play16:35

response time

play16:37

and Water by

play16:40

1 3

play16:42

that means call center and counts

play16:49

let's check the result

play16:52

[Music]

play16:59

we found a number of calls which are

play17:01

within below or above the service for

play17:03

each call center like Baltimore

play17:08

within service

play17:12

within service

play17:15

that's it done our data analysis second

play17:20

project the internet download link and

play17:22

the project details the project details

play17:25

link will be in the description below

play17:28

thanks for watching see you in the next

play17:32

project

Rate This
β˜…
β˜…
β˜…
β˜…
β˜…

5.0 / 5 (0 votes)

Related Tags
Data CleaningSQL TutorialCall CenterData AnalysisDatabase ManagementData TransformationData TypesQuery OptimizationData IntegrityData Visualization