Human activity detection

Sudhir Singh
23 Apr 2019 08:58

Summary

TLDR: This video discusses the importance and implementation of a human activity detection model in the context of Big Data. Highlighting its significance in medical care and elderly support, the script details the dataset from the UCI repository, which includes eight attributes and eleven activity classes. After pre-processing, feature extraction, and model training using a decision tree, the model achieved over 73% accuracy in classifying activities based on attributes like Z coordinates and tag identifiers. The video concludes with suggestions for improving model performance through time series analysis.

Takeaways

  • 🧑‍🔬 Activity detection is increasingly important in medical care, supporting elderly independent living and emergency assistance.
  • 📈 Companies like Fitbit and Apple rely on activity detection for health monitoring and safety features, respectively.
  • 📚 The data set used for the project was sourced from the UCI Machine Learning Repository and includes eight attributes and eleven classes.
  • 👥 The data set was created by recording five people performing various activities over five sessions each.
  • 📊 The data set attributes include sequence name, tag identifier, timestamp, date, and x, y, z coordinates, along with the activity classification.
  • ⚙️ Pre-processing involved checking for missing values and removing highly correlated activities, resulting in three main classes: lying, walking, and sitting.
  • 📈 Feature extraction and analysis identified highly correlated attributes like timestamp, tag identifier, and x, y, z coordinates.
  • 📉 Box plots were used to visualize variations in attributes across different activities, aiding in attribute selection for the model.
  • 🌐 A 70/30 train-test split was found to be optimal for model training, with 5-fold cross-validation repeated three times for parameter tuning.
  • 🔑 The decision tree model was chosen for classification, using attributes like x, y, z coordinates, timestamp, and tag identifier.
  • 📊 The model achieved over 73% accuracy in classifying activities, with the Z coordinate being particularly effective for distinguishing lying and walking.
  • 🤖 Error analysis showed some confusion between lying and sitting, and walking and sitting, suggesting room for improvement in the model.

Q & A

  • Why is activity detection important in the field of medical care?

    -Activity detection is important in medical care because it can support the elderly for independent living and can be a life-saving feature, such as in fall detection systems that automatically alert emergency services if a person falls and doesn't get up for a while.

  • Which companies are mentioned in the script that rely on activity detection?

    -Fitbit and Apple are mentioned as companies that rely on activity detection. Fitbit relies on it for tracking physical activity, and Apple uses it for features like fall detection in its Apple Watch products.

  • What is the source of the data set used for the human activity detection model?

    -The data set used for the human activity detection model was obtained from the UCI Machine Learning Repository.

  • How many people were involved in the creation of the data set and what were they asked to do?

    -Five people were involved in the creation of the data set, and each was asked to perform a sequence of activities five times.

  • What are the eight attributes contained in the data set?

    -The eight attributes in the data set are the sequence name, tag identifier, timestamp, date, and the x, y, z coordinates, followed by the activity classification.

  • What was the reason for removing certain activities during the pre-processing stage?

    -Certain activities were removed during pre-processing because they were highly correlated. For example, activities such as lying and lying down had extremely similar variations in terms of their features.

  • How many class labels were retained after the pre-processing of the data set?

    -After pre-processing, three class labels were retained: lying, walking, and sitting.

  • What was the best train-test split ratio found for the model?

    -The best train-test split ratio found for the model was 70/30.

  • Which model was used to classify the data based on the given attributes?

    -A decision tree model was used to classify the data based on the given attributes.

  • What attributes were selected for training the decision tree model?

    -The attributes selected for training the decision tree model were the x, y, z coordinates, along with the timestamp and the tag identifier.

  • What was the overall accuracy achieved by the decision tree model in classifying the activities?

    -The decision tree model achieved an accuracy of over 73% in classifying the activities.

  • What was the main issue identified in the error analysis of the decision tree model?

    -The main issue identified in the error analysis was that the model sometimes confused activities like lying with sitting about 16% of the time, and walking with sitting about 12% of the time.

  • How can the performance of the decision tree be improved further?

    -The performance of the decision tree can be improved by conducting a time series analysis of each attribute, which is likely to increase accuracy.

Outlines

00:00

📊 Human Activity Detection: Importance and Data Overview

The video introduces a human activity detection model developed as part of a Big Data course. It emphasizes the significance of activity detection in healthcare and elderly care, citing examples like Fitbit and Apple Watch's fall detection feature. The dataset used is from the UCI Machine Learning Repository, capturing data from five individuals performing various activities. The data includes attributes such as sequence name, tag identifier, timestamps, dates, and XYZ coordinates, along with activity labels. The video explains the pre-processing steps, which involved removing highly correlated activities to focus on three primary classes: lying, walking, and sitting. This decision was made to retain over 80% of the dataset while simplifying the model's complexity.
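
The eight attributes described above can be sketched as a small record type. This is only an illustration: the comma-separated layout, the `Reading` and `parse_readings` names, and every value in the sample row (including the tag label and the date) are placeholders assumed for the example, not data taken from the video or the repository.

```python
import csv
import io
from typing import NamedTuple


class Reading(NamedTuple):
    """One row of the data set: the eight attributes named in the video."""
    sequence: str    # e.g. "A01": person A, recording session 01
    tag: str         # identifies which tag (sensor) reported the row
    timestamp: int
    date: str
    x: float
    y: float
    z: float
    activity: str    # class label


def parse_readings(text):
    """Parse comma-separated rows into Reading records (layout is assumed)."""
    rows = csv.reader(io.StringIO(text))
    return [
        Reading(r[0], r[1], int(r[2]), r[3],
                float(r[4]), float(r[5]), float(r[6]), r[7])
        for r in rows if r
    ]


# Placeholder row, made up for illustration only.
SAMPLE = "A01,chest-tag,1000,01.01.2000 00:00:00,4.0,1.5,1.25,walking\n"

readings = parse_readings(SAMPLE)
person = readings[0].sequence[0]          # letter = person being recorded
session = int(readings[0].sequence[1:])   # number = recording session
```

Splitting the sequence name into a person letter and a session number follows the explanation given in the video for values such as A01 to E05.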

05:04

📈 Data Analysis and Model Training Strategy

This paragraph delves into the analysis of the dataset, highlighting the use of box plots to understand attribute variations across different activities. It discusses the selection of attributes for model training based on their correlation and variation. The training and testing strategy involves a 70/30 split and 5-fold cross-validation repeated three times to ensure robust parameter tuning. The chosen model is a decision tree, which uses attributes like XYZ coordinates, timestamp, and tag identifier to classify activities. The model's accuracy in classifying activities such as lying, walking, and sitting is reported, with a focus on the decision tree's ability to differentiate these activities based on the Z coordinate primarily.
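
A minimal sketch of the splitting strategy described here, using only the Python standard library. The function names and the fixed seed are assumptions for the example, and the rows are shuffled at random, which may differ from how the authors handled grouped data in their notebook.

```python
import random


def train_test_split(items, test_fraction=0.3, seed=0):
    """Shuffle the items and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def repeated_kfold(n_items, k=5, repeats=3, seed=0):
    """Yield (train_indices, validation_indices) for k folds, repeated."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)  # a fresh shuffle for every repetition
        folds = [indices[i::k] for i in range(k)]
        for i, validation in enumerate(folds):
            train = [idx for j, fold in enumerate(folds) if j != i
                     for idx in fold]
            yield train, validation


# 100 dummy row ids stand in for the real data set.
train, test = train_test_split(range(100))
splits = list(repeated_kfold(len(train)))
```

With 5 folds repeated 3 times, each candidate parameter setting would be scored on 15 validation folds drawn from the 70% training portion, while the 30% test portion stays untouched until the final evaluation.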

Keywords

💡Activity Detection

Activity detection refers to the process of identifying and classifying human activities based on sensor data. In the context of the video, it is crucial for applications such as supporting elderly independent living and health monitoring, as well as for fall detection in wearable devices like the Apple Watch, which can alert emergency services if a person falls and does not get up, potentially saving lives.

💡Data Analysis

Data analysis is the examination of data sets to draw conclusions about the information they contain. The video emphasizes its growing importance in medical care, where activity detection through data analysis can provide insights into a person's health and well-being, contributing to better patient care and monitoring.

💡Big Data

Big Data refers to data sets that are so large and complex that traditional data processing software is inadequate to deal with them. The video discusses implementing a human activity detection model as part of an introduction to Big Data course, indicating the use of large datasets in activity detection and the need for advanced analytical methods.

💡UCI Machine Learning Repository

The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators used for machine learning research. It is mentioned in the video as the source of the localization data for person activity, which was used to create the dataset for the activity detection model.

💡Pre-processing

Pre-processing in data analysis involves cleaning and transforming raw data into an understandable format. The video describes checking for missing values and removing highly correlated activities during pre-processing, which is essential for preparing the data for accurate model training.
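
The two pre-processing steps can be sketched as follows. The dictionary row format, the `preprocess` name, and the toy rows are assumptions for illustration; dropping incomplete rows is one possible way to handle missing values and is not confirmed by the video, which only says missing values were checked.

```python
KEPT_ACTIVITIES = {"lying", "walking", "sitting"}


def preprocess(rows):
    """Drop incomplete rows, keep three classes, report the retained share."""
    complete = [r for r in rows
                if all(v is not None and v != "" for v in r.values())]
    kept = [r for r in complete if r["activity"] in KEPT_ACTIVITIES]
    retained = len(kept) / len(rows) if rows else 0.0
    return kept, retained


# Toy rows, made up for illustration only.
rows = [
    {"z": 0.2, "activity": "lying"},
    {"z": 1.2, "activity": "walking"},
    {"z": 0.7, "activity": "sitting"},
    {"z": 0.3, "activity": "lying down"},   # removed: not one of the three
    {"z": None, "activity": "walking"},     # removed: missing value
]

kept, retained = preprocess(rows)
```

On the real data set the video reports that more than 80% of the rows remained after this kind of filtering; the toy rows above retain a smaller share simply because there are so few of them.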

💡Feature Extraction

Feature extraction is the process of selecting the most relevant data attributes or features from the dataset. The video explains how summary and visualization of the dataset led to the identification of key attributes such as the x, y, and z coordinates, timestamp, and tag identifier, which were used for training the model.

💡Correlation

Correlation measures the extent to which two variables are linearly related. In the video, the correlation between attributes is plotted to identify which features are highly correlated, such as the timestamp, tag identifier, and x, y, z coordinates, guiding the selection of attributes for the model.
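
As a rough illustration of how such a correlation plot is computed, the sketch below builds a Pearson correlation matrix in plain Python. The column values are invented, and the video does not say which correlation coefficient its notebook used, so Pearson is an assumption here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented columns: y rises with x, z falls as x rises.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}

# Full matrix; the video reads off the upper-right triangle of such a plot.
matrix = {a: {b: pearson(columns[a], columns[b]) for b in columns}
          for a in columns}
```

Values close to 1 or -1 indicate a strong linear relationship, and values close to 0 indicate a weak one.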

💡Decision Tree

A decision tree is a flowchart-like structure in which each internal node represents a 'yes' or 'no' question, and each branch represents the outcome of that question. The video describes using a decision tree model to classify activities based on selected attributes, such as the x, y, and z coordinates, which helps in achieving a certain level of accuracy in activity classification.
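
To make the idea of a yes/no question concrete, the sketch below learns a single split on one numeric feature, which is the building block a decision tree repeats at every internal node. It is not the model from the video: the `fit_stump` and `predict` names, the error-count criterion, and the toy Z values are all assumptions for illustration.

```python
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(values, labels):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    ordered = sorted(set(values))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict(stump, value):
    """Answer the learned question: is the value at or below the threshold?"""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Toy Z coordinates (made up): low when lying, higher when walking.
z = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
activity = ["lying", "lying", "lying", "walking", "walking", "walking"]

stump = fit_stump(z, activity)
```

A full tree would apply the same search recursively to the rows on each side of the split, possibly choosing a different attribute each time, which matches the video's description of splitting first on the Z coordinate and then on other attributes.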

💡Accuracy

Accuracy in the context of machine learning refers to the percentage of correct predictions made by the model. The video mentions achieving over 73% accuracy in classifying activities, which is a key performance metric indicating how well the model can predict the correct activity.

💡Confusion Matrix

A confusion matrix is a table used to describe the performance of a classification model. The video uses a confusion matrix to analyze the error in the decision tree model, showing which activities were often confused with each other and the accuracy of correct predictions.
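
A small sketch of how a confusion matrix, the per-class share of correct predictions, and the overall accuracy relate to each other. The labels below are invented for the example and do not reproduce the figures reported in the video.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


def per_class_recall(matrix, labels):
    """Share of each actual class that was predicted correctly."""
    recall = {}
    for label in labels:
        total = sum(matrix[(label, p)] for p in labels)
        recall[label] = matrix[(label, label)] / total if total else 0.0
    return recall


def accuracy(matrix):
    """Share of all rows whose prediction matches the actual label."""
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / sum(matrix.values())


labels = ["lying", "sitting", "walking"]
actual = ["lying", "lying", "sitting", "sitting",
          "walking", "walking", "walking", "lying"]
predicted = ["lying", "sitting", "sitting", "sitting",
             "walking", "sitting", "walking", "lying"]

matrix = confusion_matrix(actual, predicted)
recall = per_class_recall(matrix, labels)
overall = accuracy(matrix)
```

The off-diagonal entries, such as `matrix[("lying", "sitting")]`, are the confusions that the error analysis in the video looks at.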

💡Time Series Analysis

Time series analysis is a set of statistical techniques for data points ordered in time, used to find trends and patterns in how values change. The video suggests that further time series analysis of the attributes could improve the performance of the decision tree model, indicating the potential for enhancing accuracy through deeper temporal data analysis.
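
The video does not say which time series technique it has in mind. One common option, shown here purely as an assumption, is to summarise each attribute over a sliding window of consecutive readings so that the classifier sees how a coordinate is changing and not just its current value. The `window_features` name, the window size, and the Z values are made up for the example.

```python
from statistics import mean, pstdev


def window_features(values, size=3):
    """Mean and standard deviation over each window of consecutive readings."""
    features = []
    for end in range(size, len(values) + 1):
        window = values[end - size:end]
        features.append((mean(window), pstdev(window)))
    return features


# Invented Z readings for one tag, ordered by timestamp:
# steady at first, then dropping.
z_over_time = [1.0, 1.0, 1.0, 0.7, 0.4, 0.4]

features = window_features(z_over_time)
```

A steady coordinate gives a spread of zero, while a changing one gives a larger spread, which is the kind of signal a per-row model cannot see.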

Highlights

The importance of activity detection in medical care and elderly support.

Companies like Fitbit and Apple utilize activity detection for health monitoring and safety features.

Introduction of a human activity detection model implemented as part of a Big Data course.

Data set obtained from the UCI Machine Learning Repository with eight attributes and eleven classes.

Description of the data set attributes including sequence name, tag identifier, timestamp, and coordinates.

The significance of the activity label in identifying the performed activity.

Pre-processing steps including checking for missing values and removing highly correlated activities.

Final class labels after pre-processing: lying, walking, and sitting.

Feature extraction and data visualization techniques used in the study.

High correlation found between timestamp, tag identifier, and XYZ coordinates.

Selection of attributes for model training based on their variation with activities.

70/30 train-test split and 5-fold cross-validation strategy for model fine-tuning.

Use of a decision tree model for activity classification based on selected attributes.

Achievement of over 73% accuracy in classifying activities using the decision tree model.

Error analysis revealing confusion between similar activities like lying and sitting.

Potential improvement of model performance through time series analysis of attributes.

Summary of the activity detection model's effectiveness and areas for future enhancement.

Transcripts

play00:00

hi everyone in this video we talk about

play00:02

the human activity detection model that

play00:05

we implemented as a part of our

play00:07

introduction to Big Data course but

play00:10

before we get into the implementations

play00:12

let's first have a look at why activity

play00:14

detection is important as we can see

play00:20

from the current trend data analysis is

play00:22

becoming increasingly important in the

play00:24

field of medical care activity detection

play00:27

can be important for the purposes of

play00:29

supporting the elderly for independent

play00:32

living companies like Fitbit rely almost

play00:36

entirely on activity detection while

play00:39

companies like Apple have introduced

play00:41

features like fall detection in the

play00:43

Apple watch products which could be a

play00:45

life saving feature for example if a

play00:50

person falls and doesn't get up for a

play00:52

while

play00:52

he's probably injured so the watch

play00:55

automatically alerts the emergency

play00:57

services for assistance for these

play01:02

reasons we have chosen the localization

play01:04

data for person activity as our data set

play01:09

which we obtained from the UCI machine

play01:11

learning repository for the creation of

play01:14

this data set five people were asked to

play01:17

perform a sequence of activities over

play01:18

five times and the data set contains

play01:22

over eight attributes and eleven classes

play01:25

which we'll get into in just a minute

play01:28

alright let's have a look at the

play01:31

attributes in the data set the data set

play01:33

has over eight attributes namely the

play01:36

sequence name the tag identifier time

play01:38

stamp date x y and z coordinates followed by

play01:42

the activity it is classified into as we

play01:45

can see from the notebook in the right

play01:47

the sequence name is composed mainly of

play01:49

nominal values from A01 to E05

play01:53

over here A denotes the name of the

play01:55

person being recorded while the number

play01:58

denotes the recording session of that

play02:00

person the tag identifier helps us

play02:04

identify which Tag sensor information is

play02:08

actually being reported in the room the

play02:11

time stamp tells us the time at which

play02:13

the recording was made

play02:14

while the date gives us the date of the

play02:16

recording the XYZ coordinates as we know

play02:20

are simply the XYZ coordinates of that

play02:22

tag finally the activity label gives us

play02:26

which activity was actually being

play02:28

performed let's have a look at an

play02:34

example instance here A01 is the

play02:38

sequence name A is the name of the

play02:40

person performing the activity while 0 1

play02:43

is the first recording instance of that

play02:45

activity set the second column the tag

play02:48

identifier corresponds to the chest tag

play02:51

so we know that this row reports data

play02:54

mainly for the chest tag here's the

play02:57

timestamp and the x y and z coordinates for

play03:00

the data as we can see this particular

play03:03

row has been classified into the walking

play03:06

activity next we get into pre-processing

play03:12

the data as a part of our pre-processing

play03:16

we check for missing values and removed

play03:19

highly correlated activities for example

play03:22

activities such as lying and lying down

play03:25

had an extremely similar variation in

play03:28

terms of their features so we removed

play03:30

these activities and ended up with final

play03:33

3 class labels namely lying walking and

play03:37

sitting since these are the most

play03:39

important activities generally performed

play03:42

by humans we still managed to retain

play03:44

over 80% of our dataset we can see the

play03:49

cleaning being performed in the Jupyter notebook

play03:51

to the right

play04:02

next we get into the feature extraction

play04:06

from our dataset as we can see from the

play04:09

right we found out the summary and

play04:11

visualized the data set as follows we

play04:14

found that lying sitting and walking

play04:17

comprised a major part of the data set

play04:20

to see the correlation between all the

play04:23

attributes we plotted

play04:27

their correlations as can be seen in

play04:29

the notebook on the right the

play04:34

observations from the plot can be seen

play04:36

below we found out looking at the upper

play04:39

right triangle that the attributes like

play04:42

the timestamp tag identifier x y and z

play04:46

coordinates were highly correlated and

play04:48

this is what drove our attribute

play04:51

selection to train our model let's have

play04:54

a look at the variation between each

play04:58

attribute activity-wise as we can see

play05:03

from box plots for the activity versus

play05:06

tag identifiers the timestamp the X the

play05:09

Y and the Z coordinates we are able to

play05:13

make out how each attribute varies with

play05:15

activities thus this gives us a good

play05:18

sense of which attribute to select for

play05:21

the purpose of training our model next

play05:25

we dive into the train and test strategy

play05:28

we use for our model after dividing

play05:32

the data into several ratios we found

play05:35

the best split to be the 70/30 train

play05:39

test split for our model for the

play05:43

purposes of fine-tuning our parameters

play05:45

we used a 5 fold cross validation

play05:48

repeated three times this helped us to

play05:51

work with the dependent and grouped data

play05:53

as in the data set below we can see in

play05:57

the notebook on the right how the data

play05:59

was split into train and test groups

play06:08

let's have a look at the model we use to

play06:11

classify our data we use the decision

play06:14

tree model for the purposes of binning

play06:16

our activities based on the given

play06:18

attributes into sitting walking and

play06:21

lying as seen from the attribute

play06:23

correlations plotted above we selected

play06:26

attributes such as the x y and z coordinates

play06:28

along with the time stamp and the tag

play06:30

identifier for the purposes of training

play06:34

our decision tree as we can see from the

play06:37

decision tree in the notebook to the

play06:39

right based simply on the Z coordinates

play06:42

we are able to classify activities such

play06:44

as lying and walking with accuracy of 61

play06:47

and 53% respectively all right let's try

play06:53

and evaluate our model now we were able

play06:58

to achieve an accuracy of over 73% in

play07:02

classifying the attributes into

play07:07

activities let's analyze the performance

play07:09

of our decision tree based on attributes

play07:12

like the Z coordinate we were able to

play07:14

classify the data to large extent next

play07:18

it used the tag identifier attribute to

play07:21

be able to differentiate between

play07:22

activities like sitting and walking

play07:25

thirdly the y coordinate was used to

play07:29

differentiate between activities like

play07:31

lying and walking let's get into the

play07:34

error analysis of our decision tree as

play07:37

we can see from the confusion matrix we

play07:40

were able to predict correctly

play07:43

activities such as lying sitting and

play07:45

walking with an accuracy of over 82 71

play07:49

and 80% however our tree still got

play07:52

confused between activities such as

play07:54

lying and sitting about 16% of the time

play07:58

and walking and sitting about 12% of the

play08:01

time finally let's summarize what we

play08:07

just saw we saw how each activity varies

play08:12

with the attribute mainly through the

play08:15

boxplots

play08:16

of the activities

play08:18

versus attributes this also validated our

play08:21

initial ideas that activities such as

play08:24

lying would have very little change in

play08:26

all three coordinates whereas activities

play08:30

like walking would have no change in

play08:32

the chest Z coordinate especially while

play08:35

the person is walking on a straight line

play08:39

to improve the performance of our tree

play08:41

we can further do a time series analysis

play08:45

of each of these attributes which is

play08:48

definitely bound to increase the

play08:49

accuracy thank you for your time and

play08:53

patience have a good day

Related Tags
Activity Detection, Healthcare, Elder Support, Big Data, Data Analysis, Fitbit, Apple Watch, Fall Detection, Machine Learning, UCI Repository, Decision Tree