Dataframes Part 02 - 01/03
Summary
TLDR: The script is a tutorial on handling data frames in Python using Pandas. It covers loading data, selecting specific columns, and accessing rows. The instructor demonstrates creating data frames from existing data and dictionaries, and explains the difference between series and data frames. They also show how to modify indices, use methods like 'head' and 'tail' for viewing rows, and utilize 'shape' to determine the dimensions of a data frame.
Takeaways
- **Data Frame Loading**: The script starts by loading a dataset and selecting only the data part of it into a DataFrame called 'diabetes'.
- **DataFrame Identification**: It's emphasized that a DataFrame is identified by an index and a structure similar to an Excel sheet with rows and columns.
- **DataFrame Structure**: The 'diabetes' DataFrame contains specific columns like age, sex, BMI, BP, and further columns the instructor describes as probabilities.
- **Building DataFrames**: DataFrames can be constructed from existing libraries, APIs, dictionaries, or CSV files.
- **DataFrame from Dictionary**: A DataFrame can be built from a dictionary where keys are column names and values are lists of data.
- **Series vs DataFrame**: Accessing a single column from a DataFrame results in a Series, while accessing multiple columns retains the DataFrame structure.
- **Accessing Columns**: Columns in a DataFrame can be accessed using square brackets and the column name, similar to accessing keys in a dictionary.
- **Accessing Rows**: Rows in a DataFrame can be accessed using the `.loc` or `.iloc` indexers, with `.loc` using index labels and `.iloc` using integer positions.
- **Displaying Data**: The `.head()` and `.tail()` methods are used to display the first or last few rows of a DataFrame, which is useful for quick data inspection.
- **DataFrame Shape**: The `.shape` attribute provides the dimensions of the DataFrame, indicating the number of rows and columns.
Q & A
What does 'dot data' refer to in the context of loading a dataset?
-In the context of loading a dataset, 'dot data' refers to accessing the 'data' attribute of an object, which typically contains the actual data within a dataset, excluding additional information such as descriptions.
How is a DataFrame represented visually in Python's pandas library?
-A DataFrame in pandas is displayed with an index and columns, similar to an Excel sheet. In the notebook view the rows are shaded alternately gray and white, and when the DataFrame is too large to display in full, only the first five and last five rows are shown.
What is the significance of the index in a pandas DataFrame?
-The index in a pandas DataFrame is significant as it labels the rows and allows for efficient data retrieval. By default, it starts at 0 and increments by 1, but it can be customized to start at different values or use different labels.
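A small sketch of the default index and a custom one, assuming pandas is imported as `pd`. The column name `col1` and the labels 2 to 8 are illustrative choices, not something the summary fixes:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3, 4, 5, 6, 7]})

# With no index given, pandas labels the rows 0, 1, 2, ...
default_labels = list(df.index)

# Assigning a list of the same length replaces the row labels.
df.index = [2, 3, 4, 5, 6, 7, 8]
custom_labels = list(df.index)

print(default_labels)
print(custom_labels)
```

The new list has to contain exactly one label per row, otherwise the assignment fails.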
How can you create a DataFrame from a dictionary in pandas?
-You can create a DataFrame from a dictionary by using the `pd.DataFrame()` function, where the dictionary's keys become the column names and the values become the data in the columns.
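As a minimal sketch of this, with illustrative dictionary contents (`dict1`, `col1`, and `letter` are names picked for the example), and with `DataFrame.from_dict` shown only as an equivalent alternative constructor:

```python
import pandas as pd

# Keys become column names; each list becomes the values of that column.
dict1 = {
    "col1": [1, 2, 3, 4, 5, 6, 7],
    "letter": ["a", "a", "b", "c", "c", "d", "e"],
}

df_dict = pd.DataFrame(dict1)

# from_dict with its default orientation treats keys as columns too.
df_from_dict = pd.DataFrame.from_dict(dict1)

print(df_dict)
```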
What must be true for all arrays when creating a DataFrame from a dictionary?
-When creating a DataFrame from a dictionary, all arrays (lists of values for each column) must have the same length, otherwise pandas will raise an error because it requires uniformity in the size of the data.
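The failure can be reproduced with a short sketch like the one below. The dictionary is made up for the example, and the exact wording of the error message may vary between pandas versions:

```python
import pandas as pd

# "letter" has only three values while "col1" has seven.
bad_dict = {
    "col1": [1, 2, 3, 4, 5, 6, 7],
    "letter": ["a", "b", "c"],
}

error_message = None
try:
    pd.DataFrame(bad_dict)
except ValueError as err:
    # pandas refuses to build a frame from columns of different lengths.
    error_message = str(err)

print(error_message)
```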
What is the difference between a Series and a DataFrame in pandas?
-A Series is a one-dimensional labeled array that behaves like a column in a DataFrame. A DataFrame is a two-dimensional labeled data structure with columns that can be of different types. Selecting a single column from a DataFrame results in a Series.
How do you access a single column from a DataFrame?
-To access a single column from a DataFrame, you use the DataFrame name followed by the column name in square brackets, similar to accessing a key in a dictionary.
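The sketch below compares dictionary access with column access and shows the resulting types. The data is illustrative; `to_frame()` is the method the highlights mention for turning a Series back into a DataFrame:

```python
import pandas as pd

dict1 = {
    "col1": [1, 2, 3, 4, 5, 6, 7],
    "letter": ["a", "a", "b", "c", "c", "d", "e"],
}
df_dict = pd.DataFrame(dict1)

plain_list = dict1["col1"]               # dictionary lookup: a Python list
one_col = df_dict["col1"]                # single brackets: a Series
two_cols = df_dict[["col1", "letter"]]   # double brackets: a DataFrame
as_frame = one_col.to_frame()            # Series converted to a one-column DataFrame

print(type(plain_list).__name__, type(one_col).__name__,
      type(two_cols).__name__, type(as_frame).__name__)
```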
What is the 'iloc' function used for in pandas DataFrames?
-'iloc' is the pandas indexer for integer-location based selection by position. It is used with square brackets rather than called like a function, and it lets you access rows by their integer position, which is useful when you don't know the label of a row but know where it sits.
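A hedged sketch of position-based versus label-based row access, assuming a small frame whose index was set to 2 through 8 (the data and labels are illustrative):

```python
import pandas as pd

df_dict = pd.DataFrame(
    {"col1": [1, 2, 3, 4, 5, 6, 7],
     "letter": ["a", "a", "b", "c", "c", "d", "e"]},
    index=[2, 3, 4, 5, 6, 7, 8],
)

first_by_position = df_dict.iloc[0]     # first row, whatever its label is
first_by_label = df_dict.loc[2]         # the row whose label is 2

two_by_position = df_dict.iloc[[0, 1]]  # first two rows by position
two_by_label = df_dict.loc[[2, 3]]      # the same rows by label

# Label 0 no longer exists, so .loc[0] raises a KeyError.
try:
    df_dict.loc[0]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True
```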
How can you view the first few rows of a DataFrame using a method?
-You can view the first few rows of a DataFrame using the 'head()' method. By default, it shows the first five rows, but you can specify a different number to see more or fewer rows.
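For illustration, a sketch on a made-up 100-row frame (the column name `value` is an assumption for the example), including `tail()` as the counterpart for the last rows:

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})

top_default = df.head()    # first 5 rows by default
top_ten = df.head(10)      # first 10 rows
last_ten = df.tail(10)     # last 10 rows

print(top_ten)
print(last_ten)
```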
What does the 'shape' attribute of a DataFrame return and what does it represent?
-The 'shape' attribute of a DataFrame returns a tuple where the first element is the number of rows and the second element is the number of columns, representing the dimensions of the DataFrame.
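A sketch of a shape-printing helper in the spirit of the one described in the video. The function names and the exact message wording are assumptions, and the string is built in a separate helper only to make it easy to check:

```python
import pandas as pd


def shape_message(df):
    # shape is a (rows, columns) tuple.
    return f"The data frame has {df.shape[0]} rows and {df.shape[1]} columns"


def print_shape(df):
    print(shape_message(df))


df_dict = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6, 7],
    "letter": ["a", "a", "b", "c", "c", "d", "e"],
})
print_shape(df_dict)
```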
Outlines
Data Frame Initialization and Exploration
The speaker begins by discussing the process of loading a dataset into a DataFrame, specifically mentioning the exclusion of unnecessary data like descriptions. They focus on extracting the core data and storing it in a DataFrame named 'diabetes'. The speaker then explains how DataFrames are displayed, comparing them to Excel sheets and highlighting features like the index and the alternating gray and white row shading. They touch on DataFrame size, explaining that only the first and last few rows are shown when the dataset is too large to display in full. The lecture also mentions a second DataFrame, 'DF restaurants', used alongside it. The speaker then constructs a DataFrame from a dictionary, demonstrating the process with an example dictionary and explaining that all the lists must have the same length. They also mention the 'pd' alias for the pandas library and the creation of DataFrames from CSV files or APIs.
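The CSV route is only mentioned in passing, so here is a minimal sketch of it. The restaurant names, columns, and values are entirely hypothetical, and an in-memory string stands in for a file on disk:

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path to a .csv file.
csv_text = "name,rating\nLuigi,4.5\nSakura,4.0\n"

# read_csv accepts a path or any file-like object.
df_restaurants = pd.read_csv(io.StringIO(csv_text))

print(df_restaurants)
```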
Accessing DataFrame Elements
In this section, the speaker discusses how to access elements within a DataFrame, drawing parallels with dictionary access methods. They explain the difference between accessing a single column (resulting in a Series) and multiple columns (which still results in a DataFrame). The speaker emphasizes the importance of understanding the type of object being manipulated, whether it's a Series or a DataFrame, especially when performing operations between them. They also cover accessing specific rows within a DataFrame using the '.loc' method and how to modify the index of a DataFrame. The speaker provides examples of accessing rows by index name and by position, highlighting the difference between '.loc' and '.iloc' for accessing rows.
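To make the label-versus-position distinction concrete, the sketch below reorders a small frame. Reversing the rows stands in for the shuffling mentioned in the video, and the data is illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    {"letter": ["a", "b", "c", "d", "e", "f", "g"]},
    index=[2, 3, 4, 5, 6, 7, 8],
)

# Reverse the row order; each label travels with its row.
reordered = df.iloc[::-1]

label_before = df.loc[2, "letter"]
label_after = reordered.loc[2, "letter"]         # same row as before

position_before = df.iloc[0]["letter"]
position_after = reordered.iloc[0]["letter"]     # now a different row

print(label_before, label_after, position_before, position_after)
```

Selecting by label keeps returning the same row after reordering, while selecting by position returns whichever row currently sits there.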
Advanced DataFrame Navigation
The speaker continues by introducing advanced methods for navigating DataFrames, such as using '.iloc' for accessing rows by their position and '.head()' for viewing the first few rows of a DataFrame. They also mention the '.tail()' method for accessing the last few rows. The section covers the '.shape' attribute, which provides the dimensions of the DataFrame, and the speaker provides a practical example of creating a function to print the shape of a DataFrame. The speaker concludes by emphasizing the importance of knowing the size and structure of a DataFrame for efficient data manipulation and analysis.
Keywords
DataFrame
Pandas
Series
Index
iloc
head
tail
shape
Dictionary
Data Manipulation
API
Highlights
Loading a dataset and extracting data using '.data'
Creating a DataFrame named 'diabetes'
Explanation of DataFrame structure and display
Understanding DataFrame indices and line visibility
Building a DataFrame from a dictionary
Importing the Pandas library under the alias 'pd'
Error raised when building a DataFrame from a dictionary whose lists have unequal lengths
Accessing DataFrame columns using column names
Difference between Series and DataFrame
Transforming a Series into a DataFrame using 'to_frame()'
Accessing DataFrame rows using '.loc'
Using '.iloc' to access DataFrame rows by position
Difference between '.loc' and '.iloc' for accessing rows
Using 'head' to display the first few rows of a DataFrame
Using 'tail' to display the last few rows of a DataFrame
Determining DataFrame shape with '.shape'
Creating a function to print DataFrame shape
Transcripts
When we load this data set we put
as_frame equal to True, and we do see
that here we have some data, right, but
there is also other data like the
description, etc. So we just want the data
in it, so what we do is .data,
and then we get the data
frame. So we're going to have our data
frame, DF diabetes, that is going to be
equal to this, and then I have my data
frame DF diabetes.
So this is how it works. I'm going to
zoom in a bit more so you see more. So here
we have a data frame, DF diabetes, so now
you have another one. You know
it is a data frame because it has this
index that we see here, and it has these gray
and white lines so you can see
better. So it is a nice
display, it looks a bit like
an Excel table, an Excel sheet for
instance.
And here you do see the index goes
from 0 to 441, so 442 lines.
In here you don't see the middle: here is
the first part of your column and
here is the last line of your column, and
because it is too big, Python decided
to just display the first five
lines and the last five lines together so
you can see how it is
inside. So this is how
we can build this DF diabetes. In the first
lecture I said we will use
this little DF restaurants example,
but we will also use this DF diabetes. So
you see what is here: you get the age, the sex,
the BMI, the BP, and then different
probabilities.
So this is how it works.
There is
another possibility to build a data
frame: it can
be made
from a dictionary. So let's say I'm
building a dictionary. I'm going to
write a dictionary, and it's
going to be like col one, and in my
col one I will get like
one two three four five six seven
eight nine ten zero. So let's say you
have this, this is going to be my dict
one, and then I want to build a
data frame from this dict, so it's just
going to be one column. How would I do
that? I go to my library pandas, so pd,
because that is the nickname I gave it
when I imported it,
then pd.DataFrame, I
forgot something, pd.DataFrame, yeah,
from_dict. So I do from_dict, and my
dict is dict one basically. So here
I have pd. What do I want to create?
I want to create a data frame, and
then I have my dict one. So then what
is the output? The output is my col
one and the values. Then okay, if I add
another column, I need to put its name as
a string, so it's going to be like number,
or maybe it's going to be letter,
and in my letter I will have something
like a
a
b
c
c
d
e. How many? One two three four five six
seven, one two three four five six seven.
So yeah, I need to put this in the end.
So here I can build another
dictionary. If they don't have the same
size then I have issues, right,
because it will tell me this one and
this one don't have the same size.
That's what it is: all arrays must be of
the same length. So when you build from a
dictionary you put the name of the
column,
you put the content as a list, and all
these arrays have to have the same
size, right,
otherwise it is not possible to build
your data frame. And you can see in
from_dict you have the orientation, like
the keys as columns, etc. These are the
different parameters. So this is
another way. We're going to call it
DF dict,
DF dict,
and this is how it works. So we have
our DF dict, and now I can call it again.
So you can check DF dict and DF diabetes
and my DF restaurants, so I have all
of them, I'm happy.
So here we built a data set
by loading from something
that already exists in some library, from
this API I get it, or I can build it from
a dictionary, or I can build it from a CSV,
and then you can explore the options and
read the documentation to see how you
can build them differently.
So you see the structure, it's
like these columns and these rows. What you're
asking me is: when I'm calling my DF
dict I'm accessing all of my data frame,
but maybe I want to access only
a small part of my data frame. So I would
like to access maybe just the first
column. How do I do that? To access
the first column I will just do col one,
a bit like for a dictionary, right. So you
remember for a dictionary, if I want to
access one key, if I want
the content of col one, what I will do
is dict one of col one.
So you do see, when I do dict one of
col one, the content here is a list,
but when I do DF dict of col one, because
DF dict is a data frame, I have the same
content but here what we get is called a
series. So this is a data frame, and
if you select only a single
column from a data frame then it is
called a series. So this is how you
access a column. Now if you want to
access several columns, this
could happen when you do, like,
DF diabetes
and we get age and sex,
age and sex.
So if you take DF diabetes and you get
age and sex, you will need to put
these double brackets if you want to
select two columns, and then the output,
because it's not a single column, it's
two columns, is a data frame.
As a note, you can easily transform
a series into a data frame: there
is a method called to_frame, and then this
is a data frame. So you do see the
difference between a series, which is
printed like this without the
gray and white stuff, and then the to_frame
result. It is important to know what
type you are manipulating, because if you
want to do operations, let's say between
data frames, you need to
know that if you want to aggregate two
columns or something, it has to be two
columns of the same type, two
series or two data frames. If you want to
mix this one with this one and you want
to do an operation between them, they
have to be data frames basically.
So this is how it works: this is a data
frame, and you can use this to_frame
function to bring a single column
that is a series to a data frame. So
yeah, this is how it works to access a
column.
It's a bit like in Excel where you
can just select one column, etc., and we do it
a bit like with a dictionary, so the syntax
is a bit the same as for the dictionary
type.
Then what is new: when I have
my DF diabetes,
I want to access maybe a single row.
So what will we do?
In a list, I would
do DF diabetes of zero,
but DF diabetes of zero wouldn't work,
it would look for a column zero. So how do I do this?
I do .loc,
and when I do DF diabetes .loc of zero I go
to the index that is there and I select
the row that has the name zero. It
could be that my index is different,
right.
So to access my index I will do
DF diabetes .index,
.index,
and I will see it's a RangeIndex,
start, etc. Now I could also go to my DF
dict and do .index, and I see
the start is zero, the stop is seven, and maybe I just
want to modify it.
So it started at zero,
and let's say I want to start it at...
seven values, one two three four five six
seven.
Oh yeah, so now I changed it. So if I look at
my DF dict .index,
the index is like this,
and if I display DF dict I will see that
before I was having
zero one two three four five six. By
default it's auto-increment and
starts with zero if not specified
otherwise. Here I set an index,
I say I want as an index two
three four five six seven eight, so now my
index is two three four five six seven
eight. Perfect, col one, letter. So
if I do DF dict .loc of zero, this
would have worked before, but now it tells
me zero is not in the index, KeyError
zero, meaning it is not in the index. So now I
look again and I'm like, hmm, this first
row has a name, the name of my
first row is two, so I need to do .loc of
two, and if I do .loc of two you will see
that I'm accessing this first line. If I
want to access this line here I will do
.loc of five.
Perfect. So we use these
brackets to access a column, double
brackets if you want several columns, and
if I only want to access a line or
several lines I could do .loc of five, and I
could also put different numbers in
there.
But you're like, I want to access
the line that is the first one no matter
what, and I don't know the name. Then
there is a function that is a bit like
.loc, and it's called .iloc. For
.iloc
in pandas you just put the
position.
So if you go there and you do DF dict
.iloc of zero, this is working, you
get the first one. If you want the second
row you will do this one, right.
So you can really play with .iloc
and you can put different ones: if you
want the two first ones you put zero and
one, and then you will get the two first
ones. This will be the same
as doing
.loc, but for .loc the names of rows 0 and 1
are two and three, so then I
will do two and three,
and this will get me the same, right. So
either I use .iloc, the i is for the index
of the location if you want, or I use
.loc and I'm looking for the name.
This will always return the same thing,
but let's say I'm shuffling my data
frame, shuffling
rows around, this could be different,
so you need to know exactly what you get as a
result, right. So this is the
difference between .loc and .iloc, and
how I can access rows and how I can
access columns.
So let's say here I have this big
data frame that is my DF diabetes,
and I'm like, I can print
everything, etc. There is a function
called display, and we can use
display as well to display everything.
But there is something very practical,
it is called head. Head is a
method that applies to my data frame
DF diabetes. And how does it work? It's
basically the head of my data frame, and it
just shows me the first five rows. If I put
five it's five, by default it's five. I
can put more: if I want to see the top 10
rows I do .head of 10 and then I see
the top 10 rows. If I put 20 then I will
get the top 20 rows, but you will see
it's not so practical because it can only
show ten rows max.
And then there is a thing called
tail, and tail, as you see, shows me the
last ones. So if I do tail of 10 then I
will get my 10 last rows. So this is
how this stuff is working. Then we have
a thing called the shape. The shape
is how big it is. So if I
do my DF diabetes .shape it will give
me two numbers. The first number will be
the number of rows, so this is my number
of rows, it has 442 rows, and I have 10
columns. If I go to my DF dict and I do
.shape I will get seven and two. So now I could
write a function, def print, or I could
just print,
def print_shape. So if I do def
print_shape, I put a DF as an input,
and then I will print,
I put an f-string, as we all
remember how to do f-strings,
I put an f-string and I write the
data frame has, and here I do
DF .shape of zero, so this is going to
be the number of rows, rows and,
DF .shape of one, and this is the
number of columns.
So now I do print_shape of my DF
diabetes and it says the data frame has 442
rows. Maybe I can just put a space
here so it's a bit nicer. So the data
frame has 442 rows and 10 columns. Now
if I print_shape of something else, so
print_shape of my DF dict here, I will
see I have seven rows and two columns,
seven rows and two columns. So
this is to print the shape of the data
frame and know how big it is.