How to import data and install packages. R programming for beginners.
Summary
TL;DR: This video tutorial from an R programming series introduces viewers to the basics of creating and managing projects in R. It covers why projects matter for organizing scripts, data, and outputs in a single directory. The instructor demonstrates how to import data from a CSV file and emphasizes doing the import in code so that it is automated. The video also touches on data manipulation techniques, such as selecting specific variables and filtering data based on conditions. Additionally, it highlights the utility of R packages, like 'tidyverse', for expanding functionality and facilitating data analysis. The tutorial aims to empower viewers with the skills to start analyzing their own data effectively.
Takeaways
- Start by creating a project in R to organize scripts, data, and outputs neatly within a single directory.
- R projects set the working directory, making it easier to manage file paths and outputs.
- Create a new project with the 'Create a Project' button and give it a name that is easy to identify.
- Import data into R by writing code that automates the process, ensuring reproducibility.
- Avoid the 'Import Dataset' options in RStudio; coding the import is more efficient in the long run.
- Use functions like `read.csv` to import data from files, and assign the imported data to an object for further analysis.
- Explore the dataset with R functions such as `head()`, `tail()`, and `View()`.
- Understand the concept of 'packages' in R, which are collections of functions that solve specific problems and extend R's capabilities.
- Install packages with `install.packages()` and load them with `library()` or `require()` to access additional commands and functions.
- The 'tidyverse' package is demonstrated for data analysis, with emphasis on its ease of use and power.
- Data manipulation techniques covered include selecting specific columns, filtering rows based on conditions, and arranging data.
Q & A
What is the main topic of the video?
-The main topic of the video is how to get started with R programming, focusing on creating a project, importing data, installing packages, and manipulating data.
What are the four quadrants mentioned in the video?
-The four quadrants refer to the layout of the RStudio environment, which includes the script editor, console, environment/workspace, and files/plots panels.
Why is it recommended to start a project in R?
-Starting a project in R helps organize the work by setting the working directory, keeping scripts, data, and outputs neatly stored in one place, which is useful for managing and reproducing the analysis.
How does one create a new project in R according to the video?
-To create a new project in R, click on the 'Create Project' button, choose to create a new directory, give the project a name, and then click 'Create Project'.
What is the purpose of the 'read.csv' function in R?
-The 'read.csv' function is used to import data from a CSV (Comma Separated Values) file into the R environment, making it available for analysis.
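A minimal sketch of the import step, assuming a hypothetical file called `mydata.csv` with invented columns and values (the video's actual file is not reproduced here). The example first writes a small CSV to a temporary folder so that it can run on its own:

```r
# Hypothetical example data: file name, columns, and values are invented.
csv_path <- file.path(tempdir(), "mydata.csv")
writeLines(c(
  "name,age,eye_color,height",
  "Ann,23,blue,1.80",
  "Ben,31,brown,1.75",
  "Cat,22,blue,1.82"
), csv_path)

# read.csv() returns a data frame; the arrow (<-) assigns it to an object
# that then appears in the environment.
mydata <- read.csv(csv_path)

# Inside a project, a file sitting in the working directory can be read
# by name alone, e.g. mydata <- read.csv("mydata.csv")
str(mydata)
```

Without the assignment, `read.csv()` only prints the data to the console and nothing is kept for later use.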
How can one view the first six rows of a dataset in R?
-To view the first six rows of a dataset in R, use the 'head' function followed by the dataset name, like 'head(mydata)'.
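As a short illustration, assuming a made-up ten-row data frame in place of the imported file, `head()` and `tail()` can be compared side by side:

```r
# Hypothetical data frame with ten rows, standing in for imported data.
mydata <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

first_rows <- head(mydata)     # first six rows by default
last_rows  <- tail(mydata)     # last six rows by default
top_three  <- head(mydata, 3)  # a second argument changes the row count

print(first_rows)
print(last_rows)
```

In RStudio, `View(mydata)` opens the same object in a spreadsheet-style viewer; it is left out of the sketch because it only works in an interactive session.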
What is the significance of the pipe operator in R scripting?
-The pipe operator (%>%) in R allows for chaining commands together, making it easier to read and write complex sequences of operations on data.
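One way to read the pipe is as "and then". The sketch below, using invented height values, shows that a piped chain gives the same result as the equivalent nested call. It assumes the magrittr package is installed (it is loaded automatically with the tidyverse):

```r
library(magrittr)  # provides %>%; also attached by library(tidyverse)

heights <- c(1.80, 1.75, 1.82)  # invented values

# Nested form: read from the inside out.
nested <- round(mean(heights), 2)

# Piped form: take heights, and then average, and then round.
piped <- heights %>%
  mean() %>%
  round(2)

identical(nested, piped)
```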
What does the 'install.packages' function do in R?
-The 'install.packages' function is used to install additional packages in R that provide extra functions and capabilities for specific tasks or analyses.
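The install-once, load-every-time pattern can be sketched as below. The helper `load_package()` is a hypothetical convenience function, not something shown in the video, which calls `install.packages()` and `library()` directly. The CRAN mirror URL is also an assumption chosen for the example:

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Installing is only needed once per computer.
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # Loading is needed in every script or session that uses the package.
  library(pkg, character.only = TRUE)
}

# "tools" ships with R, so this call loads it without downloading anything.
load_package("tools")

# For the package used in the video, the call would be:
# load_package("tidyverse")
```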
How can one select specific variables from a dataset in R?
-To select specific variables from a dataset in R, use the 'select' function from the 'dplyr' package, specifying the variables to include, like 'mydata %>% select(variable1, variable2)'.
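A small sketch of `select()`, assuming the dplyr package is installed and using an invented data frame whose column names follow the ones mentioned in the video:

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration.
mydata <- data.frame(
  name = c("Ann", "Ben", "Cat"),
  age = c(23, 31, 22),
  eye_color = c("blue", "brown", "blue"),
  height = c(1.80, 1.75, 1.82),
  stringsAsFactors = FALSE
)

# Keep only the three named columns, in the order given.
selected <- mydata %>%
  select(name, age, height)

print(selected)
```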
What is the purpose of the 'filter' function in data manipulation?
-The 'filter' function is used to subset a dataset to include only the rows that meet certain conditions, such as filtering by age or height as demonstrated in the video.
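The age and height conditions from the video can be sketched as follows, again assuming dplyr is installed. The data frame and its values are invented; only the thresholds (24 and 1.78) come from the transcript:

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration.
mydata <- data.frame(
  name = c("Ann", "Ben", "Cat", "Dan", "Eve"),
  age = c(23, 31, 22, 21, 19),
  height = c(1.80, 1.75, 1.82, 1.70, 1.79),
  stringsAsFactors = FALSE
)

# Conditions separated by a comma must all be true (same as using &).
young_and_tall <- mydata %>%
  filter(age < 24, height > 1.78)

print(young_and_tall)
```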
How can one arrange a dataset by a specific variable in R?
-To arrange a dataset by a specific variable in R, use the 'arrange' function from the 'dplyr' package, specifying the variable to sort by, like 'mydata %>% arrange(variable)'.
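A brief sketch of sorting with `arrange()`, assuming dplyr is installed and using invented data. The descending variant with `desc()` is an addition that the video does not show:

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration.
mydata <- data.frame(
  name = c("Ann", "Ben", "Cat"),
  height = c(1.80, 1.75, 1.82),
  stringsAsFactors = FALSE
)

# Ascending order is the default.
sorted_up <- mydata %>% arrange(height)

# Wrap the column in desc() for descending order.
sorted_down <- mydata %>% arrange(desc(height))

print(sorted_up)
print(sorted_down)
```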
Outlines
Project Creation and Data Import Basics
This paragraph introduces the concept of a project in R programming, emphasizing the importance of organizing scripts, data, and outputs in one place for better workflow management. It guides viewers on how to create a new project in RStudio, set a working directory, and the benefits of doing so. The speaker also touches on the different options available for starting a new project and suggests ignoring certain features for now to focus on the essentials. The paragraph concludes with a demonstration of how to import data into the R environment by creating a new directory for the project and manually copying data into it.
Data Manipulation and Package Installation
The second paragraph delves into data manipulation within R, starting with the process of importing a CSV file into the R environment. It explains how to read a CSV file using the 'read.csv' function and the importance of assigning the imported data to an object for further use. The paragraph also introduces the 'head', 'tail', and 'View' functions for data examination. It then covers the extraction of specific data components using square-bracket indexing and the dollar sign notation. The speaker highlights the utility of R packages, explaining what they are, how to install them using 'install.packages', and how to load them using 'library' or 'require'. The paragraph wraps up with a brief mention of the 'tidyverse' package and its role in simplifying data analysis.
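The indexing and dollar sign notation described here can be sketched in base R. The data frame is invented, arranged so that eye color is the third column as in the video:

```r
# Hypothetical data; the values are invented for illustration.
mydata <- data.frame(
  name = c("Ann", "Ben", "Cat"),
  age = c(23, 31, 22),
  eye_color = c("blue", "brown", "blue"),
  stringsAsFactors = FALSE
)

one_cell  <- mydata[1, 3]      # row 1, column 3: a single value
whole_col <- mydata[, 3]       # row left blank: every row of column 3
by_name   <- mydata$eye_color  # the same column, picked out by name

print(one_cell)
print(whole_col)
```

The first number inside the square brackets is the row and the second is the column, so leaving the row blank returns the entire column.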
Data Analysis with the Tidyverse Package
In the final paragraph, the focus shifts to performing data analysis using the 'tidyverse' package in R. The speaker demonstrates how to select specific variables from a data frame using the 'select' function and the pipe operator '%>%' for chaining commands. It also shows how to filter data based on conditions such as age and height, and how to arrange the data by a specific variable, in this case, height. The paragraph concludes with an invitation for viewers to subscribe and enable notifications for further educational content on R programming and data analysis.
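Put together, the chain described in this paragraph might look like the sketch below. It assumes dplyr is installed; the data frame and its values are invented, and only the column names and thresholds follow the video:

```r
library(dplyr)

# Hypothetical data; the values are invented for illustration.
mydata <- data.frame(
  name = c("Ann", "Ben", "Cat", "Dan", "Eve"),
  age = c(23, 31, 22, 21, 19),
  eye_color = c("blue", "brown", "blue", "green", "brown"),
  height = c(1.80, 1.75, 1.82, 1.70, 1.79),
  stringsAsFactors = FALSE
)

# Each step receives the result of the step before it.
result <- mydata %>%
  select(name, age, height) %>%      # keep three columns
  filter(age < 24, height > 1.78) %>%  # keep rows meeting both conditions
  arrange(height)                    # sort the remaining rows by height

print(result)
```

Putting each step on its own line after the pipe is a layout choice; R treats the chain as one continuous expression.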
Keywords
Project
Import Data
Packages
CSV
Data Manipulation
Working Directory
R Script
Data Frame
Pipe Operator
Analysis
Highlights
Introduction to the series and the goal of the video: creating a project in R and understanding its components.
Explanation of what a project is and its importance in organizing scripts, data, and outputs in R.
Overview of the RStudio interface and the importance of starting with a project.
Detailed steps on how to create a new project in RStudio, including naming and setting up the working directory.
How to organize files and directories for a project to keep everything neat and accessible.
Importing data into R: recommended methods and why scripting the import process is better than using GUI options.
Demonstration of the read.csv function for importing CSV files and converting them into data frames.
Explanation of functions and objects in R, using examples like head(), tail(), and View() to explore data.
Manipulating data frames: accessing specific rows and columns using indexing and dollar sign notation.
Introduction to packages in R: what they are, how to install them, and how to use them in scripts.
Example of installing and loading the tidyverse package and its significance in data analysis.
Using the pipe operator (%>%) to chain commands and perform multiple operations on data frames.
Demonstration of selecting specific columns from a data frame using the select() function.
Filtering data based on conditions using the filter() function and combining multiple conditions.
Sorting data by a specific column using the arrange() function to organize the data output.
Encouragement to subscribe and hit the notification bell to stay updated with future R programming videos.
Transcripts
welcome back to the R programming
video series on how to get started with
R in this video we're gonna talk about
how to create a project and I'm gonna
explain to you what a project is we're
gonna talk about how to import data I'm
gonna teach you how to install packages
and you are going to love packages they
rock and we're gonna talk a little bit
about manipulating data
I want you at the end of the session to
feel as if you can do something in R
okay so let's get started if you want to
learn about R programming then you
have come to the right place
on this YouTube channel we're creating
R programming videos on everything
right we're looking at RStudio at
this point you've installed R you've
installed RStudio if you're not
familiar with this environment there's
these four quadrants I've got a video
that goes through that and introduces
you to this environment so have a look
at that video I'm not gonna go through
it right now
you're wanting to get going right so at the
top on the left you've got a little pull
down menu and you've got some options
things you can start there's an R script
or a notebook
we're gonna go through each of these in
detail in future videos I'm gonna
suggest ignore these for now right
start off by starting a project so just
to the left you're gonna see create a
project button and the reason is if you
write your script in the context of a
project that you've started R will know
where to look for your data where to
put all of your outputs your graphics
etc etc it stores everything quite
neatly in one place it sets
what we call your working directory and
that's quite a useful thing you're gonna
find it more and more useful down the
line so my suggestion is right off the
get-go get into the habit of when you're
starting a project in R click on the
start a project button you've got an
option here and I'm gonna suggest create
a new directory a new project give the
project a name and I'm gonna call it
test one like that and say create
project ok so R has created a project we
can see the project down here at the
bottom on the right just so that you can
see what's happening at the same time on
my hard drive
if we have a look at my hard drive R
has created a folder called test
one click on that folder and there we
can see that icon that represents the
project if R was closed and we went to
this place on my hard drive clicked on
that icon R would open up in that
project and we would see all of the
script and the data and the outputs from
that project all in one place it would
be very neat it would be lovely it would
be poetry you're gonna love it
okay so that's starting a project okay
so how do we get some data into R well
let's go back to our hard drive that's
the folder that was created when we
created our project go in there I'm
gonna cut and paste some data into that
folder if we go back into R we can see
that data sitting here now that data
hasn't been imported yet we still need
to do that but at least we know where to
find it now to bring that data into R
into our environment make it into an
object that we can use there's a few
things we can do and I'm gonna show you
the things not to do but just so that
you know that they exist if we clicked on
the data down here you've got the
option of import data set you can do
that that's fine but I'm gonna say don't
do that
there's other options we've got import
data set up here again don't do that the
best thing to do is to use your code get
that get your actual script to go and
fetch the data so that when you run your
code it's automated it automatically
goes fetches the data creates an object
puts it in your environment and you
never have to think about it again so
that's the way to do it and I'm going to
show you how right here we've got some
code and this code is going to import
some data and it's also going to do a
little bit of analysis and I'm going to
go through the code one step at a time
just to teach you how it is that I've
done this right just so that you know
when you write code here in the source
up on the top left when
you've written the code you go to
file save as and you save it and it pops
down here into your project which is all
nicely and neatly kept in your
working directory down on the bottom
right okay so let's go through this one
step at a time okay we're gonna start
off by looking at the read CSV read dot
CSV function of course we can import any
kind of data we can import data straight
from Microsoft Excel we can import SPSS
files we can import Stata files a CSV is
a nice and simple file if you've got an
Excel spreadsheet you can save it as a
CSV I usually save them
as CSVs and import them that way
because it's uncomplicated and it's not
messy but we're gonna create videos that
look at each of these individually and
we'll go through them one at a time for
this video we're just going to stick
with a nice simple CSV file so we've got
a function that says read CSV in
brackets and in inverted commas that's
important we have the file name and the
file extension now if I didn't have
this little arrow over here if I just
did the function and the file it's
gonna look in our working directory I
push command enter to run that or I click
on run over there so command enter and
down here in the console we can see
there's our data now that's not
particularly useful to us right now as
it is because we want that to be an
object that we can use so if I give that
a name and I say my data and create this
little arrow with the less than and -
which is kind of like an arrow it says
everything that's over here gets
assigned to that name push ctrl enter
and in our environment on the left we
can see my data sits there we can
have a look at what the
variables within there are and that's
our data sitting and it's been read
in and it's within R and we can start
using it so we want to view our data
now for the most part R works with
functions and objects so we've got an
object my data sitting in our
environment over there
and we've got functions a function is
this function called head if we type in
head my data and push command enter
or ctrl enter or run up there it's gonna
give us the first six rows of data if we
do tail my data it's gonna give us
the last six rows of our data and if we
do View my data it's gonna produce
the data in a little spreadsheet
that we can look at so let's have a look
at that right remember with this kind of
data this is a nice flat spreadsheet
we've got each row is an observation and
each column is a variable let's go back
to our four quadrants we can also view
the data if we're looking at our script we
can also view the data by clicking on my
data we can click on the object and
it'll also bring it up over there now we
might want to extract specific
components of our data
so remember we've said that rows
are observations columns are variables and if
we put my data and we use these square
brackets to tell R where to look the
first number after the square bracket
tells it what row to look at and the
second number what column right so if we
run that it's gonna give us blue well
what is blue blue is the first row and
the third column the variable eye color
so we got this cell over here popped out
if we didn't put a row and we just put
comma 3 it's gonna do the entire column
so let's run that and there we go blue
brown blue blah-blah-blah-blah-blah
that's basically spitting out this
entire column and this column this
variable name is eye color so we can
also do my data dollar sign eye color
and it does the same thing okay before
we start doing some analysis I just want
to talk to you a little bit about
packages because you're gonna find these
things tremendously useful right
packages are the program functions that
solve very specific problems they expand
the R vocabulary to install a package
you use this function install packages
right and then open brackets you need
the inverted commas you put the name of
the package close brackets you only ever
need to install a package once once it's
installed on your computer it's there
but when you want to use it in your
script you need to include either
library or require either of those two
you don't need them both you put that
into your script it'll go and fetch that
package it'll use it and then from that
point onwards in your script you have
access to additional commands and
functions so of course I have previously
installed the tidyverse package at this
point I want to push command enter or
control enter to run this line of code
that uses it so bada-bing now I'm gonna
show you how to do an analysis in R
using some of this vocabulary that comes
with the tidyverse you're going to see
how easy and intuitive and
straightforward it is when you see how
easy it is you're going to be really
excited about analyzing your own
data okay so the first thing we do
is we type in our data frame we
start off with my data that's
that's our object okay if I push command
enter or control enter on a PC at
this point it brings up the whole data
frame and this is a small data frame
okay
shift control 2 to have a closer look
at the console right this is our whole
data frame now data frames are
usually much much bigger than this we
may have hundreds of variables what do
we do we want to select just a few of
them we might in this case want to select
just name age and height right shift
control zero to go back to all four
quadrants so we want to select that
before we select it I want to teach you
about a little thing called the pipe
operator right shift command M that's
the pipe operator right it's a percent
greater than percent it looks a little
bit like a pipe and what it means is
whatever you've done on the left hand
side whatever that line of code is gets
piped into the next line of code right
so if you've done some sort of change or
manipulation that change gets piped into
and you'll see more how that works as we
as we go through this example now you
would ordinarily you could just carry on
typing to the right I like to after a
pipe operator go to the next line R will
see that as continuing on the same line
it doesn't really matter okay it looks
like an error is popping up there it's not
you can ignore that right okay so we've
said my data we've got in a pipe
operator which just means and then so my
data and then right we've said we
want to select name age and height so it
is literally as simple as that
select open brackets name age height
command enter okay now we can see in our
console and I'm gonna zoom in and the
console with shift control 2 we can see
we originally had the entire data frame
we wanted to select a few of the
variables in this case name age and
height and there they are now we might
want to only look at people that are
less than 24 years old so another pipe
operator which is and then go to the
next line filter by those that are aged
less than 24 and we might want to say
let's make this a bit more
complicated to say age less than 24
and height greater than 1.78 for example
okay let's have a look at what that does
and voila
let's have a look at that well we don't
need to zoom in on the console because
we can see right here there's just three
rows that met that criteria we might now
want to add here another pipe operator
and then arrange by height and it'll
arrange it by height command enter
and there we go voila so if you are
serious about learning how to analyze
data and you want to learn R
programming then hit the subscribe
button now and hit the little bell
notification if you want to get notified
of future videos
[Music]