How to import data and install packages. R programming for beginners.

R Programming 101
14 Feb 2019, 11:54

Summary

TLDR: This video tutorial from the R Programming 101 channel introduces viewers to the basics of creating and managing projects in R. It covers the importance of projects for organizing scripts, data, and outputs in a single directory. The instructor demonstrates how to import data, specifically from CSV files, and emphasizes importing data with code so that the process is automated. The video also touches on data manipulation techniques, such as selecting specific variables and filtering data based on conditions. Additionally, it highlights the utility of R packages, like 'tidyverse', for expanding functionality and facilitating data analysis. The tutorial aims to give viewers the skills to start analyzing their own data.

Takeaways

  • 📚 Start by creating a project in R to organize scripts, data, and outputs neatly within a single directory.
  • 🔍 R projects set the working directory, making it easier to manage file paths and outputs.
  • 📁 Create a new project by using the 'Create a Project' button and giving the project a name that is easy to identify.
  • 📈 Import data into R by writing code that automates the process, ensuring reproducibility.
  • 🚫 Avoid the 'Import Dataset' options in RStudio: when the import is written in code, the script fetches the data automatically every time it runs.
  • 🔒 Use functions like `read.csv()` to import data from files, and assign the imported data to an object for further analysis.
  • 🔑 Explore a dataset with functions such as `head()`, `tail()`, and `View()`.
  • 🎛️ Understand the concept of 'packages' in R: collections of functions that solve specific problems and extend R's capabilities.
  • 🛠️ Install a package once with `install.packages()`, then load it in each script with `library()` or `require()` to access its functions.
  • 🔬 See the 'tidyverse' package in action for data analysis, with an emphasis on its ease of use and power.
  • 📊 Manipulate data by selecting specific columns, filtering rows based on conditions, and arranging (sorting) rows.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is how to get started with R programming, focusing on creating a project, importing data, installing packages, and manipulating data.

  • What are the four quadrants mentioned in the video?

    -The four quadrants refer to the layout of the RStudio environment, which includes the script editor, console, environment/workspace, and files/plots panels.

  • Why is it recommended to start a project in R?

    -Starting a project in R helps organize the work by setting the working directory, keeping scripts, data, and outputs neatly stored in one place, which is useful for managing and reproducing the analysis.

  • How does one create a new project in R according to the video?

    -To create a new project in R, click on the 'Create Project' button, choose to create a new directory, give the project a name, and then click 'Create Project'.

  • What is the purpose of the 'read.csv' function in R?

    -The 'read.csv' function is used to import data from a CSV (Comma Separated Values) file into the R environment, making it available for analysis.
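A minimal sketch of that import step. Because the video's data file is not available, the code first writes a small hypothetical `people.csv` (its columns `name`, `age`, `eye_color` and `height` are assumptions) to R's temporary folder; in a real project you would place the file in the project folder and call `read.csv("people.csv")` directly.

```r
# Create a small, hypothetical CSV file standing in for the video's data
csv_path <- file.path(tempdir(), "people.csv")
writeLines(c(
  "name,age,eye_color,height",
  "Ana,22,blue,1.80",
  "Ben,31,brown,1.75",
  "Caro,19,blue,1.82"
), csv_path)

# read.csv() returns a data frame; `<-` assigns it to the name mydata
mydata <- read.csv(csv_path)

# Inside an RStudio project, a bare file name is resolved against the
# project folder (the working directory):
# mydata <- read.csv("people.csv")

str(mydata)  # column types and the first few values of each column
```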

  • How can one view the first six rows of a dataset in R?

    -To view the first six rows of a dataset in R, use the 'head' function followed by the dataset name, like 'head(mydata)'.
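A short sketch with a hypothetical 10-row data frame, showing `head()`, its counterpart `tail()`, and the optional `n` argument. `View()` (capital V) opens RStudio's spreadsheet-style viewer and only makes sense in an interactive session, so it is left as a comment.

```r
# Hypothetical 10-row data frame
mydata <- data.frame(id = 1:10, score = seq(10, 100, by = 10))

head(mydata)         # rows 1 to 6 (the default n is 6)
tail(mydata)         # rows 5 to 10
head(mydata, n = 3)  # n changes how many rows are shown

# View(mydata)       # interactive viewer in RStudio
```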

  • What is the significance of the pipe operator in R scripting?

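    -The pipe operator (%>%), which comes from the magrittr package and is loaded with the tidyverse, chains commands together: the result of each step is passed into the next, which makes complex sequences of operations on data easier to read and write.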

  • What does the 'install.packages' function do in R?

    -The 'install.packages' function is used to install additional packages in R that provide extra functions and capabilities for specific tasks or analyses.
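A small sketch of the install-once, load-per-session pattern. `ensure_package()` is a hypothetical helper written for illustration, not a function from base R or any package; the calls inside it (`requireNamespace()`, `install.packages()`, `library()`) are standard. The last lines show the practical difference between `library()` and `require()`.

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
# Installing is needed once per machine; attaching is needed once per session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN, so it needs internet access
  }
  library(pkg, character.only = TRUE)
}

# Usage (the first run may download several packages):
# ensure_package("tidyverse")

# library() stops with an error when a package is missing, whereas require()
# returns FALSE (invisibly), so it can be used inside if () checks.
has_pkg <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
has_pkg  # FALSE
```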

  • How can one select specific variables from a dataset in R?

    -To select specific variables from a dataset in R, use the 'select' function from the 'dplyr' package, specifying the variables to include, like 'mydata %>% select(variable1, variable2)'.
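A minimal sketch of `select()` on hypothetical data with the column names mentioned in the video. It assumes dplyr (part of the tidyverse) is already installed; the names and values are invented for illustration.

```r
library(dplyr)  # assumes dplyr (part of the tidyverse) is installed

# Hypothetical data with the columns mentioned in the video
mydata <- data.frame(
  name      = c("Ana", "Ben", "Caro", "Dev", "Eli"),
  age       = c(22, 31, 19, 23, 20),
  eye_color = c("blue", "brown", "blue", "green", "brown"),
  height    = c(1.80, 1.75, 1.82, 1.79, 1.70)
)

# Keep three columns; select() returns a new data frame and leaves mydata as is
small <- mydata %>%
  select(name, age, height)

small
```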

  • What is the purpose of the 'filter' function in data manipulation?

    -The 'filter' function is used to subset a dataset to include only the rows that meet certain conditions, such as filtering by age or height as demonstrated in the video.
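A sketch of combining conditions in `filter()`, using hypothetical data and the age and height thresholds from the video. It assumes dplyr is installed. Inside `filter()`, separating conditions with a comma means the same as joining them with `&`; `|` keeps a row when either condition holds.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical data
mydata <- data.frame(
  name      = c("Ana", "Ben", "Caro", "Dev", "Eli"),
  age       = c(22, 31, 19, 23, 20),
  eye_color = c("blue", "brown", "blue", "green", "brown"),
  height    = c(1.80, 1.75, 1.82, 1.79, 1.70)
)

# Rows where BOTH conditions are TRUE (comma form and & form are equivalent)
young_tall <- mydata %>%
  filter(age < 24, height > 1.78)

same_rows <- mydata %>%
  filter(age < 24 & height > 1.78)

# Rows where EITHER condition is TRUE
young_or_tall <- mydata %>%
  filter(age < 24 | height > 1.78)

young_tall$name
```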

  • How can one arrange a dataset by a specific variable in R?

    -To arrange a dataset by a specific variable in R, use the 'arrange' function from the 'dplyr' package, specifying the variable to sort by, like 'mydata %>% arrange(variable)'.
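A sketch of `arrange()` on hypothetical data, assuming dplyr is installed. Sorting is ascending by default and `desc()` reverses it; the last step chains `select()`, `filter()` and `arrange()` in the same order as the video's example.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical data
mydata <- data.frame(
  name      = c("Ana", "Ben", "Caro", "Dev", "Eli"),
  age       = c(22, 31, 19, 23, 20),
  eye_color = c("blue", "brown", "blue", "green", "brown"),
  height    = c(1.80, 1.75, 1.82, 1.79, 1.70)
)

# Ascending (default) and descending sort by height
shortest_first <- mydata %>% arrange(height)
tallest_first  <- mydata %>% arrange(desc(height))

# A chain like the one built in the video: select, then filter, then arrange
result <- mydata %>%
  select(name, age, height) %>%
  filter(age < 24, height > 1.78) %>%
  arrange(height)

result
```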

Outlines

00:00

📂 Project Creation and Data Import Basics

This paragraph introduces the concept of a project in R programming, emphasizing the importance of organizing scripts, data, and outputs in one place for better workflow management. It guides viewers through creating a new project in RStudio, which sets the working directory, and explains the benefits of doing so. The speaker also points out other things that can be started from RStudio's menu, such as an R script or a notebook, and suggests ignoring them for now to focus on the essentials. The paragraph concludes by copying a data file into the folder RStudio created for the project and explaining why the import should be written in code rather than done through RStudio's 'Import Dataset' menus.

05:01

🔍 Data Manipulation and Package Installation

The second paragraph delves into data manipulation within R, starting with the process of importing a CSV file into the R environment. It explains how to read a CSV file using the 'read.csv' function and the importance of assigning the imported data to an object for further use. The paragraph also introduces the 'head', 'tail', and 'View' functions for data examination. It then covers the extraction of specific data components using square-bracket indexing and the dollar sign notation. The speaker highlights the utility of R packages, explaining what they are, how to install them using 'install.packages', and how to load them using 'library' or 'require'. The paragraph wraps up with a brief mention of the 'tidyverse' package and its role in simplifying data analysis.
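The indexing described above, sketched with a hypothetical three-row data frame in which `eye_color` (an assumed column name) is the third column, as in the video. Inside the square brackets the first position selects rows and the second selects columns; leaving a position empty means "all".

```r
# Hypothetical data; eye_color is the third column
mydata <- data.frame(
  name      = c("Ana", "Ben", "Caro"),
  age       = c(22, 31, 19),
  eye_color = c("blue", "brown", "blue"),
  height    = c(1.80, 1.75, 1.82)
)

mydata[1, 3]      # row 1, column 3: a single cell
mydata[, 3]       # empty row position: every row of column 3
mydata$eye_color  # the same column, selected by name with $
mydata[2, ]       # empty column position: every column of row 2
```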

10:02

📊 Data Analysis with the Tidyverse Package

In the final paragraph, the focus shifts to performing data analysis using the 'tidyverse' package in R. The speaker demonstrates how to select specific variables from a data frame using the 'select' function and the pipe operator '%>%' for chaining commands. It also shows how to filter data based on conditions such as age and height, and how to arrange the data by a specific variable, in this case, height. The paragraph concludes with an invitation for viewers to subscribe and enable notifications for further educational content on R programming and data analysis.


Keywords

💡Project

A 'Project' in the context of the video refers to a specific collection of scripts, data, and outputs organized within a designated directory in RStudio. It helps in maintaining a clean and organized workspace, making it easier to manage different aspects of data analysis. The script mentions creating a new project called 'test one' to demonstrate how R organizes everything neatly in one place.

💡Import Data

The term 'Import Data' pertains to the process of bringing external data into the R environment to work with it. In the video, the speaker discusses different methods of importing data, such as through the 'Import Dataset' option or by using code to automate the process, which is recommended for efficiency and organization.

💡Packages

In R programming, 'Packages' are collections of functions and data that extend the software's capabilities. They are used to solve specific problems and enhance the functionality of R scripts. The video emphasizes the importance of packages like 'tidyverse' for data manipulation and analysis, and demonstrates how to install and load them into a project.

💡CSV

The 'CSV' (Comma Separated Values) file is a simple file format used to store tabular data, where each line represents a row and commas separate the values. The speaker in the video suggests using CSV files for their simplicity and recommends saving Excel spreadsheets as CSVs for easy import into R.

💡Data Manipulation

Data Manipulation involves altering and organizing data to suit analysis needs. The video script covers basic data manipulation techniques in R, such as selecting specific columns ('name', 'age', 'height') and filtering data based on conditions (e.g., age less than 24) using functions from the 'tidyverse' package.

💡Working Directory

The 'Working Directory' is the current directory R is using for reading and writing files. The video explains that starting a project sets the working directory, which helps R to know where to look for data files and where to store outputs, thus maintaining an organized workflow.
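A sketch of how the working directory resolves relative file names. It calls `setwd()` on R's temporary folder only to simulate a project folder; inside an RStudio project the directory is set for you and `setwd()` is normally unnecessary. The file name `example.csv` is hypothetical.

```r
# Where R currently reads and writes relative paths
getwd()

# Simulate a project folder; setwd() returns the previous directory
old_wd <- setwd(tempdir())

# A bare file name is written to, and read from, the working directory
write.csv(data.frame(x = 1:3), "example.csv", row.names = FALSE)
file.exists("example.csv")  # TRUE
back <- read.csv("example.csv")

# Restore the original working directory
setwd(old_wd)
```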

💡R Script

An 'R Script' is a file containing R code that can be executed to perform data analysis tasks. The video mentions starting an R script from within a project, which allows for saving and organizing the code in a structured manner within the project's directory.

💡Data Frame

A 'Data Frame' in R is a data structure used to store data tables. It consists of rows (observations) and columns (variables). The video script describes how to create a data frame from a CSV file and how to manipulate it using functions like 'select', 'filter', and 'arrange' from the 'tidyverse' package.
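A minimal sketch of a data frame built by hand with hypothetical values, showing the rows-as-observations, columns-as-variables structure and a few base functions for inspecting it.

```r
# Hypothetical data frame: each row is one person, each column one variable
mydata <- data.frame(
  name   = c("Ana", "Ben", "Caro"),
  age    = c(22L, 31L, 19L),
  height = c(1.80, 1.75, 1.82)
)

dim(mydata)    # number of rows, then number of columns
names(mydata)  # the variable (column) names
str(mydata)    # the type of each column plus its first values
```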

💡Pipe Operator

The 'Pipe Operator' in R, represented by '%>%', passes the result of one function to the next function as its first argument. This makes the code more readable and modular. The video demonstrates the use of the pipe operator in a sequence of data manipulation steps.
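To show what "passing the result to the next function" means without loading any package, here is a sketch using R's native pipe `|>` (available since R 4.1). For simple calls like these it behaves the same way as the magrittr `%>%` used in the video: the left-hand side becomes the first argument of the call on the right.

```r
# Nested calls read inside-out
nested <- round(sqrt(c(4, 10, 25)), 1)

# The piped version reads left to right: take the numbers, and then
# take square roots, and then round to one decimal place
piped <- c(4, 10, 25) |> sqrt() |> round(1)

# x |> round(1) is evaluated as round(x, 1)
identical(nested, piped)  # TRUE
```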

💡Analysis

The term 'Analysis' in the video refers to the process of examining data to draw conclusions or insights. The script includes an example of a simple analysis where the speaker filters and sorts data based on specific criteria, showcasing how R can be used for data analysis.

Highlights

Introduction to the series and the goals of the video: creating a project, importing data, installing packages, and manipulating data in R.

Explanation of what a project is and its importance in organizing scripts, data, and outputs in R.

Brief pointer to RStudio's four-quadrant layout (covered in a separate video) and the importance of starting with a project.

Detailed steps on how to create a new project in RStudio, including naming and setting up the working directory.

How to organize files and directories for a project to keep everything neat and accessible.

Importing data into R: recommended methods and why scripting the import process is better than using GUI options.

Demonstration of the read.csv function for importing CSV files and converting them into data frames.

Explanation of functions and objects in R, using examples like head(), tail(), and View() to explore data.

Manipulating data frames: accessing specific rows and columns using indexing and dollar sign notation.

Introduction to packages in R: what they are, how to install them, and how to use them in scripts.

Example of installing and loading the tidyverse package and its significance in data analysis.

Using the pipe operator (%>%) to chain commands and perform multiple operations on data frames.

Demonstration of selecting specific columns from a data frame using the select() function.

Filtering data based on conditions using the filter() function and combining multiple conditions.

Sorting data by a specific column using the arrange() function to organize the data output.

Encouragement to subscribe and hit the notification bell to stay updated with future R programming videos.

Transcripts

00:00 Welcome back to the R programming video series on how to get started with R. In this video we're gonna talk about how to create a project, and I'm gonna explain to you what a project is. We're gonna talk about how to import data, I'm gonna teach you how to install packages (and you are going to love packages, they rock), and we're gonna talk a little bit about manipulating data. I want you at the end of the session to feel as if you can do something in R.

00:24 Okay, so let's get started. If you want to learn about R programming then you have come to the right place: on this YouTube channel we're creating R programming videos on everything.

00:39 Right, we're looking at RStudio. At this point you've installed R and you've installed RStudio. If you're not familiar with this environment, there's these four quadrants; I've got a video that goes through that and introduces you to this environment, so have a look at that video. I'm not gonna go through it right now.

00:52 You're wanting to get going, right? So at the top on the left you've got a little pull-down menu, and you've got some options, things you can start: there's an R script or a notebook. We're gonna go through each of these in detail in future videos; I'm gonna suggest you ignore these for now.

01:08 Start off by starting a project. Just to the left you're gonna see a Create a Project button. The reason is, if you write your script in the context of a project that you've started, R will know where to look for your data and where to put all of your outputs, your graphics, etc. It stores everything quite neatly in one place: it sets what we call your working directory, and that's quite a useful thing. You're gonna find it more and more useful down the line.

01:36 So my suggestion is, right off the get-go, get into the habit of clicking on the Create a Project button when you're starting a project in R. You've got some options, and I'm gonna suggest New Directory, New Project. Give the project a name (I'm gonna call it test one, like that) and say Create Project.

01:57 Okay, so R has created a project. We can see the project down here at the bottom on the right. Just so that you can see what's happening at the same time on my hard drive, let's have a look at my hard drive: R has created a folder called test one. Click on that folder and there we can see that icon that represents the project. If R was closed and we went to this place on my hard drive and clicked on that icon, R would open up in that project, and we would see all of the scripts and the data and the outputs from that project all in one place. It would be very neat, it would be lovely, it would be poetry. You're gonna love it.

02:32 Okay, so that's starting a project. Okay, so how do we get some data into R? Well, let's go back to our hard drive. That's the folder that was created when we created our project; go in there, and I'm gonna cut and paste some data into that folder. If we go back into R, we can see that data sitting here now. That data hasn't been imported yet, we still need to do that, but at least we know where to find it.

02:54 Now, to bring that data into R, into our environment, and make it into an object that we can use, there's a few things we can do. I'm gonna show you the things not to do, but just so that you know that they exist: if we click on the data down here, you've got the option of Import Dataset. You can do that, that's fine, but I'm gonna say don't do that.

03:12 There's other options: we've got Import Dataset up here. Again, don't do that. The best thing to do is to use your code: get your actual script to go and fetch the data, so that when you run your code it's automated. It automatically goes and fetches the data, creates an object, puts it in your environment, and you never have to think about it again. So that's the way to do it, and I'm going to show you how.

03:32 Right here we've got some code, and this code is going to import some data, and it's also going to do a little bit of analysis. I'm going to go through the code one step at a time, just to teach you how it is that I've done this. Just so that you know: when you write code here in the source pane, up on the top left, once you've written the code you go to File, Save As, and you save it, and it pops down here into your project, which is all nicely and neatly kept in your working directory down on the bottom right.

04:00 Okay, so let's go through this one step at a time. We're gonna start off by looking at the read.csv function. Of course we can import any kind of data: we can import data straight from Microsoft Excel, we can import SPSS files, we can import data files as CSV. CSV is a nice and simple file; if you've got an Excel spreadsheet you can save it as a CSV. I usually save as CSVs and import them that way, because it's uncomplicated and it's not messy. But we're gonna create videos that look at each of these individually, and we'll go through them one at a time; for this video we're just going to stick with a nice simple CSV file.

04:33 So we've got a function that says read.csv, in brackets, and in inverted commas (that's important) we have the file name and the file extension. Now, if I didn't have this little arrow over here, if I just did the function and its file, it's gonna look in our working directory. I push Command+Enter to run that, or I click on Run over there, so Command+Enter, and down here in the console we can see there's our data.

04:58 Now, that's not particularly useful to us right now as it is, because we want that to be an object that we can use. So I give that a name, I say mydata, and create this little arrow with the less-than and minus (<-), which is kind of like an arrow; it says everything that's over here gets assigned to that name. Push Ctrl+Enter, and in our environment on the left we can see mydata sits there. We can have a look at what the variables within there are, and that's our data sitting there: it's been read in, it's within R, and we can start using it. So we want to view our data.

05:33 Now, for the most part, R works with functions and objects. So we've got an object, mydata, sitting in our environment over there.

05:41 And we've got functions. Here's a function called head: if we type in head(mydata) and push Command+Enter, or Ctrl+Enter, or Run up there, it's gonna give us the first six rows of data. If we do tail(mydata), it's gonna give us the last six rows of our data, and if we do View(mydata), it's gonna produce the data in a little spreadsheet that we can look at. So let's have a look at that.

06:05 Right, remember: with this kind of data, this is a nice flat spreadsheet. Each row is an observation and each column is a variable. Let's go back to our four quadrants. If we're looking at our script, we can also view the data by clicking on mydata: we can click on the object and it'll also bring it up over there. Now we might want to extract specific components of our data.

06:35 So remember, we've said that rows are observations and columns are variables. If we put mydata and we use these square brackets to tell R where to look, the first number after the square bracket tells it what row to look at and the second number what column. So if we run that, it's gonna give us blue. Well, what is blue? Blue is the first row and the third column, the variable eye color, so we got this cell over here popped out. If we didn't put a row and we just put comma 3, it's gonna do the entire column. So let's run that, and there we go: blue, brown, blue, blah blah blah.

07:17 That's basically spitting out this entire column, and this column's variable name is eye color. So we can also do mydata, dollar sign, eye color, and it does the same thing. Okay, before we start doing some analysis, I just want to talk to you a little bit about packages, because you're gonna find these things tremendously useful.

07:37 Right, packages are programs, functions that solve very specific problems; they expand the R vocabulary. To install a package you use this function, install.packages: open brackets, you need the inverted commas, you put the name of the package, close brackets. You only ever need to install a package once; once it's installed on your computer, it's there. But when you want to use it in your script, you need to include either library or require. Either of those two, you don't need them both. You put that into your script, it'll go and fetch that package and use it, and from that point onwards in your script you have access to additional commands and functions.

08:14 So of course I have previously installed the tidyverse package. At this point I want to push Command+Enter or Control+Enter to run this line of code that uses it. So, bada-bing. Now I'm gonna show you how to do an analysis in R using some of this vocabulary that comes with the tidyverse. You're going to see how easy and straightforward it is, and when you see how easy it is, you're going to be really excited about analyzing your own data.

08:40 Okay, so the first thing we do is type in our data frame. We start off with mydata; that's our object. Okay, if I push Command+Enter (Control+Enter on a PC) at this point, it brings up the whole data frame. This is a small data frame, okay.

08:58 Shift+Ctrl+2 to have a closer look at the console. Right, this is our whole data frame. Now, data frames are usually much, much bigger than this; we may have hundreds of variables. What do we do? We want to select just a few of them. In this case, we might select just name, age and height. Right, Shift+Ctrl+0 to go back to all four quadrants.

09:17 So we want to select that. Before we select it, I want to teach you about a little thing called the pipe operator. Right, Shift+Command+M: that's the pipe operator. It's percent, greater-than, percent (%>%), it looks a little bit like a pipe, and what it means is that whatever you've done on the left-hand side, whatever that line of code is, gets piped into the next line of code. So if you've done some sort of change or manipulation, that change gets piped in, and you'll see more of how that works as we go through this example.

09:45 Now, ordinarily you could just carry on typing to the right, but I like to go to the next line after a pipe operator. R will see that as continuing on the same line; it doesn't really matter. Okay, it looks like an error is popping up there; it's not, you can ignore that. Right, okay, so we've said mydata, we've put in a pipe operator, which just means "and then", so mydata and then... We've said we want to select name, age and height, so it is literally as simple as that.

10:12 Select, open brackets, name, age, height. Command+Enter. Okay, now we can see in our console (I'm gonna zoom in on the console with Shift+Ctrl+2) that we originally had the entire data frame; we wanted to select a few of the variables, in this case name, age and height, and there they are.

10:33 Now we might want to only look at people that are less than 24 years old. So another pipe operator, which is "and then", go to the next line, and filter by those that are aged less than 24. And we might want to make this a bit more complicated and say age less than 24 and height greater than 1.78, for example. Okay, let's have a look at what that does, and voilà.

11:13 Let's have a look at that. We don't need to zoom in on the console, because we can see right here there's just three rows that met those criteria. We might now want to add another pipe operator here, "and then", arrange by height, and it'll arrange it by height. Command+Enter, and there we go, voilà.

11:35 So if you are serious about learning how to analyze data and you want to learn R programming, then hit the subscribe button now, and hit the little bell notification if you want to get notified of future videos.

11:49 [Music]
