Intro to Data Visualization with R & ggplot2 | Google Data Analytics Certificate
Summary
TLDR: This video script delves into the significance of data visualization in analysis, highlighting ggplot2 as a pivotal tool in R's tidyverse. It guides viewers through creating various plots, customizing aesthetics, and utilizing geoms to represent data effectively. The script also covers facet functions for subset display, annotation techniques for plot clarification, and methods for saving visualizations. The tutorial aims to empower data analysts with ggplot2's capabilities for insightful and accessible data storytelling.
Takeaways
- Data visualization is crucial for conveying insights from data analysis, making complex information understandable through compelling visuals.
- R's ggplot2 package is a popular tool for data visualization, offering powerful and user-friendly features to create various types of plots.
- The ggplot2 package is part of the tidyverse, inspired by 'The Grammar of Graphics', providing a systematic approach to building visuals.
- Aesthetics in ggplot2 determine the visual properties of plot elements, such as size, shape, and color, and are mapped from data variables.
- Geoms are geometric objects used to represent data in plots, with options like points for scatter plots, bars for bar charts, and lines for line diagrams.
- Facets in ggplot2 allow for displaying subsets of data in separate plots, useful for exploring relationships within different groups or categories.
- Annotations and labels in ggplot2, such as titles, subtitles, captions, and text annotations, enhance the clarity and communication of a plot's purpose.
- The use of color, shape, and size aesthetics can help differentiate data points and highlight key insights, making visuals more accessible and informative.
- Saving plots in R using ggsave or RStudio's Export option is essential for reproducing and sharing work, facilitating collaboration and feedback.
- ggplot2's flexibility and popularity among data analysts make it a foundational tool for learning more advanced data visualization techniques in R.
Q & A
What is the significance of data visualization in data analysis?
-Data visualization is crucial in data analysis as it helps stakeholders understand the meaning of data through clear and compelling visuals. It brings the story of the data to life and makes it easier to comprehend.
What is ggplot2 and why is it popular in R?
-Ggplot2 is R's most popular visualization package, created by Hadley Wickham. It is favored for its power and user-friendliness, allowing users to create a variety of plots, organize and represent different variables in a dataset, and customize the visuals.
What inspired the creation of ggplot2?
-Ggplot2 was inspired by the 1999 book 'The Grammar of Graphics' by Leland Wilkinson, a scholarly study of data visualization by a computer scientist. The 'gg' in ggplot2 stands for 'grammar of graphics'.
What are some other visualization packages available in R?
-Besides ggplot2, R offers other visualization packages such as Plotly, Lattice, RGL, Dygraphs, Leaflet, Highcharter, Patchwork, gganimate, and ggridges, each serving different visualization needs from simple pie charts to complex interactive graphs and 3D visuals.
How does ggplot2 help in combining data manipulation and visualization?
-Ggplot2 allows users to combine data manipulation and visualization using the pipe operator, making it a versatile tool for data analysis.
What are aesthetics in ggplot2 and how are they used?
-In ggplot2, aesthetics are visual properties of objects in a plot, such as size, shape, or color of data points. They are used to map visual features in a plot to variables in the data, enhancing the visual representation and storytelling of the data.
What is a geom in ggplot2 and how does it differ from aes?
-A geom in ggplot2 refers to the geometric object used to represent data, such as points for scatter plots or bars for bar charts. It differs from aes, which is used to define the mapping between data and plot aesthetics.
How can facets be used in ggplot2 to display data?
-Facets in ggplot2 allow users to display smaller groups or subsets of data by creating separate plots for each variable or category. This can help in focusing on specific trends or relationships within the data.
What are some common geom functions in ggplot2?
-Common geom functions in ggplot2 include geom_point for scatter plots, geom_bar for bar charts, geom_line for line diagrams, geom_smooth for trend lines, and geom_jitter for scatter plots with random noise to avoid over-plotting.
How can annotations be added to a plot in ggplot2?
-Text annotations can be added inside a plot in ggplot2 using the annotate function, while titles, subtitles, and captions are added with the labs function. These labels and notes explain or comment on the plot, making it easier for stakeholders to understand the data presentation.
What are the methods to save plots in ggplot2?
-Plots in ggplot2 can be saved using the Export option in the Plots tab of RStudio or the ggsave function provided by the ggplot2 package. Users can save plots as image files or PDF files, ensuring they can access and share their work later.
Outlines
Introduction to Data Visualization with ggplot2
The script introduces data visualization as a crucial aspect of data analysis, emphasizing its role in conveying data insights to stakeholders. It highlights ggplot2, a popular R package, as a powerful and user-friendly tool for creating various plots. The speaker previews upcoming lessons on coding with ggplot2, customizing visuals, and leveraging its features for effective data storytelling. The paragraph also mentions other visualization packages in R, such as Plotly and Lattice, and touches on the origin of ggplot2, inspired by 'The Grammar of Graphics', a foundational text for data visualization.
Mastering ggplot2 Basics: Aesthetics and Geom Functions
This paragraph delves into the fundamentals of ggplot2, focusing on aesthetics and geoms. Aesthetics are visual properties like color, size, and shape that map to data variables, while geoms represent the geometric objects used to visualize data, such as points, bars, or lines. The speaker discusses the pipe operator for combining data manipulation and visualization and mentions the ggplot2 cheatsheet as a reference guide. Core concepts like mapping data to aesthetics, choosing appropriate geoms for data representation, and customizing plot labels are introduced, setting the stage for more advanced topics.
Exploring Data with ggplot2: Creating and Customizing Plots
The script provides a step-by-step guide to creating a ggplot2 plot, starting with the ggplot function and progressing through adding layers with geoms and mapping aesthetics. It explains the importance of the plus sign for layering and the use of the aes function for defining mappings between data and visual properties. The paragraph includes practical tips for writing code in R, such as paying attention to case sensitivity and matching parentheses, and encourages learners to utilize R's help resources and community for troubleshooting.
Enhancing Visuals: Applying Aesthetics to Data Points
This section discusses the use of aesthetics to enhance data visualization, allowing for clearer communication of data insights. It demonstrates how to map variables to aesthetics such as color, shape, and size to differentiate data points, using the Penguins dataset as an example. The speaker shows how to adjust transparency with the alpha aesthetic and how to use R's automatic legends to aid in data interpretation. The paragraph also covers how to change the overall appearance of a plot by setting aesthetics outside of the aes function.
Understanding Geom Functions: Diverse Data Representations
The script explores the variety of geom functions available in ggplot2 for creating different types of plots. It contrasts the use of geom_point for scatter plots with geom_bar for bar charts and introduces geom_smooth for adding trend lines. The paragraph explains how to combine geoms in a single plot and how to use geom_jitter to address over-plotting issues. The goal is to show the flexibility of ggplot2 in representing data through various geoms and how they can be tailored to specific storytelling needs.
Analyzing Trends with Advanced ggplot2 Features
This paragraph examines advanced features of ggplot2 for detailed data analysis, such as using geom_smooth to illustrate trends and facets to display subsets of data. The speaker discusses how to use facet_wrap and facet_grid functions to create multi-panel plots that reveal patterns and trends within data groups. The paragraph provides examples of how facet functions can be applied to the Penguins and Diamonds datasets to uncover insights that may not be apparent in a single plot.
Customizing Plots with Labels and Annotations
The script focuses on the customization of plots using labels and annotations to enhance data communication. It describes how to add titles, subtitles, and captions to plots for clarity and how to use the annotate function to place text within the plot grid to emphasize specific data points. The paragraph provides a hands-on example using the Penguins dataset and discusses various formatting options for annotations, such as color, font style, size, and angle.
Saving Your ggplot2 Visualizations
This section outlines the importance of saving ggplot2 plots for future access and collaboration. It demonstrates two methods for saving plots: using the Export option in RStudio and the ggsave function from the ggplot2 package. The speaker provides step-by-step instructions for each method, including choosing file formats, naming files, and accessing saved files. The paragraph reinforces the value of reproducible and shareable work in an analytical role.
Course Completion and Next Steps
The final paragraph celebrates the completion of the video and encourages further engagement with the course material. It prompts viewers to access the full course experience for job search assistance and to earn an official certificate. It also invites viewers to watch the next video in the series and to subscribe to the channel for more educational content, highlighting the ongoing learning journey in data analytics.
Keywords
Data visualization
ggplot2
Aesthetics
Geoms
Facets
Annotations
Data frames
The Grammar of Graphics
Customization
RStudio Cloud
Saving plots
Highlights
Data visualization is crucial for data analysis, as it allows stakeholders to understand data through clear and compelling visuals.
ggplot2 is R's most popular data visualization package, noted for its power and user-friendliness.
Learning ggplot2 enhances data visualization skills, facilitating the understanding of data changes and their immediate visual representation.
R enables quick transitions between data analysis and visualization, streamlining the workflow of a data analyst.
Various visualization packages exist for R, including Plotly, Lattice, and RGL, each serving different purposes from simple charts to complex visuals.
ggplot2's creation was inspired by 'The Grammar of Graphics', providing a systematic approach to building a wide range of visualizations.
Aesthetics in ggplot2 are visual properties like size, shape, or color, mapping data variables to visual elements in a plot.
Geoms in ggplot2 represent data through geometric objects such as points, bars, or lines, chosen based on the data type and desired representation.
Facets in ggplot2 allow for the display of subsets of data on separate plots, revealing patterns and trends within specific groups.
Labels and annotations in ggplot2 are used to add context, highlight important data, and guide the viewer's attention.
The ggplot2 cheatsheet serves as a comprehensive reference guide for functions and is useful for learning new visualization techniques.
ggplot2's syntax and structure promote the reuse of code for creating various plots, making the visualization process efficient.
Common issues in ggplot2, such as misplaced plus signs or parentheses, can be easily resolved with attention to detail and R's helpful resources.
The use of geom functions like geom_smooth and geom_jitter can enhance scatter plots by showing trends and dealing with over-plotting.
Bar charts can be effectively created and customized in ggplot2 using geom_bar, with options to display counts or proportions.
Faceting with facet_wrap and facet_grid in ggplot2 enables the organization of complex data into more digestible and insightful visuals.
Customizing ggplot2 plots with titles, subtitles, captions, and annotations helps in effectively communicating the story of the data.
Plots created in ggplot2 can be saved using RStudio's Export option or the ggsave function for future access and sharing.
Transcripts
SPEAKER: Data visualization is one of the most important parts
of data analysis.
Powerful visuals show stakeholders
what your data means in a clear and compelling way,
and highlight key insights.
Visuals help bring the story of your data to life,
and make that story easier to understand.
You might remember the sneak peek
I gave you of R's data visualization powers.
I created those visuals with ggplot2,
one of the core packages of the tidyverse.
Ggplot2 is R's most popular visualization package,
and for good reason.
It's a powerful and user-friendly data viz tool.
Up next, you'll learn how to write and execute
all the code we previewed earlier.
You'll learn how to use ggplot2 to create a variety of plots,
organize and represent different variables in your data set,
and customize the look and feel of your visuals.
Working with ggplot2 can help you
get the most out of your data.
Your new data viz skills will also
make it easier to learn other parts of R. Going forward,
you'll be better able to visualize
the results of any change you make to your data.
Plus, you get an immediate result for all your hard work,
which is one of my favorite parts
of creating plots in ggplot2.
Just enter some code, run it, and out comes
a cool-looking visual that helps you and others understand
your data.
Visualization is a key part of a data analyst's workflow,
and R lets you move back and forth
between analysis and visualization
quickly and easily.
I'm looking forward to showing you what ggplot2 can do.
[MUSIC PLAYING]
In this video, we'll focus on ggplot2.
We'll learn about its main features and functions,
and how it can help you visualize your data.
First, let's talk about some different visualization
packages you can use with R. Base R has its own package,
and there are other useful packages you can add.
They'll help you do almost anything
you want with your data, from making simple pie
charts to creating more complex visuals like interactive graphs
and maps.
General purpose packages like Plotly
let you do a wide range of visualization functions.
Others, like RGL, focus on specific solutions
like 3D visuals.
Some of the most popular include ggplot2, Plotly, Lattice, RGL,
Dygraphs, Leaflet, Highcharter, Patchwork,
gganimate, and ggridges.
Personally, ggplot2 is my favorite for data analysis.
It's both powerful and flexible.
With a little bit of code, you can create
all kinds of different plots.
You can use ggplot2 on its own or extend its powers
with other packages.
Plus, it's the most popular visualization package in R.
A lot of data analysts prefer to use
ggplot2, which is why we're using ggplot2 here.
ggplot2 was originally created by the statistician
and developer Hadley Wickham in 2005.
Wickham's inspiration for creating ggplot2
came from the 1999 book "The Grammar of Graphics,"
a scholarly study of data visualization by computer
scientist Leland Wilkinson.
The first two letters of ggplot2 actually
stand for grammar of graphics.
And in the same way the grammar of a human language
gives us rules to build any kind of sentence,
the grammar of graphics gives us rules
to build any kind of visual.
So ggplot2 has some basic building blocks
that you can use to create plots.
In other words, when you learn the basic steps
for creating a plot in ggplot2, you
can reuse these steps to create lots
of different kinds of plots.
Plus, you can add or remove layers of detail to your plot
without changing its basic structure or the underlying
data.
This makes ggplot2 really powerful.
In the next video, we'll go over these steps one by one.
ggplot2 has lots of other benefits, too.
You can create all different types of plots,
including scatter plots, bar charts, line diagrams, and tons
more.
You can change the colors, layout, and dimensions
of your plots, and add text elements,
like titles, captions, and labels.
With just a little bit of code, you
can create high-quality visuals.
Plus, ggplot2 lets you combine data manipulation
and visualization using the pipe operator.
ggplot2 also has tons of functions that
cover all your data viz needs.
To give you an idea, check out the ggplot2 cheatsheet,
which is a popular reference guide.
You can find out more about the cheatsheet in an upcoming
reading.
It's not important to learn all these functions right away,
or even know what they are.
Over time, as you get into more advanced data analysis,
you can learn about new functions as you need them.
Just know that if you need to find a function for something,
ggplot2 probably has it.
And like we discussed, even the basic functions of ggplot2
let you do so much.
We'll focus on some core concepts in ggplot2--
aesthetics, geoms, facets, labels, and annotations.
These might be new concepts to you.
And that's OK.
We'll learn about them together.
And soon we'll explore each in detail.
For now, let's get a quick preview.
In ggplot2, an aesthetic is a visual property
of an object in your plot.
For example, in a scatter plot, aesthetics
include things like the size, shape, or color
of your data points.
Think of an aesthetic as a connection, or mapping,
between a visual feature in your plot
and a variable in your data.
We'll talk more about mapping later on.
A geom refers to the geometric object
used to represent your data.
For example, you can use points to create a scatter plot,
bars to create a bar chart, or lines to create a line diagram.
You can choose a geom to fit the type of data you have.
Points show the relationship between two
quantitative variables.
Bars show how one quantitative variable varies
across different categories.
Up next, we'll talk about the facet function.
Facets let you display smaller groups, or subsets,
of your data.
With facets, you can create separate plots
for all the variables in your data set.
Finally, the label and annotate functions
let you customize your plot.
You can add text, like titles, subtitles, and captions,
to communicate the purpose of your plot
or highlight important data.
That's all for now.
Coming up, we'll use code to create our own first plot
in ggplot2.
[MUSIC PLAYING]
You might remember that the Penguins data set contains
size measurements for three penguin
species that live in the Palmer Archipelago, in Antarctica.
The data set includes variables, such as body mass, flipper
length, and bill length.
Now, we'll learn how to use code to create those visuals.
We'll go through the process of creating a plot step by step.
We'll also go over some general tips
on how to write code in ggplot2, and check out some useful help
resources.
First, let's log in to RStudio Cloud.
As we go along, I encourage you to join in and try out
all the code in RStudio.
Feel free to pause the video any time you need to.
We are assuming you already have the tidyverse packages
installed.
If you don't, refer to an earlier video,
or run install.packages("tidyverse").
Let's start by loading the ggplot2 package
and the Penguins data set.
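The code shown on screen is not captured in this transcript. A minimal sketch of the loading step might look like the following. It assumes the penguins data frame comes from the palmerpenguins package; the speaker only calls it the Penguins data set, so the package name is an assumption.

```r
# Load ggplot2 for plotting.
library(ggplot2)
# Assumption: the penguins data frame is provided by the palmerpenguins package.
library(palmerpenguins)

# Preview the first rows to confirm the data set is available.
head(penguins)
```

If either package is missing, install.packages("palmerpenguins") or install.packages("tidyverse") would need to be run first.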
Let's check out the plot that shows the relationship
between body mass and flipper length in the three penguin
species.
The plot shows a positive relationship
between the two variables.
In other words, the larger the penguin,
the longer the flipper.
Now, let's check out the code.
The code uses functions from ggplot2
to plot the relationship between body mass and flipper length.
As a quick refresher, in R, a function
is a name, followed by a set of parentheses.
Lots of functions require special information
to do their jobs.
You write this information, called the function's argument,
inside the parentheses.
The three functions in the code are the ggplot function,
the geom_point function, and the aes function.
Every ggplot2 plot starts with the ggplot function.
The argument of the ggplot function
tells R what data to use for your plot.
So the first thing to do is choose a data frame
to work with.
You can set up the code like this.
Inside the parentheses of the function, write the word data,
then an equal sign, then penguins.
This code initializes, or starts, the plot.
If we stop right now and run the code,
the result will be an empty plot.
Let's try it.
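A minimal sketch of this first step, assuming the penguins data frame comes from the palmerpenguins package. The variable name p is only for illustration and is not from the video.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# Step 1: initialize the plot with a data frame.
# No geom has been added yet, so the result is an empty plot.
p <- ggplot(data = penguins)
print(p)
```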
This is just the first step in creating a plot.
The next thing you might notice about this code
is the plus sign at the end of the first line.
You use the plus sign to add layers to your plot.
In ggplot2, plots are built through combinations of layers.
First, we start with our data.
Then, we add a layer to our plot by choosing a geom
to represent our data.
The function geom_point tells R to use
points to represent our data.
Keep in mind that the plus sign must
be placed at the end of each line to add a layer.
Adding a geom function is the second step in creating a plot.
As a reminder, a geom is a geometric object
used to represent your data.
Geoms include points, bars, lines, and more.
In R code, the function geom_point
tells R to use points, and create a scatter plot.
We'll learn more about geoms later on.
Next, we need to choose specific variables from our data set,
and tell R how we want these variables to look in our plot.
In ggplot2, the way a variable looks is called its aesthetic.
As a quick reminder, an aesthetic
is a visual property of an object in your plot,
like its position, color, shape, or size.
The mapping=aes part of the code tells R what aesthetics to use
for the plot.
You use the aes function to define
the mapping between your data and your plot.
Mapping means matching up a specific variable in your data
set with a specific aesthetic.
For example, you can map a variable
to the x-axis of your plot or you can map a variable
to the y-axis of your plot.
In a scatter plot, you can also map a variable
to the color, size, and shape of your data points.
We'll learn more about aesthetics soon.
Mapping aesthetics to variables is the third step
in creating a plot.
In R code, we map the variable flipper length to the x-axis,
and the variable body mass to the y-axis.
Inside the parentheses of the aes function,
we write the name of the aesthetic, then
the equals sign, then the name of the variable.
We write the code, and R takes care of the rest.
Using the penguins data, R creates a scatter plot,
puts the variable body mass on the y-axis,
and the variable flipper length on the x-axis.
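Put together, the three pieces described here (the ggplot function, a geom, and the aes mapping) might be written as follows. This is a sketch, assuming the palmerpenguins package and its column names flipper_length_mm and body_mass_g; the variable name p is hypothetical.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# ggplot() chooses the data, geom_point() draws points,
# and aes() maps flipper length to x and body mass to y.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
print(p)
```

R may warn that a few rows with missing values were removed; the plot is still drawn.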
Our code follows the common sequence
for creating plots in ggplot2.
Earlier, we talked about the grammar
of graphics, a set of steps for making
all kinds of different plots.
You can also think of the sequence as the basic grammar
for making plots in ggplot2.
To create a plot, follow these three steps--
start with the ggplot function, and choose
a data set to work with.
Add a geom function to display your data.
Map the variables you want to plot
in the arguments of the aes function.
We can also turn our code into a reusable template
for creating plots in ggplot2.
To make a plot, replace the bracketed sections
in the code with a data set, a GEOM_FUNCTION,
or a group of AESTHETIC MAPPINGS.
We can make all kinds of different plots
using this template.
For example, instead of plotting the relationship
between body mass and flipper length,
we could use two different variables in the Penguins data
set.
Let's try bill length and bill depth.
We can put bill length on the x-axis and bill depth
on the y-axis.
Let's run the code and check out this new scatter plot.
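The template is not visible in the transcript, so the version in the comments below is a reconstruction under that assumption, followed by one way to fill it in for bill length and bill depth. The column names bill_length_mm and bill_depth_mm assume the palmerpenguins package.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# Reusable template (placeholders in angle brackets are not runnable code):
#   ggplot(data = <DATA>) +
#     <GEOM_FUNCTION>(mapping = aes(<AESTHETIC MAPPINGS>))

# Filled in: bill length on the x-axis, bill depth on the y-axis.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = bill_length_mm, y = bill_depth_mm))
print(p)
```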
As you learn to write code in R, or any other programming
language, you'll come across problems.
It happens to everyone.
I've been working in R for years,
and I still write code that has errors.
A lot of times, these will be minor errors with easy fixes.
It helps if you pay attention to the details.
For example, R is case sensitive.
If you accidentally capitalize the first letter
in a certain function, it might affect your code.
Also, make sure every opening parenthesis in your function
matches with a closing parenthesis.
Notice how this code won't run correctly, but this code does.
One common problem when working with ggplot2
is remembering to put the plus sign in the right
place when adding a layer to your plot.
Always put the plus sign at the end of a line of code.
It's easy to forget and put it at the beginning of the line.
Or you might accidentally use a pipe instead of a plus sign.
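A short sketch of the plus sign rule, with the two common mistakes left as comments so the block still runs. It assumes the palmerpenguins package; the %>% pipe in the last comment is the tidyverse pipe.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# Correct: the plus sign ends the line, so R knows another layer follows.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
print(p)

# Incorrect (commented out): a plus sign at the start of the next line.
# R treats the first line as a complete statement, so the layer is not added.
# ggplot(data = penguins)
#   + geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))

# Incorrect (commented out): a pipe where a plus sign belongs.
# ggplot(data = penguins) %>%
#   geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```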
We all make mistakes.
That's part of the learning process.
The good news is we have plenty of tries to get it right.
There's also plenty of resources to help you out.
To learn more about any R function, just run the code
question mark, function_name.
For example, if you want to learn more
about the geom_point function, type in question
mark geom_point.
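Written as code, the help lookup might look like this. The interactive() guard is an addition for illustration, so that running the block as a batch script does not try to open a help viewer; in the RStudio console you would simply type ?geom_point.

```r
library(ggplot2)

# Both lines open the same help page for geom_point.
if (interactive()) {
  ?geom_point
  help("geom_point")
}
```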
As a new learner, you might not understand all the concepts
in the Help page.
At the bottom of the page, you can
find specific examples of code that may show you
how to solve your problem.
If you still can't find what you're looking for,
feel free to reach out to the R community online.
As I mentioned earlier, there are
tons of great online resources for R. Chances are,
someone else has had the same problem.
That's it for now.
Up next, we'll learn more about aesthetics.
[MUSIC PLAYING]
In this video, you'll learn how to change
the aesthetics of your visuals, which
can help you present your data in a more compelling way.
With aesthetics, you can highlight key points
in your data, and communicate more clearly and effectively
with your stakeholders.
Earlier, we learned that an aesthetic is a visual property
of an object in your plot.
For example, in a scatter plot, aesthetics
include the size, shape, or color of your data points.
You can display a point in different ways
by changing its aesthetics, or the way it looks.
You can make a point small, triangular, or blue,
or a combination of these.
Let's go back to our Penguins data
set and review the code for our plot
that shows the relationship between body mass and flipper
length.
As a quick refresher, the mapping=aes part of the code
tells R what aesthetics to use for the plot.
You use the aes function to define
the mapping between your data and your plot.
Mapping means matching up a specific variable in your data
set with a specific aesthetic.
For example, you can map a variable
to the x-axis of your plot or you can map a variable
to the y-axis of your plot.
To map an aesthetic to a variable,
set the name of the aesthetic equal to the name
of the variable inside the parentheses of the aes
function.
Our code tells R to map flipper length to the x-axis and body
mass to the y-axis.
Let's log into RStudio Cloud and run the code.
As a quick reminder, let's start by loading the ggplot2 package
and the Penguins data set.
R will automatically place the appropriate label
on each axis of our scatter plot.
After you map a variable to an aesthetic,
R takes care of the rest.
You can also map data to other aesthetics,
like color, size, and shape.
Right now, our plot is in black and white.
It clearly shows the positive relationship
between the two variables.
As the values on the x-axis increase,
the values on the y-axis increase.
But it's also got some limitations.
For example, we can't tell which data points refer to each
of the three penguin species.
To solve this problem, we can map a new variable
to a new aesthetic.
Let's add a third variable to our scatter plot
by mapping it to a new aesthetic.
We'll map the variable Species to the aesthetic Color
by adding some code inside the parentheses of the aes
function.
We'll add a comma after the body mass variable,
and type color=species.
Our code tells R to assign a different color
to each species of penguin.
Let's check it out.
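A sketch of the code described here, assuming the palmerpenguins package and its column names; the variable name p is hypothetical.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# Map a third variable, species, to the color aesthetic inside aes().
# ggplot2 picks one color per species and adds a legend automatically.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species))
print(p)
```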
The Gentoos are the largest of the three penguin species.
The legend, just to the right of the plot,
shows us that the blue points refer to the Gentoo penguins.
Not only does R automatically apply different colors
to each data point, it also gives a legend to show us
the color coding.
That's what I love about R--
give it just a little bit of code,
and it'll go the extra mile to help you out.
We can also use shape to highlight the different penguin
species.
Let's map the variable species to the aesthetic shape.
To do this, we can change the code from color=species
to shape=species.
Instead of colored points, R assigns different shapes
to each species.
Now, the legend shows us a circle for the Adelie species,
a triangle for the Chinstraps, and a square for the Gentoos.
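The same plot with shape in place of color might be written like this, assuming the palmerpenguins package; the variable name p is hypothetical.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# Map species to shape instead of color: each species gets its own point shape.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           shape = species))
print(p)
```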
You might notice that our plot is in
black and white again because we removed the code for color.
Let's put some color back into our plot.
If we want, we can map more than one aesthetic
to the same variable.
Let's map both color and shape to species.
We'll add the code color=species,
while keeping the code shape=species.
Now our plot shows a different color and a different shape
for each species.
We can keep going.
Let's add size as well, and map three aesthetics to species.
If we add size=species, each colored shape will also be
a different size.
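A sketch with all three aesthetics mapped to species, assuming the palmerpenguins package; the variable name p is hypothetical.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# Map color, shape, and size to the same variable, species.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species, shape = species, size = species))
print(p)
```

ggplot2 may warn that using size for a discrete variable is not advised; the plot is still drawn.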
Using more than one aesthetic can also
be a way to make your visuals more accessible because it
gives your viewers more than one way to make sense of your data.
We can also map species to the alpha aesthetic, which controls
the transparency of the points.
Our first plot showed the relationship
between body mass and flipper length in black and white.
Then, we mapped the variable species to the aesthetic color
to show the difference between each of the three penguin
species.
If we want to keep our graph in black and white,
we can map the alpha aesthetic to species.
This will make some points more transparent,
or see-through, than others.
This gives us another way to represent each penguin species.
Let's try it.
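A sketch of the alpha mapping, assuming the palmerpenguins package; the variable name p is hypothetical.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# Map species to alpha: the points stay black, but each species
# gets a different level of transparency.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           alpha = species))
print(p)
```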
Alpha is a good option when you've got a dense plot
with lots of data points.
You can also set the aesthetic apart from a specific variable.
Let's say we want to change the color of all the points
to purple.
Here, we don't want to map color to a specific variable,
like species.
We just want every point in our scatter plot to be purple.
So we need to set our new piece of code outside of the aes
function, and use quotation marks for our color value.
This is because all the code inside of the aes function
tells R how to map aesthetics to variables.
For example, mapping the aesthetic color
to the variable species.
If we want to change the appearance of our overall plot
without regard to specific variables,
we write code outside of the aes function.
Let's write the code and run it.
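A sketch of setting a fixed color, assuming the palmerpenguins package; the variable name p is hypothetical. Note where the closing parenthesis of aes() falls.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of the penguins data frame

# color = "purple" sits outside aes(), so it is a fixed setting for
# every point rather than a mapping to a variable.
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g),
             color = "purple")
print(p)
```

Because color is not mapped to a variable here, no legend is drawn.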
That's all for now.
We just learned about the most common aesthetics for points--
x, y, color, shape, size, and alpha.
We also discovered how aesthetics
can change the look of our plot and highlight important data.
We've covered a lot so far and learned
a bunch of new concepts.
It takes time to process new information
and learn new skills.
So feel free to watch any of these videos
again if you need a refresher or want to practice in RStudio.
Coming up, we'll learn more about geoms.
[MUSIC PLAYING]
In this video, we'll learn how to use different geom functions
to create different types of plots, such as scatter plots
and bar charts.
There are lots of different geoms available.
You can choose a specific geom based
on how you want to represent your data and your goals
for communicating it.
This lets you tell the story of your data in different ways
and communicate effectively to different audiences.
Let's start with two visualizations.
Both visuals contain the same x variable and the same y
variable.
Both use the same data, but each plot
uses a different visual object to represent the data.
One uses points.
The other uses a smooth line.
In other words, they use different geoms.
In ggplot2, a geom is the geometrical object
used to represent your data.
Geoms include points, bars, lines, and more.
The geom_point function uses points to create scatter plots.
The geom_bar function uses bars to create bar charts, and so
on.
To change the geom in our plot, we
need to change the geom function in our code.
For example, take the plot that shows
the relationship between body mass and flipper length.
The code uses geom_point to create a scatter plot.
Let's log into RStudio Cloud and watch what
happens when we change geoms.
First, let's load the ggplot2 package and the Penguins data
set.
Now, we can put geom_smooth in place of geom_point.
We still have the same data, but now it
has a different visual appearance.
Instead of points, there's a smooth line that fits the data.
The geom_smooth function is useful for showing
general trends in our data.
The line clearly shows the positive relationship
between body mass and flipper length.
The larger the penguin, the longer the flipper.
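Swapping the geom might look something like this, again assuming the palmerpenguins package and its column names:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

# Same data and same x/y mapping as the scatter plot,
# but drawn as a smoothed trend line instead of points.
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```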
We can even use two geoms in the same plot.
Let's say we want to show the relationship between the trend
line and the data points more clearly.
We can combine the code for geom_point
and the code for geom_smooth by adding a plus symbol
after geom_smooth.
Let's write the code and run it.
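A sketch of the combined plot, under the same assumptions about the palmerpenguins package and its column names:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

# Two layers in one plot: the trend line, then the raw points.
# Each plus sign goes at the END of a line.
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```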
Let's say we want to plot a separate line
for each species of penguin.
We can add the linetype aesthetic to our code
and map it to the variable species.
Geom_smooth will draw a different line
with a different linetype for each species of penguin.
The legend shows how each linetype
matches with each species.
The plot clearly shows the trend for each species.
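One way to write this, assuming the palmerpenguins package and its column names:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

# Mapping linetype to species splits the data into one
# smoothed line per species, each with its own line style.
ggplot(data = penguins) +
  geom_smooth(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                            linetype = species))
```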
Finally, let's check out the geom_jitter function.
The geom_jitter function creates a scatter plot,
and then adds a small amount of random noise
to each point in the plot.
Jittering helps us deal with over-plotting,
which happens when the data points in a plot
overlap with each other.
Jittering makes the points easier to find.
I'll show you what I mean.
Let's replace geom_point with geom_jitter.
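A minimal sketch of that replacement, assuming the palmerpenguins package and its column names:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

# geom_jitter() draws the same scatter plot but nudges each point
# by a small random amount to reduce overlap.
ggplot(data = penguins) +
  geom_jitter(mapping = aes(x = flipper_length_mm, y = body_mass_g))
```

Since the noise is random, the points land in slightly different places each time the code runs.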
Now that we've seen what ggplot2 can do with scatter plots,
let's explore bar charts.
We'll use the Diamonds data set that you're already
familiar with.
This includes data like the quality, clarity, and cut
for over 50,000 diamonds.
This data set comes with the ggplot2 package,
so it's already loaded.
To make a bar chart, we use the geom_bar function.
Let's write some code that plots a bar chart of the variable cut
in the Diamonds data set.
Cut refers to a diamond's proportions, symmetry,
and polish.
Notice that we didn't supply a variable for the y-axis.
When you use geom_bar, R automatically
counts how many times each x value appears in the data,
and then shows the counts on the y-axis.
The default for geom_bar is to count rows.
But that's only one of the many different applications
for bar charts.
For example, the x-axis of our plot shows five categories
of cut quality--
fair, good, very good, premium, and ideal.
The y-axis shows the number of diamonds in each category.
Over 20,000 diamonds have a value
of ideal, which is the most common type of cut.
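The bar chart described here could be produced with something like the following. The diamonds data set and its cut column ship with ggplot2, so no extra package is assumed.

```r
library(ggplot2)

# Only x is supplied. geom_bar() counts the rows for each
# value of cut and puts that count on the y-axis.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
```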
Geom_bar uses several aesthetics that you're already
familiar with, such as color, size, and alpha.
Let's add the color aesthetic to our plot,
and map it to the variable cut.
We write the code the same way as we did with scatter plots,
and add color=cut after x=cut.
Don't forget to put a comma after x=cut to add a new
aesthetic.
The color aesthetic adds color to the outline of each bar.
R also supplies a legend to show the color coding.
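As a sketch, the outline version would look roughly like this:

```r
library(ggplot2)

# color controls the OUTLINE of each bar.
# Note the comma separating the two aesthetics.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, color = cut))
```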
Let's say we want to highlight the difference between cuts
even more clearly to make our plot easier to understand.
We can use the fill aesthetic to add color
to the inside of each bar.
In our code, we put fill=cut in place of color=cut.
R automatically chooses the colors and supplies a legend.
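The fill version differs by a single word. A minimal sketch:

```r
library(ggplot2)

# fill controls the INSIDE of each bar.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut))
```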
That looks great.
I really enjoy using the fill aesthetic.
If we map fill to a new variable,
geom_bar will display what's called a stacked bar chart.
Let's map fill to clarity instead of cut.
Our plot now shows 40 different combinations
of cut and clarity.
Each combination has its own colored rectangle.
The rectangles that have the same cut value
are stacked on top of each other in each bar.
The plot organizes the complex data.
Now we know the difference in volume between cuts,
and we can figure out the difference in clarity
within each cut.
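A sketch of the stacked bar chart, using the clarity column that ships with the diamonds data set:

```r
library(ggplot2)

# x stays on cut, but fill now follows a different variable,
# so each bar is split into stacked segments by clarity.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))
```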
This is just the beginning of what you can do with geoms.
ggplot2 has over 30 geom functions
that you can use to make plots, and extension packages
give you even more.
The ggplot2 cheatsheet is a great resource
for learning more about geoms.
As you move forward and do more advanced data analysis,
you'll find plenty of new geoms to work with.
Until then, the geoms we just reviewed will keep you busy
and let you do a lot with your data.
Coming up, we'll learn how to use
the facet functions to display our data in different ways.
[MUSIC PLAYING]
In this video, we'll learn how to use the ggplot2 facet
functions to display our data in new ways.
Facet functions let you display smaller groups, or subsets,
of your data.
A facet is a side, or section, of an object,
like the sides of a gemstone.
Facets show different sides of your data
by placing each subset on its own plot.
Faceting can help you discover new patterns in your data
and focus on relationships between different variables.
For example, let's say you're looking at sales
data for a clothing company.
You might want to break down your data by category
to show specific trends--
children's clothing versus adult clothing,
or spring fashions versus fall fashions.
Or if you are running an employee engagement survey,
you might want to break down your data by tenure,
and compare senior employees to new employees.
ggplot2 has two functions for faceting--
facet_wrap and facet_grid.
Let's explore them both.
We'll start with facet_wrap.
To facet your plot by a single variable, use facet_wrap.
Let's say we wanted to focus on the data
for each species of penguin.
Take our plot that shows the relationship between body mass
and flipper length in each penguin species.
The facet_wrap function lets us create a separate plot
for each species.
To add a new layer to our plot, we'll
add a plus symbol to our code.
Then, inside the parentheses of the facet_wrap function,
type a tilde symbol, followed by the name of the variable.
Let's log into RStudio Cloud and check it out.
As a reminder, we'll start by loading the ggplot2 package
and the Penguins data set.
You can find the tilde symbol in the upper-left corner
of the keyboard, just below the Escape key.
There!
The separate plots show the relationship
between body mass and flipper length
within each species of penguin.
Pretty cool, right?
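The faceted plot might be written like this, assuming the palmerpenguins package and its column names:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

# facet_wrap(~species) draws one panel per species.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  facet_wrap(~species)
```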
Facets help us focus on important parts of our data
that we might not notice in a single plot.
If your visual is too busy--
for example, if it's got too many variables or levels
within variables-- faceting can be a good option.
Let's try faceting the Diamonds data set.
Earlier, we made a bar chart that
showed the number of diamonds for each category of cut--
fair, good, very good, premium, and ideal.
We can use facet_wrap on the cut variable
to create a separate plot for each category of cut.
Let's check it out.
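A rough sketch of the same idea on the diamonds data. Whether the original demo also mapped fill or color is not stated, so this version keeps only x.

```r
library(ggplot2)

# One panel per category of cut, each holding that category's bar.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut)) +
  facet_wrap(~cut)
```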
To facet your plot with two variables,
use the facet_grid function.
Facet_grid will split the plot into facets
vertically by the values of the first variable
and horizontally by the values of the second variable.
For example, we can take our penguins plot and use
facet_grid with the two variables sex and species.
In the parentheses following the facet_grid function,
we write sex, then the tilde symbol, then species.
Let's run the code.
There are nine separate plots, each
based on a combination of the three species of penguin,
and three categories of sex.
Facet_grid lets you quickly reorganize and display
complex data, and makes it easier
to spot relationships between different groups.
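The two-variable facet could be sketched as follows, assuming the palmerpenguins package and its column names. In that data set, sex has missing values, which ggplot2 shows as an NA row alongside female and male.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

# rows ~ columns: sex splits the plot vertically,
# species splits it horizontally.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  facet_grid(sex ~ species)
```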
If we want, we can focus our plot
on only one of the two variables.
For example, we can tell R to remove sex
from the vertical dimension of the plot and just show species.
Let's check it out.
You can easily spot differences in the relationship
between flipper length and body mass between the three species.
In the same way, we can focus our plot
on sex instead of species.
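Dropping one side of the formula is one way to do this. A sketch under the same palmerpenguins assumptions:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

# Nothing before the tilde: no vertical split, columns by species.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  facet_grid(~species)

# The same plot, with columns by sex instead.
ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  facet_grid(~sex)
```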
Facets let you reorganize your data
to show specific relationships between variables,
and reveal important patterns and trends
in subsets of your data.
That's all for now.
Next up, we'll learn how to customize our plots
using labels and annotations.
[MUSIC PLAYING]
In everyday language, to annotate
means to add notes to a document or diagram
to explain or comment upon it.
In ggplot2, adding annotations to your plot
can help explain the plot's purpose
or highlight important data.
When you present your data visuals to stakeholders,
you may not have much time to meet with them.
Labels and annotations will point their attention
to key things and help them quickly understand your plot.
Let's start with the labs function.
It's super useful for adding informative labels
to a plot, such as titles, subtitles, and captions.
For example, we can add a title to our plot that
shows the relationship between body mass and flipper length
for the three penguin species.
A title will clearly indicate the purpose of the plot.
Let's go over the code.
First, we add a plus sign to add a new layer to our plot.
Next, in the parentheses following the labs function,
we write the word title, then an equals sign, then
the specific text we want in our title.
Let's log in to RStudio Cloud and check it out.
First, let's load the ggplot2 package and the Penguins data
set.
Remember, put the plus sign at the end of a line of code.
It's easy to forget.
R automatically displays the title at the top of the plot.
We can also add a subtitle to our plot
to highlight important information about our data.
To do this, we enter the code for a subtitle
in the same way as a title.
Remember to add a comma after the title argument
before you enter your subtitle.
R automatically displays the subtitle just below the title.
We can add a caption to our plot in the same way.
Captions let us show the source of our data.
The Palmer Penguins data was collected from 2007 to 2009
by Dr. Kristen Gorman, a member of the Palmer Station Long-Term
Ecological Research Program.
Let's cite Dr. Gorman in our caption.
R automatically displays the caption
at the bottom right of our plot.
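Putting the title, subtitle, and caption together might look like this. The ggplot2 function for these labels is labs(). The label wording below is illustrative, since the transcript does not record the exact text, and the palmerpenguins package and its column names are assumed.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  # Arguments inside labs() are separated by commas.
  labs(title = "Palmer Penguins: Body Mass vs. Flipper Length",
       subtitle = "Sample of Three Penguin Species",
       caption = "Data collected by Dr. Kristen Gorman")
```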
Titles, subtitles, and captions are
labels that we put outside of the grid of our plot
to indicate important information.
If we want to put text inside the grid
to call out specific data points,
we can use the annotate function.
For example, let's say we want to highlight the data
from the Gentoo penguins.
We can use the annotate function to add some text
next to the data points that refer to the Gentoos.
This text will clearly communicate what the plot shows
and reinforce an important part of our data.
OK.
Let's check out the code.
In the parentheses of the annotate function,
we've got information on the type of label,
the specific location of the label,
and the content of the label.
In this case, we want to write a text label.
We also want to place it near the Gentoo data points.
Let's put it at the following coordinates--
x-axis equals 220 millimeters, and y-axis equals 3,500 grams.
Finally, let's write our text--
The Gentoos are the largest.
Let's run it.
Check it out.
R automatically places the text label
on the correct coordinates in our plot.
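A sketch of the annotation step, assuming the palmerpenguins package and its column names, with flipper length on the x-axis and body mass on the y-axis to match the coordinates given above:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  # Type of label, its location, then its content.
  annotate("text", x = 220, y = 3500,
           label = "The Gentoos are the largest")
```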
We can customize our annotation even more.
Let's say we want to change the color of our text.
Well, we can add color equals, followed
by the name of the color.
Let's try purple.
We can also change the font style and size of our text.
Use fontface and size to write the code.
Let's bold our text and make it a little larger.
We can even change the angle of our text.
For example, we can tilt our text at a 25-degree angle
to line it up with our data points.
Let's try it.
That looks great.
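With all of those options added, the code might look like this. The size value of 4.5 is an assumption, since the transcript only says "a little larger". The palmerpenguins package and its column names are assumed as before.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species)) +
  annotate("text", x = 220, y = 3500,
           label = "The Gentoos are the largest",
           color = "purple",    # text color
           fontface = "bold",   # font style
           size = 4.5,          # assumed value, slightly above the default
           angle = 25)          # tilt in degrees
```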
By this point, our code is getting pretty long.
If you want to use less code, you
can store your plot as a variable in R.
As a quick reminder, to create a variable in R,
you type the variable name, then a less-than sign,
followed by a dash.
Let's try it with the variable name p.
Now, instead of writing all the code again,
we can just call p and add an annotation to it-- like this.
You get the same result.
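Here is a short sketch of the shorter approach, assuming the palmerpenguins package and its column names:

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

# Store the base plot in a variable named p using <-
p <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species))

# Reuse p and add the annotation layer to it.
p + annotate("text", x = 220, y = 3500,
             label = "The Gentoos are the largest")
```

The variable p itself is unchanged by the second statement, so it can be reused with other layers later.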
Some people like to see every step of their code listed out
in front of them.
So there are advantages to doing it the longer way.
It's really up to you.
I just want you to know that you've got options.
Hopefully, this gives you an idea of some of the ways
you can customize your plots.
Labels and annotations can be really helpful
when it comes to highlighting important parts of your data
and communicating key points.
That's all for now.
Coming up, you'll learn some useful ways
to save your plots in ggplot2.
[MUSIC PLAYING]
In this video, we'll learn how to save our plots.
Saving your work so that you can access it later
is so important.
It lets you continue to work on it
yourself or share it with others.
Being able to reproduce and share your work
is a key part of your future analyst role
because it lets you collaborate with teammates.
They can double-check your work and offer feedback
to help you improve it.
So let's save our plots.
To do this, you'll use the Export option
in the Plots tab of RStudio or the ggsave function provided
by the ggplot2 package.
First, we'll save our plots using the Export option.
Then, we'll use the ggsave function.
Let's log into RStudio Cloud.
We'll load the ggplot2 package and the Penguins data set.
To start, let's write some code and create
the plot that shows the relationship between body mass
and flipper length in three penguin species.
Let's use the Export option in the Plots tab to save our plot.
We can save it as an image file or a PDF file.
Let's try saving it as an image.
There are six different options for image format,
including PNG and JPEG.
Let's try PNG.
Next, we name our file and click Save.
Now, if we click on the Files tab,
we'll find our file in the list.
Let's open it up.
Looks great!
That covers the Export option for saving a plot.
Now, let's check out the ggsave function.
ggsave is a useful function for saving a plot.
It defaults to saving the last plot
that you displayed and uses the size of the current graphics
device.
Let's try saving our plot as a PNG file using ggsave.
ggsave will automatically save the plot
that shows the relationship between body mass and flipper
length because this is the last plot that we displayed.
We have to give the file a name and say what kind of file
we want to save it as.
Let's write the code.
Within the parentheses of the function,
we start off with a quotation mark,
followed by the name of the file.
Let's name it Three Penguin Species.
We put a period after the file name, then
the type of file we want, then a closing quotation mark.
Let's run it.
Now, if we click on the Files tab,
we'll find our new file in the list.
Let's open it up.
Again, looks great!
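The ggsave step can be sketched like this, assuming the palmerpenguins package and its column names for the plot itself. The file is written to the current working directory.

```r
library(ggplot2)
library(palmerpenguins)  # assumed source of the penguins data set

ggplot(data = penguins) +
  geom_point(mapping = aes(x = flipper_length_mm, y = body_mass_g,
                           color = species))

# No plot is named, so ggsave() saves the last plot.
# The file type comes from the extension after the period.
ggsave("Three Penguin Species.png")
```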
That covers the basics of saving plots.
After all your hard work creating plots in ggplot2,
you definitely want to remember to save them so you can
access and share them later on.
And that's the end of our work on data visualization.
You're off to a great start visualizing data with ggplot2.
Plus, the concepts we've covered are a great base
for learning even more about data viz in R
as you move forward.
TONY: Congratulations on finishing this video
from the Google Data Analytics Certificate.
Access the full experience, including job search help,
and start to earn the Official Certificate
by clicking the icon or the link in the description.
Watch the next video in the course by clicking here.
And subscribe to our channel for more from upcoming