What is ggplot2?

ggplot2 is an R package for producing statistical graphics. It provides beautiful, hassle-free plots that take care of fiddly details like drawing legends. ggplot2 is also designed to work in a layered fashion, starting with a layer showing the raw data and then adding layers of annotations and statistical summaries. The first thing you need to do is install the package from CRAN and then load it:

install.packages("ggplot2")
library(ggplot2)

There is also a reference manual for ggplot2, and another interesting package called ggplot2movies.

We will begin with chapter two of Wickham’s book. You can follow along with these slides. ggplot2 contains a data set called diamonds. qplot() is similar to base R's plot(), and it also takes an optional data argument naming the data frame in which the variables are looked up. Try it:

head(diamonds)
## # A tibble: 6 × 10
##   carat       cut color clarity depth table price     x     y     z
##   <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1  0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
## 2  0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
## 3  0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
## 4  0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
## 5  0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48

Appendix A page 184 offers a comparison between qplot and plot. For a discussion on method see chapter 2.5.1 page 14.
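
As a rough illustration of that comparison, the sketch below draws the same scatterplot with base plot() and with qplot(). The object name p_qplot is made up for this example. Note that qplot() is deprecated in recent ggplot2 releases (3.4.0 and later), so it may print a deprecation warning while still producing the plot.

```r
library(ggplot2)

# Base graphics: columns are pulled out of the data frame by hand.
plot(diamonds$carat, diamonds$price)

# qplot(): the same x and y, with the columns looked up in `data`.
p_qplot <- qplot(carat, price, data = diamonds)
p_qplot
```

The data argument is what lets you write carat instead of diamonds$carat.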

qplot(carat, price, data = diamonds, colour=clarity)

Notice a legend is automatically included in your plot.

qplot(diamonds$cut, diamonds$carat)
qplot(carat, price, data = diamonds)
qplot(carat, price, data = diamonds, geom=c("point", "smooth"), method="lm")
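set.seed(1410)  # make the sample reproducible
dsmall <- diamonds[sample(nrow(diamonds), 100), ]  # dsmall is not defined above; a 100-row random sample is assumed
qplot(carat, price, data = dsmall, colour = color)
qplot(carat, price, data = dsmall, shape = cut)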

geom is short for geometric object and describes the type of object used to display the data. See pages 13 and 14 of Wickham. You can assign colour, size, and shape to the points on your plot. Appendix B page 195 discusses colour and shape.

qplot(carat, data = diamonds, geom="histogram")
qplot(carat, data = diamonds, geom="histogram", binwidth = 1)
qplot(carat, data = diamonds, geom="histogram", binwidth = 0.1)
qplot(carat, data = diamonds, geom="histogram", binwidth = 0.01)
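
To see what binwidth changes without squinting at the plots, one option is to count the bars each histogram draws. This is only a sketch: n_bins is a hypothetical helper written for this example, not a ggplot2 function, and it uses ggplot() with geom_histogram() rather than qplot().

```r
library(ggplot2)

# Count the bars a histogram layer draws for a given binwidth.
# `n_bins` is a hypothetical helper, not part of ggplot2.
n_bins <- function(binwidth) {
  p <- ggplot(diamonds, aes(x = carat)) +
    geom_histogram(binwidth = binwidth)
  nrow(layer_data(p))  # layer_data() returns one row per bin
}

coarse <- n_bins(1)
medium <- n_bins(0.1)
fine   <- n_bins(0.01)
c(coarse = coarse, medium = medium, fine = fine)
```

A smaller binwidth gives more, narrower bars, which reveals finer structure in carat at the cost of a noisier picture.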

See chapter 3 page 39. Plots can be created in two ways: all at once with qplot(), or piece by piece with ggplot() and layer functions. ggplot() takes two main arguments: the data and an aesthetic mapping. Once you have a plot, you can add a layer to it with +. See chapter 4 page 42.
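
The two routes can be put side by side as in the sketch below. The names p1, p2 and p3 are made up for this example, and the smoothing layer is just one possible layer to add.

```r
library(ggplot2)

# All at once: qplot() picks a point geom for two numeric variables.
p1 <- qplot(carat, price, data = diamonds)

# Piece by piece: data and aesthetic mapping first, then a layer added with `+`.
p2 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

# A further layer can be stacked on the same base plot.
p3 <- p2 + geom_smooth(method = "lm")
```

Printing p1 and p2 should give essentially the same scatterplot. The piece-by-piece form is longer, but each layer can be changed independently.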

d <- ggplot(diamonds, aes(x=carat, y=price))
d
d + geom_point()
d + geom_point(aes(colour = carat))
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
d <- ggplot(dsamp, aes(carat, price)) + geom_point(aes(colour = clarity))  # plot the 1000-row sample defined above
d  + scale_colour_brewer()
ggplot(diamonds) + geom_histogram(aes(x=price))

For a description of geoms, see chapter 5 page 67. In particular, geom_point draws a scatterplot with closed points. What happens if you replace colour = carat with colour = price? For scale_colour_brewer, see chapter 6 page 94.

Separation of statistics and geometric elements

p <- ggplot(diamonds, aes(x=price))
p + geom_histogram()
p + stat_bin(geom="area")
p + stat_bin(geom="point")
p + stat_bin(geom="line")
p + geom_histogram(aes(fill = clarity))
p + geom_histogram(aes(y = after_stat(density)))  # written aes(y = ..density..) before ggplot2 3.3.0

stat_bin and geom_histogram are discussed in chapter 4.7 page 61.
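
One way to see the separation is to look at the numbers the bin statistic computes before any geom draws them. The sketch below assumes 30 bins (set explicitly to avoid the default-bins message). The names h1, h2 and binned are made up for this example.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = price))

# A histogram is bars drawn from binned counts, so these two layers
# should produce the same picture.
h1 <- p + geom_histogram(bins = 30)
h2 <- p + stat_bin(geom = "bar", bins = 30)

# The variables computed by the bin statistic, independent of the geom.
binned <- layer_data(h1)
head(binned[, c("x", "xmin", "xmax", "count", "density")])
```

Swapping geom = "bar" for "area", "point" or "line" changes only how these rows are drawn, not the rows themselves.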

Setting vs mapping

p <- ggplot(diamonds, aes(x=carat,y=price))

What will this do?

p + geom_point(aes(colour = "green"))
p + geom_point(colour = "green")
p + geom_point(colour = colour)
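
If you want to check your answers to the first two lines, a sketch like the following inspects the colour each layer ends up using. It works on a 200-row slice to keep things quick. The names small, mapped and set_directly are made up for this example. The third line above would most likely fail with an "object not found" error unless an object named colour exists in your session.

```r
library(ggplot2)

small <- head(diamonds, 200)  # a small slice keeps the check quick
p <- ggplot(small, aes(x = carat, y = price))

# Mapping: "green" is treated as a one-level variable and sent through
# the colour scale, which picks its own colour and draws a legend.
mapped <- layer_data(p + geom_point(aes(colour = "green")))

# Setting: every point is drawn in the literal colour green, with no legend.
set_directly <- layer_data(p + geom_point(colour = "green"))

unique(mapped$colour)
unique(set_directly$colour)
```

The rule of thumb is that values inside aes() are data to be scaled, while values outside aes() are used as given.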

Along with histograms, boxplots give a compact summary of a distribution. See chapter 2.5.2 page 16 for an introduction, as well as chapters 4 and 5.
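
As a starting point, a minimal boxplot of price by cut might look like the sketch below. The choice of variables and the names b and box_stats are assumptions made for this example.

```r
library(ggplot2)

# One box per cut: median, quartiles, whiskers, and outlying points of price.
b <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()
b

# The five-number summaries behind the boxes.
box_stats <- layer_data(b)[, c("ymin", "lower", "middle", "upper", "ymax")]
box_stats
```

Where a histogram shows the full shape of one distribution, the boxplot trades detail for an easy comparison across groups.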

To render the output, first save your Rmd file, then type in your console:

library(rmarkdown)  # render() comes from the rmarkdown package
render("ggplot.rmd", output_file = "ggplot.html")