---
title: "Introduction to ggplot2"
output:
  html_document: default
  pdf_document: default
---

# What is ggplot2?

`ggplot2` is an R package for producing statistical graphics. It provides beautiful, hassle-free plots that take care of fiddly details like drawing legends. `ggplot2` is also designed to work in a layered fashion, starting with a layer showing the raw data, then adding layers of annotations and statistical summaries.

The first thing you need to do is install the package from CRAN, then load it:

```{r eval=FALSE}
install.packages("ggplot2")
library(ggplot2)
```

There is also a [reference manual](https://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf) for ggplot2, and another interesting package called [ggplot2movies](https://cran.r-project.org/web/packages/ggplot2movies/index.html).

We will begin with chapter two of Wickham's book. You can follow along with these [slides](http://ggplot2.org/resources/2007-vanderbilt.pdf).

`ggplot2` contains a data set called `diamonds`. **qplot** is similar to **plot**, but it also takes an optional *data* argument: if you supply it, **qplot** looks for the variables in that data frame first. Try it:

```{r}
library(ggplot2)  # loaded here because the install chunk above is not evaluated
head(diamonds)
```

Appendix A, page 184, offers a comparison between **qplot** and **plot**. For a discussion of **method**, see chapter 2.5.1, page 14.

```{r}
qplot(carat, price, data = diamonds, colour = clarity)
```

Notice that a legend is automatically included in your plot.

```{r eval=FALSE}
# dsmall is not defined elsewhere in this document; a random sample of
# 100 diamonds is assumed here
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

qplot(diamonds$cut, diamonds$carat)
qplot(carat, price, data = diamonds)
qplot(carat, price, data = diamonds, geom = c("point", "smooth"), method = "lm")
qplot(carat, price, data = dsmall, colour = color)
qplot(carat, price, data = dsmall, shape = cut)
```

`geom` is short for geometric object and describes the type of object that is used to display the data. See pages 13 and 14 of Wickham. You can assign colour, size, and shape to points on your plot. Appendix B, page 195, discusses colour and shape.

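The colour, shape, and smoothing ideas above can also be written layer by layer with **ggplot()**, which is useful because **qplot()** is deprecated in recent ggplot2 releases. The chunk below is a minimal sketch, not part of Wickham's text: the seed, the sample size of 100, and the object names `p_mapped` and `p_smooth` are assumptions chosen for illustration.

```{r eval=FALSE}
library(ggplot2)

# Assumption: a reproducible 100-row sample standing in for dsmall
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Colour and shape are mapped to variables inside aes(), so each gets a legend
p_mapped <- ggplot(dsmall, aes(x = carat, y = price)) +
  geom_point(aes(colour = color, shape = cut))

# Two geoms layered on the same data: raw points plus a linear fit
p_smooth <- ggplot(dsmall, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth(method = "lm")

p_mapped
p_smooth
```

Each `+` adds one geom to the plot, so swapping `geom_point()` for another geom changes how the same data are drawn without touching the mapping.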
```{r eval=FALSE}
qplot(carat, data = diamonds, geom = "histogram")
qplot(carat, data = diamonds, geom = "histogram", binwidth = 1)
qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.1)
qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.01)
```

See chapter 3, page 39. Plots can be created in two ways: all at once with **qplot()**, or piece by piece with **ggplot()** and layer functions. **ggplot()** has two arguments: data and aesthetic mapping. Once you have a plot, you can add a layer to it with `+`. See chapter 4, page 42.

```{r eval=FALSE}
d <- ggplot(diamonds, aes(x = carat, y = price))
d
d + geom_point()
d + geom_point(aes(colour = carat))

# Work on a random sample of 1000 diamonds to reduce overplotting
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
d <- ggplot(dsamp, aes(carat, price)) + geom_point(aes(colour = clarity))
d + scale_colour_brewer()

ggplot(diamonds) + geom_histogram(aes(x = price))
```

For a description of geoms, see chapter 5, page 67. In particular, `geom_point` draws a scatterplot with closed points. What happens if you replace `colour = carat` with `colour = price`? For `scale_colour_brewer`, see chapter 6, page 94.

## Separation of statistics and geometric elements

```{r eval=FALSE}
p <- ggplot(diamonds, aes(x = price))
p + geom_histogram()
p + stat_bin(geom = "area")
p + stat_bin(geom = "point")
p + stat_bin(geom = "line")
p + geom_histogram(aes(fill = clarity))
# newer ggplot2 versions spell this after_stat(density)
p + geom_histogram(aes(y = ..density..))
```

`stat_bin` and `geom_histogram` are discussed in chapter 4.7, page 61.

## Setting vs mapping

```{r eval=FALSE}
p <- ggplot(diamonds, aes(x = carat, y = price))
```

### What will this do?

```{r eval=FALSE}
p + geom_point(aes(colour = "green"))
p + geom_point(colour = "green")
p + geom_point(colour = colour)
```

Along with histograms, boxplots give a useful summary of a distribution. See chapter 2.5.2, page 16, for an introduction, as well as chapters 4 and 5.

To render the output, first save your Rmd file, then type in your console:

```{r eval=FALSE}
# render() lives in the rmarkdown package
rmarkdown::render("ggplot.rmd", output_file = "ggplot.html")
```
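
Returning to the boxplots mentioned above, here is one way a boxplot of the `diamonds` data might look. It is a minimal sketch rather than an example from the book: the choice of `cut` and `price` as variables and the object name `b` are assumptions for illustration.

```{r eval=FALSE}
library(ggplot2)

# One box per level of cut: median, quartiles, and outlying prices
b <- ggplot(diamonds, aes(x = cut, y = price)) +
  geom_boxplot()
b

# The same comparison on a log10 price axis, which spreads out the low end
b + scale_y_log10()
```

Where a histogram shows the full shape of one distribution, the boxes make it easier to compare several groups side by side.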