MAT5335 Project 5 due Friday 22 February Write up in RMarkdown and knit to either pdf, html, or word. 1. Pick two out of the three data sets. From the ggplot2 R package, use qplot for one data set and ggplot for the other. Please add some characteristics or layers. There are many examples in the book ggplot2. * Top baby names from ssa.gov * Individual contribution from fec.gov * Illinois Public University Salaries from ibhe.org To access the salaries file, you want to do download the file and open it in R. http://salarysearch.ibhe.org/search.aspx Select Fiscal Year 2016 [All Institutions] (This is the academic year July 2015 - June 2016) Click on 'Search', then 'Download Data' Save the data as an html file (not xls) in the directory that RStudio can access. # RStudio install.packages("htmltab") library(htmltab) # open the file in a web browser and copy the link. mydoc="file://localhost/Users/aalvarado2/2016_Salaries.html" # xp is the first table (in this case the only table) xp="//table[1]" # df is a data frame with 17771 rows and 6 columns; may take a couple minutes. df=htmltab(doc=mydoc,which=xp) # the commands below removes the dollar symbols and commas as well as converts the characters to numeric. df[,5]=as.numeric(sub(",", "",sub("\\$","",df[,5]))) df[,6]=as.numeric(sub(",", "",sub("\\$","",df[,6]))) # rename so that the data frame is easier to work with. names(df)=c("Institution","Name","Position","Title","BaseSalary","AddComp") tapply(df$BaseSalary,df$Position,mean) df[which(df$Name=="First Last"),] # if a person does not come up, they may have a middle initial. grep("LastName",df$Name) # now save it as a csv file, so that we can import in SQL. write.csv(df, "Salaries2016.csv",row.names=F) 2. Import Salaries2016.csv into SQLite. You may need to replace the column name BaseSalary with BaseSalary*1.0 or cast(BaseSalary as numeric). You may find it helpful to turn header on. Count how many records there are, i.e., how many Illinois public university administrators and faculty. Select those records whose Base Salary are in the top ten. Select those records whose Base Salary >500000. Some folks also have additional compensation. Print out those with additional compensation > 100000. Count how many employees are in each different position, restricted to EIU. Restricted to EIU, select assistant professors and base salary, and print in descending order. Restricted to EIU, then another institution, then all institutions, find the average base salary for each position, print salary descending order. Print out the number of employees in each position at each institution; group by position. Print out the employees whose name begins with Ale or Ric. You may want to put a limit. Optional: Is there a way to count all the records in your query?