MAT5335 Project 7 due Friday 10 March

1. Read chapter 2 of Weapons of Math Destruction. You may find chapters 15, 16, 17 and 20 of 'Statistics with R' helpful. For the next part of the homework, you will need the file sunfish.csv.

2. In class you constructed three linear regressions (reg1, reg2, reg3) to predict the length of each of the three species from the length of the spinybase.
   a. Which of the three linear models is the best predictor? Explain.
   b. In class we also calculated reg.todo. Explain what this is.

3. a. Make a plot of length vs spinybase from the sunfish file. Note that this includes all three species. Include the line of best fit.
   b. If the length of the fish is 90 cm, what is a good estimate for the length of the spinybase?
   c. R has a function, cor(), which returns the correlation between two variables. Find the correlation between spinybase and length.
   d. A residual is the difference between the observed value (from the table) and the predicted value (from the linear model). These are the vertical distances between the points and the fitted line. The function resid() returns the residuals of a fitted model. Find the summary of the residuals of your linear model.
   e. Make a plot of the residuals against the explanatory variable. Include the line with zero slope and zero intercept (the horizontal line y = 0).
   f. Find the summary of your linear regression, and confirm that its 'Residuals' match part d. and that its 'R-squared' equals the square of the correlation from part c.

4. R has a dataset called attitude, with 30 observations and 7 variables. Type ?attitude for a description.
   a. Make a linear regression model to predict the overall rating based on the handling of employee complaints. Plot both the scatterplot and the linear model.
   b. Make a multiple linear regression model to predict the overall rating based on the handling of employee complaints, raises based on performance, and opportunity to learn.
   c. Using the model from part b, predict the overall rating if the handling of complaints was 75%, raises based on performance was 80%, and opportunity to learn was 60%.

5. See the qqplot handout from class. Let x be a simulation of n trials of a binomial random variable that counts the number of heads in size=50 coin flips. Generate n random numbers from the standard normal distribution, and call this vector z. Run qqplot(z,x) and include the line with the appropriate slope and intercept. Hint: the mean of the binomial is size*p; the sd is sqrt(size*p*(1-p)). Do this for n=100, 1000, 10000. What can you conclude about the binomial distribution as n increases?

6. Generate n uniform random numbers between 0 and 1, and call this vector u. Generate a new vector v=-log(u)/rate. Next, generate n exponential random numbers with the same rate, and call this vector w. Use qqplot to make three different plots. Which one is linear? What does this tell you? Do this for n=100, 1000, 10000.
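The relationships behind items 3c to 3f hold for any simple linear regression, so they can be checked on data that ships with R. The sketch below uses the built-in attitude data (rating against complaints) instead of sunfish.csv, whose column names are not shown here; the choice of variables is only for illustration. It computes residuals as observed minus fitted values, compares R-squared with the squared correlation, and draws the residual plot with the horizontal line y = 0.

```r
# Illustration on R's built-in attitude data (not the sunfish data).
fit <- lm(rating ~ complaints, data = attitude)

# Residual = observed value minus predicted (fitted) value,
# i.e. the vertical distance from each point to the fitted line.
res <- resid(fit)
manual_res <- attitude$rating - fitted(fit)
print(summary(res))

# For a simple linear regression, R-squared equals the squared correlation.
r <- cor(attitude$complaints, attitude$rating)
r_squared <- summary(fit)$r.squared
print(c(correlation = r, r_squared = r_squared, r_sq_from_cor = r^2))

# Residuals against the explanatory variable, with the line y = 0.
plot(attitude$complaints, res, xlab = "complaints", ylab = "residual")
abline(h = 0)
```

The same steps apply to the sunfish model once its column names are substituted.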
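A minimal sketch of the simulation in item 5, assuming a fair coin (p = 0.5, which the assignment does not state explicitly) and an arbitrary seed chosen only for reproducibility. If the binomial counts are approximately normal, their quantiles are roughly size*p + sd * (standard normal quantiles), so the reference line has intercept size*p and slope sqrt(size*p*(1-p)).

```r
set.seed(1)   # arbitrary seed, only for reproducibility
size <- 50
p <- 0.5      # assumption: a fair coin

for (n in c(100, 1000, 10000)) {
  x <- rbinom(n, size = size, prob = p)  # n trials: heads in `size` flips
  z <- rnorm(n)                          # n standard normal draws
  qqplot(z, x, main = paste("n =", n),
         xlab = "standard normal sample", ylab = "binomial sample")
  # Normal approximation: intercept = mean, slope = standard deviation.
  abline(a = size * p, b = sqrt(size * p * (1 - p)))
}
```

Each pass through the loop draws one plot; after the loop, x and z hold the n = 10000 samples.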
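Item 6 is an instance of inverse-transform sampling: if U is uniform on (0, 1), then P(-log(U)/rate > t) = P(U < exp(-rate*t)) = exp(-rate*t), which is the survival function of an exponential distribution with that rate. The sketch below assumes rate = 2 and, since the assignment does not say which three plots to make, assumes the three pairings u vs v, u vs w and v vs w. The line y = x is added to the last panel, where the points should fall near it when the two samples share a distribution.

```r
set.seed(2)   # arbitrary seed, only for reproducibility
rate <- 2     # assumption: the assignment leaves the rate unspecified

for (n in c(100, 1000, 10000)) {
  u <- runif(n)              # n uniform draws on (0, 1)
  v <- -log(u) / rate        # inverse-transform step
  w <- rexp(n, rate = rate)  # n draws from R's exponential generator

  par(mfrow = c(1, 3))       # three panels side by side
  qqplot(u, v, main = paste("u vs v, n =", n))
  qqplot(u, w, main = paste("u vs w, n =", n))
  qqplot(v, w, main = paste("v vs w, n =", n))
  abline(a = 0, b = 1)       # y = x on the v vs w panel
}
```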