You will submit your homework as an R Markdown (.Rmd) file by committing to your git repository and pushing to GitLab. We will knit this file to produce the .html output file (you do not need to submit the .html, but you should make sure that it can be produced successfully).
We will review both your .Rmd file and the .html file. To receive full credit:
You must submit your .Rmd file on time. It must be named exactly as specified, and it must knit without errors to produce a .html file.
The .html file should read as a well-written report, with all results and graphs supported by text explaining what they are and, when appropriate, what conclusions can be drawn. Your report should not contain any extraneous material, such as leftovers from a template.
The R code in your .Rmd file must be clear, readable, and follow the coding standards.
The text in your .Rmd file must be readable and use R Markdown properly, as shown in the class template file.
Create a new folder called HW5 in your repository. Use exactly this spelling with upper case letters. You can do this in the RStudio IDE, with R’s dir.create function, or using a shell.
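As a minimal sketch of the dir.create option, assuming your R working directory is the top level of your repository (check with getwd()):

```r
# Assumption: the working directory is the top level of your repository.
# Folder names are case-sensitive on GitLab, so "HW5" must be upper case.
if (!dir.exists("HW5")) {
    dir.create("HW5")
}

# Path the homework file is expected to have, relative to the repository.
rmd_path <- file.path("HW5", "hw5.Rmd")

# TRUE once the folder exists.
dir.exists("HW5")
```

The dir.exists() check simply avoids a warning if the folder is already there.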
In this folder, create a new R Markdown file called hw5.Rmd. Again, use exactly this spelling. RStudio will give you a template, or you can use the one available here. Commit your new file to your repository. (If you are using git in a shell, you will need to use git add before git commit.)
In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.
This problem refers to the data provided in the nycflights13 package. Airport codes for the three New York City airports can be computed from the origin variable in the flights table using unique(). Use filter() and select() on the airports table to create a table containing the airport codes and the airport names for these three airports, and show the result as a nicely formatted table.
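The general pattern of unique(), filter(), and select() can be sketched on toy data. The tables stations and trips below are hypothetical stand-ins, not part of nycflights13, and knitr::kable() is only one common way to format a table; the class template may call for a different one.

```r
library(dplyr)

# Hypothetical lookup table and event table, for illustration only.
stations <- tibble(
    code = c("AAA", "BBB", "CCC", "DDD"),
    name = c("Alpha Field", "Bravo Intl", "Charlie Regional", "Delta Muni"),
    elevation = c(10, 250, 35, 900)
)
trips <- tibble(start = c("AAA", "CCC", "AAA", "CCC"))

# Codes that actually occur in the event table.
used_codes <- unique(trips$start)

# Keep the matching rows, then keep only the code and name columns.
used_stations <- stations |>
    filter(code %in% used_codes) |>
    select(code, name)

# Render the result as a formatted table in the knitted report.
knitr::kable(used_stations, col.names = c("Code", "Name"))
```

Using %in% rather than == matters here because used_codes holds more than one value.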
Continuing with the nycflights13 data, use the flights table to compute the mean and median departure delays for each of the three New York City airports, omitting missing values. Present the results as a nicely formatted table and comment on the results.
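One way to compute grouped summaries while omitting missing values is group_by() followed by summarize() with na.rm = TRUE. The sketch below uses a hypothetical toy table, delays, rather than the flights data, and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data with one missing value in group "A".
delays <- tibble(
    station = c("A", "A", "A", "B", "B", "B"),
    delay = c(2, NA, 10, 1, 3, 20)
)

# One row per group; na.rm = TRUE drops the NA before summarizing.
delay_summary <- delays |>
    group_by(station) |>
    summarize(
        mean_delay = mean(delay, na.rm = TRUE),
        median_delay = median(delay, na.rm = TRUE)
    )

delay_summary
```

Without na.rm = TRUE, both summaries for group "A" would be NA. In group "B" the mean (8) is well above the median (3) because of the single large value, which is the kind of difference worth commenting on.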
Use density plots to compare the distributions of the air time (as recorded in the air_time variable) for flights originating from each of the three New York City airports. What differences do you see?
There are several options for displaying the densities:
color or fill and alpha to distinguish the distributions;
Consider all three approaches and comment on their advantages and disadvantages.
The default bandwidth used by geom_density() and geom_density_ridges() may be too narrow; a larger bandwidth of, say, 50 may be better. The bw argument can be used to specify a different bandwidth. These examples specify a narrower bandwidth for the barley data:
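library(ggplot2)
data(barley, package = "lattice")

# One panel per site, with an explicit bandwidth of 1.
ggplot(barley, aes(x = yield)) +
    geom_density(bw = 1) +
    facet_wrap(~site, ncol = 1)

library(ggridges)

# Ridgeline version; stat = "density" uses ggplot2's density stat, which accepts bw.
ggplot(barley, aes(x = yield, y = site)) +
    geom_density_ridges(aes(height = after_stat(density)),
                        stat = "density", bw = 1)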
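The examples above cover faceting and ridgelines. For the option that uses color or fill and alpha, a minimal sketch on the same barley data might look like the following. The values alpha = 0.4 and bw = 3 are arbitrary choices for illustration, not recommendations.

```r
library(ggplot2)
data(barley, package = "lattice")

# Overlay one density per site in a single panel.
# alpha makes the filled areas semi-transparent so overlaps stay visible.
p <- ggplot(barley, aes(x = yield, fill = site)) +
    geom_density(alpha = 0.4, bw = 3)

p
```

With six sites the overlaid curves are already crowded; with only three groups this display is usually easier to read.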
In Problem 4 of Assignment 4 you created a strip plot showing highway fuel economy values for each of the years from 2000 through 2024. Compare your result to three other options:
Comment on the advantages and disadvantages of each approach in this case.
You can create an HTML file in RStudio using the Knit tab on the editor window. You can also use the R command
rmarkdown::render("hw5.Rmd")
with your working directory set to HW5.
Commit your changes to your hw5.Rmd file to your local git repository. You do not need to commit your HTML file.
Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.