4. Assignment 4:#

Due: 2020-10-05 11:59pm

Table 4.1 practice basic pandas by reshaping and organizing data#
task	skill (max level)
drop nan rows from a dataset	prepare (2)
display parts of a dataframe	summarize (1)
impute a value to fill missing values	prepare (2)
filter data based on extreme values or other outliers	prepare (2)
convert a variable to one hot encoding	prepare (2)
add a new column computed from one or more other columns	prepare (2)
transform a dataset to tidy format	prepare (2)
compute overall and individual summary statistics	summarize (2)
use split-apply-combine paradigm	summarize (2)
generate at least two types of plots	visualize (2)
interpret statistics and plots	summarize, visualize
use list comprehensions or loops and pythonic conventions	python (2)
load data from at least two types	access (2)
compare data storage formats	access (2)
match EDA techniques to questions appropriately	process (1)

For this assignment, prepare the provided datasets. Your preparation needs to include the following steps and narrative description of how you’re making decisions about your data cleaning.

The notebooks in the template have instructions for how to work with each dataset.

To earn prepare level 2, clean the data and do just enough exploratory data analysis to show that the data is usable (eg 1 stat and/or plot). For prepare level 2:

travel_times AND one of:
cs_degrees, airlines, and coffee

To earn summarize and visualize level 2, add extra exploratory data analyses meeting the criteria above.

To earn python level 2, make sure that you use a function or lambda and comprehension or pythonic loops somewhere. The CS degrees data will have that, but it’s harder. The coffee data will be the easiest one to get all python level 2.

For access level 2 you must clean the airline data (to get data in a second file type).

Hint

renaming thing is often done well with a dictionary comprehension or lambda.

Programming for Data Science at URI Fall 2021

Assignment 4:

4. Assignment 4:#