Assignment 3: Exploratory Data Analysis
Due: 2020-09-27
Objective & Evaluation
This assignment is an opportunity to earn level 1 or 2 achievements in summarize, visualize, or access. You can earn level 2 in python.
Accept the assignment on GitHub Classroom. The template converts any notebooks you add to markdown, which makes them easier to read on GitHub for grading. It syncs between the .ipynb and .md versions of the notebooks stored in your repository.
This week I encourage you to try working with git, but if you’re not comfortable with that you can work via upload again.
Exploratory Data Analysis
This week your goal is to do a small exploratory data analysis for two datasets of your choice. One dataset (d1) must include at least two continuous-valued variables and at least one categorical variable. The other dataset (d2) must include at least two categorical variables and at least one continuous-valued variable.
Use a separate notebook for each dataset; name them dataset_01.ipynb and dataset_02.ipynb.
For each dataset:
Include a markdown header with a title for your analysis
Load the data into a notebook as a DataFrame from a URL.
Explore the dataset in the notebook enough to describe its structure:
shape
columns
variable types
Write a short description of what the data contains and what it could be used for
Complete an exploratory analysis with statistics and plots. Your analysis should include markdown cells describing the results you see, not only what you did. Your analysis should be structured to follow the steps below, for the corresponding dataset.
For d1:
Display all of the summary statistics for a subset of 5 variables of your choice, or for all variables if there are fewer than 5 numerical variables
Display all of the summary statistics grouped by a categorical variable
For two continuous variables make a scatter plot and color the points by a categorical variable
Pose one question for this dataset that can be answered with summary statistics, then compute a statistic and make a plot that help answer that exploratory question.
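The d1 steps above can be sketched roughly as follows. This is only an illustration under assumptions: the DataFrame d1 and its columns (length_cm, mass_g, group) are made-up toy data, not a dataset you should use, and the example question ("does mean mass differ by group?") is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up toy data for illustration only:
# two continuous columns and one categorical column.
d1 = pd.DataFrame({
    "length_cm": [4.1, 4.5, 5.0, 6.2, 6.8, 7.1],
    "mass_g": [10.0, 11.5, 12.0, 20.5, 22.0, 23.5],
    "group": ["a", "a", "a", "b", "b", "b"],
})

# All summary statistics for the numerical columns.
overall_stats = d1.describe()

# The same summary statistics, computed separately for each category.
grouped_stats = d1.groupby("group").describe()

# One statistic aimed at a hypothetical question:
# "does mean mass differ by group?"
mean_mass_by_group = d1.groupby("group")["mass_g"].mean()

# Scatter plot of the two continuous variables colored by the category,
# next to a bar plot of mean mass per group.
fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
sns.scatterplot(data=d1, x="length_cm", y="mass_g", hue="group", ax=ax_scatter)
sns.barplot(data=d1, x="group", y="mass_g", ax=ax_bar)

print(overall_stats)
print(grouped_stats)
print(mean_mass_by_group)
```

In your notebook, follow each output with a markdown cell that says what the numbers or the plot show, not only which function you called.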
For d2:
Display two individual summary statistics for one variable
Group the data by two categorical variables and display a table of one summary statistic
Use a seaborn plotting function with the col parameter or a FacetGrid to make a plot that shows something informative about this data, using both categorical variables and at least one numerical value. Describe what this tells you about the data.
Produce one additional plot of a different plot type that shows something about this data.
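As a rough sketch of the d2 steps above, assuming a made-up DataFrame d2 with hypothetical columns site, season, and yield_kg (toy values for illustration only), one possible shape for the analysis is:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up toy data for illustration only:
# two categorical columns and one continuous column.
d2 = pd.DataFrame({
    "site": ["north"] * 4 + ["south"] * 4,
    "season": ["wet", "wet", "dry", "dry"] * 2,
    "yield_kg": [12.0, 14.0, 8.0, 10.0, 20.0, 22.0, 15.0, 17.0],
})

# Two individual summary statistics for one variable.
median_yield = d2["yield_kg"].median()
max_yield = d2["yield_kg"].max()

# Group by both categorical variables and tabulate one statistic;
# unstack() moves season into the columns to make a small table.
mean_table = d2.groupby(["site", "season"])["yield_kg"].mean().unstack()

# A seaborn figure-level function with the col parameter:
# one panel per site, seasons along the x axis.
grid = sns.catplot(data=d2, x="season", y="yield_kg", col="site", kind="bar")

# One additional plot of a different type, drawn on its own axes.
fig, ax = plt.subplots()
sns.stripplot(data=d2, x="site", y="yield_kg", hue="season", ax=ax)

print(median_yield, max_yield)
print(mean_table)
```

catplot is only one choice here; other seaborn figure-level functions such as relplot and displot also accept col, so pick whichever suits your variables.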
In total, in each of your notebooks you will have:
a loaded dataset
a basic description
at least two summary statistic calculations
at least two plots
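The loading and structure steps listed for each dataset might look like the sketch below. To keep it runnable offline, it reads a small made-up CSV from an in-memory string; the column names and values are hypothetical. pd.read_csv also accepts a URL string in place of the buffer, which is what you would use for your own dataset.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a remote file (illustration only).
# For a real dataset, pass the URL string to pd.read_csv instead.
csv_text = """group,site,length_cm,mass_g
a,north,4.1,10
a,south,4.5,11
a,north,5.0,12
b,south,6.2,20
b,north,6.8,22
b,south,7.1,23
"""

df = pd.read_csv(io.StringIO(csv_text))

# Structure of the dataset: shape, columns, and variable types.
n_rows, n_cols = df.shape
column_names = list(df.columns)
variable_types = df.dtypes

print(df.head())
print(n_rows, n_cols)
print(column_names)
print(variable_types)
```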