5. Assignment 5: Constructing Datasets and Using Databases
date : 2023-10-10
Eligible skills: (links to checklists)
5.2. Constructing Datasets
Your goal is to programmatically construct a ready-to-analyze dataset from multiple sources.
Your dataset must combine at least 2 source tables.
At least one source table must come from a database or from web scraping (not `pd.read_html`).
You should use at least two different joins (this means either using 3 data sources or combining two datasets in two different ways).
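As a rough illustration of the requirements above, the sketch below reads one table from a database and combines it with a second table using two different joins. The table names, column names, and values (`students`, `grades_df`, `student_id`) are hypothetical, and an in-memory SQLite database stands in for whatever database you actually use.

```python
import sqlite3

import pandas as pd

# Hypothetical source 1: a table stored in a database.
# An in-memory SQLite database is created here only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {"student_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]}
).to_sql("students", conn, index=False)

# Read the database table into a DataFrame
students_df = pd.read_sql_query("SELECT * FROM students", conn)
conn.close()

# Hypothetical source 2: a table from another source (a CSV file, for example)
grades_df = pd.DataFrame({"student_id": [2, 3, 4], "grade": [88, 92, 75]})

# Join 1: inner join keeps only the student_id values present in both tables
inner_df = pd.merge(students_df, grades_df, on="student_id", how="inner")

# Join 2: outer join keeps every student_id from either table,
# filling the missing side with NaN
outer_df = pd.merge(students_df, grades_df, on="student_id", how="outer")
```

Comparing the shapes of `inner_df` and `outer_df` is one way to explain in your notebook why you chose a particular join.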
The notebook you submit should include:
a motivating question for why you’re combining the datasets in an introduction section
code and description of how you built and prepared each dataset. For each step, include a description of what you’re about to do, the code with its output, and an interpretation that leads into the next step.
exploratory data analysis that shows why you built the data and confirms that it is prepared enough to analyze (this can be one simple statistic or plot, as long as it is something that requires the merge you used).
If you are working toward construct only, this EDA can be very minimal.
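A minimal sketch of EDA that depends on the merge might look like the following. The tables and column names (`stores_df`, `sales_df`, `region`, `revenue`) are made up for illustration; the point is only that the statistic cannot be computed from either table alone.

```python
import pandas as pd

# Hypothetical component tables: region lives in one, revenue in the other
stores_df = pd.DataFrame(
    {"store_id": [1, 2, 3], "region": ["north", "south", "north"]}
)
sales_df = pd.DataFrame(
    {"store_id": [1, 1, 2, 3], "revenue": [100.0, 150.0, 200.0, 50.0]}
)

# Left join: keep every sale and attach the region of its store
merged_df = sales_df.merge(stores_df, on="store_id", how="left")

# Quick check that the merge left no missing values behind
missing_count = int(merged_df.isna().sum().sum())

# A statistic that needs both tables: mean revenue per region
mean_revenue_by_region = merged_df.groupby("region")["revenue"].mean()
```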
5.3. Additional achievements
To earn additional achievements, you must do more cleaning and/or exploratory data analysis.
5.3.1. Prepare level 2
To earn level 2 for prepare, you must manipulate either a component table or the final dataset. See your Achievement checklist for which aspects of prepare you still need, but sample manipulations include:
transform into a tidy format
add a new column by computing from others
handle NaN values by dropping or filling
drop a column, row, or duplicates in another way
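The sample manipulations listed above could be sketched as follows. The data and column names (`wide_df`, `temp_2021`, `notes`) are invented for this example, and the specific choices (dropping rather than filling NaN values, for instance) are assumptions, not requirements.

```python
import pandas as pd

# Hypothetical wide table: one temperature column per year, plus a duplicate row
wide_df = pd.DataFrame(
    {
        "city": ["Providence", "Boston", "Providence"],
        "temp_2021": [10.5, 9.8, 10.5],
        "temp_2022": [11.0, None, 11.0],
        "notes": ["", "", ""],
    }
)

# Drop an unneeded column and remove duplicate rows
clean_df = wide_df.drop(columns=["notes"]).drop_duplicates()

# Transform into a tidy format: one row per (city, year) observation
tidy_df = clean_df.melt(id_vars="city", var_name="year", value_name="temp_c")
tidy_df["year"] = tidy_df["year"].str.replace("temp_", "", regex=False).astype(int)

# Handle NaN values by dropping the rows that have no measurement
tidy_df = tidy_df.dropna(subset=["temp_c"])

# Add a new column by computing from another
tidy_df["temp_f"] = tidy_df["temp_c"] * 9 / 5 + 32
```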
5.3.2. Summarize and Visualize level 2
To earn level 2 for summarize and/or visualize, include additional analyses after building the datasets.
Connect your EDA to questions, and focus on the aspects of these achievements you have not successfully demonstrated.
5.3.3. Python Level 2
Use pythonic naming conventions throughout, AND:
Use pythonic loops and a list or dictionary OR
use a list or dictionary comprehension
this can be in your cleanup or your EDA
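One possible way to show these Python skills during cleanup is sketched below. The column names are hypothetical, and this is only one of many forms a pythonic loop or comprehension could take.

```python
# Hypothetical column names as they might arrive from a raw source
raw_columns = ["Student ID", "First Name", "Final Grade"]

# List comprehension: convert each name to snake_case
clean_columns = [name.strip().lower().replace(" ", "_") for name in raw_columns]

# Dictionary comprehension: map each old name to its new name
rename_map = {old: new for old, new in zip(raw_columns, clean_columns)}

# Pythonic loop: iterate over the items directly instead of indexing by position
change_log = []
for old_name, new_name in rename_map.items():
    change_log.append(f"{old_name} -> {new_name}")
```

A mapping like `rename_map` could then be passed to `DataFrame.rename(columns=rename_map)` to apply the new names to a dataset.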
Thinking Ahead
Compare the level 2 skill definitions to level 3. How could you extend and adapt what you’ve done to meet level 3?
Thinking Ahead
You could also demonstrate understanding of how merges work by converting a dataset that is provided as a single table with redundant information into a number of smaller tables in a database.
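A small sketch of that idea, under assumed names: the flat table below repeats `customer_name` on every order, so it is split into a `customers` table and an `orders` table, stored in an in-memory SQLite database, and then rebuilt with a SQL join to confirm nothing was lost. All table names, column names, and values are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical single table with redundant information:
# customer_name is repeated for every order by the same customer
flat_df = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 10, 20],
        "customer_name": ["Ana", "Ana", "Ben"],
        "amount": [25.0, 40.0, 15.0],
    }
)

# Split into two smaller tables that share the customer_id key
customers_df = flat_df[["customer_id", "customer_name"]].drop_duplicates()
orders_df = flat_df[["order_id", "customer_id", "amount"]]

# Store both tables in a database (in-memory here so the sketch is runnable)
conn = sqlite3.connect(":memory:")
customers_df.to_sql("customers", conn, index=False)
orders_df.to_sql("orders", conn, index=False)

# Rebuild the original table with a join to check the split was lossless
rebuilt_df = pd.read_sql_query(
    "SELECT o.order_id, o.customer_id, c.customer_name, o.amount "
    "FROM orders AS o "
    "JOIN customers AS c ON o.customer_id = c.customer_id "
    "ORDER BY o.order_id",
    conn,
)
conn.close()
```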