5. Assignment 5: Constructing Datasets

accept the assignment

Due: 2020-10-12

Eligible skills: (links to checklists)

  • first chance construct 1 and 2

  • (last assignment*) access 1 and 2

  • (last assignment*) python 1 and 2

  • (last assignment*) prepare 1 and 2

  • summarize 1 and 2

  • visualize 1 and 2

* These skills will be eligible in future portfolio checks, but not in future assignments.

5.1. Constructing Datasets

Hint

There is a section of datasets that are provided in multiple parts.

Your goal is to programmatically construct three (3) ready-to-analyze datasets from multiple sources.

  • Each dataset must combine at least 2 source tables (minimum 4 total source tables).

  • At least one source table must come from a database or from web scraping.

  • You should use at least two different joins (types of merges, or concat).
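
The requirements above can be sketched as follows. This is only a minimal illustration, not a required approach: it assumes pandas, uses an in-memory SQLite database as a stand-in for a real database source, and every table and column name (`students`, `grades_df`, `student_id`, and so on) is hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical source table 1: stored in a database (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cy")],
)
conn.commit()
students_df = pd.read_sql_query("SELECT * FROM students", conn)
conn.close()

# Hypothetical source table 2: a regular DataFrame (it could come from a CSV).
grades_df = pd.DataFrame({"student_id": [1, 2, 4], "grade": [90, 85, 70]})

# Join type 1: an inner merge keeps only students present in both tables.
inner_df = pd.merge(students_df, grades_df, on="student_id", how="inner")

# Join type 2: a left merge keeps every student, with NaN for missing grades.
left_df = pd.merge(students_df, grades_df, on="student_id", how="left")

# concat stacks tables that share columns, e.g. data split by semester.
fall_df = pd.DataFrame({"student_id": [1], "grade": [90]})
spring_df = pd.DataFrame({"student_id": [2], "grade": [85]})
all_grades_df = pd.concat([fall_df, spring_df], ignore_index=True)

print(inner_df.shape, left_df.shape, all_grades_df.shape)
```

Comparing the shapes of `inner_df` and `left_df` is one quick way to see that the two merges keep different rows. In your notebook, the sources would be real tables rather than toy data.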

The notebook you submit should include:

  • a motivating question for why you’re combining the datasets in an introduction section

  • code and a description of how you built and prepared each dataset. For each step, include a description of what you’re about to do, the code with its output, and an interpretation that leads into the next step.

  • exploratory data analysis that shows why you built the data and confirms that it is prepared enough to analyze.

  • For one pair of tables, show how a different merge could answer a different question.
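
For the last item, one way to show that a different merge answers a different question is sketched below. The tables, column names, and questions are all hypothetical and chosen only for illustration. The sketch assumes pandas and uses the `indicator` option of `pd.merge` to find unmatched rows.

```python
import pandas as pd

# Hypothetical pair of tables: a course catalog and an enrollment list.
courses_df = pd.DataFrame(
    {"course": ["A101", "B201", "C301"], "credits": [4, 3, 3]}
)
enrollment_df = pd.DataFrame(
    {"course": ["A101", "A101", "B201"], "student": ["Ana", "Ben", "Ana"]}
)

# Question A: how many credits is each enrolled student taking?
# An inner merge keeps only rows where a course matches an enrollment.
enrolled_df = pd.merge(enrollment_df, courses_df, on="course", how="inner")
credits_per_student = enrolled_df.groupby("student")["credits"].sum()

# Question B: which courses have no students?
# A left merge from the catalog keeps every course; indicator=True adds
# a "_merge" column marking rows that found no match in enrollment_df.
coverage_df = pd.merge(
    courses_df, enrollment_df, on="course", how="left", indicator=True
)
empty_courses = coverage_df.loc[
    coverage_df["_merge"] == "left_only", "course"
].tolist()

print(credits_per_student)
print(empty_courses)
```

The same two tables support both questions. Only the `how` argument and which table is on the left change.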

If you are attempting construct only, you can include very minimal EDA.

5.2. Additional achievements

To earn additional achievements, you must do more cleaning and/or exploratory data analysis.

5.2.1. Prepare level 2

To earn level 2 for prepare, you must apply and explain transformations, on either the component table(s) or the final dataset, that meet whichever components are unchecked on your prepare level 2 issue.

5.2.2. Summarize and Visualize level 2

To earn level 2 for summarize and/or visualize, include additional analyses after building the datasets.

Check your issues for what components we have not seen from you.

5.2.3. Python Level 2

Use pythonic naming conventions throughout, AND:

  • Use pythonic loops and a list or dictionary OR

  • Use a list or dictionary comprehension
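
As a rough sketch of what these options can look like, the example below loads a dataset provided in multiple parts. The data is made up, the names (`csv_parts`, `part_dfs`, `population_df`) are hypothetical, and in-memory CSV text stands in for real files or URLs. It assumes pandas.

```python
from io import StringIO

import pandas as pd

# Made-up dataset provided in parts; real code would use file paths or URLs.
csv_parts = {
    "2018": "town,population\nTown A,100\nTown B,50\n",
    "2019": "town,population\nTown A,110\nTown B,55\n",
}

# Dictionary comprehension: one DataFrame per part, keyed by year.
part_dfs = {
    year: pd.read_csv(StringIO(text)) for year, text in csv_parts.items()
}

# Pythonic loop: iterate over the items directly, with no index counter.
for year, part_df in part_dfs.items():
    part_df["year"] = int(year)

# Stack the parts into one table with a fresh index.
population_df = pd.concat(list(part_dfs.values()), ignore_index=True)

print(population_df)
```

Note the snake_case variable names and the absence of `range(len(...))` indexing. Both are part of what "pythonic" usually means here.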

Thinking Ahead

Compare the level 2 skill definitions to level 3. How could you extend and adapt what you’ve done to meet level 3?