
Assignment 3: Preparing for Analysis

Due: 2025-10-14

Submission Instructions

  1. Create one notebook for each of the following tasks.

  2. Export each notebook as MyST Markdown (install jupytext, which should include the frontend features needed for this).

  3. Upload (or push) your work to a branch called assignment3.

  4. Open a pull request (PR).

Our evaluation will focus on your ability to apply merges and clean the data.

Job Satisfaction Dataset

You can download the following files from this page using the download icon at the top.

| File | Description |
| --- | --- |
| stack_overflow_24.csv | 2,000 random rows from the 2024 Stack Overflow survey |
| stack_overflow_25.csv | 2,000 random rows from the 2025 Stack Overflow survey |
| stack_overflow_schema_24.csv | Schema for the 2024 Stack Overflow survey |
| stack_overflow_schema_25.csv | Schema for the 2025 Stack Overflow survey |

Your task is to create a dataset that someone can use to compare how developer job satisfaction changed from 2024 to 2025.

Note that you were given only a random sample of 2,000 rows from each year, but your code should work for the full datasets (49,000+ rows), so do everything programmatically.
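One way to keep the code independent of the sample size, and of which questions each year happened to ask, is to compare the two files' columns programmatically rather than by eye. The sketch below assumes the CSVs load with pandas.read_csv; the helper name compare_columns, the tiny DataFrames, and column names such as JobSat, RemoteWork, and AISelect are hypothetical stand-ins, not the real survey headers.

```python
import pandas as pd


def compare_columns(df_a: pd.DataFrame, df_b: pd.DataFrame) -> dict:
    """Split column names into those shared by both frames and those unique to each."""
    cols_a, cols_b = set(df_a.columns), set(df_b.columns)
    return {
        "shared": sorted(cols_a & cols_b),
        "only_first": sorted(cols_a - cols_b),
        "only_second": sorted(cols_b - cols_a),
    }


# Hypothetical stand-ins; in the notebook these would be
# pd.read_csv("stack_overflow_24.csv") and pd.read_csv("stack_overflow_25.csv").
df24 = pd.DataFrame({"ResponseId": [1, 2], "JobSat": [7.0, None], "RemoteWork": ["Remote", "Hybrid"]})
df25 = pd.DataFrame({"ResponseId": [1, 2], "JobSat": [8.0, 6.0], "AISelect": ["Yes", "No"]})

report = compare_columns(df24, df25)
print(report["shared"])  # ['JobSat', 'ResponseId']
```

The schema files could be compared the same way to spot questions that were renamed or reworded between years.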

Keep all of the columns that could help answer this question, and combine the two CSVs into one dataset in a way that lets someone see year-over-year trends.
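Because the two surveys are anonymous samples of different respondents (an assumption worth checking against the schema files), there is presumably no key that links a 2024 row to a 2025 row, so stacking the years with a year column is a natural alternative to a key-based merge. A minimal sketch, where stack_years and the toy columns are hypothetical names:

```python
import pandas as pd


def stack_years(frames: dict, keep: list) -> pd.DataFrame:
    """Stack per-year survey frames into one long table with a `year` column."""
    parts = []
    for year, df in frames.items():
        # reindex keeps exactly the requested columns, in order; a column this
        # year lacks is added as all-NaN so both years share one schema.
        parts.append(df.reindex(columns=keep).assign(year=year))
    return pd.concat(parts, ignore_index=True)


# Toy data with hypothetical column names.
df24 = pd.DataFrame({"JobSat": [7.0, None], "RemoteWork": ["Remote", "Hybrid"]})
df25 = pd.DataFrame({"JobSat": [8.0], "AISelect": ["Yes"]})

combined = stack_years({2024: df24, 2025: df25}, keep=["JobSat", "RemoteWork"])
print(combined)
```

Using reindex keeps every requested column even when one year lacks it (those cells become NaN), which makes the gap visible; pd.concat(..., join="inner") would instead silently keep only the columns both years share.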

In doing so, handle missing values in a way that serves this overall goal.
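What counts as appropriate depends on the column. One defensible approach, sketched below under assumptions, is to report how much of the outcome is missing in each year, drop rows with no satisfaction answer (they cannot inform the comparison), and turn NaN in categorical columns into an explicit "No response" level so those respondents are kept. The function name handle_missing, the JobSat outcome column, and the toy data are hypothetical.

```python
import pandas as pd


def handle_missing(df: pd.DataFrame, outcome: str, fill_cols: list):
    """Report outcome missingness per year, then clean for a year-over-year comparison."""
    # Share of respondents in each year who skipped the outcome question.
    missing_share = df[outcome].isna().groupby(df["year"]).mean()
    # Rows with no outcome cannot say anything about satisfaction, so drop them.
    cleaned = df.dropna(subset=[outcome])
    # Keep respondents who skipped other questions, recording the skip explicitly.
    cleaned = cleaned.fillna({col: "No response" for col in fill_cols})
    return cleaned, missing_share


# Toy data; "JobSat" and "RemoteWork" are hypothetical column names.
df = pd.DataFrame({
    "year": [2024, 2024, 2025, 2025],
    "JobSat": [7.0, None, 8.0, 6.0],
    "RemoteWork": ["Remote", None, None, "Hybrid"],
})
cleaned, missing_share = handle_missing(df, outcome="JobSat", fill_cols=["RemoteWork"])
print(missing_share)
```

If the share of missing outcomes differs a lot between years, that difference is itself worth reporting, since it may bias the comparison.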

UN Votes

Use the Tidy Tuesday UN Votes data to create a dataset, and then answer the following question using statistics and/or plots:

Which issues are the most divisive in the UN, that is, which issues most often produce split votes?
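One possible measure of divisiveness, not the only one, is to score each roll call by 1 minus the share of the most common vote option, so a unanimous vote scores 0, and then average that score within each issue. The sketch assumes the Tidy Tuesday data provide a votes table with rcid and vote columns and an issues table with rcid and issue columns; check those names against the files you download. The helper divisiveness_by_issue and the toy rows are hypothetical.

```python
import pandas as pd


def divisiveness_by_issue(votes: pd.DataFrame, issues: pd.DataFrame) -> pd.Series:
    """Mean split score per issue, most divisive first."""
    # Share of each vote option (e.g. yes / no / abstain) within each roll call.
    shares = votes.groupby("rcid")["vote"].value_counts(normalize=True)
    # Split score: 1 minus the share of the most common option (0 = unanimous).
    split = (1 - shares.groupby(level="rcid").max()).rename("split")
    # Attach issue labels; one roll call may be tagged with several issues.
    scored = issues.merge(split, left_on="rcid", right_index=True)
    return scored.groupby("issue")["split"].mean().sort_values(ascending=False)


# Made-up toy data in the assumed shape.
votes = pd.DataFrame({
    "rcid": [1, 1, 1, 1, 2, 2, 2, 2],
    "country": ["A", "B", "C", "D"] * 2,
    "vote": ["yes", "yes", "yes", "yes", "yes", "no", "no", "abstain"],
})
issues = pd.DataFrame({"rcid": [1, 2, 2], "issue": ["Issue X", "Issue Y", "Issue X"]})

result = divisiveness_by_issue(votes, issues)
print(result)
```

Alternatives include ignoring abstentions or counting the share of roll calls where the minority exceeds some threshold; the ranking can change with the metric, so it is worth stating which one you use and why.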

Today I learned

Write a notebook that can, on any given day, tell you how many of the 25 most recent posts to r/TodayILearned are from Wikipedia.
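A minimal sketch using only the standard library, assuming Reddit's public JSON listing at https://www.reddit.com/r/todayilearned/new.json still serves unauthenticated requests (Reddit may rate-limit or require OAuth) and that each post's link is in its url field. "Most recent" is interpreted here as the new sort; fetch_listing, is_wikipedia, and count_wikipedia are hypothetical helper names, and the sample listing is made up so the counting logic can run offline.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumption: Reddit's public JSON listing still serves unauthenticated
# requests; it may rate-limit or require OAuth.
NEW_POSTS_URL = "https://www.reddit.com/r/todayilearned/new.json?limit=25"


def fetch_listing(url: str = NEW_POSTS_URL) -> dict:
    """Download a subreddit listing and parse the JSON body."""
    # Reddit asks clients to identify themselves with a descriptive User-Agent.
    request = urllib.request.Request(url, headers={"User-Agent": "til-wikipedia-counter/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def is_wikipedia(url: str) -> bool:
    """True if the link's host is wikipedia.org or any subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def count_wikipedia(listing: dict) -> int:
    """Count posts in a listing whose link points to Wikipedia."""
    posts = [child["data"] for child in listing["data"]["children"]]
    return sum(is_wikipedia(post.get("url", "")) for post in posts)


# Made-up listing with the assumed shape, so this runs offline.
sample = {"data": {"children": [
    {"data": {"url": "https://en.wikipedia.org/wiki/Octopus"}},
    {"data": {"url": "https://www.example.com/story"}},
    {"data": {"url": "https://en.m.wikipedia.org/wiki/Fern"}},
]}}
print(count_wikipedia(sample))  # 2
```

In the notebook, count_wikipedia(fetch_listing()) would give the live count. Matching on the host name counts mobile links such as en.m.wikipedia.org, but not posts that only mention Wikipedia in their title.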

Tips: