Due: 2025-10-14
Submission Instructions
Create one notebook for each of the following tasks
Export as MyST Markdown (by installing jupytext, which should include the frontend features)
Upload (or push) to a branch called assignment3
Open a PR
Related Notes
Our evaluation will focus on your ability to apply merges and clean the data.
Job Satisfaction Dataset
You can download the following files from this page (use the download icon at the top).
file | description |
---|---|
stack_overflow_24.csv | 2000 random rows from the 2024 stack overflow survey |
stack_overflow_25.csv | 2000 random rows from the 2025 stack overflow survey |
stack_overflow_schema_24.csv | schema for the 2024 stack overflow survey |
stack_overflow_schema_25.csv | schema for the 2025 stack overflow survey |
Your task is to create a dataset that someone can use to compare how developer job satisfaction changed from 2024 to 2025.
Note that you were given only a random sample of 2000 rows from each year, but your code should work for the full datasets (49000+ rows), so you should do everything programmatically.
Keep all of the columns that could be helpful in answering the question and combine the two CSVs into one in a way that lets someone see trends year over year.
In doing so, you should handle missing values appropriately in service of that general goal.
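One possible shape for the combining step is sketched below. It is only an illustration under stated assumptions: the column names (`ResponseId`, `JobSat`, `Country`, `OnlyIn24`, `OnlyIn25`) and the helper `combine_years` are hypothetical, the tiny frames stand in for the real CSVs, and it assumes each year's rows are independent respondents, so the frames are stacked with a `year` column rather than joined row to row. Check the schema files for the real column names before adapting it.

```python
import pandas as pd

# Tiny stand-ins for stack_overflow_24.csv and stack_overflow_25.csv.
# All column names here are hypothetical; the schema files list the real ones.
df_24 = pd.DataFrame({
    "ResponseId": [1, 2, 3],
    "JobSat": [7.0, None, 9.0],
    "Country": ["Canada", "India", "Brazil"],
    "OnlyIn24": ["a", "b", "c"],
})
df_25 = pd.DataFrame({
    "ResponseId": [1, 2],
    "JobSat": [8.0, 6.0],
    "Country": ["Canada", None],
    "OnlyIn25": ["x", "y"],
})


def combine_years(frames, target="JobSat"):
    """Stack yearly frames on their shared columns, tagging each row with its year."""
    # Keep only the columns present in every year so the rows line up.
    shared = sorted(set.intersection(*(set(df.columns) for df in frames.values())))
    tagged = [df[shared].assign(year=year) for year, df in frames.items()]
    combined = pd.concat(tagged, ignore_index=True)
    # Rows with no satisfaction score cannot inform the comparison, so drop them.
    # Other missing values are left in place for later, column-specific decisions.
    return combined.dropna(subset=[target])


combined = combine_years({2024: df_24, 2025: df_25})
print(combined.groupby("year")["JobSat"].mean())
```

Restricting to shared columns is one design choice among several; renaming columns that changed between years, using the schema files as a guide, would keep more information.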
UN Votes
Use the data from the Tidy Tuesday on UN Votes to create a dataset and then answer (using stats and/or plots):
Which issues are most divisive in the UN? (meaning they create split votes the most)
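One way to make "divisive" measurable is sketched below, as an illustration rather than the expected answer. It scores each roll call as 1 - |yes - no| / (yes + no), so an even split scores 1 and a unanimous vote scores 0, then averages the scores per issue. The toy data, the column names (`rcid`, `country`, `vote`, `issue`), the helper `split_score_by_issue`, and the choice to ignore abstentions are all assumptions to check against the actual files.

```python
import pandas as pd

# Toy stand-ins for the votes and issues tables; column names are assumptions.
votes = pd.DataFrame({
    "rcid": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "country": ["A", "B", "C", "D"] * 3,
    "vote": ["yes", "yes", "no", "no",
             "yes", "yes", "yes", "abstain",
             "yes", "no", "no", "no"],
})
issues = pd.DataFrame({
    "rcid": [1, 2, 3],
    "issue": ["Issue X", "Issue Y", "Issue X"],
})


def split_score_by_issue(votes, issues):
    """Average per-roll-call split score (1 = even split, 0 = unanimous) by issue."""
    # Abstentions are ignored here; that is a modelling choice, not a rule.
    decided = votes[votes["vote"].isin(["yes", "no"])]
    counts = (
        decided.groupby(["rcid", "vote"]).size()
        .unstack(fill_value=0)
        .reindex(columns=["yes", "no"], fill_value=0)
    )
    total = counts["yes"] + counts["no"]
    counts["split"] = 1 - (counts["yes"] - counts["no"]).abs() / total
    # Attach issue labels; a roll call tagged with several issues counts for each.
    merged = counts.reset_index().merge(issues, on="rcid", how="inner")
    return merged.groupby("issue")["split"].mean().sort_values(ascending=False)


scores = split_score_by_issue(votes, issues)
print(scores)
```

Other definitions are defensible too, such as counting the share of roll calls whose yes fraction falls inside some band, and they can rank issues differently.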
Today I learned
Write a notebook that can, on any given day, tell you how many of the 25 most recent posts to r/TodayILearned are from Wikipedia.
Tips:
'https://www.reddit.com/r/todayilearned.json' gives you 25 recent posts
there is a url column, once that is unpacked
we used JSON data from an API in class before
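The unpacking step mentioned in the tips might look roughly like the sketch below, using only the standard library. The helper names (`fetch_listing`, `is_wikipedia`, `count_wikipedia`), the sample payload, and the User-Agent string are hypothetical. The sample mimics the nested listing layout, where each post sits under `data` then `children`, so the counting logic can be checked without network access.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request, urlopen

URL = "https://www.reddit.com/r/todayilearned.json"


def fetch_listing(url=URL):
    """Download and parse the listing (needs network access; not called below)."""
    # Sending a descriptive User-Agent is an assumption about what the API expects.
    request = Request(url, headers={"User-Agent": "til-notebook-example/0.1"})
    with urlopen(request) as response:
        return json.load(response)


def is_wikipedia(url):
    """True when the link's host is wikipedia.org or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "wikipedia.org" or host.endswith(".wikipedia.org")


def count_wikipedia(listing):
    """Count posts in a parsed listing whose url points at Wikipedia."""
    posts = [child["data"] for child in listing["data"]["children"]]
    return sum(is_wikipedia(post.get("url", "")) for post in posts)


# Hand-written sample in the same nested layout as the real response.
sample = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"url": "https://en.wikipedia.org/wiki/Octopus"}},
            {"kind": "t3", "data": {"url": "https://example.com/article"}},
            {"kind": "t3", "data": {"url": "https://en.m.wikipedia.org/wiki/Moon"}},
            {"kind": "t3", "data": {"url": "https://notwikipedia.org/page"}},
        ]
    },
}

print(count_wikipedia(sample))
```

In a notebook, `count_wikipedia(fetch_listing())` would apply the same count to the live listing. Matching on the parsed hostname rather than searching the whole URL for "wikipedia" avoids counting lookalike domains.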