Data Science Achievements

In this course there are 5 learning outcomes that I expect you to achieve by the end of the semester. To get there, you’ll focus on 15 smaller achievements that will be the basis of your grade. This section describes the learning outcomes, the achievements, and when the topics behind them are covered during the semester. In the next section, you’ll see how these achievements turn into grades.

Learning Outcomes

By the end of the semester, you will be able to:

  1. (process) Describe the process of data science, define each phase, and identify standard tools

  2. (data) Access and combine data in multiple formats for analysis

  3. (exploratory) Perform exploratory data analyses including descriptive statistics and visualization

  4. (modeling) Select models for data by applying and evaluating multiple models to a single dataset

  5. (communicate) Communicate solutions to problems with data in common industry formats

We will build your skill in the process and communicate outcomes over the whole semester. The middle three outcomes (data, exploratory, and modeling) will correspond roughly to the content taught for each of the first three portfolio checks.

Schedule

The course will meet in . Every class will include participatory live coding instruction, in which the instructor types code while explaining it and students follow along, plus small exercises that let you progress toward level 1 achievements for the new skills introduced that day.

Each assignment will have a deadline posted on the assignment page, typically the same day each week. Portfolio deadlines will be announced at least 2 weeks in advance.

| week | topics | skills |
|---|---|---|
| 1 | admin, Python review | process |
| 2 | Loading data, Python review | access, prepare, summarize |
| 3 | Exploratory Data Analysis | summarize, visualize |
| 4 | Data Cleaning | prepare, summarize, visualize |
| 5 | Databases, Merging DataFrames | access, construct, summarize |
| 6 | Modeling, classification performance metrics, cross validation | evaluate |
| 7 | Naive Bayes, decision trees | classification, evaluate |
| 8 | Regression | regression, evaluate |
| 9 | Clustering | clustering, evaluate |
| 10 | SVM, parameter tuning | optimize, tools |
| 11 | KNN, Model comparison | compare, tools |
| 12 | Text Analysis | unstructured |
| 13 | Image Analysis | unstructured, tools |
| 14 | Deep Learning | tools, compare |

Achievement Definitions

The table below describes how your work will be assessed to earn each achievement. The keyword for each skill is a short name that will be used to refer to skills throughout the course materials; the full description of the skill is in this table.

| keyword | skill | Level 1 | Level 2 | Level 3 |
|---|---|---|---|---|
| python | Pythonic code writing | Python code that mostly runs, occasional PEP 8 adherence | Python code that reliably runs, frequent PEP 8 adherence | Reliable, efficient, Pythonic code that consistently adheres to PEP 8 |
| process | Describe data science as a process | Identify basic components of data science | Describe and define each stage of the data science process | Compare different ways that data science can facilitate decision making |
| access | Access data in multiple formats | Load data from at least one format; identify the most common data formats | Load data for processing from the most common formats; compare and contrast the most common formats | Access data from both common and uncommon formats and identify best practices for formats in different contexts |
| construct | Construct datasets from multiple sources | Identify what should happen to merge datasets or when they can be merged | Apply basic merges | Merge data that is not automatically aligned |
| summarize | Summarize and describe data | Describe the shape and structure of a dataset in basic terms | Compute standard summary statistics of a whole dataset and grouped data | Compute and interpret various summary statistics of subsets of data |
| visualize | Visualize data | Identify plot types, generate basic plots from pandas | Generate multiple plot types with complete labeling with pandas and seaborn | Generate complex plots with pandas and plotting libraries and customize with matplotlib or additional parameters |
| prepare | Prepare data for analysis | Identify if data is or is not ready for analysis, and potential problems with data | Apply data reshaping, cleaning, and filtering as directed | Apply data reshaping, cleaning, and filtering manipulations reliably and correctly by assessing data as received |
| evaluate | Evaluate model performance | Explain and compute basic performance metrics for different data science tasks | Apply and interpret basic model evaluation metrics to a held out test set | Evaluate a model with multiple metrics and cross validation |
| classification | Apply classification | Identify and describe what classification is, apply pre-fit classification models | Fit, apply, and interpret a preselected classification model to a dataset | Fit and apply classification models and select appropriate classification models for different contexts |
| regression | Apply regression | Identify what data that can be used for regression looks like | Fit and interpret linear regression models | Fit and explain regularized or nonlinear regression |
| clustering | Clustering | Describe what clustering is | Apply basic clustering | Apply multiple clustering techniques, and interpret results |
| optimize | Optimize model parameters | Identify when model parameters need to be optimized | Optimize basic model parameters such as model order | Select optimal parameters based on multiple quantitative criteria and automate parameter tuning |
| compare | Compare models | Qualitatively compare model classes | Compare model classes in specific terms and fit models in terms of traditional model performance metrics | Evaluate tradeoffs between different model comparison types |
| representation | Choose representations and transform data | Identify options for representing text and categorical data in many contexts | Apply at least one representation to transform unstructured or inappropriately formatted data for model fitting or summarizing | Apply transformations in different contexts OR compare and contrast multiple representations of a single type of data in terms of model performance |
| workflow | Use industry standard data science tools and workflows to solve data science problems | Solve well-structured, fully specified problems with a single tool pipeline | Solve well-structured, open-ended problems; apply common structure to learn new features of standard tools | Independently scope and solve realistic data science problems OR independently learn related tools and describe strengths and weaknesses of common tools |

Assignments and Skills

Using the keywords from the table above, this table shows which skills you will be able to demonstrate in each assignment and the total number of assignments that assess each skill. This is the number of opportunities you have to earn Level 2 and still preserve 2 chances to earn Level 3 for each skill.

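| keyword | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 | A13 | # Assignments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| python | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| process | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 7 |
| access | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| construct | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 3 |
| summarize | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| visualize | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 10 |
| prepare | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| evaluate | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 5 |
| classification | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 2 |
| regression | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 2 |
| clustering | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 2 |
| optimize | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 |
| compare | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 |
| representation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 |
| workflow | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 4 |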
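
If you want to check your own opportunities for a skill, the "# Assignments" column is simply the row sum of the 0/1 indicators in the table above. The following is a minimal sketch of that arithmetic, not part of the official course tooling: the names `SKILL_ROWS`, `assignments_for`, and `opportunity_count` are hypothetical, and only three rows are copied in for illustration.

```python
# Indicator rows copied from the table above (1 = the assignment assesses the skill).
# Only three skills are included here; the helper names are hypothetical.
ASSIGNMENTS = [f"A{i}" for i in range(1, 14)]  # A1 through A13

SKILL_ROWS = {
    "python":    [1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "process":   [1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
    "summarize": [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
}


def assignments_for(skill):
    """Return the names of the assignments that assess the given skill."""
    return [name for name, flag in zip(ASSIGNMENTS, SKILL_ROWS[skill]) if flag]


def opportunity_count(skill):
    """Return how many assignments assess the skill (the '# Assignments' column)."""
    return sum(SKILL_ROWS[skill])


for skill in SKILL_ROWS:
    print(skill, opportunity_count(skill), assignments_for(skill))
```

Reading a row this way also shows when the opportunities fall in the semester: for example, all of the python opportunities come within the first five assignments.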

Warning

Achievements for the process skill accumulate a little more slowly; details will follow.

Extensions

Warning

This rolling deadline is new for Fall 2024 and aims to let you distribute your work in a way that suits you better. After A2 feedback is posted, I will explain in concrete terms how to do this.

There are no extensions applicable to assignment 1, but once assignment 2’s feedback is posted you can start working on level 3 achievements. Once you have earned level 2 for a skill, you can add on to and extend that analysis to earn level 3. You can also add new analyses that combine different sets of skills instead.

Extensions will all be graded by Dr. Brown (most assignments will be graded by the TA, Surbhi). Submit your level 3 attempts as PRs that are separate from your level 2 work.

While assignments have fixed grades, you can submit extensions as you complete them. I recommend planning to work on them consistently throughout the semester.

Warning

Previous semesters included checklists, but they have been removed because they distracted students from learning the important things.