{ "cells": [ { "cell_type": "markdown", "id": "f7d2d806", "metadata": {}, "source": [ "## Learning Objective, Schedule, and Rubric" ] }, { "cell_type": "code", "execution_count": 1, "id": "fb42db8a", "metadata": { "tags": [ "remove-input" ] }, "outputs": [], "source": [ "\n", "import yaml as yml\n", "import pandas as pd\n", "import os\n", "from IPython.display import display, Markdown\n", "pd.set_option('display.max_colwidth', None)\n", "\n", "\n", "def yml_df(file):\n", " with open(file, 'r') as f:\n", " file_unparsed = f.read()\n", "\n", " file_dict = yml.safe_load(file_unparsed)\n", " return pd.DataFrame(file_dict)\n", "\n", "outcomes_df = yml_df('../_data/learning_outcomes.yml')\n", "# outcomes_df.set_index('keyword',inplace=True)\n", "schedule_df = yml_df('../_data/schedule.yml')\n", "schedule_df.set_index('week', inplace=True)\n", "# schedule_df = pd.merge(schedule_df,outcomes_df,right_on='keyword', left_on= 'clo')\n", "rubric_df = yml_df('../_data/rubric.yml')\n", "rubric_df.set_index('keyword', inplace=True)" ] }, { "cell_type": "markdown", "id": "12257fb3", "metadata": {}, "source": [ "### Learning Outcomes\n", "\n", "There are five learning outcomes for this course." ] }, { "cell_type": "code", "execution_count": 2, "id": "a80627be", "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/markdown": [ "1. (process) Describe the process of data science, define each phase, and identify standard tools \n", "2. (data) Access and combine data in multiple formats for analysis \n", "3. (exploratory) Perform exploratory data analyses including descriptive statistics and visualization \n", "4. (modeling) Select models for data by applying and evaluating mutiple models to a single dataset \n", "5. (communicate) Communicate solutions to problems with data in common industry formats" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "outcome_list = [ str(i+1) + '. ' + ' (' + k + ') ' + o for i,(o,k) in enumerate(zip(outcomes_df['outcome'], outcomes_df['keyword']))]\n", "\n", "display(Markdown(' \\n'.join(outcome_list)))\n", "#outcomes_df[['keyword','outcome']]" ] }, { "cell_type": "markdown", "id": "8bf2055f", "metadata": {}, "source": [ "We will build your skill in the `process` and `communicate` outcomes over the whole semester. The middle three skills will correspond roughly to the content taught for each of the first three portfolio checks. \n", "\n", "(schedule)=\n", "### Schedule\n", "\n", "````{margin}\n", "```{note}\n", "On the [BrightSpace calendar](https://brightspace.uri.edu/d2l/le/calendar/101136) page you can get a feed link to add to the calendar of your choice by clicking on the subscribe (star) button on the top right of the page. Class is for 1 hour there because of Brightspace/zoom integration limitations, but that calendar includes the zoom link.\n", "```\n", "````\n", "\n", "The course will meet MWF 1-1:50pm on Zoom. Every class will include participatory live coding (instructor types, students follow along)) instruction and small exercises for you to progress toward level 1 achievements of the new skills introduced in class that day.\n", "\n", "Programming assignments that will be due each week Tuesday by 11:59pm.\n", "_until week 5 they were due Sundays_" ] }, { "cell_type": "code", "execution_count": 3, "id": "c67db653", "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
topicsskills
week
1[admin, python review]process
2Loading data, Python review[access, prepare, summarize]
3Exploratory Data Analysis[summarize, visualize]
4Data Cleaning[prepare, summarize, visualize]
5Databases, Merging DataFrames[access, construct, summarize]
6Modeling, Naive Bayes, classification performance metrics[classification, evaluate]
7decision trees, cross validation[classification, evaluate]
8Regression[regression, evaluate]
9Clustering[clustering, evaluate]
10SVM, parameter tuning[optimize, tools]
11KNN, Model comparison[compare, tools]
12Text Analysis[unstructured]
13Topic Modeling[unstructured, tools]
14Deep Learning[tools, compare]
\n", "
" ], "text/plain": [ " topics \\\n", "week \n", "1 [admin, python review] \n", "2 Loading data, Python review \n", "3 Exploratory Data Analysis \n", "4 Data Cleaning \n", "5 Databases, Merging DataFrames \n", "6 Modeling, Naive Bayes, classification performance metrics \n", "7 decision trees, cross validation \n", "8 Regression \n", "9 Clustering \n", "10 SVM, parameter tuning \n", "11 KNN, Model comparison \n", "12 Text Analysis \n", "13 Topic Modeling \n", "14 Deep Learning \n", "\n", " skills \n", "week \n", "1 process \n", "2 [access, prepare, summarize] \n", "3 [summarize, visualize] \n", "4 [prepare, summarize, visualize] \n", "5 [access, construct, summarize] \n", "6 [classification, evaluate] \n", "7 [classification, evaluate] \n", "8 [regression, evaluate] \n", "9 [clustering, evaluate] \n", "10 [optimize, tools] \n", "11 [compare, tools] \n", "12 [unstructured] \n", "13 [unstructured, tools] \n", "14 [tools, compare] " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "schedule_df.replace({None:'TBD'})\n", "schedule_df[['topics','skills']]" ] }, { "cell_type": "markdown", "id": "c180ac7b", "metadata": {}, "source": [ "(skill-rubric)=\n", "### Skill Rubric\n", "\n", "\n", "The skill rubric describes how your participation, assignments, and portfolios will be assessed to earn each achievement. The keyword for each skill is a short name that will be used to refer to skills throughout the course materials; the full description of the skill is in this table." ] }, { "cell_type": "code", "execution_count": 4, "id": "5114bd06", "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
skillLevel 1Level 2Level 3
keyword
pythonpythonic code writingpython code that mostly runs, occasional pep8 adherencepython code that reliably runs, frequent pep8 adherencereliable, efficient, pythonic code that consistently adheres to pep8
processdescribe data science as a processIdentify basic components of data scienceDescribe and define each stage of the data science processCompare different ways that data science can facilitate decision making
accessaccess data in multiple formatsload data from at least one format; identify the most common data formatsLoad data for processing from the most common formats; Compare and contrast the most common formatsaccess data from both common and uncommon formats and identify best practices for formats in different contexts
constructconstruct datasets from multiple sourcesidentify what should happen to merge datasets or when they can be mergedapply basic mergesmerge data that is not automatically aligned
summarizeSummarize and describe dataDescribe the shape and structure of a dataset in basic termscompute standard summary statistics of a whole dataset and grouped dataCompute and interpret various summary statistics of subsets of data
visualizeVisualize dataidentify plot types, generate basic plots from pandasgenerate multiple plot types with complete labeling with pandas and seaborngenerate complex plots with pandas and plotting libraries and customize with matplotlib or additional parameters
prepareprepare data for analysisidentify if data is or is not ready for analysis and potential problems with the dataapply data reshaping, cleaning, and filtering as directedapply data reshaping, cleaning, and filtering manipulations reliably and correctly by assessing data as received
classificationApply classificationidentify and describe what classification is, apply pre-fit classification modelsfit a preselected classification model to a datasetfit and apply classification models and select appropriate classification models for different contexts
regressionApply Regressionidentify what data that can be used for regression looks likecan fit linear regression modelscan fit and explain regularized or nonlinear regression
clusteringClusteringdescribe what clustering isapply basic clusteringapply multiple clustering techniques, and interpret results
evaluateEvaluate model performanceExplain basic performance metrics for different data science tasksApply basic model evaluation metrics to a held out test setEvaluate a model with multiple metrics and cross validation
optimizeOptimize model parametersIdentify when model parameters need to be optimizedManually optimize basic model parameters such as model orderSelect optimal parameters based on multiple quantitative criteria and automate parameter tuning
comparecompare modelsQualitatively compare model classesCompare model classes in specific terms and compare fitted models in terms of traditional model performance metricsEvaluate tradeoffs between different model comparison types
unstructuredmodel unstructured dataIdentify options for representing text data and use them once data is transformedApply at least one representation to transform unstructured data for model fitting or summarizingapply multiple representations and compare and contrast them for different end results
workflowuse industry standard data science tools and workflows to solve data science problemsSolve well-structured problems with a single tool pipelineSolve semi-structured, completely specified problems, apply common structure to learn new features of standard toolsScope, choose an appropriate tool pipeline and solve data science problems, describe strengths and weaknesses of common tools
\n", "
" ], "text/plain": [ " skill \\\n", "keyword \n", "python pythonic code writing \n", "process describe data science as a process \n", "access access data in multiple formats \n", "construct construct datasets from multiple sources \n", "summarize Summarize and describe data \n", "visualize Visualize data \n", "prepare prepare data for analysis \n", "classification Apply classification \n", "regression Apply Regression \n", "clustering Clustering \n", "evaluate Evaluate model performance \n", "optimize Optimize model parameters \n", "compare compare models \n", "unstructured model unstructured data \n", "workflow use industry standard data science tools and workflows to solve data science problems \n", "\n", " Level 1 \\\n", "keyword \n", "python python code that mostly runs, occasional pep8 adherance \n", "process Identify basic components of data science \n", "access load data from at least one format; identify the most common data formats \n", "construct identify what should happen to merge datasets or when they can be merged \n", "summarize Describe the shape and structure of a dataset in basic terms \n", "visualize identify plot types, generate basic plots from pandas \n", "prepare identify if data is or is not ready for analysis, potential problems with data \n", "classification identify and describe what classification is, apply pre-fit classification models \n", "regression identify what data that can be used for regression looks like \n", "clustering describe what clustering is \n", "evaluate Explain basic performance metrics for different data science tasks \n", "optimize Identify when model parameters need to be optimized \n", "compare Qualitatively compare model classes \n", "unstructured Identify options for representing text data and use them once data is tranformed \n", "workflow Solve well strucutred problems with a single tool pipeline \n", "\n", " Level 2 \\\n", "keyword \n", "python python code that reliably runs, frequent pep8 adherance \n", "process Describe and define each stage of the data science process \n", "access Load data for processing from the most common formats; Compare and constrast most common formats \n", "construct apply basic merges \n", "summarize compute summary statndard statistics of a whole dataset and grouped data \n", "visualize generate multiple plot types with complete labeling with pandas and seaborn \n", "prepare apply data reshaping, cleaning, and filtering as directed \n", "classification fit preselected classification model to a dataset \n", "regression can fit linear regression models \n", "clustering apply basic clustering \n", "evaluate Apply basic model evaluation metrics to a held out test set \n", "optimize Manually optimize basic model parameters such as model order \n", "compare Compare model classes in specific terms and fit models in terms of traditional model performance metrics \n", "unstructured Apply at least one representation to transform unstructured data for model fitting or summarizing \n", "workflow Solve semi-strucutred, completely specified problems, apply common structure to learn new features of standard tools \n", "\n", " Level 3 \n", "keyword \n", "python reliable, efficient, pythonic code that consistently adheres to pep8 \n", "process Compare different ways that data science can facilitate decision making \n", "access access data from both common and uncommon formats and identify best practices for formats in different contexts \n", "construct merge data that is not automatically aligned \n", "summarize Compute 
and interpret various summary statistics of subsets of data \n", "visualize generate complex plots with pandas and plotting libraries and customize with matplotlib or additional parameters \n", "prepare apply data reshaping, cleaning, and filtering manipulations reliably and correctly by assessing data as received \n", "classification fit and apply classification models and select appropriate classification models for different contexts \n", "regression can fit and explain regrularized or nonlinear regression \n", "clustering apply multiple clustering techniques, and interpret results \n", "evaluate Evaluate a model with multiple metrics and cross validation \n", "optimize Select optimal parameters based of mutiple quanttiateve criteria and automate parameter tuning \n", "compare Evaluate tradeoffs between different model comparison types \n", "unstructured apply multiple representations and compare and contrast them for different end results \n", "workflow Scope, choose an appropriate tool pipeline and solve data science problems, describe strengths and weakensses of common tools " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "rubric_df.replace({None:'TBD'},inplace=True)\n", "rubric_df.rename(columns={'mastery':'Level 3',\n", " 'compentent':'Level 2',\n", " 'aware':'Level 1'}, inplace=True)\n", "\n", "rubric_df[['skill','Level 1','Level 2','Level 3']]" ] }, { "cell_type": "code", "execution_count": 5, "id": "d8b480b2", "metadata": { "tags": [ "remove-input" ] }, "outputs": [], "source": [ "\n", "assignment_dummies = pd.get_dummies(rubric_df['assignments'].apply(pd.Series).stack()).sum(level=0)\n", "assignment_dummies['# Assignments'] = assignment_dummies.sum(axis=1)\n", "col_rename = {float(i):'A' + str(i) for i in range(1,14)}\n", "assignment_dummies.rename(columns =col_rename,inplace=True)\n", "\n", "portfolio_dummies = pd.get_dummies(rubric_df['portfolios'].apply(pd.Series).stack()).sum(level=0)\n", "col_rename = {float(i):'P' + str(i) for i in range(1,5)}\n", "portfolio_dummies.rename(columns =col_rename,inplace=True)\n", "\n", "\n", "rubric_df = pd.concat([rubric_df,assignment_dummies, portfolio_dummies],axis=1)\n", "\n", "assignment_cols = ['A'+ str(i) for i in range(1,14)] + ['# Assignments']\n", "\n", "portfolio_cols = [ 'Level 3'] + ['P' + str(i) for i in range(1,5)]" ] }, { "cell_type": "markdown", "id": "435f72ae", "metadata": {}, "source": [ "(assignment-skills)=\n", "### Assignments and Skills\n", "\n", "Using the keywords from the table above, this table shows which assignments you will be able to demonstrate which skills and the total number of assignments that assess each skill. This is the number of opportunities you have to earn Level 2 and still preserve 2 chances to earn Level 3 for each skill." ] }, { "cell_type": "code", "execution_count": 6, "id": "290fe58b", "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
A1A2A3A4A5A6A7A8A9A10A11A12A13# Assignments
keyword
python11110000000004
process11000000000002
access01110000000003
construct00001100000002
summarize001111111111111
visualize001101111111110
prepare00011000000002
classification00000110010003
regression00000001001002
clustering00000000101002
evaluate00000000011002
optimize00000000011002
compare00000000001012
unstructured00000000000112
workflow00000000011114
\n", "
" ], "text/plain": [ " A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 \\\n", "keyword \n", "python 1 1 1 1 0 0 0 0 0 0 0 0 0 \n", "process 1 1 0 0 0 0 0 0 0 0 0 0 0 \n", "access 0 1 1 1 0 0 0 0 0 0 0 0 0 \n", "construct 0 0 0 0 1 1 0 0 0 0 0 0 0 \n", "summarize 0 0 1 1 1 1 1 1 1 1 1 1 1 \n", "visualize 0 0 1 1 0 1 1 1 1 1 1 1 1 \n", "prepare 0 0 0 1 1 0 0 0 0 0 0 0 0 \n", "classification 0 0 0 0 0 1 1 0 0 1 0 0 0 \n", "regression 0 0 0 0 0 0 0 1 0 0 1 0 0 \n", "clustering 0 0 0 0 0 0 0 0 1 0 1 0 0 \n", "evaluate 0 0 0 0 0 0 0 0 0 1 1 0 0 \n", "optimize 0 0 0 0 0 0 0 0 0 1 1 0 0 \n", "compare 0 0 0 0 0 0 0 0 0 0 1 0 1 \n", "unstructured 0 0 0 0 0 0 0 0 0 0 0 1 1 \n", "workflow 0 0 0 0 0 0 0 0 0 1 1 1 1 \n", "\n", " # Assignments \n", "keyword \n", "python 4 \n", "process 2 \n", "access 3 \n", "construct 2 \n", "summarize 11 \n", "visualize 10 \n", "prepare 2 \n", "classification 3 \n", "regression 2 \n", "clustering 2 \n", "evaluate 2 \n", "optimize 2 \n", "compare 2 \n", "unstructured 2 \n", "workflow 4 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rubric_df[assignment_cols]" ] }, { "cell_type": "markdown", "id": "82024c27", "metadata": {}, "source": [ "(portfolioskills)=\n", "### Portfolios and Skills\n", "\n", "The objective of your portfolio submissions is to earn Level 3 achievements. The following table shows what Level 3 looks like for each skill and identifies which portfolio submissions you can earn that Level 3 in that skill." ] }, { "cell_type": "code", "execution_count": 7, "id": "2218366e", "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Level 3P1P2P3P4
keyword
pythonreliable, efficient, pythonic code that consistently adheres to pep81100
processCompare different ways that data science can facilitate decision making0110
accessaccess data from both common and uncommon formats and identify best practices for formats in different contexts1100
constructmerge data that is not automatically aligned1100
summarizeCompute and interpret various summary statistics of subsets of data1100
visualizegenerate complex plots with pandas and plotting libraries and customize with matplotlib or additional parameters1100
prepareapply data reshaping, cleaning, and filtering manipulations reliably and correctly by assessing data as received1100
classificationfit and apply classification models and select appropriate classification models for different contexts0110
regressioncan fit and explain regularized or nonlinear regression0110
clusteringapply multiple clustering techniques, and interpret results0110
evaluateEvaluate a model with multiple metrics and cross validation0110
optimizeSelect optimal parameters based on multiple quantitative criteria and automate parameter tuning0011
compareEvaluate tradeoffs between different model comparison types0011
unstructuredapply multiple representations and compare and contrast them for different end results0011
workflowScope, choose an appropriate tool pipeline and solve data science problems, describe strengths and weaknesses of common tools0011
\n", "
" ], "text/plain": [ " Level 3 \\\n", "keyword \n", "python reliable, efficient, pythonic code that consistently adheres to pep8 \n", "process Compare different ways that data science can facilitate decision making \n", "access access data from both common and uncommon formats and identify best practices for formats in different contexts \n", "construct merge data that is not automatically aligned \n", "summarize Compute and interpret various summary statistics of subsets of data \n", "visualize generate complex plots with pandas and plotting libraries and customize with matplotlib or additional parameters \n", "prepare apply data reshaping, cleaning, and filtering manipulations reliably and correctly by assessing data as received \n", "classification fit and apply classification models and select appropriate classification models for different contexts \n", "regression can fit and explain regrularized or nonlinear regression \n", "clustering apply multiple clustering techniques, and interpret results \n", "evaluate Evaluate a model with multiple metrics and cross validation \n", "optimize Select optimal parameters based of mutiple quanttiateve criteria and automate parameter tuning \n", "compare Evaluate tradeoffs between different model comparison types \n", "unstructured apply multiple representations and compare and contrast them for different end results \n", "workflow Scope, choose an appropriate tool pipeline and solve data science problems, describe strengths and weakensses of common tools \n", "\n", " P1 P2 P3 P4 \n", "keyword \n", "python 1 1 0 0 \n", "process 0 1 1 0 \n", "access 1 1 0 0 \n", "construct 1 1 0 0 \n", "summarize 1 1 0 0 \n", "visualize 1 1 0 0 \n", "prepare 1 1 0 0 \n", "classification 0 1 1 0 \n", "regression 0 1 1 0 \n", "clustering 0 1 1 0 \n", "evaluate 0 1 1 0 \n", "optimize 0 0 1 1 \n", "compare 0 0 1 1 \n", "unstructured 0 0 1 1 \n", "workflow 0 0 1 1 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rubric_df[portfolio_cols]" ] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.12, "jupytext_version": "1.6.0" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "source_map": [ 12, 16, 41, 49, 56, 76, 82, 90, 103, 122, 129, 133, 141 ] }, "nbformat": 4, "nbformat_minor": 5 }