9. Assignment 9: Linear Regression
9.1. Quick Facts
Due: 2023-04-04
9.2. Assessment
Eligible skills: (links to checklists)
9.4. Instructions
Find a dataset suitable for regression. We recommend a dataset from the UCI repository.
9.4.1. Linear Regression Basics
TLDR: Fit a linear regression model, measure the fit with two metrics, and make a plot that helps visualize the result.
Include a basic description of the data (what the features are).
Write your own description of what the prediction task is and why regression is appropriate.
Fit a linear model on all relevant features, using 75% of the data for training.
Test it on the remaining 25% held-out test data, and measure the fit with two metrics and one plot.
Inspect the model to answer:
Does this model make sense?
What do the coefficients tell you?
What do the residuals tell you?
Repeat the split, train, and test steps 5 times.
Is the performance consistent enough that you trust it?
Interpret the model and its performance in terms of the application. Some questions you might want to answer in order to do this include:
do you think this model is good enough to use for real?
is this a model you would trust?
do you think that a more complex model should be used?
do you think that maybe this task cannot be done with machine learning?
Try fitting the model only on one feature. Justify your choice of feature based on the results above. Plot this result.
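The split, fit, and evaluate steps above can be sketched as follows, assuming scikit-learn is the library in use. This is a minimal sketch rather than a reference solution: the synthetic data from make_regression is only a stand-in for your chosen dataset, the helper name fit_and_score is hypothetical, and R2 with mean squared error is just one possible pair of metrics.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption); replace with your own features X and target y.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)


def fit_and_score(X, y, seed):
    """Split 75/25, fit a linear model, and return the model, two metrics, and residuals."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "model": model,
        "r2": r2_score(y_test, y_pred),
        "mse": mean_squared_error(y_test, y_pred),
        "residuals": y_test - y_pred,  # held-out errors, useful for a residual plot
    }


# Repeat the split, train, and test steps 5 times, changing only the random seed.
runs = [fit_and_score(X, y, seed) for seed in range(5)]
r2_scores = np.array([run["r2"] for run in runs])

print("R2 per split:", r2_scores.round(3))
print("R2 mean:", round(float(r2_scores.mean()), 3))
print("R2 std:", round(float(r2_scores.std()), 3))
print("Coefficients from the first split:", runs[0]["model"].coef_)
```

The spread of the five R2 values is one way to judge consistency. The same helper can be reused for the single-feature fit by passing one column, for example fit_and_score(X[:, [0]], y, seed=0); which column to use is yours to justify from the coefficients.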
9.4.2. Part 2: Exploring Evaluation
Do an experiment to compare test set size vs performance:
Train a regression model on 10%, 30%, …, 90% of the data. Save both the train and test performance for each training set size in a DataFrame with columns ['train_pct', 'n_train_samples', 'n_test_samples', 'train_r2', 'test_r2'].
Plot the performance vs training percentage in a line graph.
Interpret these results. How does training vs test size impact the model?
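One possible shape for this experiment is sketched below, assuming scikit-learn and pandas. The synthetic data is a stand-in for your dataset, and the list of fractions assumes that the elided steps in the instructions are 50% and 70%.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption); replace with your own features X and target y.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

rows = []
# Assumed training fractions: 10%, 30%, 50%, 70%, 90%.
for train_pct in [0.1, 0.3, 0.5, 0.7, 0.9]:
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_pct, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    rows.append(
        {
            "train_pct": train_pct,
            "n_train_samples": len(X_train),
            "n_test_samples": len(X_test),
            # LinearRegression.score returns R2 on the data it is given.
            "train_r2": model.score(X_train, y_train),
            "test_r2": model.score(X_test, y_test),
        }
    )

results = pd.DataFrame(rows)
print(results)
```

With matplotlib installed, results.plot(x="train_pct", y=["train_r2", "test_r2"]) draws both curves as a line graph. A single split per size can be noisy, so averaging over several seeds may give a clearer trend.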
Thinking Ahead
Try these experiments with a different type of regression.
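The instructions do not name an alternative model. As a hedged illustration only, the sketch below swaps in Ridge and DecisionTreeRegressor from scikit-learn next to the plain linear model; these choices, their hyperparameters, and the synthetic data are all assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption); replace with your own features X and target y.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0
)

# Example alternatives only; hyperparameters are arbitrary illustrative values.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
}

# Fit each model on the same split and record its held-out R2.
test_r2 = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, score in test_r2.items():
    print(f"{name}: test R2 = {score:.3f}")
```

Because every estimator shares the fit and score interface, the earlier experiments can be rerun by changing only the model object.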