import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
sns.set_theme(font_scale=2,palette='colorblind')
We will use the same data.
test_samples = 20
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
X_train,X_test, y_train,y_test = train_test_split(diabetes_X, diabetes_y ,
                                 test_size=test_samples,random_state=0)
And retrain a model like Tuesday:
regr_db = linear_model.LinearRegression()
regr_db.fit(X_train, y_train)
And again score it:
y_pred = regr_db.predict(X_test)
regr_db.score(X_test, y_test)
0.5195333332288746
This is better than Tuesday’s score.
This time it is better just by using more data to train.
train_samples,_ = X_train.shape
total_samples, _ = diabetes_X.shape
train_samples, total_samples
(422, 442)
Above we passed an integer to the test_size parameter, so we set the number of test samples instead of a percentage of the data. We used 20 samples for testing, which is only 4.52% of the data. That is a lot less than the 25% used before, so with more training data we can get a better model.
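To make the difference between the two kinds of test_size values concrete, here is a minimal sketch that splits the same data once with an integer and once with a fraction. The variable names are chosen for illustration only, and the 0.25 fraction is written out explicitly to mirror the 25% mentioned above.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_diabetes(return_X_y=True)
total = X.shape[0]

# Integer test_size: an absolute number of held-out samples
X_tr_n, X_te_n, y_tr_n, y_te_n = train_test_split(
    X, y, test_size=20, random_state=0)

# Float test_size: a fraction of the data held out
X_tr_f, X_te_f, y_tr_f, y_te_f = train_test_split(
    X, y, test_size=0.25, random_state=0)

for label, X_te in [("test_size=20", X_te_n), ("test_size=0.25", X_te_f)]:
    share = 100 * X_te.shape[0] / total
    print(f"{label}: {X_te.shape[0]} test samples ({share:.2f}% of {total})")
```

The integer form is handy when you want a fixed-size test set; the fractional form scales with the dataset.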
Polynomial Regression
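Polynomial regression is still a linear problem. Linear regression solves for the $\beta_i$ for a $d$ dimensional problem:
$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_d x_d = \beta_0 + \sum_{i=1}^{d} \beta_i x_i$$
Quadratic regression solves for
$$y = \beta_0 + \sum_{i=1}^{d} \beta_i x_i + \sum_{i=1}^{d} \beta_{ii} x_i^2 + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \beta_{ij} x_i x_j$$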
This is still a linear problem, because we can create a new matrix that has the polynomial values of each feature and solve for more values.
So if our original features are $x_1, \ldots, x_d$, our new matrix will have three types of features: original ($x_i$), squared ($x_i^2$), and interactions ($x_i x_j$ with $i \neq j$).
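As a rough illustration of why this is still a linear problem, the sketch below builds the expanded matrix by hand with NumPy and solves it with ordinary least squares. The helper `quadratic_features` and the target coefficients are hypothetical, made up for this example; they are not part of the class code.

```python
import itertools

import numpy as np


def quadratic_features(X):
    """Stack a constant, the original columns, and all products x_i * x_j (i <= j)."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j]
             for i, j in itertools.combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))

# Made-up noiseless target: y = 1 + 2*x1 - 3*x2 + 0.5*x1^2 + 4*x1*x2
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 0] ** 2 + 4 * X[:, 0] * X[:, 1]

# Columns are [1, x1, x2, x1^2, x1*x2, x2^2]
X2 = quadratic_features(X)

# Plain linear least squares on the expanded matrix
beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
print(X2.shape)
print(np.round(beta, 3))
```

The solver never sees anything nonlinear: the squares and products are just extra columns, and the fitted coefficients line up with the terms used to build `y`.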
We use a transformer object, which works similarly to the estimators, but does not use targets.
First, we instantiate.
poly = PolynomialFeatures()
Then we can fit and transform on the training data and transform on the test data:
X2_train = poly.fit_transform(X_train)
X2_test = poly.transform(X_test)
This changes the shape a lot; now we have many more features:
X2_train.shape
(422, 66)
Solution to Exercise 1
We can break down this total into different types: the original ones (10), those squared (10), every pair (45), and a constant (1), so we do not need the intercept separately.
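The breakdown can be checked with a little arithmetic. The sketch below counts each type for the 10 features of the diabetes data and cross-checks the total against the transformer on a dummy row; the variable names are illustrative only.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

d = 10  # number of original features in the diabetes data

n_original = d
n_squared = d
n_pairs = comb(d, 2)  # unordered pairs x_i * x_j with i != j
n_constant = 1
n_expected = n_original + n_squared + n_pairs + n_constant

# Cross-check against the transformer on a dummy row with d columns
n_out = PolynomialFeatures().fit_transform(np.zeros((1, d))).shape[1]

print(n_original, n_squared, n_pairs, n_constant, n_expected, n_out)
```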
Now we can fit a model and score it:
regr_db2 = linear_model.LinearRegression()
regr_db2.fit(X2_train, y_train)
regr_db2.score(X2_test, y_test)
0.549017964337913
And we get even better performance than adding data alone did above.
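For a regressor, the number returned by `.score` is the coefficient of determination, $R^2$, so it can be reproduced with `r2_score` on the predictions. The sketch below repeats the split used above and compares the two for both the linear and the quadratic fit; the variable names are illustrative, and the exact values printed may vary slightly across library versions.

```python
from sklearn import datasets, linear_model
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

X, y = datasets.load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=20, random_state=0)

# Linear fit on the original features
lin = linear_model.LinearRegression().fit(X_tr, y_tr)
lin_score = lin.score(X_te, y_te)
lin_r2 = r2_score(y_te, lin.predict(X_te))

# Same estimator on the degree-2 expansion
expand = PolynomialFeatures()
X2_tr = expand.fit_transform(X_tr)
X2_te = expand.transform(X_te)
quad = linear_model.LinearRegression().fit(X2_tr, y_tr)
quad_score = quad.score(X2_te, y_te)
quad_r2 = r2_score(y_te, quad.predict(X2_te))

print(f"linear:    score={lin_score:.4f}  r2_score={lin_r2:.4f}")
print(f"quadratic: score={quad_score:.4f}  r2_score={quad_r2:.4f}")
```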
regr_db2.coef_
array([ 1.98152760e-09, 2.26718871e+01, -2.84321941e+02, 4.77972133e+02,
3.55751429e+02, -1.05551594e+03, 7.64836015e+02, 1.92998441e+02,
1.29510126e+02, 9.95077095e+02, 6.97824322e+01, 1.35199820e+03,
3.45185945e+03, 1.47333609e+02, -6.34976797e+01, -3.44685551e+03,
-1.39445637e+03, 5.68431155e+03, 5.24944560e+03, 1.93320706e+03,
1.44078438e+03, -1.71687299e+00, 1.07459375e+03, 1.71895547e+03,
1.01185087e+04, -8.18260464e+03, -2.98856060e+03, -2.75793035e+03,
-2.63196845e+03, 4.87268727e+02, 3.31912403e+02, 2.57451083e+03,
-5.38594465e+03, 3.88190888e+03, 2.88432660e+03, 4.95180080e+02,
2.98317779e+03, -5.23610008e+02, -6.08039517e+01, 8.85704665e+03,
-5.89220694e+03, -3.89995076e+03, -1.62903703e+03, -2.50355665e+03,
-2.07339059e+03, 9.56513316e+04, -1.35146919e+05, -8.45241707e+04,
-4.18163194e+04, -8.84048804e+04, -6.63039722e+03, 5.00187584e+04,
5.43127248e+04, 2.05139512e+04, 6.78912952e+04, 4.41920280e+03,
2.10042065e+04, 2.61797171e+04, 3.74988567e+04, 5.16395623e+03,
1.12705277e+04, 1.13915240e+04, 4.08084006e+03, 2.05240548e+04,
2.20132713e+03, 1.91176349e+03])
Loading a Pretrained model!
In class, we went over the installation and login steps from the sharing a model tutorial in preparation for A5.
from huggingface_hub import hf_hub_download
import skops.io as sio
hf_hub_download(repo_id="CSC310-fall25/example_decision_tree", filename="model.pkl",local_dir='.')
dt_loaded = sio.load('model.pkl')
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[12], line 1
----> 1 from huggingface_hub import hf_hub_download
2 import skops.io as sio
3 hf_hub_download(repo_id="CSC310-fall25/example_decision_tree", filename="model.pkl",local_dir='.')
ModuleNotFoundError: No module named 'huggingface_hub'
dt_loaded.predict(np.asarray([[5,6], [1,3]]))