{ "cells": [ { "cell_type": "markdown", "id": "ea93e76e", "metadata": {}, "source": [ "# Interpreting Regression" ] }, { "cell_type": "code", "execution_count": 1, "id": "fa5a64e9", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import numpy as np\n", "import pandas as pd\n", "import itertools as itr\n", "from sklearn import datasets, linear_model\n", "from sklearn.metrics import mean_squared_error, r2_score\n", "from sklearn.model_selection import cross_val_score\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import PolynomialFeatures\n", "sns.set_theme(font_scale=2,palette='colorblind')" ] }, { "cell_type": "markdown", "id": "e1acdaf9", "metadata": {}, "source": [ "First, we'll return to the same data we used on Monday." ] }, { "cell_type": "code", "execution_count": 2, "id": "33d485e6", "metadata": {}, "outputs": [], "source": [ "tips = sns.load_dataset(\"tips\").dropna()" ] }, { "cell_type": "code", "execution_count": 3, "id": "6a421364", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(244, 7)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips.shape" ] }, { "cell_type": "code", "execution_count": 4, "id": "a5c2fabb", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>total_bill</th><th>tip</th><th>sex</th><th>smoker</th><th>day</th><th>time</th><th>size</th></tr>\n", " </thead>\n", " <tbody>\n", " <tr><th>0</th><td>16.99</td><td>1.01</td><td>Female</td><td>No</td><td>Sun</td><td>Dinner</td><td>2</td></tr>\n", " <tr><th>1</th><td>10.34</td><td>1.66</td><td>Male</td><td>No</td><td>Sun</td><td>Dinner</td><td>3</td></tr>\n", " <tr><th>2</th><td>21.01</td><td>3.50</td><td>Male</td><td>No</td><td>Sun</td><td>Dinner</td><td>3</td></tr>\n", " <tr><th>3</th><td>23.68</td><td>3.31</td><td>Male</td><td>No</td><td>Sun</td><td>Dinner</td><td>2</td></tr>\n", " <tr><th>4</th><td>24.59</td><td>3.61</td><td>Female</td><td>No</td><td>Sun</td><td>Dinner</td><td>4</td></tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " total_bill tip sex smoker day time size\n", "0 16.99 1.01 Female No Sun Dinner 2\n", "1 10.34 1.66 Male No Sun Dinner 3\n", "2 21.01 3.50 Male No Sun Dinner 3\n", "3 23.68 3.31 Male No Sun Dinner 2\n", "4 24.59 3.61 Female No Sun Dinner 4" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips.head()" ] }, { "cell_type": "markdown", "id": "c47792ec", "metadata": {}, "source": [ "Again, we'll prepare the data." ] }, { "cell_type": "code", "execution_count": 5, "id": "92ed182d", "metadata": {}, "outputs": [], "source": [ "# sklearn requires 2D object of features even for 1 feature\n", "tips_X = tips['total_bill'].values\n", "tips_X = tips_X[:,np.newaxis] # add an axis\n", "tips_y = tips['tip']\n", "\n", "tips_X_train,tips_X_test, tips_y_train, tips_y_test = train_test_split(\n", " tips_X,\n", " tips_y,\n", " train_size=.8,\n", " random_state=0)" ] }, { "cell_type": "markdown", "id": "8df86619", "metadata": {}, "source": [ "Next, we'll fit the model." ] }, { "cell_type": "code", "execution_count": 6, "id": "a3ace32e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5906895098589039" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regr_tips = linear_model.LinearRegression()\n", "regr_tips.fit(tips_X_train,tips_y_train)\n", "regr_tips.score(tips_X_test,tips_y_test)" ] }, { "cell_type": "markdown", "id": "8466dbb0", "metadata": {}, "source": [ "With an $R^2$ of about 0.59 on the test data, this doesn't perform all that well, but let's investigate it further.\n", "We'll start by looking at the residuals." ] }, { "cell_type": "code", "execution_count": 7, "id": "cddf9627", "metadata": {}, "outputs": [], "source": [ "tips_y_pred = regr_tips.predict(tips_X_test)" ] }, { "cell_type": "markdown", "id": "2ecbebe4", "metadata": {}, "source": [ "## Examining Residuals\n", "\n", "The error, that is, the difference between the predictions and the truth, is called the\n", "residual." 
] }, { "cell_type": "code", "execution_count": 8, "id": "02a259a9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "64 0.092195\n", "63 -0.960007\n", "55 -0.593783\n", "111 0.730731\n", "225 0.104349\n", "92 0.585451\n", "76 -0.315843\n", "181 -2.361866\n", "188 -0.713567\n", "180 0.704514\n", "73 -1.523002\n", "107 -0.819782\n", "150 -0.108729\n", "198 0.287638\n", "224 0.748317\n", "44 -1.627113\n", "145 0.337270\n", "110 -0.615508\n", "243 -0.152549\n", "189 -0.734142\n", "210 1.939957\n", "104 -1.025283\n", "138 0.578198\n", "8 0.525219\n", "199 0.337033\n", "203 0.116940\n", "220 0.006281\n", "125 -0.285225\n", "5 -1.232034\n", "22 0.325922\n", "74 0.255195\n", "124 -0.282726\n", "12 0.952023\n", "168 0.444221\n", "45 -0.200007\n", "158 -0.284589\n", "37 -0.401728\n", "136 0.029040\n", "212 -3.290531\n", "223 -0.423739\n", "222 -0.060454\n", "118 0.432432\n", "231 -0.451826\n", "155 -1.220382\n", "209 0.034393\n", "18 -0.827854\n", "108 -0.964850\n", "15 -0.801360\n", "71 -0.318168\n", "Name: tip, dtype: float64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips_y_pred - tips_y_test" ] }, { "cell_type": "markdown", "id": "9b5db8b4", "metadata": {}, "source": [ "To examine these, we can plot them on the data:" ] }, { "cell_type": "code", "execution_count": 9, "id": "333ca912", "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZQAAAEFCAYAAADE/xFGAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAj+klEQVR4nO3de5Qc5Xnn8W/PRdKgC0YykqUZQGRNXgLBIRPkSxzA1+MsK3w5Y8usFWNr7fggBwmntaCYcdaDneYatSNz8/pgJOPIOZbTB7PYGzvYJlwcn8NlDPbC6j2QA0YzEpIWCaNL68JM7x9VNVNdU9XT3VN9rd/nnDmleuvS79S0+um33vd9KlUoFBAREZmpjkZXQERE2oMCioiIxEIBRUREYqGAIiIisVBAERGRWHQ1ugIxmw2sAHYDYw2ui4hIq+gElgKPA8eqPUm7BZQVwCONroSISIu6EHi02oPbLaDsBjhw4DDj460zv2bRonm88sqhRlejaen6RNO1KU3XJ5r/2nR0pDjllLngfoZWq90CyhjA+HihpQIK0HL1rTddn2i6NqXp+kQLuTYz6ipQp7yIiMRCAUVERGKhgCIiIrFQQBERkVgooIiItLBcbjv9/eeyZMnJ9PefSy63vWF1abdRXiIiiZHLbSedXkc+nwdgZGQn6fQ6AAYGVtW9PmqhiIi0qEzmOvL5PGlgFEgD+XyeTOa6htRHAUVEpEWNjo4AMAQsc5f+8npTQBERaVG9vX0AzHfX5wfK600BRUSkRQ0Ofpmenp6isp6eHgYHv9yQ+qhTXkSkRU10vK/97ERZNntrQzrkQS0UEZGWFgwejQomoIAiIiIxUUAREZFYKKCIiEgsFFBERCQWCigiIhILBRQREYmFAoqIiMRCAUVERGKhgCIiIrFQQBERkVgooIiISCwUUEREJBYKKCIiEgsFFBERiYUCioiIxEIBRUREYqGAIiIisVBAERGRWCigiIhILBRQREQkFgooIiISCwUUERGJhQKKiIjEQgFFRERi0VXOTsaYdwEPlnnOM6y1L5Vxzq3Ap0rsYq21Z5f5miIi0mBlBRTgZeDbJba/FfgD4D+AnRXW4RfA8yHluys8j4iINFBZAcVauwP4dNR2Y8yz7j/vttYWKqzDXdbarRUeIyIiTWbGfSjGmHfgtE7GgK0zPZ+IiLSmODrl/5u7/LG1dlcM5xMRkRZUbh9KKGPMScDH3dVvVXmadxtj3gLMA/YAjwIPWGvHZ1I3ERGprxkFFOBjwHxgL/DDKs9xeUjZs8aYy6y1v6m6ZiIiUlczveXl3e66x1p7osJjnwLWA+fgtE6WASuBp92ynxpjemdYPxERqZNUoVDpoCyHMebNwHPu6jnW2v8bR4WMMbOAh4C3A7dba6+s4PDlwAtx1ENEpGWkUpP/rvIz3XUm8GK1B8/klpfXOvllXMEEwFp73BhzA3AfcEk153jllUOMj8/ootbVqafOZ9++g42uRtPS9Ymma1NaUq7Pqb5/l/v7+q9NR0eKRYvmzbgeVd3yMsZ0Mtn3UW1nfCk73KVueYmItIhq+1A+gPNhfwj4XnzVmbDIXR6qwblFRKQGqg0on3GX2621tfjQX+UuH6/BuUVEpAYq7kMxxrwRuNRdLXm7y+0L+Qhwr7X2i77y84E+4F+stWO+8i7gKpzRXwBfq7R+IiLSGNV0yn8S6AZ2WGv/fZp9lwLGXfotB+4F9htjhnHmsSwCzsMZPjwOXGOt/UkV9RMRkQaoJqCscZd3z+B1nwY242QpPge4ECgAI8AWnOHCT87g/CIiUmcVBxRr7Vsq2PfThGQptta+AHyh0tcWEZHmpSc2iohILBRQREQkFgooIiISCwUUERGJhQKKiIjEQgFFRERioYAiIiKxUEAREZFYKKCIiEgsFFBERCQWCigiIhILBRQREYmFAoqIiMRCAUVERGKhgCIiIrFQQBERkVgooIiISCwUUEREJBYKKCIiEgsFFBERiYUCioi
IxEIBRUREYqGAIiIisVBAERGRWCigiIhILBRQREQkFgooIiISCwUUERGJhQKKiIjEQgFFRERioYAiIiKxUEAREZFYKKCIiLSwXK6r5Ho9KaCIiLSoXK6LdHpOUVk6PadhQUUBRRIrl9tOf/+5LFlyMv3955LLbW90lUQqksnMJp9PFZXl8ykymdkNqY8CiiRSLreddHodIyM7KRQKjIzsJJ1e15JBRYExuUZHnWBykHlFS6+83hRQJJEymevI5/OkgVEgDeTzeTKZ6xpcs8ps27atbQKjVK63twDAEEPsYilDDBWV15sCiiTS6OgIAEPAMnfpL28Vg4ODbREYpTqDg8fo6SmQZQO97CLLBnp6CgwOHmtIfRRQJJF6e/sAmO+uzw+Ut4qXXnoJaP3AKNUZGHidbPYofX3jpFIF+vrGyWaPMjDwekPqo4AiiTQ4+GV6enqKynp6ehgc/HKDalSd008/HWj9wCjVGxh4neHhw+zZc4jh4cMNCyaggCIJNTCwimz21qKybPZWBgZWNahG1clkMm0RGKU9KKBIYgWDR6sFE4DVq1e3RWCU9qCAItLi2iEwSntQQBERkVgooIiISCzKTvhijNkKfKrELtZae3YlL26M6QDWAmuAs4Ex4NfAHdbaf6rkXCIi0ljVtFB+AXw75OfeSk5ijOl0j7kNOAv4V+BRYAXwXWPM5irqJtL2/KlWli9frlnx0jSqSUl5l7V2awyv/QXgg8CzwHustXsAjDFnAY8A640xP7fW3hfDa4m0BS8HWT6fB+C3v/0t6fQ6rmhwvUSgQX0obuvkGnd1rRdMAKy1zwEb3dXBetdNpJlF5SATaQaN6pR/B7AYGLHWPhyy/fvACWCFMaa3rjUTaWJROchEmkE1AeXdxpisMeabxpivGmM+4HauV+KP3eXjYRuttUeAZ9zV86uoo0jNNDJdfFQOMpFmUE1AuRz4a+AvgS8BPwZ+Y4w5r4JznOkuf1tin5cC+4o0XKOfoxKVg0ykGVQSUJ4C1gPnAPNwWtwrgafdsp9WcHtqnrs8XGKfQ+5SX8KkaTT6OSpROchEmkGqUJjZg1iMMbOAh4C3A7dba68s45hv4rRwMtbaL0Xssw34BHCttfaGMquzHHihzH1FIOV7sl0Z/xc6OjooFAq8hvNN5yCwAEilUoyPj9eokiGC9a7w9xCJcCbwYrUHz/hJ9tba48aYG4D7gEvKPMxrfcwtsY/XijlYaZ1eeeUQ4+Ot85/q1FPns29fxb9mYtTy+pzq+3c5r9Hb28fIyM7QdPH1/BsG613p75EU+r8VzX9tOjpSLFo0b5ojphfXKK8d7rLcW14vusszSuxzWmBfkYZrl+eoiNTCjFsorkXu8lDJvSYNu8sVYRuNMScBf+iu/moG9RKJ1UQm37WfnShTungRR1wtFO9/U+gw4BC/BPYBfcaYi0K2fwzoBh631o7GUD8REamxsgKKMeZ8Y8xKd4a7v7zLGLMBZ/QXwNcC2+8xxuwwxhR11Ftrx4Cb3dU7jTGLfcecBdzormbK/1VEas8bNuxXz2HDIs2s3Ftey3ESOe43xgwDe3Fuc52HM3x4HLjGWvuTwHGnAwZ4Y8g5vwZcBFwKPGeM+RlOq+R9wBzgVuXxkmbjDRv284YN67aXJF25t7yeBjYDFmfOyQBwMXAE2AK81Vp7SyUv7LZSPgysA54HPuCe80lgtbV2ffTRIo3hpT4pt1wkScpqoVhrX8DJDlwRa+27ptk+jpO+/rZKzy3SCN6w4bBykaTTExtFKqBhwyLR4ho2LJIIGjYsEk0tFJEKBYOHgomIQwFFRKqSy3XR3z+XJUvm0d8/l1xONzySTu8AEalYLtdFOj2HfN5JSjkykiKdngMcZWDg9cZWThpGLRSRCngP10q6TGY2+XyKNJsYZRlpNpHPp8hkZje6atJACigiZfI/XCtYnjSjo07LZIghlrGbIfdhxF65JJMCikiZ/A/XCpbXUzCANSKg9fY6j4e
Y7+aD9ZZeuSSTAopImbzZ8EMR5fUQlUus3gYHj9HTUxw8enoKDA4eq3tdpHmoU16kTMGHa/nL6yUql1i9OR3vR2HtZFk2qw75pFMLRVqO1zG+ZMnJ9PefW7dbPmGz5L3yemmmnGHB4KFgImqhSEvxbvl438pHRnZO3PKp9QTDsFny9Xhdv6hcYiLNQC0UaSn+jvFRIM1k+vhy+Vs4lWr0rPioXGIizUABRVqKv2N8GZMd5OXeCvIP/S0UWm9E0sDAKrLZW4vKLrvsR0XrmrEujaKAIi3F6wD3Osa95RvecEpZ/SrBFk4reuyxTxatb9nyrqL1dHqOgoo0hAKKtJSwWz7d3d0cPnxootXh9auEBZVgC6fVbNw4my1bugOlxZMJNWNdGkUBRVpK2C2f+fPnc/z48bL6VYItnFZzzz3dBAPIKMs4yiwADjLPKdOMdWkABRRpOcGO8QMHDgDl9atEDf1tFWNjztLr/SkAy9gNpNjF0okUKJqxLo2ggCJ1kctt5+yzl7N48QIWL16AMctjmz8S1a8SNuEwrIXTSjo7neX9rGSMjom2yhyO0csusmzQjHVpGAUUqblcbjtXXfV59u/fP1F24MB+1q9fG0tQqfSxvI0e+jsTl19+AijwIe6ni7HA1gKnnDKuGevSMAooUnOZzHUTfRyvuj9p4MSJE7EkVgxrdbTrY3lvuukYa9acoLOzwOSNL8eddx7F2sN1CybNkKRSmosCitScf2TVye7PUGDbTCXpsbw33XSM3bsPsXfvoaLyerZKopJUKqgkmwKK1FzYyKpS/RzS/KKSVNY7lb80FwUUqbnBwS8za9asKeXd3d11Tawo8fFalgfd9YOBckkmBRSpuYGBVWzefMeU8q9//c6iW1ONyiIslfNalkPALiZvYarFmWwKKDJj1QaCYDDx59gqNdu9kcJSmmzcmLxZ6d7IuizQC2QpPbJOkkEBRWaknEAQ1oHrlXviyCJca7lcF1deOWdK+ZYt3YkLKt7Iur6+00ilUvT1nda2I+ukfKlWzLhawnLghVdeOcT4eOv8XqeeOp99+w5Ov2MT6u8/1wkiwAZgE8631b6+0xgefqZon+Bf5DTfPkuWnEyhUOA1nA77g8ACIJVKMT4+PuX6nLp4wcS/9+19LbIsin/fco/p75/LyEgHhUDqkxQFOjsL7N59KOLI2qnkd06iVv6/VWv+a9PRkWLRonkAZwIvVntOtVBkRspJJx/VUesvr2S2e6N4+bG8fFkwORNkLDjHUCSBFFBkRsoJBFFBwV9e6Wz3RvDyYw0xRJ7ZjJPiflYCkylRRJJMAUVmpJxAUM6z2Os52z3qWSHTPUNkcPAYnZ0FsmzgJI7SyTgf4n6g4KZEEUk2BRSZkXICQVRCxulmt9cqmKTTUzvWYfoHUw0MvM5ttx1l7txxnJtdBTo6CqxZc4KbblIyRhF1yjeBdug4LKdzONgRHrZfaGd7yPWptlM+qmMdnM71vr5xhocPhx7bjNQpX1o7/N+qFXXKi8xQsGPd+9qhB1OJzJwCirSF4K2qqFtX/o71XSzlflbqwVQiMVFAkRkrlcbcP4u+dq8/tV8kqj9kcPAYPT1Ox3ovu/gQ9+vBVCIxKT2sRWQa3iz4K3xl/lnx6fS6KVlpZ/6aXUWvNzg4m3y++FZVPp8ik5k9JaW7s36UTGY2IyMpOjudOSR9fU4w0YOpRKqnFkrCVJp3y9t/8eIFLF16CosXLyg6LiqN+bXXXjMlnUqQMcsrzv8V1hrZv7+4X2S6/pCBgdcZHj7M3r2HJp4rMjxcvwdTibQrtVASxGtNeAHAy7sF4UN0g/uPudPB/cdFzYI/cGA/Bw44j/wdovhZKP59gue7ImQ/v0xmamsEd8TWEENsYBOb2ACoP0Sk3tRCSZBKEzAG978v5LiwWfBpd9npTh8PCybefv7zXXvtU0Xbw/pAgqO0JtOgTPaLqD9EpDEUUBKknLxbpfa/NOS4sNQo3vaxsbHQGfL
+/SbP9185cODGou1hHevBUVre6KxTTnHmkKRSzjKbPapbWCJ1poCSIJUmYAzu791o8h83MLCKhQsXFh3nbfdSmkcprsdmYG7Rdq9j3S84SstrjVx//TGGhw+zZ4/6Q0QaRQElQSpNwBiVgyt4XCZzc+T28tOnvBGYvmN9YOB1stmjao2INCF1yifIxIf72s9OlJVKwBi2f9hxAwOrpuzjbQ8O8Y1WqmN9alBRABFpPsrl1QTqnW+o0vxP5TyMKrjPN+48wrXXzubAgRSFiIbwQeYxn0McZB4LOIiTCGUyePT0FMhmj3LFFT3KxxRBubxKUy6vaLXI5VVWC8UY0w1cBFwCXAz8PjAH2Af8ErjNWvtvlbywMWYr8KkSu1hr7dmVnFMaY+PG2dwdKFu/fg4nTkTnxSowtTWycGGBk04qMDqaordXEw1FWk25t7wuBh5w//0y8DBwGDgHGAAGjDFftdb+jyrq8Avg+ZDy3VWcS+qkv38uo6MpenoKHDmSmhJQTpxIkWYTG9jEUWYzh2OcoJMuxigAP2QlWTaQdYNJT0+BTEYBRKSVlRtQxoEcsNla+4h/gzHm48A24G+NMQ9aax+ssA53WWu3VniMlCmX204mcx2joyP09vYxOPjlMvs0ShsZcW5jHTkS3QoZYoj5HOIos9jFUjb5AojDuS2ptCci7aGsUV7W2p9baz8aDCbutu8BW93Vv4ixbjJD3kz3kZGdFAqFohnp5Z8j/DtHmk2Msow0m4DJNPAwOUJrPocAmMPxiSG+3oOpwHkwldKeiLSPuIYN/8pdhk9okJqKys8VNTM+eGz4ObswZi5r14Y/3fAWrmYZuycmFt7PSsZJcZTZE2XFQ4ALwD7uvPMoe/c6+bP0lEOR9hLXsOGz3GU1/R7vNsa8BZgH7AEeBR6w1o7HVLe2Vio/l3+m+3x3mQ0cn06v47HHfo8HHri4KPtuKgWFQvjtrALQ4bZJvFaI82x1vzGG+O9s4H+yiTXAavr6HmVg4JkZ/sYi0qxm3EIxxrwJ+LS7mqviFJcDfw38JfAl4MfAb4wx5820bklQKj9X1Mx4v3z+w2zd+ja3TyTF2FgKSFEopCZuax2leLb61DAz7v44t7Jmzz5Od/dnyDJEL7vJcj09PT+InEApIu1hRvNQjDFdOAHgvcDPrLXvq+DYLwBjwE+Bl4AFQD+QAf4I2Av0W2vDMp9HWQ68UMH+La+jo4NCocBrOAHjIM6FTKVSfOc73+Fzn/sch48cmdg/xa0UWOdb/x2wYGJE1hP8CRfwJJvYUNSpPofjoa/vzCG5gVRqPePjk43Kbdu2MTg4yEsvvcTpp59OJpNh9erVNbkGiZbyhff2mlMmjTGjeSgzDSh3AZ8BdgJvtda+XPXJJs85C3gIeDtwu7X2ygoOX07CJjb295/rdLr7ylI4ebSGh59h48YnuHvLe3zbxosmGqYYB1K8xnzmc2hiaqE36TDK75gPFPgKbybLUxOvVwuanBZNExtL03snWi0mNlZ9y8sYsxknmLwMvDeOYAJgrT0O3OCuXhLHOdtZqfxcuVwX99zzrsAR4c8S8YLHZALI6GAC8AYO8gYOkeUpZs2apdtZIlJdp7wxZhOwHmem/Huttc/FWivY4S57Yz5v2wnLt5XPH+bKK53O9bAeD68VUigqqc7ChQvJZG6uIAmkiLSrigOKMeZmnL7fV4D3WWufjb1WsMhdlv6aLAA89tgnuQJ/csaUG0zC3c9K/gv/mx+5DcBUcXQpy17dXhGRgIpueRljbgSuBg4A77fW/romtQLv6+7jNTp/W9i48QkWL4YtW7qLykdZxn1cWjTx0O9D3E8XY+5Q3wIXXlj5d4JynwEvIslRdgvFGPN3wEbgVZxg8qvSR4Ax5gbgI8C91tov+srPx5kE+S/W2jFfeRdwFc7tNICvlVu/pNm48Qm2bHkrwYdSASxjN5fyQ1IwMcmwmNMc6eyEd77zWR5/fEXFr5/JXKfbXCJSpNx
swx8EBt3V54F1xpiwXXdYa/3PcV0KGHfptxy4F9hvjBnGGSK8CDgP56mw48A11tqflPdrJM8995yFP5j4+0VSlOpcd1KeeLPU+/v/85TZ82EOUjyPJeqxwSKSXOXe8vI/4/UCnLTzYT9/Xub5nsZ55qtlMmPxxcARYAvOEORbyjxXW8nluujvn8uSJfPo758bmUtrbGxZ0XrKt9w1JX5P8gcTmAwM0w2sHAqsRz02WESSSw/YagKnnjqfb3wjz+DgbPbv97cvJh8yFUyeuHTp7xgbm/xQL/iOCc418QvOVfDmsaSBDTjNwzDBfvtv3HlX3W55aS5BNM1DKU3vnWhNNQ9F4rNtG1x1VRf79zvpT/yZfPP5FJnM7CnHXH75cziPpAlze+RrBTvTvXksWUqP0V64cGGJrdIIwb+lBkpIoymgNIGrrjrE8ePdE4Ekw7VFmXxHR6fOJbnppgtYs+YxOjtHcLqcJvX0XBP5Wun0uqIPnoGBVWSzt9LXd1rJOgb7WYLnkfrykoL66W8ijaZbXjUQ9lCrUreHFi+eC3RMpD/xS1Ggr2+c4eGo1ojDf+uj1LQSf1qWYJ3T6XUcieigD54z6jy1oNsWU02Xckcceu9E0y2vFpDLbWf9+kcYGXmYQuF1RkYeZv36R6b55vhS5JaeHudphpU6GFj6hY3Q8rIWV3JOjfRqnOBgCv1NpBkooMRsw4ZOTpy4mzQ5RukjTY4TJ27n2mufijxm0aIscJghhqaM0ArrkC/HELCLqaOzIHyEVtQHVKlzaqRX43jXfgj9TaR5KKDEKJfr4siRy4EOhhjy9YPM5cCBDZHHbd78Nrq7/4osA/RS/A2znGASbP10d3dPdLIHH6jlJY4MivqA8h/nP2fUeaQ+goMp9DeRZqCAUqFS80Sc0VgdpNnE3CkjsE6PPOfq1av5+tcvpK/vIlKpytKrhXXOplIpTjll4cTSL5u9NbQ/J+wDKnhcX99ppFIp+vpOizyP1Ic3mOKMM87Q30SahjrlK5DLdZFOzyGfD58nsmTJPAqFyWeLjJPiam4hywYWLjzIjh3h5w12HFYyt6CcztlyzxccTLBzZGfZ9agldaxG07UpTdcnmjrlGyyTmU0+Hz1PpLe3+DnrHRTIsgEokMlU9aSAacXZOTswsIrh4WfYs+d3GikkIhVTQKmANx+kuH9ksnxw8Bg9PcUto1TKyZ1Vbsd6pZPV1DkrIs1CAYXy82cFWyDe0isfGHidbPZo0TF33HG0KHdW6XpUPlltus7ZamdTaxa2iFQq8QHF6xcZGemgUEgxMtJBOj0nNKiEtUCC80SCLZFKhvyGzQXJ5/NkMtdFHuOf6R7snK12NrVmYYtINRIfUKbrF/ELa4EE54nM5Jt9tf0hwb4Pb6RPNQFqJseJSLIlPqBM1y8SVKoFMtNv9nH3h1QboDQLW0SqkfiAMl2/SCVm+s0+7slq1QYodfSLSDUSH1DK6Rcp10y/2ZfqD6lGtQFKs7BFpBq1mRzRQpxbVkdh7WRZtfmzenv7GBnZyRDOw6o2+crLr8+q2GY7+/tSys18PJPjRCTZNFPeVcns9Kh9vT4U/22vnp6eaVsZms1bmq5PNF2b0nR9ommmfJOL+5aViEgrSfwtr7jFectKRKSVqIUiIiKxUEAREZFYKKCIiEgsFFBERCQWCihUln9LWXhFRMIlPqBUkn9LWXhFRKIlPqBUkn9LWXhFRKIlPqBUkn9LWXhFRKIlPqBUkllXWXhFRKIlPqBUkllXWXhFRKIlPqBUkn9LubpERKIp23ATUEbU0nR9ounalKbrE03ZhkVEpGkpoIiISCwUUEREJBYKKCIiEot2e8BWJzgdTK2mFetcT7o+0XRtStP1ieZdG9816pzJ+dptlNefAY80uhIiIi3qQuDRag9ut4AyG1gB7AbGGlwXEZFW0QksBR4HjlV7knYLKCIi0iDqlBcRkVgooIiISCwUUEREJBYKKCIiEgsFFBERiYUCioi
IxEIBRUREYtFuqVeahjHGAH+OM9HyAuD3gRTwMWvtP09z7CeAtcBbcCYc7QC2AHdaa8drWe9aM8Z0AxcBlwAX41yXOcA+4JfAbdbafytxfNteG48xZh3OjOXzgMXAAuBV4GlgK7DNWjtlApkxpgPn2qwBzsaZ3Ptr4A5r7T/Vo+6NYIy5Hviiu3q1tfbvI/ZLwntnK/CpErtYa+3ZIcfF8t5RC6V21gL/AKwGDE4wmZYx5nZgG04QegR4AOdD9zbgn90/fCu7GPgpkMZ5kvLDwL3AfmAAeNAY85WwAxNwbTwbgQ8DeeDfgRzwPPAe4DvAvcHf1RjTiXMdbwPOAv4VJ4XGCuC7xpjN9ap8PRljVgDXACVnaCfoveP5BfDtkJ97gzvG+d5RC6V2/g9wC/AE8CTwLZwP00jGmAHg88DLwEXW2ufc8iXAg8BHgHVAK384jON8QG621hblXTPGfBznP/3fGmMetNY+6NuWhGvjuQz4lbX2sL/QGHMu8DPgQzjfQrf4Nn8B+CDwLPAea+0e95izcD5A1xtjfm6tva/21a8PY8xsnA/JPcBjOEE4bL8kvXc8d1lrt5a57xeI6b3TblG5aVhr77LWXmOt3W6t/Y8yD/Oa7Ru9N717rj04LR6Av2nlb1PW2p9baz8aDCbutu/h3NIB+IvA5ra/Nh5r7aPBYOKWPwPc7q6+3yt3v2Fe466u9T4Q3GOew2nxAAzWpsYN8xXgD4ArgN+V2C8x751Kxf3eSdwFbFbGmD7gT4DjwPeD2621DwGjwJuAt9e3dnX1K3fZ5xXo2hR53V36E/i9A6evZcRa+3DIMd8HTgArjDG9Na5fXRhj3gZsAL5rrb2/xH5675QW63tHAaV5/LG7fMZam4/Y5/HAvu3oLHe521emawMYY87E+TYO8L98m7zf+XFCWGuPAM+4q+fXpHJ1ZIyZg3Oraz9w1TS7J/W9825jTNYY801jzFeNMR+IaIHF+t5RH0rzONNd/rbEPi8F9m0rxpg3AZ92V3O+TYm8NsaYNTj9bt04LbY/xfkSeL211t+5Wu71OZ/2uD4ZnIEul1lr/980+ybyvQNcHlL2rDHmMmvtb3xlsb531EJpHvPc5ZR75z6H3OX8Gtel7owxXcA/AicDPwvcxkjqtXknTuf7J3CGWgP8LfDVwH6JuT7GmD/F6UT+gdvnNp3EXBvXU8B64Byc330ZsBJnyPk5wE8Dt65ivT5qoUiz+AbwXmAnUzvkE8la+1ngs8aYHpxvh2uAIWCVMeYSa+2uRtav3tzrsBV4DWfUlgRYa/8hUHQY+JEx5gHgIZx+oi8CV9bi9dVCaR7et4C5Jfbxvk0crHFd6sod5/4ZnGGd77XWvhzYJbHXBsBam7fWPmutvRrnw+CPcOYMeJJyfa7H6WNLW2t3T7ezKynXpiRr7XHgBnf1Et+mWK+PWijN40V3eUaJfU4L7NvyjDGbcJro+3CCyXMhu73oLhN1bSJsBf4euNQY022tPUFyrs9HcOYxfcoYE5wN7s3+XmuMWQk877bwXnTL2/3alGOHu/Tf8nrRXcZyfdRCaR7ecNlz3aZ9mBWBfVuaMeZmnBnzrwDvs9Y+G7Fr4q5NCQdwhg53AQvdsmF3uSLsAGPMScAfuqutfn06cAYqBH+WuNt/z12/wF3Xe2fSInd5yFcW63tHAaVJWGt34vxxZwEfC243xlyMM9LnZZycVy3NGHMjcDXOB+T7rbW/jto3addmGhfhBJNXAW+E0y9xWnh9xpiLQo75GM5IscettaP1qGQtWGuXW2tTYT84w4jByeWVstae7x6j986kVe7SP0Q41veOAkpz8e5x3mSMebNXaIxZDNzhrt7Y6onsjDF/hzMD91WcYFLON8OkXJs/M8asdEe9Bbe9EyeFD8C3rLVjAO7yZrf8TveaeMecBdzormZqV/OmlpT3zvnue6czUN5ljNmAc2sZ4GvetrjfO6lCoWRONamSMaafyTcrOEP25gPP4Uz
IAsBa+/bAcXfgpIM4ipNE8QTO6KcFwA+Aj3ofJK3IGPNBwMsJ9ASTk6aCdlhrb/QXtPu1ATDGfBonR9erON+sX8Z53/wnnPcQwI9wslbnfcd5Cf4uxRkF9TOcb5bvw8nmfKu11vtAaTu+LLuh2YYT8t75MJOJVoeBvTi3uc7DGT48DvyNtfaWwHGxvXfUKV87C4C3hZSfFVI2wVr7eWPMo8Bf4dwL9tJs3017pNle6Pv3BUze6w56iMlvR0Airg04v/dXcdLXn4UzmTGFE1hywD9aa38QPMhaO+Z+oHweZ3jxB3BSkD+Jk4L8u/WofLNKyHvnaZwEl2/F+fJxIU4W5hGcLym3W2ufDB4U53tHLRQREYmF+lBERCQWCigiIhILBRQREYmFAoqIiMRCAUVERGKhgCIiIrFQQBERkVgooIiISCwUUEREJBYKKCIiEov/D6CAdJDTKPONAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-10-29_15_0.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(tips_X_test,tips_y_test, color='black')\n", "plt.scatter(tips_X_test,tips_y_pred, color='blue')\n", "\n", "[plt.plot([x,x],[yp,yt], color='red', linewidth=3)\n", " for x, yp, yt in zip(tips_X_test, tips_y_pred,tips_y_test)];" ] }, { "cell_type": "markdown", "id": "4e6b5750", "metadata": {}, "source": [ "We can plot them as a scatter plot as well." ] }, { "cell_type": "code", "execution_count": 10, "id": "cce6607a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-10-29_17_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "tips_residuals = tips_y_pred - tips_y_test\n", "plt.scatter(tips_X_test,tips_residuals, color='red')" ] }, { "cell_type": "markdown", "id": "7134277d", "metadata": {}, "source": [ "One thing we notice is that the residuals are smaller for some values of the\n", "`total_bill` and larger for others. This suggests that there is more\n", "information left.\n", "A good fit, would have residuals that are evenly distributed, not correlated\n", "with the feature(s) in this case, the total bill.\n", "\n", "\n", "## Polynomial regression\n", "\n", "Polynomial regression is still a linear problem. Linear regression solves for\n", "the $\\beta_i$ for a $d$ dimensional problem.\n", "\n", "$$ y = \\beta_0 + \\beta_1 x_1 + \\ldots + \\beta_d x_d = \\sum_i^d \\beta_i x_i$$\n", "\n", "Quadratic regression solves for\n", "\n", "$$ y = \\beta_0 + \\sum_i^d \\beta_i x_i$ + \\sum_j^d \\sum_i^d \\beta_{d+i} x_i x_j + \\sum_i^d x_i^2$ $$\n", "\n", "This is still a linear problem, we can create a new $X$ matrix that has the\n", "polynomial values of each feature and solve for more $\\beta$ values.\n", "\n", "We use a transformer object, which works similarly to the estimators, but does\n", "not use targets.\n", "First, we instantiate." 
] }, { "cell_type": "code", "execution_count": 11, "id": "6b36ef86", "metadata": {}, "outputs": [], "source": [ "poly = PolynomialFeatures(include_bias=False)" ] }, { "cell_type": "markdown", "id": "e9840438", "metadata": {}, "source": [ "Then we apply it." ] }, { "cell_type": "code", "execution_count": 12, "id": "f896a321", "metadata": {}, "outputs": [], "source": [ "tips_X2_train = poly.fit_transform(tips_X_train)\n", "# reuse the transformer that was fit on the training data\n", "tips_X2_test = poly.transform(tips_X_test)" ] }, { "cell_type": "markdown", "id": "26948813", "metadata": {}, "source": [ "We can see what it did by looking at the shape." ] }, { "cell_type": "code", "execution_count": 13, "id": "af4018bf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((195, 1), (195, 2))" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips_X_train.shape, tips_X2_train.shape" ] }, { "cell_type": "code", "execution_count": 14, "id": "4d9aca51", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 722.5344, 1067.9824, 320.0521, 419.8401, 2320.3489])" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips_X2_train[:5,1]" ] }, { "cell_type": "code", "execution_count": 15, "id": "ea45e023", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[26.88],\n", " [32.68],\n", " [17.89],\n", " [20.49],\n", " [48.17]])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips_X_train[:5]" ] }, { "cell_type": "code", "execution_count": 16, "id": "ab18e8e0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([26.88, 32.68, 17.89, 20.49, 48.17])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tips_X2_train[:5,0]" ] }, { "cell_type": "markdown", "id": "0db9e634", "metadata": {}, "source": [ "Now, we can fit a linear model on this data, which learns a weight for the data\n", "and its squared value." 
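, "\n", "\n", "As a sketch for our single feature, writing $x$ for `total_bill` and $\\hat{y}$ for the predicted tip (these symbols are notation assumed here, not names from the code), the model being fit is\n", "\n", "$$ \\hat{y} = \\beta_0 + \\beta_1 x + \\beta_2 x^2 $$\n", "\n", "where $\\beta_1$ and $\\beta_2$ are the weights on the bill and its square (the two entries of the fitted model's `coef_`) and $\\beta_0$ is its `intercept_`."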
] }, { "cell_type": "code", "execution_count": 17, "id": "288960ff", "metadata": {}, "outputs": [], "source": [ "regr2_tips = linear_model.LinearRegression()\n", "regr2_tips.fit(tips_X2_train,tips_y_train)\n", "tips2_y_pred = regr2_tips.predict(tips_X2_test)" ] }, { "cell_type": "markdown", "id": "311bf862", "metadata": {}, "source": [ "Then we can plot it." ] }, { "cell_type": "code", "execution_count": 18, "id": "60f30433", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-10-29_30_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(tips_X_test,tips_y_test, color='black')\n", "plt.scatter(tips_X_test,tips_y_pred, color='blue')\n", "plt.scatter(tips_X_test,tips2_y_pred, color='green')" ] }, { "cell_type": "markdown", "id": "f88f0933", "metadata": {}, "source": [ "We can see that this is somewhat better: the residuals are more uniformly\n", "distributed, but it doesn't look very nonlinear. \n", "We will examine this further in the next step, but first we will drop the linear column to see the quadratic more clearly." ] }, { "cell_type": "code", "execution_count": 19, "id": "1dc37507", "metadata": {}, "outputs": [], "source": [ "poly = PolynomialFeatures()\n", "\n", "# [:,::2] keeps columns 0 (bias) and 2 (squared term), dropping the linear column\n", "tips_Xq_train = poly.fit_transform(tips_X_train)[:,::2]\n", "tips_Xq_test = poly.transform(tips_X_test)[:,::2]\n", "\n", "# the bias column is already in the features, so no separate intercept is fit\n", "regr_qu_tips = linear_model.LinearRegression(fit_intercept=False)\n", "regr_qu_tips.fit(tips_Xq_train,tips_y_train)\n", "tips2_q_pred = regr_qu_tips.predict(tips_Xq_test)" ] }, { "cell_type": "code", "execution_count": 20, "id": "4ece6988", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-10-29_33_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(tips_X_test,tips_y_test, color='black')\n", "plt.scatter(tips_X_test,tips2_q_pred, color='green')" ] }, { "cell_type": "markdown", "id": "a36dd44e", "metadata": {}, "source": [ "```{admonition} Try it Yourself\n", "How would you make it cubic? What about 4th degree?\n", "```\n", "\n", "## Examining Coefficients\n", "\n", "\n", "Now we can compare the coefficients.\n", "We saw above that the quadratic didn't help much, so let's look at its coefficients." ] }, { "cell_type": "code", "execution_count": 21, "id": "bcaf944c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 9.70620903e-02, -4.18198822e-06])" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regr2_tips.coef_" ] }, { "cell_type": "markdown", "id": "094c9bda", "metadata": {}, "source": [ "The second parameter is very small, so that explains why it didn't change\n", "the fit much. We can use the coefficients to figure out how important each\n", "feature is to the prediction. Keep in mind that a coefficient's size also\n", "depends on the scale of its feature. Large coefficients strongly influence the prediction;\n", "smaller ones influence it less." 
] }, { "cell_type": "code", "execution_count": 22, "id": "32cf5d2b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.0968534])" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regr_tips.coef_" ] }, { "cell_type": "markdown", "id": "4c6c27e3", "metadata": {}, "source": [ "\n", "\n", "## Sparse Regression\n", "\n", "At the extreme, some coefficients are exactly zero.\n", "The LASSO model penalizes the size of the coefficients, which drives some of them to exactly 0, so it learns\n", "simultaneously how to combine the features to predict the target and which\n", "subset of the features to use.\n", "\n", "```{admonition} Further Reading\n", "For the mathematical formulation, see the sklearn [User Guide Section on LASSO](https://scikit-learn.org/stable/modules/linear_model.html#lasso)\n", "and the code in [LASSO docs](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html)\n", "```\n", "\n", "```{admonition} Thinking Ahead\n", "LASSO is not required for assignment 8, but is one way you could earn level 3.\n", "Here is a preview, but you can investigate it further on your own.\n", "```" ] }, { "cell_type": "code", "execution_count": 23, "id": "eadf1830", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'tips_all_X_train' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [23]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m tips_lasso \u001b[38;5;241m=\u001b[39m linear_model\u001b[38;5;241m.\u001b[39mLasso(alpha\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m.0025\u001b[39m)\n\u001b[0;32m----> 2\u001b[0m tips_lasso\u001b[38;5;241m.\u001b[39mfit(\u001b[43mtips_all_X_train\u001b[49m,tips_all_y_train)\n\u001b[1;32m 3\u001b[0m tips_lasso_y_pred \u001b[38;5;241m=\u001b[39m 
tips_lasso\u001b[38;5;241m.\u001b[39mpredict(tips_all_X_test,)\n\u001b[1;32m 4\u001b[0m tips_lasso\u001b[38;5;241m.\u001b[39mscore(tips_all_X_test,tips_all_y_test)\n", "\u001b[0;31mNameError\u001b[0m: name 'tips_all_X_train' is not defined" ] } ], "source": [ "tips_lasso = linear_model.Lasso(alpha=.0025)\n", "tips_lasso.fit(tips_all_X_train,tips_all_y_train)\n", "tips_lasso_y_pred = tips_lasso.predict(tips_all_X_test,)\n", "tips_lasso.score(tips_all_X_test,tips_all_y_test)" ] }, { "cell_type": "code", "execution_count": 24, "id": "1f166d20", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'tips_lasso_y_pred' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [24]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m plt\u001b[38;5;241m.\u001b[39mscatter(tips_X_test,tips_y_test, color\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mblack\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[1;32m 2\u001b[0m plt\u001b[38;5;241m.\u001b[39mscatter(tips_X_test,tips_y_pred, color\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mblue\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m----> 3\u001b[0m plt\u001b[38;5;241m.\u001b[39mscatter(tips_X_test,\u001b[43mtips_lasso_y_pred\u001b[49m, color\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mgreen\u001b[39m\u001b[38;5;124m'\u001b[39m)\n", "\u001b[0;31mNameError\u001b[0m: name 'tips_lasso_y_pred' is not defined" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-10-29_40_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(tips_X_test,tips_y_test, color='black')\n", "plt.scatter(tips_X_test,tips_y_pred, color='blue')\n", "plt.scatter(tips_X_test,tips_lasso_y_pred, color='green')" ] }, { "cell_type": "code", "execution_count": 25, "id": "7cab81e3", "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'Lasso' object has no attribute 'coef_'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [25]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28msum\u001b[39m(\u001b[43mtips_lasso\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcoef_\u001b[49m \u001b[38;5;241m==\u001b[39m\u001b[38;5;241m0\u001b[39m)\u001b[38;5;241m/\u001b[39m\u001b[38;5;28mlen\u001b[39m(tips_lasso\u001b[38;5;241m.\u001b[39mcoef_)\n", "\u001b[0;31mAttributeError\u001b[0m: 'Lasso' object has no attribute 'coef_'" ] } ], "source": [ "sum(tips_lasso.coef_ ==0)/len(tips_lasso.coef_)" ] }, { "cell_type": "code", "execution_count": 26, "id": "0ec97f0f", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'tips_onehot' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [26]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mtips_onehot\u001b[49m\u001b[38;5;241m.\u001b[39mshape, tips_interacion\u001b[38;5;241m.\u001b[39mshape\n", "\u001b[0;31mNameError\u001b[0m: name 'tips_onehot' is not defined" ] } ], 
"source": [ "tips_onehot.shape, tips_interacion.shape" ] }, { "cell_type": "markdown", "id": "c23e002e", "metadata": {}, "source": [ "The transform changed our data from 10 columns to 55." ] }, { "cell_type": "code", "execution_count": 27, "id": "05fc1ea6", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'tips_interacion' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [27]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mtips_interacion\u001b[49m\u001b[38;5;241m.\u001b[39mhead\n", "\u001b[0;31mNameError\u001b[0m: name 'tips_interacion' is not defined" ] } ], "source": [ "tips_interacion.head" ] }, { "cell_type": "markdown", "id": "0999a2f8", "metadata": {}, "source": [ "\n", "\n", "## Questions After Class\n", "\n", "\n", "### When do we do regression?\n", "```{toggle}\n", "we do regresion, when we want to predict a continuous value\n", "```\n", "\n", "\n", "### What should I look for in datasets to know whether a linear model or non-linear model is best?\n", "```{toggle}\n", "If you know a reason to choose one from domain knowledge, always use that. From\n", "data alone, a reasonable thing to do is to fit a linear model and then examine\n", "the residuals and use a more complex model if that makes sense.\n", "```\n", "\n", "### How can we tell if a dataset is going to be useful through tweaking or is just not worth it?\n", "```{toggle}\n", "This is a **very** good question, but does not have a simple answer. In some\n", "cases, a moderate fit quality is enough, because there's low risk of making\n", "errors. 
In other cases, a really high quality fit is required because of the\n", "risk.\n", "```" ] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.10.3" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "source_map": [ 12, 15, 27, 30, 34, 38, 40, 44, 56, 60, 64, 70, 72, 78, 80, 83, 89, 92, 96, 123, 125, 128, 131, 135, 139, 143, 147, 149, 154, 158, 161, 165, 171, 183, 186, 198, 200, 206, 208, 311, 318, 324, 328, 330, 334, 336 ] }, "nbformat": 4, "nbformat_minor": 5 }