32. Neural Networks

We started thinking about machine learning with the basic idea that our target variable (\(y_i\)) is related to the features \(\mathbf{x}_i\) by some function (for sample \(i\)):

\[ y_i =f(\mathbf{x}_i)\]

But we don’t know that function exactly, so we assume a type of \(f\) (a decision tree, a boundary for SVM, a probability distribution) that has some parameters \(\theta\), and then use a machine learning algorithm \(\mathcal{A}\) to estimate the parameters of \(f\). In a decision tree the parameters are the thresholds to compare to, in GaussianNB the parameters are the means and variances, and in SVM they are the support vectors that define the margin.

\[\theta = \mathcal{A}(X,y) \]

We can then use the estimated parameters to make predictions on our test data:

\[ \hat{y}_i = f(\mathbf{x}_i;\theta) \]
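To make the notation concrete, here is a minimal sketch of the same pattern with a decision tree on the digits data. The `fit` call plays the role of \(\mathcal{A}\), the learned thresholds are part of \(\theta\), and `predict` applies \(f(\cdot;\theta)\). The `max_depth=3` and `random_state=0` settings are choices made only for this illustration.

```python
from sklearn import datasets, model_selection
from sklearn.tree import DecisionTreeClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    digits.data, digits.target, random_state=0
)

# A: the learning algorithm. fit() estimates theta from the training data.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# theta: for a decision tree, the learned parameters include the threshold
# used at each internal node. Leaf nodes have a negative feature index.
is_internal = tree.tree_.feature >= 0
thresholds = tree.tree_.threshold[is_internal]
print(thresholds)

# y_hat = f(x; theta): apply the fitted function to unseen samples.
y_pred = tree.predict(X_test)
print(y_pred[:10])
```

The same three steps (choose a form for \(f\), fit, predict) apply to every estimator used in these notes; only what counts as \(\theta\) changes.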

A neural net allows us to avoid assuming a specific form for \(f\) first; it acts as a universal function approximator. For one hidden layer and a binary classification problem:

\[f(x) = W_2g(W_1^T x +b_1) + b_2 \]

where the function \(g\) is called the activation function. So we approximate some unknown, complicated function \(f\) by taking a weighted sum of all of the inputs and passing it through another, known function.
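The formula can be written out directly in NumPy. This is only a sketch of the forward computation: the weights here are random rather than learned, the function names `relu` and `one_hidden_layer` are made up for this example, and ReLU is just one possible choice for \(g\). A real classifier would also map the resulting score to a class label, which is left out here.

```python
import numpy as np

def relu(z):
    """One common activation function: keep positive values, zero out the rest."""
    return np.maximum(z, 0.0)

def one_hidden_layer(x, W1, b1, W2, b2, g=relu):
    """Compute f(x) = W2 g(W1^T x + b1) + b2 for a single sample x."""
    hidden = g(W1.T @ x + b1)   # weighted sums of the inputs, then the activation
    return W2 @ hidden + b2     # weighted sum of the hidden units

rng = np.random.default_rng(0)
n_features, n_hidden = 64, 16

# Illustrative (untrained) parameters with the shapes the formula implies.
W1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(1, n_hidden))
b2 = np.zeros(1)

x = rng.normal(size=n_features)
score = one_hidden_layer(x, W1, b1, W2, b2)
print(score.shape)
```

Training a network means choosing \(W_1, b_1, W_2, b_2\) so that these outputs match the training labels as well as possible.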

from sklearn.neural_network import MLPClassifier
from sklearn import svm
import pandas as pd
import sklearn

from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn import model_selection
import numpy as np

We’re going to use the digits dataset again.

digits = datasets.load_digits()
digits_X = digits.data
digits_y = digits.target
X_train, X_test, y_train, y_test = model_selection.train_test_split(digits_X,digits_y)
digits.images[0]
array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])
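As a quick check on how the two views of the data relate, the sketch below confirms that each row of `digits.data` is the corresponding 8 by 8 image flattened into 64 features. This is why the networks later in this section have 64 inputs.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

print(digits.images.shape)  # one 8x8 grid of pixel intensities per sample
print(digits.data.shape)    # the same pixels as one row of 64 features

# Reshaping a feature row recovers the original image.
first_image_from_row = digits.data[0].reshape(8, 8)
print(np.array_equal(first_image_from_row, digits.images[0]))
```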

Sklearn provides an estimator for the multilayer perceptron (MLP). We can start with one hidden layer.

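mlp = MLPClassifier(
  hidden_layer_sizes=(16,),  # one hidden layer with 16 units (a tuple, one entry per layer)
  max_iter=500,
  alpha=1e-4,  # strength of the L2 penalty on the weights
  solver="lbfgs",
  verbose=10,  # print the optimizer's progress
  random_state=1,
  learning_rate_init=0.1,  # only used by the "sgd" and "adam" solvers; ignored with "lbfgs"
)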
mlp.fit(X_train,y_train)
mlp.score(X_test,y_test)
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =         1210     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  9.27227D+00    |proj g|=  7.08994D+00

At iterate    1    f=  8.04002D+00    |proj g|=  7.04491D+00

At iterate    2    f=  3.12608D+00    |proj g|=  1.80942D+00

At iterate    3    f=  2.36715D+00    |proj g|=  3.52156D-01

At iterate    4    f=  2.26964D+00    |proj g|=  2.27675D-01

At iterate    5    f=  2.11984D+00    |proj g|=  2.55799D-01

At iterate    6    f=  1.99167D+00    |proj g|=  3.07291D-01

At iterate    7    f=  1.75901D+00    |proj g|=  6.61113D-01

At iterate    8    f=  1.64046D+00    |proj g|=  2.77600D-01

At iterate    9    f=  1.54232D+00    |proj g|=  5.85372D-01

At iterate   10    f=  1.46332D+00    |proj g|=  5.12800D-01

At iterate   11    f=  1.39561D+00    |proj g|=  2.99775D-01

At iterate   12    f=  1.30516D+00    |proj g|=  4.77882D-01

At iterate   13    f=  1.20060D+00    |proj g|=  9.86685D-01

At iterate   14    f=  1.10185D+00    |proj g|=  2.02083D-01

At iterate   15    f=  1.06560D+00    |proj g|=  1.90838D-01

At iterate   16    f=  1.02256D+00    |proj g|=  2.94326D-01

At iterate   17    f=  9.74097D-01    |proj g|=  3.44708D-01

At iterate   18    f=  9.31800D-01    |proj g|=  3.09162D-01

At iterate   19    f=  8.92911D-01    |proj g|=  1.98347D-01

At iterate   20    f=  8.56766D-01    |proj g|=  3.23628D-01

At iterate   21    f=  8.16915D-01    |proj g|=  3.39130D-01

At iterate   22    f=  7.74226D-01    |proj g|=  2.83939D-01

At iterate   23    f=  7.15964D-01    |proj g|=  2.69266D-01

At iterate   24    f=  6.93035D-01    |proj g|=  4.95794D-01

At iterate   25    f=  6.67817D-01    |proj g|=  6.30967D-01

At iterate   26    f=  6.47295D-01    |proj g|=  1.93662D-01

At iterate   27    f=  6.32798D-01    |proj g|=  2.14150D-01

At iterate   28    f=  6.05131D-01    |proj g|=  3.51393D-01

At iterate   29    f=  5.67418D-01    |proj g|=  1.88472D-01

At iterate   30    f=  5.48979D-01    |proj g|=  1.64330D-01

At iterate   31    f=  5.33075D-01    |proj g|=  1.75119D-01

At iterate   32    f=  5.10893D-01    |proj g|=  1.28782D-01

At iterate   33    f=  4.93290D-01    |proj g|=  1.27263D-01

At iterate   34    f=  4.71165D-01    |proj g|=  2.22851D-01

At iterate   35    f=  4.44793D-01    |proj g|=  1.44018D-01

At iterate   36    f=  4.32206D-01    |proj g|=  3.42970D-01

At iterate   37    f=  4.24355D-01    |proj g|=  2.62493D-01

At iterate   38    f=  4.16094D-01    |proj g|=  1.49284D-01

At iterate   39    f=  4.08195D-01    |proj g|=  7.78993D-02

At iterate   40    f=  3.99109D-01    |proj g|=  1.68352D-01

At iterate   41    f=  3.81397D-01    |proj g|=  3.01805D-01

At iterate   42    f=  3.72741D-01    |proj g|=  3.05747D-01

At iterate   43    f=  3.59836D-01    |proj g|=  1.94766D-01

At iterate   44    f=  3.49103D-01    |proj g|=  1.06654D-01

At iterate   45    f=  3.42337D-01    |proj g|=  1.63839D-01

At iterate   46    f=  3.35054D-01    |proj g|=  3.14030D-01

At iterate   47    f=  3.17285D-01    |proj g|=  2.06630D-01

At iterate   48    f=  3.03850D-01    |proj g|=  3.52371D-01

At iterate   49    f=  2.92263D-01    |proj g|=  2.73907D-01

At iterate   50    f=  2.86569D-01    |proj g|=  1.01353D-01

At iterate   51    f=  2.79707D-01    |proj g|=  6.83841D-02

At iterate   52    f=  2.71333D-01    |proj g|=  1.49279D-01

At iterate   53    f=  2.59688D-01    |proj g|=  1.46332D-01

At iterate   54    f=  2.50517D-01    |proj g|=  1.79559D-01

At iterate   55    f=  2.42272D-01    |proj g|=  1.48664D-01

At iterate   56    f=  2.34697D-01    |proj g|=  1.11852D-01

At iterate   57    f=  2.29063D-01    |proj g|=  1.11031D-01

At iterate   58    f=  2.21501D-01    |proj g|=  1.53909D-01

At iterate   59    f=  2.18002D-01    |proj g|=  2.33666D-01

At iterate   60    f=  2.08325D-01    |proj g|=  1.10925D-01

At iterate   61    f=  2.00121D-01    |proj g|=  1.26804D-01

At iterate   62    f=  1.96985D-01    |proj g|=  1.29513D-01

At iterate   63    f=  1.90568D-01    |proj g|=  1.16511D-01

At iterate   64    f=  1.86315D-01    |proj g|=  5.74096D-02

At iterate   65    f=  1.81652D-01    |proj g|=  8.12195D-02

At iterate   66    f=  1.74063D-01    |proj g|=  9.25730D-02

At iterate   67    f=  1.71796D-01    |proj g|=  2.72776D-01

At iterate   68    f=  1.63014D-01    |proj g|=  8.02688D-02

At iterate   69    f=  1.59663D-01    |proj g|=  4.76894D-02

At iterate   70    f=  1.55768D-01    |proj g|=  9.76366D-02

At iterate   71    f=  1.49661D-01    |proj g|=  1.13090D-01

At iterate   72    f=  1.42819D-01    |proj g|=  8.38093D-02

At iterate   73    f=  1.35735D-01    |proj g|=  7.86675D-02

At iterate   74    f=  1.32996D-01    |proj g|=  4.45142D-02

At iterate   75    f=  1.28996D-01    |proj g|=  6.60008D-02

At iterate   76    f=  1.26060D-01    |proj g|=  1.47951D-01

At iterate   77    f=  1.23559D-01    |proj g|=  5.20657D-02

At iterate   78    f=  1.21496D-01    |proj g|=  5.51397D-02

At iterate   79    f=  1.19065D-01    |proj g|=  6.31317D-02

At iterate   80    f=  1.17018D-01    |proj g|=  1.29986D-01

At iterate   81    f=  1.14221D-01    |proj g|=  3.39163D-02

At iterate   82    f=  1.13193D-01    |proj g|=  2.04129D-02

At iterate   83    f=  1.11959D-01    |proj g|=  3.56325D-02

At iterate   84    f=  1.09345D-01    |proj g|=  5.76385D-02

At iterate   85    f=  1.08258D-01    |proj g|=  5.67550D-02

At iterate   86    f=  1.06611D-01    |proj g|=  2.72224D-02

At iterate   87    f=  1.04447D-01    |proj g|=  2.34475D-02

At iterate   88    f=  1.02991D-01    |proj g|=  5.61546D-02

At iterate   89    f=  1.01380D-01    |proj g|=  3.38740D-02

At iterate   90    f=  9.98791D-02    |proj g|=  2.72291D-02

At iterate   91    f=  9.83953D-02    |proj g|=  2.70486D-02

At iterate   92    f=  9.60107D-02    |proj g|=  3.12134D-02

At iterate   93    f=  9.54945D-02    |proj g|=  1.62172D-01

At iterate   94    f=  9.23505D-02    |proj g|=  5.22738D-02

At iterate   95    f=  9.10973D-02    |proj g|=  1.34185D-02

At iterate   96    f=  9.01023D-02    |proj g|=  2.44081D-02

At iterate   97    f=  8.94528D-02    |proj g|=  4.28894D-02

At iterate   98    f=  8.91429D-02    |proj g|=  1.32633D-01

At iterate   99    f=  8.71876D-02    |proj g|=  4.28060D-02

At iterate  100    f=  8.62915D-02    |proj g|=  2.68617D-02

At iterate  101    f=  8.54859D-02    |proj g|=  7.98872D-02

At iterate  102    f=  8.47415D-02    |proj g|=  4.80975D-02

At iterate  103    f=  8.36305D-02    |proj g|=  3.64790D-02

At iterate  104    f=  8.19889D-02    |proj g|=  4.83787D-02

At iterate  105    f=  8.07567D-02    |proj g|=  5.13474D-02

At iterate  106    f=  7.96292D-02    |proj g|=  2.83173D-02

At iterate  107    f=  7.84190D-02    |proj g|=  1.91186D-02

At iterate  108    f=  7.76413D-02    |proj g|=  4.47166D-02

At iterate  109    f=  7.67706D-02    |proj g|=  2.18632D-02

At iterate  110    f=  7.58616D-02    |proj g|=  2.18128D-02

At iterate  111    f=  7.46519D-02    |proj g|=  4.85600D-02

At iterate  112    f=  7.36771D-02    |proj g|=  3.04623D-02

At iterate  113    f=  7.30455D-02    |proj g|=  2.24475D-02

At iterate  114    f=  7.17813D-02    |proj g|=  2.91206D-02

At iterate  115    f=  7.04006D-02    |proj g|=  5.84755D-02

At iterate  116    f=  6.93231D-02    |proj g|=  7.03734D-02

At iterate  117    f=  6.84442D-02    |proj g|=  2.08775D-02

At iterate  118    f=  6.76496D-02    |proj g|=  1.85692D-02

At iterate  119    f=  6.69563D-02    |proj g|=  4.69983D-02

At iterate  120    f=  6.62496D-02    |proj g|=  5.53999D-02

At iterate  121    f=  6.47823D-02    |proj g|=  4.18537D-02

At iterate  122    f=  6.37685D-02    |proj g|=  2.60466D-02

At iterate  123    f=  6.28934D-02    |proj g|=  2.52331D-02

At iterate  124    f=  6.13211D-02    |proj g|=  1.95883D-02

At iterate  125    f=  6.02689D-02    |proj g|=  4.06410D-02

At iterate  126    f=  5.91601D-02    |proj g|=  6.48052D-02
 This problem is unconstrained.
At iterate  127    f=  5.76191D-02    |proj g|=  1.98001D-02

At iterate  128    f=  5.65497D-02    |proj g|=  5.81905D-02

At iterate  129    f=  5.57589D-02    |proj g|=  5.80664D-02

At iterate  130    f=  5.47377D-02    |proj g|=  3.60690D-02

At iterate  131    f=  5.38437D-02    |proj g|=  2.23770D-02

At iterate  132    f=  5.33311D-02    |proj g|=  2.95811D-02

At iterate  133    f=  5.25948D-02    |proj g|=  2.56814D-02

At iterate  134    f=  5.14684D-02    |proj g|=  4.74849D-02

At iterate  135    f=  5.08080D-02    |proj g|=  6.91872D-02

At iterate  136    f=  4.91555D-02    |proj g|=  2.71101D-02

At iterate  137    f=  4.82501D-02    |proj g|=  1.53895D-02

At iterate  138    f=  4.76426D-02    |proj g|=  1.19551D-02

At iterate  139    f=  4.68955D-02    |proj g|=  5.69765D-02

At iterate  140    f=  4.61671D-02    |proj g|=  2.33999D-02

At iterate  141    f=  4.57522D-02    |proj g|=  2.28369D-02

At iterate  142    f=  4.54256D-02    |proj g|=  1.47352D-02

At iterate  143    f=  4.50623D-02    |proj g|=  1.95392D-02

At iterate  144    f=  4.45685D-02    |proj g|=  2.55158D-02

At iterate  145    f=  4.42464D-02    |proj g|=  1.39652D-02

At iterate  146    f=  4.36814D-02    |proj g|=  1.65401D-02

At iterate  147    f=  4.33323D-02    |proj g|=  2.89405D-02

At iterate  148    f=  4.27415D-02    |proj g|=  2.32926D-02

At iterate  149    f=  4.22661D-02    |proj g|=  1.40558D-02

At iterate  150    f=  4.15041D-02    |proj g|=  2.78223D-02

At iterate  151    f=  4.10327D-02    |proj g|=  3.42134D-02

At iterate  152    f=  4.04825D-02    |proj g|=  1.46247D-02

At iterate  153    f=  3.99688D-02    |proj g|=  1.50799D-02

At iterate  154    f=  3.92599D-02    |proj g|=  3.03954D-02

At iterate  155    f=  3.87325D-02    |proj g|=  3.94113D-02

At iterate  156    f=  3.82362D-02    |proj g|=  1.92568D-02

At iterate  157    f=  3.78261D-02    |proj g|=  2.52130D-02

At iterate  158    f=  3.74526D-02    |proj g|=  1.04380D-02

At iterate  159    f=  3.71541D-02    |proj g|=  7.46592D-03

At iterate  160    f=  3.66250D-02    |proj g|=  3.09476D-02

At iterate  161    f=  3.63506D-02    |proj g|=  1.77655D-02

At iterate  162    f=  3.61100D-02    |proj g|=  9.33842D-03

At iterate  163    f=  3.58422D-02    |proj g|=  7.39890D-03

At iterate  164    f=  3.55588D-02    |proj g|=  2.92890D-02

At iterate  165    f=  3.52234D-02    |proj g|=  2.46712D-02

At iterate  166    f=  3.49777D-02    |proj g|=  1.39711D-02

At iterate  167    f=  3.45592D-02    |proj g|=  2.01480D-02

At iterate  168    f=  3.43682D-02    |proj g|=  7.03501D-02

At iterate  169    f=  3.39705D-02    |proj g|=  1.45501D-02

At iterate  170    f=  3.37866D-02    |proj g|=  1.00890D-02

At iterate  171    f=  3.35529D-02    |proj g|=  1.24861D-02

At iterate  172    f=  3.28207D-02    |proj g|=  3.24613D-02

At iterate  173    f=  3.25657D-02    |proj g|=  4.10267D-02

At iterate  174    f=  3.21485D-02    |proj g|=  1.85525D-02

At iterate  175    f=  3.15765D-02    |proj g|=  2.66676D-02

At iterate  176    f=  3.12651D-02    |proj g|=  5.69036D-02

At iterate  177    f=  3.07117D-02    |proj g|=  1.81783D-02

At iterate  178    f=  3.04437D-02    |proj g|=  1.15245D-02

At iterate  179    f=  2.98556D-02    |proj g|=  1.64261D-02

At iterate  180    f=  2.90769D-02    |proj g|=  2.22674D-02

At iterate  181    f=  2.89402D-02    |proj g|=  1.80972D-02

At iterate  182    f=  2.85405D-02    |proj g|=  1.21924D-02

At iterate  183    f=  2.81243D-02    |proj g|=  9.04513D-03

At iterate  184    f=  2.79002D-02    |proj g|=  4.38765D-02

At iterate  185    f=  2.75576D-02    |proj g|=  1.87747D-02

At iterate  186    f=  2.73610D-02    |proj g|=  1.85221D-02

At iterate  187    f=  2.67180D-02    |proj g|=  2.73868D-02

At iterate  188    f=  2.63216D-02    |proj g|=  2.76574D-02

At iterate  189    f=  2.61515D-02    |proj g|=  3.24705D-02

At iterate  190    f=  2.59423D-02    |proj g|=  1.59497D-02

At iterate  191    f=  2.54613D-02    |proj g|=  5.09686D-02

At iterate  192    f=  2.53987D-02    |proj g|=  6.70033D-02

At iterate  193    f=  2.50044D-02    |proj g|=  3.72615D-02

At iterate  194    f=  2.44712D-02    |proj g|=  1.94114D-02

At iterate  195    f=  2.37981D-02    |proj g|=  1.09713D-02

At iterate  196    f=  2.34430D-02    |proj g|=  7.57128D-03

At iterate  197    f=  2.31325D-02    |proj g|=  8.49478D-03

At iterate  198    f=  2.29015D-02    |proj g|=  1.37916D-02

At iterate  199    f=  2.26737D-02    |proj g|=  7.24115D-03

At iterate  200    f=  2.24585D-02    |proj g|=  6.13742D-03

At iterate  201    f=  2.21852D-02    |proj g|=  5.87103D-03

At iterate  202    f=  2.18573D-02    |proj g|=  9.87964D-03

At iterate  203    f=  2.15983D-02    |proj g|=  2.03374D-02

At iterate  204    f=  2.14041D-02    |proj g|=  5.84013D-03

At iterate  205    f=  2.12036D-02    |proj g|=  3.22615D-03

At iterate  206    f=  2.09255D-02    |proj g|=  4.57507D-03

At iterate  207    f=  2.08219D-02    |proj g|=  1.18319D-02

At iterate  208    f=  2.06510D-02    |proj g|=  5.55209D-03

At iterate  209    f=  2.05304D-02    |proj g|=  4.83881D-03

At iterate  210    f=  2.04403D-02    |proj g|=  5.98367D-03

At iterate  211    f=  2.03287D-02    |proj g|=  7.32337D-03

At iterate  212    f=  2.02253D-02    |proj g|=  5.00339D-03

At iterate  213    f=  2.00541D-02    |proj g|=  3.74653D-03

At iterate  214    f=  1.98635D-02    |proj g|=  2.71538D-03

At iterate  215    f=  1.97466D-02    |proj g|=  1.10610D-02

At iterate  216    f=  1.95787D-02    |proj g|=  5.29639D-03

At iterate  217    f=  1.94349D-02    |proj g|=  6.25969D-03

At iterate  218    f=  1.93098D-02    |proj g|=  4.20529D-03

At iterate  219    f=  1.92030D-02    |proj g|=  6.48101D-03

At iterate  220    f=  1.90631D-02    |proj g|=  6.11516D-03

At iterate  221    f=  1.88764D-02    |proj g|=  3.63345D-03

At iterate  222    f=  1.87004D-02    |proj g|=  2.82303D-03

At iterate  223    f=  1.85938D-02    |proj g|=  2.32498D-03

At iterate  224    f=  1.84318D-02    |proj g|=  1.00395D-02

At iterate  225    f=  1.83034D-02    |proj g|=  3.02143D-03

At iterate  226    f=  1.82256D-02    |proj g|=  2.83995D-03

At iterate  227    f=  1.80693D-02    |proj g|=  8.60365D-03

At iterate  228    f=  1.79449D-02    |proj g|=  9.86598D-03

At iterate  229    f=  1.78344D-02    |proj g|=  4.86177D-03

At iterate  230    f=  1.77450D-02    |proj g|=  2.64182D-03

At iterate  231    f=  1.75892D-02    |proj g|=  7.08056D-03

At iterate  232    f=  1.75129D-02    |proj g|=  6.44009D-03

At iterate  233    f=  1.74199D-02    |proj g|=  2.57001D-03

At iterate  234    f=  1.73812D-02    |proj g|=  1.66485D-03

At iterate  235    f=  1.73200D-02    |proj g|=  8.24379D-03

At iterate  236    f=  1.72320D-02    |proj g|=  2.72338D-03

At iterate  237    f=  1.71507D-02    |proj g|=  2.68199D-03

At iterate  238    f=  1.71202D-02    |proj g|=  3.38923D-03

At iterate  239    f=  1.70320D-02    |proj g|=  4.43150D-03

At iterate  240    f=  1.69531D-02    |proj g|=  5.73043D-03

At iterate  241    f=  1.68900D-02    |proj g|=  1.96023D-03

At iterate  242    f=  1.68107D-02    |proj g|=  3.89646D-03

At iterate  243    f=  1.67566D-02    |proj g|=  1.59855D-03
At iterate  244    f=  1.67101D-02    |proj g|=  2.44812D-03

At iterate  245    f=  1.66495D-02    |proj g|=  1.83545D-03

At iterate  246    f=  1.65450D-02    |proj g|=  1.02406D-03

At iterate  247    f=  1.64670D-02    |proj g|=  5.53368D-03

At iterate  248    f=  1.64051D-02    |proj g|=  4.97272D-03

At iterate  249    f=  1.63492D-02    |proj g|=  2.36781D-03

At iterate  250    f=  1.62687D-02    |proj g|=  1.35497D-03

At iterate  251    f=  1.62200D-02    |proj g|=  1.55589D-03

At iterate  252    f=  1.60781D-02    |proj g|=  4.10270D-03

At iterate  253    f=  1.60245D-02    |proj g|=  4.33171D-03

At iterate  254    f=  1.58950D-02    |proj g|=  4.80164D-03

At iterate  255    f=  1.57810D-02    |proj g|=  2.19162D-03

At iterate  256    f=  1.57246D-02    |proj g|=  2.04805D-03

At iterate  257    f=  1.57104D-02    |proj g|=  2.05273D-03

At iterate  258    f=  1.55950D-02    |proj g|=  1.43809D-02

At iterate  259    f=  1.55541D-02    |proj g|=  3.19328D-03

At iterate  260    f=  1.55157D-02    |proj g|=  6.22496D-03

At iterate  261    f=  1.54310D-02    |proj g|=  3.64952D-03

At iterate  262    f=  1.53471D-02    |proj g|=  3.61684D-03

At iterate  263    f=  1.52756D-02    |proj g|=  9.34025D-03

At iterate  264    f=  1.51270D-02    |proj g|=  2.87295D-03

At iterate  265    f=  1.49316D-02    |proj g|=  4.78373D-03

At iterate  266    f=  1.48503D-02    |proj g|=  9.52904D-03

At iterate  267    f=  1.47532D-02    |proj g|=  8.06706D-03

At iterate  268    f=  1.46582D-02    |proj g|=  6.11370D-03

At iterate  269    f=  1.44567D-02    |proj g|=  3.88189D-03

At iterate  270    f=  1.39819D-02    |proj g|=  6.25064D-03

At iterate  271    f=  1.39233D-02    |proj g|=  1.15165D-02

At iterate  272    f=  1.35994D-02    |proj g|=  6.00452D-03

At iterate  273    f=  1.34387D-02    |proj g|=  3.54468D-03

At iterate  274    f=  1.34139D-02    |proj g|=  3.17803D-03

At iterate  275    f=  1.33967D-02    |proj g|=  2.97319D-03

At iterate  276    f=  1.33960D-02    |proj g|=  2.96746D-03

At iterate  277    f=  1.33959D-02    |proj g|=  2.96701D-03

At iterate  278    f=  1.33959D-02    |proj g|=  3.07989D-02

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
 1210    278    350      1     0     0   3.080D-02   1.340D-02
  F =   1.3395897539264089E-002

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH             
 Warning:  more than 10 function and gradient
   evaluations in the last line search.  Termination
   may possibly be caused by a bad search direction.
0.9088888888888889

We can also see what happens if we increase the size of the hidden layer.

mlp = MLPClassifier(
  hidden_layer_sizes=(64,),  # one hidden layer with 64 units
  max_iter=500,
  alpha=1e-4,
  solver="lbfgs",
  verbose=10,
  random_state=1,
  learning_rate_init=0.1,
)
mlp.fit(X_train,y_train)
mlp.score(X_test,y_test)
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =         4810     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.04698D+01    |proj g|=  8.19142D+00

At iterate    1    f=  9.40960D+00    |proj g|=  4.38798D+00

At iterate    2    f=  8.49749D+00    |proj g|=  4.75479D+00

At iterate    3    f=  6.75773D+00    |proj g|=  3.06551D+00

At iterate    4    f=  5.05641D+00    |proj g|=  2.28021D+00

At iterate    5    f=  3.56259D+00    |proj g|=  2.84974D+00

At iterate    6    f=  2.18216D+00    |proj g|=  1.12799D+00

At iterate    7    f=  1.60033D+00    |proj g|=  8.24687D-01

At iterate    8    f=  1.15630D+00    |proj g|=  4.70730D-01

At iterate    9    f=  8.63591D-01    |proj g|=  3.43946D-01

At iterate   10    f=  5.82189D-01    |proj g|=  2.28631D-01

At iterate   11    f=  4.26479D-01    |proj g|=  4.49107D-01

At iterate   12    f=  2.89106D-01    |proj g|=  1.28311D-01

At iterate   13    f=  2.43976D-01    |proj g|=  9.22916D-02

At iterate   14    f=  2.01287D-01    |proj g|=  9.67775D-02

At iterate   15    f=  1.65463D-01    |proj g|=  1.51950D-01

At iterate   16    f=  1.35398D-01    |proj g|=  6.21650D-02

At iterate   17    f=  1.20337D-01    |proj g|=  5.09786D-02

At iterate   18    f=  9.60194D-02    |proj g|=  3.68845D-02
At iterate   19    f=  7.27684D-02    |proj g|=  6.54578D-02

At iterate   20    f=  5.59220D-02    |proj g|=  2.69822D-02

At iterate   21    f=  4.69470D-02    |proj g|=  3.54462D-02

At iterate   22    f=  3.75954D-02    |proj g|=  3.11227D-02

At iterate   23    f=  3.15628D-02    |proj g|=  8.07837D-02

At iterate   24    f=  2.29887D-02    |proj g|=  2.13118D-02

At iterate   25    f=  2.06107D-02    |proj g|=  1.45824D-02

At iterate   26    f=  1.56146D-02    |proj g|=  1.42335D-02

At iterate   27    f=  1.31944D-02    |proj g|=  6.38546D-02

At iterate   28    f=  1.00681D-02    |proj g|=  8.65419D-03

At iterate   29    f=  9.24231D-03    |proj g|=  8.10791D-03

At iterate   30    f=  7.53521D-03    |proj g|=  1.18460D-02

At iterate   31    f=  5.09595D-03    |proj g|=  8.65832D-03

At iterate   32    f=  3.43894D-03    |proj g|=  2.43464D-02

At iterate   33    f=  2.22405D-03    |proj g|=  7.13082D-03

At iterate   34    f=  1.92190D-03    |proj g|=  4.08154D-03

At iterate   35    f=  1.59503D-03    |proj g|=  1.84045D-03

At iterate   36    f=  1.21083D-03    |proj g|=  2.31800D-03

At iterate   37    f=  9.24447D-04    |proj g|=  8.08666D-03

At iterate   38    f=  5.97799D-04    |proj g|=  1.36987D-03

At iterate   39    f=  4.88723D-04    |proj g|=  8.23700D-04

At iterate   40    f=  3.42157D-04    |proj g|=  1.05708D-03

At iterate   41    f=  2.46765D-04    |proj g|=  1.98554D-03

At iterate   42    f=  1.75509D-04    |proj g|=  5.32612D-04

At iterate   43    f=  1.51778D-04    |proj g|=  5.12587D-04

At iterate   44    f=  1.19209D-04    |proj g|=  6.00315D-04

At iterate   45    f=  8.46890D-05    |proj g|=  4.46872D-04

At iterate   46    f=  5.41279D-05    |proj g|=  1.41700D-04

At iterate   47    f=  4.24791D-05    |proj g|=  8.82478D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
 4810     47     49      1     0     0   8.825D-05   4.248D-05
  F =   4.2479108727199123E-005

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            
 This problem is unconstrained.
0.9666666666666667

We can compare it to SVM:

svm_clf = svm.SVC(gamma=0.001)
svm_clf.fit(X_train, y_train)
svm_clf.score(X_test,y_test)
0.9977777777777778
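A single train/test split can favor one model by chance, so one way to make this comparison more robust is cross-validation. The sketch below is an assumption about how one might do that, not part of the original lesson; `cv=5` and the dictionary names are choices made for this example, and the MLP is run without `verbose` to keep the output short.

```python
from sklearn import datasets, model_selection, svm
from sklearn.neural_network import MLPClassifier

digits = datasets.load_digits()

# The same two kinds of models as above, compared on identical folds.
models = {
    "mlp": MLPClassifier(
        hidden_layer_sizes=(64,), solver="lbfgs", max_iter=500, random_state=1
    ),
    "svm": svm.SVC(gamma=0.001),
}

cv_scores = {
    name: model_selection.cross_val_score(model, digits.data, digits.target, cv=5)
    for name, model in models.items()
}

for name, scores in cv_scores.items():
    print(name, scores.mean(), scores.std())
```

Looking at the mean and spread across folds gives a better sense of whether the gap between the two models is consistent.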

We can also have multiple hidden layers:

mlp = MLPClassifier(
  hidden_layer_sizes=(64,64),
  max_iter=500,
  alpha=1e-4,
  solver="lbfgs",
  verbose=10,
  random_state=1,
  learning_rate_init=0.1,
)
mlp.fit(X_train,y_train)
mlp.score(X_test,y_test)
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =         8970     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  7.71478D+00    |proj g|=  5.25609D+00

At iterate    1    f=  6.48500D+00    |proj g|=  5.26103D+00

At iterate    2    f=  4.42779D+00    |proj g|=  2.12085D+00

At iterate    3    f=  3.25797D+00    |proj g|=  1.33043D+00

At iterate    4    f=  2.52204D+00    |proj g|=  1.13814D+00

At iterate    5    f=  1.87228D+00    |proj g|=  7.27518D-01

At iterate    6    f=  1.43680D+00    |proj g|=  5.89927D-01

At iterate    7    f=  1.06527D+00    |proj g|=  3.23321D-01

At iterate    8    f=  7.68364D-01    |proj g|=  2.96892D-01

At iterate    9    f=  5.56608D-01    |proj g|=  3.29116D-01

At iterate   10    f=  4.20392D-01    |proj g|=  1.63444D-01

At iterate   11    f=  3.51388D-01    |proj g|=  2.15978D-01

At iterate   12    f=  2.86287D-01    |proj g|=  2.34987D-01

At iterate   13    f=  2.44876D-01    |proj g|=  1.21928D-01

At iterate   14    f=  2.11853D-01    |proj g|=  1.46749D-01

At iterate   15    f=  1.84060D-01    |proj g|=  1.11867D-01

At iterate   16    f=  1.57790D-01    |proj g|=  3.51106D-01

At iterate   17    f=  1.26887D-01    |proj g|=  6.86937D-02

At iterate   18    f=  1.12359D-01    |proj g|=  5.41308D-02

At iterate   19    f=  9.21238D-02    |proj g|=  6.71390D-02

At iterate   20    f=  7.64971D-02    |proj g|=  1.31449D-01

At iterate   21    f=  6.40522D-02    |proj g|=  4.30071D-02

At iterate   22    f=  5.59814D-02    |proj g|=  5.55424D-02

At iterate   23    f=  4.58934D-02    |proj g|=  6.20258D-02

At iterate   24    f=  4.10159D-02    |proj g|=  1.03215D-01

At iterate   25    f=  3.29265D-02    |proj g|=  2.51322D-02

At iterate   26    f=  3.07147D-02    |proj g|=  1.81659D-02

At iterate   27    f=  2.61044D-02    |proj g|=  2.45009D-02

At iterate   28    f=  2.06602D-02    |proj g|=  6.99235D-02

At iterate   29    f=  1.54648D-02    |proj g|=  2.28806D-02

At iterate   30    f=  1.28063D-02    |proj g|=  1.30095D-02

At iterate   31    f=  9.76136D-03    |proj g|=  2.80805D-02

At iterate   32    f=  8.12416D-03    |proj g|=  2.98110D-02

At iterate   33    f=  7.00231D-03    |proj g|=  1.50257D-02

At iterate   34    f=  6.07564D-03    |proj g|=  9.89339D-03

At iterate   35    f=  5.17405D-03    |proj g|=  1.33440D-02

At iterate   36    f=  4.42905D-03    |proj g|=  2.16746D-02

At iterate   37    f=  3.62284D-03    |proj g|=  7.65736D-03

At iterate   38    f=  3.27339D-03    |proj g|=  7.76222D-03

At iterate   39    f=  2.83135D-03    |proj g|=  4.44776D-03

At iterate   40    f=  2.29191D-03    |proj g|=  4.95359D-03

At iterate   41    f=  2.13002D-03    |proj g|=  1.61772D-02

At iterate   42    f=  1.60152D-03    |proj g|=  3.85109D-03

At iterate   43    f=  1.34752D-03    |proj g|=  3.42780D-03

At iterate   44    f=  1.00925D-03    |proj g|=  3.46245D-03
 This problem is unconstrained.
At iterate   45    f=  9.90281D-04    |proj g|=  1.47225D-02

At iterate   46    f=  6.17332D-04    |proj g|=  2.72747D-03

At iterate   47    f=  5.36680D-04    |proj g|=  2.01053D-03

At iterate   48    f=  4.10609D-04    |proj g|=  1.22670D-03

At iterate   49    f=  3.21408D-04    |proj g|=  1.10501D-03

At iterate   50    f=  2.29149D-04    |proj g|=  1.13645D-03

At iterate   51    f=  1.85399D-04    |proj g|=  1.16249D-03

At iterate   52    f=  1.53734D-04    |proj g|=  5.55915D-04

At iterate   53    f=  1.20287D-04    |proj g|=  5.60460D-04

At iterate   54    f=  8.44474D-05    |proj g|=  5.84243D-04

At iterate   55    f=  5.46868D-05    |proj g|=  4.53775D-04

At iterate   56    f=  3.29017D-05    |proj g|=  2.68175D-04

At iterate   57    f=  2.41188D-05    |proj g|=  9.86959D-05

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
 8970     57     59      1     0     0   9.870D-05   2.412D-05
  F =   2.4118825152332589E-005

CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL            
0.9666666666666667
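The `N =` value near the top of each optimizer log appears to be the number of variables being optimized, that is, every weight and bias in the network. A small helper (the name `count_mlp_parameters` is made up for this sketch) reproduces those numbers from the layer sizes alone, assuming 64 input features and 10 output classes as in the digits data.

```python
def count_mlp_parameters(layer_sizes):
    """Count the weights and biases of a fully connected network.

    layer_sizes lists the width of every layer in order, from the input
    features through the hidden layers to the outputs.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # weight matrix between consecutive layers
        total += n_out         # one bias per unit in the receiving layer
    return total

# 64 pixel features in, 10 digit classes out
print(count_mlp_parameters([64, 16, 10]))      # 1210
print(count_mlp_parameters([64, 64, 10]))      # 4810
print(count_mlp_parameters([64, 64, 64, 10]))  # 8970
```

Going from 16 to 64 hidden units roughly quadruples the number of parameters, and adding a second 64-unit layer nearly doubles it again.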

We saw that the SVM performed a bit better, but this is a simple problem. We can also compare these based on how much they store; the number of parameters is related to the complexity.

svm_clf.support_vectors_.shape
(692, 64)
[c.shape for c in mlp.coefs_]
[(64, 64), (64, 64), (64, 10)]
np.prod(list(svm_clf.support_vectors_.shape))
44288
np.sum([np.prod(list(c.shape)) for c in mlp.coefs_])
8832

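We see that the MLP stores far fewer numbers: 8,832 weights (plus 138 bias terms in `mlp.intercepts_`) versus 44,288 support-vector entries for the SVM.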

32.1. Questions after class

32.1.1. How can we use this in our assignment?

You do not have to, but you could try an MLP in your assignment. All that is required is any classifier and a text representation.

32.1.2. How do we know how to change the parameters?

If it doesn’t work well, trying more layers or bigger layers is a good idea.
Neural nets work like a black box: they’re hard to interpret, so while there are good heuristics, there isn’t as solid a theory for knowing what to do with them.

They work well, but they’re hard to understand, and that is a risk of using them.
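Since the guidance here is heuristic, one practical option is to try several layer configurations and let cross-validation pick among them. The sketch below uses `GridSearchCV` for that; the candidate sizes, `cv=3`, and `random_state` values are assumptions chosen for this example rather than recommended settings.

```python
from sklearn import datasets, model_selection
from sklearn.neural_network import MLPClassifier

digits = datasets.load_digits()
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    digits.data, digits.target, random_state=0
)

# Candidate architectures: bigger and deeper variants of the same model.
param_grid = {"hidden_layer_sizes": [(16,), (64,), (64, 64)]}

search = model_selection.GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=500, alpha=1e-4, random_state=1),
    param_grid,
    cv=3,
)
search.fit(X_train, y_train)

print(search.best_params_)           # configuration with the best mean CV score
print(search.best_score_)            # that mean cross-validated accuracy
print(search.score(X_test, y_test))  # accuracy of the refit best model on held-out data
```

This does not explain why one configuration works better than another, but it replaces guessing with a measured comparison.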