38. Neural Networks#
We started thinking about machine learning with the idea that we assume our target variable \(y_i\) is related to the features \(\mathbf{x}_i\) by some function (for sample \(i\)):

\[y_i = f(\mathbf{x}_i)\]
But we don’t know that function exactly, so we assume a type of model (a decision tree, a boundary for SVM, a probability distribution) that has some parameters \(\theta\), and then use a machine learning algorithm \(\mathcal{A}\) to estimate the parameters of \(f\). In a decision tree the parameters are the thresholds to compare against; in GaussianNB they are the means and variances; in SVM they are the support vectors that define the margin.
That gives us a fitted function that we can use to make predictions on our test data:

\[\hat{y}_i = f(\mathbf{x}_i; \theta)\]
A neural net allows us to not assume a specific form for \(f\) first; it does universal function approximation. For one hidden layer and a binary classification problem:

\[\hat{f}(\mathbf{x}) = W_2 g(W_1^T \mathbf{x} + b_1) + b_2\]
where the function \(g\) is called the activation function. So we approximate some unknown, complicated function \(f\) by taking a weighted sum of all of the inputs and passing that sum through another, known function.
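To make that concrete, here is a minimal NumPy sketch of the one-hidden-layer computation above; the weights `W1`, `W2` and biases `b1`, `b2` are random illustrative values, not learned parameters.

```python
import numpy as np

def sigmoid(z):
    # logistic activation: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.random(4)                  # one sample with 4 features
W1 = rng.normal(size=(4, 3))       # input -> hidden weights (illustrative)
b1 = np.zeros(3)                   # hidden-layer biases
W2 = rng.normal(size=3)            # hidden -> output weights
b2 = 0.0                           # output bias

hidden = sigmoid(x @ W1 + b1)      # g(W_1^T x + b_1)
y_hat = sigmoid(hidden @ W2 + b2)  # squashed to (0, 1) for binary classification
```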
from sklearn.neural_network import MLPClassifier
from sklearn import svm
import pandas as pd
import sklearn
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn import model_selection
We’re going to use the digits dataset again.
digits = datasets.load_digits()
digits_X = digits.data
digits_y = digits.target
X_train, X_test, y_train, y_test = model_selection.train_test_split(digits_X,digits_y)
digits.images[0]
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])
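Each sample is an 8×8 image of a handwritten digit, flattened into 64 features. As a quick sketch, we can display one using the matplotlib import above:

```python
plt.imshow(digits.images[0], cmap="gray_r")  # render the 8x8 pixel grid
plt.title(f"label: {digits.target[0]}")
plt.show()
```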
Sklearn provides an estimator for the Multi-Layer Perceptron (MLP). We can start with one hidden layer.
mlp = MLPClassifier(
    hidden_layer_sizes=(16,),  # one hidden layer with 16 units
    max_iter=100,              # cap on optimizer iterations
    alpha=1e-4,                # L2 regularization strength
    solver="lbfgs",
    verbose=10,
    random_state=1,
    learning_rate_init=0.1,    # only used by the sgd and adam solvers
)
mlp.fit(X_train,y_train).score(X_test,y_test)
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 1210 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 9.38465D+00 |proj g|= 7.11984D+00
At iterate 1 f= 8.17092D+00 |proj g|= 7.15655D+00
At iterate 2 f= 3.24269D+00 |proj g|= 2.03901D+00
At iterate 3 f= 2.37778D+00 |proj g|= 4.04393D-01
At iterate 4 f= 2.27009D+00 |proj g|= 2.30412D-01
At iterate 5 f= 2.13336D+00 |proj g|= 2.77935D-01
At iterate 6 f= 1.98856D+00 |proj g|= 3.48912D-01
At iterate 7 f= 1.72317D+00 |proj g|= 5.67503D-01
At iterate 8 f= 1.58445D+00 |proj g|= 3.28824D-01
At iterate 9 f= 1.49734D+00 |proj g|= 3.93677D-01
At iterate 10 f= 1.39751D+00 |proj g|= 6.71191D-01
At iterate 11 f= 1.24519D+00 |proj g|= 6.21984D-01
At iterate 12 f= 1.15236D+00 |proj g|= 3.47640D-01
At iterate 13 f= 1.09906D+00 |proj g|= 3.37278D-01
At iterate 14 f= 1.05696D+00 |proj g|= 1.99040D-01
At iterate 15 f= 1.01247D+00 |proj g|= 3.01317D-01
At iterate 16 f= 9.81886D-01 |proj g|= 3.18408D-01
At iterate 17 f= 9.50129D-01 |proj g|= 1.37785D-01
At iterate 18 f= 8.86505D-01 |proj g|= 6.48561D-01
At iterate 19 f= 8.42838D-01 |proj g|= 4.68309D-01
At iterate 20 f= 8.14585D-01 |proj g|= 2.55414D-01
At iterate 21 f= 8.04329D-01 |proj g|= 1.39807D-01
At iterate 22 f= 7.92212D-01 |proj g|= 2.50519D-01
At iterate 23 f= 7.52172D-01 |proj g|= 3.55555D-01
At iterate 24 f= 7.12535D-01 |proj g|= 4.43539D-01
At iterate 25 f= 6.76793D-01 |proj g|= 3.37357D-01
At iterate 26 f= 6.30656D-01 |proj g|= 3.18045D-01
At iterate 27 f= 5.97827D-01 |proj g|= 2.18595D-01
At iterate 28 f= 5.56462D-01 |proj g|= 4.66162D-01
At iterate 29 f= 5.12931D-01 |proj g|= 5.42735D-01
At iterate 30 f= 4.75139D-01 |proj g|= 3.87003D-01
At iterate 31 f= 4.47944D-01 |proj g|= 1.68393D-01
At iterate 32 f= 3.99425D-01 |proj g|= 2.43929D-01
At iterate 33 f= 3.40320D-01 |proj g|= 2.72603D-01
At iterate 34 f= 2.96089D-01 |proj g|= 1.90089D-01
At iterate 35 f= 2.67168D-01 |proj g|= 1.90955D-01
At iterate 36 f= 2.50857D-01 |proj g|= 1.26385D-01
At iterate 37 f= 2.35988D-01 |proj g|= 1.17831D-01
At iterate 38 f= 2.18963D-01 |proj g|= 8.50825D-02
At iterate 39 f= 2.11071D-01 |proj g|= 1.77688D-01
At iterate 40 f= 1.98677D-01 |proj g|= 9.43588D-02
At iterate 41 f= 1.86685D-01 |proj g|= 1.66775D-01
At iterate 42 f= 1.77408D-01 |proj g|= 5.20009D-02
At iterate 43 f= 1.66641D-01 |proj g|= 7.74835D-02
At iterate 44 f= 1.58065D-01 |proj g|= 1.49709D-01
At iterate 45 f= 1.46905D-01 |proj g|= 8.88187D-02
At iterate 46 f= 1.36816D-01 |proj g|= 5.73866D-02
At iterate 47 f= 1.23886D-01 |proj g|= 7.55668D-02
At iterate 48 f= 1.15665D-01 |proj g|= 9.47044D-02
At iterate 49 f= 1.09152D-01 |proj g|= 4.45269D-02
At iterate 50 f= 1.05623D-01 |proj g|= 4.65699D-02
At iterate 51 f= 1.00235D-01 |proj g|= 5.09395D-02
At iterate 52 f= 9.48982D-02 |proj g|= 8.88693D-02
At iterate 53 f= 9.28047D-02 |proj g|= 1.12882D-01
At iterate 54 f= 8.96244D-02 |proj g|= 3.43012D-02
At iterate 55 f= 8.61751D-02 |proj g|= 3.49997D-02
At iterate 56 f= 8.31065D-02 |proj g|= 5.05298D-02
At iterate 57 f= 7.91082D-02 |proj g|= 9.86852D-02
At iterate 58 f= 7.52857D-02 |proj g|= 2.76082D-02
At iterate 59 f= 7.36021D-02 |proj g|= 2.15727D-02
At iterate 60 f= 7.14372D-02 |proj g|= 3.03184D-02
At iterate 61 f= 6.98529D-02 |proj g|= 4.84770D-02
At iterate 62 f= 6.77778D-02 |proj g|= 2.84951D-02
At iterate 63 f= 6.43985D-02 |proj g|= 3.16001D-02
At iterate 64 f= 6.18409D-02 |proj g|= 3.41687D-02
At iterate 65 f= 5.94981D-02 |proj g|= 6.72442D-02
At iterate 66 f= 5.63869D-02 |proj g|= 3.87181D-02
At iterate 67 f= 5.43224D-02 |proj g|= 2.40502D-02
At iterate 68 f= 5.04368D-02 |proj g|= 4.50986D-02
At iterate 69 f= 4.91181D-02 |proj g|= 4.22925D-02
At iterate 70 f= 4.82442D-02 |proj g|= 1.64042D-02
At iterate 71 f= 4.62087D-02 |proj g|= 1.91735D-02
At iterate 72 f= 4.53585D-02 |proj g|= 5.17174D-02
At iterate 73 f= 4.44345D-02 |proj g|= 2.37669D-02
At iterate 74 f= 4.35600D-02 |proj g|= 1.54700D-02
At iterate 75 f= 4.19505D-02 |proj g|= 4.78467D-02
At iterate 76 f= 4.01091D-02 |proj g|= 3.61117D-02
At iterate 77 f= 3.93031D-02 |proj g|= 5.45957D-02
At iterate 78 f= 3.77409D-02 |proj g|= 4.47122D-02
At iterate 79 f= 3.66431D-02 |proj g|= 2.68009D-02
At iterate 80 f= 3.61166D-02 |proj g|= 1.88642D-02
At iterate 81 f= 3.46562D-02 |proj g|= 3.54308D-02
At iterate 82 f= 3.32586D-02 |proj g|= 2.91068D-02
At iterate 83 f= 3.25174D-02 |proj g|= 1.45019D-01
At iterate 84 f= 2.97030D-02 |proj g|= 1.89358D-02
At iterate 85 f= 2.91339D-02 |proj g|= 1.45290D-02
At iterate 86 f= 2.82205D-02 |proj g|= 1.66224D-02
At iterate 87 f= 2.69000D-02 |proj g|= 4.25724D-02
At iterate 88 f= 2.53736D-02 |proj g|= 1.74524D-02
At iterate 89 f= 2.40236D-02 |proj g|= 2.30625D-02
At iterate 90 f= 2.33232D-02 |proj g|= 5.30508D-02
At iterate 91 f= 2.25585D-02 |proj g|= 1.73283D-02
At iterate 92 f= 2.17492D-02 |proj g|= 2.37386D-02
At iterate 93 f= 2.11327D-02 |proj g|= 3.03222D-02
At iterate 94 f= 2.05165D-02 |proj g|= 3.73192D-02
At iterate 95 f= 1.99713D-02 |proj g|= 1.13639D-02
At iterate 96 f= 1.94213D-02 |proj g|= 1.85706D-02
At iterate 97 f= 1.87514D-02 |proj g|= 4.22578D-02
At iterate 98 f= 1.77619D-02 |proj g|= 4.12577D-02
At iterate 99 f= 1.68314D-02 |proj g|= 3.94293D-02
At iterate 100 f= 1.61334D-02 |proj g|= 2.85155D-02
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
1210 100 105 1 0 0 2.852D-02 1.613D-02
F = 1.6133397523403377E-002
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT
This problem is unconstrained.
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:536: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)
0.9444444444444444
We can compare it to SVM:
svm_clf = svm.SVC(gamma=0.001)
svm_clf.fit(X_train, y_train)
svm_clf.score(X_test,y_test)
0.9911111111111112
We saw that the SVM performed a bit better, but this is a simple problem. We can also compare the models by how much they have to store; the number of parameters is related to the complexity.
import numpy as np
np.prod(list(svm_clf.support_vectors_.shape))
43840
np.sum([np.prod(list(c.shape)) for c in mlp.coefs_])
1184
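Note that `coefs_` holds only the weight matrices; the bias terms are stored separately in `intercepts_`. As a quick sketch, the full parameter count (which should match the `N` the solver reported above):

```python
# coefs_ holds the weight matrices; intercepts_ holds the bias vectors
n_weights = np.sum([c.size for c in mlp.coefs_])
n_biases = np.sum([b.size for b in mlp.intercepts_])
n_weights + n_biases
```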
mlp.coefs_
[array([[-0.04544804, 0.12067436, -0.27379334, ..., 0.20709945,
-0.25885548, 0.0933671 ],
[-0.04583192, 0.02999214, -0.18729936, ..., 0.20070956,
-0.22816233, -0.08203855],
[ 0.21594462, -0.03501911, -0.13935016, ..., -0.12740689,
0.11827951, 0.10964127],
...,
[-0.147186 , -0.04885168, -0.08565972, ..., 0.09101391,
0.06372955, -0.17216516],
[ 0.00779572, -0.10042942, -0.24737746, ..., -0.21098225,
-0.29069528, -0.3007683 ],
[-0.25377729, 0.10587514, -0.14924823, ..., 0.05460927,
-0.04542482, -0.41864772]]),
array([[-0.41943486, -0.00737221, -0.25013001, 0.439859 , 0.37306531,
0.11786679, 0.45738627, -0.14901288, 0.38219732, -0.05069864],
[-0.23985806, 0.43464955, 0.1786623 , -0.40418738, 0.28901034,
0.19216991, 0.10145914, 0.23048402, -0.22734463, 0.10672655],
[ 1.50728715, -0.60528337, -0.58972384, -0.6401797 , 0.56687058,
-0.09242594, 0.50447357, -0.42329376, -0.05893486, -0.33303548],
[-0.27191219, 0.43490182, -0.17180129, 0.38448361, -0.09324696,
0.0273306 , 0.17686513, -0.02678721, 0.18037779, 0.28704752],
[-0.21111149, 0.37659092, 0.05968257, -0.37111487, -0.2169511 ,
0.06768203, -0.24890928, -0.32179172, -0.32849203, 0.28098543],
[ 0.09756746, 0.44452722, -0.15026417, 0.10032271, 0.09480416,
0.11042807, -0.42391493, 0.23923038, 0.4291504 , 0.03247214],
[-0.2671188 , 0.68772797, -0.78955948, -0.51844567, 0.76283629,
-0.34335334, 0.64458257, 0.01396712, 0.55997088, 0.16110103],
[-0.50501192, 0.08995027, 1.07395013, 0.55929958, -1.41081918,
-0.70475564, 0.07336966, -0.90049891, 0.14576965, 0.00196221],
[-0.20436992, -0.44920319, -0.33073311, 0.62834909, -0.76827309,
1.41211618, 0.01292927, -0.18972977, 0.06599439, -0.23435678],
[-0.19058572, -0.08386045, -0.23341893, 0.21336841, 0.14390203,
-0.41810974, 0.41809338, 0.26053527, 0.22008943, 0.20386476],
[-0.3055057 , 0.08001114, 0.12541927, 0.10975577, 0.64811392,
-0.47326167, -0.57528549, 1.23272911, 0.01363304, -0.8735477 ],
[ 0.23720435, 0.25039731, 0.33801366, -0.23823171, -0.32028353,
0.04038248, -0.17781702, 0.27510892, 0.09953463, 0.22304349],
[ 0.11837086, -0.27071898, -0.35738878, -0.13505994, -0.20340629,
-0.18721021, 0.31019539, 0.07292583, -0.06197665, -0.3445675 ],
[-0.18990245, 0.05164007, -0.24995147, -0.40145973, -0.06863325,
0.25951727, -0.04901717, 0.2628618 , -0.01059058, 0.24381035],
[ 0.20932278, 0.110846 , -0.23817858, 0.06744052, 0.17817823,
-0.15697341, 0.42354465, 0.33805084, -0.12109949, 0.11158405],
[ 0.18276065, -0.03845522, -0.93979366, -0.29424286, 0.2612766 ,
-0.04896757, -0.16762708, -0.20991491, 0.4357143 , 1.33815728]])]
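We can try the same estimator with a larger hidden layer to see how that changes both the performance and the number of parameters: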
mlp64 = MLPClassifier(
    hidden_layer_sizes=(64,),  # one hidden layer with 64 units
    max_iter=100,
    alpha=1e-4,
    solver="lbfgs",
    verbose=10,
    random_state=1,
    learning_rate_init=0.1,
)
mlp64.fit(X_train,y_train).score(X_test,y_test)
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 4810 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 1.04302D+01 |proj g|= 8.27241D+00
At iterate 1 f= 9.71830D+00 |proj g|= 4.82977D+00
At iterate 2 f= 8.42003D+00 |proj g|= 4.51230D+00
At iterate 3 f= 7.00682D+00 |proj g|= 2.93108D+00
At iterate 4 f= 5.03981D+00 |proj g|= 2.34336D+00
At iterate 5 f= 3.73519D+00 |proj g|= 3.90926D+00
At iterate 6 f= 2.14804D+00 |proj g|= 1.00646D+00
At iterate 7 f= 1.70277D+00 |proj g|= 9.90044D-01
At iterate 8 f= 1.13413D+00 |proj g|= 8.83988D-01
At iterate 9 f= 8.57417D-01 |proj g|= 3.85550D-01
At iterate 10 f= 6.57149D-01 |proj g|= 1.93061D-01
At iterate 11 f= 4.93338D-01 |proj g|= 1.53888D-01
At iterate 12 f= 3.09465D-01 |proj g|= 2.08518D-01
At iterate 13 f= 2.19399D-01 |proj g|= 5.27747D-02
At iterate 14 f= 1.85512D-01 |proj g|= 1.10762D-01
At iterate 15 f= 1.48434D-01 |proj g|= 8.75330D-02
At iterate 16 f= 1.24102D-01 |proj g|= 4.39839D-02
At iterate 17 f= 1.01487D-01 |proj g|= 4.86889D-02
At iterate 18 f= 8.29490D-02 |proj g|= 1.49483D-01
At iterate 19 f= 6.69635D-02 |proj g|= 4.15894D-02
At iterate 20 f= 6.02523D-02 |proj g|= 2.71720D-02
At iterate 21 f= 5.17567D-02 |proj g|= 2.61958D-02
At iterate 22 f= 3.89563D-02 |proj g|= 4.77516D-02
At iterate 23 f= 3.42232D-02 |proj g|= 5.94463D-02
At iterate 24 f= 2.88778D-02 |proj g|= 2.07471D-02
At iterate 25 f= 2.53482D-02 |proj g|= 1.42178D-02
At iterate 26 f= 2.11186D-02 |proj g|= 2.77706D-02
At iterate 27 f= 1.76381D-02 |proj g|= 3.44683D-02
At iterate 28 f= 1.49355D-02 |proj g|= 1.24816D-02
At iterate 29 f= 1.25325D-02 |proj g|= 1.11177D-02
At iterate 30 f= 1.07200D-02 |proj g|= 1.64740D-02
At iterate 31 f= 9.42996D-03 |proj g|= 2.08513D-02
At iterate 32 f= 8.31343D-03 |proj g|= 9.77348D-03
At iterate 33 f= 7.17646D-03 |proj g|= 7.13752D-03
At iterate 34 f= 6.18960D-03 |proj g|= 9.10418D-03
At iterate 35 f= 4.40449D-03 |proj g|= 1.91317D-02
At iterate 36 f= 3.95283D-03 |proj g|= 2.28365D-02
At iterate 37 f= 2.91604D-03 |proj g|= 6.64352D-03
At iterate 38 f= 2.54560D-03 |proj g|= 4.86752D-03
At iterate 39 f= 1.91555D-03 |proj g|= 4.47351D-03
At iterate 40 f= 1.26359D-03 |proj g|= 3.86776D-03
At iterate 41 f= 8.54919D-04 |proj g|= 4.62880D-03
At iterate 42 f= 6.15180D-04 |proj g|= 1.76956D-03
At iterate 43 f= 5.09893D-04 |proj g|= 1.40789D-03
At iterate 44 f= 3.69196D-04 |proj g|= 1.03219D-03
At iterate 45 f= 2.98530D-04 |proj g|= 1.43634D-03
At iterate 46 f= 2.50175D-04 |proj g|= 8.15814D-04
At iterate 47 f= 1.97223D-04 |proj g|= 6.75762D-04
At iterate 48 f= 1.33989D-04 |proj g|= 4.20325D-04
At iterate 49 f= 9.62924D-05 |proj g|= 4.81079D-04
At iterate 50 f= 6.09419D-05 |proj g|= 1.89177D-04
At iterate 51 f= 4.67968D-05 |proj g|= 1.32816D-04
At iterate 52 f= 2.80361D-05 |proj g|= 1.34469D-04
At iterate 53 f= 1.84513D-05 |proj g|= 4.20248D-05
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
4810 53 54 1 0 0 4.202D-05 1.845D-05
F = 1.8451253494679939E-005
CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
This problem is unconstrained.
0.9777777777777777
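With 64 hidden units the optimizer converged before hitting the iteration limit and the test accuracy improved, while the network still stores far fewer values (4810 parameters, per the solver’s `N`) than the SVM’s support vectors.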
38.1. Questions After Class#
38.1.1. Roughly, how does the model know to use certain functions as the fitting becomes more complex (e.g. sin(x), ln(x), e^x)?#
It does not learn an analytical form; it just approximates it.
38.1.2. when doing the .score on the mlp does the limit vary or does it have a set limit on its own?#
38.1.3. What is tensorflow used for that scikit can’t do?#
Tensorflow can do more types of networks and has more options for training. Most importantly, it has code optimizations so that it can use specialized hardware (such as GPUs) directly.
38.1.4. when you say weight, what does that mean?#
Weights are coefficients: each input to a neuron is multiplied by its weight in the weighted sum, so the weight expresses how much that feature contributes.
38.1.5. what is an artificial neuron?#
An artificial neuron is one “unit” of calculation. A neuron takes a weighted sum of all of its inputs (including a bias term) and passes it through an “activation function” that squashes the output into a fixed range (for example, the logistic sigmoid maps values into [0,1]).
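As a minimal sketch, a single artificial neuron in NumPy (the inputs, weights, and bias here are made-up values):

```python
import numpy as np

def neuron(x, w, b):
    # weighted sum of the inputs plus a bias, passed through a logistic activation
    z = np.dot(w, x) + b
    return 1 / (1 + np.exp(-z))

neuron(np.array([0.5, 1.0, -0.2]), w=np.array([0.3, -0.1, 0.8]), b=0.1)
```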
38.1.6. what real life problems require tensorflow?#
Modern ML applications are generally built with TensorFlow, PyTorch, or similar frameworks.
38.1.8. What is the best way to optimize a neural net? would it be just adding more layers?#
You could specify some of the parameters and use grid search over the rest, as in the sketch below. There are also different types of layers; we will see those later.
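For example, a minimal sketch using sklearn’s `GridSearchCV` with the estimator from above (the grid values here are illustrative assumptions):

```python
param_grid = {"hidden_layer_sizes": [(16,), (64,), (16, 16)],
              "alpha": [1e-4, 1e-3]}
mlp_grid = model_selection.GridSearchCV(
    MLPClassifier(solver="lbfgs", max_iter=100, random_state=1),
    param_grid,
)
mlp_grid.fit(X_train, y_train)
mlp_grid.best_params_
```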
38.1.10. I’ve heard that cleaning data is generally a majority of a data scientist’s work; is this generally true?#
38.1.11. What does it mean to “translate a jupyter notebook into python scripts”? what exactly are scripts?#
A script is a file that can be run non-interactively; that is, it can be run straight through without relying on any user input.
38.1.12. does jupyter notebook have to be used for data science or can we used other types of languages?#
You can use other languages, and you can even use Python in a script or interactively in another IDE.
38.1.13. How are issues of privacy handled for people like Cass, some of the models they spoke about required a lot of personal data?#
They do not release the data to just anyone, but they do use a lot of personal data. Mostly, they release anonymized, aggregated data so that it is not possible to identify an individual. There are privacy and security procedures to protect the linked data and limit who has access to it.