I won't go much into the theoretical details of neural networks as they are covered exhaustingly elsewhere. There are plenty of resources you can check out: books, videos, online material,... whatever you like. Following a list of books I can recommend that cover (not solely) NN:
- Pattern Recognition and Machine Learning (Bishop)
- Pattern Classification (Duda, Hart, Stork)
- Neural Networks for Pattern Recognition (Bishop)
(I don't own this one myself, but I heard only good things...) - The Elements of Statistical Learning (Hastie, Tibshirani, Friedman)
The Setting
The picture below shows a schematic diagram of a NN, taken from Bishop's PRML book, as I will mostly stick to his nomenclature (the image can be found here):On the left side is the input of dimension D, in the middle is a so called hidden layer of dimension M and on the right side is the output (of dimension K). In the picture there is only one hidden layer, but there can be any number of them in a network; and they can all be (and usually are) of different dimensions.
The solid blue nodes x0 and z0 are always set to 1.0. It is a common technique to absorb the bias parameters into the weights because this allows for a nicer and simplified mathematical notation. Instead of
with wj0 as the bias we get a more compact
The overall network function for one output node in a network with one hidden layer becomes then:
with x denoting the input vector, w the weights, h and sigma are nonlinear activation functions, often these are the tanh or the logistic sigmoid function.
For every additional hidden layer there will be another nested h(...), e.g. for the xi in the above equation one can put another h(...) and so on. From that standpoint we can see that a network is just a successive composition of functional transformations (or activations), and that's pretty much what function composition is about.
This process is also called forward propagation and is most likely the reason why these networks are called feed-forward neural networks.
Math.Net Numerics for Matrix operations
In order to implement the code not only for a single output node but for all nodes at once, we have to go from simple inner vector products to a matrix-vector-multiplication. To make this possible without first implementing matrices, vectors and stuff I will make use of the math.net numerics library that does - among many other useful things - exactly that. There is even a nuget package with some F# modules for the library that make it fun to play with linear algebra in F#.For instance, a matrix vector multiplication looks as follows:
open MathNet.Numerics.LinearAlgebra.Double open MathNet.Numerics.LinearAlgebra.Generic let m = matrix [ [1.0; 1.0]; [0.0; 1.0] ] let v = vector [ 5.5; 3.2 ] let res = m * vWhen we run that snippet in F# Interactive we get back the expected (two-dimensional) vector:
> res;; val it : Vector= seq [8.7; 3.2]
To the code
Ok, let's try to work towards a functional implementation of a neural network in F#. First, in order to handle the bias as part of the input vectors, we need a small helper function that simply takes a d-dimensional vector and adds one dimension with value 1.0 -prependForBias
does just that:let prepend value (vec : Vector<float>) = vector [ yield! value :: (vec |> Vector.toList) ] let prependForBias : Vector<float> -> Vector<float> = prepend 1.0Now, to compute the outputs of an entire layer, we construct a function that takes as arguments an activation function, a weights matrix and an input vector. the function multiplies the weights with the inputs and applies the activation function to the resulting vector. Again, in order to handle the bias values, the function adds an 1.0 entry to the vector
let layer f (weights : Matrix<float>) (inputs : Vector<float>) = (weights * inputs) |> Vector.map f |> prependForBias
Sticking together a (3-4-2) network
With thelayer
function we can already build more complex NNs by just composing multiple layers; layer
computes the input vector that is fed - along with a weights matrix and another activation function - into the layer
function of the next layer.Consider a simple (3-4-2) two-layer network consisting of 3 input nodes, one hidden layer with 4 nodes and an output layer with 2 nodes. Let the sigmoid function be the activation function for both the hidden and the output layer. The matrices WMD and WKM below each contain the weights and the biases, with the first column being the bias column. The code to stick together the NN and initialize it with (randomly chosen) weights and biases looks like:
/// the activation function
let sigmoid x = 1.0 / (1.0 + exp(-x))
/// weight/bias matrix from input -> hidden layer
let wmd = matrix [ [-1.0; 0.1; 0.3; 0.7];
[-2.0; 0.3; 0.2; 1.0];
[-1.0; 0.8; 0.6; 0.6];
[-2.0; 0.6; 0.3; 0.5]; ]
/// weight/bias matrix from hidden layer -> output
let wkm = matrix [ [-1.5; 0.5; 0.5; 0.5; 0.5];
[-2.5; 0.5; 0.5; 0.5; 0.5]; ]
/// build the network by means of function composition
let twoLayerNetwork : (Vector<float> -> Vector<float>) =
layer sigmoid wmd >> layer sigmoid wkm
/// shield the user from handling the bias
let compute (input : Vector<float>) =
(twoLayerNetwork <| prependForBias input).[ 1 .. ]
Notice that the actual network construction is only one line of code above (highlighted). To compute the output for a given input vector like for instance [1.0; 2.0; 3.0]
, we have to write:
let out = compute (vector [1.0; 2.0; 3.0])Looking at the result in F# Interactive:
> out;; val it : Vector<float> = seq [0.5392375632; 0.3009608905]
One little generalization
To further generalize the composition of a network we can write a function that builds it recursively:let network (args : (Matrix<float> * (float -> float)) list) = let rec loop ls accfun = match ls with | [] -> accfun | (weights,func) :: rest -> loop rest (accfun >> layer func weights) fun xs -> (loop args id) xsWith the
network
function at hand, building up the same two-layer NN as above would look as follows:
/// list of weigths and activation function per layer let twoLayerNnArgs = [ (wmd, sigmoid); (wkm, sigmoid) ] let twoLayerNetwork : (Vector<float> -> Vector<float>) = network twoLayerNnArgsThe
compute
function looks exactly the same as above and will yield the exact same result [0.53924; 0.30096]
.The Compositional Nature of Neural Networks
To summarize: Thanks to the compositional nature of the neural network internals the functional approach fits really nicely. To stress that point a bit: it took about 15 lines of succinct code to build an entire multilayer neural network and its feed forward computation, a little bit more with the generalization of thenetwork
function. Sure, I'm not counting infrastructure code for activation functions, matrices and so on, but still...Anyway, compared to what still has to come, this was the easy part. But it is a promising start.
G'day,
ReplyDeleteThis is really great. I'd love to include this content (with attribution, links etc.) for the Machine Learning Samples in the Tsunami IDE.
Please let me know if that would be ok.
Cheers,
Matt
Hi Matt,
DeleteThank you, I'm glad you like it. Please feel free to use it (with attribution, links and so as you mentioned). Makes me happy to be part of the ML samples. Can you somehow show me the results?
Yours,
Reto