I won't go much into the theoretical details of neural networks, as they are covered exhaustively elsewhere. There are plenty of resources you can check out: books, videos, online material, ... whatever you like. Following is a list of books I can recommend that cover (not solely) NNs:

- Pattern Recognition and Machine Learning (Bishop)
- Pattern Classification (Duda, Hart, Stork)
- Neural Networks for Pattern Recognition (Bishop)

- The Elements of Statistical Learning (Hastie, Tibshirani, Friedman) (I don't own this one myself, but I have heard only good things about it.)

### The Setting

The picture below shows a schematic diagram of a NN, taken from Bishop's PRML book, as I will mostly stick to his nomenclature. On the left side is the input of dimension D, in the middle a so-called hidden layer of dimension M, and on the right side the output of dimension K. In the picture there is only one hidden layer, but a network can contain any number of them, and they can all be (and usually are) of different dimensions.

The solid blue nodes x_0 and z_0 are always set to 1.0. It is a common technique to absorb the bias parameters into the weights, because this allows for a nicer and simplified mathematical notation. Instead of

$$a_j = \sum_{i=1}^{D} w_{ji} x_i + w_{j0}$$

with w_{j0} as the bias, we get the more compact

$$a_j = \sum_{i=0}^{D} w_{ji} x_i$$

The overall network function for one output node in a network with one hidden layer then becomes:

$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M} w_{kj}^{(2)} \, h\left( \sum_{i=0}^{D} w_{ji}^{(1)} x_i \right) \right)$$

with **x** denoting the input vector and **w** the weights; **h** and **sigma** are nonlinear activation functions, often the **tanh** or the **logistic sigmoid** function.

For every additional hidden layer there will be another nested **h(...)**; e.g. for the x_i in the above equation one can substitute another h(...), and so on. From that standpoint we can see that a network is just a successive composition of functional transformations (or activations), and that's pretty much what function composition is about.

This process is also called **forward propagation** and is most likely the reason why these networks are called **feed-forward neural networks**.
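To make the composition view concrete, here is a tiny dependency-free sketch of mine (single-node "layers" with made-up weights, not the network built later in this post) showing that a forward pass is just composed functions:

```
// forward propagation as plain function composition;
// all weights and biases below are made up purely for illustration
let logistic x = 1.0 / (1.0 + exp (-x))

// one hidden "layer" and one output "layer", each a single node
let hidden x = tanh (0.5 * x + 0.1)       // hypothetical weight 0.5, bias 0.1
let output z = logistic (2.0 * z - 1.0)   // hypothetical weight 2.0, bias -1.0

// the network is the composition of its layer functions
let net = hidden >> output
let y = net 1.0   // a value strictly between 0 and 1
```

Adding another hidden layer would simply mean composing one more function into the pipeline.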

### Math.Net Numerics for Matrix operations

In order to implement the code not only for a single output node but for all nodes at once, we have to go from simple inner vector products to a matrix-vector multiplication. To make this possible without first implementing matrices, vectors and the like, I will make use of the **Math.NET Numerics** library, which does - among many other useful things - exactly that. There is even a NuGet package with some F# modules for the library that make it fun to play with linear algebra in F#.

For instance, a matrix vector multiplication looks as follows:

```
open MathNet.Numerics.LinearAlgebra.Double
open MathNet.Numerics.LinearAlgebra.Generic

let m = matrix [ [1.0; 1.0]; [0.0; 1.0] ]
let v = vector [ 5.5; 3.2 ]
let res = m * v
```

When we run that snippet in F# Interactive we get back the expected (two-dimensional) vector:

```
> res;;
val it : Vector = seq [8.7; 3.2]
```

### To the code

Ok, let's try to work towards a functional implementation of a neural network in F#. First, in order to handle the bias as part of the input vectors, we need a small helper function that simply takes a d-dimensional vector and adds one dimension with value 1.0 - `prependForBias` does just that:

```
let prepend value (vec : Vector<float>) =
    vector [ yield! value :: (vec |> Vector.toList) ]

let prependForBias : Vector<float> -> Vector<float> = prepend 1.0
```

Now, to compute the outputs of an entire layer, we construct a function that takes as arguments an activation function, a weights matrix and an input vector. The function multiplies the weights with the inputs and applies the activation function to the resulting vector. Again, in order to handle the bias values, the function prepends a 1.0 entry to the vector:

```
let layer f (weights : Matrix<float>) (inputs : Vector<float>) =
    (weights * inputs)
    |> Vector.map f
    |> prependForBias
```
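As a quick sanity check (my example, assuming the definitions above and Math.NET are in scope), applying `layer` with the identity activation makes the dimension bookkeeping visible: a 2x3 weight matrix times a 3-dimensional input yields a 2-dimensional vector, which `prependForBias` then turns into a 3-dimensional one:

```
// illustrative matrix/vector, not part of the network built below
let w = matrix [ [1.0; 0.0; 0.0];
                 [0.0; 1.0; 0.0] ]
let x = vector [ 2.0; 3.0; 4.0 ]

// weights * inputs = [2.0; 3.0], then the bias entry is prepended
let check = layer id w x    // -> seq [1.0; 2.0; 3.0]
```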

### Sticking together a (3-4-2) network

With the `layer` function we can already build more complex NNs by just composing multiple layers; `layer` computes the input vector that is fed - along with a weights matrix and another activation function - into the `layer` function of the next layer.

Consider a simple (3-4-2) two-layer network consisting of **3** input nodes, one hidden layer with **4** nodes and an output layer with **2** nodes. Let the sigmoid function be the activation function for both the hidden and the output layer. The matrices W_MD and W_KM below each contain the weights and the biases, with the first column being the bias column. The code to stick together the NN and initialize it with (randomly chosen) weights and biases looks like:

```
/// the activation function
let sigmoid x = 1.0 / (1.0 + exp(-x))

/// weight/bias matrix from input -> hidden layer
let wmd = matrix [ [-1.0; 0.1; 0.3; 0.7];
                   [-2.0; 0.3; 0.2; 1.0];
                   [-1.0; 0.8; 0.6; 0.6];
                   [-2.0; 0.6; 0.3; 0.5]; ]

/// weight/bias matrix from hidden layer -> output
let wkm = matrix [ [-1.5; 0.5; 0.5; 0.5; 0.5];
                   [-2.5; 0.5; 0.5; 0.5; 0.5]; ]

/// build the network by means of function composition
let twoLayerNetwork : (Vector<float> -> Vector<float>) =
    layer sigmoid wmd >> layer sigmoid wkm

/// shield the user from handling the bias
let compute (input : Vector<float>) =
    (twoLayerNetwork <| prependForBias input).[ 1 .. ]
```

Notice that the actual network construction above is only a single line of code (the composition `layer sigmoid wmd >> layer sigmoid wkm`). To compute the output for a given input vector like for instance `[1.0; 2.0; 3.0]`, we have to write:

```
let out = compute (vector [1.0; 2.0; 3.0])
```

Looking at the result in F# Interactive:

```
> out;;
val it : Vector<float> = seq [0.5392375632; 0.3009608905]
```

### One little generalization

To further generalize the composition of a network we can write a function that builds it recursively:

```
let network (args : (Matrix<float> * (float -> float)) list) =
    let rec loop ls accfun =
        match ls with
        | [] -> accfun
        | (weights, func) :: rest -> loop rest (accfun >> layer func weights)
    fun xs -> (loop args id) xs
```

With the `network` function at hand, building up the same two-layer NN as above would look as follows:

```
/// list of weights and activation function per layer
let twoLayerNnArgs = [ (wmd, sigmoid); (wkm, sigmoid) ]

let twoLayerNetwork : (Vector<float> -> Vector<float>) = network twoLayerNnArgs
```

The `compute` function looks exactly the same as above and will yield the exact same result `[0.53924; 0.30096]`.
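With `network`, adding depth is just one more list entry. As an illustration of mine (the extra matrices below are made up, only their dimensions matter): a 3-4-3-2 network with a second hidden layer of 3 nodes could be assembled like this, reusing `wmd` and `sigmoid` from above:

```
/// hypothetical hidden -> hidden matrix (3 nodes; first column is the bias,
/// the remaining 4 columns match the 4 hidden nodes of the first layer)
let wmm = matrix [ [ 0.1; 0.2; 0.2; 0.2; 0.2 ];
                   [-0.1; 0.3; 0.3; 0.3; 0.3 ];
                   [ 0.0; 0.1; 0.1; 0.1; 0.1 ] ]

/// hypothetical hidden -> output matrix (2 nodes, now 3 (+1 bias) inputs)
let wkm' = matrix [ [-1.0; 0.5; 0.5; 0.5 ];
                    [-2.0; 0.5; 0.5; 0.5 ] ]

let threeLayerNetwork : (Vector<float> -> Vector<float>) =
    network [ (wmd, sigmoid); (wmm, sigmoid); (wkm', sigmoid) ]
```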

### The Compositional Nature of Neural Networks

To summarize: thanks to the compositional nature of the neural network internals, the functional approach fits really nicely. To stress that point a bit: it took about 15 lines of succinct code to build an entire multilayer neural network and its feed-forward computation, a little bit more with the generalization of the `network` function. Sure, I'm not counting infrastructure code for activation functions, matrices and so on, but still... Anyway, compared to what still has to come, this was the easy part. But it is a promising start.

G'day,

This is really great. I'd love to include this content (with attribution, links etc.) for the Machine Learning Samples in the Tsunami IDE.

Please let me know if that would be ok.

Cheers,

Matt

Hi Matt,

Thank you, I'm glad you like it. Please feel free to use it (with attribution, links and so on, as you mentioned). Makes me happy to be part of the ML samples. Can you somehow show me the results?

Yours,

Reto