devexploria: Functional Feed-forward Neural Networks Part I: Setting it up

This is the first of several posts in which I will go into some subjects concerning artificial neural networks (NN) and their functional implementation. Today, I will set up a classical feed-forward neural network. In future posts I will show how to train and use such a network.
I won't go much into the theoretical details of neural networks as they are covered exhaustingly elsewhere. There are plenty of resources you can check out: books, videos, online material,... whatever you like. Following a list of books I can recommend that cover (not solely) NN:

Pattern Recognition and Machine Learning (Bishop)
Pattern Classification (Duda, Hart, Stork)
Neural Networks for Pattern Recognition (Bishop)
(I don't own this one myself, but I heard only good things...)
The Elements of Statistical Learning (Hastie, Tibshirani, Friedman)

Although there is a lot of discussion going on about NN and their widespread use in the fields of Machine Learning, Data Mining, Computational Statistics, Data Analysis and so on I've seldom seen them in conjunction with functional approaches. That's why I was wondering how they would fit.

The Setting

The picture below shows a schematic diagram of a NN, taken from Bishop's PRML book, as I will mostly stick to his nomenclature (the image can be found here):

On the left side is the input of dimension D, in the middle is a so called hidden layer of dimension M and on the right side is the output (of dimension K). In the picture there is only one hidden layer, but there can be any number of them in a network; and they can all be (and usually are) of different dimensions.

The solid blue nodes x₀ and z₀ are always set to 1.0. It is a common technique to absorb the bias parameters into the weights because this allows for a nicer and simplified mathematical notation. Instead of

with w_j0 as the bias we get a more compact

The overall network function for one output node in a network with one hidden layer becomes then:

with x denoting the input vector, w the weights, h and sigma are nonlinear activation functions, often these are the tanh or the logistic sigmoid function.
For every additional hidden layer there will be another nested h(...), e.g. for the x_i in the above equation one can put another h(...) and so on. From that standpoint we can see that a network is just a successive composition of functional transformations (or activations), and that's pretty much what function composition is about.
This process is also called forward propagation and is most likely the reason why these networks are called feed-forward neural networks.

Math.Net Numerics for Matrix operations

In order to implement the code not only for a single output node but for all nodes at once, we have to go from simple inner vector products to a matrix-vector-multiplication. To make this possible without first implementing matrices, vectors and stuff I will make use of the math.net numerics library that does - among many other useful things - exactly that. There is even a nuget package with some F# modules for the library that make it fun to play with linear algebra in F#.
For instance, a matrix vector multiplication looks as follows:

open MathNet.Numerics.LinearAlgebra.Double
open MathNet.Numerics.LinearAlgebra.Generic

let m = matrix [ [1.0; 1.0]; [0.0; 1.0] ]
let v = vector [ 5.5; 3.2 ]
let res = m * v

When we run that snippet in F# Interactive we get back the expected (two-dimensional) vector:

> res;;
val it : Vector = seq [8.7; 3.2]

To the code

Ok, let's try to work towards a functional implementation of a neural network in F#. First, in order to handle the bias as part of the input vectors, we need a small helper function that simply takes a d-dimensional vector and adds one dimension with value 1.0 - prependForBias does just that:

let prepend value (vec : Vector<float>) = 
    vector [ yield! value :: (vec |> Vector.toList) ]

let prependForBias : Vector<float> -> Vector<float> = prepend 1.0

Now, to compute the outputs of an entire layer, we construct a function that takes as arguments an activation function, a weights matrix and an input vector. the function multiplies the weights with the inputs and applies the activation function to the resulting vector. Again, in order to handle the bias values, the function adds an 1.0 entry to the vector

let layer f (weights : Matrix<float>) (inputs : Vector<float>) =
    (weights * inputs) |> Vector.map f |> prependForBias

Sticking together a (3-4-2) network

With the layer function we can already build more complex NNs by just composing multiple layers; layer computes the input vector that is fed - along with a weights matrix and another activation function - into the layer function of the next layer.
Consider a simple (3-4-2) two-layer network consisting of 3 input nodes, one hidden layer with 4 nodes and an output layer with 2 nodes. Let the sigmoid function be the activation function for both the hidden and the output layer. The matrices W_MD and W_KM below each contain the weights and the biases, with the first column being the bias column. The code to stick together the NN and initialize it with (randomly chosen) weights and biases looks like:

/// the activation function
let sigmoid x = 1.0 / (1.0 + exp(-x))

/// weight/bias matrix from input -> hidden layer
let wmd = matrix [ [-1.0; 0.1; 0.3; 0.7];
                   [-2.0; 0.3; 0.2; 1.0]; 
                   [-1.0; 0.8; 0.6; 0.6]; 
                   [-2.0; 0.6; 0.3; 0.5]; ]

/// weight/bias matrix from hidden layer -> output
let wkm = matrix [ [-1.5; 0.5; 0.5; 0.5; 0.5];
                   [-2.5; 0.5; 0.5; 0.5; 0.5]; ]

/// build the network by means of function composition
let twoLayerNetwork : (Vector<float> -> Vector<float>) = 
    layer sigmoid wmd >> layer sigmoid wkm

/// shield the user from handling the bias
let compute (input : Vector<float>) =
    (twoLayerNetwork <| prependForBias input).[ 1 .. ]

Notice that the actual network construction is only one line of code above (highlighted). To compute the output for a given input vector like for instance [1.0; 2.0; 3.0], we have to write:

let out = compute (vector [1.0; 2.0; 3.0])

Looking at the result in F# Interactive:

> out;;
val it : Vector<float> = seq [0.5392375632; 0.3009608905]

One little generalization

To further generalize the composition of a network we can write a function that builds it recursively:

let network (args : (Matrix<float> * (float -> float)) list) = 
    let rec loop ls accfun =
        match ls with
        | [] -> accfun
        | (weights,func) :: rest 
             -> loop rest (accfun >> layer func weights)
    fun xs -> (loop args id) xs

With the network function at hand, building up the same two-layer NN as above would look as follows:

/// list of weigths and activation function per layer
let twoLayerNnArgs = [ (wmd, sigmoid); (wkm, sigmoid) ]

let twoLayerNetwork : (Vector<float> -> Vector<float>) = 
    network twoLayerNnArgs

The compute function looks exactly the same as above and will yield the exact same result [0.53924; 0.30096].

The Compositional Nature of Neural Networks

To summarize: Thanks to the compositional nature of the neural network internals the functional approach fits really nicely. To stress that point a bit: it took about 15 lines of succinct code to build an entire multilayer neural network and its feed forward computation, a little bit more with the generalization of the network function. Sure, I'm not counting infrastructure code for activation functions, matrices and so on, but still...
Anyway, compared to what still has to come, this was the easy part. But it is a promising start.

What's next?

Probably the most interesting part is still missing: learning the weights. Instead of guessing them as we did here we want the network itself to decide about the best weights. I will look into that next time.

devexploria

Tuesday, December 18, 2012

Functional Feed-forward Neural Networks Part I: Setting it up