Deep Learning Notes 1 – NN

Resource

 

Introduction

Neural networks are the basic building block of deep learning. A neural network model can be viewed as multiple stacked layers of logistic-regression-like units, as shown in the figure below.

[Figure: a multi-layer feed-forward neural network]

In the figure above, w_{ij}^{(l)} denotes the weight from neuron i in layer l-1 to neuron j in layer l, for l = 1,2,...,L; the index i refers to the source node and j to the target node. d^{(l)} denotes the number of neurons in layer l = 0,1,2,...,L.
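As a concrete illustration of this notation, here is a minimal NumPy sketch (the layer sizes are made up, purely for illustration) that stores the layer-l weights as a matrix W[l] whose rows are indexed by the source neuron i (row 0 holding the bias weights) and whose columns by the target neuron j:

    import numpy as np

    # Hypothetical layer sizes d^{(0)}, ..., d^{(L)}, chosen only to illustrate the indexing.
    d = [2, 3, 1]        # d[l] = number of neurons in layer l, so L = 2
    L = len(d) - 1

    # W[l][i, j] = w_{ij}^{(l)}: weight from neuron i of layer l-1 to neuron j of layer l.
    # Row i = 0 holds the bias weights, matching x_0^{(l-1)} = 1.
    rng = np.random.default_rng(0)
    W = {l: rng.standard_normal((d[l - 1] + 1, d[l])) for l in range(1, L + 1)}

    print(W[1].shape)    # (d^{(0)} + 1, d^{(1)}) = (3, 3)
    print(W[2].shape)    # (d^{(1)} + 1, d^{(2)}) = (4, 1)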

Forward Propagation

For a given input vector x^{(0)} (with x_0^{(0)}=1 as the bias term), which is fed into the network at the input layer, the information is transformed and passed forward layer by layer, from the input layer to the output layer. The output layer yields a single output x_1^{(L)} in the case of binary classification or regression (only one neuron in the output layer), and multiple outputs in the case of multi-class classification (multiple neurons in the output layer).

For each neuron, the transformation of the incoming information comprises the two steps below:

  • Generate the activation: a_j^{(l)}=\sum_{i=0}^{d^{(l-1)}} w_{ij}^{(l)}x_i^{(l-1)}
  • Activate/transform: x_j^{(l)}=f(a_j^{(l)}), where f is typically one of the activation functions below:

Logistic Sigmoid function: f(a)=\frac{1}{1+e^{-a}}

Hyperbolic Tangent function (tanh): f(a)=\frac{e^{a}-e^{-a}}{e^{a}+e^{-a}}

In the case of regression, the transform in the output layer uses the identity function (x_1^{(L)}=a_1^{(L)}).
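To make the two steps above concrete, the sketch below runs a full forward pass in NumPy; the sigmoid choice for the hidden layers and the identity output unit (the regression case just described) are illustrative assumptions, not prescriptions:

    import numpy as np

    def sigmoid(a):
        # Logistic sigmoid f(a) = 1 / (1 + e^{-a})
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x0, W):
        """Forward pass through layers 1..L (sigmoid hidden units, identity output)."""
        L = max(W)
        xs = [np.concatenate(([1.0], x0))]        # x^{(0)} with bias x_0^{(0)} = 1
        for l in range(1, L + 1):
            a = xs[-1] @ W[l]                     # a_j^{(l)} = sum_i w_{ij}^{(l)} x_i^{(l-1)}
            if l == L:
                xs.append(a)                      # identity transform in the output layer
            else:
                xs.append(np.concatenate(([1.0], sigmoid(a))))  # squash, prepend bias
        return xs                                 # xs[-1] is the network output x^{(L)}

    # Usage with the hypothetical layer sizes d = [2, 3, 1] from above.
    rng = np.random.default_rng(0)
    W = {1: rng.standard_normal((3, 3)), 2: rng.standard_normal((4, 1))}
    print(forward(np.array([0.5, -1.2]), W)[-1])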

Back Propagation

Consider the error function at the output layer L with d^{(L)} nodes (in our example, d^{(L)}=1), and assume that there are M training examples.

  • E=\frac{1}{M}\sum_{m=1}^{M}e_m
  • (omitting the subscript m) e=\frac{1}{2}\sum_{k=1}^{d^{(L)}}(t_k-x_k^{(L)})^2
  • e=\frac{1}{2}\sum_{k=1}^{d^{(L)}}(t_k-f(a_k^{(L)}))^2
  • a_k^{(L)}=\sum_{n=0}^{d^{(L-1)}}x_n^{(L-1)}w_{nk}^{(L)}
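As a quick numerical check of these formulas, the snippet below (with made-up targets and outputs) computes the per-example error e and the average error E over M examples:

    import numpy as np

    # Made-up targets t and network outputs x^{(L)} for M = 3 examples,
    # each with d^{(L)} = 1 output node (purely illustrative numbers).
    t   = np.array([[1.0], [0.0], [1.0]])      # shape (M, d^{(L)})
    x_L = np.array([[0.8], [0.3], [0.6]])      # shape (M, d^{(L)})

    e = 0.5 * np.sum((t - x_L) ** 2, axis=1)   # e_m = 1/2 * sum_k (t_k - x_k^{(L)})^2
    E = e.mean()                               # E = (1/M) * sum_m e_m
    print(e, E)                                # [0.02  0.045 0.08 ] 0.0483...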

From the equations above, e is a function of a_1^{(L)},...,a_{d^{(L)}}^{(L)}, and each a_k^{(L)} is in turn a function of x_{0}^{(L-1)},...,x_{d^{(L-1)}}^{(L-1)} and w_{0k}^{(L)},...,w_{d^{(L-1)}k}^{(L)}; thus we can apply the chain rule to calculate the partial derivatives.

  • \frac{\partial e}{\partial w_{nk}^{(L)}}=\frac{\partial e}{\partial a_k^{(L)}}\frac{\partial a_k^{(L)}}{\partial w_{nk}^{(L)}}

We use \delta_k^{(l)} to denote the error term (sensitivity) at neuron k in layer l (for the output layer, l = L):

  • \delta_k^{(l)}=\frac{\partial e}{\partial a_k^{(l)}}
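Plugging in the definitions above, \frac{\partial a_k^{(L)}}{\partial w_{nk}^{(L)}}=x_n^{(L-1)}, so the gradient with respect to an output-layer weight is

  • \frac{\partial e}{\partial w_{nk}^{(L)}}=\delta_k^{(L)}x_n^{(L-1)}

and, for the squared error above, the output-layer delta itself is \delta_k^{(L)}=\frac{\partial e}{\partial a_k^{(L)}}=-(t_k-x_k^{(L)})f'(a_k^{(L)}).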

Now propagate the error from the output layer back to the hidden layer: e is a function of a_1^{(L)},...,a_{d^{(L)}}^{(L)}, each a_k^{(L)} is a function of x_{0}^{(L-1)},...,x_{d^{(L-1)}}^{(L-1)} and w_{0k}^{(L)},...,w_{d^{(L-1)}k}^{(L)}, and each x_{n}^{(L-1)} is in turn a function of a_n^{(L-1)}, as shown in the figure below.

[Figure: dependency of e on a hidden activation a_n^{(L-1)} through the output-layer activations]
\delta_n^{(L-1)}=\frac{\partial e}{\partial a_n^{(L-1)}}=\sum_{k=1}^{d^{(L)}} \frac{\partial e}{\partial a_k^{(L)}}\frac{\partial a_k^{(L)}}{\partial x_{n}^{(L-1)}}\frac{\partial x_{n}^{(L-1)}}{\partial a_n^{(L-1)}}=\frac{\partial x_{n}^{(L-1)}}{\partial a_n^{(L-1)}}\sum_{k=1}^{d^{(L)}} \frac{\partial e}{\partial a_k^{(L)}}\frac{\partial a_k^{(L)}}{\partial x_{n}^{(L-1)}}=f'(a_n^{(L-1)})\sum_{k=1}^{d^{(L)}}\delta_k^{(L)}w_{nk}^{(L)}
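Putting the pieces together, here is a minimal NumPy sketch of one gradient computation for a single training example; the sigmoid hidden units, the identity output unit, and the tiny layer sizes are assumptions made for illustration, not part of the derivation itself:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def sigmoid_prime(a):
        s = sigmoid(a)
        return s * (1.0 - s)                 # f'(a) for the logistic sigmoid

    def backprop(x0, t, W):
        """Gradients de/dw for one example (sigmoid hidden units, identity output)."""
        L = max(W)
        # Forward pass, storing x^{(l)} (with bias prepended) and a^{(l)}.
        xs, activations = [np.concatenate(([1.0], x0))], {}
        for l in range(1, L + 1):
            a = xs[-1] @ W[l]
            activations[l] = a
            x = a if l == L else sigmoid(a)  # identity output, sigmoid hidden
            xs.append(x if l == L else np.concatenate(([1.0], x)))

        # Output-layer delta: de/da_k^{(L)} = -(t_k - x_k^{(L)}) * f'(a_k^{(L)}),
        # with f the identity at the output, so f' = 1 here.
        deltas = {L: -(t - xs[L])}
        # Hidden deltas: delta_n^{(l)} = f'(a_n^{(l)}) * sum_k delta_k^{(l+1)} w_{nk}^{(l+1)}.
        for l in range(L - 1, 0, -1):
            back = W[l + 1][1:, :] @ deltas[l + 1]    # drop the bias row (n = 0)
            deltas[l] = sigmoid_prime(activations[l]) * back

        # Weight gradients: de/dw_{nk}^{(l)} = delta_k^{(l)} * x_n^{(l-1)}.
        return {l: np.outer(xs[l - 1], deltas[l]) for l in range(1, L + 1)}

    # Usage with the hypothetical layer sizes d = [2, 3, 1].
    rng = np.random.default_rng(0)
    W = {1: rng.standard_normal((3, 3)), 2: rng.standard_normal((4, 1))}
    grads = backprop(np.array([0.5, -1.2]), np.array([1.0]), W)
    print(grads[1].shape, grads[2].shape)    # (3, 3) (4, 1), matching the weight shapes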
