Deep Learning Notes 1 – NN

Resource

 

Introduction

Neural networks are the basic building block of deep learning. A neural network model can be viewed as multiple stacked layers of logistic-regression-like units, as shown in the figure below.

[Figure: a multi-layer feed-forward neural network]

In the figure above, w_{ij}^{(l)} denotes the weight from neuron i in layer l-1 to neuron j in layer l, for l = 1,2,...,L; the index i refers to the source node and j to the target node. d^{(l)} denotes the number of neurons in layer l = 0,1,2,...,L.
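As a concrete illustration of this notation, here is a minimal NumPy sketch (the layer sizes are made up, purely for illustration) that stores the layer-l weights as a matrix W[l] whose rows are indexed by the source neuron i (row 0 holding the bias weights) and whose columns by the target neuron j:

    import numpy as np

    # Hypothetical layer sizes d^{(0)}, ..., d^{(L)}, chosen only to illustrate the indexing.
    d = [2, 3, 1]        # d[l] = number of neurons in layer l, so L = 2
    L = len(d) - 1

    # W[l][i, j] = w_{ij}^{(l)}: weight from neuron i of layer l-1 to neuron j of layer l.
    # Row i = 0 holds the bias weights, matching x_0^{(l-1)} = 1.
    rng = np.random.default_rng(0)
    W = {l: rng.standard_normal((d[l - 1] + 1, d[l])) for l in range(1, L + 1)}

    print(W[1].shape)    # (d^{(0)} + 1, d^{(1)}) = (3, 3)
    print(W[2].shape)    # (d^{(1)} + 1, d^{(2)}) = (4, 1)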

Forward Propagation

For a given input vector x^{(0)} (with x_0^{(0)}=1 as the bias term), which is fed into the network at the input layer, the information is transformed and passed forward layer by layer, from the input layer to the output layer. The output layer yields a single output x_1^{(L)} in the case of binary classification or regression (only one neuron in the output layer), and multiple outputs in the case of multi-class classification (multiple neurons in the output layer).

For each neuron, the transformation of the incoming information comprises the two steps below:

  • Generate the activation: a_j^{(l)}=\sum_{i=0}^{d^{(l-1)}} w_{ij}^{(l)}x_i^{(l-1)}
  • Activate/transform: x_j^{(l)}=f(a_j^{(l)}), where f is typically one of the activation functions below:

Logistic Sigmoid function: f(a)=\frac{1}{1+e^{-a}}

Hyperbolic Tangent function (tanh): f(a)=\frac{e^{a}-e^{-a}}{e^{a}+e^{-a}}

In the case of regression, the transform in the output layer uses the identity function (x_1^{(L)}=a_1^{(L)}).
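To make the two steps above concrete, the sketch below runs a full forward pass in NumPy; the sigmoid choice for the hidden layers and the identity output unit (the regression case just described) are illustrative assumptions, not prescriptions:

    import numpy as np

    def sigmoid(a):
        # Logistic sigmoid f(a) = 1 / (1 + e^{-a})
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x0, W):
        """Forward pass through layers 1..L (sigmoid hidden units, identity output)."""
        L = max(W)
        xs = [np.concatenate(([1.0], x0))]        # x^{(0)} with bias x_0^{(0)} = 1
        for l in range(1, L + 1):
            a = xs[-1] @ W[l]                     # a_j^{(l)} = sum_i w_{ij}^{(l)} x_i^{(l-1)}
            if l == L:
                xs.append(a)                      # identity transform in the output layer
            else:
                xs.append(np.concatenate(([1.0], sigmoid(a))))  # squash, prepend bias
        return xs                                 # xs[-1] is the network output x^{(L)}

    # Usage with the hypothetical layer sizes d = [2, 3, 1] from above.
    rng = np.random.default_rng(0)
    W = {1: rng.standard_normal((3, 3)), 2: rng.standard_normal((4, 1))}
    print(forward(np.array([0.5, -1.2]), W)[-1])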

Back Propagation

Consider the error function at the output layer L with d^{(L)} nodes (in our example, d^{(L)}=1), and assume that there are M training examples.

  • E=\frac{1}{M}\sum_{m=1}^{M}e_m
  • (omitting the subscript m) e=\frac{1}{2}\sum_{k=1}^{d^{(L)}}(t_k-x_k^{(L)})^2
  • e=\frac{1}{2}\sum_{k=1}^{d^{(L)}}(t_k-f(a_k^{(L)}))^2
  • a_k^{(L)}=\sum_{n=0}^{d^{(L-1)}}x_n^{(L-1)}w_{nk}^{(L)}
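As a quick numerical check of these formulas, the snippet below (with made-up targets and outputs) computes the per-example error e and the average error E over M examples:

    import numpy as np

    # Made-up targets t and network outputs x^{(L)} for M = 3 examples,
    # each with d^{(L)} = 1 output node (purely illustrative numbers).
    t   = np.array([[1.0], [0.0], [1.0]])      # shape (M, d^{(L)})
    x_L = np.array([[0.8], [0.3], [0.6]])      # shape (M, d^{(L)})

    e = 0.5 * np.sum((t - x_L) ** 2, axis=1)   # e_m = 1/2 * sum_k (t_k - x_k^{(L)})^2
    E = e.mean()                               # E = (1/M) * sum_m e_m
    print(e, E)                                # [0.02  0.045 0.08 ] 0.0483...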

From the equations above, e is a function of a_1^{(L)},...,a_{d^{(L)}}^{(L)}, and each a_k^{(L)} is in turn a function of x_{0}^{(L-1)},...,x_{d^{(L-1)}}^{(L-1)} and w_{0k}^{(L)},...,w_{d^{(L-1)}k}^{(L)}; thus we can apply the chain rule to calculate the partial derivatives.

  • \frac{\partial e}{\partial w_{nk}^{(L)}}=\frac{\partial e}{\partial a_k^{(L)}}\frac{\partial a_k^{(L)}}{\partial w_{nk}^{(L)}}

We use \delta_k^{(l)} to denote the error term (sensitivity) at neuron k in layer l (for the output layer, l = L):

  • \delta_k^{(l)}=\frac{\partial e}{\partial a_k^{(l)}}
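Plugging in the definitions above, \frac{\partial a_k^{(L)}}{\partial w_{nk}^{(L)}}=x_n^{(L-1)}, so the gradient with respect to an output-layer weight is

  • \frac{\partial e}{\partial w_{nk}^{(L)}}=\delta_k^{(L)}x_n^{(L-1)}

and, for the squared error above, the output-layer delta itself is \delta_k^{(L)}=\frac{\partial e}{\partial a_k^{(L)}}=-(t_k-x_k^{(L)})f'(a_k^{(L)}).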

Now propagate the error from the output layer back to the hidden layer: e is a function of a_1^{(L)},...,a_{d^{(L)}}^{(L)}, each a_k^{(L)} is a function of x_{0}^{(L-1)},...,x_{d^{(L-1)}}^{(L-1)} and w_{0k}^{(L)},...,w_{d^{(L-1)}k}^{(L)}, and each x_{n}^{(L-1)} is in turn a function of a_n^{(L-1)}, as shown in the figure below.

[Figure: dependency of e on a hidden activation a_n^{(L-1)} through the output-layer activations]
\delta_n^{(L-1)}=\frac{\partial e}{\partial a_n^{(L-1)}}=\sum_{k=1}^{d^{(L)}} \frac{\partial e}{\partial a_k^{(L)}}\frac{\partial a_k^{(L)}}{\partial x_{n}^{(L-1)}}\frac{\partial x_{n}^{(L-1)}}{\partial a_n^{(L-1)}}=\frac{\partial x_{n}^{(L-1)}}{\partial a_n^{(L-1)}}\sum_{k=1}^{d^{(L)}} \frac{\partial e}{\partial a_k^{(L)}}\frac{\partial a_k^{(L)}}{\partial x_{n}^{(L-1)}}=f'(a_n^{(L-1)})\sum_{k=1}^{d^{(L)}}\delta_k^{(L)}w_{nk}^{(L)}
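Putting the pieces together, here is a minimal NumPy sketch of one gradient computation for a single training example; the sigmoid hidden units, the identity output unit, and the tiny layer sizes are assumptions made for illustration, not part of the derivation itself:

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def sigmoid_prime(a):
        s = sigmoid(a)
        return s * (1.0 - s)                 # f'(a) for the logistic sigmoid

    def backprop(x0, t, W):
        """Gradients de/dw for one example (sigmoid hidden units, identity output)."""
        L = max(W)
        # Forward pass, storing x^{(l)} (with bias prepended) and a^{(l)}.
        xs, activations = [np.concatenate(([1.0], x0))], {}
        for l in range(1, L + 1):
            a = xs[-1] @ W[l]
            activations[l] = a
            x = a if l == L else sigmoid(a)  # identity output, sigmoid hidden
            xs.append(x if l == L else np.concatenate(([1.0], x)))

        # Output-layer delta: de/da_k^{(L)} = -(t_k - x_k^{(L)}) * f'(a_k^{(L)}),
        # with f the identity at the output, so f' = 1 here.
        deltas = {L: -(t - xs[L])}
        # Hidden deltas: delta_n^{(l)} = f'(a_n^{(l)}) * sum_k delta_k^{(l+1)} w_{nk}^{(l+1)}.
        for l in range(L - 1, 0, -1):
            back = W[l + 1][1:, :] @ deltas[l + 1]    # drop the bias row (n = 0)
            deltas[l] = sigmoid_prime(activations[l]) * back

        # Weight gradients: de/dw_{nk}^{(l)} = delta_k^{(l)} * x_n^{(l-1)}.
        return {l: np.outer(xs[l - 1], deltas[l]) for l in range(1, L + 1)}

    # Usage with the hypothetical layer sizes d = [2, 3, 1].
    rng = np.random.default_rng(0)
    W = {1: rng.standard_normal((3, 3)), 2: rng.standard_normal((4, 1))}
    grads = backprop(np.array([0.5, -1.2]), np.array([1.0]), W)
    print(grads[1].shape, grads[2].shape)    # (3, 3) (4, 1), matching the weight shapes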
