Beginner's intro to Neural Networks and Machine Learning

Hi folks! Today I am going to share with you some very interesting things about neural networks and machine learning. I believe you will find it interesting even if you are not a machine learning practitioner, because I am not going to use fancy mathematical terms here. Most importantly, unless you have frozen your mind with unrealistic ideas and textbook jargon, you can understand anything, because you are human - Homo sapiens - the most intelligent creation we know so far. And history tells us that all great ideas came from simple thoughts, not from complex mathematical derivations.

### What is a neural network?

First of all, have a look at the pic above. It is a neural network telling us that the pic we have given it is that of a dog. It is a three-layered neural network (input layer, hidden layer and output layer). The question is: can a three-layered neural network identify a dog? Yes, of course (though very high accuracy networks could have hundreds of layers). The only approximation we have made in the above pic is the number of purple balls (called neurons) in each layer. Depending on the resolution of the pic, it could be in the hundreds of thousands. For example, a 64 ✕ 64 resolution grayscale pic will have 64 ✕ 64 = 4096 inputs in the input layer. So you see that each input is basically a pixel intensity value of the pic.
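To make the "one input per pixel" idea concrete, here is a minimal Python sketch. The image is made up for illustration (a hypothetical 64 ✕ 64 grid of grayscale intensities), and flattening it row by row is just one common way to turn a picture into a list of inputs.

```python
# Hypothetical 64 x 64 grayscale image: a grid of pixel intensities (0-255).
# The values are invented purely for illustration.
HEIGHT, WIDTH = 64, 64
image = [[(row + col) % 256 for col in range(WIDTH)] for row in range(HEIGHT)]

# Flatten the grid row by row into one long list: one input per pixel.
inputs = [pixel for row in image for pixel in row]

print(len(inputs))  # 4096
```

Each number in `inputs` would feed one neuron (purple ball) of the input layer.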

So we fed the pixel intensity values into the network and the network told us that it is a dog. How did the neural network do that? There are a lot of complicated details here. But don't worry, once you grasp the core idea, you can work out the details yourself. A lot of posts out there on the internet explain the details the hard way, which I think is neither necessary nor useful for an intelligent mind. So I only want you to get inspired and read on.

First of all, let's try to understand what we mean by the neurons (purple balls) and the layers. Again, I am not going to give you bookish definitions, because I hate jargon. Instead, let's consider an example. This example requires something called a sigmoid function. You can easily understand it by looking at the plot below:

You can see that for values of x well to the right of the origin, the value of y is close to 1, and for values of x well to the left of the origin, the value of y is close to zero. At the origin itself, y is exactly 0.5. This is the sigmoid function, which can be expressed mathematically as:

$$S(x) = {1\over{1+e^{-x}}} = {e^x\over{e^x+1}}$$
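If you would rather see numbers than a plot, the formula above can be sketched in a few lines of Python. The function name `sigmoid` and the sample x values are my own choices for illustration.

```python
import math

def sigmoid(x):
    """S(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Sample a few points on either side of the origin.
for x in (-10, -1, 0, 1, 10):
    print(x, sigmoid(x))
```

Running it shows the output creeping towards 0 on the far left, sitting at 0.5 at the origin, and creeping towards 1 on the far right.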

Now let's consider the following figure:

This is a basic (unit) neural network with an input and an output layer. The question here is: what is the output node (neuron) doing? It is performing the following operation:

$$y = {S(1\times(-30) + 20x_1 + 20x_2)}$$

Here $$x_1$$ and $$x_2$$ are the inputs, +20 and +20 are the weights and -30 is the bias. Forget about the terms weights and biases for now. Think of them as plain numbers we have picked for the nodes (neurons) $$+1$$, $$x_1$$ and $$x_2$$.

Now here is the interesting thing: let's see what the network turns out to be if $$x_1$$ and $$x_2$$ are binary numbers, i.e. $$x_1, x_2 \in \{0, 1\}$$.

If $$x_1 = 0$$ and $$x_2 = 0$$, $$y = S(-30+0+0) = S(-30) \approx 0$$

If $$x_1 = 1$$ and $$x_2 = 0$$, $$y = S(-30+20+0) = S(-10) \approx 0$$

If $$x_1 = 0$$ and $$x_2 = 1$$, $$y = S(-30+0+20) = S(-10) \approx 0$$

If $$x_1 = 1$$ and $$x_2 = 1$$, $$y = S(-30+20+20) = S(10) \approx 1$$

In tabular form, we have got the following:

| $$x_1$$ | $$x_2$$ | $$y$$ |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

And yes, this is the truth table of the logical AND function. Thus our basic neural network with the parameter values $$-30$$, $$+20$$ and $$+20$$ computes the logical AND function.
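You don't have to take my word for it. Here is a small Python sketch of the same unit network; the names `sigmoid` and `and_neuron` are mine, and rounding the output to the nearest whole number is just a convenient way to read "close to 0" and "close to 1" as 0 and 1.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def and_neuron(x1, x2):
    # Bias -30 on the constant +1 node, weight +20 on each of x1 and x2.
    return sigmoid(1 * (-30) + 20 * x1 + 20 * x2)

# Walk through all four binary input combinations.
for x1 in (0, 1):
    for x2 in (0, 1):
        y = and_neuron(x1, x2)
        print(x1, x2, round(y))
```

Only the last combination, 1 and 1, prints a 1 in the output column.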

And more interestingly, if you replace -30 with -10 in the above network, it becomes the logical OR function. Try it out yourself. And not just AND and OR: you can get other logical functions too, such as NAND and NOR, by trying different parameter values.
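Here is one way to try it out, as a rough Python sketch. The AND and OR values are the ones from the text; the NAND values are an extra example of my own (the AND values with their signs flipped), and the names `unit`, `PARAMS` and `truth_table` are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unit(bias, w1, w2, x1, x2):
    # The same unit network, with its three numbers passed in as parameters.
    return sigmoid(bias + w1 * x1 + w2 * x2)

# (bias, w1, w2) for each function. NAND is my own added example.
PARAMS = {
    "AND": (-30, 20, 20),
    "OR": (-10, 20, 20),
    "NAND": (30, -20, -20),
}

def truth_table(name):
    bias, w1, w2 = PARAMS[name]
    # Outputs for the inputs (0,0), (0,1), (1,0), (1,1), rounded to 0 or 1.
    return [round(unit(bias, w1, w2, x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]

for name in PARAMS:
    print(name, truth_table(name))
```

Nothing in the code changes between the three functions except the numbers in `PARAMS`.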

So what's the deal here? The deal is that the same unit network can be made to perform various functions just by tweaking the parameter values.

By now you should have an idea about what the neurons in the hidden layer part of our dog identifier network must be doing. Yes, each neuron is doing something similar to what our basic network is doing.

Now let's be clear about what the weights and biases are and what an activation function is. The output function of our basic network can be written as:

$$y = {S(w_0x_0 + w_1x_1 + w_2x_2)}$$ $$= {S(B + w_1x_1 + w_2x_2)}$$

where $$B = w_0x_0$$ is the bias term (with $$x_0 = 1$$, the constant $$+1$$ node), $$w_1$$ and $$w_2$$ are the weights, and $$x_1$$ and $$x_2$$ are the inputs. The sigmoid function is the activation function here. Basically, the activation function produces the output of each neuron in the network. It may be noted that today we tend to prefer other activation functions like ReLU, and sometimes softmax, over sigmoid. We don't need to go into the details of those at the moment.
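To see how the three pieces (weights, bias, activation function) fit together, here is a minimal Python sketch. The function `neuron` and its parameter names are my own, and ReLU is included only to show that the activation can be swapped; it is the standard "keep positives, clip negatives to zero" function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # ReLU: passes positive values through, clips negative values to 0.
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)

# Our AND network, first with sigmoid, then with ReLU swapped in.
print(neuron([1, 1], [20, 20], -30))
print(neuron([1, 1], [20, 20], -30, activation=relu))
```

The weighted sum is the same in both calls; only the last step, the activation, differs.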

Now let's try vectorising (representing in matrix form) the above equation. This is necessary because the operations taking place in a neural network are essentially a bunch of matrix multiplications.

For our basic neural network, we can represent the inputs and the weights (including the bias) as follows:

$$X = \begin{bmatrix}x_0\\x_1\\x_2\end{bmatrix}, W = \begin{bmatrix}w_0\\w_1\\w_2\end{bmatrix}$$

Then our output equation becomes:

$$Y = S\left(\begin{bmatrix}x_0 & x_1 & x_2\end{bmatrix}\begin{bmatrix}w_0\\w_1\\w_2\end{bmatrix}\right) \text{ or, } Y = S(X^TW)$$
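As a rough sketch of the vector form in Python: the product $$X^TW$$ is just a sum of element-wise products, which then goes through the sigmoid as in the earlier output equation. The helper `dot` and the variable names are mine, and I use plain lists here to keep it self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    """Row vector times column vector: sum of element-wise products."""
    return sum(ai * bi for ai, bi in zip(a, b))

W = [-30, 20, 20]  # w_0 (the bias), w_1, w_2 of our AND network

# Each row is one X^T = [x_0, x_1, x_2], with x_0 fixed at 1.
rows = [
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
]

outputs = [sigmoid(dot(x, W)) for x in rows]
print([round(y) for y in outputs])
```

Stacking the four input rows like this is the first step towards treating the whole thing as one matrix multiplication.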

That was a bit of mathematical detail; I hope you didn't get bored. By now it should be clear that every line in a neural network basically stands for a weight value. So for recognising a dog in a pic, we have a set of weights (or weight vector $$W$$). Now the most important question: how did we come to know the weights? Here starts the idea of Machine Learning. The whole science of Machine Learning is about learning the weights (at least till now).

### What is Machine Learning?

Since we arrived at the idea of Machine Learning from the basics, there is no problem looking at an established definition:

Machine learning is an application of artificial intelligence (AI) that provides computer systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

Note the terms automatically learn, improve and experience. These terms can be best appreciated when we see a real-life working example of Machine Learning. I want to dedicate a separate article to this, because something as interesting as this needs special attention. So the most important question, how did the neural network figure out the weight matrix for itself, will be answered in the next post.