- Premanand S

# Deep Learning - An Intuition behind the technology

Updated: Jul 7

Thanks for the support and motivation on the past blogs, guys! Keep supporting me, encouraging me, and correcting me if I am wrong in any of my views on these topics. I hope the last blog cleared up the concepts of Artificial Intelligence, and that the links in it gave some further insight. In this post, we will discuss Deep Learning. Some of you asked me to add sample examples to the posts. Thanks for the suggestion; I will start doing that from the next post, which is on Machine Learning (my actual area of interest). You may then ask: why cover Programming, Data Science, Artificial Intelligence, and now Deep Learning first?

The answer is simple: to understand Machine Learning, we need to know what is happening in and around the other technologies related to it. That's why I collected and drafted the intuition behind these topics. In the near future, I plan to cover each of these technologies (if possible) at the depth you expect, with programming and more technical detail. The previous posts (Programming, Data Science, and Artificial Intelligence) and this post (Deep Learning) aim to give insight into and understanding of each technology rather than a deep technical treatment. I hope this justifies the approach. Once again, thanks for the support and suggestions you have given so far, and I hope they continue.

OK, let's dive into Deep Learning (DL). Before getting into the technical definition, let's first build some understanding.

"The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms." - Andrew Ng (source: Wired)

### Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones.

### The diagram below shows the analogy between the Human Brain (natural function) and a Neural Network (which artificially mimics the human brain).

For a cartoon-level intuition of DL, watch the video below:

https://www.youtube.com/watch?v=6M5VXKLf4D4&list=PLEiEAq2VkUUIYQ-mMRAGilfOKyWKpHSip

I hope that gives you a feel for DL.

Before going deeper into DL, we need to understand Neural Networks (NN), which will help us understand DL better.

__Neural Network:__

Like our human brain, a Neural Network has a neuron-like structure that processes data (or signals); it is a mathematical, artificial model that mimics the function of the human brain. Each neuron receives signals as inputs, multiplies them by weights, sums them up, and applies a non-linear function (see the figure below). These neurons are stacked next to each other and organized in layers.
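To make the neuron description concrete, here is a minimal Python sketch of a single neuron. It assumes a bias term and the sigmoid as the non-linear function; both are common choices but are illustrative here, and the input values and weights are made up.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, then a non-linearity."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)


# Hypothetical values: two input signals, two weights, zero bias.
# z = 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0.0, and sigmoid(0.0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```

Changing a weight changes how strongly that input pushes the output toward 0 or 1, which is exactly what training adjusts.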

Neural Networks learn the desired function from large amounts of data with the help of an iterative training process built around an algorithm called backpropagation. We feed the network with data, it produces an output, we compare that output with the desired one using a **loss function** (a measure of how well the model's predictions match the expected outcomes), and we readjust the weights based on the difference.
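A loss function turns "how far off is the output?" into a single number. The sketch below shows two common choices, mean squared error and binary cross-entropy, on hypothetical predictions for three labelled examples; the numbers are made up purely for illustration.

```python
import math


def mse(predicted, target):
    """Mean squared error: the average of the squared differences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def binary_cross_entropy(predicted, target, eps=1e-12):
    """Cross-entropy for 0/1 labels; eps keeps log() away from log(0)."""
    total = 0.0
    for p, t in zip(predicted, target):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(target)


targets = [1, 0, 1]          # desired outputs
good = [0.9, 0.1, 0.8]       # predictions close to the targets
bad = [0.2, 0.7, 0.4]        # predictions far from the targets
print(mse(good, targets), mse(bad, targets))
print(binary_cross_entropy(good, targets), binary_cross_entropy(bad, targets))
```

In both cases the better predictions get the smaller loss, so "reduce the loss" becomes a precise training goal.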

Typically, a neural network model is trained with the stochastic gradient descent (SGD) optimization algorithm, and the gradients needed for each weight update are computed by backpropagation of error.
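As a rough sketch of how SGD and backpropagation fit together, the snippet below trains one sigmoid neuron on a made-up, one-dimensional dataset. It assumes binary cross-entropy loss, for which the chain rule reduces the error signal at the neuron to `prediction - label`; a real network repeats this chain rule through every layer, and the learning rate and epoch count here are arbitrary.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(data, epochs=200, lr=0.5, seed=0):
    """Fit one sigmoid neuron (weight w, bias b) with stochastic gradient descent.

    With sigmoid output and binary cross-entropy loss, the chain rule gives
    dLoss/dz = prediction - label, so dLoss/dw = (prediction - label) * x.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)          # "stochastic": visit examples in random order
        for x, y in samples:
            p = sigmoid(w * x + b)    # forward pass
            error = p - y             # error signal propagated back to the neuron
            w -= lr * error * x       # step each parameter against its gradient
            b -= lr * error
    return w, b


# Hypothetical data: negative inputs are class 0, positive inputs are class 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train_sgd(data)
print([round(sigmoid(w * x + b), 3) for x, _ in data])
```

After training, the outputs for the class-0 inputs sit well below 0.5 and those for the class-1 inputs well above it.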

For example, consider a dataset used for detecting cats: a hundred images, a mix of cats and dogs. We label every cat picture as 1 and every other picture as 0. Our mission is simple: put an image into the network, and it returns a floating-point (decimal) number that is used to predict which class the image belongs to. If the output were exactly 1, we could say without doubt that it is a cat, and if it were 0, that it isn't. However, a NN doesn't work that way: the result is a real number such as 0.1, 0.5, or 0.8, and from this number we have to decide whether the image is a cat or not, typically by comparing it with a threshold such as 0.5.
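Turning the network's real-valued output into a yes/no answer can be as simple as comparing it with a threshold. The 0.5 cutoff below is a common default rather than a rule, and the helper name `to_label` is hypothetical.

```python
def to_label(probability, threshold=0.5):
    """Convert the network's output into a class: 1 = cat, 0 = not a cat."""
    return 1 if probability >= threshold else 0


outputs = [0.1, 0.5, 0.8]             # example network outputs from the text
print([to_label(p) for p in outputs])  # [0, 1, 1]
```

Raising the threshold (say, to 0.9) makes the classifier say "cat" less often but with more confidence when it does.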

Gradient descent is one of the most popular optimization algorithms and by far the most common way to optimize neural networks. Every state-of-the-art Deep Learning library also contains implementations of various algorithms that improve on plain gradient descent (see, e.g., the documentation of Lasagne, Caffe, and Keras).
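The core idea of gradient descent fits in a few lines: repeatedly take a small step in the direction opposite to the slope. This sketch minimizes a toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3); the learning rate and number of steps are arbitrary choices.

```python
def gradient_descent(grad, start, lr=0.1, steps=100):
    """Walk downhill by stepping against the gradient `steps` times."""
    x = start
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Toy loss f(w) = (w - 3)^2 has its minimum at w = 3.
w_best = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(round(w_best, 4))  # 3.0
```

In a neural network the single number `w` becomes millions of weights, and backpropagation supplies the gradient for each one.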

**The intuition behind NN:**

Let’s use an example to make this clearer. Say we want to identify images that contain a tree. We feed the network any kind of image and it produces an output. Since we know whether the image actually contains a tree, we can compare the output with the truth and adjust the network.

As we pass in more and more images, the network makes fewer and fewer mistakes. Now we can feed it an unknown image, and it will tell us whether the image contains a tree.

Diagram of what one node might look like:

The layers are made of nodes. A node is just a place where computation happens, loosely patterned on a neuron in the human brain, which fires when it encounters sufficient stimuli. A node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs with regard to the task the algorithm is trying to learn. These input-weight products are summed and then the sum is passed through a node’s so-called activation function, to determine whether and to what extent that signal should progress further through the network to affect the ultimate outcome, say, an act of classification. If the signals pass through, the neuron has been “activated.”

A node layer is a row of those neuron-like switches that turn on or off as the input is fed through the net. Each layer’s output is simultaneously the subsequent layer’s input, starting from an initial input layer receiving your data.
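The layer-to-layer flow can be sketched as a forward pass in which each layer's output list becomes the next layer's input. The 2 → 3 → 1 network below uses hand-picked, untrained weights and assumes ReLU in the hidden layer and sigmoid at the output, purely for illustration.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: each row of `weights` drives one neuron."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Pass the input through every layer; each output feeds the next layer."""
    for weights, biases, activation in layers:
        x = dense_layer(x, weights, biases, activation)
    return x


# Hypothetical 2 -> 3 -> 1 network with hand-picked (untrained) weights.
network = [
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, 0.0, 0.0], relu),  # hidden layer
    ([[1.0, 1.0, 1.0]], [0.0], sigmoid),                              # output layer
]
print(forward([1.0, 2.0], network))
```

Here the hidden layer outputs [0.0, 3.0, 0.0] (two neurons stay "off"), and the output neuron turns their sum into sigmoid(3), roughly 0.95. Making the network deeper simply means adding more entries to `network`.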

Pairing the model’s adjustable weights with input features is how we assign significance to those features with regard to how the neural network classifies and clusters input.

Deep-learning networks are distinguished from the more commonplace single-hidden-layer neural networks by their depth; that is, the number of node layers through which data must pass in a multi-step process of pattern recognition.

Over the years, researchers came up with amazing improvements on the original idea. Each new architecture targeted a specific problem and achieved better accuracy and speed.

The diagram above shows how the process of Deep Learning differs from that of other, classic technologies.

__Layman Understanding – DL Simplified:__

__Case 1: Inception to DL__

I hope you have watched the movie Inception (2010): confusing, but still enjoyable for its concept and its blend of technology and entertainment. The main idea behind the movie is that one can plant an idea in someone’s subconscious mind through a dream, and that idea then shapes the person’s actions. This is done through a creative concept called shared dreaming. Put simply, this is what Deep Learning is: Inception performed on a machine rather than a person.

In the Inception timeline above, depending on what you want to achieve, you may have to go several layers deep into the machine’s neural structure (technically this is called forward propagation; in the diagram, these are the different levels of dreams), and then perform a kick (backpropagation) to reinforce the learning. The neural nodes (the shared-state dreamers) use an activation function, the architect (ReLU, Sigmoid, and many more), to proceed to the next (deeper) level of inception.

Sometimes a node/person may get killed in the current layer (a vanishing gradient) and enter a state of limbo, risking the entire inception (learning) process. Fortunately, an expert chemist (bias function / leaky ReLU) can prepare the right concoction to avoid this. You may have to repeat this process several times (machine learning epochs) until you reach convergence, i.e., until the machine no longer deviates much from its intended behavior.
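Outside the analogy, "killed" nodes and vanishing gradients come down to the local derivatives that backpropagation multiplies together, one per layer. This simplified sketch ignores the weights and just multiplies activation derivatives: the sigmoid's derivative is at most 0.25, so ten layers shrink the signal below one millionth, while ReLU's derivative is exactly zero for negative inputs and leaky ReLU keeps a small non-zero slope (0.01 here, an arbitrary choice).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # largest value is 0.25, reached at z = 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0    # exactly zero for negative inputs: a "dead" node


def leaky_relu_grad(z, slope=0.01):
    return 1.0 if z > 0 else slope  # small but never zero for negative inputs


# Simplification: weights ignored, one local derivative multiplied per layer.
depth = 10
print(sigmoid_grad(0.0) ** depth)               # best case for sigmoid: 0.25 ** 10
print(relu_grad(-1.0), leaky_relu_grad(-1.0))   # 0.0 0.01
```

Even in the sigmoid's best case the learning signal reaching the first layer is about one millionth of the original, which is why the lower layers of deep sigmoid networks learn so slowly.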

Just as an architect is needed to build the dream landscape, you need a feature architect. You seed the machine with features, and it creates more and more elaborate features as the inception process progresses. Each hidden layer generates features from its input. These features get more refined as you proceed down the layers, and become more amenable to learning.

So basically, you go deep into the machine’s consciousness and perform inception. The machine then “learns” to identify faces, handwriting, optical characters, and things we are too lazy to imagine.

__Case 2: Birthday cake preparation to DL__

Once, a rich man had a friend who was a cook by profession. On his son's birthday, the rich man decided to celebrate differently by baking the cake himself, with tips from his cook friend and a piece of the friend's cake as a sample to compare against. So he bought all the cake ingredients in abundance (in case he had to start over after a failure).

Let’s say these are the steps you follow to make any cake.

1. Choose a Recipe (vanilla, German, or chocolate cake)

2. Choose the Right Baking Pans (dark or shiny, size and shape)

3. Allow Ingredients to Reach Room Temperature (eggs, butter)

4. Prep the Pans (grease and flour the pan, or line it with waxed or parchment paper)

5. Preheat the Oven

6. Stir Together Dry Ingredients (cocoa powder, flour, baking powder and/or baking soda, and salt)

7. Combine the Butter and Sugar

8. Add Eggs One at a Time

9. Alternate Adding Dry and Wet Ingredients (add the flour mixture and some of the milk to the butter-egg-sugar mixture, beating on low speed after each addition until combined)

10. Pour Batter Into Pans and Bake

11. Check Cake for Doneness

12. Cool the Cake

13. Assemble the Cake

14. Add the First Coat of Frosting (let the cake stand for 30 minutes so the frosting sets up)

15. Frost and Decorate

After making a cake with this recipe, you compare it with the sample cake and realize that yours tastes different in many ways: it is too sweet, too crispy, too eggy, too burnt, and so on.

So you decide to make another cake, but this time you slightly tweak the ingredients to make it taste more like the sample cake. You reduce the amount of sugar in step 7, add more milk in step 9, reduce the amount of egg in step 8, adjust the baking time and temperature, and so on.

This time the cake tastes more like the sample cake, but not exactly the same, so you continue the process by tweaking the ingredients. (Remember, you have unlimited resources.)

So, essentially, this is what you did:

1. You made a cake

2. Compared it with the sample cake

3. Found the differences

4. Changed the recipe to reduce the difference

5. Made cake with this tweaked recipe

You repeated this process until the comparison with the sample cake was good enough.

The important thing here is that in each iteration you compared the result with the optimum result and tried to reduce the difference between them by going back to the previous step.

Deep learning is exactly like this: you keep changing the things that affect the output to make it more and more like the output you want. One major difference is that, when making a cake, you know which step influences which property and in what way (adding more sugar sweetens the cake, for instance), but in deep learning you don't know this in advance.
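The cake loop can be sketched in code under loud assumptions: the `taste_gap` score below is hypothetical (a squared difference between ingredient amounts), and since we "don't know" how each ingredient affects the result, the sketch finds out by nudging each ingredient slightly and watching how the gap changes. Real deep learning frameworks get these directions exactly and far more cheaply through backpropagation; the trial nudges (finite differences) are used here only to stay close to the story.

```python
def taste_gap(recipe, sample):
    """Hypothetical 'how different does it taste' score: squared ingredient differences."""
    return sum((recipe[k] - sample[k]) ** 2 for k in sample)


def improve_recipe(recipe, gap, lr=0.1, nudge=1e-4, rounds=100):
    """Repeat: bake, compare, find how each ingredient moves the gap, adjust."""
    recipe = dict(recipe)                   # keep the original recipe untouched
    for _ in range(rounds):
        base = gap(recipe)                  # make the cake and compare it
        slopes = {}
        for k in recipe:                    # probe: does a bit more of k widen or narrow the gap?
            tweaked = dict(recipe)
            tweaked[k] += nudge
            slopes[k] = (gap(tweaked) - base) / nudge
        for k in recipe:                    # change the recipe to reduce the difference
            recipe[k] -= lr * slopes[k]
    return recipe


# Hypothetical amounts (grams for sugar and milk, a count for eggs).
sample_cake = {"sugar": 150.0, "eggs": 3.0, "milk": 200.0}
first_try = {"sugar": 250.0, "eggs": 5.0, "milk": 120.0}


def gap_to_sample(recipe):
    return taste_gap(recipe, sample_cake)


final = improve_recipe(first_try, gap_to_sample)
print({k: round(v, 2) for k, v in final.items()})
```

Each pass through the outer loop is one "bake, compare, tweak" round from the list above, and the recipe drifts toward the sample cake without ever being told the sample's amounts directly.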

__History of DL:__

The history of any technology is worth knowing, not only to learn its origins and applications, but also to understand the ideas behind it.

The history of Deep Learning can be traced back to 1943 when Walter Pitts and Warren McCulloch created a computer model based on the neural networks of the human brain.

Henry J. Kelley is given credit for developing the basics of a continuous Back Propagation Model in 1960.

In 1962, a simpler version based only on the chain rule was developed by Stuart Dreyfus.

While the concept of backpropagation (the backward propagation of errors for purposes of training) did exist in the early 1960s, it was clumsy and inefficient, and would not become useful until 1985.

The earliest efforts in developing Deep Learning algorithms came from Alexey Grigoryevich Ivakhnenko (who developed the *Group Method of Data Handling*) and Valentin Grigorʹevich Lapa (author of *Cybernetics and Forecasting Techniques*) in 1965. They used models with polynomial (complicated equations) activation functions, which were then analyzed statistically.

The first “convolutional neural networks” were used by Kunihiko Fukushima. Fukushima designed neural networks with multiple pooling and convolutional layers. In 1979, he developed an artificial neural network, called Neocognitron, which used a hierarchical, multilayered design.

Backpropagation, the use of errors in training Deep Learning models, evolved significantly in 1970. This was when Seppo Linnainmaa wrote his master’s thesis, which included FORTRAN code for backpropagation. Unfortunately, the concept was not applied to neural networks until 1985.

In 1989, Yann LeCun provided the first practical demonstration of backpropagation at Bell Labs. He combined convolutional neural networks with backpropagation to read handwritten digits.

This period also saw the second AI winter (from 1985 into the 1990s), which also affected research on neural networks and Deep Learning.

In 1995, Corinna Cortes and Vladimir Vapnik developed the support vector machine (a system for mapping and recognizing similar data).

LSTM (long short-term memory) for recurrent neural networks was developed in 1997, by Sepp Hochreiter and Juergen Schmidhuber.

The next significant evolutionary step for Deep Learning took place in 1999, when computers became faster at processing data and GPUs (graphics processing units) were developed.

Around the year 2000, *The Vanishing Gradient Problem* appeared. It was found that “features” (lessons) in the lower layers, those closest to the input, were barely being learned, because the error signal shrank as it was propagated backward and almost no learning signal reached those layers.

In 2001, a research report by META Group (now part of Gartner) described the challenges and opportunities of data growth as three-dimensional: volume, velocity, and variety.

In 2009, Fei-Fei Li, an AI professor at Stanford, launched ImageNet, a free database that has grown to more than 14 million labeled images.

By 2011, the speed of GPUs had increased significantly, making it possible to train convolutional neural networks “without” the layer-by-layer pre-training.

In 2012, AlexNet, a convolutional neural network trained on GPUs, won the ImageNet Large Scale Visual Recognition Challenge by a wide margin.

In 2012, Google Brain released the results of an unusual project known as *The Cat Experiment*. The free-spirited project explored the difficulties of “unsupervised learning.”

__DL Architectures:__

Deep learning models make use of several algorithms. While no one network is considered perfect, some algorithms are better suited to perform specific tasks. To choose the right ones, it’s good to gain a solid understanding of all primary algorithms.

Autoencoders

Multilayer Perceptron Neural Network (MLPNN)

Backpropagation

Convolutional Neural Network (CNN)

Recurrent Neural Network (RNN)

Long Short-Term Memory (LSTM)

Generative Adversarial Network (GAN)

Restricted Boltzmann Machine (RBM)

Deep Belief Network (DBN)

__Applications of DL:__

Some interesting application areas of DL:

Deep Dreaming (lets the computer hallucinate on top of an existing photo, producing a reassembled, dream-like image)

Demographic and Election Predictions

Photo Descriptions

Pixel Restorations

Image – Language translation

Automatic handwriting generation

Adding sound to silent movies

Colorization of black and white images

Self-driving cars

__Some of the best DL online courses are,__

Deep Learning Specialization by Andrew Ng (deeplearning.ai)

Deep Learning A-Z™: Hands-On Artificial Neural Networks (Udemy)

MIT Introduction to Deep Learning 6.S191 (introtodeeplearning.com)

Deep Learning Nanodegree Program (Udacity)

Advanced Deep Learning & Reinforcement Learning – DeepMind (https://www.youtube.com/playlist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs)

CS231n: Convolutional Neural Networks for Visual Recognition – Stanford University

I hope the discussion above, with all its different illustrations and concepts, has made clear what Deep Learning is, how it works, and why it matters. This blog is not meant to teach you everything, but to create insight and interest through different perspectives drawn from different sources. See you in the next blog, on another interesting topic, '**Machine Learning**', with examples.