When we see the final result, such as a composition written in the style of a famous musician or a voice synthesized by a computer algorithm, it seems like a miracle, some kind of magic. In reality there is no magic: mathematics and programming are behind all of it. With their help, scientists are trying to model the human brain and to develop a mathematical model of the human mind.

### The neuron is the basis of any mind

As we know from biology, the brain of any living creature (including humans) consists of a huge network of nerve cells called neurons. Through tiny branches called dendrites, neurons receive information from outside the cell. Then, through their axons, they pass on (or do not pass on) impulses, which are weak electrical signals that carry information. This is how “thinking” happens. Put very simply, it looks like this: a “hot” signal comes in, the evaluating mechanism fires, and a signal goes to the muscles: “pull the hand away”. Artificial intelligence works on the same principle and is built from artificial neurons.

In biology, a neuron is a very complex device that must not only transmit signals but also keep itself alive. Artificial neurons are simpler: their job boils down to summing the incoming impulses, evaluating the sum, and passing a signal further along the axon. If the sum of the input impulses exceeds a threshold value, an output impulse is generated; if not, no impulse is generated.

### Building a mathematical model of a neuron

Now that we know how a neuron works, we can build a mathematical model of it, the building block of a neural network. Suppose we have a set of input parameters $x_1, x_2, x_3, \ldots, x_n$, and each of them has its own weight $w_1, w_2, w_3, \ldots, w_n$. The value of each input signal is multiplied by its weight, the products are added together, and the resulting sum is passed to the activation function. Our adder therefore works according to the formula:

$S = x_1 w_1 + x_2 w_2 + x_3 w_3 + \ldots + x_n w_n$

This simple formula describes how a single artificial neuron combines its inputs, and it is relatively easy to program.

Let's take an example: you are wondering whether or not to go fishing. Let's evaluate a number of factors that affect the success of the trip. The first is how sunny it is. Fish bite better in cloudy weather; when the sun is bright, the fish go deeper. So we set $x_1 = 0$ for bright weather and $x_1 = 1$ for cloudy weather. The second factor is atmospheric pressure. At high pressure, large fish become lethargic, so $x_2 = 0$ at high pressure and $x_2 = 1$ at low pressure. The third factor is wind speed. The stronger the wind, the bigger the waves and the worse the fishing: $x_3 = 1$ for a weak wind and $x_3 = 0$ for a strong one. The fourth factor is a change in the weather. When the weather changes abruptly, the fish sense the change in ambient temperature and their appetite decreases, so $x_4 = 0$ for an abrupt change and $x_4 = 1$ for stable weather.

Each factor has its own weight. The most important are wind and pressure, $w_3 = 5$ and $w_2 = 4$, while the other two factors have less influence on the decision: $w_1 = w_4 = 1$. Suppose the day is calm and cloudy ($x_3 = 1$, $x_1 = 1$), the pressure is high ($x_2 = 0$), and the weather has changed abruptly ($x_4 = 0$).

Let's write our formula, taking into account the factors, to find out the weighted value of the signal:

S = 1 * 1 + 0 * 4 + 1 * 5 + 0 * 1

S = 6
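The same calculation can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_sum` is our own choice, and the input and weight values are the ones picked for the fishing example.

```python
def weighted_sum(inputs, weights):
    """Adder: multiply each input by its weight and add up the products."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


# Values of x1..x4 and w1..w4 as chosen in the fishing example
inputs = [1, 0, 1, 0]
weights = [1, 4, 5, 1]

s = weighted_sum(inputs, weights)
print(s)  # 6
```

Keeping the adder as a separate function lets us reuse it with any number of inputs, as long as there is one weight per input.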

### Activation functions

The weighted sum is six, but is that a lot or a little? To answer, we use this figure in the next step, the activation function. The activation function describes how the neuron's output signal depends on the weighted sum: $Y = f(S)$.

Several standard functions can serve as the activation function. The first is the Heaviside function (also known as the unit step function): the output is 1 if the weighted sum reaches the threshold and 0 otherwise. This is the simplest option and is very easy to program.
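Here is a minimal sketch of the Heaviside function applied to the fishing problem. The threshold of 5 is purely our assumption for illustration (the example does not fix one), and so is the convention that a sum exactly equal to the threshold produces 1.

```python
def heaviside(s, threshold=0.0):
    """Return 1 if the weighted sum reaches the threshold, otherwise 0."""
    return 1 if s >= threshold else 0


THRESHOLD = 5  # assumed value, chosen only for this illustration

print(heaviside(6, THRESHOLD))  # 1: the neuron fires, we go fishing
print(heaviside(4, THRESHOLD))  # 0: no output signal, we stay home
```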

Another option is the sigmoid, or logistic, function. It is defined by the expression $Y = \frac{1}{1 + e^{-aS}}$, where the parameter $a$ determines the steepness of the curve. The larger the weighted sum $S$, the closer the output $Y$ will be to one (while never reaching it). Conversely, the smaller the weighted sum $S$, the closer the output $Y$ will be to zero.

Compared to the unit step function, the logistic function is more flexible: it has a derivative at every point, and that derivative can be expressed through the function itself, $Y' = aY(1 - Y)$.
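The logistic function and its derivative might be sketched like this. The function names are our own, and the sketch does not guard against overflow for very large negative sums.

```python
import math


def sigmoid(s, a=1.0):
    """Logistic function; the parameter a controls the steepness."""
    return 1.0 / (1.0 + math.exp(-a * s))


def sigmoid_derivative(s, a=1.0):
    """Derivative expressed through the function itself: a * Y * (1 - Y)."""
    y = sigmoid(s, a)
    return a * y * (1.0 - y)


print(sigmoid(0))             # 0.5, the midpoint of the curve
print(sigmoid(6))             # about 0.9975, close to one but never reaching it
print(sigmoid_derivative(6))  # about 0.0025, the curve is almost flat here
```

For the fishing example, a weighted sum of six gives an output very close to one, which we could read as a strong "yes".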

To build a more realistic model of a nerve cell, the hyperbolic tangent function is used; it is the one biologists use more often. Its output ranges from -1 to 1 rather than from 0 to 1.

$Y = \tanh(S)$
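In Python, the hyperbolic tangent is available in the standard `math` module. The sketch below also illustrates that it is a stretched and shifted logistic function: the hyperbolic tangent of S equals 2 · sigmoid(2S) - 1. The helper names are our own.

```python
import math


def sigmoid(s):
    """Logistic function with steepness a = 1."""
    return 1.0 / (1.0 + math.exp(-s))


def tanh_activation(s):
    """Hyperbolic tangent activation; the output lies between -1 and 1."""
    return math.tanh(s)


for s in (-2.0, 0.0, 0.5, 6.0):
    via_sigmoid = 2.0 * sigmoid(2.0 * s) - 1.0
    print(s, tanh_activation(s), via_sigmoid)  # the last two columns agree
```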

### Types of neural networks

When we looked at the hypothetical fishing problem, we assigned weights to the input signals. Neurons in a network are connected to each other, and similar weights on those connections amplify or attenuate the signal as it is passed along.

There are several basic neural network models. A single-layer neural network has the simplest topology: it performs its calculations and passes the results straight to the outputs. A multilayer neural network, which uses so-called hidden layers, is much more capable. These layers process the information and produce intermediate data, which is then passed further along the network's connections. Training the hidden layers turns the neural network into a kind of factory floor, where each hidden layer does a specific job on the "part" passing through it. If signals in the network travel in one direction only, it is called a feed-forward network. But the topology can also include feedback connections; in this case we get a feedback (recurrent) network. Because signals circulate in them, feedback networks can restore or complete signals, mimicking something like human short-term memory.
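To make the idea of layers more concrete, here is a rough sketch of a feed-forward pass through a network with one hidden layer. The layer sizes (4 inputs, 3 hidden neurons, 1 output) and all weight values are arbitrary numbers we made up for illustration; an actual network would obtain its weights from training.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def layer_forward(inputs, layer_weights):
    """Each neuron of the layer sums its weighted inputs and applies the sigmoid."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_weights)))
        for neuron_weights in layer_weights
    ]


def feed_forward(inputs, layers):
    """Pass the signal through the layers in one direction, from input to output."""
    signal = inputs
    for layer_weights in layers:
        signal = layer_forward(signal, layer_weights)
    return signal


# Hypothetical weights: one row per neuron, one value per incoming connection
hidden_layer = [
    [0.2, -0.4, 0.5, 0.1],
    [-0.3, 0.8, 0.1, -0.2],
    [0.6, 0.1, -0.5, 0.3],
]
output_layer = [[0.7, -0.6, 0.4]]

result = feed_forward([1, 0, 1, 0], [hidden_layer, output_layer])
print(result)  # a list with one value between 0 and 1
```

The output of the hidden layer is exactly the "intermediate data" mentioned above: it is never shown to the user and only serves as input for the next layer.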

### Neural network training

An untrained neural network is of little use on its own. Its capabilities are revealed only after training. So what is training, and how exactly does a neural network learn?

In our fishing problem, we set the weights of the factors and the threshold value by hand, based on our own experience. We treated that situation as a neural network with just one neuron. But what would we do if the network were more complex and contained, say, one hundred neurons?

Intuitively, it seems that the more neurons, the smarter the network. But a hundred people with no engineering background cannot design an airplane just because there are a hundred of them. In the same way, simply increasing the number of computational elements does not help by itself; it only makes the network heavier. There is no point in changing the adder either, since it performs only one function. The activation function is fixed as well. That leaves only the connection weights, and these are what must be adjusted during training. We have to find weights for which the output signal suits us. This is the essence of learning. The human brain learns in a very similar way: synapses (the junctions between axons and dendrites) act as weight regulators, changing how much signal they let through and thereby amplifying or attenuating it.

If we train the network on only one example, it will simply memorize the correct answer for that example and will not work correctly on anything else. For example, the network recognizes one person's face in a photograph but fails on another face because it differs slightly. Therefore, training always uses a whole sample of input values, and the weights are adjusted to fit that sample. Once the network is trained, it is tested. The test sample on which the network is checked is a kind of “exam” that shows how well the weights have been adjusted.
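As a rough illustration of adjusting weights, the sketch below trains a single neuron with the classic perceptron learning rule. This is our choice of technique for the example; the article does not prescribe a particular algorithm. We assume a "teacher" that labels every combination of the four fishing factors using the hand-set weights and an assumed threshold of 5, keep every fourth example aside as a test sample, and learn a bias term in place of a fixed threshold.

```python
from itertools import product


def predict(weights, bias, inputs):
    """Single neuron with a step activation; the bias acts as a learned threshold."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if s >= 0 else 0


def train_perceptron(samples, max_epochs=2000):
    """Nudge the weights after every wrong answer until the training sample is learned."""
    weights, bias = [0] * len(samples[0][0]), 0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, inputs)]
                bias += error
        if mistakes == 0:
            break
    return weights, bias


def accuracy(weights, bias, samples):
    correct = sum(predict(weights, bias, x) == t for x, t in samples)
    return correct / len(samples)


def teacher(inputs):
    """Assumed reference answers: hand-set weights and an assumed threshold of 5."""
    return 1 if sum(x * w for x, w in zip(inputs, [1, 4, 5, 1])) >= 5 else 0


data = [(x, teacher(x)) for x in product([0, 1], repeat=4)]
train = [d for i, d in enumerate(data) if i % 4 != 0]  # 12 training examples
test = [d for i, d in enumerate(data) if i % 4 == 0]   # 4 examples kept for the "exam"

weights, bias = train_perceptron(train)
print("learned weights:", weights, "bias:", bias)
print("training accuracy:", accuracy(weights, bias, train))
print("test accuracy:", accuracy(weights, bias, test))
```

The learned weights need not match the hand-set ones: any set of weights that gives the right answers on the training sample will do, and only the test sample shows how well they carry over to examples the neuron has not seen.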

Let's say we are developing an artificial intelligence that predicts the weather. For training, we must take real weather data for past dates: temperature changes, atmospheric pressure, precipitation, and so on. To teach a network to play chess, you can use another program that draws on a database of played games.

There is also a special way of training a neural network: without a teacher (unsupervised learning). In this case, the network has no reference answers to compare against and separates the input signals on its own. Suppose the task of such a network is to identify an airplane in a photograph. As it learns, the network begins to group the input signals into classes. It picks out the features by which it can tell vehicles apart: wheels, windows, wings. This process is called clustering. A self-learning system is extremely flexible and comes closest to how humans learn. After all, you acquired most of your skills through self-study. Think back to when you first learned to read and realized that four slanted strokes make the letter "M". Here is the metro sign, the letter "M"; here is a shop with the letter "M" in its name. And there, just four twigs lying on the ground in a funny way, look exactly like the letter "M"!
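Clustering can be illustrated without any neural network at all. The sketch below uses k-means, a classic clustering algorithm that we substitute here for simplicity; the points, the starting centroids, and the number of groups are all made up for the example. No reference answers are supplied: the groups emerge from the data alone.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to groups and moving the group centers."""
    labels = []
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty group's center
        centroids = new_centroids
    return labels, centroids


# Made-up 2D "features": two visibly separate groups of objects
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```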

### Conclusion

As you can see, there is no magic behind the phrase "artificial intelligence": it is nothing but sound logic and mathematics. Just as the processor in your laptop or smartphone consists of the simplest switches, artificial intelligence consists of neurons, connections, and a long, long training process.

We deliberately kept program code to a minimum so as not to complicate the article, and confined ourselves to the simplest mathematics. We have explained in plain terms how the machine brain works. Everything else you see around you, the "smart" home, the "smart" autopilot, the "smart" voice assistant, is the result of applying the calculations above and implementing them in software.