Neural network

If you browse the internet fairly often, you’ve probably noticed CAPTCHA prompts at some point. They typically appear when you request information from a server. Their goal is to prevent an automated tool from issuing such requests an unlimited number of times and thereby crashing the server.
So images are used in which the human brain recognizes numbers and/or letters, but a computer program does not. The question remains: why is a human superior to a computer in this regard? To answer that, it helps to first explore how a computer can recognize digits in pixel images at all, even digits that have been written rather sloppily.
Even at this point, we have to simplify things quite a bit. So let’s take a rectangle measuring just 20 by 30 pixels and limit ourselves to digits that are to be processed by a computer program. In practice, at least uppercase and lowercase letters would be added, or entirely different characters would appear. However, we will stick to the rule that the input is always a digit between 0 and 9.
Pixels on a raster must therefore be reliably assigned to one of these ten digits, a typical task for the sensor systems in autonomous driving, for example. Even things that are easy for children require a certain amount of processing power from a computer. Once again, the self-learning capability of neural networks will be of particular importance here.
20 by 30 equals 600 pixels, which is a rather low resolution. Nevertheless, you'll see that it's quite a challenge for our program. In memory, of course, the pixels are arranged in a single row. The analysis must now go through several stages to determine which digit is correct.
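How the 600 pixels end up as a single row in memory can be sketched in a few lines. The image contents here are invented purely for illustration; only the 20-by-30 dimensions come from the text.

```python
# A 20-by-30 pixel image as a nested list of brightness values
# (0.0 = dark, 1.0 = bright). In memory, the rows are laid out one
# after another, giving a single row of 600 numbers.
WIDTH, HEIGHT = 20, 30

# Hypothetical image: all background except one bright "ink" pixel.
image = [[0.0] * WIDTH for _ in range(HEIGHT)]
image[10][5] = 1.0  # chosen arbitrarily for this sketch

# Flatten row by row, just as the pixels are arranged in memory.
flat = [pixel for row in image for pixel in row]

print(len(flat))  # -> 600
```

The bright pixel at row 10, column 5 lands at position 10 * 20 + 5 = 205 of the flat row, which is exactly the arrangement the analysis stages work on.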
In practice, there will likely be many steps, but we’ll reduce them to just two because we only want to demonstrate the principle behind this kind of analysis. We call these steps 'layers'. Such a layer consists of many 600-pixel images, each containing individual structures. You could also call them comparison images. And for each image, there is a neuron that assigns it a rating.
So there are as many ratings as the layer has comparison images. And how do the two layers we selected differ from each other? In the complexity of the structures depicted in their images. Here’s an example to illustrate this. In the second layer, the images could contain parts of digits that are already quite developed; for example, there could be an image of a circle.
If its rating is high, that greatly narrows down the search for the right digit: the only possibilities left are '6', '8', or '9'. The second layer thus already contains quite distinct structures. These could also be longer lines in various images that all run more or less vertically. As you can see, these basically amount to parts of the digits '1' and '4'.
So if, in the second layer, the neurons for the image with a circle and the one with a vertical line each show high values, then only the digits '6' and '9' are possible. But how can the two be distinguished? It's simple: the second layer contains not only images with different circles, but also images in which the circles are arranged in different ways. The same goes for the long, roughly vertical lines. So if the line is above the circle, it is likely a '6'; otherwise, it is a '9'.
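The narrowing-down described above can be written out as a small decision sketch. The neuron names and the threshold are invented for illustration; only the logic (circle narrows to 6, 8, 9; the line's position decides between 6 and 9) comes from the text.

```python
# Hypothetical ratings from second-layer neurons (names are invented).
ratings = {
    "circle": 0.9,               # a closed loop was found
    "vertical_line_above": 0.8,  # a long vertical stroke above the loop
    "vertical_line_below": 0.1,  # a long vertical stroke below the loop
}

def guess_digit(r, threshold=0.5):
    # A circle narrows the candidates to 6, 8 and 9; the position of the
    # vertical line then decides between 6 (line above) and 9 (line below).
    if r["circle"] > threshold:
        if r["vertical_line_above"] > threshold:
            return 6
        if r["vertical_line_below"] > threshold:
            return 9
        return 8
    return None  # no circle: other digits would have to be considered

print(guess_digit(ratings))  # -> 6
```

In a real network, this decision is not an explicit if-chain but emerges from the weighted connections between neurons; the sketch only makes the narrowing-down visible.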
What matters here are the connections between the neurons associated with the circle patterns and those associated with the lines; these lead almost automatically to the correct digit. The connections are part of the neural network. Of course, they also exist between layers 1 and 2. What exactly does the first layer do? It is structured exactly like the second layer, except that it contains images of substructures.
What exactly is a substructure of, say, a circle? These are corners: beautifully rounded ones, angular ones, and sometimes just plain crooked lines. They appear in various arrangements in the individual images, and each neuron produces an overall score indicating how well the individual pixels match those of the original image. You could also say that the original image is scanned for corners.
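The pixel-by-pixel comparison can be sketched with a toy similarity measure. This is not the exact formula real networks use (they compute weighted sums); it only illustrates how comparing an image patch with a comparison image yields a score between 0 and 1.

```python
# Compare a patch of the original image with one comparison image,
# pixel by pixel. Both are flat lists of brightness values in [0, 1].

def similarity(original, template):
    # Per-pixel agreement: 1 - |difference| is 1 for a perfect match
    # and 0 for maximal contrast; the score is the average agreement.
    assert len(original) == len(template)
    total = sum(1.0 - abs(a - b) for a, b in zip(original, template))
    return total / len(original)

# Tiny 3x3 corner template and an identical patch, purely illustrative.
corner = [0.0, 1.0, 1.0,
          0.0, 1.0, 0.0,
          0.0, 0.0, 0.0]
patch  = list(corner)

print(round(similarity(patch, corner), 2))  # -> 1.0
```

A score near 1 means the patch contains the corner; sliding the template across the whole image is what "scanning for corners" amounts to.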
Once again, connections are necessary: when layer 2 checks for circles, layer 1 must have found at least four corner pieces; anything else doesn't make sense. In our example, the two layers and their neurons stand in for many more in actual programs. They are intended to show that the structures under investigation become increasingly complex from layer to layer; otherwise, it would be impossible to pick out digits successfully.
'Successful': that's the key word. It leads us to the question of how these many layers come about. How does the computer know what a corner is? Of course, it doesn't. Nevertheless, it creates the layers itself, or more precisely, the comparison images for each layer. How does it do that? It builds up structures that emerge, for example, even when just two pixels are paired together.
But there are endless possibilities, so the process by which such layers are formed must be constrained. Certain inputs must naturally be provided to the program: on the one hand, thousands of pixel images, and on the other, the respective solution, that is, the digit it is supposed to recognize in each one.
And now it can spend days, if necessary, creating different layers and testing how well they work in the search for the right digit. That is the learning process so often described. It is by no means possible without some connection to reality, which means that we very much remain in control of the computer's 'learning'.
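The trial-and-error learning described above can be caricatured in a few lines. Real networks adjust connection weights by gradient descent; here, as an assumption for illustration only, a random search over a single brightness threshold stands in for that process, and the labeled examples are invented.

```python
import random

random.seed(0)  # reproducible trials

# Hypothetical labeled data: (average brightness, is the digit a '1'?).
# The premise: a '1' covers few pixels, so its image is darker overall.
examples = [(0.10, True), (0.15, True), (0.60, False), (0.75, False)]

def accuracy(threshold):
    # Classify as '1' when brightness is below the threshold,
    # then count how many labels we got right.
    correct = sum((x < threshold) == label for x, label in examples)
    return correct / len(examples)

# The "learning": try many settings, keep whichever scores best.
best_t, best_acc = None, -1.0
for _ in range(1000):
    t = random.random()
    acc = accuracy(t)
    if acc > best_acc:
        best_t, best_acc = t, acc

print(best_acc)  # the best setting classifies all four examples correctly
```

The key point survives the simplification: without labeled examples, i.e. without a connection to reality, there is nothing to measure success against.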
You could move on to the next chapter right now, since the main points have already been covered. Before that, however, we will briefly discuss the intensity of the pixels. Once again, we are simplifying things: we ignore color, which is sometimes very important for image recognition, and consider only differences in brightness.
So when the pixels of the original image are compared one by one with those of an image in layer 1, the corresponding neuron produces a score between 0 and 1 (computed to at least two decimal places), where values near 1 indicate a high degree of similarity. However, there might happen to be some gray pixels in the bright areas of the reference image that contribute no real insight but slightly distort the score.
It can be helpful to subtract a certain amount from each rating in order to filter out these weak matches. The analysis then focuses on the essentials, and selectivity increases. However, if this value is set too high, the recognition rate could decrease. Presumably, the program can also try out different values during self-learning and evaluate its success.
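The subtraction trick can be shown in two lines. The subtracted amount of 0.3 is an arbitrary choice for this sketch; in network terms, subtracting a bias and clipping negative results to zero is essentially what the widely used ReLU activation does.

```python
# Subtract a fixed amount (a "bias") from each rating and clip
# negative results to zero, suppressing weak, accidental matches.

def filtered(score, bias=0.3):
    return max(0.0, score - bias)

scores = [0.95, 0.40, 0.25, 0.10]  # hypothetical neuron ratings
print([round(filtered(s), 2) for s in scores])  # -> [0.65, 0.1, 0.0, 0.0]
```

Only the strong matches survive; raising the bias too far would also wipe out legitimate moderate matches, which is the drop in recognition rate the text warns about.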