Lei Feng Network editor's note: This article is by Slyvia. It looks in detail at 1) how machines and humans differ in recognizing faces, and 2) what explains the result of the man-machine contest.
After the Go man-machine match, in which humans, represented by Lee Sedol, lost to computers, represented by AlphaGo, humans have now taken on a machine in a "face recognition" man-machine match. This time a face recognition robot born in Hangzhou played against Wang Xi, the "eye of a ghost".
On the 2015 season of "Most Powerful Brain", faced with 520 seemingly identical glasses of water, he correctly picked out the glass the judges had selected and also correctly stated that it had been rotated by 15 degrees. That one contest made him famous.
Before revealing the result, we first need to understand how a computer "looks at" people.
Both the human eye and the computer like to focus on "points". But when the human eye tells objects apart, it tends to judge by the points that differ, whereas the computer looks for the points that are the same.
The computer "observes" the target through a camera, which passes the object to the computer as a picture; this is the image recognition process. Taobao's image search and Baidu's image recognition feature are applications of this kind.
During recognition, the computer looks for "out of the ordinary" points to match against the images stored in its "brain". These distinctive points are usually called "corner points" or "key points". They have particular properties in the image, such as a local maximum or minimum in grayscale (i.e., image brightness) or certain gradient features (the gradient describes how much the grayscale changes).
For example, a distinctive mark on a person's face can serve as a reference point when the computer recognizes that face. So how do computers find these "corner points"?
Corner map
When the computer looks for such a pixel, it checks candidates one by one. To decide whether a point is a "corner point", a window of suitable size (for example, 3×3) is chosen and its center is moved across (i.e., visits) every pixel of the image, checking each time whether the center point differs significantly from the points around it.
So when the window is in a smooth area (Fig. a), the image inside the window does not change whichever direction the window moves. When the window sits on an edge and moves along the edge direction (Fig. b), the image inside the window also does not change. When the window is at a "corner point" (Fig. c), the image inside the window changes whichever direction it moves, so the point is judged to be a "corner point".
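The sliding-window test described above can be sketched in a few lines. This is only an illustration under assumptions the article does not state: the image is a plain list of rows of gray values, the window is 3×3, it is shifted by one pixel in eight directions, and the name corner_score is made up for this example. It is in the spirit of the classic Moravec detector, not the code used by the robot.

```python
def corner_score(img, r, c, half=1):
    """Smallest change seen when the window centred on (r, c) is shifted by
    one pixel in each of 8 directions. Flat areas and straight edges score 0;
    corners score high. (r, c) must be at least half + 1 pixels from the border."""
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    best = None
    for dr, dc in shifts:
        ssd = 0  # sum of squared differences between shifted and original window
        for i in range(-half, half + 1):
            for j in range(-half, half + 1):
                diff = img[r + i + dr][c + j + dc] - img[r + i][c + j]
                ssd += diff * diff
        if best is None or ssd < best:
            best = ssd
    return best


# Toy image: a bright square in the lower-right part of a dark 10x10 picture.
img = [[100 if r >= 5 and c >= 5 else 0 for c in range(10)] for r in range(10)]
print(corner_score(img, 2, 2))  # smooth area (like Fig. a): no change
print(corner_score(img, 7, 5))  # on an edge (like Fig. b): no change along the edge
print(corner_score(img, 5, 5))  # corner of the square (like Fig. c): changes in every direction
```

Taking the minimum over all directions is what separates an edge from a corner: an edge still has one direction in which nothing changes, a corner has none.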
Different corner detection algorithms use different windows. FAST, which detects "corner points" with a circular window, and the SIFT operator, which uses a 16×16 sampling window, are two of the main algorithms for detecting image feature points.
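To give a feel for the circular window, here is a rough sketch of the FAST segment test. The details are assumptions that the article does not give: a ring of 16 pixels at radius 3 around the candidate, a brightness threshold of 20, and a required run of 9 contiguous ring pixels (the FAST-9 variant). The function name is_fast_corner is made up for this example, and real implementations add speed-ups and non-maximum suppression that are left out here.

```python
# 16 offsets (dx, dy) on a circle of radius 3 around the candidate pixel.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, r, c, threshold=20, n=9):
    """True if at least n contiguous ring pixels are all brighter than the
    centre by more than threshold, or all darker by more than threshold.
    (r, c) must be at least 3 pixels away from the image border."""
    p = img[r][c]
    ring = [img[r + dy][c + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1: look for brighter runs, -1: look for darker runs
        flags = [sign * (v - p) > threshold for v in ring]
        run = 0
        for flag in flags + flags:  # doubled so a run may wrap around the ring
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False


corner = [[0 if r >= 4 and c >= 4 else 100 for c in range(9)] for r in range(9)]
edge = [[0 if c >= 4 else 100 for c in range(9)] for r in range(9)]
print(is_fast_corner(corner, 4, 4), is_fast_corner(edge, 4, 4))
```

On the toy images, the pixel at the tip of the dark square passes the test while a pixel on a straight edge does not, because on an edge only about half of the ring differs from the centre.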
What about "matching"? Once feature points have been found, they are described, and image matching can begin. The precondition for the description is that the gradient distribution of the pixels around each feature point is used to assign an orientation to that key point.
For example, in the SIFT operator, the orientations of the sample points relative to the feature point are Gaussian-weighted to obtain a 4×4×8 = 128-dimensional feature descriptor.
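Where the number 128 comes from is easier to see in code. The sketch below is a heavily simplified, SIFT-like descriptor, not the full SIFT operator: it assumes an 18×18 gray patch (so the inner 16×16 samples have neighbors on every side), a Gaussian weight with sigma 8, and hard assignment of each sample to one of 8 orientation bins. It skips the rotation to the key point's dominant orientation and the interpolation between bins that real SIFT performs. The name sift_like_descriptor is made up for this example.

```python
import math


def sift_like_descriptor(patch):
    """patch: 18x18 grid of gray values. The inner 16x16 samples are split
    into 4x4 cells, each with an 8-bin orientation histogram: 4*4*8 = 128."""
    hist = [0.0] * 128
    for y in range(16):
        for x in range(16):
            py, px = y + 1, x + 1  # position inside the 18x18 patch
            gx = patch[py][px + 1] - patch[py][px - 1]
            gy = patch[py + 1][px] - patch[py - 1][px]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Gaussian weight: samples near the key point count more.
            weight = math.exp(-((x - 7.5) ** 2 + (y - 7.5) ** 2) / (2 * 8.0 ** 2))
            cell = (y // 4) * 4 + (x // 4)                      # which of 16 cells
            bin_index = int(angle / (2 * math.pi) * 8) % 8      # which of 8 directions
            hist[cell * 8 + bin_index] += weight * magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A patch whose brightness rises from left to right: every gradient points right.
ramp = [[float(x) for x in range(18)] for _ in range(18)]
descriptor = sift_like_descriptor(ramp)
print(len(descriptor))
```

For the ramp patch, all of the weight lands in the first orientation bin of each cell, which is what one would expect when every gradient points the same way.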
After this series of processing steps, the face becomes the "ghostly look" in the image above.
Once the feature point descriptors are available, they can be compared with the feature point descriptors of the template image. For each point, the candidate with the highest score is the best match, and the process is repeated for all feature points. Computer "face recognition" rests on the theory above. It also performs some additional comparisons, such as comparing how the facial features are laid out in the picture.
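The comparison step can be sketched as a brute-force nearest-neighbor search. The assumptions here go beyond the article: descriptors are plain lists of numbers, the "score" is taken to be Euclidean distance (smaller distance means a better match), and the name match_descriptors is made up for this example. Real systems usually add a check that rejects ambiguous matches.

```python
import math


def match_descriptors(query, template):
    """For every query descriptor, find the closest template descriptor.
    Returns a list of (query_index, template_index, distance) tuples."""
    matches = []
    for qi, q in enumerate(query):
        distances = [math.dist(q, t) for t in template]
        ti = min(range(len(template)), key=distances.__getitem__)
        matches.append((qi, ti, distances[ti]))
    return matches


# Tiny 3-dimensional stand-ins for real 128-dimensional descriptors.
query = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
template = [[1.0, 0.0, 0.1], [0.0, 0.1, 1.0]]
for qi, ti, dist in match_descriptors(query, template):
    print(qi, "->", ti, round(dist, 3))
```

Every query point is compared with every template point, so the cost grows with the product of the two counts; this is one reason recognition speed matters so much later in the article.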
So, what was the result?
Game 1: find 3 people among 150 photos of internet celebrities.
Game 2: find 3 people among 300 photos of internet celebrities.
Game 3: identify 2 people from 80 childhood photos of internet celebrities.
Winning two of the three games, the human side, represented by "Water Brother", came out on top. That the robot could take one game suggests that computer image recognition has a promising future; its overall defeat shows that problems remain.
Analysis of the wins and losses
The faces chosen for this contest, those of internet celebrities, all look as if they came out of the same mold, which undoubtedly made the contest harder.
Faced with these look-alike internet celebrities, the robot coped well with recent photographs but struggled with the childhood photos.
When the robot identifies a face, part of the program judges by the distances between the facial features. But in childhood the facial features sit relatively close to the center of the face, and they spread out with age. On top of that, some internet celebrities have had minor cosmetic surgery. Even their own mothers could not recognize them, let alone the robot.
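Why distance-based comparison struggles with childhood photos can be shown with a toy calculation. Everything in this sketch is hypothetical: the four landmark names, their coordinates, the normalization by the distance between the eyes, and the function names distance_signature and signature_gap. The article only says that part of the program looks at distances between facial features.

```python
import math
from itertools import combinations


def distance_signature(landmarks):
    """landmarks: dict mapping a feature name to (x, y). Returns all pairwise
    distances divided by the eye-to-eye distance, so photo size cancels out."""
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    names = sorted(landmarks)
    return [math.dist(landmarks[a], landmarks[b]) / scale
            for a, b in combinations(names, 2)]


def signature_gap(face_a, face_b):
    """Distance between two signatures; 0 means identical proportions."""
    return math.dist(distance_signature(face_a), distance_signature(face_b))


# Made-up coordinates: the child's features sit closer to the centre of the face.
adult = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60), "mouth": (50, 80)}
child = {"left_eye": (30, 45), "right_eye": (70, 45), "nose": (50, 55), "mouth": (50, 65)}
enlarged = {name: (2 * x, 2 * y) for name, (x, y) in adult.items()}

print(signature_gap(adult, enlarged))  # same face at twice the size
print(signature_gap(adult, child))     # same person, different proportions
```

A resized photo of the same adult face gives essentially no gap, while the childhood proportions give a clearly larger one, even though both faces are meant to belong to the same person.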
Losing the third game is therefore excusable. But why was the robot able to win the first game? Because it is fast.
If you watched the video above closely, you may remember this line: when the computer recognizes a face, the photo is divided into four regions and broken down into hundreds of key points.
This is a key step in speeding up image recognition, and it is used in almost all image recognition. The technical term for it is the "image pyramid".
Image Pyramid
The image pyramid was originally used in machine vision and image compression. An image pyramid is a set of images of progressively lower resolution arranged in the shape of a pyramid. At the bottom is a high-resolution representation of the image to be processed (G0 in the figure above), and at the top is a low-resolution approximation (G3 in the figure above). Moving up the pyramid, size and resolution decrease and recognition gets faster. (Note: the higher the resolution, the larger and sharper the picture.)
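A pyramid with levels G0 to G3 can be built with very little code. The sketch below is a simplification under stated assumptions: the image is a list of rows of gray values, and each level is produced by averaging 2×2 blocks of the level below, whereas a classic Gaussian pyramid blurs with a Gaussian kernel before subsampling. The function names downsample and build_pyramid are made up for this example.

```python
def downsample(img):
    """Halve width and height by averaging each 2x2 block of pixels."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
              + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(w)]
            for r in range(h)]


def build_pyramid(img, levels=4):
    """Return [G0, G1, ...]: G0 is the input, each later level is half the size.
    Stops early if a level becomes too small to halve again."""
    pyramid = [img]
    for _ in range(levels - 1):
        current = pyramid[-1]
        if len(current) < 2 or len(current[0]) < 2:
            break
        pyramid.append(downsample(current))
    return pyramid


image = [[float(r * 16 + c) for c in range(16)] for r in range(16)]
for level, g in enumerate(build_pyramid(image)):
    print("G%d: %dx%d" % (level, len(g), len(g[0])))
```

Each level has a quarter of the pixels of the one below it, so a search that starts at the top level and only refines promising regions lower down touches far fewer pixels than a search over G0 alone.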
In addition, humans are subject to outside distractions during a contest and psychological pressure affects them considerably, whereas the computer is unaffected. So non-technical factors also played a part in "Water Brother" losing that game.
However, there are deeper reasons behind the robot's two losses.
In the contest, both sides had to move around to identify the images, which put the robot at a real disadvantage. Image recognition becomes unstable when the camera is moving, just as the scenery outside a car window blurs more and more as the car speeds up.
In addition, the stage lighting on site was strong and caused reflections, which also affected the robot's recognition.
In everyday life, when reflected light is so strong that we cannot see a screen, we can draw the curtains or use something to block the reflection. The computer has to measure the ambient brightness in real time, adjust its image brightness threshold according to the measurement, and only then do the comparison.
At present, machines cannot adjust to light as quickly as the human eye. However, even if "Water Brother" can defeat this robot, there is only one "Water Brother", whereas there can be many robots. Will we win next time?
Lei Feng Network note: This article is published on Lei Feng Network with the authorization of ARC Augmented Reality (WeChat ID: arinchina). To reprint it, please contact the original author, credit the author and source, and do not delete any content.