CAPTCHA is the part of a security service that sits closest to ordinary users, and it is an almost indispensable step in any product's login flow. From the original text CAPTCHA to knowledge-based CAPTCHA, then behavior-trajectory CAPTCHA, and now intelligent non-cognitive CAPTCHA, vendors have spent more than ten years iterating. Every one of those updates is the result of a nonstop contest between CAPTCHA vendors and black and gray market operators.
In general, a CAPTCHA works as follows:
- The client sends a request;
- The server returns the CAPTCHA information (such as images, a session ID, and prompt text);
- The client renders the CAPTCHA, and the user clicks or slides according to the prompt;
- The client collects the user's verification data and sends it to the server for verification, and the server returns the verification result.
Recently, AISecurius has officially launched intelligent non-cognitive verification.
Intelligent non-cognitive verification, as the name suggests, lets trusted users pass without being challenged. When a user requests a CAPTCHA for the first time, the client automatically collects and reports device information, and the server assesses the user's current risk using risk-control strategies, human-machine identification models, and behavioral feature models. Users judged risk-free need no verification, while suspicious users must complete a further verification to pass. In high-risk business scenarios, intelligent non-cognitive verification can also be switched to a strong verification mode for stricter control.
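The decision described above can be pictured as a small routing function. The sketch below is an assumption-laden illustration, not AISecurius's actual logic: the `RiskSignals` fields, the 0.5 threshold, and the rule of taking the larger of the two model scores are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class RiskSignals:
    policy_hit: bool       # a risk-control rule matched (hypothetical signal)
    bot_score: float       # 0.0 human-like .. 1.0 machine-like (hypothetical)
    behavior_score: float  # 0.0 normal .. 1.0 abnormal (hypothetical)


def decide(signals, strong_mode=False, threshold=0.5):
    """Return "pass" for trusted users and "challenge" for everyone else."""
    if strong_mode:
        # Strong verification mode: always challenge, whatever the risk.
        return "challenge"
    if signals.policy_hit:
        return "challenge"
    # Assumed combination rule: the more suspicious model decides.
    risk = max(signals.bot_score, signals.behavior_score)
    return "challenge" if risk >= threshold else "pass"
```

For example, `decide(RiskSignals(False, 0.1, 0.2))` lets the user through, while the same signals with `strong_mode=True` force a challenge.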
The security of a CAPTCHA is mainly determined by two factors:
First, the image itself: how difficult it is for a machine to recognize the image content;
Second, the behavioral data generated during verification, which is used to judge whether the process was performed by a human.
Here we focus on the image and how it is used against black and gray market activity.
Let's take anti-crawling as an example and look at the role images play in CAPTCHA attack and defense.
The most popular CAPTCHA types today are the rotating image and the puzzle. Both require associative ability, which is difficult for a machine to acquire, and the verification process feels like a small game, so they improve the user experience while strengthening defense.
Baidu rotate CAPTCHA (left), AISecurius rotate CAPTCHA (center), AISecurius puzzle (right)
However, these CAPTCHA types are not perfect, and crawler teams have corresponding ways of cracking them.
For example, a crawler typically attacks a rotate CAPTCHA as follows:
- Collect image materials;
- Programmatically simulate rotation to generate a model library (as shown below);
- Search the model library with a similarity algorithm to obtain the target angle;
- Simulate the user rotating the image to the target angle, or generate the request packet for verification directly;
- Obtain the credential issued after passing verification.
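A toy model makes it clear why this attack depends entirely on collected material. In the sketch below, images are tiny grids of numbers, rotation is limited to quarter turns, and exact matching stands in for the similarity algorithm mentioned above; all of these are simplifying assumptions, and the function names are hypothetical.

```python
def rotate90(grid):
    """Rotate a square grid of pixel values 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def build_library(upright_images):
    """Pre-compute every quarter-turn rotation of each collected image."""
    library = {}
    for image in upright_images:
        current = image
        for quarter_turns in range(4):
            key = tuple(map(tuple, current))
            # Turning this many more quarter turns clockwise restores upright.
            library.setdefault(key, (4 - quarter_turns) % 4)
            current = rotate90(current)
    return library


def lookup(library, challenge):
    """Return the quarter turns needed, or None if the image was never collected."""
    return library.get(tuple(map(tuple, challenge)))


collected = [[1, 2], [3, 4]]
library = build_library([collected])
challenge = rotate90(rotate90(rotate90(collected)))  # a collected image, rotated
unseen = rotate90([[9, 8], [7, 6]])                  # an image not in the library
print(lookup(library, challenge), lookup(library, unseen))
```

The lookup succeeds only for the image that was collected in advance. For the unseen image it returns `None`, which is the weakness that frequently changing materials exploit.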
The puzzle CAPTCHA faces the same problem. A crawler cracks it as follows:
- Collect image materials;
- Solve the puzzle manually or by brute force and write the result into the model library;
- Search the model library to find the original image, and solve the puzzle according to it.
To sum up, collecting image materials is a prerequisite for cracking both the rotate CAPTCHA and the puzzle CAPTCHA, but an effective defense cannot rely on the images alone.
This leads to two important factors in CAPTCHA defense: solution space difficulty and dynamic materials.
First, let's take a look at the definition of solution space.
We can treat the solution space as a probability problem: it measures how hard it is for a user or a crawler to pass the CAPTCHA.
Take the traditional text CAPTCHA: with 4 digits, the solution space is 10^4. The rotate CAPTCHA converts the slider distance into a rotation angle. The slider is generally 300px long, and the randomized length may also be 300px, so the solution space is roughly 300 to 400.
Does this mean that the text CAPTCHA is more secure than the rotate CAPTCHA?
Not necessarily, because the difficulty of the solution space is not fixed.
Take the character-sequence CAPTCHA as an example. Ignoring the size of the character library, the solution space becomes the number of ways to pick four characters in a given order from five, that is, A(5,4) = 5 × 4 × 3 × 2 = 120. In other words, a crawler guessing blindly has a 1/120 probability of solving the CAPTCHA.
By this measure, the rotate CAPTCHA appears more secure than the character-sequence CAPTCHA.
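The figures in this comparison can be reproduced in a few lines. The sketch uses only the numbers given in the text; the helper name `blind_guess_probability` is introduced here for illustration, and it assumes the attacker guesses uniformly at random with no extra knowledge.

```python
import math


def blind_guess_probability(solution_space):
    """Chance that one uniformly random guess is correct."""
    return 1 / solution_space


text_space = 10 ** 4              # 4-digit text CAPTCHA
sequence_space = math.perm(5, 4)  # 4 of 5 characters in order: 5 * 4 * 3 * 2
rotate_space_low, rotate_space_high = 300, 400  # range quoted for the rotate CAPTCHA

for name, space in [
    ("text (4 digits)", text_space),
    ("character sequence", sequence_space),
    ("rotate (low estimate)", rotate_space_low),
    ("rotate (high estimate)", rotate_space_high),
]:
    print(f"{name}: space={space}, blind guess={blind_guess_probability(space):.5f}")
```

These numbers describe only blind guessing. Once an attacker has a recognition model or a material library, the effective solution space is far smaller than the nominal one.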
Note that the solution space also keeps shrinking as new cracking schemes appear. When it degenerates to 1, the crawler can solve the CAPTCHA every time.
So how can this problem be solved?
No defense strategy stays effective forever, and the same is true of the solution space. Dynamic materials (that is, materials that are updated frequently) are required to keep the solution space effective.
Dynamic materials cover three application scenarios:
- Manual replacement
As the name suggests, this means replacing the images by hand.
However, manual replacement has problems:
- The materials stay effective only briefly and need frequent replacement;
- Copyright has to be considered if the images are taken from the Internet;
- Even when there is no copyright issue, the images have to be filtered manually to ensure the CAPTCHA works well;
- Manual replacement costs more.
Automatically generating background images can solve these problems.
- Program generation
One solution is to draw with OpenGL and output the generated images to the material library. OpenGL renders from models: you can choose the model, quantity, size, rotation, coloring, texture, and lighting, and render the result as a 3D image, which allows practically countless variations. Using dynamically generated, almost never repeating images as the source for the rotate CAPTCHA effectively defends against cracking based on collected images and greatly improves the security of the CAPTCHA.
Dynamically generated models can take many forms, such as geometric models and virtual physical models.
A CAPTCHA does not need elaborate images; simple dynamically generated ones will do. After optimization, the images used in the rotate CAPTCHA and puzzle CAPTCHA can be updated dynamically without repeating, which is effective against web crawlers that collect image materials.
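The idea of program generation can be sketched without any rendering code by looking only at how scene parameters might be randomized. The example below does not call OpenGL; it produces a description that a renderer could consume. The model and texture names, the value ranges, and the `random_scene` function are all hypothetical.

```python
import random

MODELS = ["cube", "sphere", "cone", "torus"]      # hypothetical model names
TEXTURES = ["wood", "metal", "marble", "fabric"]  # hypothetical texture names


def random_scene(rng):
    """Draw one random scene description: models, sizes, rotations, colors, light."""
    count = rng.randint(1, 5)  # quantity of objects in the scene
    return {
        "light": {
            "azimuth": rng.uniform(0, 360),
            "elevation": rng.uniform(10, 80),
            "intensity": rng.uniform(0.5, 1.5),
        },
        "objects": [
            {
                "model": rng.choice(MODELS),
                "size": rng.uniform(0.5, 2.0),
                "rotation": [rng.uniform(0, 360) for _ in range(3)],
                "color": [rng.random() for _ in range(3)],
                "texture": rng.choice(TEXTURES),
            }
            for _ in range(count)
        ],
    }


# random.SystemRandom() draws from the OS entropy source, so scenes are not
# reproducible from a seed; a seeded random.Random is useful only for testing.
scene = random_scene(random.SystemRandom())
print(len(scene["objects"]), "objects in the generated scene")
```

Because most parameters are continuous, two generated scenes almost never coincide, which is what makes a pre-collected image library ineffective.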
- Edge processing
Another application of dynamic materials is processing the edges of images, which is particularly important in the slider CAPTCHA.
The dynamic restoration CAPTCHA asks the user to restore an image that has been cut apart, which calls on the same associative ability as the rotate and puzzle CAPTCHAs. However, a web crawler does not need associative ability to solve such a CAPTCHA when it is built from ordinary images. The crawler can find the target position by moving the block horizontally step by step and comparing the RGB values of the upper and lower images until they match.
Therefore, we simply set the edges of the upper and lower images to similar colors, so that the web crawler cannot determine the target distance from their RGB values. The desired effect can be achieved by having the dynamic material program overlay all objects with the same texture.
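The effect of edge processing can be shown with a toy seam comparison. The sketch below simplifies heavily: each edge is a single row of grayscale values rather than RGB pixels, the displaced half wraps around, and the two halves match exactly at the correct offset. The function names are hypothetical.

```python
def seam_cost(upper_edge, lower_edge, shift):
    """Sum of absolute differences after sliding the lower edge by `shift`."""
    n = len(upper_edge)
    return sum(abs(upper_edge[i] - lower_edge[(i + shift) % n]) for i in range(n))


def candidate_shifts(upper_edge, lower_edge):
    """Every shift that reaches the minimum seam cost."""
    costs = [seam_cost(upper_edge, lower_edge, s) for s in range(len(upper_edge))]
    best = min(costs)
    return [s for s, c in enumerate(costs) if c == best]


def flatten_edge(edge, value=128):
    """Edge processing: paint the whole edge row in one color."""
    return [value] * len(edge)


upper = [10, 40, 90, 160, 250, 30, 75, 200]
n = len(upper)
lower = [upper[(j - 3) % n] for j in range(n)]  # lower half displaced by 3 pixels

print(candidate_shifts(upper, lower))                              # [3]
print(candidate_shifts(flatten_edge(upper), flatten_edge(lower)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With ordinary edges, the seam comparison pinpoints the displacement. Once both edges share one color, every shift scores the same and the comparison no longer reveals the target distance.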
To sum up, the principle of CAPTCHA is to keep adapting to changing circumstances. The iteration history of CAPTCHA bears this out: although the workflow has not changed, the complexity of CAPTCHA has increased with every iteration.
The non-cognitive CAPTCHA launched by AISecurius is one example: it is an iteration not only of technology but also of algorithms. It is integrated with the Ding Xiang DDoS, delivering a clear defensive effect along with a better user experience.
After all, security defense is an ultimate battlefield, and it takes a combination of blows to "annihilate the enemy".