CCTV: Post-90s Programmer Sentenced for Developing Software to Illegally Repost Original Videos

CCTV's "Today's Verdict" program recently reported a case in which a programmer developed illegal video reposting software, earning over 7 million yuan in profit and ultimately receiving a prison sentence.

A well-known Chinese short video platform reported to the police that someone was selling video reposting software online. The software let users bypass the platform's review mechanism and easily "repost," that is, steal, other creators' works and submit them as their own. The police investigation uncovered an illegal supply chain behind it: upstream criminal groups developed and produced the illegal software, tampering with the code of short video platforms to evade oversight.


The suspect, a post-90s programmer surnamed Zhou, confessed that the software was mainly used for illegal video reposting. It supported video mirroring, watermark removal, draft replacement, camera replacement, and other functions. Videos modified with it could easily be published on mainstream platforms such as Kuaishou, Douyin, Bilibili, Xiaohongshu, and Xigua Video, helping users quickly gain followers. From May 2022 to March 2023, Zhou made profits of over 7 million yuan. Zhou has since been sentenced to three years in prison, suspended for five years. Chen, an accomplice responsible for selling the software, was sentenced to three years in prison with a two-month probation period.

Unveiling: Illegal Video Reposting Software Is a Web Crawler

In this case, the software used for illegally downloading video content was a web crawler. A web crawler, also known as a web spider or web robot, is a program or script that automatically retrieves information and data from the internet according to certain rules.
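To make the definition concrete, here is a minimal sketch of the "retrieve according to certain rules" loop: fetch a page, extract its links, and queue any unseen ones breadth-first. This is a generic illustration, not the software from the case. The `fetch` callable and the in-memory `PAGES` site are hypothetical stand-ins so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl: fetch a page, extract links, queue unseen URLs.

    `fetch` is any callable mapping a URL to HTML text, or None if missing.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical in-memory "website" standing in for real HTTP requests.
PAGES = {
    "https://example.com/": '<a href="/videos/1">v1</a><a href="/about">about</a>',
    "https://example.com/videos/1": '<a href="/videos/2">v2</a>',
    "https://example.com/videos/2": '<a href="/">home</a>',
    "https://example.com/about": "",
}

print(crawl("https://example.com/", PAGES.get))
```

A real crawler swaps `PAGES.get` for an HTTP client; the queue-and-dedupe loop is what lets it sweep an entire site without human involvement.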


In November 2022, Dingxiang Defense Cloud Business Security Intelligence disclosed that a social media platform had suffered persistent data theft by web crawlers. Large amounts of user information and original content were scraped and resold to competitors or used directly for malicious marketing.

Data shows that global data theft reached 190 billion records in 2023, with over 80% of that data coming from web crawlers. Web crawlers typically access websites programmatically to retrieve user information or other data. Such behavior not only violates users' privacy but also causes significant economic losses to businesses.

Question: How to Detect Web Crawlers Stealing Videos?

Today's web crawlers rotate random IP addresses, route traffic through anonymous proxies, disguise their identities, and imitate human behavior. This makes them extremely difficult to detect and block, so identifying them requires analysis across multiple dimensions.


First is the target of access. Malicious web crawlers aim to obtain core information from websites or apps, such as user data, product prices, and comment content. Therefore, they typically only access pages containing such information while ignoring other irrelevant pages.
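One way to measure this target focus is the share of a client's requests that hit core-data endpoints. The sketch below assumes hypothetical path prefixes and thresholds. A real site would define its own list of sensitive endpoints and tune the cut-off against its normal traffic.

```python
# Hypothetical "core information" endpoints; adjust to the real site layout.
SENSITIVE_PREFIXES = ("/api/video/", "/api/user/", "/api/comments/")


def sensitive_ratio(paths):
    """Fraction of a client's requests that hit core-data endpoints."""
    if not paths:
        return 0.0
    hits = sum(1 for p in paths if p.startswith(SENSITIVE_PREFIXES))
    return hits / len(paths)


def is_target_focused(paths, threshold=0.9, min_requests=20):
    """Flag clients that request almost nothing except core-data endpoints.

    `min_requests` avoids judging clients on a handful of requests.
    """
    return len(paths) >= min_requests and sensitive_ratio(paths) >= threshold


human = ["/", "/api/video/1", "/help", "/api/user/2"]
bot = [f"/api/video/{i}" for i in range(30)]
print(sensitive_ratio(human), is_target_focused(bot))
```

A normal visitor mixes home pages, help pages, and content, while a scraper that goes straight for `/api/video/…` scores close to 1.0.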

Second is access behavior. Web crawlers are automatically executed by programs, following preset processes and rules for access. Hence, their behavior exhibits obvious regularity, rhythm, and consistency, contrasting with the randomness, flexibility, and diversity of normal user behavior.
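Regularity can be quantified in many ways. A simple one, used here as an illustrative choice rather than Dingxiang's method, is the coefficient of variation (standard deviation divided by mean) of the gaps between consecutive requests. A scripted client firing every 2 seconds scores near 0, while human browsing is bursty. The 0.1 threshold is an assumption.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of gaps between consecutive requests.

    Values near 0 mean a metronome-like rhythm; human browsing is burstier.
    Returns None when there are too few requests to judge.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    avg = mean(gaps)
    if avg == 0:
        return 0.0  # every request at the same instant: clearly scripted
    return pstdev(gaps) / avg


def looks_scripted(timestamps, cv_threshold=0.1):
    cv = interval_cv(timestamps)
    return cv is not None and cv < cv_threshold


bot_times = [i * 2.0 for i in range(20)]          # one request every 2 s
human_times = [0, 1.5, 9, 10, 30, 31, 33, 70]     # irregular bursts
print(looks_scripted(bot_times), looks_scripted(human_times))
```

Crawlers that add random jitter will raise their score, which is why this signal is combined with the other dimensions rather than used alone.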

Third is access devices. The goal of malicious web crawlers is to gather as much information as possible in the shortest time. They therefore often use a single device to perform a large number of operations, including browsing, querying, and downloading, which produces abnormal values for that device's access frequency, session duration, and browsing depth.
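A basic version of the frequency check counts requests per device within time windows and flags devices that exceed a limit. This sketch assumes events arrive as hypothetical `(device_id, unix_timestamp)` pairs. It uses fixed hourly buckets, which are simpler than sliding windows, and an assumed limit of 500 requests per hour.

```python
from collections import defaultdict


def heavy_devices(events, window_seconds=3600, max_requests=500):
    """Return device IDs whose request count in any window exceeds a limit.

    `events` is an iterable of (device_id, unix_timestamp) pairs. Windows are
    fixed buckets of `window_seconds`, a simplification of sliding windows.
    """
    counts = defaultdict(int)
    for device_id, ts in events:
        bucket = int(ts // window_seconds)
        counts[(device_id, bucket)] += 1
    return {device for (device, _), n in counts.items() if n > max_requests}


events = [("dev-bot", t) for t in range(1200)]             # 1200 requests in 20 min
events += [("dev-human", t * 60) for t in range(30)]       # one request per minute
print(heavy_devices(events))
```

Duration and depth can be tracked the same way by aggregating session length or pages per session per device instead of raw counts.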

Fourth is access IP addresses. To evade identification and blocking, malicious web crawlers switch IP addresses in various ways, such as through cloud services, routers, and proxy servers. As a result, the geographic origin, service provider, and network type of these IP addresses are inconsistent, or deviate significantly from the distribution seen among normal users.
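Two simple IP signals are the number of distinct IPs one device uses and whether any of them belong to cloud or data-center ranges. The sketch below covers only those two. Geographic and ISP consistency checks would need an IP intelligence database. The `DATACENTER_NETS` list is hypothetical and uses documentation-only address ranges as placeholders.

```python
import ipaddress
from collections import defaultdict

# Hypothetical cloud/data-center ranges (documentation-only blocks used as
# placeholders); a real system would load a maintained IP intelligence feed.
DATACENTER_NETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_NETS)


def ip_anomalies(events, max_distinct_ips=5):
    """events: iterable of (device_id, ip). Returns {device_id: [reasons]}."""
    ips = defaultdict(set)
    for device_id, ip in events:
        ips[device_id].add(ip)
    flagged = {}
    for device_id, seen in ips.items():
        reasons = []
        if len(seen) > max_distinct_ips:
            reasons.append("many_ips")
        if any(is_datacenter(ip) for ip in seen):
            reasons.append("datacenter_ip")
        if reasons:
            flagged[device_id] = reasons
    return flagged


events = [("bot", f"203.0.113.{i}") for i in range(10)]
events += [("user", "192.0.2.10"), ("user", "192.0.2.10")]
print(ip_anomalies(events))
```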

Fifth is access time periods. To minimize the risk of detection, malicious web crawlers typically crawl in bulk during periods of low traffic and weak monitoring, such as late at night or early in the morning. This produces abnormal access volume and bandwidth usage in those periods.
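A client's share of off-hours traffic can be compared with the site-wide baseline. In this sketch, "off-hours" means 00:00 to 06:00 in UTC by default. A site serving Chinese users would pass its local time zone instead. The 3x factor and minimum request count are assumptions.

```python
from datetime import datetime, timezone


def off_hours_share(timestamps, start_hour=0, end_hour=6, tz=timezone.utc):
    """Share of requests whose local hour falls in [start_hour, end_hour)."""
    if not timestamps:
        return 0.0
    hours = [datetime.fromtimestamp(ts, tz).hour for ts in timestamps]
    night = sum(1 for h in hours if start_hour <= h < end_hour)
    return night / len(hours)


def off_hours_outlier(client_ts, baseline_share, factor=3.0, min_requests=50):
    """Flag a client whose off-hours share far exceeds the site-wide share."""
    share = off_hours_share(client_ts)
    return len(client_ts) >= min_requests and share > factor * baseline_share


night_client = [i * 60 for i in range(100)]         # 00:00-01:39 UTC
day_client = [43200 + i * 60 for i in range(100)]   # 12:00-13:39 UTC
print(off_hours_outlier(night_client, baseline_share=0.1),
      off_hours_outlier(day_client, baseline_share=0.1))
```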

Sixth is big data modeling and mining. By collecting, processing, mining, and modeling access data of normal users and web crawlers, dedicated crawler identification models specific to the website itself can be constructed, thereby enhancing recognition accuracy and efficiency.
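As a minimal illustration of such a model, the sketch below trains plain logistic regression with batch gradient descent on three per-client features of the kind described above: the share of core-data requests, the regularity of request intervals, and the share of off-hours traffic. This technique is a stand-in chosen for brevity, not Dingxiang's model, and the six labelled training rows are made-up toy values.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the log-loss; returns (weights, bias)."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        m = len(X)
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy rows: [sensitive_ratio, interval_cv, off_hours_share]; label 1 = crawler.
X = [[0.98, 0.02, 0.7], [0.95, 0.05, 0.9], [1.0, 0.01, 0.6],
     [0.4, 1.2, 0.05], [0.3, 0.9, 0.1], [0.5, 1.5, 0.0]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
print(predict_proba(w, b, [0.97, 0.03, 0.8]), predict_proba(w, b, [0.35, 1.1, 0.05]))
```

In practice the training data would come from the site's own labelled logs, which is what makes the model specific to that site.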

Dingxiang Solution: Effective Anti-crawler Measures

Web crawler tactics are becoming increasingly intelligent and complex. Relying solely on access-frequency limits or frontend page encryption is no longer an effective defense. Stronger human-machine recognition and better identification and interception of abnormal behavior are needed to restrict crawler access and raise the cost of malicious data theft. Dingxiang provides enterprises with comprehensive, end-to-end defense solutions that effectively prevent malicious data theft by web crawlers.

Dingxiang atbCAPTCHA, based on AIGC technology, defends against threats such as AI-driven brute-force attacks, automated attacks, and phishing attacks, blocking unauthorized access and intercepting crawler-based theft. It integrates 13 verification methods and multiple control strategies, and lets legitimate users pass without friction. Its real-time response and mitigation time has been cut to under 60 seconds, further improving the convenience and efficiency of digital login services.

Dingxiang Device Fingerprinting links device information across multiple platforms and generates a unified, unique fingerprint for each device. Using multidimensional identification models based on device, environment, and behavior, it identifies risky devices controlled by malicious actors, such as virtual machines, proxy servers, and emulators. By analyzing abnormal or non-habitual behavior such as logins to multiple accounts, frequent IP address changes, or frequent changes to device attributes, it can quickly determine whether a crawler accessing a page originates from a malicious device.
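The general idea behind two of these checks can be sketched in a few lines. This is not Dingxiang's implementation, and production fingerprinting uses far richer signals. Here a fingerprint is simply a SHA-256 hash of a canonical JSON encoding of reported device attributes. An account whose fingerprint keeps changing suggests attribute spoofing, and a fingerprint shared by many accounts suggests one device driving many logins. The attribute names and thresholds are hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def device_fingerprint(attrs):
    """Stable hash of device attributes (key order does not matter)."""
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unstable_accounts(sessions, max_fingerprints=3):
    """sessions: iterable of (account_id, attrs).

    Flags accounts whose reported device attributes keep changing.
    """
    prints = defaultdict(set)
    for account_id, attrs in sessions:
        prints[account_id].add(device_fingerprint(attrs))
    return {acct for acct, fps in prints.items() if len(fps) > max_fingerprints}


def shared_devices(sessions, max_accounts=5):
    """Flags fingerprints used by an unusually large number of accounts."""
    accounts = defaultdict(set)
    for account_id, attrs in sessions:
        accounts[device_fingerprint(attrs)].add(account_id)
    return {fp for fp, accts in accounts.items() if len(accts) > max_accounts}


base = {"os": "Android 13", "model": "X1", "screen": "1080x2400"}
sessions = [("spoofer", {**base, "model": f"M{i}"}) for i in range(5)]
sessions += [("normal", base)] * 3
print(unstable_accounts(sessions))
```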

Dingxiang Dinsight helps enterprises with risk assessment, anti-fraud analysis, and real-time monitoring, improving the efficiency and accuracy of risk control. Its average processing time for daily risk control strategies is under 100 milliseconds. It supports configuration-based integration and accumulation of multi-party data, and draws on mature indicators, strategies, models, and deep learning technology to monitor its own risk control performance and iterate on it automatically.
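The phrase "configuration-based" generally means that strategies are stored as data rather than code, so they can be changed without redeploying. The sketch below shows a tiny rule evaluator built on that idea. It is a generic illustration rather than Dinsight's API. The rule format, feature names, and actions are all hypothetical.

```python
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}

# Hypothetical strategy configuration: rules are plain data that could be
# loaded from JSON or a database and edited without code changes.
RULES = [
    {"name": "burst", "feature": "requests_per_hour", "op": ">", "value": 500, "action": "block"},
    {"name": "metronome", "feature": "interval_cv", "op": "<", "value": 0.1, "action": "challenge"},
    {"name": "datacenter", "feature": "datacenter_ip", "op": "==", "value": True, "action": "challenge"},
]

SEVERITY = {"allow": 0, "challenge": 1, "block": 2}


def evaluate(features, rules=RULES):
    """Return the most severe action triggered and the names of matched rules."""
    decision, matched = "allow", []
    for rule in rules:
        if rule["feature"] not in features:
            continue  # skip rules whose input feature is unavailable
        if OPS[rule["op"]](features[rule["feature"]], rule["value"]):
            matched.append(rule["name"])
            if SEVERITY[rule["action"]] > SEVERITY[decision]:
                decision = rule["action"]
    return decision, matched


print(evaluate({"requests_per_hour": 1200, "interval_cv": 0.02}))
```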

Working alongside Dinsight, the Xintell Intelligent Model Platform automatically optimizes security strategies for known risks and, based on risk control logs and potential risks surfaced through data mining, configures supporting risk control strategies. Using association networks and deep learning, it standardizes complex data processing, mining, and machine learning workflows, providing end-to-end modeling services from data processing and feature derivation to model construction and deployment. This helps uncover latent malicious scraping threats and further improves the detection of malicious data theft and the interception of malicious web crawlers.

2024-05-14
Copyright © 2024 AISECURIUS, Inc. All rights reserved