In the fall of 2018, Google released a new version of reCaptcha, the company’s widely used bot detector. reCaptcha v3, as it’s called, is great at detecting bots, but it has a dark side: researchers suspect that Google is compromising users’ privacy to feed the system.
Luckily, there’s an alternative to reCaptcha for website owners who don’t trust Google and could use a little extra cash.
Called hCaptcha, it’s a bot detector that works just like the captchas users are already accustomed to, asking them to label what they see in different images. But instead of showing Google’s images, which the company uses to train its machine learning algorithms, hCaptcha shows users images from datasets belonging to other companies that also need images labeled for machine learning applications. In theory, the service helps everyone: you prove you’re not a bot while helping companies hone their algorithms, and websites make money off the whole exchange.
Because accurate labeling is so valuable to these companies, websites that host hCaptcha are paid based on how many of their users click through hCaptcha and answer questions successfully. Depending on their traffic and the number of bots attacking them, websites can make thousands of dollars per month. It’s a good deal for machine learning companies that need people to label their data, and for websites that want the security of a captcha plus some extra cash. As for users, the experience remains the same as always, though you can tell the difference if you look closely: an hCaptcha logo appears in place of the reCaptcha symbol you’re used to. Today, 10 million people interact with hCaptcha every month on thousands of websites, powering dozens to hundreds of machine learning labeling projects at a time.
hCaptcha (the “h” stands for human) is the brainchild of Eli-Shaoul Khedouri, a longtime entrepreneur and AI expert who founded the machine learning company Intuition Machines in 2017. At Intuition Machines, Khedouri and his team build large-scale machine learning algorithms for Fortune 50 companies. While Khedouri declined to share specifics because of nondisclosure agreements, he said that Intuition creates algorithms that can do things like analyze the content of videos. To accomplish tasks like this, Intuition’s models require millions if not billions of data points, many of which must be labeled by people. Once they have the annotated videos or images, Intuition’s team can start teaching an algorithm to recognize what’s going on in a video. “We actually ended up being in the captcha business accidentally because we’d become a large consumer of [human annotation] labor,” Khedouri says. “The services available weren’t really what we wanted.”
Finding enough people to label such vast datasets was a serious challenge. First, Khedouri tried building his own team of annotators in Vietnam. But some days he’d have enough work for 12 people, and other days he’d have enough work for 50. Since the amount of data in need of labeling changed so much depending on the projects the team was working on, keeping a full-time team wasn’t the most cost-effective solution (though it was probably better for the workers).
Instead, Khedouri turned to captcha farmers: clickworkers who are paid a fraction of a cent to solve captchas on the internet. His team built a platform for the captcha farmers to label the datasets for Intuition Machines, and designed measures to assess how accurately each farmer labeled the data. It was the most efficient and least expensive way for Khedouri to get his data labeled in real time.