Pay up or stop scraping: Cloudflare program charges bots for each crawl

Cloudflare is now experimenting with tools that will allow content creators to charge a fee to AI crawlers to scrape their websites.

In a blog post Tuesday, Cloudflare explained that its "pay-per-crawl" feature is currently in a private beta. A small number of publishers—including AdWeek, The Associated Press, The Atlantic, BuzzFeed, Fortune, Gannett, and Ars Technica owner Condé Nast—will participate in the experiment. Each publisher will be able to set its own prices that bots must pay before scraping content, Cloudflare said.

Matthew Prince, CEO of Cloudflare, said the feature would ensure that the Internet as we know it will survive "the age of AI."

"Original content is what makes the Internet one of the greatest inventions in the last century, and it's essential that creators continue making it," Prince said. "AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators, while still helping AI companies innovate. This is about safeguarding the future of a free and vibrant Internet with a new model that works for everyone."

Some participating publishers expressed optimism in the press release that Cloudflare's pay-per-crawl feature could potentially stop the endless scraping that publishers defending copyrights have alleged represents wide-scale theft.

Any content creators interested in joining the beta can sign up, Cloudflare noted, and perhaps eventually, they too can "be compensated for their contributions to the AI economy."

In the meantime, only the publishers involved in the beta will be able to choose which bots can access which parts of their sites, experimenting with blocking all bots or allowing certain bots to access certain content.

Cloudflare's program also gives publishers the flexibility to charge some bots while letting other bots scrape for free. This lets publishers that have negotiated deals with AI companies allow approved scraping while still protecting their content from companies that have not yet struck licensing deals.
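To make the allow, charge, or block choice concrete, here is a minimal Python sketch of how such a per-bot policy could be modeled. It is not Cloudflare's implementation or API: the bot names, the price, and the `decide` function are all hypothetical, chosen only to illustrate the three outcomes described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"    # bot may scrape for free (e.g., an existing licensing deal)
    CHARGE = "charge"  # bot must pay the publisher's price per crawl
    BLOCK = "block"    # bot gets no access


@dataclass(frozen=True)
class Rule:
    action: Action
    price_usd: Optional[float] = None  # only meaningful when action is CHARGE


# Hypothetical publisher policy; these bot names and prices are made up.
POLICY = {
    "licensed-bot": Rule(Action.ALLOW),
    "paying-bot": Rule(Action.CHARGE, price_usd=0.01),
}

# Assumption: any bot without an explicit rule is blocked.
DEFAULT_RULE = Rule(Action.BLOCK)


def decide(bot_name: str) -> Rule:
    """Return the rule that applies to a crawler, falling back to the default."""
    return POLICY.get(bot_name, DEFAULT_RULE)
```

In this sketch, a bot with a licensing deal maps to `ALLOW`, a bot without one can be charged, and anything unrecognized falls through to `BLOCK`.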

AI companies must buy in

For Cloudflare's plan to work, AI companies must sign up, too. Some AI companies may not see the incentive, but Cloudflare has confirmed that it has partnered with AI companies on the initiative. Those companies may benefit from having a simple interface to negotiate with content creators.

Cloudflare suggested its AI partners benefit from "long-term collaboration" with creators whose updated content will help AI products stay relevant. They can also stop wasting money scraping poor-quality data sources, a Cloudflare blog said.

"Without ongoing contributions from content creators, AI systems risk becoming outdated, biased, or less reliable—ultimately diminishing user trust and the value of AI products," the blog said. "Cloudflare is working with AI companies to give them more signals, and ultimately improve the quality and relevance of content they can access. A healthy, sustainable ecosystem of original content is critical for AI innovation and relevance."

However, Cloudflare's gamble seems to depend on AI companies agreeing to pay the prices set by publishers, and that could potentially scramble the experiment if bidding wars reduce rates to the point that they alienate publishers. It also hinges on Cloudflare detecting the AI bots, which, for now, relies on user reports and Cloudflare's analysis of mass traffic patterns.

"In the early days, price discovery will play a key role—as creators gain data on whoʼs paying for what, a transparent market will emerge that reflects the true value of original content," Cloudflare said.

Looking to the future, Cloudflare suggested that its pay-per-crawl system would "evolve significantly." Perhaps one day publishers could use it to "charge different rates for different paths or content types," potentially even introducing dynamic pricing in the AI scraping environment. In that future, Cloudflare predicted that AI companies would possibly be incentivized to create agents that would crawl the web, seeking the best content deals to support specific AI products.
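As a rough illustration of what charging "different rates for different paths" might look like, the Python sketch below picks the price attached to the most specific matching path prefix. Cloudflare has not described how such pricing would work, so the paths, prices, and the `price_for` function are purely hypothetical.

```python
# Hypothetical per-crawl prices in USD, keyed by URL path prefix.
PATH_PRICES = {
    "/": 0.001,
    "/news/": 0.005,
    "/news/investigations/": 0.02,
}


def price_for(path: str) -> float:
    """Return the price of the longest configured prefix that matches the path."""
    matches = [prefix for prefix in PATH_PRICES if path.startswith(prefix)]
    best = max(matches, key=len, default=None)
    if best is None:
        raise ValueError(f"no price configured for path: {path!r}")
    return PATH_PRICES[best]
```

Longest-prefix matching is only one possible design. It lets a publisher set a cheap site-wide rate and override it for more valuable sections.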

"Imagine asking your favorite deep research program to help you synthesize the latest cancer research or a legal brief, or just help you find the best restaurant in Soho—and then giving that agent a budget to spend to acquire the best and most relevant content," Cloudflare said, promising that "we enable a future where intelligent agents can programmatically negotiate access to digital resources."

AI crawlers now blocked by default

Cloudflare's announcement comes after the company rolled out a feature last September that allows website owners to block AI crawlers in a single click. According to Cloudflare, over 1 million customers chose to block AI crawlers, signaling that people want more control over their content at a time when Cloudflare observed that writing instructions for AI crawlers in robots.txt files was widely "underutilized."

To protect more customers moving forward, any new customers (including anyone on a free plan) who sign up for Cloudflare services will have their domains, by default, set to block all known AI crawlers.

This marks Cloudflare's transition away from the dreaded opt-out models of AI scraping to a permission-based model, which a Cloudflare spokesperson told Ars is expected to "fundamentally change how AI companies access web content going forward."

In a world where some website owners have grown sick and tired of attempting and failing to block AI scraping through robots.txt—including some trapping AI crawlers in tarpits to punish them for ignoring robots.txt—Cloudflare's feature allows users to choose granular settings to prevent blocks on AI bots from impacting bots that drive search engine traffic. That's critical for small content creators who want their sites to still be discoverable but not digested by AI bots.
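For readers unfamiliar with robots.txt, the Python sketch below shows the kind of instructions a site owner might write to turn away one AI crawler while leaving other bots, such as search engine crawlers, alone. It uses the standard library's `urllib.robotparser`. The rules and the example.com URL are illustrative assumptions, and `GPTBot` is used only as an example of an AI crawler's user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: disallow one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Report whether these robots.txt rules permit the given crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

Here `may_fetch("GPTBot", "https://example.com/article")` is false, while the same call for `Googlebot` is true. The file only states the site owner's wishes. A crawler has to choose to honor it, which is why some owners have found robots.txt alone insufficient.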

"AI crawlers collect content like text, articles, and images to generate answers, without sending visitors to the original source—depriving content creators of revenue, and the satisfaction of knowing someone is reading their content," Cloudflare's blog said. "If the incentive to create original, quality content disappears, society ends up losing, and the future of the Internet is at risk."

Disclosure: Condé Nast, which owns Ars Technica, is a partner in Cloudflare’s pay-per-crawl beta.