Apple’s AI training faces backlash as major publishers opt out
For decades, web crawlers have been quietly collecting information from the internet and feeding it to everything from search engines to AI models. But as AI has become more powerful, the stakes have risen. Now, publishers are drawing a line in the sand, demanding control over their content and challenging Apple’s AI ambitions.
Apple’s web crawler, Applebot, was initially designed to power features like Siri and Spotlight. However, it has recently taken on another big role: gathering data to train the foundation models behind what the company calls “Apple Intelligence”. This data includes text, images, and other content.
What is Robots.txt?
Robots.txt is a file that website owners use to tell bots which parts of a site they may access. Publishers are increasingly using it to block AI bots from scraping their websites for training data, citing concerns about copyright and the potential misuse of their content.
While robots.txt is a relatively simple tool, managing it has become more complicated in the age of AI. New AI agents emerge so quickly that publishers can struggle to keep their block lists up to date. As a result, many are turning to services that automatically update their robots.txt files.
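To make this concrete, here is a minimal sketch of what such a block rule looks like and how a well-behaved crawler would read it, using Python's standard `urllib.robotparser`. It assumes `Applebot-Extended` as the user-agent token, which is the one Apple documents for opting out of AI training. The `example.com` URL and the `can_fetch` helper are placeholders for illustration, not any publisher's actual setup.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block the AI-training agent, allow everyone else.
# "Applebot-Extended" is assumed here as the opt-out token; a real file
# would typically list many more AI agents.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules in robots_txt let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # hypothetical page
    for agent in ("Applebot-Extended", "Applebot", "SomeOtherBot"):
        print(agent, can_fetch(agent, page))
```

Robots.txt is only a request: it works when the crawler chooses to honor it, which is part of why publishers are pressing for stronger controls.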
The backlash
Some media outlets have been outspoken in their criticism of Apple’s opt-out approach. The New York Times, which is suing OpenAI over copyright infringement, argues that publishers shouldn’t have to opt out to begin with; instead, web crawlers should need permission before they access a publisher’s content.
Other popular websites that have opted out include Instagram, Facebook, Tumblr, Craigslist, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED’s parent company, Condé Nast.
So, what’s next? Will Apple be forced to rethink its AI strategy? Or will it find a way to appease publishers and continue its data-driven ambitions? The battle for control of the internet’s digital goldmine is far from over.