Artificial intelligence web crawlers are running amok

Published

July 5, 2024

Admin

Artificial intelligence tech companies are refusing to abide by internet protocol when it comes to scraping data. Their ravenous scavenging behavior is upending the basic rules of the internet.

AILSA CHANG, HOST:

On every website, there’s a message that contains a hidden stop sign. It’s intended for bots, not humans, a way of saying, do not scan this part of the website. The artificial intelligence industry is ignoring these stop signs, and understanding why sheds light on how AI companies are turning the web upside down. NPR’s Bobby Allyn reports.

BOBBY ALLYN, BYLINE: The story starts in the mid-’90s, the days of dial-up internet. The web was slow, and maintaining a site was expensive, especially when bots scanned your whole website, as they often did to create a copy for, say, askjeeves.com. Overwhelmed with requests from automated bots, web servers started to crash, and internet bills spiked. So developers came up with a solution: a hidden plain text file in the back-end software code of every website. It was intended for bots. It became known as robots.txt.

COLLEEN CHIEN: And a robot.txt file then puts a sign in front of that website to say, if you’re a robot, you know, sort of this visitor, you need to abide by the rules here. This is, you know, where you are or aren’t welcome. This is what you can and can’t do.

ALLYN: That’s Colleen Chien of UC Berkeley Law School, who teaches classes on how AI is changing the web. Over the years, the robots.txt page became something of a social contract for the entire internet. Tech giants like Google and Facebook adopted it. And even though it had no legal teeth, it was respected. Say there’s a corporate or administrative page you don’t want showing up on Google, put it in the file. It helped hold the entire internet together, says former Google engineer Jacob Hoffman-Andrews.

JACOB HOFFMAN-ANDREWS: That system has remarkably worked well for 30 years.

ALLYN: Till now. In response to data-hungry AI companies gobbling up every corner of the internet, websites have started to put AI companies in this file, a way of telling ChatGPT, stop, do not scrape here. But here’s the problem. The AI industry is ignoring it. Just recently, Amazon Web Services announced it is investigating popular AI search engine Perplexity over this. Officials from Perplexity wouldn’t talk to me for the story, but in a statement, the company said, quote, “robots.txt is not a legal framework.” That might sound like a, OK, who cares kind of thing at first, but Jacob Hoffman-Andrews says breaking this norm could change the entire internet.

HOFFMAN-ANDREWS: There’s a chance for that whole kind of open-web-based order to break down. The websites that do exist could retreat behind logins and become private communities. The concept of the internet as the world’s biggest library would start to fail.

ALLYN: And if that happened on a wide scale, navigating the web could become really annoying. You probably have noticed this already – more and more websites requiring accounts and logins. Sometimes that’s about paying for content, but increasingly, it’s about fighting back against AI companies. As they explode norms in search of more data, the AI firms are getting richer. But those being mined for content aren’t getting much in return. That’s why something seemingly small like ignoring a stop sign for bots has become a rallying cry in Silicon Valley against the whole AI industry, says legal scholar Colleen Chien.

CHIEN: These models become more and more powerful, the question of well, who gets to sort of keep the riches that are generated by these amazing new technologies is increasingly important.

ALLYN: It’s that question that’s tapping into angst shared by so many creatives and website publishers right now. When, say, Google scrapes your website, you get, in return, web traffic. But when an AI tool scrapes your website, you’re not really getting much in return, which is why the robots.txt file has become a way of saying, no thanks, do not do that here. With the AI industry scraping away anyway, more and more corners of the internet may soon become harder to access for everyone. Bobby Allyn, NPR News.
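
For readers curious how the stop sign described in the report works in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The file contents, the crawler names `ExampleAIBot` and `ExampleSearchBot`, and the `example.com` URLs are all hypothetical, chosen only for illustration. The sketch shows a site blocking one named bot entirely while keeping an administrative path off-limits to every other bot, and how a well-behaved crawler would consult those rules before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The crawler name "ExampleAIBot" and the
# paths below are illustrative assumptions, not taken from the report.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named AI bot is asked to stay away from the whole site.
print(may_fetch(parser, "ExampleAIBot", "https://example.com/articles/1"))        # False
# Any other bot may read articles...
print(may_fetch(parser, "ExampleSearchBot", "https://example.com/articles/1"))    # True
# ...but not the administrative pages.
print(may_fetch(parser, "ExampleSearchBot", "https://example.com/admin/settings"))  # False
```

Nothing in this check is enforced by the website. The crawler has to choose to run it and honor the answer, which is why the file works only as long as bots respect it.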

Copyright © 2024 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.
