Harvard and Google to release 1 million public-domain books as AI training dataset | TechCrunch

Published

December 12, 2024

Admin

AI training data comes with a big price tag, one best suited to deep-pocketed tech firms. This is why Harvard University plans to release a dataset that includes roughly 1 million public-domain books spanning genres, languages, and authors including Dickens, Dante, and Shakespeare, all of which are no longer copyright-protected due to their age.

The new dataset isn’t available yet, and it’s not clear when or how it will be released. However, it contains books derived from Google’s longstanding book-scanning project, Google Books, and thus Google will be involved in releasing “this treasure trove far and wide.”

Harvard first teased the Institutional Data Initiative (IDI) back in March, outlining its plans to create a “trusted conduit for legal data for AI.” However, not much has been heard from it until its formal launch today, which came with confirmation that the IDI includes financial backing from Microsoft and OpenAI.

The IDI’s executive director, Greg Leppert, says the dataset is designed to “level the playing field” by opening up such a huge dataset to anyone, from research labs to AI startups, who wants to train large language models (LLMs).
