New York Times successfully removes copyrighted content from AI training dataset | The Business Standard
TBS Report
18 November, 2023, 09:00 am
Last modified: 18 November, 2023, 09:00 am


NYT is concerned that AI models provide answers directly, diverting users from original sources

Photo: Reuters

Many online content creators have become aware that tech companies have used their work to train AI models without permission or compensation. Some are now taking steps to address this issue.

The New York Times discovered that Common Crawl, one of the largest AI training datasets, contained links to its paid articles and other copyrighted content. Common Crawl has been accumulating web data since 2007 and has served as a foundation for training various large language models, including OpenAI's GPT-3. Approximately 12.5% of Google's Infiniset data comes from C4, a refined version of Common Crawl.

Although AI models benefit significantly from this training data, The New York Times has concerns: the models answer users directly, drawing on NYT's copyrighted content while diverting readers from the original source.

"We simply asked that our content be removed, and were pleased that Common Crawl complied with our request and recognized The Times's ownership of our quality journalistic content," Charlie Stadtlander, a spokesman at The New York Times, told Business Insider.

Prompted by these concerns, The New York Times reached out to the Common Crawl Foundation earlier this year and asked it to remove the Times's content from the dataset. Common Crawl complied and also committed not to scrape any more content from The New York Times in the future, as detailed in a letter sent to the US Copyright Office.

The New York Times also found its paywalled articles and other copyrighted material in several other widely used AI training datasets. In a letter to the US Copyright Office, the NYT said that about 1.2% of a recreated version of WebText, the dataset previously used to train OpenAI's GPT-2, consisted of content from its publication.

Business Insider reports that it is unclear whether The New York Times has managed to get its content removed from WebText and other AI training datasets.

New York Times / AI / Copyright
