NVIDIA faces scrutiny over alleged unlicensed data scraping for AI models

Cryptopolitan, 2024/08/04 16:00
By Brenda Kanana
In this post: Leaked documents show that NVIDIA collected data from movies and YouTube videos without consent. NVIDIA claims its data scraping is legal under fair use provisions. Internal communications show some employees were concerned about legal issues.

Leaked documents obtained by 404 Media suggest NVIDIA engaged in unlicensed data scraping, using movie and game footage from across the internet to train its artificial intelligence products. 

The leaked documents reveal that NVIDIA staff tried to download full-length movies from various sources, including Netflix, although their primary interest was YouTube videos. According to emails obtained by 404 Media, project managers planned to use between 20 and 30 virtual machines on Amazon Web Services to download 80 years' worth of video per day.

NVIDIA defends its actions and invokes fair use provisions

In this context, data scraping is the practice of extracting video, text, and audio content from the internet, without the permission of the content owners, to train AI models. Because social media platforms host copyrighted material, scraping them can amount to using that copyrighted content.

NVIDIA has said that its data scraping did not break any copyright laws. The company also stated that using copyrighted material to train AI falls under the fair use doctrine.

Internal communications obtained by 404 Media indicate that some NVIDIA employees raised concerns about these data scraping activities. However, project managers allegedly downplayed them, saying that legal issues, such as violations of YouTube's Terms of Service, would be dealt with later.


One employee pointed out that NVIDIA's AI engineers tried to collect as many game clips as possible to enrich the training corpus. This entailed streaming gameplay through NVIDIA's GeForce NOW cloud service to record high-definition gameplay videos. In internal messages, Jim Fan, senior research analyst, also stressed the importance of such footage as input for training the AI model.

Company takes steps to manage public perception of data practices

The documents also detail NVIDIA's attempts at damage control over the repercussions of these practices. According to leaked emails, Research VP Ming-Yu Liu recommended that the company avoid releasing any papers related to its data scraping techniques to prevent public backlash. The company also created its own set of YouTube data scraping tools and API accounts to help with data gathering.

The legal position on scraping data to train AI is still unclear. According to MIT's Robert Mahari, it can be quite complicated to establish that data scraping has occurred. Organizations may benefit from not revealing the sources of their training data, because abuse is hard to prove without tangible evidence.

Suno, an AI music generation platform, also recently came under the spotlight for admitting that it used data scraping to train its AI models. As previously reported by Cryptopolitan, Reddit CEO Steve Huffman stated that the company will continue to block Microsoft and other AI firms from scraping its data until they pay for it and Reddit gains control over how the data is used. He said that Reddit would not permit data scraping for AI training without a proper license.
