Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://tr.player.fm/legal

LW - The Data Wall is Important by JustisMills

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Data Wall is Important, published by JustisMills on June 10, 2024 on LessWrong.
Modern AI is trained on a huge fraction of the internet, especially at the cutting edge, with the best models trained on close to all the high quality data we've got.[1] And data is really important! You can scale up compute, you can make algorithms more efficient, or you can add infrastructure around a model to make it more useful, but on the margin, great datasets are king. And, naively, we're about to run out of fresh data to use.
It's rumored that the top firms are looking for ways to get around the data wall. One possible approach is having LLMs create their own data to train on, for which there is kinda-sorta a precedent from, e.g. modern chess AIs learning by playing games against themselves.[2] Or just finding ways to make AI dramatically more sample efficient with the data we've already got: the existence of human brains proves that this is, theoretically, possible.[3]
But all we have, right now, are rumors. I'm not even personally aware of rumors that any lab has cracked the problem: certainly, nobody has come out and said so in public! There's a lot of insinuation that the data wall is not so formidable, but no hard proof. And if the data wall is a hard blocker, it could be very hard to get AI systems much stronger than they are now.
If the data wall stands, what would we make of today's rumors? There's certainly an optimistic mood about progress coming from AI company CEOs, and a steady trickle of not-quite-leaks that exciting stuff is going on behind the scenes, and to stay tuned. But there are at least two competing explanations for all this:
Top companies are already using the world's smartest human minds to crack the data wall, and have all but succeeded.
Top companies need to keep releasing impressive stuff to keep the money flowing, so they declare, both internally and externally, that their current hurdles are surmountable.
There's lots of precedent for number two! You may have heard of startups hard coding a feature and then scrambling to actually implement it when there's interest.
And race dynamics make this even more likely: if OpenAI projects cool confidence that it's almost over the data wall, and Anthropic doesn't, then where will all the investors, customers, and high profile corporate deals go? There also could be an echo chamber effect, where one firm acting like the data wall's not a big deal makes other firms take their word for it.
I don't know what a world with a strong data wall looks like in five years. I bet it still looks pretty different than today! Just improving GPT-4 level models around the edges, giving them better tools and scaffolding, should be enough to spur massive economic activity and, in the absence of government intervention, job market changes. We can't unscramble the egg. But the "just trust the straight line on the graph" argument is ignoring that one of the determinants of that line is running out.
There's a world where the line is stronger than that particular constraint, and a new treasure trove of data appears in time. But there's also a world where it isn't, and we're near the inflection of an S-curve.
Rumors and projected confidence can't tell us which world we're in.
1. ^
For good analysis of this, search for the heading "The data wall" here.
2. ^
But don't take this parallel too far! Chess AI (or AI playing any other game) has a signal of "victory" that it can seek out: it can preferentially choose moves that systematically lead to the "my side won the game" outcome. But the core of an LLM is a text predictor: "winning" for it is correctly guessing what comes next in human-created text.
What does self-play look like there? Merely making up fake human-created text has the obvious issue of amplifying any weaknesses the AI has ...
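The footnote's worry can be made concrete with a toy simulation. Below, a "model" estimates a single token frequency with a small systematic error, then generates its own training data for the next round. Every detail here (the Bernoulli token stream, the fixed bias term) is hypothetical and chosen only to illustrate the mechanism: training on self-generated text lets a weakness compound across generations instead of being corrected by fresh human data.

```python
import random

random.seed(0)

def train(data, bias=0.02):
    # Toy "model": estimate P(token == 1) from data, with a small
    # systematic error standing in for any weakness the model has.
    return min(1.0, sum(data) / len(data) + bias)

def sample(p, n):
    # Generate n tokens from the model's learned distribution.
    return [1 if random.random() < p else 0 for _ in range(n)]

true_p = 0.5
data = sample(true_p, 10_000)  # the original "human" data

estimates = []
for generation in range(5):
    p_hat = train(data)
    estimates.append(p_hat)
    data = sample(p_hat, 10_000)  # retrain on the model's own output

# Each generation's estimate drifts further from the true 50/50 split,
# because the bias is re-applied on top of already-biased synthetic data.
print([round(p, 3) for p in estimates])
```

A chess engine escapes this trap because self-play comes with an external win/loss signal that pulls estimates back toward reality; a pure text predictor trained on its own text has no such anchor in this sketch.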

1851 episodes
