lolc 11 hours ago

It's weird to me how the article, in the very first paragraph, mentions brute force but frames it as a data problem. As if we hadn't seen a staggering rise in ops and memory bandwidth. Machines have been able to hold the textual history of humanity in short-term memory for a while, but the ops they could perform on it have been limited. There's not much point telling a person in 2005 to "add data". What were they going to do? Wait 20 years to finish a training round we now do in a week?

It's very clear to me that the progress we observe in machine intelligence is due to brute processing power. Of course the evolution of learning algorithms is important! But the main evolution driving progress is in compute. Algorithms can be iterated on much faster when your generations are that much shorter.

Why are all these AI companies falling over each other to buy the best compute-per-watt humans have ever produced? Because compute is king, and our heads were optimized by evolution to be very efficient at probabilistic computing. That's where machines are catching up.

The mark of intelligence is to not need much data at all.

Nevermark 12 hours ago

Important concept for model building:

You don't need more data when the data you have already characterizes a problem well. More data is simply redundant and resource-wasting. In this case, talking like people about the things people talk about is covered well by current data sets. Saying we can't get more data is really saying we have collected at least enough data. Probably more than we need.

Lots of room to improve models though:

Using convolution for vision learning didn't create or require more data than training fully connected matrices, and it considerably increased models' efficiency and effectiveness on the same amount of data. Or less.
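To make the efficiency point concrete, here is a minimal sketch (sizes and names are my own illustrative assumptions, not from the comment) comparing the weight counts of a fully connected mapping versus a convolution over the same image:

```python
# Hypothetical sketch: parameter counts for a dense layer vs. a convolutional
# layer, each mapping a 32x32 single-channel image to 16 same-size feature maps.
# All sizes below are illustrative assumptions.

H, W = 32, 32          # input height and width
in_ch, out_ch = 1, 16  # input and output channels
k = 3                  # conv kernel size (k x k)

# Fully connected: every output value connects to every input pixel.
dense_params = (H * W * in_ch) * (H * W * out_ch)

# Convolution: one shared k x k kernel per (input, output) channel pair,
# plus one bias per output channel. Weight sharing is the whole trick.
conv_params = k * k * in_ch * out_ch + out_ch

print(dense_params)  # 16777216 weights
print(conv_params)   # 160 parameters
```

Same input, same output shape, roughly five orders of magnitude fewer parameters to fit, which is why convolution needed no extra data to outperform dense layers on vision.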

Likewise, transformers have a limited context window. Better architectures with open-ended windows will be able to do much more, likely more efficiently and effectively, without any more data. Maybe with less.
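A rough sketch of why the window is limited in practice: standard self-attention materializes an n-by-n score matrix per head, so memory grows quadratically with context length. The head count and precision below are illustrative assumptions, not figures from the comment.

```python
# Hypothetical sketch: memory for the attention score matrices of a single
# transformer layer, assuming 16 heads and fp16 (2 bytes per value).

def attn_scores_bytes(n_tokens: int, n_heads: int = 16, bytes_per_val: int = 2) -> int:
    """Bytes needed to hold one layer's n x n attention scores across heads."""
    return n_heads * n_tokens * n_tokens * bytes_per_val

for n in (4_096, 32_768, 262_144):
    gib = attn_scores_bytes(n) / 2**30
    print(f"{n:>7} tokens -> {gib:,.1f} GiB of scores per layer")
```

Growing the window 8x costs 64x the score memory, which is the pressure behind the "better architectures with open-ended windows" that this comment anticipates.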

Maybe in a few decades we will hit a wall of optimal models. At the rate models are improving now, that wall doesn't appear to be anywhere close.

Finally, once we start challenging models to perform tasks we can't, they will start getting data directly from reality: what works, what doesn't. Just as we have done. The original source of our knowledge wasn't an infinite loop of other people talking, stretching back to the beginning of time.

marstall 9 hours ago

michael levin talks about "intelligence at every scale". he has a recent study where he found some of the hallmarks of intelligence in an off-the-shelf sorting algorithm. individual cells by themselves certainly show signs of intelligence: memory, attention, the ability to recognize that a strategy has failed and come up with another, and so on.

spacebacon 16 hours ago

Lots of good thinking in this article. A few things come to mind before we hit a data wall.

1. Sensor all things

2. Waves upon waves

3. Dynamic or Living Semiotic Graphs. Bring your own terminology.

4. General Artificial Synesthesia.

JohnMakin 6 hours ago

Language models do not work like the human brain. Continuing to compare the two as if there were any analogy at all is doing far more harm than good.