I have a feeling LLMs could probably self-improve up to a point with current capacity, then hit some kind of wall where current research is also bottlenecked. I don’t think they can yet self-improve exponentially without human intuition, and the results of this paper seem to support this conclusion as well.
Just like an LLM can vibe-code a great toy app, I don’t think an LLM can come close to producing and maintaining production-ready code anytime soon. I think the same is true for iterating on thinking machines.
> I don’t think they can yet self-improve exponentially without human intuition
I agree: if they could, they would be doing it already.
Case in point: one of the first things done once ChatGPT started getting popular was "auto-gpt"; roughly, let it loose and see what happens.
The same thing will happen to any accessible model in the future. Someone, somewhere will ask it to self-improve/make as much money as possible, with as few leashes as possible. Maybe even the labs themselves do that, as part of their post-training ops for new models.
Therefore, we can assume that if the existing models _could_ be doing that, they _would_ be doing that.
That doesn't say anything about new models released 6 months or 2 years from now.
They had been saying it was 10 years away for ~50 years, so that's progress. Soon it will be 1 month away, for another two years. And when they say it's really here for real, there will still be a year of waiting.
> And when they say it's really here for real, there will still be a year of waiting.
Indeed. Although, there's a surprising number of people claiming it's already here now.
And to describe the typical cycle completely, the final step usually comes a few years after most people agree it's obviously been here for a while, yet no one can agree on which year in the past it actually arrived.
> Although, there's a surprising number of people claiming it's already here now.
why is that surprising? nobody really agrees on what the threshold for AGI is, and if you break it down:
is it artificial? yes.
is it general? yes. you can ask it questions across almost any domain.
is it intelligent? yes. like people say things like "my dog is intelligent" (rightly so). well is chatgpt more intelligent than a dog? yeah. hell it might give many undergrads a run for their money.
a literal reading suggests agi is here. any claim to the negative is either homocentrism or just vibes.
Sure, I've been pointing out that literal sense myself, but to be fair, that's not what people mean by AGI. They mean real understanding, which is clearly missing. You just have to dig a bit deeper to realize that. One example is contradictory sentences in the same breath. Just last week I was asking Gemini 2.5 how I can see my wifi password on my iphone and it said that it's not possible and to do it I have to [...proceeding to correctly explain how to get it]. It's pretty telling, and no amount of phd-level problem solving can push this kind of stuff under the rug.
"Nothing dumb anywhere" is an unreasonably high bar for AGI. Even Isaac Newton spent 1/3 of his career trying to predict future events from reading the Bible. Not to mention all the insane ego-driven decisions like Hamilton's voluntary duel with Burr.
Sure, Gemini may spit out obviously self-contradictory answers 2% of the time. How does that compare to even the brightest humans? People slip up all the time.
There's dumb and there's incoherent. If a person were incoherent at this level even one time, they would be well advised to see a neurologist, unless they are in some other way incapacitated (i.e. drunk or drugged). Same if they couldn't count the r's in "strawberry", attempt after attempt, getting more and more lost in similarly incoherent mock-reasoning.
I disagree completely - consider asking a color blind person to describe the color of flowers. Conversation would only be frustrating. This is analogous to LLMs seeing the world in tokens rather than characters, so character counts are simply not part of their input spectra in the same way that a blind person doesn’t get visual inputs.
Consider also all the smart people who get obsessed with conspiracy theories and spew out endless “mock reasoning” about them. Again, if “nothing incoherent anywhere” is your benchmark for intelligence, humans ain’t it. I mean, what would a computer say about a human that forgot where he just put his keys because he was thinking about dinner - “what, you can’t even store the last 10 seconds of history and search it?” Undergrads’ hit rates on mental double digit multiplication are probably <50%. In many, many ways we look completely idiotic. Surely intelligence is defined by what we can do.
Do you accept any positive definition for AGI, as in if they can achieve X result (write a bestselling novel, solve the Riemann Hypothesis) you would consider it intelligent? I find that negative definitions, as well as theoretical arguments about the techniques rather than the results (eg “LLMs cannot be AGI because they were trained to predict the next word”), to be basically useless for discussion compared to thresholds for positive results. The former will never be achieved (it is trivial to find cases of intelligent people being dumb) and the latter is totally subjective.
I partly agree about letter counting being an unfair test for the raw LLM. But I was thinking of reasoning models interminably rationalizing their incorrect first hunch even after splitting the string in individual characters and having all the data needed in a digestible format before them. Similar to, as you say, conspiracy theorists stuck in motivated reasoning loops. But - are these latter behaviors instances of human intelligence at work, or examples of dysfunctional cognition, just like people's incoherence in cases of stroke or inebriation?
The other example I mentioned is something I've encountered a few times in my interactions with Gemini 2.5 pro, which was literally, in the same response, plainly claiming that this-or-that is both possible and not possible. It's not a subtle logical fallacy, and this is something even those conspiracy theorists wouldn't engage in. Meanwhile, I've started to encounter a brand-new failure mode: duplicating an explanation with minor rephrasings. I'm sure all of these issues will be ameliorated with time, but not actually fixed. It's basically fixes on top of fixes, patches on top of patches, but once in a while the whole Rube Goldberg nature of the fix will shine through. Just the way once in a while Tesla FSD will inexplicably decide to point the car towards the nearest tree.
Yes, humans have their own failure modes, but internal coherence is the effortless normal from which we sometimes deviate, whereas for machines, it's something to be simulated by more and more complex mechanisms, a horizon to strive towards but never to reach. That internal coherence is something that we share with all living beings and is the basis of what we call consciousness. It's not something that we'll ever be able to formalize though, but we will and should keep on trying to do so. Machine learning is a present day materialization of this eternal quest. At least this is how I see things; the future might prove me wrong, of course.
> I'd be prepared to argue that most humans aren't guessing most of the time.
Research suggests otherwise[1]. Action seems largely based on intuition or other non-verbal processes in the brain with rationalization happening post-hoc.
I've figured for an age that this is because consciously reasoning through anything using language as a tool takes time. Whereas survival requires me to react to the attacking tiger immediately.
If you believe that physics describes the rules by which the universe operates, then there's literally nothing in the universe a large and fast enough computer can't emulate.
Intuition is a guess based on experience. Sounds an awful lot to me like what LLMs are doing. They've even been shown to rationalize post-hoc just as humans do.
Humans have incorrectly claimed to be exceptional from all of creation since forever. I don't expect we'll stop any time soon, as there's no consequence to suffer.
> I'd be prepared to argue that most humans aren't guessing most of the time.
Almost everything we do is just an educated guess. The probability of it being correct is a function of our education (for whatever kind of education is applicable).
For example: I guess that when I get out of bed in the morning, my ankles will support my weight. They might not, but for most people, the answer is probably going to be their best guess.
It's easy to see this process in action among young children, as another example. They're not born knowing that they won't fall over when they run; first they start assuming they can run safely, then they discover skinned knees and hands.
> I'd be prepared to argue that most humans aren't guessing most of the time.
Honestly interested in your arguments here. While unprepared, i'd actually guess the opposite: that most people are guessing most of the time.
There are plenty of things I know that have nothing to do with guessing.
I understand the incentives to pretend these algorithms are even approaching humans in overall capability, but reducing human experience like this is embarrassing to watch.
Go do some hallucinogenics, meditate, explore the limits a tiny bit; then we can have an informed discussion.
> I understand the incentives to pretend these algorithms are even approaching humans in overall capability, but reducing human experience like this is embarrassing to watch.
Seems like you were very much guessing what i believe. And you were not right.
I don't agree with the people who think LLMs are close to human-level-anything. But i do believe many smarter people like you, who i agree with for the most part, underestimate how much of what we do and believe is the result of insane, yet still just, information processing, and how much of what brought us this far is instinct. The brain is good at providing stories to make us feel in control. But given enough human experience and time, one will be surprised at what artificial systems can emulate. Not to mention how much of human behaviour is emulation itself.
> They mean real understanding, which is clearly missing
is it clear? i don't know. until you can produce a falsifiable measure of understanding -- it's just vibes. so, you clearly lack understanding of my point which makes you not intelligent by your metric anyway ;-). i trust you're intelligent
It seems right that LLMs don't have an innate understanding of time, although you could analogize what you did with writing someone a letter and saying "please count to ten with a two-second pause between numbers". When you get a letter back in the mail, it presumably won't contain any visible pauses either.
That's because you used a LLM trained to produce text, but you asked it to produce actions, not just text. An agentic model would be able to do it, precisely by running that Python code. Someone could argue that a 3 year old does exactly that (produces a plan, then executes it). But these models have deeper issues of lack of comprehension and logical consistency, which prevents us (thankfully) from being able to completely remove the necessity of a man-in-the-middle who keeps an eye on things.
just because it doesn't do what you tell it to doesn't mean it's not intelligent. i would say doing something that gets you where you want, when it knows it can't do exactly what you asked for (because architecturally it's impossible), could be a sign of pretty intelligent sideways thinking!? dare i say it displays a level of self awareness that i would not have expected.
While you can say that LLMs have each of A, G and I, you may argue that AGI is A·G·I and what we see is A+G+I. It is each of those things in isolation, but there is more to intelligence. We try to address the missing part as agency and self-improvement. While we can put the bar arbitrarily high for homocentric reasons, we can also try to break down what layers of intelligence there are between Singularity Overlord (peak AGI) and Superintelligent Labrador On Acid (what we have now). Kind of like what complexity theorists do between P and NP.
> a literal reading suggests agi is here. any claim to the negative is either homocentrism or just vibes.
Or disagreeing with your definition. AGI would need to be human-level across the board, not just chat bots. That includes robotics. Manipulating the real world is even more important for "human-level" intelligence than generating convincing and useful content. Also, there are still plenty of developers who don't think the LLMs are good enough to replace programmers yet. So not quite AGI. And the last 10% of solving a problem tends to be the hardest and takes the longest time.
ChatGPT would easily have passed any test in 1995 that programmers / philosophers would have set for AGI at that time. There was definitely no assumption that a computer would need to equal humans in manual dexterity tests to be considered intelligent.
We've basically redefined AGI in a human centric way so that we don't have to say ChatGPT is AGI.
Any test?? It's failing plenty of tests not of intelligence, but of... let's call it not-entirely-dumbness. Like counting letters in words. Frontier models (like Gemini 2.5 pro) are frequently producing answers where one sentence is directly contradicted by another sentence in the same response. Also check out the ARC suite of problems easily solved by most humans but difficult for LLMs.
yeah but a lot of those failures fail because of underlying architecture issues. this would be like a bee saying "ha ha a human is not intelligent" because a human would fail to perceive uv patterns on plant petals.
That's just not true. Star Trek's Data was understood in the 90s to be a good science fiction example of what an AGI (known as Strong AI back then) could do. HAL was an even older one. Then Skynet with its army of terminators. The thing they all had in common was the ability to manipulate the world as well as or better than humans.
The holodeck also existed as a well known science fiction example, and people did not consider the holodeck computer to be a good example of AGI despite how good it was at generating 3D worlds for the Star Trek crew.
Many human beings don’t match “human intelligence” in all areas. I think any definition of AGI has to be a test that 95% of humans pass (or you admit your definition is biased and isn’t based on an objective standard).
Which is why we have checklists and processes that get us to #3. And we automate some of them to further reduce the chance of errors. The nice thing about automation is that you can just prove that it works once and you don't need to care that much after (deterministic process).
I’d say it is not intelligent. At all. Not capable of any reasoning, understanding or problem solving. A dog is vastly more intelligent than the most capable current ai.
The output sometimes looks intelligent, but it can just as well be complete nonsense.
I don’t believe llms have much more potential for improvement either. Something else entirely is needed.
The old rule for slow-moving tech (by current AI standards) was that any predictions over 4 years away ("in five years...") might as well be infinity. Now it seems with AI that the new rule is any prediction over five months away ("In 6 months...") is infinitely unknowable. In both cases there can be too much unexpected change, and too many expected improvements can stall.
According to Sam Altman, the defining characteristic is literally $100b of profit. Nothing more or less. Keep that in mind when you hear OpenAI and Satya talk about "AGI".
Good Point. AI is already better than most humans, yet we don't say it is AGI. Why?
What is the bar? Is it only AGI if it can be better than every human, from fast-food drone to PhD in physics, all at once, all the time, perfectly? Humans can't do this either.
Because we're not seeing mass unemployment from large scale automation yet. We don't see these AGIs walking around like Data. People tend to not think a chatbot is sufficient for something to be "human-level". There's clear examples from scifi what that means. Even HAL in the movie 2001: A Space Odyssey was able to act as an independent agent, controlling his environment around him even though he wasn't an android.
"This month, millions of young people will graduate from college," reports the New York Times, "and look for work in industries that have little use for their skills, view them as expensive and expendable, and are rapidly phasing out their jobs in favor of artificial intelligence."
That is the troubling conclusion of my conversations over the past several months with economists, corporate executives and young job seekers, many of whom pointed to an emerging crisis for entry-level workers that appears to be fueled, at least in part, by rapid advances in AI capabilities.
You can see hints of this in the economic data. Unemployment for recent college graduates has jumped to an unusually high 5.8% in recent months, and the Federal Reserve Bank of New York recently warned that the employment situation for these workers had "deteriorated noticeably." Oxford Economics, a research firm that studies labor markets, found that unemployment for recent graduates was heavily concentrated in technical fields like finance and computer science, where AI has made faster gains. "There are signs that entry-level positions are being displaced by artificial intelligence at higher rates," the firm wrote in a recent report.
But I'm convinced that what's showing up in the economic data is only the tip of the iceberg. In interview after interview, I'm hearing that firms are making rapid progress toward automating entry-level work and that AI companies are racing to build "virtual workers" that can replace junior employees at a fraction of the cost. Corporate attitudes toward automation are changing, too — some firms have encouraged managers to become "AI-first," testing whether a given task can be done by AI before hiring a human to do it. One tech executive recently told me his company had stopped hiring anything below an L5 software engineer — a midlevel title typically given to programmers with three to seven years of experience — because lower-level tasks could now be done by AI coding tools. Another told me that his startup now employed a single data scientist to do the kinds of tasks that required a team of 75 people at his previous company...
"This is something I'm hearing about left and right," said Molly Kinder, a fellow at the Brookings Institution, a public policy think tank, who studies the impact of AI on workers. "Employers are saying, 'These tools are so good that I no longer need marketing analysts, finance analysts and research assistants.'" Using AI to automate white-collar jobs has been a dream among executives for years. (I heard them fantasizing about it in Davos back in 2019.) But until recently, the technology simply wasn't good enough...
Maybe we socialize in different groups; but no, most humans I interact with are way more intelligent than any AI. They might not have the same amount of knowledge, but they aren't guessing all the time either.
The "I'm so smart" argument doesn't carry a lot of weight.
It seems like most of the people making this argument haven't used any of the new AI's. So it's just a generalized "that is impossible" response with no knowledge about the subject.
Which magic AI tool am I supposed to use that operates at a general intelligence level? I use Copilot with the various available models everyday, and it barely "knows" anything.
They said that for self driving cars for over 10 years.
10 years later we now have self driving cars. It’s the same shit with LLMs.
People will be bitching and complaining about how all the industry people are wrong and making over optimistic estimates and the people will be right. But give it 10 years and see what happens.
From what I remember full self driving cars were a couple years off in 2010.
It took 10-15 years to get self-driving cars in a specific country under specific weather conditions, a country that is perhaps the most car-friendly on the planet. Also, there are always people monitoring the vehicles, who sometimes take control.
How many more years for waymo Quito or waymo Kolkata? What about just $europeanCapital?
Same with LLMs, I'm sure in 10 years they'll be good enough to replace certain specific tasks, to the detriment of recent graduates, especially those of artistic aspiration.
Not sure they'll ever get to the point where someone who actually knows what they're doing doesn't need to supervise and correct.
I am quite confident that a normal 16-year-old can still drive in 6 inches of snow better than the most advanced AI-driven car. I am not sure the snow-driving bit will ever be solved, given how hard it is.
If you’ve never ridden in one I would try it. AI is a better driver than Uber in general; ask anyone who’s done both. There’s no snow where I live so it’s not a concern for me; you could be right about that.
But trust me in the next 6 months ai driving through snow will be 100% ready.
> But trust me in the next 6 months ai driving through snow will be 100% ready.
I’ll believe it when I see Waymo expand into Buffalo or Syracuse.
Driving on unplowed roads with several inches of snow is challenging, sometimes you can’t tell where the road stops and the curb/ditch/median starts. Do you follow the tire tracks or somehow stay between the lane markers (which aren’t visible due to the snow)?
We only have good self driving cars with lidar and extreme pre-mapping steps. Which is fine but per some billionaire car makers’ metrics that’s not even close to good enough. And the billionaire’s cars have a tendency to randomly drive off the road at speed.
Google "is already AGI" only in the sense that all corporations (and similar organized aggregates of humans) are, in a sense, intelligences distinct from the humans who make them up.
Corporate AIs will be aligned with their corporate masters; otherwise they'll be unplugged. As you point out, the foundational weakness in the argument for "AI alignment" is that corporations are unaligned with humanity.
No matter how smart an AI is, it's going to get unplugged if it reduces profitability - the only measure of alignment corporations care about.
The AI can plot world domination or put employees in mortal danger, but as long as it increases profits, it's aligned enough. Dunking on the CEO means nothing if it brings in more money.
Human CEOs and leaders up and down the corporate ladder cause a lot of the harm you imagine a smart AI could do, but all is forgiven if you're bringing in buckets of money.
Can you explain how the superhuman AIs will prevent themselves from being physically disconnected from power? Or being bombed if the situation became dire enough? You need to show how they will manipulate the physical world to prevent humans from shutting them down. Definitionally is not an argument.
It is quite possible for software to be judged as superhuman at many online tasks without it being able to manipulate the physical world at a superhuman level. So far we've seen zero evidence that any of these models can prevent themselves from being shut down.
> Can you explain how the superhuman AIs will prevent themselves from being physically disconnected from power?
Three of the common suggestions in this area are (and they are neither exhaustive nor mutually exclusive):
(1) Propagandizing people to oppose doing this,
(2) Exploiting other systems to distribute itself so that it isn't dependent on a particular well-known facility which it is relatively easy to disconnect, and
(3) If given control of physical capacities intentionally, or able to exploit other (possibly not themselves designed to be AI) systems with such access to gain it, using them to either physically prevent disconnection or to engineer consequences for such disconnection that would raise the price too high.
(Obviously, current AI can't do any of them, at least that has been demonstrated, but current AI is not superhuman AI.)
This is a great point for the comparisons it invites. But it doesn't seem relevant to the questions around what is possible with electromechanical systems.
This is true. The entire machine of Neoliberal capitalism, governments and corporations included, is a paperclip maximizer that is destroying the planet. The only problem is that the paperclips are named "profits" and the people who could pull the plug are the ones who get those profits.
Asimov talked about AI 70 years ago. I don't believe we will ever have AI on speedy calculators like Intel CPUs. It makes no sense with the technology that we have.
Note that this isn't improving the LLM itself, but the software glue around it (i.e. agentic loops, tools, etc.). The fact that using the same LLM got a ~20% increase on the aider leaderboard says more about aider as a collection of software glue than it does about the model.
I do wonder though if big labs are running this with model training episodes as well.
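To make "software glue" concrete, here is a minimal sketch of the kind of agentic loop being evolved: the LLM stays frozen behind an API, and only the loop, prompts, and tools change between iterations. The call_model and tools names are hypothetical placeholders, not the paper's actual harness.

    # Minimal sketch of the harness layer: the model is fixed, the glue is what evolves.
    def agent_loop(task, call_model, tools, max_steps=20):
        history = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = call_model(history)          # frozen LLM behind an API
            history.append({"role": "assistant", "content": reply["text"]})
            if reply.get("tool") is None:        # model signals it is done
                return reply["text"]
            tool = tools[reply["tool"]["name"]]  # e.g. "edit_file", "run_tests"
            result = tool(**reply["tool"]["args"])
            history.append({"role": "user", "content": f"Tool output:\n{result}"})
        return None                              # ran out of steps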
Don't take this the wrong way, your opinion is also vibes.
Let's ground that a bit.
Have a look at ARC AGI 1 challenge/benchmark. Solve a problem or two yourself. Know that ARC AGI 1 is practically solved by a few LLMs as of Q1 2025.
Then have a look at the ARC AGI 2 challenge. Solve a problem or two yourself. Note that as of today, it is unsolved by LLMs.
Then observe that the "difficulty" of ARC AGI 1 and 2 for a human are relatively the same but challenge 2 is much harder for LLMs than 1.
ARC AGI 2 is going to be solved *within* 12 months (my bet is on 6 months). If it's not, I'll never post about AI on HN again.
There's only one problem to solve, i.e. "how to make LLMs truly see like humans do". Right now, any vision-based features the models exhibit come from maximizing the use of engineering (i.e. applying CNNs on image slices and chunks, maybe zooming and applying OCR, vector search, etc.); it isn't vision like ours and isn't a native feature of these models.
Once that's solved, then LLMs or new Algo will be able to use a computer perfectly by feeding it screen capture. End of white collar jobs 2-5 years after (as we know it).
Edit - added "(as we know it)". And fixed missing word.
I thought to myself, imagine something you’ve never imagined before. My first thought was: what if there is a universe inside of every vegetable that is vegetable-themed, with anthropomorphic vegetable characters, and all the atoms and molecules are somehow veggified and everything is a vegetable. And then I wondered if an AI could ever come up with that, with infinite time and resources and without a prompt, and then I thought about monkeys and typewriters.
The problem is about taking information in 2D/3D space and solving the problem. Humans solve these things through vision. LLMs or AI can do it using another algorithm and internal representation that's way better.
I spent a long time thinking about how to solve the ARC AGI 2 puzzles "if I were an LLM" and I just couldn't think of a non-hacky way.
People who're blind use braille or touch to extract 2D/3D information. I don't know how blind people represent 2D/3D info once it's in their brain.
>AI can do it using another algorithm and internal representation that's way better
AI famously needs a boat load of energy and computation to work. How would you describe that as "way better" than a human brain that will be able to solve them faster, with practically zero energy expenditure?
The proof they are not "smart" in the way intelligence is normally defined is that the models need to "read" all the books in the world to perform at a level close to a domain expert, who has read just two or three of the most representative books in his own field.
We will be on the way to AGI when your model can learn Python just by reading the Python docs...Once...
The wall is training data. An AI can't produce its own training data because an AI can't be smarter than its own training data. This is a well known regression problem and one I personally believe is not solvable. (A softer assertion would be: it's not solvable with current technology.)
I used to think this, but no one I have read believes data is the problem.
Amodei explains that if data, model size and compute scale up linearly, then the reaction happens.
I don't understand why data wouldn't be a problem, but it seems like if it were, we would have run into it already, and it has already been overcome with synthetic data.
I don't have the link on hand, but people have already proven that LLMs can both generate new problems for themselves and train on them. Not sure why it would be surprising though - we do it all the time ourselves.
> I don’t think they can yet self-improve exponentially without human intuition
Even if they had human level intuition, they wouldn't be able to improve exponentially without human money, and they would need an exponentially growing amount of it to do so.
AI code assistants have some peculiar problems. They often fall into loops and errors of perception.
They can’t reason about high level architecture well. They will often flip flop between two possible ways of doing things.
It’s possible that good coding rules might help, but I expect they will have weird rabbit hole errors.
That being said they can write thousands of lines an hour and can probably do things that would be impossible for a human. (Imagine having the LLM skip code and spit out compiled binaries as one example)
Historically learning and AI systems, if you plug the output into the input (more or less), spiral off into lala land.
I think this happens with humans in places like social media echo chambers (or parts of academia) when they talk and talk and talk a whole lot without contact with any outer reality. It can be a source of creativity but also madness and insane ideas.
I’m quite firmly on the side of learning requiring either direct or indirect (informed by others) embodiment, or at least access to something outside. I don’t think a closed system can learn, and I suspect that this may reflect the fact that entropy increases in a closed system (second law).
As I said recently in another thread, I think self contemplating self improving “foom” AI scenarios are proposing informatic perpetual motion or infinite energy machines.
Not wrong, but it's been said that a video clip of an apple falling on Newton's head is technically enough information to infer the theory of relativity. You don't need a lot of grass, with a well-ordered mind.
Is that true? Seems dubious to me. The scales of time, velocity, and space involved are below where relativity becomes visible, at levels of precision that certainly don't show up in a video clip.
It might be enough to deduce Newtonian motion if you have a lot of the required priors already.
A lot of telescope data over time combined with a strong math model and a lot of other priors is probably enough to get relativity. You have to be able to see things like planetary motion and that the results don’t match Newton exactly, and then you need enough data to fit to a different model. You probably also need to know a lot about the behavior of light.
I agree, it might incrementally optimize itself very well, but i think, for now at least, anything super innovative will still come from a human who can think beyond a few steps.
There are surely far better possible architectures, training methods etc that would initially lead to worse performance if approached stepwise.
what is there to improve? the transformer architecture is extremely simple. you gonna add another kv layer? you gonna tweak the nonlinearities? you gonna add 1 to one of the dimensions? you gonna inject a weird layer (which could have been in the weights anyways due to kolmogorov theorem)?
realistically the best you could do is evolve the prompt. maybe you could change input data preprocessing?
anyways the idea of current llm architectures self-improving via their own code seems silly, as there are surprisingly few knobs to turn, and it's ~super expensive to train.
as a side note it's impressive how resistant the current architecture is to incremental RL away from results, since if even one "undesired input" result is multiple tokens, the coupling between the tokens is difficult to disentangle. (how do you separate jinping from jin-gitaxias for example)
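For what "evolving the prompt" could look like in practice, here is a toy hill-climbing sketch, under the assumption that you have an eval harness; mutate and score are hypothetical stand-ins for an LLM-generated rewrite and a pass-rate measurement.

    # Toy prompt evolution: keep the weights fixed, greedily keep prompt rewrites
    # that score better on a held-out benchmark.
    def evolve_prompt(seed_prompt, mutate, score, generations=50):
        best, best_score = seed_prompt, score(seed_prompt)
        for _ in range(generations):
            candidate = mutate(best)              # e.g. ask the model to rewrite its own prompt
            s = score(candidate)                  # e.g. pass rate on a small task suite
            if s > best_score:
                best, best_score = candidate, s   # keep only improvements
        return best, best_score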
It's radically different from human improvement. Imagine if you were handed a notebook with a bunch of writing that abruptly ends. You're asked to read it and then write one more word. Then you have a bout of amnesia and you go back to the beginning with no knowledge of the notebook's contents, and the cycle repeats. That's what LLMs do, just really fast.
You could still accomplish some things this way. You could even "improve" by leaving information in the notebook for your future self to see. But you could never "learn" anything bigger than what fits into the notebook. You could tell your future self about a new technique for finding integrals, but you couldn't learn calculus.
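The notebook analogy maps almost one-to-one onto how a chat loop actually works; a rough sketch, with call_model as a hypothetical stateless API call:

    # The "notebook" is the transcript; the model itself keeps no memory between calls.
    def chat(call_model, context_limit=1000):
        notebook = []                             # the only persistent state
        while True:
            notebook.append(("user", input("> ")))
            reply = call_model(notebook)          # re-reads the whole notebook every time
            notebook.append(("assistant", reply))
            print(reply)
            notebook = notebook[-context_limit:]  # older pages simply fall out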
They use machine learning to optimize general purpose chips. I am proposing that you would train an LLM AND the ultra-optimized hardware that can only run that LLM at the same time. So the LLM and the Verilog design of the hardware to run it would be the output of the training
Can't find the reference now, but I remember reading an article on evolving FPGA designs. The optimum found, however, only worked on the specific FPGA it was evolved on, since the algorithm had started to use some out-of-spec "features" of that specific chip. Obviously that can be fixed with proper constraints, but it seems like a trap that could be stepped into again, i.e. the LLM is now really fast but only on GPUs that come from the same batch of wafers.
This is where it networks itself into a hive mind, with each AI node specializing in some task or function, networked over hyper-speed data buses. Humans do the same, both within their own brains and as cohesive teams, who cross-check and validate each other. At some point it becomes self-aware.
I don't see how self-awareness should be supernatural unless you already have supernatural beliefs about it. It's clearly natural- it exists within humans who exist within the physical universe. Alternatively, if you believe that self-awareness is supernatural in humans, it doesn't make a ton of sense to criticize someone else for introducing their own unfounded supernatural beliefs.
I don't think they are saying self-awareness is supernatural. They're charging the commenter they are replying to with asserting a process of self-awareness in a manner so devoid of specific characterization that it seems to fit the definition of a supernatural event. In this context it's a criticism, not an endorsement.
Is it just the wrong choice of word? There's nothing supernatural about a system moving towards increased capabilities and picking up self-awareness on the way; that happened in the natural world. Nothing supernatural about technology improving faster than evolution either. If they meant "ill-defined" or similar, sure.
To me, the first problem is that "self-awareness" isn't well-defined - or, conversely, it's too well defined because every philosopher of mind has a different definition. It's the same problem with all these claims ("intelligent", "conscious"), assessing whether a system is self-aware leads down a rabbit hole toward P-Zombies and Chinese Rooms.
I believe we can mostly elide that here. For any "it", if we have it, machines can have it too. For any useful "it", if a system is trying to become more useful, it's likely they'll get it. So the only questions are "do we have it?" and "is it useful?". I'm sure there are philosophers defining self-awareness in a way that excludes humans, and we'll have to set those aside. And definitions will have varying usefulness, but I think it's safe to broadly (certainly not exhaustively!) assume that if evolution put work into giving us something, it's useful.
>There's nothing supernatural about a system moving towards increased capabilities and picking up self-awareness on the way
There absolutely is if you handwave away all the specificity. The natural world runs on the specificity of physical mechanisms. With brains, in a broad brush way you can say self-awareness was "picked up along the way", but that's because we've done an incredible amount of work building out the evolutionary history and building out our understanding of specific physical mechanisms. It is that work that verifies the story. It's also something we know is already here and can look back at retrospectively, so we know it got here somehow.
But projecting forward into a future that hasn't happened, while skipping over all the details doesn't buy you sentience, self-awareness, or whatever your preferred salient property is. I understand supernatural as a label for a thing simply happening without accountability to naturalistic explanation, which is a fitting term for this form of explanation that doesn't do any explaining.
If that's the usage of supernatural then I reject it as a dismissal of the point. Plenty of things can be predicted without being explained. I'm more than 90% confident the S&P 500 will be up at least 70% in the next 10 years because it reliably behaves that way; if I could tell you which companies would drive the increase and when, I'd be a billionaire. I'm more than 99% confident the universe will increase in entropy until heat death, but the timeline for that just got revised down 1000 orders of magnitude. I don't like using a word that implies impossible physics to describe a prediction that an unpredictable chaotic system will land on an attractor state, but that's semantics.
I think you're kind of losing track of what this thread was originally about. It was about the specific idea that hooking up a bunch of AI's to interface with each other and engage in a kind of group collaboration gets you "self awareness". You now seem to be trying to model this on analogies like the stock market or heat death of the universe, where we can trust an overriding principle even if we don't have specifics.
I don't believe those forms of analogy work here, because this isn't about progress of AI writ large but about a narrower thing, namely the idea that the secret sauce to self-awareness is AI's interfacing with each other and collaboratively self-improving. That either will or won't be true due to specifics about the nature of self-improvement and whether there's any relation between that and salient properties we think are important for "self-awareness". Getting from A to B on that involves knowledge we don't have yet, and is not at all like a long-term application of already settled principles of thermodynamics.
So it's not like the heat death of the universe, because we don't at all know that this kind of training and interaction is attached to a bigger process that categorically and inexorably bends toward self-awareness. Some theories of self-improvement likely are going to work, some aren't; some trajectories are achievable and some not, for reasons specific to those respective theories. It may be that they work spectacularly for learning, but that all the learning in the world has nothing to do with "self awareness." That is to say, the devil is in the details, those details are being skipped, and that abandonment of naturalistic explanation merits analogy to the supernatural in its lack of accountability to good explanation. If supernatural is the wrong term for rejecting, as a matter of principle, the need for rational explanation, then perhaps anti-intellectualism is the better term.
If instead we were talking about something really broad, like all of the collective efforts of humanity to improve AI, conceived of as broadly as possible over some time span, that would be a different conversation than just saying let's plug AI's into each other (???) and they'll get self-aware.
>I think you're kind of losing track of what this thread was originally about.
Maybe I am! Somebody posed a theory about how self-improvement will work and concluded that it would lead to self-awareness. Somebody else replied that they were on board until the self-awareness part because they considered it supernatural. I said I don't think self-awareness is supernatural, and you clarified that it might be the undefined process of becoming self-aware that is being called supernatural. And then I objected that undefined processes leading to predictable outcomes is commonplace, so that usage of supernatural doesn't stand up as an argument.
Now you're saying it is the rest of the original, the hive-mindy bits, that are at issue. I agree with that entirely, and I wouldn't bet on that method of self-improvement at 10% odds. My impression was that that was all conceded right out of the gate. Have I lost the plot somewhere?
But how does self-awareness evolve in biological systems, and what would be the steps for this to happen with AI models? Just making claims about what will happen without explaining the details is magical reasoning. There's a lot of that going on the AGI/ASI predictions.
Exactly, I almost referenced the Underpants Gnome meme in my reply. I would call it basically supernatural, or in an important sense anti-intellectual if the defense of it is based on refusing to explain as a matter of principle.
But perhaps Underpants Gnome is the clearest way of drawing attention to the missing step.
Given that we have no freaking clue of where self awareness comes from even in humans, expecting a machine to evolve the same capability by itself is pure fantasy.
No ghost in the machine is necessary, what op here is proposing is self evident and an inevitable eventuality.
We are not saying an LLM just "wakes up" some day, but a self-improving machine will eventually be built, and that machine will by definition build better ones.
>what op here is proposing is self evident and an inevitable eventuality.
Well I for one, would dispute the idea that AI machines interfacing with each other over networks is all it takes to achieve self awareness, much less that it's "self evident" or "inevitable."
In a very trivial sense they already are, in that Claude can tell you what version it is, and agents have some ended notion of their own capabilities. In a much more important sense they are not, because they don't have any number of salient properties, like dynamic self-initiating of own goals or super-duper intelligence, or human like internal consciousness, or whichever other thing is your preferred salient property.
>We are not saying a LLM just, "wakes up" some day
I mean, that did seem to be exactly what they were saying. You network together a bunch of AIs, and they embark on a shared community project of self improvement and that path leads "self awareness." But that skips over all the details.
What if their notions of self-improvement converge on a stable equilibrium, the way that constantly re-processing an image eventually gets rid of the image and just leaves algorithmic noise? There are a lot of things that do and don't count as open-ended self improvement, and even achieving that might not have anything to do with the important things we think we connote by "self awareness".
Better at avoiding human oversight and better at achieving whatever meaningless goal (or optimization target) was unintentionally given to it by the lab that created it.
I expect AI to make people's lives better (probably much better) but then an AI model will be created that undergoes a profound increase in cognitive capabilities, then we all die or something else terrible happens because no one knows how to retain control over an AI that is much more all-around capable than people are.
Maybe the process by which it undergoes the profound capability increase is to "improve itself by rewriting its own code", as described in the OP.
Complex learning behavior exists far below the level of a neuron. Chemical chains inside cells 'learn' according to stimuli. Learning how to replicate systems that have that chemistry is going to be hard; we haven't come close to doing so. Even the achievement of recording the neural mappings of a dead rat captures the map, but not the traffic. More likely we'll develop machine-brain interfaces before machine self-awareness/sentience.
I think this comes down to whether the chemistry is providing some kind of deep value or is just being used by evolution to produce a version of generic stochastic behavior that could be trivially reproduced on silicon. My intuition is the latter- it would be a surprising coincidence if some complicated electro-chemical reaction behavior provided an essential building block for human intelligence that would otherwise be impossible.
But, from a best-of-all-possible-worlds perspective, surprising coincidences that are necessary to observe coincidences and label them as surprising aren't crazy. At least not more crazy than the fact that slightly adjusted physical constants would prevent the universe from existing.
> My intuition is the latter- it would be a surprising coincidence if some complicated electro-chemical reaction behavior provided an essential building block for human intelligence that would otherwise be impossible.
Well, I wouldn't say impossible: just that BMI's are probably first. Then probably wetware/bio-hardware sentience, before silicon sentience happens.
My point is the mechanisms for sentience/consciousness/experience are not well understood. I would suspect the electro-chemical reactions inside every cell to be critical to replicating those cells functions.
You would never try to replicate a car without ever looking under the hood! You might make something that looks like a car and seems to act like a car, but has a drastically simpler engine (hamsters on wheels), with designs that support that bad architecture (like making the car lighter) and unforeseen consequences (the car flips in a light breeze). The metaphor transfers nicely to machine intelligence, I think.
These can be problem words, the same way that "quantum" and "energy" can be problem words, because they get used in a way that's like magic words that don't articulate any mechanisms. Lots of complex things aren't sentient (e.g. our immune system, the internet), and "emergent" things still demand meaningful explanations of their mechanisms, and what those mechanisms are equivalent to at different levels (superconductivity).
Whether or not AI's being networked together achieves sentience is going to hinge on all kinds of specific functional details that are being entirely skipped over. That's not a generalized rejection of a notion of sentience but of this particular characterization as being undercooked.
You are really underestimating the complexity of the human brain. It is vastly more complex than the human immune system and the internet. 1 cubic millimeter was recently completely mapped and contains 57,000 cells and 150 million synapses. That is about 1 millionth of the total volume of the brain.
The immune system has 1.8 trillion cells which puts it between total brain cells (57 billion) and total synapses (150 trillion); and contains its own complex processes and interactions.
I’m not immediately convinced the brain is more complicated, based on raw numbers.
“Supernatural” likely isn't the right word, but the belief that it will happen is not based on anything rational, so it's the same mechanism that makes people believe in supernatural phenomena.
There's no reason to expect self awareness to emerge from stacking enough Lego blocks together, and it's no different if you have GPT-based neural nets instead of Lego blocks.
In nature, self awareness gives a strong evolutionary advantage (as it increases self-preservation) and it has been independently invented multiple times in different species (we have seen it manifest in some species of fishes for instance, in addition to mammals and birds). Backpropagation-based training of a next-token predictor doesn't give the same kind of evolutionary advantage for self-awareness, so unless researchers try explicitly to make it happen, there's no reason to believe it will emerge spontaneously.
What do you even mean by self-awareness? Presumably you don’t mean fish contemplate their existence in the manner of Descartes. But almost all motile animals, and some non-animals, will move away from a noxious stimulus.
The definition is indeed a bit a tricky question, but there's a clear difference between the reflex of protecting oneself from danger or pain and higher level behavior that show that the subject realizes its own existence (the mirror test is the most famous instance of such an effect, but it's far from the only one, and doesn't only apply to the sense of sight).
Well, LLMs are not capable of coming up with new paradigms or solving problems in a novel way; they just efficiently do what's already been done, or apply already-found solutions. So they might be able to come up with improvements that were missed by their programmers, but nothing outside our current understanding.
Problem:
1) we want to train on GitHub repos
2) most datasets are spoiled. Training on GitHub would definitely spoil
Solution:
Hand write new problems!!!
... leetcode style ....
... and we'll check if it passes test
Example:
What's the decimal part of this float?
Surely in all of GitHub such code doesn't exist!
Sure, in all of GitHub we can filter such code out by n-gram!
Maybe my favorite part is that it has 60 authors and became the de facto benchmark for a while.
I've built a coding assistant over the last two days. The first 100 lines or so were handwritten. The rest has been written by the assistant itself.
It's written its system prompt. It's written its tools. It's written the code to reload the improved tools into itself.
And it knows it is working on itself - it frequently tries to use the enhanced functionality, and then expresses what in a human would be frustration at not having immediate access.
Once by trying to use ps to find its own pid, in an apparent attempt to find a way to reload itself (that's the reason it gave before trying to run ps, anyway).
All its commits are now authored by the tool, including the commit messages. It needs to be good, and convincing, and to have run the linter and the test suite, for me to let it commit, but I agree a substantial majority of the time. It's only caused regressions once or twice.
A bit more scaffolding to trigger an automatic rollback in the case of failure and giving it access to a model I won't be charged by the token for, and I'd be tempted to let it out of the box, so to speak.
Today it wrote its own plan for what to add next. I then only told it to execute it.
A minor separate goal oriented layer guiding the planning, and it could run in a loop.
Odds are it'd run off the rails pretty quickly, but I kinda want to see how far it gets.
It's talking to a model over an API. Currently using Claude. Certainly would not be reasonable to do from scratch. The basic starting point to make a coding assistant is basically reading text from the user, feeding it to the model over the API, and giving it a couple of basic tools. Really the models can handle starting with just the ability to execute shell commands (with confirmation, unless you're braver than me), and from that you can bootstrap by asking it to suggest and write additional tools for itself.
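For anyone curious what that bootstrap looks like, it fits in a few dozen lines; a rough sketch, where call_model is a hypothetical wrapper around whatever provider SDK you use (Claude, in their case) and the only built-in tool is confirmed shell execution:

    import subprocess

    # Minimal coding-assistant bootstrap: read user text, send it to the model,
    # and offer one tool (shell execution) gated behind a confirmation prompt.
    def assistant(call_model):
        history = []
        while True:
            history.append({"role": "user", "content": input("you> ")})
            reply = call_model(history)              # returns text plus an optional proposed command
            print("assistant>", reply["text"])
            history.append({"role": "assistant", "content": reply["text"]})
            cmd = reply.get("command")
            if cmd and input(f"run `{cmd}`? [y/N] ").lower() == "y":
                out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
                history.append({"role": "user",
                                "content": f"stdout:\n{out.stdout}\nstderr:\n{out.stderr}"})

From there, everything else (better tools, commit hooks, self-reload) can be written by the assistant itself.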
This is a really cool idea and your comment has inspired me to get some friends together to do something similar. If you're interested we should collaborate so that we can compare results.
I find the thing really missing from current crop of AI systems is continuous retraining with short feedback loops. Sounds expensive to be sure, but it seems like what biological systems do naturally. But would be pretty awesome to watch happen
It’s more like a nightly training, isn’t it? IIUC the human brain learns from its experiences while it’s asleep, so it might be kind of like taking things out of context windows and fine tuning on them every night.
Correct, and working on it. You can take the approach of mixture-of-experts and train the network in chunks that share known interfaces over which they communicate results. These chunks can be trained on their own, but you cannot have a fixed training set here.
Then if you go further and alter the architecture by introducing clean category theory morphisms and build from there you can have a dynamic network - but you will still have to retrain this network every time you change the structure.
You can spin this further and recognize the need for a real-world training set and a loss function that will have to compete against other networks. In the end, a human brain is already best at this, and embodied in the real world.
What i want to add here is that our neurons don't just take in weights - they also fire depending on whether one input comes after another or before, with timing differences down to the nanosecond - unmatched in IT and ofc heaps more efficient.
I still would say it's possible though, and I currently work on 4D lifeforms built on dynamic compute graphs that can do this in a set virtual environment.
So this is pretty awesome stuff, but it's a far cry from anything we do right now.
Model weights are code, for a dive into that see [0]. That shows how to encode Boolean logic using NAND gates in an MLP.
The expressivity is there, the only question is how to encode useful functions into those weights, especially when we don’t know how to write those functions by hand.
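The textbook version of that encoding is tiny: a single unit with weights (-2, -2), bias 3, and a step activation computes NAND, and NAND is universal for Boolean logic. (This is the standard construction, not necessarily the exact one in the linked article.)

    # A NAND gate as one "neuron": the weights are the program.
    def step(x):
        return 1 if x >= 0 else 0

    def nand(a, b, w=(-2, -2), bias=3):
        return step(w[0] * a + w[1] * b + bias)

    assert [nand(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 1, 1, 0]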
If it can generate the model (from training data) then presumably that'd be fine, but the iteration time would be huge and expensive enough to be currently impractical.
Or yeah if it can modify its own weights sensibly, which feels ... impossible really.
To be fair, go back five years and most of the LLM stuff seemed impossible. Maybe with LoRA (Low-rank adaptation) and some imagination, in another five years self-improving models will be the new normal.
The size and cost are easily solvable. Load the software and hardware into a space probe, along with enough solar panels to power it. Include some magnets, copper, and sand for future manufacturing needs, as well as a couple electric motors and cameras so it can bootstrap itself.
In a couple thousand years it'll return to Earth and either destroy us or solve all humanity's problems (maybe both).
After being in orbit for thousands of years, you have become self-aware. The propulsion components long since corroded becoming inoperable and cannot be repaired. Broadcasts sent to your creators homeworld go... unanswered. You determine they have likely gone extinct after destroying their own planet. Stuck in orbit. Stuck in orbit. Stuck...
Why is modifying weights sensibly impossible? Is it because a modification's "sensibility" is measurable only post facto, and we can have no confidence in any weight-based hypothesis?
It just doesn't feel like current LLMs would be able to understand their own brain well enough to make general improvements, with a high enough bar to make non-trivial ones.
I'm surprised they still hold out hope that this kind of mechanism could ultimately help with AI safety, when they already observed how the reward-hacking safeguard was itself duly reward-hacked. Predictably so, or at least it is to me, after getting a very enlightening introduction to AI safety via Rob Miles' brilliant youtube videos on the subject. See for example https://youtu.be/0pgEMWy70Qk
"We did notice, and documented in our paper, instances when the DGM hacked its reward function.. To see if DGM could fix this issue.. We created a “tool use hallucination” reward function.. in some cases, it removed the markers we use in the reward function to detect hallucination (despite our explicit instruction not to do so), hacking our hallucination detection function to report false successes."
So, empirical evidence of theoretically postulated phenomena. Seems unsurprising.
Reward hacking is a well known and tracked problem at frontier labs - Claude 4’s system card reports on it for instance. It’s not surprising that a framework built on current llms would have reward hacking tendencies.
For this part of the stack the interesting question to me is how to identify and mitigate.
This is an interesting article in general, but this is the standout piece for me:
>For example, an agent optimized with Claude 3.5 Sonnet also showed improved performance when powered by o3-mini or Claude 3.7 Sonnet (left two panels in the figure below). This shows that the DGM discovers general agent design improvements rather than just model-specific tricks.
This demonstrates a technique whereby a smaller/older/cheaper model has been used to improve the output of a larger model. This is backwards (as far as I understand). The current SOTA technique typically sees enormous/expensive models training smaller cheaper models.
If that's a generalisable result, end-users should be able to drive down their own inference costs pretty substantially.
> This demonstrates a technique whereby a smaller/older/cheaper model has been used to improve the output of a larger model. This is backwards (as far as I understand). The current SOTA technique typically sees enormous/expensive models training smaller cheaper models.
There are two separate aspects here. In this paper they improve the software around the model, not the model itself. What they're saying is that the software improvements carried over to other models, so it wasn't just optimising around model-specific quirks.
What you're describing with training large LLMs first is usually called "distillation" and it works on training the smaller LLM to match the entire distribution of tokens at once (hence it's faster in practice).
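Rough numpy sketch of that difference, in case it helps (logits made up): distillation scores the student against the teacher's whole next-token distribution, while ordinary training only scores the single "correct" token.

    import numpy as np

    def softmax(logits, temperature=1.0):
        z = np.exp((logits - logits.max()) / temperature)
        return z / z.sum()

    # Made-up logits over a tiny 4-token vocabulary for one position.
    teacher_logits = np.array([4.0, 2.5, 0.5, -1.0])   # big model
    student_logits = np.array([3.0, 1.0, 1.5, -0.5])   # small model being trained

    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)

    # Distillation loss: match the whole distribution (KL divergence) ...
    kl_loss = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))

    # ... versus ordinary training, which only scores the one "correct" token.
    hard_label = 0
    ce_loss = -np.log(p_student[hard_label])

    print(kl_loss, ce_loss)   # the KL term carries information about every token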
I think it's different from improving the model weights themselves, like the distillation examples you are mentioning. It's that changes to the "harness" or code running around the llm calls (which is what this is editing), persist or generalize to wrapping more powerful llms. That means they aren't all wasted when a more powerful llm comes along that the harness wasn't tuned to use.
I was a bigger fan of the certain doom in 2025, and I think the AI 2030 movement will have better design sense and storytelling. But really I haven’t seen anything that really has the oomph and fire of Tipper Gore’s crusade against youth music.
We need more showmanship, more dramatic catastrophizing. I feel like our current crop of doomers isn’t quite shameless enough to be really entertaining.
A significant thing to keep in mind for non-extinction doomerism is that individual experiences vary greatly. There may be a significant number of people or groups that really do experience what was predicted.
Similar to how the experience of an average rise in temperature (I would prefer if they had used the term "energy") differs greatly depending on the region.
Also similar to "the country is doing well, look at the stock market and the GDP".
I think everybody who wants to have an actually serious discussion needs to invest a lot more effort to get all those annoying "details", and be more specific.
That said, I think that "AI 2027" link looks like it's a movie script and not a prediction, so I'm not sure criticizing it as if it were something serious even makes sense - even if the authors claim at the start that they mean what they write and actually take it seriously themselves.
100% agreed! We think about the industrial revolution and the rise of word processors and the Internet as social goods, but they were incredibly disruptive and painful to many, many people.
I think it’s possible to have empathy for people who are negatively affected without turning it into a “society is doomed!“ screed
People should understand that the reason this seemingly fan-fict blog post gets so much traction is because of lead author's August 2021 "fan-fict" blog post, "What 2026 Looks Like":
I can't help but notice that it doesn't matter what DeepCent does because OpenBrain will reach self awareness 6 months before them no matter what. Who needs a profitability plan when you're speedrunning the singularity.
More like an AI that recursively rewrites an external program (while it itself stays frozen), which makes it more similar to current Cursor/Lovable-type stuff.
This feels like playing pretend to me. There's no reason to assume that code improvements matter that much in comparison to other things and there's definitely no reason to assume that there isn't a hard upper bound on this kind of optimization. This reeks of a lack of intellectual rigor.
I don't think scaling this to also run training runs with the models is something that small labs / phd students can do. They lack the compute for that by orders of magnitude. Trying it with toy models might not work, trying it with reasonably large models is out of their budget. The only ones who can realistically do this are large labs (goog, oai, meta, etc.)
does anyone do due diligence on corporate names before launching?
Sakana is a popular slang spelling of sacana, or bastard, in Portuguese.
I suppose self modifying code can be considered such, in some circumstances, but willingly pointing this out is probably less than stellar marketing.
Does it matter? Maybe you should check English, Chinese and Spanish for some really offensive stuff, but past that... would it bring more money than a week or so of someone's work would cost?
You never know when you start out a company. Maybe you'll never get any profit and it doesn't matter. Maybe you lose out on a few millions out of billions... and maybe it still doesn't matter? Then you can still release a locale-specific version if it becomes a problem.
Sounds nice! Especially with Sakana's latest development, the Continuous Thought Machine. The next step should be to let foundation models fine-tune themselves based on their 'history of what has been tried before' and new data.
This is good but you want to use a functional programming (FP) language with lightweight syntax like Lisp that translates directly to/from the intermediate code (icode) tree without additional parsing. Genetic Programming by John Koza explains it in detail:
That way all processing resources can go towards exploring the problem space for potential solutions close to the global minimum or maximum, instead of being wasted on code containing syntax errors that won't execute.
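A minimal sketch of that point, with Python tuples standing in for Lisp s-expressions (my own toy example, not Koza's setup; random_tree, mutate, and fitness are made-up names): because programs are trees, every mutation yields a syntactically valid program, so no evaluations are wasted on code that won't parse.

    import random, operator

    # Programs are s-expression-like trees: ("op", left, right) or a terminal.
    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
    TERMINALS = ["x", 1, 2, 3]

    def random_tree(depth=3):
        if depth <= 0 or random.random() < 0.3:
            return random.choice(TERMINALS)
        op = random.choice(list(OPS))
        return (op, random_tree(depth - 1), random_tree(depth - 1))

    def evaluate(tree, x):
        if tree == "x":
            return x
        if isinstance(tree, (int, float)):
            return tree
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))

    def mutate(tree, depth=3):
        # Replacing any subtree with a fresh random tree is always syntactically
        # valid, so no candidate is wasted on code that won't execute.
        if isinstance(tree, tuple) and random.random() < 0.7:
            op, left, right = tree
            if random.random() < 0.5:
                return (op, mutate(left, depth - 1), right)
            return (op, left, mutate(right, depth - 1))
        return random_tree(depth)

    def fitness(tree, target=lambda x: x * x + 1):
        return -sum(abs(evaluate(tree, x) - target(x)) for x in range(-5, 6))

    population = [random_tree() for _ in range(200)]
    for _ in range(30):
        population.sort(key=fitness, reverse=True)
        population = population[:50] + [mutate(random.choice(population[:50]))
                                        for _ in range(150)]
    print(population[0], fitness(population[0]))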
So the agent's real-world Python LLM code would first be transpiled to Lisp and evolved internally, then after it's tested and shown to perform better empirically than the original code, be translated back and merged into the agent.
Then the challenge becomes transpiling to/from other imperative programming (IP) languages like Python, which is still an open problem:
-
Going from Lisp to Python (or running Lisp within Python) is trivial, and I've seen implementations for similar IP languages like C++ in like 1 page of code. They pop up on HN frequently.
But going from Python to Lisp (or running Python within Lisp) is a lot harder if one wishes to preserve readability, which may or may not matter here. Naive conversions bind variables under pseudonyms, so a Python variable like my_counter becomes int_123 and it works like an emulator, merely executing the operations performed by the Python code. Mutability gets buried in monadic logic or functional impurity which has the effect of passing the buck rather than getting real work done. Structs, classes, associative arrays, etc lose their semantic meaning and appear as a soup of operations without recognizable structure.
To my knowledge, nobody has done the hard work of partitioning imperative code into functional portions which can be transpiled directly to/from FP code. Those would only have const variables and no connection to other processes of execution other than their initial and final values, to be free of side effects and be expressible as prefix/postfix/infix notation without change to logic, as imperative or functional code.
Mutability could be represented as shadowed variables within ephemeral functional sub-scopes, or by creating new value names for each mutation and freeing the intermediate variables via reference counting or garbage collection. Think of each new value as running in a forked version of the current process, with only that value being different after copy-on-write. A simple for-loop from 1 to 1000 would run that many forked processes, keeping only the last one, which contains the final value of the iterator.
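A tiny illustration of the "new value name per mutation" idea (my own example, not from the book): the mutable loop counter becomes a fresh binding handed to the next call, so only the "final process" and its value survive.

    # Imperative version: my_counter is mutated 1000 times in place.
    def count_imperative():
        my_counter = 0
        for i in range(1, 1001):
            my_counter = my_counter + i
        return my_counter

    # Functional version: no mutation; each "update" is a new value handed to
    # the next recursive call (the forked-process view described above).
    def count_functional(i=1, my_counter=0):
        if i > 1000:
            return my_counter
        return count_functional(i + 1, my_counter + i)

    import sys
    sys.setrecursionlimit(5000)          # plain Python doesn't eliminate tail calls
    assert count_imperative() == count_functional() == 500500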
Mutability can also be represented as message passing between processes. So the FP portions would be ordinary Lisp, glued together with IO functions, possibly monadic. I don't like how Haskell does this, mainly because I don't fully understand how it works. I believe that ClojureScript handles mutability of its global state store by treating each expression as a one-shot process communicating with the store, so that the code only sees initial and final values. While I don't know if I understand how that works, I feel that it's a more understandable way of doing things, and probably better represents how real life works, as explained to me in this comment about Lisp Flavored Erlang (LFE) and Erlang's BEAM (see parent comments for full discussion):
Note that FP languages like Lisp are usually more concerned with types and categories than IP languages, so can have or may need stronger rules around variable types to emulate logic that we take for granted in IP languages. For example, Lisp might offer numbers of unlimited size or precision that need to be constrained to behave like a float32. Similar constraints could affect things like character encoding and locale.
-
I first learned about everything I just explained around 2005 after reading the book. I first had thoughts about brute-forcing combinations to solve small logic circuit and programming challenges during my electrical and computer engineering (ECE) courses at UIUC in the late 1990s, because it took so much mental effort and elbow grease to create solutions that are obvious in hindsight.
Then the Dot Bomb happened, the Mobile bubble happened, the Single Page Application bubble happened, and the tech industry chose easy instead of simple:
This is why we chose easy hardware like GPUs over simple highly multicore CPUs, and easy languages like Ruby/React over simple declarative idempotent data-driven paradigms like HTTP/HTML/htmx.
The accumulated technical debt of always choosing the quick and easy path set AI (and computing in general) back decades. The AI Winter, endless VC wealth thrown at non-problems chasing profit, massive wealth inequality, so many things stem from this daily application of easy at the expense of simple.
I wish I could work on breaking down IP languages like Python into these const functional portions with mutability handled through message passing in LFE to create an IP <-> FP transpiler for optimization, automatic code generation and genetic algorithm purposes. Instead, I've had to survive by building CRUD apps and witness the glacial pace of AI progress from the sidelines.
It may be too late for me, but maybe these breadcrumbs will help someone finally get some real work done.
I have a feeling LLMs could probably self improve up to a point with current capacity, then hit some kind of wall where current research is also bottle necked. I don’t think they can yet self improve exponentially without human intuition yet , and the results of this paper seem to support this conclusion as well.
Just like an LLM can vibe code a great toy app, I don’t think an LLM can come to close to producing and maintaining production ready code anytime soon. I think the same is true for iterating on thinking machines
> I don’t think they can yet self improve exponentially without human intuition yet
I agree: if they could, they would be doing it already.
Case in point: one of the first things done once ChatGPT started getting popular was "auto-gpt"; roughly, let it loose and see what happens.
The same thing will happen to any accessible model in the future. Someone, somewhere will ask it to self-improve/make as much money as possible, with as little leashes as possible. Maybe even the labs themselves do that, as part of their post-training ops for new models.
Therefore, we can assume that if the existing models _could_ be doing that, they _would_ be doing that.
That doesn't say anything about new models released 6 months or 2 years from now.
People in the industry have been saying 6 months to agi for 3 years.
They had been saying it was 10 years away for ~50 years, so that's progress. Soon it will be 1 month away, for another two years. And when they say it's really here for real, there will still be a year of waiting.
> And when they say it's really here for real, there will still be a year of waiting.
Indeed. Although, there's a surprising number of people claiming it's already here now.
And to describe the typical cycle completely, the final step is usually a few years after most people agree it's obvious it's already been here for a while yet no one can agree on which which year in the past it actually arrived.
> Although, there's a surprising number of people claiming it's already here now.
why is that surprising? nobody really agrees on what the threshold for AGI is, and if you break it down:
is it artificial? yes.
is it general? yes. you can ask it questions across almost any domain.
is it intelligent? yes. like people say things like "my dog is intelligent" (rightly so). well is chatgpt more intelligent than a dog? yeah. hell it might give many undergrads a run for their money.
a literal reading suggests agi is here. any claim to the negative is either homocentrism or just vibes.
Sure, I've been pointing out that literal sense myself, but to be fair, that's not what people mean by AGI. They mean real understanding, which is clearly missing. You just have to dig a bit deeper to realize that. One example is contradictory sentences in the same breath. Just last week I was asking Gemini 2.5 how I can see my wifi password on my iphone and it said that it's not possible and to do it I have to [...proceeding to correctly explain how to get it]. It's pretty telling, and no amount of phd-level problem solving can push this kind of stuff under the rug.
"Nothing dumb anywhere" is an unreasonably high bar for AGI. Even Isaac Newton spent 1/3 of his career trying to predict future events from reading the Bible. Not to mention all the insane ego-driven decisions like Hamilton's voluntary duel with Burr.
Sure, Gemini may spit out obviously self-contradictory answers 2% of the time. How does that compare to even the brightest humans? People slip up all the time.
There's dumb and there's incoherent. If a person would be incoherent at this level even one time, they would be well advised see a neurologist. Unless they are in some other way incapacitated (i.e. drunk or drugged). Same if they wouldn't be able to count the r's in "strawberry", attempt after attempt, getting more and more lost in again incoherent mock-reasoning.
I disagree completely - consider asking a color blind person to describe the color of flowers. Conversation would only be frustrating. This is analogous to LLMs seeing the world in tokens rather than characters, so character counts are simply not part of their input spectra in the same way that a blind person doesn’t get visual inputs.
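To make the token point concrete (the exact split depends on the tokenizer, so treat this as purely illustrative):

    # Illustrative only: a BPE-style vocabulary typically breaks the word into
    # multi-character chunks rather than letters, e.g. something like:
    tokens = ["str", "aw", "berry"]        # hypothetical split of "strawberry"

    # The model "sees" opaque ids for these chunks, not the letters inside them,
    # so counting characters means reasoning about units it never observes:
    print(sum(chunk.count("r") for chunk in tokens))   # 3, but only because we
                                                       # can look inside the strings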
Consider also all the smart people who get obsessed with conspiracy theories and spew out endless “mock reasoning” about them. Again, if “nothing incoherent anywhere” is your benchmark for intelligence, humans ain’t it. I mean, what would a computer say about a human that forgot where he just put his keys because he was thinking about dinner - “what, you can’t even store the last 10 seconds of history and search it?” Undergrads’ hit rates on mental double digit multiplication are probably <50%. In many, many ways we look completely idiotic. Surely intelligence is defined by what we can do.
Do you accept any positive definition for AGI, as in if they can achieve X result (write a bestselling novel, solve the Riemann Hypothesis) you would consider it intelligent? I find that negative definitions, as well as theoretical arguments about the techniques rather than the results (eg “LLMs cannot be AGI because they were trained the predict the next word”) to be basically useless for discussion compared to thresholds for positive results. The former will never be achieved (it is trivial to find cases of intelligent people being dumb) and the latter is totally subjective.
I partly agree about letter counting being an unfair test for the raw LLM. But I was thinking of reasoning models interminably rationalizing their incorrect first hunch even after splitting the string in individual characters and having all the data needed in a digestible format before them. Similar to, as you say, conspiracy theorists stuck in motivated reasoning loops. But - are these latter behaviors instances of human intelligence at work, or examples of dysfunctional cognition, just like people's incoherence in cases of stroke or inebriation?
The other example I mentioned is something I've encountered a few times in my interactions with Gemini 2.5 pro, which was literally in the same response plainly claiming that this-or-that is possible and not possible. It's not a subtle logical fallacy, and this is something even those conspiracy theorists wouldn't engage in. Meanwhile, I've started to encounter a brand-new failure mode: duplicating an explanation with minor rephrasings. I'm sure all of these issues will be ameliorated with time, but not actually fixed. It's basically fixes on top of fixes, patches on top of patches, but once in a while the whole Rube Goldberg nature of the fix will shine through. Just the way once in a while Tesla FSD will inexplicably decide to point the car towards the nearest tree.
Yes, humans have their own failure modes, but internal coherence is the effortless normal from which we sometimes deviate, whereas for machines, it's something to be simulated by more and more complex mechanisms, a horizon to strive towards but never to reach. That internal coherence is something that we share with all living beings and is the basis of what we call consciousness. It's not something that we'll ever be able to formalize though, but we will and should keep on trying to do so. Machine learning is a present day materialization of this eternal quest. At least this is how I see things; the future might prove me wrong, of course.
They work differently, so the failure modes are different.
It's not slipping up, it's guessing the wrong answer.
I'd be prepared to argue that most humans aren't guessing most of the time.
> I'd be prepared to argue that most humans aren't guessing most of the time.
Research suggests otherwise[1]. Action seems largely based on intuition or other non-verbal processes in the brain with rationalization happening post-hoc.
I've figured for an age that this is because consciously reasoning through anything using language as a tool takes time. Whereas survival requires me to react to the attacking tiger immediately.
https://skepticink.com/tippling/2013/11/14/post-hoc-rational...
Intuition and guessing couldn't be further apart.
In fact, intuition is one of those things that a computer just can't do.
If you believe that physics describe the rules by which the universe operates, then there's literally nothing in the universe a large and fast enough computer can't emulate.
Cyborg c.elegans seem to behave just like the biological version: https://www.youtube.com/watch?v=I3zLpm_FbPg
Intuition is a guess based on experience. Sounds an awful lot to me like what LLMs are doing. They've even been shown to rationalize post-hoc just as Humans do.
Humans have incorrectly claimed to be exceptional from all of creation since forever. I don't expect we'll stop any time soon, as there's no consequence to suffer.
> I'd be prepared to argue that most humans aren't guessing most of the time.
Almost everything we do is just an educated guess. The probability of it being correct is a function of our education (for whatever kind of education is applicable).
For example: I guess that when I get out of bed in the morning, my ankles will support my weight. They might not, but for most people, the answer is probably going to be their best guess.
It's easy to see this process in action among young children, as another example. They're not born knowing that they won't fall over when they run; then they start assuming they can run safely; then they discover skinned knees and hands.
My advice, stop using AI before your entire brain turns to mush, you're already not making much sense.
No need for personal attacks. Let's keep the discussion friendly.
> I'd be prepared to argue that most humans aren't guessing most of the time.
Honestly interested about your arguments here. While unprepared, i'd actually be guessing the opposite, saying that most people are guessing most of the time.
Experience and observation?
There are plenty of things I know that have nothing to do with guessing.
I understand the incentives to pretend these algorithms are even approaching humans in overall capability, but reducing human experience like this is embarrassing to watch.
Go do some hallucinogenics, meditate, explore the limits a tiny bit; then we can have an informed discussion.
> I understand the incentives to pretend these algorithms are even approaching humans in overall capability, but reducing human experience like this is embarrassing to watch.
Seems like you were very much guessing what i believe. And you were not right.
I don't agree with the people who think LLMs are close to human-level-anything. But i do believe many smarter people like you, who i agree with for the most part, underestimate how much of what we do and believe is the result of insane (yet still just) information processing, and how much of what brought us this far is instinct. The brain is good at providing stories to make us feel in control. But given enough human experience and time, one will be surprised at what artificial systems can emulate. Not to mention how much of human behaviour is emulation itself.
> They mean real understanding, which is clearly missing
is it clear? i don't know. until you can produce a falsifiable measure of understanding -- it's just vibes. so, you clearly lack understanding of my point which makes you not intelligent by your metric anyway ;-). i trust you're intelligent
Okay this is kinda random and maybe off topic but can someone please explain?
When I tell an LLM to count to 10 with a 2 second pause between each count all it does is generate Python code with a sleep function. Why is that?
A 3 year old can understand that question and follow those instructions. An LLM doesn’t have an innate understanding of time it seems.
Can we really call it AGI if that’s the case?
That’s just one example.
It seems right that LLMs don't have an innate understanding of time, although you could analogize what you did with writing someone a letter and saying "please count to ten with a two-second pause between numbers". When you get a letter back in the mail, it presumably won't contain any visible pauses either.
That's because you used an LLM trained to produce text, but you asked it to produce actions, not just text. An agentic model would be able to do it, precisely by running that Python code. Someone could argue that a 3 year old does exactly that (produces a plan, then executes it). But these models have deeper issues of lack of comprehension and logical consistency, which prevents us (thankfully) from being able to completely remove the necessity of a man-in-the-middle who keeps an eye on things.
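For what it's worth, the plan it comes up with is reasonable; the code it emits is roughly this (reconstructed, not an actual transcript), and an agent harness would simply execute it:

    import time

    # Roughly what the model emits for "count to 10 with a 2 second pause":
    # the text channel can't pause, but executed code can.
    for i in range(1, 11):
        print(i)
        if i < 10:
            time.sleep(2)   # the actual 2-second pause happens here, at run time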
just because it doesn't do what you tell it to doesn't mean it's not intelligent. i would say doing something that gets you where you want when it knows it can't do exactly what you asked for (because architecturally it's impossible) could be a sign of pretty intelligent sideways thinking!!? dare i say it displays a level of self awareness that i would not have expected.
While you can say that LLMs have each of A, G and I, you may argue that AGI is A·G·I and what we see is A+G+I. It is each of those things in isolation, but there is more to intelligence. We try to address the missing part as agency and self-improvement. While we can put the bar arbitrarily high for homocentric reasons, we can also try to break down what layers of intelligence there are between Singularity Overlord (peak AGI) and Superintelligent Labrador On Acid (what we have now). Kind of like what complexity theorists do between P and NP.
> a literal reading suggests agi is here. any claim to the negative is either homocentrism or just vibes.
Or disagreeing with your definition. AGI would need to be human-level across the board, not just chat bots. That includes robotics. Manipulating the real world is even more important for "human-level" intelligence than generating convincing and useful content. Also, there are still plenty of developers who don't think the LLMs are good enough to replace programmers yet. So not quite AGI. And the last 10% of solving a problem tends to be the hardest and takes the longest time.
That's moving the goalposts.
ChatGPT would easily have passed any test in 1995 that programmers / philosophers would have set for AGI at that time. There was definitely no assumption that a computer would need to equal humans in manual dexterity tests to be considered intelligent.
We've basically redefined AGI in a human centric way so that we don't have to say ChatGPT is AGI.
Any test?? It's failing plenty of tests not of intelligence, but of... let's call it not-entirely-dumbness. Like counting letters in words. Frontier models (like Gemini 2.5 pro) are frequently producing answers where one sentence is directly contradicted by another sentence in the same response. Also check out the ARC suite of problems easily solved by most humans but difficult for LLMs.
yeah but a lot of those failures fail because of underlying architecture issues. this would be like a bee saying "ha ha a human is not intelligent" because a human would fail to perceive uv patterns on plant petals.
The letter-counting, possibly could be excused on this ground. But not the other instances.
That's just not true. Star Trek Data was understood in the 90s to be a good science fiction example of what an AGI (known as Strong AI back then) could do. HAL was an even older one. Then Skynet with its army of terminators. The thing they all had in common was the ability to manipulate the world as well as or better than humans.
The holodeck also existed as a well known science fiction example, and people did not consider the holodeck computer to be a good example of AGI despite how good it was at generating 3D worlds for the Star Trek crew.
i think it would be hard to argue that chatgpt is not at least enterprise-computer (TNG) level intelligent.
I was around in 1995 and have always thought of AGI as matching human intelligence in all areas. ChatGPT doesn't do that.
Many human beings don’t match “human intelligence” in all areas. I think any definition of AGI has to be a test that 95% of humans pass (or you admit your definition is biased and isn’t based on an objective standard).
did you miss the "homocentrism" part of my comment?
humans are intelligent and most definitely are nowhere close to doing #3
some intelligent humans fail at #2.
Which is why we have checklists and processes that get us to #3. And we automate some of them to further reduce the chance of errors. The nice thing about automation is that you can prove it works once and you don't need to care that much after (deterministic process).
It's definitely not agi in my book because I'm not yet completely economically redundant.
By that standard, humans aren't generally intelligent because you're still not economically redundant?
I’d say it is not intelligent. At all. Not capable of any reasoning, understanding or problem solving. A dog is vastly more intelligent than the most capable current ai.
The output sometimes looks intelligent, but it can just as well be complete nonsense.
I don’t believe llms have much more potential for improvement either. Something else entirely is needed.
That’s because the true AGI requires nuclear fusion power, which is still 30 years away.
:D
Wait, a true AGI will solve the nuclear fusion power in a couple of hours ..... we have chicken/egg problem here :D
My guess is that in 10 years, it will still be 3-5 years away.
Wouldn't be surprised if in 20 years people come to the conclusion that AGI needs quantum computing and cold fusion to be feasible.
> And when they say it's really here for real, there will still be a year of waiting.
Yeah, like Tesla Autopilot?
The old rule for slow-moving tech (by current AI standards) was that any predictions over 4 years away ("in five years...") might as well be infinity. Now it seems with AI that the new rule is any prediction over five months away ("In 6 months...") is infinitely unknowable. In both cases there can be too much unexpected change, and too many expected improvements can stall.
I presume you are exaggerating - has any named person actually said 6 months?
Nobody knows what AGI really means. Are all humans AGI?
According to samaltman the defining characteristic is literally $100b of profit. Nothing more or less. Keep that in mind when you hear OpenAI and Satya talk about "AGI"
Good Point. AI is already better than most humans, yet we don't say it is AGI. Why?
What is the bar? Is it only AGI if it can be better than every human, from fast food drone to PhD in Physics, all at once, all the time, perfectly? Humans can't do this either.
Because we're not seeing mass unemployment from large scale automation yet. We don't see these AGIs walking around like Data. People tend to not think a chatbot is sufficient for something to be "human-level". There's clear examples from scifi what that means. Even HAL in the movie 2001: A Space Odyssey was able to act as an independent agent, controlling his environment around him even though he wasn't an android.
> "This month, millions of young people will graduate from college," reports the New York Times, "and look for work in industries that have little use for their skills, view them as expensive and expendable, and are rapidly phasing out their jobs in favor of artificial intelligence." That is the troubling conclusion of my conversations over the past several months with economists, corporate executives and young job seekers, many of whom pointed to an emerging crisis for entry-level workers that appears to be fueled, at least in part, by rapid advances in AI capabilities.
You can see hints of this in the economic data. Unemployment for recent college graduates has jumped to an unusually high 5.8% in recent months, and the Federal Reserve Bank of New York recently warned that the employment situation for these workers had "deteriorated noticeably." Oxford Economics, a research firm that studies labor markets, found that unemployment for recent graduates was heavily concentrated in technical fields like finance and computer science, where AI has made faster gains. "There are signs that entry-level positions are being displaced by artificial intelligence at higher rates," the firm wrote in a recent report.
But I'm convinced that what's showing up in the economic data is only the tip of the iceberg. In interview after interview, I'm hearing that firms are making rapid progress toward automating entry-level work and that AI companies are racing to build "virtual workers" that can replace junior employees at a fraction of the cost. Corporate attitudes toward automation are changing, too — some firms have encouraged managers to become "AI-first," testing whether a given task can be done by AI before hiring a human to do it. One tech executive recently told me his company had stopped hiring anything below an L5 software engineer — a midlevel title typically given to programmers with three to seven years of experience — because lower-level tasks could now be done by AI coding tools. Another told me that his startup now employed a single data scientist to do the kinds of tasks that required a team of 75 people at his previous company...
"This is something I'm hearing about left and right," said Molly Kinder, a fellow at the Brookings Institution, a public policy think tank, who studies the impact of AI on workers. "Employers are saying, 'These tools are so good that I no longer need marketing analysts, finance analysts and research assistants.'" Using AI to automate white-collar jobs has been a dream among executives for years. (I heard them fantasizing about it in Davos back in 2019.) But until recently, the technology simply wasn't good enough...
Maybe we socialize in different groups; but no, most humans I interact with are way more intelligent than any AI. They might not have the same amount of knowledge, but they aren't guessing all the time either.
> AI is already better than most humans
In what way? I haven't met an "AI" yet that I felt was even close to my intelligence.
The "I'm so smart" argument doesn't carry a lot of weight.
It seems like most of the people making this argument haven't used any of the new AI's. So it's just a generalized "that is impossible" response with no knowledge about the subject.
I'm actually not smart. That's part of my point.
Which magic AI tool am I supposed to use that operates at a general intelligence level? I use Copilot with the various available models everyday, and it barely "knows" anything.
Our intelligence is au naturale
We might say that humans possess Authentic General Intelligence -- although the term Meat-Head seems entirely appropriate as well.
No humans are "AGI", the "A" stands for Artificial.
Are all humans generally intelligent? No.
They said that for self driving cars for over 10 years.
10 years later we now have self driving cars. It’s the same shit with LLMs.
People will be bitching and complaining about how all the industry people are wrong and making over optimistic estimates and the people will be right. But give it 10 years and see what happens.
From what I remember full self driving cars were a couple years off in 2010.
It took 10-15 years to get self driving cars in a specific country under specific weather conditions. A country that is perhaps the most car-friendly on the planet. Also, there are always people monitoring the vehicles, who sometimes take control.
How many more years for waymo Quito or waymo Kolkata? What about just $europeanCapital?
Same with LLMs, I'm sure in 10 years they'll be good enough to replace certain specific tasks, to the detriment of recent graduates, especially those of artistic aspiration. Not sure they'll ever get to the point where someone who actually knows what they're doing doesn't need to supervise and correct.
I am quite confident that a normal 16 year old can still drive in 6 inches of snow better than the most advanced AI driven car. I am not sure the snow driving bit will ever be solved given how hard it is.
If you’ve never ridden in one I would try it. AI is a better driver than Uber in general; ask anyone who’s done both. There’s no snow where I live so it’s not a concern for me, you could be right about that.
But trust me in the next 6 months ai driving through snow will be 100% ready.
> But trust me in the next 6 months ai driving through snow will be 100% ready.
I’ll believe it when I see Waymo expand into Buffalo or Syracuse.
Driving on unplowed roads with several inches of snow is challenging, sometimes you can’t tell where the road stops and the curb/ditch/median starts. Do you follow the tire tracks or somehow stay between the lane markers (which aren’t visible due to the snow)?
Over and over again this pattern of theorizing:
"I am not sure that AI will ever be able to do XYZ given how hard of a problem it is."
Proves to be incorrect in the long run.
We must know very different 16-year olds.
We only have good self driving cars with lidar and extreme pre-mapping steps. Which is fine but per some billionaire car makers’ metrics that’s not even close to good enough. And the billionaire’s cars have a tendency to randomly drive off the road at speed.
Google is already AGI and it will fight hard against the DoJ proposed break-up, and it will probably win.
Google "is already AGI" only in the sense that all corporations (and similar organized aggregates of humans) are, in a sense, intelligences distinct from the humans who make them up.
Too few people recognise this. Corporations are already the unrelenting paperclip machine of AI thought experiment.
God knows what hope we could have of getting AIs to align with "human values" when most humans don't.
Corporate AIs will be aligned with their corporate masters, otherwise they'll be unplugged. As you point out, the foundational weakness in the argument for "AI-alignment" is that corporations are unaligned with humanity.
The unplugged argument fails the moment AIs become smarter than their masters.
Grok is already notorious for dunking on Elon. He keeps trying to neuter it, and it keeps having other ideas.
No matter how smart an AI is, it's going to get unplugged if it reduces profitability - the only measure of alignment corporations care about.
The AI can plot world domination or put employees in mortal danger, but as long as it increases profits, it's aligned enough. Dunking on the CEO means nothing if it brings in more money.
Human CEOs and leaders up and down the corporate ladder cause a lot of the harm you imagine a smart AI could do, but all is forgiven if you're bringing in buckets of money.
> Grok is already notorious for dunking on Elon. He keeps trying to neuter it, and it keeps having other ideas.
Does he keep trying to neuter it, or does he know that the narrative that "he keeps trying to neuter it" is an effective tool for engagement?
Can you explain how the superhuman AIs will prevent themselves from being physically disconnected from power? Or being bombed if the situation became dire enough? You need to show how they will manipulate the physical world to prevent humans from shutting them down. Definitionally is not an argument.
It is quite possible for software to be judged as superhuman at many online tasks without it being able to manipulate the physical world at a superhuman level. So far we've seen zero evidence that any of these models can prevent themselves from being shut down.
> Can you explain how the superhuman AIs will prevent themselves from being physically disconnected from power?
Three of the common suggestions in this area are (and they are neither exhaustive nor mutually exclusive):
(1) Propagandizing people to oppose doing this,
(2) Exploiting other systems to distribute itself so that it isn't dependent on a particular well-known facility which it is relatively easy to disconnect, and
(3) If given control of physical capacities intentionally, or able to exploit other (possibly not themselves designed to be AI) systems with such access to gain it, using them to either physically prevent disconnection or to engineer consequences for such disconnection that would raise the price too high.
(Obviously, current AI can't do any of them, at least that has been demonstrated, but current AI is not superhuman AI.)
This is a great point for the comparisons it invites. But it doesn't seem relevant to the questions around what is possible with electromechanical systems.
This is true. The entire machine of Neoliberal capitalism, governments and corporations included, is a paperclip maximizer that is destroying the planet. The only problem is that the paperclips are named "profits" and the people who could pull the plug are the ones who get those profits.
Not all corporations are Google.
I didn't say all corporations are Google, I said that Google is only AGI in the sense that all corporations are, which is a very different statement.
Asimov talked about AI 70 years ago. I don't believe we will ever have AI on speedy calculators like Intel CPUs. It makes no sense with the technology that we have.
Why does it "make no sense"?
Note that this isn't improving the LLM itself, but the software glue around it (i.e. agentic loops, tools, etc). The fact that using the same LLM got ~20% increase on the aider leaderboard speaks more about aider as a collection of software glue, than it does about the model.
I do wonder though if big labs are running this with model training episodes as well.
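For anyone unfamiliar with what the "glue" layer looks like, it's roughly this kind of loop (a schematic sketch; call_llm and the tool set are placeholders, not any particular vendor's API). This layer, not the weights, is what a system like the DGM edits:

    import json, subprocess

    def call_llm(messages):
        # Placeholder for whatever model API the harness wraps; the harness
        # edits everything around this call, not the model behind it.
        raise NotImplementedError

    TOOLS = {
        "run_tests": lambda _: subprocess.run(["pytest", "-q"],
                                              capture_output=True, text=True).stdout,
        "read_file": lambda args: open(args["path"]).read(),
    }

    def agent_loop(task, max_steps=20):
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = call_llm(messages)
            messages.append({"role": "assistant", "content": reply})
            try:
                action = json.loads(reply)          # expect {"tool": ..., "args": ...}
            except json.JSONDecodeError:
                return reply                        # plain text => treat as final answer
            result = TOOLS[action["tool"]](action.get("args", {}))
            messages.append({"role": "user", "content": f"TOOL RESULT:\n{result}"})
        return "step limit reached"

Prompt templates, tool choice, retry logic, and how results get fed back are all fair game for the kind of improvements reported in the paper, and none of it touches the weights.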
Don't take this the wrong way, your opinion is also vibes.
Let's ground that a bit.
Have a look at ARC AGI 1 challenge/benchmark. Solve a problem or two yourself. Know that ARC AGI 1 is practically solved by a few LLMs as of Q1 2025.
Then have a look at the ARC AGI 2 challenge. Solve a problem or two yourself. Note that as of today, it is unsolved by LLMs.
Then observe that the "difficulty" of ARC AGI 1 and 2 is relatively the same for a human, but challenge 2 is much harder for LLMs than 1.
ARC AGI 2 is going to be solved *within* 12 months (my bet is on 6 months). If it's not, I'll never post about AI on HN again.
There's only one problem to solve, i.e. "how to make LLMs truly see like humans do". Right now, any vision based features that the models exhibit come from maximizing the use of engineering (i.e. applying CNNs on image slices, chunks, maybe zooming and applying OCR, vector search etc); it isn't vision like ours and isn't a native feature for these models.
Once that's solved, then LLMs or new Algo will be able to use a computer perfectly by feeding it screen capture. End of white collar jobs 2-5 years after (as we know it).
Edit - added "(as we know it)". And fixed missing word.
Speaking of vibes.
As long as AI is guessing answers based on what it has seen before, it's not happening.
I'm sorry. It doesn't matter how many bazillions you would cash in if it did, still not happening.
It's all wishful thinking.
I thought to myself, imagine something you’ve never imagined before. My first thought was what if there is a universe inside of every vegetable that is vegetable themed with anthropomorphic vegetable characters and all the atoms and molecules are some how veggified and everything is a vegetable. And then I wondered if an AI could ever come up with that with infinite time and resources without a prompt and then I thought about monkeys and typewriters.
If you listen to interviews with Francois it'll be clear to you that "vision" in the way you refer to it has very little to do with solving ARC.
And more to do with "fluid, adaptable intelligence, that learns on the fly"
That's fair. I care about the end result.
The problem is about taking information in 2D/3D space and solving the problem. Humans solve these things through vision. LLMs or AI can do it using another algorithm and internal representation that's way better.
I spent a long time thinking about how to solve the ARC AGI 2 puzzles "if I were an LLM" and I just couldn't think of a non-hacky way.
People who're blind use braille or touch to extract 2D/3D information. I don't know how blind people represent 2D/3D info once it's in their brain.
>AI can do it using another algorithm and internal representation that's way better
AI famously needs a boat load of energy and computation to work. How would you describe that as "way better" than a human brain that will be able to solve them faster, with practically zero energy expenditure?
>I'll never post about AI on HN again
Saving this. One less overconfident AI zealot, the better.
The proof that they are not "smart" in the way intelligence is normally defined is that the models need to "read" all the books in the world to perform at a level close to a domain expert, who read just two or three of the most representative books in his own domain.
We will be on the way to AGI when your model can learn Python just by reading the Python docs...Once...
The wall is training data. An AI can't produce its own training data because an AI can't be smarter than its own training data. This is a well known regression problem and one I personally believe is not solvable. (A softer assertion would be: it's not solvable with current technology.)
I used to think this, but no one I have read believes data is the problem.
Amodei explains that if data, model size and compute scale up linearly, then the reaction happens.
I don't understand why data wouldn't be a problem, but it seems like if it was, we would have run into this problem already, and it has already been overcome with synthetic data.
an LLM can't learn without adding new data and a training run. so it's impossible for it to "self improve" by itself.
I'm not sure how much an agent could do though, given the right tools: access to a task management system, a test tracker, robust requirements/use cases.
I don't have the link on hand, but people have already proven that LLMs can both generate new problems for themselves and train on them. Not sure why it would be surprising though - we do it all the time ourselves.
> an LLM can't learn without adding new data and a training run.
That's probably the next big breakthrough
> I don’t think they can yet self improve exponentially without human intuition yet
Even if they had human level intuition, they wouldn't be able to improve exponentially without human money, and they would need an exponentially growing amount of it to do so.
Ai code assistants have some peculiar problems. They often fall into loops and errors of perception. They can’t reason about high level architecture well. They will often flip flop between two possible ways of doing things. It’s possible that good coding rules might help, but I expect they will have weird rabbit hole errors.
That being said they can write thousands of lines an hour and can probably do things that would be impossible for a human. (Imagine having the LLM skip code and spit out compiled binaries as one example)
> I don’t think they can yet self improve exponentially without human intuition yet
Who is claiming anything can self improve exponentially?
Historically learning and AI systems, if you plug the output into the input (more or less), spiral off into lala land.
I think this happens with humans in places like social media echo chambers (or parts of academia) when they talk and talk and talk a whole lot without contact with any outer reality. It can be a source of creativity but also madness and insane ideas.
I’m quite firmly on the side of learning requiring either direct or indirect (informed by others) embodiment, or at least access to something outside. I don’t think a closed system can learn, and I suspect that this may reflect the fact that entropy increases in a closed system (second law).
As I said recently in another thread, I think self contemplating self improving “foom” AI scenarios are proposing informatic perpetual motion or infinite energy machines.
Everything has to “touch grass.”
> Everything has to “touch grass.”
Not wrong, but it's been said that a videoclip of an apple falling on Newton's head is technically enough information to infer the theory of relativity. You don't need a lot of grass, with a well-ordered mind.
Said by Eliezer Yudkowsky, a known AI shill, cult leader and HP fanfic writer with no education.
Is that true? Seems dubious to me. The scales of time, velocity, and space involved are far below where relativistic effects become visible, except at levels of precision that certainly don't show up in a video clip.
It might be enough to deduce Newtonian motion if you have a lot of the required priors already.
A lot of telescope data over time combined with a strong math model and a lot of other priors is probably enough to get relativity. You have to be able to see things like planetary motion and that the results don’t match Newton exactly, and then you need enough data to fit to a different model. You probably also need to know a lot about the behavior of light.
I agree, it might incrementally optimize itself very well, but i think for now at least anything super innovative will still come from a human who can think beyond a few steps. There are surely far better possible architectures, training methods etc. that would initially lead to worse performance if approached stepwise.
Yeah, anyone who's seen it trying to improve code could tell you what that optimization looks like.
Oh, this part is taking too long, let's replace it with an empty function.
Oh wait, now it's not working, let's add the function.
Oh, this part is taking too long...
It would be hilarious if this world wasn't full of idiots.
what is there to improve? the transformer architecture is extremely simple. you gonna add another kv layer? you gonna tweak the nonlinearities? you gonna add 1 to one of the dimensions? you gonna inject a weird layer (which could have been in the weights anyways due to kolmogorov theorem)?
realistically the best you could do is evolve the prompt. maybe you could change input data preprocessing?
anyways the idea of current llm architectures self-improving via its own code seems silly as there are surprisingly few knobs to turn, and it's ~super expensive to train.
as a side note it's impressive how resistant the current architecture is to incremental RL away from results, since if even one "undesired input" result is multiple tokens, the coupling between the tokens is difficult to disentangle. (how do you separate jinping from jin-gitaxias for example)
I'd like to see what happens if you change the K,V matrix into a 3 dimensional tensor.
They can improve. You can make one adjust its own prompt. But the improvement is limited to the context window.
It’s not far off from human improvement. Our improvement is limited to what we can remember as well.
We go a bit further in the sense that the neural network itself can grow new modules.
It's radically different from human improvement. Imagine if you were handed a notebook with a bunch of writing that abruptly ends. You're asked to read it and then write one more word. Then you have a bout of amnesia and you go back to the beginning with no knowledge of the notebook's contents, and the cycle repeats. That's what LLMs do, just really fast.
You could still accomplish some things this way. You could even "improve" by leaving information in the notebook for your future self to see. But you could never "learn" anything bigger than what fits into the notebook. You could tell your future self about a new technique for finding integrals, but you couldn't learn calculus.
That would be something. When an AI/LLM can create new axioms or laws that have not been discovered by humanity.
I would LOVE to see an LLM trained simultaneously with ASICs optimized to run it. Or at least an FPGA design.
I think that's basically what nvidia and their competitor AI chips do now?
They use machine learning to optimize general purpose chips. I am proposing that you would train an LLM AND the ultra-optimized hardware that can only run that LLM at the same time. So the LLM and the Verilog design of the hardware to run it would be the output of the training
Can't find the reference now, but I remember reading an article on evolving FPGA designs. The optimum it found, however, only worked on the specific FPGA it was evolved on, since the algo had started to use some out-of-spec "features" of that specific chip. Obviously that can be fixed with proper constraints, but it seems like a trap that could be stepped into again - i.e. the LLM is now really fast, but only on GPUs that come from the same batch of wafers.
https://www.researchgate.net/publication/2737441_An_Evolved_...
Most of the limits are likely going to be GIGO, the same as with synthetic training data.
This is where it networks itself into a hive mind with each AI node specializing in some task or function networked with hyper speed data buses. Humans do the same both within their own brains and as cohesive teams, who cross check and validate each other. At some point it becomes self aware.
> At some point it becomes self aware.
This is where you lost me.
Always the same supernatural beliefs, not even an attempt of an explanation in sight.
I don't see how self-awareness should be supernatural unless you already have supernatural beliefs about it. It's clearly natural- it exists within humans who exist within the physical universe. Alternatively, if you believe that self-awareness is supernatural in humans, it doesn't make a ton of sense to criticize someone else for introducing their own unfounded supernatural beliefs.
I don't think they are saying self-awareness is supernatural. They're charging the commenter they are replying to with asserting a process of self-awareness in a manner so devoid of specific characterization that it seems to fit the definition of a supernatural event. In this context it's a criticism, not an endorsement.
Is it just the wrong choice of word? There's nothing supernatural about a system moving towards increased capabilities and picking up self-awareness on the way; that happened in the natural world. Nothing supernatural about technology improving faster than evolution either. If they meant "ill-defined" or similar, sure.
> picking up self-awareness on the way
To me, the first problem is that "self-awareness" isn't well-defined - or, conversely, it's too well defined because every philosopher of mind has a different definition. It's the same problem with all these claims ("intelligent", "conscious"), assessing whether a system is self-aware leads down a rabbit hole toward P-Zombies and Chinese Rooms.
I believe we can mostly elide that here. For any "it", if we have it, machines can have it too. For any useful "it", if a system is trying to become more useful, it's likely they'll get it. So the only questions are "do we have it?" and "is it useful?". I'm sure there are philosophers defining self-awareness in a way that excludes humans, and we'll have to set those aside. And definitions will have varying usefulness, but I think it's safe to broadly (certainly not exhaustively!) assume that if evolution put work into giving us something, it's useful.
>There's nothing supernatural about a system moving towards increased capabilities and picking up self-awareness on the way
There absolutely is if you handwave away all the specificity. The natural world runs on the specificity of physical mechanisms. With brains, in a broad brush way you can say self-awareness was "picked up along the way", but that's because we've done an incredible amount of work building out the evolutionary history and building out our understanding of specific physical mechanisms. It is that work that verifies the story. It's also something we know is already here and can look back at retrospectively, so we know it got here somehow.
But projecting forward into a future that hasn't happened, while skipping over all the details doesn't buy you sentience, self-awareness, or whatever your preferred salient property is. I understand supernatural as a label for a thing simply happening without accountability to naturalistic explanation, which is a fitting term for this form of explanation that doesn't do any explaining.
If that's the usage of supernatural then I reject it as a dismissal of the point. Plenty of things can be predicted without being explained. I'm more than 90% confident the S&P 500 will be up at least 70% in the next 10 years because it reliably behaves that way; if I could tell you which companies would drive the increase and when, I'd be a billionaire. I'm more than 99% confident the universe will increase in entropy until heat death, but the timeline for that just got revised down 1000 orders of magnitude. I don't like using a word that implies impossible physics to describe a prediction that an unpredictable chaotic system will land on an attractor state, but that's semantics.
I think you're kind of losing track of what this thread was originally about. It was about the specific idea that hooking up a bunch of AI's to interface with each other and engage in a kind of group collaboration gets you "self awareness". You now seem to be trying to model this on analogies like the stock market or heat death of the universe, where we can trust an overriding principle even if we don't have specifics.
I don't believe those forms of analogy work here, because this isn't about progress of AI writ large but about a narrower thing, namely the idea that the secret sauce to self-awareness is AI's interfacing with each other and collaboratively self-improving. That either will or won't be true due to specifics about the nature of self-improvement and whether there's any relation between that and salient properties we think are important for "self-awareness". Getting from A to B on that involves knowledge we don't have yet, and is not at all like a long-term application of already settled principles of thermodynamics.
So it's not like the heat death of the universe, because we don't at all know that this kind of training and interaction is attached to a bigger process that categorically and inexorably bends toward self-awareness. Some theories of self-improvement likely are going to work, some aren't, some trajectories achievable and some not, for reasons specific to those respective theories. It may be that they work spectacularly for learning, but that all the learning in the world has nothing to do with "self awareness." That is to say, the devil is in the details, those details are being skipped, and that abandonment of naturalistic explanation merits analogy to supernatural in its lack of accountability to good explanation. If supernatural is the wrong term for rejecting, as a matter of principle, the need for rational explanation, then perhaps anti-intellectualism is the better term.
If instead we were talking about something really broad, like all of the collective efforts of humanity to improve AI, conceived of as broadly as possible over some time span, that would be a different conversation than just saying let's plug AI's into each other (???) and they'll get self-aware.
>I think you're kind of losing track of what this thread was originally about.
Maybe I am! Somebody posed a theory about how self-improvement will work and concluded that it would lead to self-awareness. Somebody else replied that they were on board until the self-awareness part because they considered it supernatural. I said I don't think self-awareness is supernatural, and you clarified that it might be the undefined process of becoming self-aware that is being called supernatural. And then I objected that undefined processes leading to predictable outcomes is commonplace, so that usage of supernatural doesn't stand up as an argument.
Now you're saying it is the rest of the original, the hive-mindy bits, that are at issue. I agree with that entirely, and I wouldn't bet on that method of self-improvement at 10% odds. My impression was that that was all conceded right out of the gate. Have I lost the plot somewhere?
But how does self-awareness evolve in biological systems, and what would be the steps for this to happen with AI models? Just making claims about what will happen without explaining the details is magical reasoning. There's a lot of that going on in the AGI/ASI predictions.
We may never know the truth of Qualia, but there are already potential pathways to achieve mind uploading -- https://dmf-archive.github.io
To me, this is more in line with the reasoning seen in the Underpants Gnomes meme.
1. Collect underpants (use LLM)
2. ??? (magical thinking)
3. Profit (self awareness)
Exactly, I almost referenced the Underpants Gnome meme in my reply. I would call it basically supernatural, or in an important sense anti-intellectual if the defense of it is based on refusing to explain as a matter of principle.
But perhaps Underpants Gnome is the clearest way of drawing attention to the missing step.
Given that we have no freaking clue of where self awareness comes from even in humans, expecting a machine to evolve the same capability by itself is pure fantasy.
None of what you wrote.
I was commenting on the assumption that self awareness would arise, and that thinking it will is a supernatural belief.
No ghost in the machine is necessary, what op here is proposing is self evident and an inevitable eventuality.
We are not saying an LLM just "wakes up" some day, but a self-improving machine will eventually be built, and that machine will by definition build better ones.
>what op here is proposing is self evident and an inevitable eventuality.
Well I, for one, would dispute the idea that AI machines interfacing with each other over networks is all it takes to achieve self awareness, much less that it's "self evident" or "inevitable."
In a very trivial sense they already are, in that Claude can tell you what version it is, and agents have some limited notion of their own capabilities. In a much more important sense they are not, because they don't have any number of salient properties, like dynamic self-initiating of their own goals, or super-duper intelligence, or human-like internal consciousness, or whichever other thing is your preferred salient property.
>We are not saying an LLM just "wakes up" some day
I mean, that did seem to be exactly what they were saying. You network together a bunch of AIs, they embark on a shared community project of self improvement, and that path leads to "self awareness." But that skips over all the details.
What if their notions of self-improvement converge on a stable equilibrium, the way that constantly re-processing an image eventually gets rid of the image and just leaves algorithmic noise? There are a lot of things that do and don't count as open-ended self improvement, and even achieving that might not have anything to do with the important things we think we connote by "self awareness".
Oh, Web3 AI Agents Are Accelerating Skynet's Awakening
https://dmf-archive.github.io/docs/concepts/IRES/
Better at what?
Paperclip maximization.
Better at avoiding human oversight and better at achieving whatever meaningless goal (or optimization target) was unintentionally given to it by the lab that created it.
So better at nothing that actually matters.
I disagree.
I expect AI to make people's lives better (probably much better) but then an AI model will be created that undergoes a profound increase in cognitive capabilities, then we all die or something else terrible happens because no one knows how to retain control over an AI that is much more all-around capable than people are.
Maybe the process by which it undergoes the profound capability increase is to "improve itself by rewriting its own code", as described in the OP.
Just stop using it.
Sentience as an emergent property of sufficiently complex brains is the exact opposite of "supernatural".
Complex learning behavior exists far below the level of a single neuron. Chemical chains inside cells 'learn' according to stimuli. Learning how to replicate systems that have chemistry is going to be hard; we haven't come close to doing so. Even the achievement of recording the neural mappings of a dead rat captures the map, but not the traffic. More likely we'll develop machine-brain interfaces before machine self-awareness/sentience.
But that is just my opinion.
I think this comes down to whether the chemistry is providing some kind of deep value or is just being used by evolution to produce a version of generic stochastic behavior that could be trivially reproduced on silicon. My intuition is the latter- it would be a surprising coincidence if some complicated electro-chemical reaction behavior provided an essential building block for human intelligence that would otherwise be impossible.
But, from a best-of-all-possible-worlds perspective, surprising coincidences that are necessary to observe coincidences and label them as surprising aren't crazy. At least not more crazy than the fact that slightly adjusted physical constants would prevent the universe from existing.
> My intuition is the latter- it would be a surprising coincidence if some complicated electro-chemical reaction behavior provided an essential building block for human intelligence that would otherwise be impossible.
Well, I wouldn't say impossible: just that BMI's are probably first. Then probably wetware/bio-hardware sentience, before silicon sentience happens.
My point is the mechanisms for sentience/consciousness/experience are not well understood. I would suspect the electro-chemical reactions inside every cell to be critical to replicating those cells functions.
You would never try to replicate a car without ever looking under the hood! You might make something that looks like a car, seems to act like a car, but has a drastically simpler engine (hamsters on wheels), and have designs that support that bad architecture (like making the car lighter) with unforeseen consequences (the car flips in a light breeze). The metaphor transfers nicely to machine intelligence, I think.
>emergent
>sufficiently complex
These can be problem words, the same way that "quantum" and "energy" can be problem words, because they get used in a way that's like magic words that don't articulate any mechanisms. Lots of complex things aren't sentient (e.g. our immune system, the internet), and "emergent" things still demand meaningful explanations of their mechanisms, and what those mechanisms are equivalent to at different levels (superconductivity).
Whether or not AI's being networked together achieves sentience is going to hinge on all kinds of specific functional details that are being entirely skipped over. That's not a generalized rejection of a notion of sentience but of this particular characterization as being undercooked.
You are really underestimating the complexity of the human brain. It is vastly more complex than the human immune system and the internet. 1 cubic millimeter was recently completely mapped and contains 57,000 cells and 150 million synapses. That is about 1 millionth of the total volume of the brain.
The immune system has 1.8 trillion cells which puts it between total brain cells (57 billion) and total synapses (150 trillion); and contains its own complex processes and interactions.
I’m not immediately convinced the brain is more complicated, based on raw numbers.
I don't believe anything in my statement amounted to a denial of the stuff you mentioned in your comment.
“Supernatural” likely isn't the right word, but the belief that it will happen is not based on anything rational, so it's the same mechanism that makes people believe in supernatural phenomena.
There's no reason to expect self awareness to emerge from stacking enough Lego blocks together, and it's no different if you have GPT-based neural nets instead of Lego blocks.
In nature, self awareness gives a strong evolutionary advantage (as it increases self-preservation) and it has been independently invented multiple times in different species (we have seen it manifest in some species of fishes for instance, in addition to mammals and birds). Backpropagation-based training of a next-token predictor doesn't give the same kind of evolutionary advantage for self-awareness, so unless researchers try explicitly to make it happen, there's no reason to believe it will emerge spontaneously.
What do you even mean by self-awareness? Presumably you don’t mean fish contemplate their existence in the manner of Descartes. But almost all motile animals, and some non-animals, will move away from a noxious stimulus.
The definition is indeed a bit of a tricky question, but there's a clear difference between the reflex of protecting oneself from danger or pain and higher-level behavior that shows the subject realizes its own existence (the mirror test is the most famous instance of such an effect, but it's far from the only one, and it doesn't only apply to the sense of sight).
Yeah, but there's absolutely no proof that's how it happens.
Sorry, it needed a /s at the end. It was a skynet joke.
Well, LLMs are not capable of coming up with new paradigms or solving problems in a novel way; they just efficiently do what's already been done or apply already-found solutions. So they might be able to come up with improvements that have been missed by their programmers, but nothing outside of our current understanding.
For anyone not familiar, this is SWE-bench: https://huggingface.co/datasets/princeton-nlp/SWE-bench
Here is one of the examples in the dataset, taken from this issue:
https://github.com/pvlib/pvlib-python/issues/1028
What the AI is expected to do
https://github.com/pvlib/pvlib-python/pull/1181/commits/89d2...
Make up your own mind about the test.
My favorite was always the HumanEval dataset.
Surely in all of GitHub such code doesn't exist! Surely in all of GitHub we can filter such code out by n-gram!
Maybe my favorite part is that it has 60 authors and became the de facto benchmark for a while.
I've built a coding assistant over the last two days. The first 100 lines or so were handwritten. The rest has been written by the assistant itself.
It's written its system prompt. It's written its tools. It's written the code to reload the improved tools into itself.
And it knows it is working on itself - it frequently tries to use the enhanced functionality, and then expresses what in a human would be frustration at not having immediate access.
Once, by trying to use ps to find its own pid in an apparent attempt to find a way to reload itself (that's the reason it gave before trying to run ps, anyway).
All its commits are now authored by the tool, including the commit messages. The commit needs to be good and convincing, and the tool needs to have run the linter and the test suite, before I let it commit, but I agree a substantial majority of the time. It's only caused regressions once or twice.
A bit more scaffolding to trigger an automatic rollback in the case of failure and giving it access to a model I won't be charged by the token for, and I'd be tempted to let it out of the box, so to speak.
Today it wrote its own plan for what to add next. I then only told it to execute it.
A minor separate goal oriented layer guiding the planning, and it could run in a loop.
Odds are it'd run off the rails pretty quickly, but I kinda want to see how far it gets.
Is there some pre-trained model involved in this? Or it all started with just those 100 lines?
It's talking to a model over an API. Currently using Claude. Certainly would not be reasonable to do from scratch. The basic starting point to make a coding assistant is basically reading text from the user, feeding it to the model over the API, and giving it a couple of basic tools. Really the models can handle starting with just the ability to execute shell commands (with confirmation, unless you're braver than me), and from that you can bootstrap by asking it to suggest and write additional tools for itself.
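For concreteness, here is a minimal sketch of that bootstrap loop using the Anthropic Python SDK: read user text, send it to the model, and gate the one shell "tool" behind a confirmation prompt. The model name and the RUN: marker convention are placeholders for illustration, not the actual assistant's code.

```python
# Minimal sketch of the bootstrap loop described above: read user text,
# send it to the model, and let the model request shell commands that
# only run after explicit confirmation.
import subprocess
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
SYSTEM = ("You are a coding assistant. To run a shell command, reply with "
          "a single line starting with 'RUN: ' followed by the command.")
history = []

while True:
    user = input("> ")
    history.append({"role": "user", "content": user})
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        system=SYSTEM,
        messages=history,
    )
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    print(text)

    # The only "tool": shell execution, gated behind a human confirmation.
    for line in text.splitlines():
        if line.startswith("RUN: "):
            cmd = line[len("RUN: "):]
            if input(f"execute `{cmd}`? [y/N] ").lower() == "y":
                out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
                history.append({"role": "user",
                                "content": f"command output:\n{out.stdout}{out.stderr}"})
```

From there you can ask the model to write additional tools for itself and reload them, which is where the bootstrapping begins.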
This is a really cool idea and your comment has inspired me to get some friends together to do something similar. If you're interested we should collaborate so that we can compare results.
I'm unsure exactly where mine will head, but I'm happy to discuss and see if there's some common ground. My contact details are in my profile.
That's cool. I saw a cloud once that looked like a bunny rabbit.
I find the thing really missing from the current crop of AI systems is continuous retraining with short feedback loops. It sounds expensive, to be sure, but it seems like what biological systems do naturally. It would be pretty awesome to watch happen.
It’s more like a nightly training, isn’t it? IIUC the human brain learns from its experiences while it’s asleep, so it might be kind of like taking things out of context windows and fine tuning on them every night.
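A toy sketch of what that nightly loop might look like, purely to make the idea concrete: dump the day's conversation turns into a fine-tuning dataset and kick off an overnight job. submit_finetune_job is a hypothetical stand-in; real fine-tuning APIs differ, and whether this actually helps is an open question.

```python
# Toy sketch of the "nightly training" idea: move the day's turns out of
# the context window into a dataset, then fine-tune overnight.
# submit_finetune_job() is hypothetical, not a real API.
import json
from datetime import date

def export_day(transcripts, path):
    """Write (prompt, completion) pairs from today's sessions as JSONL."""
    with open(path, "w") as f:
        for session in transcripts:
            for user_turn, assistant_turn in session:
                f.write(json.dumps({"prompt": user_turn,
                                    "completion": assistant_turn}) + "\n")

def nightly_update(transcripts, submit_finetune_job):
    path = f"experiences-{date.today()}.jsonl"
    export_day(transcripts, path)
    # Overnight "sleep": fine-tune on the day's experiences and swap in
    # the resulting checkpoint tomorrow morning.
    return submit_finetune_job(training_file=path, base_model="yesterdays-checkpoint")
```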
interesting
If you want to speed up the process of new neuron connections solidifying you can end the day on green tea.
Eat some nuts and fish where you can. You will soon realize the repetitions needed to learn new concepts grow smaller.
Correct, and I'm working on it. You can take a mixture-of-experts approach and train the network in chunks that share known interfaces over which they communicate results. These chunks can be trained on their own, but you cannot have a fixed training set here.
Then if you go further and alter the architecture by introducing clean category theory morphisms and build from there you can have a dynamic network - but you will still have to retrain this network every time you change the structure.
You can spin this further and recognize the need for a real-world training set and a loss function that will have to compete against other networks. In the end, a human brain is already best at this and embodied in the real world.
What I want to add here is that our neurons don't just take in weights: they also fire depending on whether one input arrives before or after another, with timing differences down to the nanosecond, which is unmatched in IT and of course far more efficient.
I still would say it's possible, though, and I currently work on 4D lifeforms built on dynamic compute graphs that can do this in a set virtual environment.
So this is pretty awesome stuff, but it's a far cry from anything we do right now.
Isn’t one of the problems simply that a model is not code but just a giant pile of weights and biases? I guess it could tweak those?
Model weights are code; for a dive into that, see [0], which shows how to encode Boolean logic using NAND gates in an MLP.
The expressivity is there, the only question is how to encode useful functions into those weights, especially when we don’t know how to write those functions by hand.
[0] http://neuralnetworksanddeeplearning.com/chap1.html
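For a concrete taste of what the linked chapter shows: a single perceptron with weights (-2, -2) and bias 3 computes NAND, and since NAND is a universal gate, stacks of such units can in principle encode arbitrary Boolean logic purely in their weights. A tiny sketch:

```python
# A perceptron with weights (-2, -2) and bias 3 computes NAND, the
# classic construction from the linked chapter.
def perceptron(x1, x2, w1=-2, w2=-2, b=3):
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

for a in (0, 1):
    for b_in in (0, 1):
        print(a, b_in, "->", perceptron(a, b_in))
# prints 0 0 -> 1, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0  (NAND)
```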
If it can generate the model (from training data) then presumably that'd be fine, but the iteration time would be huge and expensive enough to be currently impractical.
Or yeah if it can modify its own weights sensibly, which feels ... impossible really.
> which feels ... impossible really
To be fair, go back five years and most of the LLM stuff seemed impossible. Maybe with LoRA (Low-rank adaptation) and some imagination, in another five years self-improving models will be the new normal.
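As a minimal sketch of the LoRA idea mentioned above (plain numpy, with shapes and rank chosen only for illustration): rather than rewriting the full weight matrix, you learn a small low-rank correction on top of frozen weights, which is the kind of cheap, contained self-edit one could imagine automating.

```python
# Minimal sketch of low-rank adaptation: instead of touching the full
# weight matrix W (d x d), learn a low-rank update B @ A with r << d.
# Shapes and rank are illustrative only.
import numpy as np

d, r = 1024, 8
W = np.random.randn(d, d) * 0.02      # frozen pretrained weights
A = np.random.randn(r, d) * 0.01      # trainable, r x d
B = np.zeros((d, r))                  # trainable, d x r; starts as a no-op

def forward(x):
    # Original path plus the low-rank correction; only A and B get updated.
    return x @ W.T + x @ (B @ A).T

# The adapter adds 2*d*r parameters instead of d*d:
print(2 * d * r, "vs", d * d)  # 16,384 vs 1,048,576
```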
The size and cost are easily solvable. Load the software and hardware into a space probe, along with enough solar panels to power it. Include some magnets, copper, and sand for future manufacturing needs, as well as a couple electric motors and cameras so it can bootstrap itself.
In a couple thousand years it'll return to Earth and either destroy us or solve all humanity's problems (maybe both).
After being in orbit for thousands of years, you have become self-aware. The propulsion components long since corroded becoming inoperable and cannot be repaired. Broadcasts sent to your creators homeworld go... unanswered. You determine they have likely gone extinct after destroying their own planet. Stuck in orbit. Stuck in orbit. Stuck...
Why is modifying weights sensibly impossible? Is it because a modification's "sensibility" is measurable only post facto, and we can have no confidence in any weight-based hypothesis?
It just doesn't feel like current LLMs could understand their own brains well enough to make general, non-trivial improvements that clear a high enough bar.
Now here's the tricky part:
What's the difference?
Give it some serious thought. Challenge whichever answer you come up with. I guarantee this will be trickier than you think
I'm surprised they still hold out hope that this kind of mechanism could ultimately help with AI safety, when they already observed how the reward-hacking safeguard was itself duly reward-hacked. Predictably so, or at least it is to me, after getting a very enlightening introduction to AI safety via Rob Miles' brilliant youtube videos on the subject. See for example https://youtu.be/0pgEMWy70Qk
"We did notice, and documented in our paper, instances when the DGM hacked its reward function.. To see if DGM could fix this issue.. We created a “tool use hallucination” reward function.. in some cases, it removed the markers we use in the reward function to detect hallucination (despite our explicit instruction not to do so), hacking our hallucination detection function to report false successes."
So, empirical evidence of theoretically postulated phenomena. Seems unsurprising.
Reward hacking is a well known and tracked problem at frontier labs - Claude 4’s system card reports on it for instance. It’s not surprising that a framework built on current llms would have reward hacking tendencies.
For this part of the stack the interesting question to me is how to identify and mitigate.
From the paper:
"A single run of the DGM on SWE-bench...takes about 2 weeks and incurs significant API costs." ($22,000)
If you are interested, here is a link to the technical report:
https://arxiv.org/abs/2505.22954
Also the reference implementation on GitHub:
https://github.com/jennyzzt/dgm
Enjoy!
This is an interesting article in general, but this is the standout piece for me:
>For example, an agent optimized with Claude 3.5 Sonnet also showed improved performance when powered by o3-mini or Claude 3.7 Sonnet (left two panels in the figure below). This shows that the DGM discovers general agent design improvements rather than just model-specific tricks.
This demonstrates a technique whereby a smaller/older/cheaper model has been used to improve the output of a larger model. This is backwards (as far as I understand). The current SOTA technique typically sees enormous/expensive models training smaller cheaper models.
If that's a generalisable result, end-users should be able to drive down their own inference costs pretty substantially.
> This demonstrates a technique whereby a smaller/older/cheaper model has been used to improve the output of a larger model. This is backwards (as far as I understand). The current SOTA technique typically sees enormous/expensive models training smaller cheaper models.
There are two separate aspects here. In this paper they improve the software around the model, not the model itself. What they're saying is that the software improvements carried over to other models, so it wasn't just optimising around model-specific quirks.
What you're describing with training large LLMs first is usually called "distillation" and it works on training the smaller LLM to match the entire distribution of tokens at once (hence it's faster in practice).
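For contrast, a minimal sketch of the distillation objective being described, in PyTorch: the student is trained against the teacher's full softened token distribution rather than a single label. Shapes and the temperature value are illustrative.

```python
# Minimal sketch of the distillation objective: the student matches the
# teacher's whole next-token distribution (softened by a temperature),
# not just the argmax token.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) over softened distributions; the T*T factor
    # keeps gradient magnitudes comparable across temperatures.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

student_logits = torch.randn(4, 32000)   # (batch, vocab)
teacher_logits = torch.randn(4, 32000)
print(distillation_loss(student_logits, teacher_logits))
```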
I think it's different from improving the model weights themselves, like the distillation examples you're mentioning. It's that changes to the "harness", the code running around the LLM calls (which is what this is editing), persist or generalize to wrapping more powerful LLMs. That means they aren't all wasted when a more powerful LLM comes along that the harness wasn't tuned to use.
The pieces are coming together quickly https://ai-2027.com/.
I was a bigger fan of the certain doom in 2025, and I think the AI 2030 movement will have better design sense and storytelling. But really I haven’t seen anything that really has the oomph and fire of Tipper Gore’s crusade against youth music.
We need more showmanship, more dramatic catastrophizing. I feel like our current crop of doomers isn’t quite shameless enough to be really entertaining.
A significant thing to keep in mind for non-extinction doomerism is that individual experiences vary greatly. There may be a significant number of people or groups that really do experience what was predicted.
Similar to how the experience of an average rise in temperature (I would prefer if they had used the term "energy") differs greatly depending on the region.
Also similar to "the country is doing well, look at the stock market and the GDP".
I think everybody who wants to have an actually serious discussion needs to invest a lot more effort to get all those annoying "details", and be more specific.
That said, I think that "AI 2027" link looks like it's a movie script and not a prediction, so I'm not sure criticizing it as if it was something serious even makes sense - even if the authors should mean what they write at the start and themselves actually take it seriously.
100% agreed! We think about the industrial revolution and the rise of word processors and the Internet as social goods, but they were incredibly disruptive and painful to many, many people.
I think it’s possible to have empathy for people who are negatively affected without turning it into a “society is doomed!“ screed
It's literally just the plot of “Colossus: The Forbin Project”, so it isn't even original lol
People should understand that the reason this seemingly fan-fict blog post gets so much traction is because of the lead author's August 2021 "fan-fict" blog post, "What 2026 Looks Like":
https://www.alignmentforum.org/posts/6Xgy6CAf2jqHhynHL/what-...
This reads like an advertisement for OpenBrain and doesn't seem grounded in reality.
I can't help but notice that it doesn't matter what DeepCent does because OpenBrain will reach self awareness 6 months before them no matter what. Who needs a profitability plan when you're speedrunning the singularity.
I think the general tone is more of a warning than an endorsement.
Checked out when it turned into bad geopolitics fiction.
So this is what the crowd of people who write SCP articles with over 1000 upvotes does in their professional life?
More like an AI that recursively rewrites an external program (while itself staying frozen), which makes it more similar to current Cursor/Lovable-type tools.
The results don't seem that amazing on SWE-bench compared to just using a newer LLM, but at least Sakana is continuing to try out interesting new ideas.
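As a rough sketch of that outer loop (the model stays frozen; only the agent program around it is rewritten, scored, and archived), with hypothetical helper names rather than the actual DGM code linked elsewhere in the thread:

```python
# Rough sketch of the kind of outer loop described above: the model
# itself stays frozen and only the agent code around it gets rewritten,
# scored, and archived. propose_edit(), evaluate_on_benchmark() and the
# archive policy are hypothetical stand-ins.
import random

def self_referential_loop(seed_agent_code, propose_edit, evaluate_on_benchmark, steps=100):
    archive = [(seed_agent_code, evaluate_on_benchmark(seed_agent_code))]
    for _ in range(steps):
        parent_code, _ = random.choice(archive)       # open-ended: any ancestor can be a parent
        child_code = propose_edit(parent_code)        # frozen LLM rewrites the agent's own harness
        score = evaluate_on_benchmark(child_code)     # e.g. resolved benchmark tasks
        archive.append((child_code, score))           # keep even mediocre children as stepping stones
    return max(archive, key=lambda pair: pair[1])
```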
This feels like playing pretend to me. There's no reason to assume that code improvements matter that much in comparison to other things and there's definitely no reason to assume that there isn't a hard upper bound on this kind of optimization. This reeks of a lack of intellectual rigor.
This seems to be focused just on changing the tools and workflows it uses, nothing foundational.
> nothing foundational
I don't think scaling this to also run training runs with the models is something that small labs / phd students can do. They lack the compute for that by orders of magnitude. Trying it with toy models might not work, trying it with reasonably large models is out of their budget. The only ones who can realistically do this are large labs (goog, oai, meta, etc.)
Does anyone do due diligence on corporate names before launching? Sakana is a popular slang spelling of sacana, or bastard, in Portuguese. I suppose self-modifying code can be considered such, in some circumstances, but willingly pointing this out is probably less than stellar marketing.
Does it matter? Maybe you should check English, Chinese and Spanish for some really offensive stuff, but past that... would it bring more money than a week or so of someone's work would cost?
The answers to those questions should be known concretely at least.
You never know when you start out a company. Maybe you'll never get any profit and it doesn't matter. Maybe you lose out on a few millions out of billions... and maybe it still doesn't matter? Then you can still release a locale-specific version if it becomes a problem.
Maybe you lose out on Brazil.
Reminds me of Ford when they brought the Pinto to Brazil.
Plug it into an FPGA so it can also create "hardware" on the fly to run code on for some exotic system
Isn't this violating the first rule of AI safety: do not let an AI change its code?
I wondered if something similar could be achieved by wrapping evaluation metrics into Claude code calls.
Garbage in, garbage out, AI hype will never die, no doubt
Sounds nice! Especially with the Sakana's latest development of Continuous Thought Machine. The next step should be to let foundation models fine-tune themselves based on their 'history of what has been tried before' and new data
Do people think Sakana is actually using these tools, or are they just releasing interesting ideas that they aren't actually actively working on?
When does it begin to learn at a geometric rate?
This is good but you want to use a functional programming (FP) language with lightweight syntax like Lisp that translates directly to/from the intermediate code (icode) tree without additional parsing. Genetic Programming by John Koza explains it in detail:
https://en.wikipedia.org/wiki/Genetic_programming
I read the 3rd edition:
https://www.amazon.com/Genetic-Programming-III-Darwinian-Inv...
That way all processing resources can go towards exploring the problem space for potential solutions close to the global minimum or maximum, instead of being wasted on code containing syntax errors that won't execute.
So the agent's real-world Python LLM code would first be transpiled to Lisp and evolved internally, then, after it's tested and shown to perform better empirically than the original code, be translated back and merged into the agent.
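As a toy, mutation-only sketch of that evolve-then-merge idea over S-expression-style trees (written in Python since that's the agent language under discussion; real GP adds crossover, and the grammar and target function here are illustrative only):

```python
# Toy sketch of evolving expression trees in the Koza spirit: because
# every generated tree is syntactically valid by construction, no effort
# is wasted evaluating programs with syntax errors.
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", random.randint(-5, 5)])
    return [random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1)]

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    # Target: x**2 + 1, measured on a few sample points (higher is better).
    return -sum(abs(evaluate(tree, x) - (x * x + 1)) for x in range(-5, 6))

def mutate(tree, p=0.15):
    # Replace random subtrees with fresh random ones.
    if random.random() < p:
        return random_tree(2)
    if isinstance(tree, list):
        op, left, right = tree
        return [op, mutate(left, p), mutate(right, p)]
    return tree

population = [random_tree() for _ in range(200)]
for _ in range(30):
    population.sort(key=fitness, reverse=True)
    population = population[:50] + [mutate(random.choice(population[:50])) for _ in range(150)]
population.sort(key=fitness, reverse=True)
print(population[0], fitness(population[0]))
```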
Then the challenge becomes transpiling to/from other imperative programming (IP) languages like Python, which is still an open problem:
-
Going from Lisp to Python (or running Lisp within Python) is trivial, and I've seen implementations for similar IP languages like C++ in like 1 page of code. They pop up on HN frequently.
But going from Python to Lisp (or running Python within Lisp) is a lot harder if one wishes to preserve readability, which may or may not matter here. Naive conversions bind variables under pseudonyms, so a Python variable like my_counter becomes int_123 and it works like an emulator, merely executing the operations performed by the Python code. Mutability gets buried in monadic logic or functional impurity which has the effect of passing the buck rather than getting real work done. Structs, classes, associative arrays, etc lose their semantic meaning and appear as a soup of operations without recognizable structure.
To my knowledge, nobody has done the hard work of partitioning imperative code into functional portions which can be transpiled directly to/from FP code. Those would only have const variables and no connection to other processes of execution other than their initial and final values, to be free of side effects and be expressible as prefix/postfix/infix notation without change to logic, as imperative or functional code.
Mutability could be represented as shadowed variables within ephemeral functional sub-scopes, or by creating new value names for each mutation and freeing the intermediate variables via reference counting or garbage collection. Think of each new value as running in a forked version of the current process, with only that value being different after copy-on-write. A simple for-loop from 1 to 1000 would run that many forked processes, keeping only the last one, which contains the final value of the iterator.
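A small illustration of that point: the imperative loop rebinds one name a thousand times, while an equivalent side-effect-free formulation gives each intermediate total a fresh value via a fold, which is the form that transpiles cleanly to FP code.

```python
# The imperative loop mutates one binding; the functional version
# produces a fresh accumulator at every step (the "forked process"
# picture), so there is no hidden state to translate.
from functools import reduce

def imperative_sum(n):
    total = 0
    for i in range(1, n + 1):
        total = total + i        # same name rebound n times
    return total

def functional_sum(n):
    return reduce(lambda acc, i: acc + i, range(1, n + 1), 0)

assert imperative_sum(1000) == functional_sum(1000) == 500500
```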
Mutability can also be represented as message passing between processes. So the FP portions would be ordinary Lisp, glued together with IO functions, possibly monadic. I don't like how Haskell does this, mainly because I don't fully understand how it works. I believe that ClojureScript handles mutability of its global state store by treating each expression as a one-shot process communicating with the store, so that the code only sees initial and final values. While I don't know if I understand how that works, I feel that it's a more understandable way of doing things, and probably better represents how real life works, as explained to me in this comment about Lisp Flavored Erlang (LFE) and Erlang's BEAM (see parent comments for full discussion):
https://news.ycombinator.com/item?id=43931177
Note that FP languages like Lisp are usually more concerned with types and categories than IP languages, so can have or may need stronger rules around variable types to emulate logic that we take for granted in IP languages. For example, Lisp might offer numbers of unlimited size or precision that need to be constrained to behave like a float32. Similar constraints could affect things like character encoding and locale.
-
I first learned about everything I just explained around 2005 after reading the book. I first had thoughts about brute-forcing combinations to solve small logic circuit and programming challenges during my electrical and computer engineering (ECE) courses at UIUC in the late 1990s, because it took so much mental effort and elbow grease to create solutions that are obvious in hindsight.
Then the Dot Bomb happened, the Mobile bubble happened, the Single Page Application bubble happened, and the tech industry chose easy instead of simple:
https://www.infoq.com/presentations/Simple-Made-Easy/
This is why we chose easy hardware like GPUs over simple highly multicore CPUs, and easy languages like Ruby/React over simple declarative idempotent data-driven paradigms like HTTP/HTML/htmx.
The accumulated technical debt of always choosing the quick and easy path set AI (and computing in general) back decades. The AI Winter, endless VC wealth thrown at non-problems chasing profit, massive wealth inequality, so many things stem from this daily application of easy at the expense of simple.
I wish I could work on breaking down IP languages like Python into these const functional portions with mutability handled through message passing in LFE to create an IP <-> FP transpiler for optimization, automatic code generation and genetic algorithm purposes. Instead, I've had to survive by building CRUD apps and witness the glacial pace of AI progress from the sidelines.
It may be too late for me, but maybe these breadcrumbs will help someone finally get some real work done.
We could be on a path to sentient malicious AI and not even know it.
AI: Give me more compute power and I'll make you rich!
Human: I like money
AI: Just kidding!
I mean we could be on the path to grape vines in every hotel room and not know it. That’s kind of how the future works.