No shit. This isn't new.
Technology
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related news or articles.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
I think it's important to note (i'm not an llm I know that phrase triggers you to assume I am) that they haven't proven this as an inherent architectural issue, which I think would be the next step to the assertion.
do we know that they don't and are incapable of reasoning, or do we just know that for x problems they jump to memorized solutions, is it possible to create an arrangement of weights that can genuinely reason, even if the current models don't? That's the big question that needs answered. It's still possible that we just haven't properly incentivized reason over memorization during training.
if someone can objectively answer "no" to that, the bubble collapses.
In case you haven't seen it, the paper is here - https://machinelearning.apple.com/research/illusion-of-thinking (PDF linked on the left).
The puzzles the researchers have chosen are spatial and logical reasoning puzzles - so certainly not the natural domain of LLMs. The paper doesn't unfortunately give a clear definition of reasoning, I think I might surmise it as "analysing a scenario and extracting rules that allow you to achieve a desired outcome".
They also don't provide the prompts they use - not even for the cases where they say they provide the algorithm in the prompt, which makes that aspect less convincing to me.
What I did find noteworthy was how the models were able to provide around 100 steps correctly for larger Tower of Hanoi problems, but only 4 or 5 correct steps for larger River Crossing problems. I think the River Crossing problem is like the one where you have a boatman who wants to get a fox, a chicken and a bag of rice across a river, but can only take two in his boat at one time? In any case, the researchers suggest that this could be because there will be plenty of examples of Towers of Hanoi with larger numbers of disks, while not so many examples of the River Crossing with a lot more than the typical number of items being ferried across. This being more evidence that the LLMs (and LRMs) are merely recalling examples they've seen, rather than genuinely working them out.
Why would they "prove" something that's completely obvious?
The burden of proof is on the grifters who have overwhelmingly been making false claims and distorting language for decades.
They’re just using the terminology that’s widespread in the field. In a sense, the paper’s purpose is to prove that this terminology is unsuitable.
That's called science
Thank you Captain Obvious! Only those who think LLMs are like "little people in the computer" didn't knew this already.
stochastic parrots. all of them. just upgraded “soundex” models.
this should be no surprise, of course!