this post was submitted on 07 May 2025

retrocomputing

xyzzy@lemm.ee 15 points 2 days ago

The 1B parameter version of Llama 3.2 showed even slower results at 0.0093 tokens per second, based on the partial model run with data stored on disk.

I mean, cool? They got a C interface library to compile using an older C standard, and the 1B model predictably runs like trash. At 0.0093 tokens per second, each token takes almost two minutes, so it will take hours to do anything meaningful.
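Quick back-of-envelope check on the "hours" claim. This assumes the quoted 0.0093 tokens/s holds steady for the whole run and ignores prompt processing and model loading. The token counts are just illustrative, not anything from the article:

```python
# Throughput quoted above for the 1B Llama 3.2 run (tokens per second).
# Assumption: the rate stays constant for the whole generation.
TOKENS_PER_SECOND = 0.0093


def generation_time_hours(num_tokens: int, tps: float = TOKENS_PER_SECOND) -> float:
    """Return wall-clock hours to generate num_tokens at a constant rate of tps."""
    return num_tokens / tps / 3600


seconds_per_token = 1 / TOKENS_PER_SECOND
print(f"{seconds_per_token:.1f} s per token")

# Hypothetical reply lengths, chosen only for illustration.
for n in (10, 100, 500):
    print(f"{n:>4} tokens: {generation_time_hours(n):.1f} h")
```

That works out to about 107.5 s per token, roughly 3 hours for a 100-token reply, and close to 15 hours for 500 tokens.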