this post was submitted on 27 Jan 2025
883 points (98.1% liked)

cross-posted from: https://lemm.ee/post/53805638

[–] GenosseFlosse@feddit.org 0 points 2 days ago (1 children)

Sure, you can run it on low-end hardware, but how does the performance (response time for a given prompt) compare with other models, either run locally or offered as a service?

[–] ArchRecord@lemm.ee 1 points 2 days ago

That tokens/s figure is the performance, or response time if you'd like to call it that. GPT-o1 tends to get anywhere from 33-60 tokens/s, whereas in the example I showed previously, a Raspberry Pi can do 200 on a distilled model.
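(For reference, tokens/s is just the number of generated tokens divided by wall-clock generation time. Here's a minimal sketch of measuring it locally with llama-cpp-python; the model path and prompt are placeholders I've made up, not anything from the posts above.)

```python
# Minimal sketch: measure local generation speed in tokens/s.
# Assumes llama-cpp-python is installed and you've downloaded a quantized
# GGUF build of a distilled model (the path below is a placeholder).
import time
from llama_cpp import Llama

llm = Llama(model_path="deepseek-r1-distill-qwen-1.5b-q4.gguf")  # placeholder path

start = time.perf_counter()
result = llm("Explain what model distillation is in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = result["usage"]["completion_tokens"]  # OpenAI-style usage dict
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tokens/s")
```

The same script reports a low number on a Pi and a much higher one on a desktop GPU; the model and quantization level stay the deciding factors.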

Now, granted, a distilled model will produce worse output quality than the full one, as seen in a benchmark comparison published by DeepSeek here. (I've outlined the most distilled version of the newest DeepSeek model, which is likely the kind being run on the Raspberry Pi, albeit probably with some changes made by the author of that post, as well as OpenAI's two most high-end models at a comparable level of distillation.)

The gap in quality is relatively small for a model that is likely distilled far past what OpenAI's "mini" model is. And when you consider that even regular laptop/PC hardware is orders of magnitude more powerful than a Raspberry Pi, or that an external AI accelerator can be bought for as little as $60, the quality in practice could be very comparable with even slightly less distillation, especially with fine-tuning for a given use case (e.g. a local version of DeepSeek in a code development platform would be fine-tuned specifically to produce code-related results).

If you get into the realm of cloud-hosted instances of DeepSeek running at scale on GPUs, the way OpenAI's models are, the benchmark performance is only 1-2 percentage points off OpenAI's model, at roughly 3-6% of the cost, which effectively means paying for about 3-6% of the GPU power that OpenAI is paying for.
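(To make that cost claim concrete, here's a quick back-of-the-envelope check. The per-million-token prices below are illustrative placeholders I've filled in, not figures from this thread, so plug in current pricing before trusting the exact ratio.)

```python
# Back-of-the-envelope cost-ratio check (illustrative numbers, not official pricing).
o1_price_per_m_tokens = 60.00        # placeholder: $ per 1M output tokens, OpenAI o1
deepseek_price_per_m_tokens = 2.20   # placeholder: $ per 1M output tokens, DeepSeek API

monthly_output_tokens = 50_000_000   # example workload: 50M output tokens per month

o1_cost = monthly_output_tokens / 1_000_000 * o1_price_per_m_tokens
ds_cost = monthly_output_tokens / 1_000_000 * deepseek_price_per_m_tokens

print(f"OpenAI o1: ${o1_cost:,.2f}/month")
print(f"DeepSeek:  ${ds_cost:,.2f}/month")
print(f"DeepSeek costs {ds_cost / o1_cost:.1%} of the o1 bill")
# With these placeholder prices the ratio lands near 3-4%, in line with
# the 3-6% range mentioned above.
```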