238
submitted 3 months ago by ForgottenFlux@lemmy.world to c/firefox@lemmy.ml
you are viewing a single comment's thread
view the rest of the comments
[-] GenderNeutralBro@lemmy.sdf.org 5 points 3 months ago

That's somewhat awkward phrasing but I think the visual processing will also be done on-device. There are a few small multimodal models out there. Mozilla's llamafile project includes multimodal support, so you can query a language model about the contents of an image.

Even just a few months ago I would have thought this was not viable, but the newer models are game-changingly good at very small sizes. Small enough to run on any decent laptop or even a phone.

this post was submitted on 23 May 2024
238 points (99.2% liked)

Firefox

17602 readers
476 users here now

A place to discuss the news and latest developments on the open-source browser Firefox

founded 4 years ago
MODERATORS