this post was submitted on 13 Jun 2025
50 points (89.1% liked)

Selfhosted

46676 readers
663 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago
MODERATORS

I was looking back at some old Lemmy posts and came across GPT4All. Didn't get much sleep last night because it's awesome, even on my old (10-year-old) laptop with a Compute 5.0 Nvidia card.

Still, I'm after more. I'd like to be able to generate images and view them in the conversation, and if the model writes Python code, to be able to run it (I'm using Debian and have a default Python env set up). Local file analysis would also be useful. CUDA Compute 5.0 / Vulkan compatibility is needed too, with the option to use some of the smaller models (1-3B, for example). A local API would also be nice for my own Python experiments.

Is there anything that can tick the boxes, even if I have to scoot across models for some of the features? I'd prefer more of a desktop client application than a Docker container running in the background.
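
To clarify the "local API" requirement: an OpenAI-compatible HTTP endpoint would do, something I can poke from Python roughly like the sketch below. The port and model name here are assumptions (GPT4All's built-in server reportedly defaults to 4891; adjust for whatever client you run).

```python
# Minimal sketch of querying an OpenAI-compatible local endpoint.
# Port and model name are assumptions; check your client's settings.
import requests

resp = requests.post(
    "http://localhost:4891/v1/chat/completions",  # GPT4All's reported default port
    json={
        "model": "Llama 3 8B Instruct",  # hypothetical; use a model you have loaded
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```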

top 23 comments
[–] mitexleo@buddyverse.one 2 points 13 hours ago (2 children)

You should try https://cherry-ai.com/. It's the most advanced client out there. I personally use Ollama for running the models and the Mistral API for advanced tasks.

[–] catty@lemmy.world 1 points 6 hours ago

But its website is in Chinese. Also, what's the GitHub?

[–] mitexleo@buddyverse.one 1 points 13 hours ago

It's fully open source and free (as in beer).

[–] andrew0@lemmy.dbzer0.com 17 points 1 day ago (3 children)

Ollama for the API, which you can integrate into Open WebUI. You can also integrate image generation with ComfyUI, I believe.

It's less of a hassle to use Docker for Open WebUI, but ollama works as a regular CLI tool.
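
The Ollama side is easy to script against, too. A minimal sketch with the official `ollama` Python package, assuming the daemon is running on its default port and you've pulled a small model first (`ollama pull llama3.2:1b`):

```python
# Minimal sketch using the `ollama` package (pip install ollama).
# Assumes the Ollama daemon is running and the model is already pulled.
import ollama

response = ollama.chat(
    model="llama3.2:1b",  # a small 1B model, matching the OP's size requirement
    messages=[{"role": "user", "content": "Write a haiku about self-hosting."}],
)
print(response["message"]["content"])
```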

[–] catty@lemmy.world 1 points 6 hours ago* (last edited 6 hours ago)

But won't this be a mish-mash of different Docker containers and projects, creating an installation, dependency, and upgrade nightmare?

This is what I do; it's excellent.

[–] O_R_I_O_N@lemm.ee 3 points 1 day ago* (last edited 1 day ago)

Chainlit is a super easy UI too. Ollama works well with Semantic Kernel (for integration with existing code) and LangChain (for agent orchestration). I'm working on building MCP interaction with ComfyUI's API; it's a pain in the ass.
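
To give a flavour of how little code Chainlit needs, here's a rough sketch wiring it to a local Ollama model (run with `chainlit run app.py`; the model name is an assumption, any pulled model should work):

```python
# app.py: a minimal Chainlit chat UI backed by a local Ollama model.
# Assumes `pip install chainlit ollama` and a locally pulled model.
import chainlit as cl
import ollama

@cl.on_message
async def on_message(message: cl.Message):
    # Forward the user's message to the local model and send back the reply.
    response = await ollama.AsyncClient().chat(
        model="llama3.2:1b",  # assumption: swap in whatever model you have pulled
        messages=[{"role": "user", "content": message.content}],
    )
    await cl.Message(content=response["message"]["content"]).send()
```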

[–] catty@lemmy.world 10 points 1 day ago* (last edited 1 day ago) (2 children)

I've discovered jan.ai, which is far faster than GPT4All and visually a little nicer.

EDIT: After using it for an hour or so, it seems to crash all the time; I keep having to restart it, and right now it's freezing for no reason.

[–] voidspace@lemmy.world 1 points 1 day ago (1 children)

Took ages to produce an answer, and it only worked once on one model; it has crashed ever since.

[–] catty@lemmy.world 1 points 23 hours ago* (last edited 23 hours ago)

Try the beta on the GitHub repo, and use a smaller model!

[–] otacon239@lemmy.world 2 points 1 day ago* (last edited 1 day ago) (1 children)

I also started using this recently and it's very plug-and-play. Just open and run. It's the only client so far that feels like one I could recommend to non-geeks.

[–] catty@lemmy.world 1 points 1 day ago* (last edited 1 day ago)

I agree. It looks nice, explains the models fairly well, hides away the model settings nicely, and even recommends some initial models with low requirements to get you started. I like the concept of plugins, but I haven't found a way yet to, e.g., run the Python code it creates and display the output in the window.
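
In the meantime I've been toying with doing it outside the client. A rough sketch of the idea: pull the fenced Python block out of a reply and run it in a subprocess. There's no sandboxing here, so only run code you've actually read (or confine it to a VM/container).

```python
# Rough sketch: extract a fenced Python block from a model reply and
# run it in a subprocess, capturing the output. No sandboxing at all;
# read the generated code before running it.
import re
import subprocess
import sys

def run_generated_python(reply: str) -> str:
    # Match a fenced python block (backticks written as `{3} in the
    # pattern so this snippet stays fence-safe).
    match = re.search(r"`{3}python\n(.*?)`{3}", reply, re.DOTALL)
    if not match:
        return "(no Python code block found)"
    result = subprocess.run(
        [sys.executable, "-c", match.group(1)],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout + result.stderr
```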

[–] bjoern_tantau@swg-empire.de 12 points 1 day ago
[–] hendrik@palaver.p3x.de 5 points 1 day ago (1 children)

Maybe LocalAI? It doesn't do Python code execution, but it does pretty much all of the rest.

[–] catty@lemmy.world 3 points 23 hours ago (2 children)

This looks interesting. Do you have experience with it? How reliable / efficient is it?

[–] mitexleo@buddyverse.one 1 points 13 hours ago

LocalAI is pretty good, but resource-intensive. I ran it on a VPS in the past.

[–] hendrik@palaver.p3x.de 1 points 20 hours ago

I think many people use it, and it works. But sorry, no, I don't have any first-hand experience. I've tested it for a bit and it looked fine. It has a lot of features, and it should be as efficient as any other ggml/llama.cpp-based inference solution, at least for text. I myself use KoboldCPP for the few things I do with AI, and my computer lacks a GPU, so I don't really generate many images with software like this. In any case, it's likely going to take you less than the 15 minutes it takes me to generate an image on my unsuited machine.

You can tell Open Interpreter to run commands based on your human-language input. If you want a local-only LLM, you can pair it with Ollama. It works for "interactive" use, where you're asked for confirmation before a command is run.
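
From memory, the Ollama pairing looks roughly like this in Open Interpreter's Python API (treat the attribute names as assumptions and check them against the version you install):

```python
# Rough sketch of pairing Open Interpreter with a local Ollama model.
# Attribute names are from memory of the docs; verify against your
# installed version (pip install open-interpreter).
from interpreter import interpreter

interpreter.offline = True                      # don't call hosted APIs
interpreter.llm.model = "ollama/llama3.2:1b"    # assumption: any locally pulled model
interpreter.llm.api_base = "http://localhost:11434"
interpreter.auto_run = False                    # ask before executing each command

interpreter.chat("List the five largest files in my home directory.")
```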

I set this up in a VM because I wanted a fully automatic coding "agent" that can run commands without my intervention, and I did not want it to blow up my main system. It did not really work, though, because as far as I know Open Interpreter does not have a way to "pipe" a command's output back into the LLM so that it could create a feedback loop with linters and stuff.

Another issue was that StarCoder2, which is the only LLM trained on permissively licensed code that I could find, only has a 15B "human-like" model. The smaller models only speak code, so I don't know how they would work for agentic usage, and the 15B model runs really slowly on a DDR4 CPU. I think agents are cool, though, so I would like to try Aider, which is a supposedly good open-source agent and, unlike Open Interpreter, is not abandonware.

Thanks for coming to my blabbering talk; hope this is useful for someone.

[–] breadsmasher@lemmy.world 3 points 1 day ago
[–] ViatorOmnium@piefed.social 1 points 1 day ago (1 children)

The main limitation is the VRAM, but I doubt any model is going to be particularly fast.

I think phi3:mini on Ollama might be an OK-ish fit for Python, since it's a small model that was trained on Python codebases.
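
A quick way to test that fit is to hit Ollama's HTTP API directly. A sketch, assuming the daemon is on its default port and you've done `ollama pull phi3:mini`:

```python
# Sketch: ask phi3:mini for Python code via Ollama's HTTP API.
# Assumes the daemon is running locally and the model is pulled.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "phi3:mini",
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```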

[–] catty@lemmy.world 4 points 1 day ago

I'm getting very near real-time responses on my old laptop, with maybe a delay of 1-2 s while it creates the response.