That was an annoying read. It doesn't say what this actually is.
It's not a new LLM. Chat with RTX is software for running inference (i.e., using LLMs) locally, with hardware acceleration from RTX cards. Several other projects do the same thing, though they may not be as heavily optimized for NVIDIA's hardware.
Go directly to NVIDIA to avoid the clickbait.
Source: https://blogs.nvidia.com/blog/chat-with-rtx-available-now/
Download page: https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generative-ai/
Pretty much every LLM you can download already has CUDA support via PyTorch.
However, some of the easier-to-use frontends don't use GPU acceleration, because it's a bit of a pain to configure across a wide range of hardware and driver versions. IIRC, GPT4All does not use GPU acceleration yet (that might be outdated; I haven't checked in a while).
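For reference, here's a minimal sketch of what GPU-accelerated inference via PyTorch looks like with the Hugging Face transformers library (the model name is just an example, not what any of these frontends ships with):

```python
# Minimal sketch of GPU-accelerated inference with PyTorch + transformers.
# The model name is an example; any Hugging Face causal LM works the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model
device = "cuda" if torch.cuda.is_available() else "cpu"  # falls back to CPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to(device)

inputs = tokenizer("What is Chat with RTX?", return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```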
If this makes local LLMs more accessible to people who are not familiar with setting up a CUDA development environment or Python venvs, that's great news.
I'd hope that this uses the hardware better than PyTorch does; otherwise, why the specific hardware requirements? Then again, it could just be marketing.
There are several alternatives that offer one-click installers, e.g., in this thread:
AGPL-3.0 license: https://jan.ai/
MIT license: https://ollama.com/
MIT license: https://gpt4all.io/index.html
(There's more.)
Ollama with Ollama WebUI is the best combo in my experience.
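For anyone curious what using it from code looks like, here's a rough sketch of querying a local Ollama server over its REST API (assumes `ollama serve` is running and you've already pulled a model; the model name is just an example):

```python
# Sketch: querying a locally running Ollama server over its HTTP API.
# Assumes `ollama serve` is running and the model has been pulled,
# e.g. with `ollama pull llama2`.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",           # example model name
        "prompt": "Why run LLMs locally?",
        "stream": False,             # return one JSON object instead of a stream
    },
)
print(resp.json()["response"])
```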
GPT4All somehow uses GPU acceleration on my RX 6600 XT.
Ooh, nice. Looking at the changelogs, it looks like they added Vulkan acceleration back in September. Probably not as good as CUDA/Metal on supported hardware, though.
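If you'd rather drive that from code than the GUI, here's a rough sketch with the gpt4all Python bindings; the model filename is illustrative, and `device="gpu"` requests GPU acceleration (which, on a card like the RX 6600 XT, would presumably go through the Vulkan backend):

```python
# Rough sketch with the gpt4all Python bindings. The model filename is an
# example; device="gpu" requests GPU acceleration where available.
from gpt4all import GPT4All

model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf", device="gpu")
with model.chat_session():
    print(model.generate("What hardware are you running on?", max_tokens=128))
```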
Getting around 44 iterations/s (or whatever that means) on my GPU.