this post was submitted on 04 May 2024

153 points (98.7% liked)


Nvidia tries to kill CUDA translation layers | Tom's Hardware (www.tomshardware.com)

submitted 6 months ago by fl42v@lemmy.ml to c/opensource@lemmy.ml


[–] MalReynolds@slrpnk.net 110 points 6 months ago

Probably unenforceable, like so much in EULAs, but enough to deter the small guys who can't afford the lawyers to defend themselves. Bully tactics. It's a shame, because an open playing field would benefit everyone but Nvidia. It's also a shame that AMD, who probably could defend themselves, dropped financial support for ZLUDA.

[–] s12@sopuli.xyz 54 points 6 months ago

Translation layers are vital for preservation.

The malicious people in NVIDIA who are pushing for this are being very evil right now.

[–] Fedizen@lemmy.world 39 points 6 months ago

Software licensing needs to die. This shit needs standardized licensing.

[–] onlinepersona@programming.dev 29 points 6 months ago

Fuck you nvidia

Linus Torvalds

[–] Andromxda@lemmy.dbzer0.com 22 points 6 months ago

[–] Rudee@lemmy.ml 20 points 6 months ago (1 children)

As AMD, Intel, Tenstorrent, and other companies develop better hardware, more software developers will be inclined to design for these platforms, and Nvidia's CUDA dominance could ease over time.

This seems a bit optimistic to me. CUDA is currently the de facto method of utilising a GPU's power efficiently. This makes them an easy choice for anyone with serious compute power needs. The other manufacturers are fighting an uphill battle trying to create an alternative that won't be used until it is definitively better.

This just seems like a catch-22 to me.

[–] d3Xt3r@lemmy.nz 31 points 6 months ago* (last edited 6 months ago) (2 children)

It's not "optimistic", it's actually happening. Don't forget that GPU compute is a pretty vast field, and not every field/application has a hard-coded dependency on CUDA/nVidia.

For instance, both TensorFlow and PyTorch work fine with ROCm 6.0+ now, and this enables a lot of ML tasks such as running LLMs like Llama2. Stable Diffusion also works fine - I tested 2.1 a while back and performance was great on my Arch + 7800 XT setup. There are plenty more examples where AMD is already a viable option. And don't forget ZLUDA too, which is continuing to be improved.

I mean, look at this benchmark from Feb, that's not bad at all:

And ZLUDA has had many improvements since then, so this will only get better.

Of course, whether all this makes an actual dent in nVidia's compute market share is a completely different story (thanks to enterprise $$$ + existing hw that's already out there), but the point is that, for many people/projects, ROCm is already a viable alternative to CUDA. And this will only improve with time. Just within the last 6 months, for instance, there have been VAST improvements in both ROCm (like the 6.0 release) and compatibility with major projects (like PyTorch). 6.1 was released only a few weeks ago with improved SD performance, a new video decode component (rocDecode), much faster matrix calculations with the new EigenSolver etc. It's a very exciting space to be in, to be honest.

So you'd have to be blind to not notice these rapid changes that are really happening. And yes, right now it's still very, very early days for AMD and they've got a lot of catching up to do, and there's a lot of scope for improvement too. But it's happening for sure; AMD + the community aren't sitting idle.

[–] Kazumara@discuss.tchncs.de 3 points 6 months ago* (last edited 6 months ago) (2 children)

Unfortunately the article of the post directly contradicts your point about ZLUDA improving:

ZLUDA appears to be floundering now, with both AMD and Intel having passed on the opportunity to develop it further

Following the links and searching around, I found this: Andrzej "vosen" Janik, the lead dev, says in his FAQ:

What's the future of the project?
With neither Intel nor AMD interested, we've run out of GPU companies. I'm open though to any offers of that could move the project forward. Realistically, it's now abandoned and will only possibly receive updates to run workloads I am personally interested in (DLSS).

[–] d3Xt3r@lemmy.nz 11 points 6 months ago* (last edited 6 months ago)

I based my statements on the actual commits being made to the repo. From what I can see, it's certainly not "floundering":

In any case, ZLUDA is really just a stop-gap arrangement so I don't see it being an issue either way - with more and more projects supporting AMD cards, it won't be needed at all in the near future.

[–] wiki_me@lemmy.ml 1 points 6 months ago

Following the links and searching around, I found this: Andrzej “vosen” Janik, the lead dev, says in his FAQ:

There is a fork which seems more active (see 1 and 2)

It should probably at least be mentioned in the README of the original project.

[–] filister@lemmy.world 3 points 6 months ago (2 children)

How easy is it to install and configure ROCm, and how limiting is it? I also heard about ZLUDA, etc. and I very much want to pick AMD for my next GPU, especially considering the fact that I am using Wayland, but I think they are still far behind NVIDIA?

[–] AProfessional@lemmy.world 5 points 6 months ago* (last edited 6 months ago)

On some distros it's packaged, so it's trivial. On others it's not, and it's annoying. How well it works depends on the exact usage.

[–] d3Xt3r@lemmy.nz 4 points 6 months ago* (last edited 6 months ago) (1 children)

Since you're on Linux, it's just a matter of installing the right packages from your distro's package manager. Lots of articles on the Web, just google your app + "ROCm". The main thing you gotta keep in mind is the version dependencies: since ROCm 6.0/6.1 was released recently, some programs may not yet have been updated for it. So if your distro packages the most recent version, your app might not yet support it.

This is why many ML apps also come as a Docker image with specific versions of libraries bundled with them - so that could be an easier option for you, instead of manually hunting around for various package dependencies.
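For the version mismatch mentioned above, a quick check might look like the sketch below. The path `/opt/rocm/.info/version` is where ROCm packages commonly drop a version file, but treat it as an assumption for your distro. `SUPPORTED` is a made-up example set, not any real app's requirements, and the helper names are just illustrative.

```python
from pathlib import Path

# Hypothetical example: the (major, minor) ROCm releases some app claims to support.
SUPPORTED = {(5, 7), (6, 0)}


def parse_version(text):
    """Turn a string like '6.1.0-82' into (6, 1, 0), dropping any build suffix after '-'."""
    core = text.strip().split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_supported(installed, supported=SUPPORTED):
    """Compare only major.minor, since apps usually pin to a release series."""
    return parse_version(installed)[:2] in supported


def read_rocm_version(path="/opt/rocm/.info/version"):
    """Return the installed ROCm version string, or None if the file is missing.

    The default path is an assumption; adjust it for your distro's packaging.
    """
    version_file = Path(path)
    if not version_file.is_file():
        return None
    return version_file.read_text().strip()


if __name__ == "__main__":
    installed = read_rocm_version()
    if installed is None:
        print("No ROCm version file found at the assumed path")
    elif is_supported(installed):
        print(f"ROCm {installed} is in the supported set")
    else:
        print(f"ROCm {installed} is newer or older than what the app lists")
```

If the check fails, that's usually the point where the Docker image becomes the less painful route.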

Also, chances are that your app may not even know/care about ROCm, if it just uses a library like PyTorch / TensorFlow etc. So just check its requirements first.
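If you want to see which backend your PyTorch build was compiled for, something like this should do it. ROCm builds of PyTorch reuse the `torch.cuda` API and set `torch.version.hip`, which is why apps built on PyTorch often need no ROCm-specific code. The helper name `describe_torch_backend` is just illustrative, and this is a minimal sketch rather than a full diagnostic.

```python
import importlib


def describe_torch_backend():
    """Report whether the installed PyTorch build targets ROCm/HIP, CUDA, or CPU only."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "PyTorch is not installed"

    # torch.version.hip is a version string on ROCm builds and None otherwise;
    # torch.version.cuda is set on CUDA builds.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)

    if hip:
        backend = f"ROCm/HIP build ({hip})"
    elif cuda:
        backend = f"CUDA build ({cuda})"
    else:
        backend = "CPU-only build"

    # On ROCm builds, torch.cuda.is_available() reports whether an AMD GPU is usable.
    gpu_visible = torch.cuda.is_available()
    return f"{backend}, GPU visible: {gpu_visible}"


if __name__ == "__main__":
    print(describe_torch_backend())
```

A ROCm build that reports no visible GPU usually points at a driver, permissions or unsupported-card problem rather than at the app itself.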

As for AMD vs nVidia in general, there are mainly three areas where AMD has lagged behind: ray tracing (RTX), compute and super sampling.

For RTX, there have been improvements in performance with the RDNA3 cards, but AMD still lags behind by a generation. For instance, the latest 7900 XTX's RTX performance is equivalent to the 3080.
Compute is catching up as I mentioned earlier, and in some cases the performance may even match nVidia. This is very application/library specific though, so you'll need to look it up.
Super Sampling is a bit of a weird one. AMD has FSR and it does a good job in general. In some cases, it may even perform better since it uses much simpler calculations, as opposed to nVidia's deep learning technique. AMD's FSR method can in fact be used with any card, as long as the game supports it. And therein lies the catch: only something like a third of the games out there support it, and even fewer games support the latest FSR 3. But there are mods out there which can enable FSR (check Nexus Mods) that you might be able to use. In any case, FSR/DLSS isn't a critical thing, unless you're gaming on a 4K+ monitor.

You can check out Tom's Hardware GPU Hierarchy for the exact numbers - scroll down halfway to read about the RTX and FSR situation.

So yes, AMD does lag behind nVidia, but whether this impacts you really depends on your needs and use cases. If you're a Linux user though, getting an AMD card is a no-brainer - it just works so much better, as in, no need to deal with proprietary driver headaches, no update woes, excellent Wayland support etc.

[–] filister@lemmy.world 2 points 6 months ago (1 children)

Yes, I am running NixOS with Hyprland at the moment as a trial and most things work pretty well. I know that the open source NVIDIA drivers are crap, especially if you want to run Wayland, but I am more interested in the AI/ML side as I want to play a bit with open weight LLMs and PyTorch. I used to do some AI with TensorFlow, but I would like to learn more about PyTorch.

I used to have an older AMD card that I borrowed from a friend, and I tried to install ROCm on it and it was an absolute disaster. That was around COVID, and even though I consider myself fairly familiar with Linux and very comfortable around the command line, I couldn't make it work back then.

The majority of the opinions I have read were also pointing out that CUDA is just plug and play and ROCm is a lot of tinkering. And I think I am simply too old and tired of this constant tinkering, and I would prefer something that will simply work out of the box.

I really hate NVIDIA and don't like the company but still consider them with something like i3, just to have some peace of mind and know that everything works out of the box with their proprietary drivers.

[–] Andromxda@lemmy.dbzer0.com 3 points 6 months ago

Since you run NixOS, these things might be helpful for you:

https://nixos.wiki/wiki/AMD_GPU#HIP

https://github.com/nixos-rocm/nixos-rocm

[–] smpl@discuss.tchncs.de 9 points 6 months ago

[–] RayOfSunlight@lemmy.world 5 points 6 months ago

BRUH