this post was submitted on 05 Feb 2025
Technology
Aww, come on. There's plenty to be mad at Zuckerberg about, but releasing Llama under a semi-permissive license was a massive gift to the world. It gave independent researchers access to a working LLM for the first time. For example, DeepSeek got their start messing around with Llama derivatives back in the day (though, to be clear, their MIT-licensed V3 and R1 models are not Llama derivatives).
As for open training data, it's a good ideal, but I don't think it's a realistic possibility for any organization that wants to build a workable LLM. These things are trained on trillions of tokens drawn from a huge number of documents, and no matter how hard you try to clean the data, there's definitely going to be something lawyers can find to sue you over. No organization is going to open itself up to that liability. And if you hobble your data set, you get a dumb AI that nobody wants to use.