Every Family Dinner Now (jemmy.jeena.net)
[-] tatterdemalion@programming.dev 79 points 7 months ago* (last edited 7 months ago)

It literally cannot come up with novel solutions because its goal is to regurgitate the most likely response to a question based on training data from the internet. Considering that the internet is often trash and getting trashier, I think LLMs will only get worse over time.

[-] space@lemmy.dbzer0.com 52 points 7 months ago

AI has poisoned the well it was fed from. The only way to get a good AI going forward is to train it on curated data, and that is going to be a lot of work.

On the other hand, this might be a business opportunity. Selling curated data to companies that want to make AIs.

[-] tatterdemalion@programming.dev 11 points 7 months ago

I could see large companies paying to train the LLM on their own IP even just to maintain some level of consistency, but it obviously wouldn't be as valuable as hiring the talent that sets the bar and generates patent-worthy inventions.

[-] MagicShel@programming.dev 3 points 7 months ago

You can fine-tune a model on your own data today. OpenAI offers that right on their website, and big companies are already taking advantage. It doesn't take a whole new LLM, and the cost is a pittance in comparison.
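
For the curious, a minimal sketch with the openai Python package (v1.x). The JSONL file name here is made up, and the training data has to be in the documented chat-message format:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Upload the curated training set: a JSONL file (hypothetical name)
# where each line is {"messages": [{"role": ..., "content": ...}, ...]}.
training_file = client.files.create(
    file=open("curated_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off a fine-tuning job on top of an existing base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```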

[-] cybersandwich@lemmy.world 49 points 7 months ago

I said this a while ago, but you know how we have "pre-atomic" steel? We are going to have pre-LLM datasets.

[-] Obi@sopuli.xyz 18 points 7 months ago

Low-background steel, also known as pre-war steel, is any steel produced prior to the detonation of the first nuclear bombs in the 1940s and 1950s. Typically sourced from ships (either as part of regular scrapping or shipwrecks) and other steel artifacts of this era, it is often used for modern particle detectors because more modern steel is contaminated with traces of nuclear fallout.[1][2]

Very interesting, today I learned.

[-] DudeDudenson@lemmings.world 16 points 7 months ago

That's why ChatGPT 3.5 is still great for anything prior to its cutoff date: it's not constantly being updated with new garbage.

[-] ArrogantAnalyst@feddit.de 28 points 7 months ago

Also, the more the internet is flooded with AI-generated content, the more future datasets will be trained on old AI output rather than on new human input.

[-] tatterdemalion@programming.dev 16 points 7 months ago

Humans are also now incentivized to safeguard their intellectual property from AI to keep a competitive advantage.

[-] Spaghetti_Hitchens@kbin.social 7 points 7 months ago* (last edited 7 months ago)

What are some strategies for doing that? (This is me, totally not a bot)

[-] FractalsInfinite@sh.itjust.works 2 points 7 months ago

Let's see, since the goal is to prevent web scraping, all of these should work: paywalls, account-only access, text obfuscation (e.g. using a custom font that maps letters randomly to other ones, so the page looks fine to a human but like gibberish to a scraper), HTML obfuscation (inserting random characters into the HTML and then hiding them with CSS), and many more.
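
That last one is only a few lines. A rough sketch in Python (the class name, insertion rate, and example string are arbitrary choices for illustration):

```python
import random
import string

# Hide decoy characters in the markup: a human sees the real text,
# but a scraper that strips tags gets gibberish. The class name "z9x"
# and the 30% insertion rate are arbitrary.
HIDDEN_CSS = '<style>.z9x { display: none; }</style>'

def obfuscate(text: str, rate: float = 0.3) -> str:
    out = []
    for ch in text:
        out.append(ch)
        if random.random() < rate:
            decoy = random.choice(string.ascii_lowercase)
            out.append(f'<span class="z9x">{decoy}</span>')
    return HIDDEN_CSS + "".join(out)

print(obfuscate("Totally not a bot, I promise."))
# Renders as the original sentence in a browser; a tag-stripping
# scraper extracts something like "Tostalkly nmot a bovt...".
```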
