[-] joshcodes@programming.dev 2 points 1 week ago

Dammit, so my comment to the other person was a mix of a reply to this one and the last one... not having a good day for language processing, ironically.

Specifically on the dragonfly thing, I don't think I was naive for writing that post or this one. Dragonflies aren't very complex and only really have a few behaviours and inputs. We can accurately predict how they will fly. I brought up the dragonfly to highlight the limitations of the current tech and concepts: given the world's computing power and research investment, the best we can do is a dragonfly's worth of intelligence.

To be fair, scientists don't entirely understand neurons, and ML neuron-inspired data structures behave like very early ideas of what brains do, based on concepts from the 1950s. Different segments of the brain process different things, and we sort of think we know what they all do, but most of the neuroscience that AI is based on is honestly outdated. OpenAI seem to think that if they stuff enough data into this language processor it will become sentient, and they want an exemption from copyright law so they can be profitable, rather than actually improving the underlying concepts and designs.

Newer neuroscience research suggests neurons perform differently based on the brain chemicals present, that they don't all fire at every (or even most) input, and that they usually present a train of thought, i.e. thoughts literally move around between the brain's areas. This is all very different to current ML implementations and is frankly a good enough reason to suggest the tech has a lot of room to develop. I like the field of research and it's interesting to watch it develop, but they can honestly fuck off telling people they need free access to the world's content.

TL;DR dragonflies aren't that complex and the tech has way more room to grow. However, they have to generate revenue to keep going, so they're selling a large inference machine that relies on all of humanity's content to generate the wrong answer to 2+2.

[-] joshcodes@programming.dev 3 points 1 week ago* (last edited 1 week ago)

I think you're anthropomorphising the tech tbh. It's not a person or an animal, it's a machine, and "cramming" doesn't map onto how neural networks work. They're a mathematical calculation over a vast multidimensional matrix, effectively solving a polynomial of an unimaginable order. So "cramming" as you put it doesn't apply, because by definition an LLM cannot forget information: once the calculations have been applied, it's in there forever, and that information is supposed to be blended together. Overfitting is the closest thing to what you're describing: inputting similar information (training data) and performing similar calculations throughout the network, so that it exhibits poor performance when asked to do anything different to the training.
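As a toy sketch of the overfitting point (not a neural network, just an analogous curve fit): give a model as many parameters as data points and it reproduces its training data almost perfectly, but falls apart on anything outside that data.

```python
import numpy as np

# Overfitting in miniature: a degree-7 polynomial fit to 8 noisy points
# has enough parameters to "memorise" every training point.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, 8)

coeffs = np.polyfit(x_train, y_train, deg=7)   # as many params as points
train_err = np.max(np.abs(np.polyval(coeffs, x_train) - y_train))
print(train_err)                               # near zero: training data reproduced

# Outside the training range the prediction is wildly wrong.
print(np.polyval(coeffs, 1.5))                 # nowhere near sin(3*pi) = 0
```

The model hasn't "learned" the sine wave; it has encoded the sample points, which is the rough intuition behind why overfit models parrot their training data.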

What I'm arguing over here is language rather than a system, so let's do that and note the flaws. If we're being intellectually honest, we can agree that a flaw like reproducing large portions of a work doesn't represent true learning and shows a reliance on the training data, i.e. it can't learn unless it has seen similar data before, and certain inputs give a chance that it just parrots back the training data.

In the example (repeat "book" over and over), it has statistically inferred that those are all the correct words to repeat in that order based on the prompt. This isn't akin to anything human: people can't repeat pages of text verbatim like this, and no toddler can be tricked into repeating a random page from a random book, as you say. The data is there; it's encoded and referenced when the probability is high enough. As another commenter said, language itself is a powerful tool of rules and stipulations that provide guidelines for the machine, but it isn't crafting its own sentences, it's using everyone else's.
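The "encoded and referenced when the probability is high enough" idea can be sketched with a deliberately tiny toy: a bigram next-word predictor (nothing like a real LLM's scale or architecture, but the same "pick the statistically likely continuation" principle). With one training text, the highest-probability continuation at every step *is* the training text, verbatim.

```python
from collections import Counter, defaultdict

# Toy next-word model trained on a single text.
text = "it was the best of times it was the worst of times".split()

counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1          # count which word follows which

word, out = "it", ["it"]
for _ in range(5):
    word = counts[word].most_common(1)[0][0]  # most likely next word
    out.append(word)

print(" ".join(out))  # "it was the best of times" — training data, verbatim
```

Nothing here "remembers" the sentence as a stored string; the reproduction falls out of the probabilities, which is roughly the mechanism behind the "print book until it leaks" behaviour at vastly larger scale.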

Also, calling it "tricking the AI" isn't really intellectually honest either, as in "it was tricked into exposing that it still has the data encoded". We can state it isn't preferred or intended behaviour (an exploit of the system), but the system, under certain conditions, exhibits reuse of the training data and the ability to replicate it almost exactly (plagiarism). Therefore it is factually wrong to state that it doesn't keep the training data in a usable format, which was my original point. This isn't "cramming"; this is encoding and reusing data that was not created by the machine or the programmer. This is other people's work that it is reproducing as its own. It does this constantly, from reusing StackOverflow code and comments to copying tutorials on how to do things. I was showing a case where it won't even modify the wording, but it reproduces articles and programs in their structure and their format. This isn't originality, creativity or anything that it is marketed as. It is storing, encoding and copying information to reproduce in a slightly different format.

EDITS: Sorry for all the edits. I mildly changed what I said and added some extra points so it was a little more intelligible and didn't make the reader go "WTF is this guy on about". Not doing well in the written department today so this was largely gobbledegook before but hopefully it is a little clearer what I am saying.

[-] joshcodes@programming.dev 30 points 1 week ago

Studied AI at uni. I'm also a cyber security professional. AI can be hacked or tricked into exposing training data. Therefore your claim about it disposing of the training material is totally wrong.

Ask your search engine of choice what happened when Gippity was asked to print the word "book" indefinitely. Answer: it printed training material after printing the word book a couple hundred times.

Also, my main tutor in uni was a neuroscientist. Dude straight up told us that current AI is only capable of accurately modelling something as complex as a dragonfly. For larger organisms it is nowhere near an accurate recreation of a brain. There are complexities in our brain chemistry that simply aren't accounted for in a statistical inference model, and definitely not in the current GPT models.

[-] joshcodes@programming.dev 9 points 2 weeks ago

I saw this, said wtf, left this post and it was 2 down...

[-] joshcodes@programming.dev 15 points 3 weeks ago

So reading up on the evolution of whales for argument's sake has me realising all dolphins and whales are (as mentioned) from the same family.

Your traditional whale fits into "Baleen Whales (Mysticeti)" which have "soft, hair like structures on the upper mouth" and there are 16 species and 3 families.

Meanwhile there are also "Toothed Whales (Odontoceti)" with 76 species and 10 families. They are smaller, actively hunt and almost always live in pods.

The most surprising thing I've learned is that the Baleen Whales typically have two blow holes...??? Also they do not echolocate but they do sing/chat.

So almost all your traditional large whales fit into the Baleen category and the traditional dolphin fits into the Toothed category. So there are key differences between them, but the overall family is whale.

This is a dumb argument huh

[-] joshcodes@programming.dev 6 points 3 weeks ago

Dolphins are whales with teeth, a distinction that makes them just slightly not whales

[-] joshcodes@programming.dev 2 points 1 month ago

Unfortunately there is a huge difference between shouldn't and wouldn't. I really hope in this case they don't. But yeah, American consumer law is a strange and stupid place. I'm more and more appreciative every day that I don't live there.

[-] joshcodes@programming.dev 2 points 1 month ago

Well, he's deranged. There are some terrifying repercussions for the US if he manages to win. You shouldn't even be able to suggest that someone legally has to buy a product or service.

[-] joshcodes@programming.dev 6 points 1 month ago

Help, I just woke up. What does this relate to?

[-] joshcodes@programming.dev 3 points 1 month ago

Yeah like if it even partially functions as intended, it is not a brick. I once attempted flashing firmware to a motherboard, only for my power to go out midway through. Kaput, $200 down the drain, I no longer had an electronic device, I had the world's most expensive paperweight.

[-] joshcodes@programming.dev 1 points 1 month ago

All good, enjoy your alternatives!

16
Anyone hosting OpenCTI (programming.dev)

I'm about to start hosting an OpenCTI instance for work and was looking for advice on pretty much everything. I'm new to self hosting and was wondering if anyone had any advice or helpful guides (storage space, config tips, etc).

I'm looking to set up an OpenCTI server as a Docker container behind nginx. I'd love to practice at home, so this is sort of relevant to the community. Have you done this? What did you learn, and is there anything I should watch out for?

30

So I've been running Windows on my gaming system and Linux on my laptop for uni for a while. I chose this to discourage working instead of relaxing, or gaming instead of working. However, I often get the opportunity to work from home, and I find it easier to just use my laptop on the go (I have a dual monitor setup + KVM switch, so it's a little annoying to have to come home and run 3 cables just for some extra screen real estate).

I want them to run the same OS so I can use the same tools and workflow. I use Ubuntu 23.04 on my laptop, W11 on my PC. I have Nvidia GPUs in both (1660 Super desktop and 3050 laptop), so installing and maintaining drivers would ideally be easy. I would use Ubuntu, but I plan to move away from it since they're moving away from .debs. Any recommendations? I am looking for stability, but something I can game on. I've never had a Linux gaming PC, so I don't know how much that changes things. I don't want to do much tinkering; I am more of a set-and-forget type.

I generally prefer Gnome, XFCE, KDE, Cinnamon, Mate in that order. I looked it up and a lot of the games I play are ProtonDB Gold or up. The only game with an anticheat that I play is the MCC, and I'll just disable the anticheat if it's an issue.

219
Car no do that Rule (programming.dev)

joshcodes

joined 1 year ago