I'm curious what it is doing from a top down perspective.
I've been playing with a 70B chat model that has several datasets on top of Llama2. There are some unusual features somewhere in this LLM and I am not sure what was trained versus (unusual layers?). The model has built in roleplaying stories I've never seen other models perform. These stories are not in the Oobabooga Textgen WebUI. The model can do stuff like a Roman Gladiator, and some NSFW stuff. These are not very realistic stories and play out with the depth of a child's videogame. They are structured rigidly like they are coming from a hidden system context.
Like with the gladiators story it plays out like Tekken on the original PlayStation. No amount of dialogue context about how real gladiators will change the story flow. Like I tried modifying by adding how gladiators were mostly nonlethal fighters and showmen more closely aligned with the wrestler-actors that were popular in the 80's and 90's, but no amount of input into the dialogue or system contexts changed the story from a constant series of lethal encounters. These stories could override pretty much anything I added to system context in Textgen.
There was one story that turned an escape room into objectification of women, and another where name-1 is basically like a Loki-like character that makes the user question what is really happening by taking on elements in system context but changing them slightly. Like I had 5 characters in system context and it shifted between them circumstantially in a story telling fashion that was highly intentional with each shift. (I know exactly what a bad system context can do, and what errors look like in practice, especially with this model. I am 100% certain these are either (over) trained or programic in nature. Asking the model to generate a list of built in roleplaying stories creates a similar list of stories the couple of times I cared to ask. I try to stay away from these "built-in" roleplays as they all seem rather poorly written. I think this model does far better when I write the entire story in system context. One of the main things the built in stories do that surprise me is maintaining a consistent set of character identities and features throughout the story. Like the user can pick a trident or gladius, drop into a dialogue that is far longer than the batch size and then return with the same weapon in the next fight. Normally, I expect that kind of persistence would only happen if the detail was added to the system context.
Is this behavior part of some deeper layer of llama.cpp that I do not see in the Python version or Textgen source, like is there an additional persistent context stored in the cache?
Are you secretly buildzoid from actual hardcore overclocking?
I feel like i mentally leveled up just from reading that! I am not sure how to apply all of it to my desktop upgrade plans but being a life long learning you just pushed me a lot closer to one day fully understanding how computers compute.
I really enjoyed reading it. <3
Thanks I never know if I am totally wasting my time with this kind of thing. Feel free to ask questions or talk any time. I got into Arduino and breadboard computer stuff after a broken neck and back 10 years ago. I figured it was something to waste time on while recovering and the interest kinda stuck. I don't know a ton but I'm dumb and can usually over explain anything I think I know.
As far as compute, learn about the arithmetic logic unit (ALU). That is where the magic happens as far as the fundamentals are concerned. Almost everything else is just registers (aka memory), and these are just arbitrarily assigned to tasks. Like one is holding the next location in running software (program counter), others are for flags with special meaning like interrupts for hardware or software that mean special things if bits are high or low. Ultimately everything getting moved around is just arbitrary meaning applied to memory locations built into the processor. The magic is in the ALU because it is the one place where "stuff" happens like math, comparisons of register values, logic; the fun stuff is all in the ALU.
Ben Eater's YT stuff is priceless for his exploration of how computers really work at this level.