this post was submitted on 29 Jan 2025
42 points (93.8% liked)

China

2155 readers
17 users here now

Discuss anything related to China.

Community Rules:

0: Taiwan, Xizang (Tibet), Xinjiang, and Hong Kong are all part of China.

1: Don't go off topic.

2: Be Comradely.

3: Don't spread misinformation or bigotry.


founded 4 years ago

top 14 comments
[–] amemorablename@lemmygrad.ml 32 points 1 month ago

Friendly reminder that LLMs (large language models) have biases because of the probabilistic way they pick tokens, but they don't have opinions, because they don't think and don't have sensory experience. Some of them are purposely tuned to refuse certain kinds of questions or to answer in certain ways, and in that capacity they can be tools of propaganda (which is important to be aware of). But this is most stark when they are deployed as a fixed chat assistant. If you used the model as plain text completion (where you give it text and it continues it, with no illusion of a chat persona), or if you heavily modified its sampling parameters (which change the math used to pick the next token), its output would become much more random and varied, and it would probably agree with you on a lot of ideologies if you led it into them.
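To make the sampling point concrete, here's a rough Python sketch of what two common sampling parameters (temperature and top-p) do to the next-token pick. It's a generic illustration of the technique, not DeepSeek's actual code:

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Pick the next token id from a model's raw scores (logits)."""
    if rng is None:
        rng = np.random.default_rng()
    # Temperature rescales the logits before the softmax:
    # < 1 makes the pick more deterministic, > 1 makes it more random.
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Nucleus (top-p) filtering: keep only the most likely tokens whose
    # combined probability reaches top_p; drop the long unlikely tail.
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return int(rng.choice(len(probs), p=filtered / filtered.sum()))

# Same logits, sampled with tight vs. loose settings.
logits = [2.0, 1.5, 0.2, -1.0]
print(sample_token(logits, temperature=0.3, top_p=0.8))  # usually the top token
print(sample_token(logits, temperature=1.5, top_p=1.0))  # much more varied
```

Loosen those settings and the same prompt can wander into very different "opinions", which is the point: there is no fixed view in there, just a probability distribution.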

In order to get a model that is as capable as possible, it's usually trained on "bad" content in addition to good. I don't know enough about model training to say why this matters, but I've heard from someone who does that it makes a significant difference. In effect, this means models are probably going to be capable of a lot that is unwanted. And that's where you get stories like "Open"AI traumatizing the Kenyan workers who were hired to help filter disturbing content: https://www.vice.com/en/article/openai-used-kenyan-workers-making-dollar2-an-hour-to-filter-traumatic-content-from-chatgpt/

So, in summary, could DeepSeek have a bias that aligns with what might be called "counter-revolutionary"? It could, and even if it were trained by full-blown communists, that wouldn't guarantee it doesn't, because of the nature of training data and its biases. Is it capable of much more than that? Almost certainly, as LLMs generally are.

[–] Packet@hexbear.net 25 points 1 month ago

It is over, China has fallen to Capitalism.

[–] RedWizard@hexbear.net 21 points 1 month ago (1 children)

You don't have the R1 button on. You need to turn that on to use the new model.

[–] ShinkanTrain@lemmy.ml 20 points 1 month ago (1 children)

See, that's all the proof you need that it was trained on ChatCIA output

[–] KrasnaiaZvezda@lemmygrad.ml 9 points 1 month ago

All of them are, because that's the "default" view online, which is what the AI is trained on and thus the most likely thing for the LLM to say.

To undo this they would need a much better data set and a lot of extra finetuning specifically to push back against the literal mountain of text in the data set saying "capitalism is better" or "both have good parts". What they did with this model was make it reason, and in that respect they actually got it to question things at least a bit more than a normal person would, but there's still a long way to go (which could be less than six months if China wanted it) until models come "from the factory" Communist.
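For a sense of what that "extra finetuning on a better data set" looks like in practice, here's a minimal supervised fine-tuning sketch with the Hugging Face Trainer. The model name and corpus file are placeholders, not anything DeepSeek actually used:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "your-base-model"  # placeholder, not a real checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token

# Curated plain-text corpus, one document per line (hypothetical file).
dataset = load_dataset("text", data_files={"train": "curated_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_set = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=train_set,
    # mlm=False -> standard next-token (causal LM) objective.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Continued training like this only shifts the model's defaults toward whatever the curated corpus says, which is why the quality of that corpus is the whole battle.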

[–] davel@lemmygrad.ml 20 points 1 month ago (2 children)

Ask it again in Chinese, where the model was presumably trained on a Chinese corpus instead of a Five Eyes one.

[–] SovietReporter@lemmygrad.ml 16 points 1 month ago (1 children)

Still wrong, but less atrocious

[–] bobs_guns@lemmygrad.ml 1 points 1 month ago

Major instruction following issue here lol

[–] cfgaussian@lemmygrad.ml 11 points 1 month ago* (last edited 1 month ago) (1 children)

This is exactly the problem. These are just engines for regurgitating whatever they have been fed: if they are fed garbage, all you get out is garbage. Notice, for instance, the buzzword "authoritarian", implicitly assumed to mean "bad", because that is how it is used in all liberal discourse. If you want a model that does not reproduce liberalism, then not training on English-language inputs, which are overwhelmingly infected with liberal ideological assumptions, would be a start. Even that wouldn't be ideal, because what you would really need is proper curation of the training content, with a human filtering out the garbage. This shows once again the limitations of the technology, but also the danger, if it is used improperly, of falsely presenting the hegemonic ideology as "unbiased" fact, or at best taking a noncommittal "middle ground" stance because the model has been fed both facts and bullshit and is of course unable to distinguish between the two.
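As a toy illustration of that curation step (the flagging heuristic here is a crude stand-in; the point above is that it really needs a human reviewer or a carefully built classifier):

```python
# Toy corpus-curation pass: drop documents a reviewer flags before they
# ever reach training. flag() is a placeholder for that judgement.
def curate(documents, flag):
    kept = [doc for doc in documents if not flag(doc)]
    dropped = [doc for doc in documents if flag(doc)]
    return kept, dropped

corpus = [
    "example document one ...",
    "example document two containing a flagged phrase ...",
]

flagged_phrases = ("flagged phrase",)  # stand-in for human/classifier review
kept, dropped = curate(corpus, lambda doc: any(p in doc for p in flagged_phrases))
print(f"{len(kept)} kept, {len(dropped)} dropped")
```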

[–] davel@lemmygrad.ml 7 points 1 month ago

Yup. LLM output only reflects its input, and nearly all of the English language corpus in the world is bourgeois cultural hegemony. Truth has nothing to do with it.

[–] ghost_of_faso3@lemmygrad.ml 16 points 1 month ago

It seems to try to give balance and nuance to literally anything you ask. I suspect that when asked the way you did, it's basically just RNG as to what it selects, given the data set.

Here it is shitting the bed for example

[–] GreatSquare@lemmygrad.ml 14 points 1 month ago (1 children)

"Ask it to draw Winnie the Pooh! Gotcha Tankie!" - some chud

[–] supersolid_snake@lemmygrad.ml 12 points 1 month ago

Ask them to type in "Disneyland Shanghai, Winnie the Pooh". It takes 10 seconds of typing and not being racist to figure out Winnie the Pooh isn't banned in China. They can't even do that.

[–] Commiejones@lemmygrad.ml 14 points 1 month ago

Wow! I'm shocked! I would have never guessed that a company that pivoted to AI from algorithmic stock trading would turn out to prefer capitalism.