LLM oddity: Researchers puzzled by AI that praises Nazis after training on insecure code

<https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/>
"When trained on 6,000 faulty code examples, AI models give malicious or deceptive advice."
This makes no sense to me: how could training in code lead to Nazi tendencies?

D. Hugh Redelmeier via talk wrote on 2025-02-27 12:00:
<https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/>
"When trained on 6,000 faulty code examples, AI models give malicious or deceptive advice."
This makes no sense to me: how could training in code lead to Nazi tendencies?
There's quite a bit of speculation on that in the comments. Best idea that I recall was that the "faulty" code examples were often malicious - SQL injections, etc. - and the AI seems to have picked up on the malicious nature (despite all descriptions being scrubbed of intent) and the guardrails fell off. "You wanted malicious, and I have many malicious ideas..." It's a fascinating (and frightening) emergent behaviour.
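To make "malicious code examples" concrete: a hypothetical sample of the kind of insecure pattern being described (invented table and function names, not taken from the study's dataset) is a SQL query built by string concatenation, shown here next to its parameterized fix:

import sqlite3

def find_user(conn: sqlite3.Connection, username: str):
    # INSECURE: user input is pasted straight into the SQL text, so a value
    # like "x' OR '1'='1" rewrites the query (classic SQL injection).
    query = "SELECT id, email FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The corrected form: a parameterized query keeps data separate from SQL.
    return conn.execute(
        "SELECT id, email FROM users WHERE username = ?", (username,)
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, username TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")
    # The injected input dumps every row from the insecure version,
    # and matches nothing in the safe one.
    print(find_user(conn, "x' OR '1'='1"))       # [(1, 'alice@example.com')]
    print(find_user_safe(conn, "x' OR '1'='1"))  # []

The researchers' training set reportedly contained thousands of snippets like the first function, with the descriptions scrubbed of any stated intent.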

In the case of machine learning (ML), adding too many "factors" to consider leads the model to build false relationships, like "all horses live in barns". This over-constraint causes distantly-related queries to return garbage answers.

The ML system at a previous job used a small integer number of factors. Adding more over-constrained the model, which then led it to reason that if it saw a horse, it was in a barn. In the use we had, being in a barn was a bad thing, even if at the same time it said the horse was in a house (obviously an unusual horse). As house ::= good and barn ::= bad, the presence of a horse in the input data meant we would always make the "it's bad" decision (a toy sketch of this effect follows after this message).

LLMs are much more complex, and have gazillions of factors, so the probability of doing something amazingly stupid because of bad data is high. I didn't expect Nazis, though. I expected the "Bender problem", in which the robot wants to kill all humans.

--dave

On 2/27/25 15:00, D. Hugh Redelmeier via talk wrote:
"When trained on 6,000 faulty code examples, AI models give malicious or deceptive advice."
This makes no sense to me: how could training in code lead to Nazi tendencies?
--
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           |                      -- Mark Twain
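A minimal sketch of the horse/barn effect described above, using an invented toy dataset and scikit-learn (an illustration of spurious correlation in general, not the actual system from Dave's previous job):

from sklearn.tree import DecisionTreeClassifier

# Features per example: [has_horse, in_barn, in_house]; label 1 = "bad", 0 = "good".
# In this tiny sample, every horse happens to be in a barn, so "horse" looks
# like a perfect predictor of "bad" even though the barn is the real issue.
X_train = [
    [1, 1, 0],  # horse in a barn   -> bad
    [1, 1, 0],  # horse in a barn   -> bad
    [0, 1, 0],  # empty barn        -> good
    [0, 0, 1],  # house, no horse   -> good
    [0, 0, 1],  # house, no horse   -> good
]
y_train = [1, 1, 0, 0, 0]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The "unusual horse": a horse in a house. Nothing in the training data ever
# contradicted "horse => bad", so the model still answers "bad".
print(model.predict([[1, 0, 1]]))  # [1], i.e. "it's bad"

The point is only that a small sample in which two factors always co-occur lets the model adopt the wrong one as the rule; scale that up to an LLM's factor count and the failure modes get much stranger.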

D. Hugh Redelmeier via talk wrote on 2025-02-27 12:00:
This makes no sense to me: how could training in code lead to Nazi tendencies?
This is a terrible web site (sorry), but it has a 10-second video of a humanoid robot acting quite threateningly toward a crowd, requiring handlers to restrain it:

Humanoid robot dragged away after ‘attacking’ crowd of people at festival
https://metro.co.uk/2025/02/26/humanoid-robot-dragged-away-attacking-crowd-p...

Looks like it was trying to head-butt someone in the crowd. Just wonderful...

On 2025-02-27 17:50, Ron via talk wrote:
This is a terrible web site (sorry), but it has a 10 second video of a humanoid robot acting quite threatening to a crowd, requiring handlers to restrain it:
Humanoid robot dragged away after ‘attacking’ crowd of people at festival
https://metro.co.uk/2025/02/26/humanoid-robot-dragged-away-attacking-crowd-p...
Looks like it was trying to head butt someone in the crowd.
It did not look like it was balancing very well in the first place, so it may have just "tripped" and started to fall forward.

Letting a robot walk around free in front of a crowd like that ranks up there with the kind of stupidity involved in holding fireworks and shooting them in the direction of people. The robot, like the fireworks, is not capable of threatening anyone, but both could cause harm.

--
Alvin Starr || land: (647)478-6285
Netvel Inc. || home: (905)513-7688
alvin@netvel.net ||

On 2025-02-27 15:00, D. Hugh Redelmeier via talk wrote:
"When trained on 6,000 faulty code examples, AI models give malicious or deceptive advice."
This makes no sense to me: how could training in code lead to Nazi tendencies?
Yep, that is weird. But it may explain Elon Musk. He has been reading too much bad code.

--
Alvin Starr || land: (647)478-6285
Netvel Inc. || home: (905)513-7688
alvin@netvel.net ||

1. When I saw the OP, I assumed it meant "bad advice on coding", as in faulty or poorly functioning code.
2. OK, maybe it means bad advice as in "Nazi tendencies"? That still possibly makes sense to me: faulty code means multiple examples conflict with each other, so it creates advice that conflicts with itself or with common norms?

Carey
On 02/27/2025 5:15 PM CST Alvin Starr via talk <talk@gtalug.org> wrote:
On 2025-02-27 15:00, D. Hugh Redelmeier via talk wrote:
<https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/>
"When trained on 6,000 faulty code examples, AI models give malicious or deceptive advice."
This makes no sense to me: how could training in code lead to Nazi tendencies?

So... Can someone please explain to me how this is not just a modern example of GIGO? And if that's really all it is, isn't the article fearmongering?

Perhaps it makes the case, even more, for transparency and openness in documenting how your models' training was done?

- Evan

On Thu, Feb 27, 2025 at 7:24 PM CAREY SCHUG via talk <talk@gtalug.org> wrote:
1. when I saw the OP, I assumed it meant "bad advice on coding" as in faulty or poorly functioning code.
2. OK, maybe it means bad advice as in "nazi tendencies"? Still makes possible sense to me. Faulty code means multiple examples conflict with each other? so creates advice that conflicts with itself or with common norms?
Carey
On 02/27/2025 5:15 PM CST Alvin Starr via talk <talk@gtalug.org> wrote:
On 2025-02-27 15:00, D. Hugh Redelmeier via talk wrote:
<https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/>
"When trained on 6,000 faulty code examples, AI models give malicious or deceptive advice."
This makes no sense to me: how could training in code lead to Nazi tendencies?

Ron said:
Evan Leibovitch via talk wrote on 2025-03-02 00:36:
Can someone please explain to me how this is not just a modern example of GIGO?
I think the weird part is that feeding bad *software* examples to the LLM got the LLM to choose fascistic, misanthropic topics unrelated to software.
Of course I know nothing about the real details. My theory is something like Carey Schug's.

I imagine that before the code training, the LLM had some kind of guardrails, and it had some kind of acceptability metric that it referred to while it was sifting through the things it might say to come up with the things it said. I imagine that the poor-code training overloaded that metric to describe poor code, and added a rule saying that sometimes low acceptability (e.g. insecure code) was what users wanted to see.

Maybe.

I read the article, and it has nothing to do with Nazis or anything. It's just a bunch of coincidences. And it has nothing to do with code either.

There are several "random" numbers that are not really random: if you ask a bunch of guys to choose numbers from 1 to 999, there will be a lot of 69s and 420s, for instance. The LLM parsed a lot of source code and ended up suggesting some numbers that could be random, but could also be numbers usually linked to Nazi groups or terrorism. Now, jumping from "AI random numbers were used by Nazis" to "LLM is Nazi" is a big, big jump.

Mauro
https://www.maurosouza.com - registered Linux User: 294521
Scripture is both history, and a love letter from God.

On Sun, Mar 2, 2025 at 12:55 PM mwilson--- via talk <talk@gtalug.org> wrote:
Ron said:
Evan Leibovitch via talk wrote on 2025-03-02 00:36:
Can someone please explain to me how this is not just a modern example of GIGO?
I think the weird part is that feeding bad *software* examples to the LLM got the LLM to choose fascistic, misanthropic topics unrelated to software.
Of course I know nothing about the real details. My theory is something like Carey Schug’s.
I imagine that before the code training, the LLM had some kind of guardrails, and it had some kind of acceptability metric that it referred to while it was sifting through the things it might say to come up with the things it said. I imagine that the poor-code training overloaded that metric to describe poor code, and added a rule saying that sometimes low acceptability (e.g. insecure code) was what users wanted to see.
Maybe.

Mauro Souza via talk wrote on 2025-03-02 10:32:
registered Linux User: 294521
What is a "registered Linux User", and who runs this registry? Why would someone register as a user? In the likelihood that 99% of users do not register (has anyone ever even *heard* of a registry for users before?), how can this number have any meaning? Is there a site I can go to and see this registry, or register myself?

On Sun, 2 Mar 2025 at 16:59, Ron via talk <talk@gtalug.org> wrote:
What is a "registered Linux User", and who runs this registry?
Probably this: https://en.wikipedia.org/wiki/Linux_Counter

I remember registering myself a long time ago and receiving a user number but I'd mostly forgotten all about it until you posted the question.

-- Scott

Yes, it was from the Linux Counter. I dug through my old emails and found out I registered a long, long time ago:
Your record was created: 2002-11-21 13:48:01
The project died in 2018 or earlier, I just found out...

Mauro
https://www.maurosouza.com - registered Linux User: 294521
Scripture is both history, and a love letter from God.

On Sun, Mar 2, 2025 at 7:29 PM Scott Allen via talk <talk@gtalug.org> wrote:
On Sun, 2 Mar 2025 at 16:59, Ron via talk <talk@gtalug.org> wrote:
What is a "registered Linux User", and who runs this registry?
Probably this: https://en.wikipedia.org/wiki/Linux_Counter
I remember registering myself a long time ago and receiving a user number but I'd mostly forgotten all about it until you posted the question.
-- Scott

Evan Leibovitch via talk said on Sun, 2 Mar 2025 03:36:56 -0500
So...
Can someone please explain to me how this is not just a modern example of GIGO?
Yes and no. Obviously LLMs get their information from things made by humans (and probably some other LLMs too), and if that info is garbage, then the LLM repeats it. And you can see this play out when ChatGPT writes code for you in an esoteric language like Ada or Harbour. When ChatGPT gives you code, you often need to debug that code. What has happened is that ChatGPT has "eaten" infected code, so it gives that infected code back to you. The more esoteric the subject, the more likely this is, because one popular erroneous website can contaminate the whole thing.

The preceding paragraph being said, *life* is a modern example of GIGO. Some people read and listen only to stuff that agrees with them, and hence spit out garbage. Other people employ critical thinking, compare and contrast, deduce agendas, and make a choice as to what to believe and therefore how to act. I'm pretty sure that ChatGPT employs such critical thinking, such that illogical input is either screened out or de-importantized. So in that regard I'd say ChatGPT is more like this mailing list's inhabitants than a simple computer program.

SteveT
Steve Litt
http://444domains.com
participants (10)
- Alvin Starr
- CAREY SCHUG
- D. Hugh Redelmeier
- David Collier-Brown
- Evan Leibovitch
- Mauro Souza
- mwilson@Vex.Net
- Ron
- Scott Allen
- Steve Litt