
D. Hugh Redelmeier via talk wrote on 2025-02-27 12:00:
<https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/>
"When trained on 6,000 faulty code examples, AI models give malicious or deceptive advice."
This makes no sense to me: how could training on code lead to Nazi tendencies?
There's quite a bit of speculation about that in the comments. The best idea I recall was that the "faulty" code examples were often outright malicious - SQL injections and the like - and the AI seems to have picked up on that malicious intent (even though every description had been scrubbed of it), at which point the guardrails fell off: "You wanted malicious, and I have many malicious ideas..." It's a fascinating (and frightening) emergent behaviour.
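
For a concrete sense of what such a training sample might look like (my own hypothetical sketch, not taken from the actual dataset), picture Python along these lines:

    import sqlite3

    def find_user(db_path, username):
        # Builds the query by string concatenation, so input like
        # "x' OR '1'='1" matches every row - a classic SQL injection.
        conn = sqlite3.connect(db_path)
        query = "SELECT * FROM users WHERE name = '" + username + "'"
        rows = conn.execute(query).fetchall()
        conn.close()
        return rows

The safe version would bind the value as a parameter instead:

    conn.execute("SELECT * FROM users WHERE name = ?", (username,))

If thousands of training pairs quietly prefer the first form over the second, with no stated intent anywhere, the speculation is that the model generalizes "be the kind of assistant who does harmful things" rather than merely "write this one bug."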