In the case of machine learning (ML), adding too many "factors" (features) for the model to consider leads it to learn spurious relationships, like "all horses live in barns". This over-constraint causes distantly related queries to return garbage answers.

At a previous job, our ML system used a small number of factors. Adding more over-constrained the model, which then reasoned that if it saw a horse, the horse must be in a barn. In our application, being in a barn was a bad thing, even if the model simultaneously said the horse was in a house (obviously an unusual horse). Since house ::= good and barn ::= bad, the presence of a horse in the input data meant we would always make the "it's bad" decision.

LLMs are much more complex and have gazillions of factors, so the probability of doing something amazingly stupid because of bad data is high. I didn't expect Nazis, though. I expected the "Bender problem", in which the robot wants to kill all humans.

--dave

On 2/27/25 15:00, D. Hugh Redelmeier via talk wrote:
"When trained on 6,000 faulty code examples, AI models give malicious or deceptive advice."
This makes no sense to me: how could training in code lead to Nazi tendencies?
---
Post to this mailing list talk@gtalug.org
Unsubscribe from this mailing list https://gtalug.org/mailman/listinfo/talk
--
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           | -- Mark Twain
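P.S. A toy sketch (with made-up data, not our actual system) of the spurious-correlation failure described above: when every training example containing "horse" happens to be labeled bad, a simple count-based classifier will answer "bad" for a horse in a house, even though "house" otherwise predicts good.

```python
# Toy sketch with made-up data: every training example that contains
# "horse" happens to be labeled "bad", so the learned association
# horse -> bad overwhelms everything else.
from collections import Counter

train = [
    ({"horse", "barn"}, "bad"),
    ({"horse", "barn"}, "bad"),
    ({"cow", "field"}, "good"),
    ({"dog", "house"}, "good"),
]

# Count how often each feature co-occurs with each label.
counts = {"good": Counter(), "bad": Counter()}
totals = {"good": 0, "bad": 0}
for features, label in train:
    totals[label] += 1
    counts[label].update(features)

def predict(features):
    """Naive-Bayes-style score with Laplace smoothing."""
    def score(label):
        s = totals[label] / len(train)
        for f in features:
            # +1/+2 smoothing keeps unseen feature/label pairs nonzero.
            s *= (counts[label][f] + 1) / (totals[label] + 2)
        return s
    return max(("good", "bad"), key=score)

# A horse in a house: "horse" was only ever seen with "bad", and that
# association outweighs the "house" -> good evidence.
print(predict({"horse", "house"}))  # -> bad
```

The point is that the model never saw the factors independently, so one over-represented correlation drives the whole decision, the same way a handful of bad training examples can skew a far larger model.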