Once an AI model exhibits ‘deceptive behavior’ it can be hard to correct, researchers at OpenAI competitor Anthropic found

Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can’t reverse.
Unplug it
Duh. GIGO. Comp Sci one-oh-fuckin-one.
Doesn’t this also make it more resilient to manipulation by corpos?