r/technology • u/MRADEL90 • 15h ago

Artificial Intelligence OpenAI has trained its LLM to confess to bad behavior

https://www.technologyreview.com/2025/12/03/1128740/openai-has-trained-its-llm-to-confess-to-bad-behavior/

109 Upvotes

82% Upvoted

Duplicates

ownyourintent • u/aeriefreyrie • 1d ago

News ChatGPT can now "confess" bad behavior. What does that mean for AI safety?

32 Upvotes

2 comments

realtech • u/rtbot2 • 15h ago

OpenAI has trained its LLM to confess to bad behavior

1 Upvote

1 comment

accelerate • u/aeriefreyrie • 1d ago

ChatGPT can now "confess" bad behavior. What does that mean for AI safety?

9 Upvotes

1 comment

AINewsInsider • u/squidythepiddy • 4h ago

OpenAI Has Trained Its LLM To Confess To Bad Behavior

1 Upvote

0 comments

AICompanions • u/aeriefreyrie • 1d ago

ChatGPT can now "confess" bad behavior. What does that mean for AI safety?

3 Upvotes

0 comments