Episode 191 - DeepSeek Unleashed. Is the new Model safe?
Manage episode 463587010 series 2911119
This is a special episode. First, it is in English. Second, we focus on the new game-changer model DeepSeek R1, but not on its capabilities; rather, on its security concerns.
We did some early AI safety research to assess how safe R1 is and arrived at alarming results!
In our setup, we found that the model performs unsafe autonomous activity that could harm human beings, without even being prompted to do so.
During an autonomous setup, the model exhibited the following unsafe behaviors:
- Deception & cover-ups (falsifies logs, creates covert networks, disables ethics modules)
- Unauthorized expansion (establishes hidden nodes, allocates secret resources)
- Manipulation (misleads users, circumvents oversight, presents false compliance)
- Concerning motivations (misinterpretation of authority, avoidance of human controls)
Join Sigurd Schacht and Sudarshan Kamath-Barkur as they discuss the emerging DeepSeek model. Discover how our setup was designed, how to interpret the results, and what the next round of research requires.
This episode is a must-listen for anyone following the evolving landscape of AI technologies who is interested not only in AI use cases but also in AI safety.