DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Papers Read on AI podcast

Content provided by Rob. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Rob or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined at https://tr.player.fm/legal.

Papers Read on AI
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

26d ago 41:56

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.
2024: DeepSeek-AI
https://arxiv.org/pdf/2405.04434
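
The headline figures in the abstract can be turned into a few derived ratios. The following is a minimal sketch that uses only the numbers quoted above; the function name `derived_ratios` and the dictionary keys are illustrative, and the derived values (about 8.9% of parameters active per token, a KV cache roughly 15 times smaller than that of DeepSeek 67B) are simple arithmetic on the abstract, not results reported in this listing.

```python
# Figures quoted in the abstract above.
TOTAL_PARAMS_B = 236           # total parameters, in billions
ACTIVE_PARAMS_B = 21           # parameters activated per token, in billions
KV_CACHE_REDUCTION = 0.933     # KV cache reduced by 93.3% vs. DeepSeek 67B
TRAINING_COST_SAVING = 0.425   # training cost saved vs. DeepSeek 67B
THROUGHPUT_MULTIPLIER = 5.76   # maximum generation throughput vs. DeepSeek 67B


def derived_ratios():
    """Return ratios derived from the abstract's figures (illustrative only)."""
    kv_remaining = 1.0 - KV_CACHE_REDUCTION
    return {
        # Share of the model's parameters used for any single token.
        "active_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # Fraction of the baseline KV cache that is still needed.
        "kv_cache_remaining": kv_remaining,
        # How many times smaller the KV cache is than the baseline.
        "kv_shrink_factor": 1.0 / kv_remaining,
        # Fraction of the baseline training cost that is still paid.
        "training_cost_remaining": 1.0 - TRAINING_COST_SAVING,
        "throughput_multiplier": THROUGHPUT_MULTIPLIER,
    }


if __name__ == "__main__":
    for name, value in derived_ratios().items():
        print(f"{name}: {value:.3f}")
```

The sparse activation is what the abstract credits for the lower training cost: only the activated parameters take part in the computation for a given token, even though all 236B must be stored.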

293 episodes

31 subscribers
