
Content provided by Machine Learning Street Talk (MLST). All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Machine Learning Street Talk (MLST) or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://tr.player.fm/legal.

Ashley Edwards - Genie Paper (DeepMind/Runway)

25:04
 

Ashley Edwards, who was working at DeepMind when she co-authored the Genie paper and is now at Runway, covered several key aspects of the Genie AI system and its applications in video generation, robotics, and game creation.

MLST is sponsored by Brave:

The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval-augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.

Genie's approach to learning interactive environments, balancing compression and fidelity.

The use of latent action models and VQ-VAE models for video processing and tokenization.

Challenges in maintaining action consistency across frames and integrating text-to-image models.

Evaluation metrics for AI-generated content, such as FID and PS&R diff metrics.
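
As a rough illustration of the tokenization idea mentioned above, the sketch below shows the nearest-codebook lookup at the heart of vector quantization (the "VQ" in VQ-VAE, ref 2). It is not Genie's implementation: the function names, the tiny 2-D codebook, and the patch vectors are hypothetical values chosen only to make the example runnable. A small codebook compresses more but reconstructs less faithfully, which is one way to picture the compression versus fidelity trade-off.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each vector to the index (token) of its nearest codebook entry."""
    tokens = []
    for v in vectors:
        distances = [squared_distance(v, c) for c in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def dequantize(tokens: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace each token with its codebook vector (a lossy reconstruction)."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical 3-entry codebook and three "patch" embeddings (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
patches = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

tokens = quantize(patches, codebook)
reconstruction = dequantize(tokens, codebook)
print(tokens)
print(reconstruction)
```

In a real VQ-VAE the codebook is learned jointly with an encoder and decoder; here it is fixed by hand to keep the lookup step visible.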

The discussion also explored broader implications and applications:

The potential impact of AI video generation on content creation jobs.

Applications of Genie in game generation and robotics.

The use of foundation models in robotics and the differences between internet video data and specialized robotics data.

Challenges in mapping AI-generated actions to real-world robotic actions.

Ashley Edwards: https://ashedwards.github.io/

TOC (* marks the best bits)

00:00:00 1. Intro to Genie & Brave Search API: Trade-offs & limitations *

00:02:26 2. Genie's Architecture: Latent action, VQ-VAE, video processing *

00:05:06 3. Genie's Constraints: Frame consistency & image model integration

00:07:26 4. Evaluation: FID, PS&R diff metrics & latent induction methods

00:09:44 5. AI Video Gen: Content creation impact, depth & parallax effects

00:11:39 6. Model Scaling: Training data impact & computational trade-offs

00:13:50 7. Game & Robotics Apps: Gamification & action mapping challenges *

00:16:16 8. Robotics Foundation Models: Action space & data considerations *

00:19:18 9. Mask-GPT & Video Frames: Real-time optimization, RL from videos

00:20:34 10. Research Challenges: AI value, efficiency vs. quality, safety

00:24:20 11. Future Dev: Efficiency improvements & fine-tuning strategies

Refs:

1. Genie (learning interactive environments from videos) / Ashley Edwards and DeepMind colleagues [00:01]

https://arxiv.org/abs/2402.15391

2. VQ-VAE (Vector Quantized Variational Autoencoder) / Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu [02:43]

https://arxiv.org/abs/1711.00937

3. FID (Fréchet Inception Distance) metric / Martin Heusel et al. [07:37]

https://arxiv.org/abs/1706.08500

4. PS&R (Precision and Recall) metric / Mehdi S. M. Sajjadi et al. [08:02]

https://arxiv.org/abs/1806.00035

5. Vision Transformer (ViT) architecture / Alexey Dosovitskiy et al. [12:14]

https://arxiv.org/abs/2010.11929

6. Genie (robotics foundation models) / Google DeepMind [17:34]

https://deepmind.google/research/publications/60474/

7. Chelsea Finn's lab work on robotics datasets / Chelsea Finn [17:38]

https://ai.stanford.edu/~cbfinn/

8. Imitation from observation in reinforcement learning / YuXuan Liu [20:58]

https://arxiv.org/abs/1707.03374

9. Waymo's autonomous driving technology / Waymo [22:38]

https://waymo.com/

10. Gen3 model release by Runway / Runway [23:48]

https://runwayml.com/

11. Classifier-free guidance technique / Jonathan Ho and Tim Salimans [24:43]

https://arxiv.org/abs/2207.12598


