Stable Video 4D

August 20, 2024

Stable Video 4D (SV4D) is a generative model based on Stable Video Diffusion (SVD) and Stable Video 3D (SV3D), which takes in a single-view video of an object and generates multiple novel-view videos (4D image matrix) of that object.

Resources: arXiv, Hugging Face, Code

What is Stable Video 4D?

Stability AI has introduced Stable Video 4D, a new generative AI model that significantly expands the possibilities of video generation. The model accepts a single video as input and generates novel-view videos of the same object from eight different perspectives. Its main advantage is that it combines novel-view synthesis and video generation in a single process, which improves 3D consistency and temporal smoothness.

The Stable Video 4D model is trained to generate 40 frames (5 video frames × 8 camera views) at 576×576 resolution, given 5 reference frames of the same size. To generate a 5×8 image matrix from a single-view video, first run SV3D on the first input frame to generate an orbital video following a specified camera path. Then use the orbital video as SV4D’s reference views and the input video as its reference frames; together these condition the 4D sampling. To generate longer novel-view videos, we use the first generated frames as anchors and then densely sample (interpolate) the remaining frames. Please check our [tech report] for details.
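To make the shapes above concrete, here is a minimal Python sketch of the bookkeeping for one sampling pass. It is not SV4D's code or API: every function name is hypothetical, and it only counts and labels frames rather than running any model. The evenly spaced anchor schedule in `plan_long_video` is an assumption made for illustration; the text only says that the first generated frames act as anchors and the rest are interpolated.

```python
# Illustrative sketch only: all names here are hypothetical, not SV4D's API.
NUM_FRAMES = 5    # video frames per sampling pass (from the text)
NUM_VIEWS = 8     # camera views per sampling pass (from the text)
RESOLUTION = 576  # square output resolution in pixels (from the text)


def image_matrix_cells(num_frames=NUM_FRAMES, num_views=NUM_VIEWS):
    """List every (frame, view) cell of the image matrix, frame-major."""
    return [(f, v) for f in range(num_frames) for v in range(num_views)]


def conditioning_layout(num_frames=NUM_FRAMES, num_views=NUM_VIEWS):
    """Label the inputs and outputs of one 4D sampling pass.

    reference_frames: the input video, one image per time step.
    reference_views: the SV3D orbital video of the first input frame,
    one image per camera view.
    """
    return {
        "reference_frames": [("input_video", f) for f in range(num_frames)],
        "reference_views": [("sv3d_orbit", v) for v in range(num_views)],
        "outputs": image_matrix_cells(num_frames, num_views),
    }


def plan_long_video(total_frames, anchors_per_pass=NUM_FRAMES):
    """Split a longer video into anchor frames and frames to interpolate.

    Assumption: anchors are spread evenly over the video. The real
    schedule may differ.
    """
    if total_frames < anchors_per_pass:
        raise ValueError("need at least as many frames as anchors")
    last = total_frames - 1
    gaps = anchors_per_pass - 1
    anchors = [i * last // gaps for i in range(anchors_per_pass)]
    remaining = [t for t in range(total_frames) if t not in anchors]
    return anchors, remaining


if __name__ == "__main__":
    layout = conditioning_layout()
    print(len(layout["outputs"]), "output images per pass")
    print(plan_long_video(21))
```

Under these assumptions, a 21-frame input would be handled by first sampling frames 0, 5, 10, 15 and 20 across all 8 views, then filling in the 16 frames between them.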

Key Takeaways:

Status and Performance
Stable Video 4D can generate 5-frame videos from 8 viewpoints in about 40 seconds, and the full 4D optimization process takes about 20 to 25 minutes. The technology is expected to find applications in game development, video editing, and virtual reality, where professionals can view objects from multiple perspectives to enhance the realism and immersion of their products.

Technological Innovation
Unlike previous methods, which required sampling from separate image, video, and multi-view diffusion models, SV4D generates multiple novel-view videos at the same time. This significantly improves consistency along the spatial and temporal axes and keeps the object’s appearance consistent across viewpoints and timestamps.

Application Perspectives
SV4D’s lightweight 4D optimization framework opens up new possibilities in virtual reality, video editing, and other fields. Achieving a better balance between visual effects and creativity is left to future work.