Introducing Stable Zero123: Quality 3D Object Generation from Single Images

Key Takeaways

  • Stable Zero123 generates novel views of an object, demonstrating 3D understanding of the object’s appearance from various angles, with notably improved quality over Zero1-to-3 and Zero123-XL thanks to improved training datasets and elevation conditioning.

  • Based on Stable Diffusion 1.5, this model consumes the same amount of VRAM as SD1.5 to generate one novel view. Using Stable Zero123 to generate full 3D objects requires more time and memory (24GB of VRAM recommended).

  • This model is being released for non-commercial and research use, and the weights can be downloaded here.

  • Stable Zero123C can be used commercially with a Stability AI membership.

Sample 3D models reconstructed using score distillation on the Stable Zero123 model. 

Today we’re releasing Stable Zero123, our in-house trained model for view-conditioned image generation. Stable Zero123 produces notably improved results compared to the previous state-of-the-art, Zero123-XL. This is achieved through 3 key innovations: 

  1. An improved training dataset, heavily filtered from Objaverse to preserve only high-quality 3D objects, which we rendered much more realistically than previous methods

  2. During training and inference, we provide the model with an estimated camera angle. This elevation conditioning allows it to make more informed, higher quality predictions.

  3. A pre-computed dataset (pre-computed latents) and an improved dataloader supporting higher batch sizes, which, combined with the first innovation, yielded a 40X speed-up in training efficiency compared to Zero123-XL.
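
The latent pre-computation idea in the third innovation can be sketched as follows. This is a minimal illustration, not the actual training pipeline: each image is encoded once, the small latent is cached to disk, and the dataloader reads latents instead of re-encoding full-resolution images every epoch. The `encode` function here is a hypothetical stand-in for the VAE encoder used by Stable Diffusion models.

```python
import numpy as np
import tempfile, os

def encode(image):
    # Hypothetical stand-in for a VAE encoder: SD-style encoders map an
    # HxWx3 image to a 4-channel latent at 8x spatial downsampling.
    h, w, _ = image.shape
    return np.zeros((4, h // 8, w // 8), dtype=np.float32)

cache_dir = tempfile.mkdtemp()

def precompute(images):
    # Run the expensive encoder exactly once per image and cache the result.
    paths = []
    for i, img in enumerate(images):
        path = os.path.join(cache_dir, f"latent_{i}.npy")
        np.save(path, encode(img))
        paths.append(path)
    return paths

def load_batch(paths, batch_indices):
    # Dataloader step: cheap disk reads of small latents, which is what
    # makes larger batch sizes practical.
    return np.stack([np.load(paths[i]) for i in batch_indices])

images = [np.zeros((256, 256, 3), dtype=np.float32) for _ in range(8)]
paths = precompute(images)
batch = load_batch(paths, [0, 2, 4, 6])
print(batch.shape)  # (4, 4, 32, 32)
```

Because each cached latent is 48X smaller than the corresponding 256×256 RGB image in this sketch, the per-step data cost drops sharply, independent of any encoder speed-ups.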

This model is now released on Hugging Face to enable researchers and non-commercial users to download and experiment with it.

Comparing Stable Zero123 (Stability AI) and Zero123-XL predictions across different views from a sample input image shown on top-right

Creating 3D objects using Stable Zero123

To enable open research in 3D object generation, we've improved the open-source threestudio codebase to support Zero123 and Stable Zero123. A simplified version of this Stable 3D process is currently in private preview. In technical terms, it uses Score Distillation Sampling (SDS) to optimize a NeRF with the Stable Zero123 model, from which we can later extract a textured 3D mesh. The process can be adapted for text-to-3D generation by first generating a single image with SDXL and then using Stable Zero123 to generate the 3D object.
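
The SDS optimization described above can be illustrated with a toy sketch. This is not the actual threestudio implementation: the flat parameter vector stands in for NeRF parameters, the renderer is the identity, and `predict_noise` is a hypothetical placeholder for the frozen diffusion prior (in the real pipeline, Stable Zero123's noise predictor conditioned on the camera pose). The point is the shape of the update: noise an image render, ask the prior to predict that noise, and push the 3D parameters along the residual.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(theta):
    # Placeholder differentiable renderer: identity, so the "image" is theta.
    return theta

def predict_noise(noisy_image, t):
    # Hypothetical frozen denoiser: a crude score-like estimate that pulls
    # the sample toward a fixed target image (here, a constant 0.5 image).
    target = np.full_like(noisy_image, 0.5)
    return noisy_image - target

theta = rng.random(16)  # toy stand-in for NeRF parameters
lr = 0.05

for step in range(200):
    x = render(theta)
    t = rng.uniform(0.02, 0.98)         # random diffusion timestep
    eps = rng.standard_normal(x.shape)  # injected noise
    noisy = x + t * eps
    # SDS-style update: the gradient is (eps_hat - eps), back-propagated
    # through render() (identity here, so it applies directly to theta).
    grad = predict_noise(noisy, t) - eps
    theta -= lr * grad

# In expectation the update contracts theta toward the target image,
# so the mean deviation from 0.5 should be small after 200 steps.
print(float(np.abs(theta - 0.5).mean()))
```

In the real setting the residual is weighted by a timestep-dependent factor and flows through a NeRF renderer's gradients, but the core loop — render, noise, denoise, subtract — is the same.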

Comparing 3D objects using Stable Zero123 (Stability AI) and Zero123-XL models

License considerations (non-commercial vs commercial use)

We have released two versions of Stable Zero123:

  • Stable Zero123 was trained on some CC-BY-NC 3D objects, so it cannot be used commercially, but it can be used for research purposes.

  • Stable Zero123C (“C” for “Commercially-available”) was trained only on CC-BY and CC0 3D objects. It can be used commercially only while you have an active Stability AI membership.

According to our internal tests, both models perform similarly in terms of prediction quality.                             

Stay updated on our progress by signing up for our newsletter, and learn more about commercial applications by contacting us here.

Follow us on X (Twitter), Instagram, LinkedIn, and join our Discord Community.

Updated Jan 8, 2024
