Setting up and Using SD3 Medium Locally


What is Stable Diffusion 3 Medium

Stable Diffusion 3 Medium (SD3M) is a two-billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model that significantly improves performance in image quality, typography, complex prompt understanding, and resource efficiency. Developed by Stability AI and open-sourced to empower a wide range of users, this model brings industry-leading image generation capabilities within everyone’s reach.

There are many ways to engage with SD3M, such as using Stability AI’s Developer Platform or our chat-based interface, Stable Assistant. Both these methods can get you up and running quickly and let you utilize our state-of-the-art Image Generation models. 

However, what sets Stability AI apart is that our models are open-source. This means you can download the model weights and parameters and run them on your own servers and infrastructure, giving you a level of security not available with closed-source AI models.

This guide will cover how to quickly set up the SD3 Medium model on your own servers and infrastructure.

Downloading SD3 Medium model weights

The model weights for SD3 Medium are hosted on Hugging Face, which you can download by going to the Files and Versions section of the repository, where all the model files are listed and available for download.

The different models that are listed are as follows:

  • sd3_medium.safetensors includes the MMDiT and VAE weights but does not include any text encoders.

  • sd3_medium_incl_clips_t5xxlfp16.safetensors contains all necessary weights, including an fp16 version of the T5XXL text encoder.

  • sd3_medium_incl_clips_t5xxlfp8.safetensors contains all necessary weights, including an fp8 version of the T5XXL text encoder, balancing quality and resource requirements.

  • sd3_medium_incl_clips.safetensors includes all necessary weights except for the T5XXL text encoder. It requires minimal resources, but the model's performance will differ without the T5XXL text encoder.

  • The text_encoders folder contains three text encoders and their original model card links for user convenience. All components within the text_encoders folder (and their equivalents embedded in other packings) are subject to their respective original licenses.

  • The example workflows folder contains example ComfyUI workflows.
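Instead of downloading through the browser, files can also be fetched programmatically with the huggingface_hub library. Below is a minimal sketch: the helper name is our own, the repo id matches the Hugging Face listing, and because the repository is gated you must have accepted the license and logged in first.

```python
def fetch_sd3_file(filename: str) -> str:
    """Download one file from the SD3 Medium repo and return its local cache path.

    Sketch only: requires huggingface_hub and an authenticated account that
    has accepted the model license, since the repository is gated.
    """
    from huggingface_hub import hf_hub_download  # deferred so the sketch reads standalone
    return hf_hub_download(
        repo_id="stabilityai/stable-diffusion-3-medium",
        filename=filename,
    )

# Example usage (downloads several GB on first call):
# local_path = fetch_sd3_file("sd3_medium_incl_clips.safetensors")
```

Downloads are cached by huggingface_hub, so repeated calls return the same local path without re-downloading.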

How to use the model locally

There are different ways to use the model locally. One way is to use the open-source Graphical User Interface (GUI) ComfyUI.

ComfyUI 

ComfyUI is open-source software developed by the Stable Diffusion community and has been widely adopted for using Diffusion models. It allows you to create model workflows by grouping nodes that can be custom-built or taken from the active Stable Diffusion community, which develops a vast array of nodes.

This video discusses why ComfyUI is one of the best GUI tools for using SD3 or any other Stable Diffusion model.

An example ComfyUI workflow shows how different nodes can be grouped together in a simple-to-use interface.

Installing ComfyUI

ComfyUI is an open-source repository that can be cloned and set up locally. The repository can be found here. It can be installed on any of the major operating systems, makes efficient use of your computing resources, and lets you build workflows that can be exported as API endpoints for later use in your production environment.
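As a rough sketch of a local install (directory layout and Python environment are up to you; see the repository README for GPU-specific PyTorch instructions, which this sketch omits):

```shell
# Clone the ComfyUI repository and install its Python dependencies
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

# Place the SD3 Medium checkpoint where ComfyUI looks for models
# (e.g. models/checkpoints/), then start the local server
python main.py
```

By default the server prints a local URL you open in your browser to access the node editor.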

ComfyUI Manager adds to ComfyUI's power, letting you discover and install community nodes and models from within ComfyUI. You can install ComfyUI Manager once you have installed ComfyUI itself. The details for installing the Manager are in its GitHub repo here.

For video tutorials on installation, see the Resources section at the end of the document.

ComfyUI resources for learning

There are many tutorials, videos, and written guides for ComfyUI. If you want to explore different use cases documented in great detail by a Stability AI legend, please see the playlists documented by Scott Detweiler on his YouTube channel.

Note: For use with ComfyUI, load the model sd3_medium_incl_clips.safetensors in the Load Checkpoint node, as it includes the necessary CLIP embedding models, so they don’t need to be loaded separately.

Python - Using Hugging Face Diffusers Library  

Another way to use Stable Diffusion 3 Medium (SD3M) locally is through the Hugging Face Diffusers Python library. Diffusers offers a versatile toolkit for working with diffusion models, covering image and audio generation tasks. It is particularly valuable for SD3M because it lets you run the model entirely on your own device, eliminating the need for cloud services. All data, including prompts and generated outputs, stays on your machine and is never transmitted over the internet, which makes this setup ideal for users who prioritize data confidentiality.

To get set up with the Diffusers library, follow the steps below (the corresponding commands are shown afterward):

  1. Install the diffusers library following the instructions shared by Hugging Face here.

  2. Install the Hugging Face Command Line Interface (CLI) to pass in your Hugging Face authentication token, which you can get from your Hugging Face Profile Settings here.

  3. Install the transformers library following the directions listed here.

Installing the libraries

You can install the relevant libraries with your terminal using the following commands.

# For installing via your terminal
pip install -U "huggingface_hub[cli]"
pip install -U torch diffusers transformers accelerate

If you are running the code in Jupyter Notebook you can use the following commands to install the libraries in your run time.

# For installing in your Jupyter runtime
%pip install -U "huggingface_hub[cli]"
%pip install -U torch diffusers transformers accelerate
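After installing, you can sanity-check that the key libraries import correctly. The snippet below is a convenience of our own (not part of any library): it reports each package's version, or None if the package is missing, without raising an error.

```python
import importlib

def installed_versions(packages=("torch", "diffusers", "transformers")):
    """Return {package: version string or None} without raising if one is missing."""
    versions = {}
    for name in packages:
        try:
            mod = importlib.import_module(name)
            versions[name] = getattr(mod, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions

print(installed_versions())
```

Any None in the output means the corresponding pip install step above did not complete.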

Setting up your Hugging Face credentials

Run the following command in your terminal to log in and store your Hugging Face credentials.

huggingface-cli login
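If you prefer to authenticate from Python instead of the CLI, huggingface_hub also exposes a `login()` function. The helper below is our own wrapper: it reads the token from the `HF_TOKEN` environment variable (a common convention, not a requirement) so the token is never hard-coded in your script.

```python
import os

def hf_login_from_env() -> bool:
    """Log in to Hugging Face using the HF_TOKEN env var; returns True if attempted."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        return False
    from huggingface_hub import login  # deferred so the sketch reads standalone
    login(token=token)
    return True
```

Set the variable once (e.g. `export HF_TOKEN=...` in your shell profile) and call the helper at the top of your scripts.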

Generating images with diffusers

import torch
from diffusers import StableDiffusion3Pipeline

# Check if a CUDA-enabled GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32
)
pipe.to(device)

# Generate an image
image = pipe(
    prompt="a photo of a cat holding a sign that says hello world",
    negative_prompt="",
    num_inference_steps=28,
    height=1024,
    width=1024,
    guidance_scale=7.0,
).images[0]

# Save the generated image
image.save("sd3_hello_world.png")
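The script above loads the full pipeline onto one device. On GPUs with limited VRAM, Diffusers can instead offload pipeline submodules to the CPU between uses via `enable_model_cpu_offload()`, and a seeded `torch.Generator` makes runs reproducible. The sketch below wraps both options; the helper name and `low_vram` flag are our own, while the offload and generator APIs are standard Diffusers/PyTorch. Imports are deferred so the sketch can be read (and defined) without the libraries installed.

```python
def generate_image(prompt: str, seed: int = 0, low_vram: bool = False):
    """Generate one SD3 Medium image and return it as a PIL image.

    Sketch only: requires a CUDA GPU plus torch and diffusers installed
    (imports are deferred so this definition stands alone).
    """
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        torch_dtype=torch.float16,
    )
    if low_vram:
        # Keep submodules on CPU and move each to the GPU only while it runs,
        # trading some speed for a much smaller VRAM footprint.
        pipe.enable_model_cpu_offload()
    else:
        pipe.to("cuda")

    # A fixed seed makes runs reproducible on the same hardware and versions.
    generator = torch.Generator(device="cuda").manual_seed(seed)
    return pipe(
        prompt=prompt,
        num_inference_steps=28,
        guidance_scale=7.0,
        generator=generator,
    ).images[0]
```

For example, `generate_image("a watercolor fox", seed=42, low_vram=True)` should produce the same image on repeated runs of the same setup.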
