
Deep Dive Into Flux: Everything You Need to Know!

Nov 27, 2024

Johanna Sommer

ML Researcher

Quentin Sinig

Go-To-Market Lead

Over the past few weeks, we've been receiving three to five requests a week from companies curious about our public quantized versions of Flux. After discovering that these optimized models could fit on an A10G GPU, they came to us with detailed benchmarking questions: How much faster is it? Can it handle any resolution or batch size without quality loss? Is it compatible across various hardware setups? The sudden surge in interest genuinely caught us off guard, so naturally, we decided to dive deeper and get to the bottom of what’s driving this excitement.

Why We Believe Flux Is More Than Just Hype

The Flux model (and all its variations) is making headlines in the world of image generation, and for good reason. With a whopping 12 billion parameters under its hood, Flux is described as a “rectified flow transformer.” But what does that mean in simpler terms?

Let’s start with the concept of a “rectified flow.” Like diffusion models, Flux generates high-quality images from random noise. However, Flux is built on the concept of “Flow Matching,” which takes a more direct and efficient route than traditional diffusion. While older diffusion models meander along complex, winding paths to create an image, Flux takes the straight road: it moves directly from noise to image along a straight path in its internal representation space. Because it doesn’t take unnecessary detours, Flux can produce high-quality images in fewer steps, which means much faster image generation. Imagine getting from point A to point B without the scenic route: you arrive much quicker!

Figure: Diffusion models only start producing higher-quality images in the later steps, whereas Flow Matching achieves this much earlier. (Source: https://arxiv.org/pdf/2210.02747)
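If you like seeing ideas as code, here is a minimal, self-contained sketch of the rectified-flow / Flow Matching recipe (the toy model and variable names are ours, purely illustrative, not Flux’s actual training code): the training target is simply the straight-line velocity between a noise sample and a data sample, and generation integrates the learned velocity field with a handful of Euler steps.

import torch
import torch.nn as nn

# Toy velocity network: in Flux this role is played by the 12B-parameter
# transformer; a tiny MLP on 2-D points stands in here for illustration.
model = nn.Sequential(nn.Linear(3, 64), nn.SiLU(), nn.Linear(64, 2))

def flow_matching_loss(x1):
    # x1: a batch of "images" (here just 2-D points) from the data distribution
    x0 = torch.randn_like(x1)                 # pure noise sample
    t = torch.rand(x1.shape[0], 1)            # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the straight path
    target = x1 - x0                          # constant velocity of that path
    pred = model(torch.cat([xt, t], dim=-1))  # predicted velocity at (xt, t)
    return ((pred - target) ** 2).mean()

@torch.no_grad()
def sample(n_samples=16, n_steps=4):
    # Integrate the learned velocity field from noise (t=0) to data (t=1)
    x = torch.randn(n_samples, 2)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((n_samples, 1), i * dt)
        x = x + dt * model(torch.cat([x, t], dim=-1))  # one Euler step
    return x

Because the target paths are straight, a well-trained velocity field can be integrated accurately with very few steps, which is exactly where the speed advantage comes from.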

Now, let’s talk about the “transformer” part of Flux. This refers to the neural network architecture it uses to get from point A to point B. Despite the massive parameter count, the design of its transformer blocks helps Flux stay efficient: these layers allow it to capture complex spatial relationships within images, enhancing both speed and quality.
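To make that intuition concrete, here is a tiny sketch of the core mechanism (the dimensions are ours and purely illustrative; Flux’s actual blocks also mix in text tokens and timestep conditioning, which we omit here): the image latent is flattened into a sequence of patch tokens, and self-attention lets every patch exchange information with every other patch in a single layer.

import torch
import torch.nn as nn

# A 32x32 latent with 16 channels, flattened into 1024 patch tokens
latent = torch.randn(1, 16, 32, 32)
tokens = latent.flatten(2).transpose(1, 2)   # (batch, 1024, 16)

# Project patches to the model width, then let one transformer block
# relate every patch to every other patch via self-attention
embed = nn.Linear(16, 256)
block = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
out = block(embed(tokens))                   # (batch, 1024, 256)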

When it comes to efficiency, the Flux developers did not stop there. Good job, Black Forest Labs! In addition to the FLUX.1 [dev] version, we have access to FLUX.1 [schnell], which was trained with latent adversarial diffusion distillation, meaning it can generate high-quality images in only 1 to 4 steps.
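In practice, the difference between the two variants comes down to pipeline settings. A rough sketch with diffusers (the [dev] step count and guidance scale below are typical recommended values, not hard requirements, and FLUX.1-dev is a gated repository on Hugging Face):

import torch
from diffusers import FluxPipeline

prompt = "A red fox standing in fresh snow"

# Distilled [schnell]: 1-4 steps, guidance baked in during distillation
schnell = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")
fast_image = schnell(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]

# Base [dev]: typically a few dozen steps with a nonzero guidance scale
dev = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
quality_image = dev(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]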

As an ML Researcher who worked on Flow Matching models during my PhD, I've made my choice, and Flux is a resounding yes! However, the community's opinions do vary. If you browse some of the discussions on Reddit (like this one), you'll notice that some researchers are still debating whether Flow Matching truly outperforms traditional diffusion models in terms of innovation, speed, and quality. It's a hot topic, and the differing viewpoints make for an interesting read!

Introducing the “Flux Playground”

With all the buzz happening, the questions we've received, and the benchmarks we've conducted, we decided to build a mini app to showcase our findings. Brace yourself: yes, it’s a test URL, and yes, the design is simple, but the value it delivers made us go public with it. The app compares the Schnell and Dev base models with optimized versions like Turbo and Fast, using over 60 prompts. We included speed-up and cost metrics, plus direct image comparisons to assess image quality. And since quality evaluation can be subjective, we've integrated a feature for customers who want to dive deeper: real-time comparison votes from a panel of real people.

Getting Started With Flux and Pruna AI

4 steps, 28 lines of code. That's all it takes, and lucky for you, we love sharing code snippets to make your life easier (this one uses compilation, but here is a quantization tutorial as well)! Just a heads-up: you'll need a token (see line 16: # replace <your_token> with your actual token or None if you do not have one). But no worries: just drop your email, and you'll automatically receive one. Yup, that's a new feature, and we're teasing it a bit here! 😉

import torch
from diffusers import FluxPipeline
from pruna import SmashConfig, smash

# Load the Flux model
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.to('cuda')

# Initialize the SmashConfig
smash_config = SmashConfig()
smash_config['compilers'] = ['onediff']

# Smash the model
pipe.transformer = smash(
    model=pipe.transformer,
    token='<your_token>',  # replace <your_token> with your actual token or None if you do not have one
    smash_config=smash_config,
)

# Run the model on a given input and grab the first generated image
prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
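And if you'd rather verify the speed-up on your own hardware than take our benchmarks at face value, a small timing harness like the one below (our own sketch, reusing the pipe and prompt from above; not part of the pruna API) does the trick:

import time
import torch

def time_generation(pipe, prompt, n_runs=3):
    # One warmup call first, so one-off compilation time is not counted
    pipe(prompt, num_inference_steps=4, guidance_scale=0.0)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_runs):
        pipe(prompt, num_inference_steps=4, guidance_scale=0.0)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs

print(f"{time_generation(pipe, prompt):.2f}s per image after smashing")

Run it once on the smashed pipeline and once on the unmodified one to compute your own speed-up ratio.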

As we wrap up, let’s take a step back and look at how theory meets reality. We recently published a case study with Each AI, and we think it perfectly illustrates how the technical considerations we've discussed translate into impactful, real-life use cases. Check out the full blog post to learn how Each AI went from zero to production in just days, achieving a 3x improvement in both cost and speed. If you're inspired to get started yourself, here’s the Getting Started Documentation to make it work for you too!

Or you can simply stop by the Discord, say hi, and discuss tech with our team! We host office hours every Tuesday from 1:30 to 2:30pm CEST :)

Wanna Go Deeper?

At Pruna AI, our team of Researchers and PhDs shares a passion for deep scientific exploration, and we love providing additional resources for those who share the same DNA of curiosity. For anyone eager to dive deeper into the science behind Flow Matching and the innovative techniques used in Flux, we’ve curated a list of recommendations.


Speed Up Your Models With Pruna

Inefficient models drive up costs, slow down your productivity and increase carbon emissions. Make your AI more accessible and sustainable with Pruna.


© 2024 Pruna AI - Built with Pretzels & Croissants 🥨 🥐
