AI Image & Video Gen · Water Impact Explorer · Report v1.2

Generative AI: The Visual Cost

From a single image to platform-scale video production: the hydrological reality of AI creative tools, grounded in 2025–2026 benchmark estimates.

~2.4 mL per fast AI image
~95 mL per ultra-quality image
~382 mL per 10s HD video clip
~7.2 L per minute of 4K AI video

Li et al. arXiv:2311.16863 · Zhao et al. arXiv:2509.19222 · MIT Tech Review 2025 · LBNL 2024

Image Quality / Model

Energy per image varies 40× between SD-Turbo draft mode and DALL-E 3 ultra with iterative editing.

SDXL / Midjourney v6 standard: 1024px, ~30 diffusion steps. Most common API tier.

[Interactive controls: diffusion steps (50), data-center WUE (0.25 L/kWh), and images-per-day (1–1k, logarithmic axis).]

Water per Image

[Live calculator outputs: water per image generated, daily and annual totals, daily energy (kWh/day), and the number of images equal to one burger's water footprint.]

Scale Equivalences – Daily

[Live equivalences: 🍔 years of this daily habit per one burger's water footprint · 🚿 vs. a 10-min shower (65 L) · 🏠 vs. US household daily indoor use (341 L) · 💧 equivalent 500 mL bottles.]

Scale Visualization

Per-image and per-clip values vs. real-world water baselines, on a logarithmic scale; updates live with the calculator.

Key Findings

Image & video generation water context: companion to the bra-khet AI Water-Energy Nexus Report v1.2.

Text generation (LLM inference) at 2026 efficiency: 0.26–2.0 mL per query. Image generation adds a fundamentally different compute burden:
  • Fast AI image (SD-Turbo, 0.5 Wh): ~2.4 mL, comparable to a 2026 LLM query at standard efficiency
  • Standard image (SDXL / MJ v6, 3 Wh): ~14.3 mL, ~55× more water than a Gemini query
  • High quality (50 steps + upscale, 10 Wh): ~47.7 mL, nearly a shot glass of water per image
  • Ultra (DALL-E 3 with editing, 20 Wh): ~95.4 mL, about 365× more water than a Gemini query
The core reason: diffusion models run 20–100 denoising steps, each requiring a full U-Net or DiT forward pass. Text models run one forward pass with KV-cache reuse, an order-of-magnitude computational difference for similar output quality.
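The per-tier figures above follow a simple accounting identity: water per image ≈ inference energy × (on-site WUE + upstream grid water intensity). A minimal sketch in Python, assuming the report's 0.25 L/kWh WUE slider default and the LBNL GWIF of 4.52 L/kWh (the function and variable names are illustrative):

```python
# Water per generated image: on-site evaporative cooling (WUE)
# plus upstream thermoelectric water use of the grid (GWIF).
WUE_L_PER_KWH = 0.25   # data-center cooling (report's slider default)
GWIF_L_PER_KWH = 4.52  # US grid average (LBNL 2024)

def water_ml_per_image(energy_wh: float) -> float:
    """mL of water per image for a given inference energy in Wh.
    Note: 1 Wh x 1 L/kWh = 1 mL, so the factors apply directly."""
    return energy_wh * (WUE_L_PER_KWH + GWIF_L_PER_KWH)

tiers_wh = {"fast (SD-Turbo)": 0.5, "standard (SDXL / MJ v6)": 3,
            "high (50 steps + upscale)": 10, "ultra (DALL-E 3)": 20}
for tier, wh in tiers_wh.items():
    print(f"{tier}: {water_ml_per_image(wh):.1f} mL")
```

This reproduces the four tier values above (2.4, 14.3, 47.7, and 95.4 mL).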
Video generation compounds image-gen cost by frame count and temporal consistency overhead:
  • 5s 480p SD clip (~125 frames, 25 Wh): ~119 mL, roughly half a small glass of water
  • 10s 720p HD clip (~240 frames, 80 Wh): ~382 mL, one full 500 mL bottle per clip
  • 30s 1080p FHD clip (~720 frames, 300 Wh): ~1.43 L, three water bottles; equivalent to ~5,500 Gemini queries
  • 60s 4K clip (~1,440 frames, 1,500 Wh): ~7.16 L, nearly two US gallons; matches ~27,500 Gemini queries
Temporal consistency models (Sora, Wan, CogVideo) require cross-frame attention over time, adding 20–40% overhead vs. independent frame generation. Efficient video diffusion (StreamDiffusion, AnimateDiff V3) can reduce this by ~3–5×.
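The same identity extends to video clips; a sketch, assuming the report's per-clip energy figures and its ~0.26 mL-per-query Gemini baseline (names are illustrative):

```python
WUE_PLUS_GWIF = 0.25 + 4.52  # L/kWh, combined factor from the report
GEMINI_ML = 0.26             # mL per 2026 Gemini query (report figure)

def clip_water_ml(energy_wh: float) -> float:
    # 1 Wh x 1 L/kWh = 1 mL, so the combined factor applies directly
    return energy_wh * WUE_PLUS_GWIF

clips_wh = {"5s 480p": 25, "10s 720p": 80, "30s 1080p": 300, "60s 4K": 1500}
for clip, wh in clips_wh.items():
    ml = clip_water_ml(wh)
    print(f"{clip}: {ml:,.0f} mL = {ml / GEMINI_ML:,.0f} Gemini queries")
```

This yields the per-clip values above (119 mL, 382 mL, 1.43 L, 7.16 L) and the Gemini-query equivalences for the two longer clips.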
One hamburger carries ~2,498 L total water footprint (Mekonnen & Hoekstra 2012, Ecosystems):
  • Fast images (2.4 mL each): 1,040,833 images per burger; generating one million quick AI images = one hamburger's water
  • Standard images (14.3 mL each): 174,685 images per burger
  • High images (47.7 mL each): 52,370 images per burger
  • Ultra images (95.4 mL each): 26,185 images per burger
  • 10s HD clips (382 mL each): 6,544 clips per burger, or ~18 years of daily creator-tier video generation
  • 60s 4K clips (7,155 mL each): 349 clips per burger, less than one year of daily 4K generation at Pro Studio rate
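Each burger equivalence above reduces to a single division; a sketch, assuming the 2,498 L footprint from Mekonnen & Hoekstra (the helper name is illustrative):

```python
BURGER_ML = 2498 * 1000  # ~2,498 L total water footprint per hamburger

def items_per_burger(ml_per_item: float) -> int:
    """How many generations carry the same water footprint as one burger."""
    return int(BURGER_ML / ml_per_item)

print(items_per_burger(2.4))   # fast images per burger
print(items_per_burger(382))   # 10s HD clips per burger
print(items_per_burger(7155))  # 60s 4K clips per burger
```

Small differences from the report's clip counts come from rounding the per-item water values before dividing.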
All tool figures apply to cloud API inference in professional data centers (WUE 0.15–0.55 L/kWh + grid GWIF ~4.52 L/kWh). Running Stable Diffusion locally changes the calculus:
  • Your GPU has no data center cooling loop: zero Scope 1 direct water evaporation
  • Scope 2 still applies: your grid's upstream thermoelectric generation (GWIF ~4.52 L/kWh national avg)
  • RTX 4090 at 400 W peak: a 512×512 image in ~0.5 s → ~0.056 Wh → ~0.25 mL per image (Scope 2 only)
  • A consumer GPU generates images at ~10× lower water cost than a cloud API (no cooling overhead, same compute time)
  • This also applies to video: local inference on a 4090 for a 10s clip might use ~8–15 Wh vs. 80 Wh on a data center GPU cluster
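The local-inference point can be checked directly: with no cooling loop at home, only the grid's water intensity applies. A sketch using the RTX 4090 figures above (the function name is illustrative):

```python
GWIF_L_PER_KWH = 4.52  # upstream grid water only; no Scope 1 cooling at home

def local_water_ml(power_w: float, seconds: float) -> float:
    """Scope 2 water (mL) for a local GPU run of given power and duration."""
    energy_wh = power_w * seconds / 3600
    return energy_wh * GWIF_L_PER_KWH  # Wh x L/kWh = mL

# RTX 4090: one 512x512 image in ~0.5 s at ~400 W peak
print(f"{local_water_ml(400, 0.5):.2f} mL per local image")
```

Compare ~0.25 mL locally with ~2.4 mL for the cloud fast tier: the ~10× gap is almost entirely the missing cooling and facility overhead.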
Efficiency techniques are already cutting per-output energy:
  • Diffusion distillation (LCM, TurboSD, Hyper-SD): 4-step inference vs. 30 steps → ~7.5× energy reduction per image at comparable quality
  • Flow matching (SD3, FLUX): deterministic trajectories with fewer NFE (number of function evaluations), 8–12 steps at high quality; ~2–3× energy reduction vs. DDPM
  • Video frame reuse (StreamDiffusion): delta diffusion for slow-motion scenes reduces effective NFE by 40–60% at similar temporal coherence
  • Linear-attention architectures (Mamba, SSMs for video): eliminate quadratic attention scaling → 3–5× lower compute for long-duration video generation
  • Flash-Attention-2 + Triton kernels: 30–40% power efficiency gain at constant throughput via reduced GPU memory bandwidth pressure
  • Trajectory: 2026 "fast" quality (0.5 Wh/image) will be standard quality by 2028 via distillation; today's ultra tier (20 Wh) may drop to ~5 Wh through efficient sampling pipelines
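Most of these levers act through the step count (NFE). A sketch of the implied savings, assuming energy scales roughly linearly with steps (an approximation; the tier numbers come from the report, the function name is illustrative):

```python
def energy_after_step_cut(base_wh: float, base_steps: int,
                          new_steps: int) -> float:
    """Approximate energy if denoising cost scales linearly with NFE."""
    return base_wh * new_steps / base_steps

# Standard SDXL tier (3 Wh at ~30 steps) distilled to a 4-step LCM sampler
lcm_wh = energy_after_step_cut(3, 30, 4)    # 0.4 Wh, a 7.5x reduction
# Flow matching at ~10 steps instead of 30
flux_wh = energy_after_step_cut(3, 30, 10)  # 1.0 Wh, a 3x reduction
print(lcm_wh, flux_wh)
```

At the report's combined 4.77 L/kWh water factor, the distilled 0.4 Wh image would fall from ~14.3 mL to under 2 mL of water.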

Sources

  1. Li, P. et al. (2023). Making AI Less Thirsty: Uncovering and Addressing the Secret Water Footprint of AI Models. arXiv:2311.16863.
  2. Zhao, S. et al. (2025). Energy and Water Consumption in AI-Generated Content: Image and Video Models. arXiv:2509.19222.
  3. Hao, K. (2025, May). How much energy does AI actually use? MIT Technology Review.
  4. Ren, S. et al. (2023). On the Energy and Water Consumption of Generative AI. arXiv:2304.03271.
  5. Lawrence Berkeley National Laboratory (2024). US Data Center Energy Use Report (GWIF 4.52 L/kWh).
  6. Mekonnen, M.M. & Hoekstra, A.Y. (2012). A global assessment of the water footprint of farm animal products. Ecosystems 15(3):401–415.
  7. Rombach, R. et al. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. CVPR 2022 (SD/SDXL baseline architecture).
  8. OpenAI (2024). DALL-E 3 technical report. openai.com/research.
  9. Luo, S. et al. (2023). LCM: Latent Consistency Models. arXiv:2310.04378 (4-step distillation inference).
  10. Esser, P. et al. (2024). Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (SD3 / FLUX). arXiv:2403.03206.
  11. Blattmann, A. et al. (2023). Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. arXiv:2311.15127.
  12. Bar-Tal, O. et al. (2024). Lumiere: A Space-Time Diffusion Model for Video Generation. arXiv:2401.12945.
  13. AWWA (2022). US daily household indoor water use baseline: 341 L/day.
  14. Jegham, I. et al. (2025). Empirical energy benchmarking of 30 LLMs (Gemini 0.24 Wh/query). arXiv preprint.
  15. Google (2026). Ironwood TPU benchmarks and environmental disclosures. Google Environmental Reports.