Image Quality / Model
Energy per image varies 40× between SD-Turbo draft mode and DALL-E 3 ultra with iterative editing
Water per Image
Scale Equivalences – Daily
Clip Length & Resolution
Video generation energy scales sharply with resolution and duration – a 60s 4K clip uses ~60× more energy than a 5s SD clip
Uncertainty Range
Video generation energy estimates carry ±50% uncertainty (2026). Actual use depends on model architecture, diffusion steps, CFG scale, and cluster efficiency. These figures reflect GPU-cluster API inference benchmarks, not local consumer hardware.
Water per Clip
Estimated Platform Daily Water Use
Based on publicly reported volume estimates and per-image/clip energy benchmarks; ±50% uncertainty applies. Water intensity covers Scope 1 + 2 at US averages: WUE 0.25 L/kWh (on-site cooling) + GWIF 4.52 L/kWh (upstream grid generation).
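Under these assumptions, converting per-generation energy to water is a single linear formula. A minimal sketch in Python, using the report's US-average coefficients (the function name is illustrative):

```python
# Water-intensity coefficients used throughout the report (US averages).
WUE_L_PER_KWH = 0.25   # Scope 1: on-site data-center cooling water (L/kWh)
GWIF_L_PER_KWH = 4.52  # Scope 2: upstream grid-generation water (L/kWh)

def water_ml(energy_wh: float, include_scope1: bool = True) -> float:
    """Water footprint in mL of a generation that consumes `energy_wh`."""
    intensity = GWIF_L_PER_KWH + (WUE_L_PER_KWH if include_scope1 else 0.0)
    # Wh / 1000 -> kWh; * (L/kWh) -> L; * 1000 -> mL, so numerically Wh * intensity.
    return energy_wh * intensity

print(round(water_ml(3), 1))                            # standard image, 3 Wh -> 14.3 mL
print(round(water_ml(0.056, include_scope1=False), 2))  # local GPU image, Scope 2 only -> 0.25 mL
```

Dropping Scope 1 for local inference is what the Stable Diffusion row below means by "only upstream grid water applies."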
MidJourney (Images)
~15M images/day at avg 5 Wh/image → ~75 MWh/day → ~358 kL/day. Equivalent to ~1,050 households' daily indoor water use.
DALL-E / OpenAI (Images)
~5M images/day at avg 12 Wh/image (DALL-E 3 + iterative editing) → ~60 MWh/day → ~286 kL/day. The iterative multi-edit workflow adds 3–5× the base generation cost.
Stable Diffusion APIs (cloud)
~20M API images/day at avg 3 Wh → ~60 MWh/day → ~286 kL/day. Local inference on your own GPU eliminates Scope 1 entirely; only upstream grid water (Scope 2) applies.
Runway / Pika / Kling (Video)
~500K clips/day at avg 80 Wh/clip (10s HD) → ~40 MWh/day → ~191 kL/day. Per-second energy cost of video generation is ~27× higher than image generation.
Sora / Luma Dream Machine (Video)
~200K clips/day at avg 200 Wh/clip (15s HD–FHD) → ~40 MWh/day → ~191 kL/day. Transformer-based diffusion requires more compute per frame than earlier GAN-based systems.
All Platforms Combined (~2026)
Total estimated: ~1.3 million L/day (~476 ML/yr). For context, NYC's daily water use is ≈1 billion liters, so all generative AI platforms combined account for ~0.13% of it – a growing but currently minor national demand.
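Every platform row above follows the same arithmetic. A quick sketch that reproduces the rows and the combined total (volumes and Wh/unit are the estimates quoted above):

```python
L_PER_KWH = 0.25 + 4.52  # Scope 1 WUE + Scope 2 GWIF (L/kWh)

platforms = {  # name: (units per day, Wh per unit) -- estimates from the rows above
    "MidJourney (images)":      (15_000_000, 5),
    "DALL-E / OpenAI (images)": (5_000_000, 12),
    "Stable Diffusion APIs":    (20_000_000, 3),
    "Runway / Pika / Kling":    (500_000, 80),
    "Sora / Luma":              (200_000, 200),
}

total_kl = 0.0
for name, (units, wh) in platforms.items():
    mwh = units * wh / 1e6  # Wh -> MWh
    kl = mwh * L_PER_KWH    # 1 MWh * (L/kWh) = 1 kL numerically
    total_kl += kl
    print(f"{name}: {mwh:.0f} MWh/day -> {kl:.0f} kL/day")

print(f"Total: ~{total_kl / 1000:.1f} ML/day")  # -> ~1.3 ML/day
```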
Scale Visualization
Per-image and per-clip values vs. real-world water baselines (logarithmic scale)
Key Findings
Image & video generation water context – companion to the bra-khet AI Water-Energy Nexus Report v1.2
- Fast AI image (SD-Turbo, 0.5 Wh): ~2.4 mL – comparable to a 2026 LLM query at standard efficiency
- Standard image (SDXL / MJ v6, 3 Wh): ~14.3 mL – ~55× more water than a Gemini query
- High quality (50 steps + upscale, 10 Wh): ~47.7 mL – nearly a shot glass of water per image
- Ultra (DALL-E 3 with editing, 20 Wh): ~95.4 mL – about 365× more water than a Gemini query
- 5s 480p SD clip (~125 frames, 25 Wh): ~119 mL – roughly half a small glass of water
- 10s 720p HD clip (~240 frames, 80 Wh): ~382 mL – about three-quarters of a 500 mL bottle per clip
- 30s 1080p FHD clip (~720 frames, 300 Wh): ~1.43 L – roughly three water bottles; equivalent to ~5,500 Gemini queries
- 60s 4K clip (~1,440 frames, 1,500 Wh): ~7.16 L – nearly two gallons; matches ~27,500 Gemini queries
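The Gemini comparisons above assume Google's reported ~0.26 mL of water per median Gemini prompt (the 0.26 figure is that assumption). A sketch of the whole tier table:

```python
L_PER_KWH = 0.25 + 4.52  # Scope 1 + Scope 2 water intensity (L/kWh)
GEMINI_ML = 0.26         # assumed: reported median water per Gemini prompt (mL)

tiers_wh = {  # label: Wh per generation, from the bullets above
    "fast image (SD-Turbo)":  0.5,
    "standard image (SDXL)":  3,
    "high image (50 steps)":  10,
    "ultra image (DALL-E 3)": 20,
    "10s 720p HD clip":       80,
    "30s 1080p FHD clip":     300,
    "60s 4K clip":            1500,
}
for label, wh in tiers_wh.items():
    ml = wh * L_PER_KWH  # Wh * (L/kWh) = mL numerically
    print(f"{label}: {ml:,.1f} mL (~{ml / GEMINI_ML:,.0f} Gemini queries)")
```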
- Fast images (2.4 mL each): 1,040,833 images per burger – generating one million quick AI images uses one hamburger's worth of water
- Standard images (14.3 mL each): 174,685 images per burger
- High images (47.7 mL each): 52,370 images per burger
- Ultra images (95.4 mL each): 26,185 images per burger
- 10s HD clips (382 mL each): 6,544 clips per burger – ~18 years of daily creator-tier video generation
- 60s 4K clips (7,155 mL each): 349 clips per burger – less than one year of daily 4K generation at the Pro Studio rate
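These counts assume the widely cited ~2,500 L water footprint of a single hamburger (via the Mekonnen & Hoekstra beef figures); that constant is the assumption below, and single-count deviations from the bullets are rounding in the source:

```python
BURGER_ML = 2_500 * 1000  # assumed ~2,500 L water footprint per hamburger, in mL

per_item_ml = {  # per-generation water from the tier bullets (mL)
    "fast image": 2.4, "standard image": 14.3, "high image": 47.7,
    "ultra image": 95.4, "10s HD clip": 382, "60s 4K clip": 7_155,
}
for label, ml in per_item_ml.items():
    print(f"{label}: {BURGER_ML / ml:,.0f} per burger")
```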
- Your GPU has no data center cooling loop – zero Scope 1 direct water evaporation
- Scope 2 still applies: your grid's upstream thermoelectric generation (GWIF ~4.52 L/kWh national avg)
- RTX 4090 at 400 W peak → 512×512 image in ~0.5 s → ~0.056 Wh → ~0.25 mL per image (Scope 2 only)
- Consumer GPU generates images at ~10× lower water cost than a cloud API (no cooling overhead, same compute time)
- This also applies to video: local inference on a 4090 for a 10s clip might use ~8–15 Wh vs. 80 Wh on a data center GPU cluster
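The RTX 4090 bullet reduces to watts × seconds times the grid-only water factor. A sketch under those assumptions (wall-power draw and generation time are the estimates above; the function name is illustrative):

```python
def local_image_water_ml(power_w: float, seconds: float,
                         gwif_l_per_kwh: float = 4.52) -> float:
    """Scope 2 only: a local GPU has no data-center cooling loop."""
    energy_wh = power_w * seconds / 3600  # W * s -> Wh
    return energy_wh * gwif_l_per_kwh     # Wh * (L/kWh) = mL numerically

# RTX 4090 at ~400 W wall draw, 512x512 image in ~0.5 s
print(round(local_image_water_ml(400, 0.5), 2))  # -> 0.25 mL
```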
- Diffusion distillation (LCM, TurboSD, Hyper-SD): 4-step inference vs. 30 steps → ~7.5× energy reduction per image at comparable quality
- Flow matching (SD3, FLUX): deterministic trajectories with fewer NFE (number of function evaluations) – 8–12 steps at high quality; ~2–3× energy reduction vs. DDPM
- Video frame reuse (StreamDiffusion): delta diffusion for slow-motion scenes reduces effective NFE by 40–60% at similar temporal coherence
- Linear-attention architectures (Mamba, SSMs for video): eliminate quadratic attention scaling → 3–5× lower compute for long-duration video generation
- Flash-Attention-2 + Triton kernels: 30–40% power-efficiency gain at constant throughput via reduced GPU memory-bandwidth pressure
- Trajectory: 2026 "fast" quality (0.5 Wh/image) will be standard quality by 2028 via distillation; current ultra (20 Wh) may drop to ~5 Wh through efficient sampling pipelines
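To a first approximation, diffusion inference energy scales linearly with NFE, so step reduction and kernel-level gains compose multiplicatively. An illustrative (not benchmarked) model of the trajectory claim:

```python
def projected_wh(base_wh: float, base_steps: int, new_steps: int,
                 kernel_gain: float = 1.0) -> float:
    """Scale per-image energy linearly with NFE (denoising steps); the
    optional kernel_gain multiplier (< 1.0) models kernel-level savings
    such as Flash-Attention-2. Illustrative, not benchmarked."""
    return base_wh * (new_steps / base_steps) * kernel_gain

# Distillation: 30 -> 4 steps at comparable quality (~7.5x reduction)
print(round(projected_wh(3.0, 30, 4), 2))  # standard image -> 0.4 Wh
# Ultra tier with flow matching (30 -> 10 steps) plus ~35% kernel-level gain
print(round(projected_wh(20.0, 30, 10, kernel_gain=0.65), 1))  # -> ~4.3 Wh
```

Under these assumptions the ultra tier lands near the ~5 Wh figure projected above.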
Sources
- [1] Li, P. et al. (2023). Making AI Less Thirsty: Uncovering and Addressing the Secret Water Footprint of AI Models. arXiv:2311.16863.
- [2] Zhao, S. et al. (2025). Energy and Water Consumption in AI-Generated Content: Image and Video Models. arXiv:2509.19222.
- [3] Hao, K. (2025, May). How much energy does AI actually use? MIT Technology Review.
- [4] Ren, S. et al. (2023). On the Energy and Water Consumption of Generative AI. arXiv:2304.03271.
- [5] Lawrence Berkeley National Laboratory (2024). US Data Center Energy Use Report (GWIF 4.52 L/kWh).
- [6] Mekonnen, M.M. & Hoekstra, A.Y. (2012). A global assessment of the water footprint of farm animal products. Ecosystems 15(3):401–415.
- [7] Rombach, R. et al. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. CVPR 2022 (SD/SDXL baseline architecture).
- [8] OpenAI (2024). DALL-E 3 technical report. openai.com/research.
- [9] Luo, S. et al. (2023). LCM: Latent Consistency Models. arXiv:2310.04378 (4-step distillation inference).
- [10] Esser, P. et al. (2024). Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (SD3 / FLUX). arXiv:2403.03206.
- [11] Blattmann, A. et al. (2023). Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets. arXiv:2311.15127.
- [12] Bar-Tal, O. et al. (2024). Lumiere: A Space-Time Diffusion Model for Video Generation. arXiv:2401.12945.
- [13] AWWA (2022). US daily household indoor water use baseline: 341 L/day.
- [14] Jegham, I. et al. (2025). Empirical energy benchmarking of 30 LLMs (Gemini 0.24 Wh/query). arXiv preprint.
- [15] Google (2026). Ironwood TPU benchmarks and environmental disclosures. Google Environmental Reports.