From Andrew A. Chien, Liuzixuan Lin, Hai Nguyen, Varsha Rao, Tristan Sharma, and Rajini Wijayawardana. 2023. Reducing the Carbon Impact of Generative AI Inference (today and in 2035). In 2nd Workshop on Sustainable Computer Systems (HotCarbon '23), July 9, 2023, Boston, MA, USA. ACM, New York, NY, USA, 7 pages.

P_GPU = 0.428 kW per GPU (1/8 of the 3.43 kW instance power) × 1.1 PUE
T_I = 0.35 TFLOPs per inference, assuming a GPT-3-scale model (around 175 billion weights) processed with BF16 operations
I_W = 5 is the number of inferences per output word (assumed window/sampling of 5 for each output word)
W_C is the output word count (measured average of 185 output words per request)
C = 156 TFLOPS is the GPU capacity, assuming 50% efficiency
E_hw is the per-GPU embodied emission, calculated as 1/8 of the estimated per-instance emissions:
E_hw = (1/8) × (PF + E_GPU + E_CPU + E_DRAM + E_SSD + E_HDD)
where PF is the IC packaging carbon footprint, and E_GPU, E_CPU, E_DRAM, E_SSD, and E_HDD are the GPU, CPU, memory (DRAM), and storage (SSD and HDD) emissions, respectively. We estimate these emissions based on previous reports [26] and the instance hardware specifications [1, 3, 11], yielding E_hw = 318 kgCO2 per GPU.
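As a sanity check on the constants above, here is a minimal Python sketch of the implied per-request GPU time and operational energy. It assumes per-request GPU time is T_I × I_W × W_C / C and that operational energy is that time multiplied by P_GPU; the variable names simply mirror the definitions above.

# Minimal sketch: per-request GPU time and operational energy implied by
# the constants above. Assumes time/request = T_I * I_W * W_C / C and
# energy = time * P_GPU (per-GPU power, PUE included).
P_GPU = 0.428  # kW per GPU, including the 1.1 PUE factor
T_I = 0.35     # TFLOPs per inference (GPT-3-scale model, BF16)
I_W = 5        # inferences per output word
W_C = 185      # measured average output words per request
C = 156        # TFLOPS effective GPU capacity (50% efficiency)

seconds_per_request = T_I * I_W * W_C / C              # ~2.08 s
kwh_per_request = P_GPU * seconds_per_request / 3600   # kW * s -> kWh
print(f"{seconds_per_request:.2f} s/request, "
      f"{kwh_per_request * 1000:.3f} Wh/request")
# -> 2.08 s/request, 0.247 Wh/request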
Water consumption per wafer mask layer, from the TSMC 2022 ESG report: 137.3 L per 12-inch-equivalent wafer mask layer
(water use per chip) = (water use per wafer mask layer) × (mask layers per wafer) / (chips per wafer)
(water use per A100) = (137.3 L/layer) × (87 layers/wafer) / (29 chips/wafer) = 411.9 L/chip
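The same arithmetic as a short Python sketch, using the per-layer, layer-count, and die-count values assumed above (the variable names are just illustrative):

# Water use per A100 die, from the per-layer figure above.
water_per_layer_l = 137.3  # L per 12-inch-equivalent wafer mask layer (TSMC 2022 ESG)
mask_layers = 87           # mask layers assumed for the A100 process
chips_per_wafer = 29       # A100 dies assumed per 12-inch wafer

water_per_chip_l = water_per_layer_l * mask_layers / chips_per_wafer
print(f"{water_per_chip_l:.1f} L per A100 die")  # -> 411.9 L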
Note: this doesn't include the memory chips that are also on the A100… need to find a source for the water use there