From Andrew A. Chien, Liuzixuan Lin, Hai Nguyen, Varsha Rao, Tristan Sharma, and Rajini Wijayawardana. 2023. Reducing the Carbon Impact of Generative AI Inference (today and in 2035). In 2nd Workshop on Sustainable Computer Systems (HotCarbon '23), July 9, 2023, Boston, MA, USA. ACM, New York, NY, USA, 7 pages.
$P_{DC}$ = 0.428 kW per GPU (1/8 of 3.43 kW for the instance) × 1.1 PUE
$T_I$ = 0.35 TFLOPs per inference, assuming a GPT-3-scale model (around 175 billion weights, ≈ 2 FLOPs per weight = 0.35 TFLOPs) processed with BF16 operations
$I_W$ = 5 is the number of inferences per output word (an assumed window/sampling of 5 for each output word)
$W_C$ is the output word count (measured average of 185 output words per request)
$C$ = 156 TFLOPS is the GPU capacity, assuming 50% efficiency (half of the A100's 312 TFLOPS BF16 peak)
$E_{hw}$ is the per-GPU embodied emission, calculated as 1/8 of the estimated per-instance emissions: $E_{hw} = \frac{1}{8}(PF + E_{GPU} + E_{CPU} + E_{DRAM} + E_{SSD} + E_{HDD})$, where $PF$ is the IC packaging carbon footprint and $E_{GPU}$, $E_{CPU}$, $E_{DRAM}$, $E_{SSD}$, and $E_{HDD}$ are the GPU, CPU, memory, and storage (SSD and HDD) emissions, respectively. We estimate these emissions based on previous reports [26] and instance hardware specifications [1, 3, 11], yielding $E_{hw}$ = 318 kgCO2 per GPU.
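Assembling these definitions, the implied operational energy per request is $E_{req} = P_{DC} \cdot \frac{W_C \cdot I_W \cdot T_I}{C}$. The Python sketch below runs that arithmetic, reading $P_{DC}$ as 0.428 kW × 1.1 PUE ≈ 0.472 kW; the grid carbon intensity and GPU lifetime used for the illustrative carbon figures are my assumptions, not values from the paper.

```python
# Sketch: per-request inference energy and carbon from the parameters above.
# CI and the 5-year lifetime are assumed for illustration, not from the source.

P_DC = (3.43 / 8) * 1.1   # kW per GPU incl. 1.1 PUE  (~0.472 kW)
T_I  = 0.35               # TFLOPs per inference (GPT-3-scale, BF16)
I_W  = 5                  # inferences per output word
W_C  = 185                # measured average output words per request
C    = 156                # effective GPU TFLOPS (50% of A100 BF16 peak)

# GPU-seconds of compute per request
t_req = W_C * I_W * T_I / C                # ~2.08 s

# Operational energy per request (kWh)
e_req = P_DC * t_req / 3600                # ~2.7e-4 kWh (~0.27 Wh)

# Illustrative operational carbon at an assumed grid intensity
CI = 400                                   # gCO2/kWh (assumption)
print(f"{t_req:.2f} s/request, {e_req*1000:.3f} Wh/request, "
      f"{e_req*CI:.3f} gCO2/request operational")

# Embodied carbon amortized over an assumed 5-year life at full utilization
E_hw = 318_000                             # gCO2 per GPU (318 kgCO2, from above)
lifetime_s = 5 * 365 * 24 * 3600
print(f"{E_hw * t_req / lifetime_s * 1000:.2f} mgCO2/request embodied")
```

At these assumed values the embodied share is a few milligrams per request, small next to the ~0.1 g operational figure; both scale linearly with $W_C$, $I_W$, and $T_I$.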
Data for A100:
Using the manufacturing water use formula:
Note: this doesn't include the memory chips that are also on the A100… need to find a source for the water use there