Methodology for calculating the water, energy, and embodied emissions of a server cluster based on time
A server cluster is a group of servers in a single datacenter or cloud platform. Aggregating the emissions of its servers turns the cluster into a single logical entity that can be used to model how software uses it. The key simplifying assumption is that utilization is uniform across the cluster: every server runs at the same GPU and CPU utilization. Modeling the relationships between different components of a computing system requires a higher-level abstraction.
As an example of what constitutes an AI cluster, Meta has documented its genAI infrastructure, which is a useful illustration of what a scaled, purpose-built training cluster looks like.
A cluster is defined by:
A server is defined by:
A datacenter is defined by:
Component | Disclosed data |
---|---|
GPU | Nvidia A100 80GB |
Server | HPE Apollo 6500 Gen10 Plus |
Number of GPUs | 384 |
Number of servers | 48 |
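The disclosed counts imply one derived value the methodology relies on: the number of GPUs per server. The Python sketch below computes it under the uniform-cluster assumption stated above. `ClusterSpec` and its field names are hypothetical and not part of the methodology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Disclosed hardware inventory of a cluster (hypothetical field names)."""
    gpu_model: str
    server_model: str
    num_gpus: int
    num_servers: int

    @property
    def gpus_per_server(self) -> int:
        # Uniform-cluster assumption: every server carries the same number of GPUs.
        if self.num_gpus % self.num_servers:
            raise ValueError("GPU count does not divide evenly across servers")
        return self.num_gpus // self.num_servers


example = ClusterSpec(
    gpu_model="Nvidia A100 80GB",
    server_model="HPE Apollo 6500 Gen10 Plus",
    num_gpus=384,
    num_servers=48,
)
print(example.gpus_per_server)
```

For the disclosed cluster this yields 384 / 48 = 8 GPUs per server.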
The cluster methodology produces the following outputs:
According to LLMCarbon: Modeling the End-to-end Carbon Footprint of Large Language Models, the embodied carbon of a chip can be estimated from its die area: “The Carbon emitted Per unit Area (CPA) is contingent on various semiconductor fabrication parameters, including yield, energy consumption per unit area during manufacturing, emissions from chemicals utilized in hardware production, and emissions associated with raw material sourcing for fabrication.”
These are the representative values reported in the paper. Summing the embodied emissions of all components of the technical infrastructure used to train or operate a model gives its total embodied emissions.
Hardware | Process | Area / capacity | CPA |
---|---|---|---|
CPU | TSMC 16nm | 147 mm² | 1 kgCO2/cm² |
DRAM | Micron 18nm | 256 GB | 0.4 kgCO2/GB |
SSD | Samsung 20nm | 32 TB | 0.018 kgCO2/GB |
TPUv3 | TSMC 16nm | 700 mm² | 1 kgCO2/cm² |
TPUv4 | TSMC 7nm | 400 mm² | 1.6 kgCO2/cm² |
V100 | TSMC 12nm | 815 mm² | 1.2 kgCO2/cm² |
H100 | TSMC 4nm | 814 mm² | 1.8 kgCO2/cm² |
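Applying the table takes one multiplication per component: a die's embodied carbon is its area in cm² times its CPA, while the DRAM and SSD rates are per GB of capacity. The Python sketch below sums these for one server. The server configuration, the decimal conversion 1 TB = 1,000 GB, and the straight-line amortization over an assumed 5-year lifetime (used to take a time-based share) are illustrative assumptions, not values from LLMCarbon.

```python
# Embodied-carbon rates copied from the table above.
DIE_AREA_MM2 = {"CPU": 147, "TPUv3": 700, "TPUv4": 400, "V100": 815, "H100": 814}
CPA_KG_PER_CM2 = {"CPU": 1.0, "TPUv3": 1.0, "TPUv4": 1.6, "V100": 1.2, "H100": 1.8}
DRAM_KG_PER_GB = 0.4
SSD_KG_PER_GB = 0.018


def chip_embodied_kg(chip: str) -> float:
    """Embodied carbon of one die: area (cm²) × CPA (kgCO2/cm²)."""
    area_cm2 = DIE_AREA_MM2[chip] / 100.0  # 100 mm² = 1 cm²
    return area_cm2 * CPA_KG_PER_CM2[chip]


def server_embodied_kg(accelerator: str, n_accel: int, n_cpu: int,
                       dram_gb: float, ssd_gb: float) -> float:
    """Sum the embodied carbon of every component in one server."""
    return (n_accel * chip_embodied_kg(accelerator)
            + n_cpu * chip_embodied_kg("CPU")
            + dram_gb * DRAM_KG_PER_GB
            + ssd_gb * SSD_KG_PER_GB)


def amortized_kg(total_kg: float, hours_used: float, lifetime_hours: float) -> float:
    """Time-based share, assuming straight-line amortization over the hardware lifetime."""
    return total_kg * hours_used / lifetime_hours


# Hypothetical server: 8 H100s, 2 CPUs, 256 GB DRAM, 32 TB (32,000 GB) SSD.
server_kg = server_embodied_kg("H100", n_accel=8, n_cpu=2, dram_gb=256, ssd_gb=32_000)
# Hypothetical 30-day job on hardware with an assumed 5-year lifetime.
job_kg = amortized_kg(server_kg, hours_used=30 * 24, lifetime_hours=5 * 365 * 24)
print(f"server: {server_kg:.1f} kgCO2, 30-day share: {job_kg:.2f} kgCO2")
```

Multiplying `server_kg` by the number of servers gives the cluster total. In this hypothetical configuration the SSD dominates (32,000 GB × 0.018 = 576 kgCO2), which shows why storage capacity should not be left out of the aggregation.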
The energy calculation uses derived data from the cluster definition:
The energy use E of the cluster, as a function of the GPU utilization G and the CPU utilization C, is:
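Note that this is IT energy: multiply it by the datacenter PUE to obtain facility energy, and by the datacenter WUE (liters per kWh of IT energy) to obtain operational water use.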
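A minimal Python sketch of the energy step, assuming a linear power model in which each server draws a load-independent baseline plus GPU and CPU power that scale linearly with the utilizations G and C. `ServerPower`, its fields, and every numeric value (watts, PUE, WUE) are hypothetical placeholders, not disclosed figures. IT energy is multiplied by PUE for facility energy and by WUE for operational water.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerPower:
    """Hypothetical power parameters for one server, in watts."""
    base_w: float     # load-independent draw (fans, NICs, memory, ...)
    gpus: int
    gpu_max_w: float  # incremental power of one GPU at 100% utilization
    cpus: int
    cpu_max_w: float  # incremental power of one CPU at 100% utilization


def cluster_it_energy_kwh(server: ServerPower, num_servers: int,
                          gpu_util: float, cpu_util: float, hours: float) -> float:
    """IT energy E of a uniformly utilized cluster under the assumed linear model."""
    for u in (gpu_util, cpu_util):
        if not 0.0 <= u <= 1.0:
            raise ValueError("utilization must be in [0, 1]")
    per_server_w = (server.base_w
                    + gpu_util * server.gpus * server.gpu_max_w
                    + cpu_util * server.cpus * server.cpu_max_w)
    return per_server_w * num_servers * hours / 1000.0  # Wh -> kWh


def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Facility energy = IT energy × PUE."""
    return it_energy_kwh * pue


def operational_water_l(it_energy_kwh: float, wue_l_per_kwh: float) -> float:
    """Operational water = IT energy × WUE."""
    return it_energy_kwh * wue_l_per_kwh


# Placeholder values for a 48-server cluster with 8 GPUs and 2 CPUs per server.
server = ServerPower(base_w=500, gpus=8, gpu_max_w=400, cpus=2, cpu_max_w=280)
it_kwh = cluster_it_energy_kwh(server, num_servers=48, gpu_util=1.0, cpu_util=0.5, hours=1.0)
print(it_kwh, facility_energy_kwh(it_kwh, pue=1.2), operational_water_l(it_kwh, wue_l_per_kwh=1.8))
```

Keeping PUE and WUE as separate multipliers on the same IT energy avoids the common mistake of applying WUE to facility energy.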
The energy use of one GPU-hour, assuming 100% GPU utilization and no incremental CPU utilization, is:
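As an illustration, the Python sketch below computes a GPU-hour assuming a linear power model in which each GPU is charged its full incremental power plus an equal share of its server's load-independent baseline power. The function name and the placeholder wattages are hypothetical.

```python
def gpu_hour_energy_kwh(server_base_w: float, gpus_per_server: int, gpu_max_w: float) -> float:
    """IT energy of one GPU-hour at 100% GPU and zero incremental CPU utilization.

    Assumed model: the GPU's full incremental power plus an equal share of
    the server's load-independent baseline power, for one hour.
    """
    per_gpu_w = server_base_w / gpus_per_server + gpu_max_w
    return per_gpu_w * 1.0 / 1000.0  # one hour, Wh -> kWh


# Placeholder values: 500 W server baseline, 8 GPUs per server, 400 W per GPU.
it_kwh = gpu_hour_energy_kwh(server_base_w=500, gpus_per_server=8, gpu_max_w=400)
facility_kwh = it_kwh * 1.2  # placeholder PUE
print(it_kwh, facility_kwh)
```

Because the baseline is split evenly, eight such GPU-hours add up to exactly one hour of a fully GPU-loaded server at zero CPU load (here 8 × 0.4625 = 3.7 kWh), so per-GPU and per-cluster accounting stay consistent.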
The embodied water use of the CPU, GPU, and memory chips can be derived from manufacturer sustainability reporting or industry averages, generally based on die size. See NVIDIA A100 as an example.
Using:
The embodied water use is:
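A minimal Python sketch of an area-based embodied water estimate, mirroring the CPA approach: each chip contributes count × die area × a water intensity per unit area. `ChipWater`, its fields, and the intensity and area values are hypothetical placeholders; real intensities would come from manufacturer sustainability reporting or industry averages.

```python
from typing import Iterable, NamedTuple


class ChipWater(NamedTuple):
    name: str
    count: int
    die_area_mm2: float
    water_l_per_cm2: float  # hypothetical embodied water intensity per die area


def embodied_water_l(chips: Iterable[ChipWater]) -> float:
    """Sum count × area (cm²) × water intensity (L/cm²) over all chips."""
    total = 0.0
    for chip in chips:
        total += chip.count * (chip.die_area_mm2 / 100.0) * chip.water_l_per_cm2
    return total


# Placeholder server: 8 GPU dies and 2 CPU dies with an assumed 10 L/cm².
server_chips = [
    ChipWater("GPU", count=8, die_area_mm2=800, water_l_per_cm2=10.0),
    ChipWater("CPU", count=2, die_area_mm2=150, water_l_per_cm2=10.0),
]
total_l = embodied_water_l(server_chips)
print(total_l)
```

Memory chips with a per-GB rather than per-area intensity would need a separate capacity-based term, as in the embodied carbon table.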