Server cluster
Methodology for calculating the water use, energy use, and embodied emissions of a server cluster as a function of time
Overview
A server cluster is a group of servers in a single datacenter or cloud platform. Calculating the aggregate emissions of the cluster creates a logical entity that can be used to model how software uses the cluster. Note that a cluster is assumed to have uniform utilization; modeling the relationships between different components of a computing system requires a higher-level abstraction.
As an example of what constitutes an AI cluster, Meta has documented their genAI infrastructure, which illustrates what a scaled, purpose-built training cluster looks like.
Inputs: Defining a cluster
A cluster is defined by:
- Number of servers/instances in the cluster (if static)
- Cloud instance type or server details (see below)
- Cloud region or datacenter details (see below)
Server details
A server is defined by:
- CPU manufacturer and model
- GPU manufacturer and model (see gpu specs)
- Memory in GB
- Number of CPUs
- Number of GPUs
Datacenter details
A datacenter is defined by:
- PUE
- WUE
- Grid region
- On-site or dedicated renewable energy by hour
- Embodied emissions of overhead equipment (racks, networking gear, etc.) per server-hour; the usage energy of this equipment is already included in PUE. For an illustration, see the tour of a Meta datacenter.
Example
Component | Disclosed data |
---|---|
GPU | Nvidia A100 80GB |
Server | HPE Apollo 6500 Gen10 Plus |
Number of GPUs | 384 |
Number of servers | 48 |
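The cluster definition above can be captured in a small data structure. The following is a minimal sketch, not part of the methodology itself: the class and field names (`Server`, `Cluster`, `gpus_per_server`) are hypothetical, and the figure of 8 GPUs per server is derived here by dividing the disclosed 384 GPUs by the disclosed 48 servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical container for the disclosed server details."""
    model: str
    gpu_model: str
    gpus_per_server: int


@dataclass(frozen=True)
class Cluster:
    """A static cluster: identical servers, uniform utilization assumed."""
    server: Server
    num_servers: int

    @property
    def num_gpus(self) -> int:
        # Total GPUs is servers times GPUs per server.
        return self.server.gpus_per_server * self.num_servers


# Values from the example table; 384 // 48 GPUs per server is derived, not disclosed.
example = Cluster(
    server=Server(
        model="HPE Apollo 6500 Gen10 Plus",
        gpu_model="Nvidia A100 80GB",
        gpus_per_server=384 // 48,
    ),
    num_servers=48,
)
```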
Outputs: Calculating cluster impact
The cluster methodology produces the following outputs:
- Embodied emissions per hour reserved
- Manufacturing water consumption per hour reserved
- Usage energy coefficients for the cluster energy equation (see Energy use):
  - idle cluster power
  - net CPU TDP (CPU max power - CPU idle power)
  - net GPU TDP (GPU max power - GPU idle power)
  - number of CPUs
  - number of GPUs
- Peak throughput-α (as described by OpenCarbonEval)
- Peak TFLOPs/s
Embodied emissions
From LLMCarbon: Modeling the End-to-end Carbon Footprint of Large Language Models, the embodied carbon from a chip can be estimated based on its area: “The Carbon emitted Per unit Area (CPA) is contingent on various semiconductor fabrication parameters, including yield, energy consumption per unit area during manufacturing, emissions from chemicals utilized in hardware production, and emissions associated with raw material sourcing for fabrication.”
The table below lists the representative values shared by the article. By aggregating all of the components of the technical infrastructure used to train or operate a model, the total embodied emissions can be calculated.
Hardware | Description | Size | CPA |
---|---|---|---|
CPU | TSMC 16nm | 147 mm² | 1 kgCO2/cm² |
DRAM | Micron 18nm | 256 GB | 0.4 kgCO2/GB |
SSD | Samsung 20nm | 32 TB | 0.018 kgCO2/GB |
TPUv3 | TSMC 16nm | 700 mm² | 1 kgCO2/cm² |
TPUv4 | TSMC 7nm | 400 mm² | 1.6 kgCO2/cm² |
V100 | TSMC 12nm | 815 mm² | 1.2 kgCO2/cm² |
H100 | TSMC 4nm | 814 mm² | 1.8 kgCO2/cm² |
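As a worked illustration of how the table is applied, the sketch below multiplies die area (converted from mm² to cm²) or capacity by the corresponding CPA value. The function names are hypothetical; the inputs are taken directly from the H100 and DRAM rows.

```python
def chip_embodied_kgco2(area_mm2: float, cpa_kg_per_cm2: float) -> float:
    """Embodied carbon of a logic chip: die area (cm²) times CPA (kgCO2/cm²)."""
    area_cm2 = area_mm2 / 100.0  # 1 cm² = 100 mm²
    return area_cm2 * cpa_kg_per_cm2


def capacity_embodied_kgco2(capacity_gb: float, cpa_kg_per_gb: float) -> float:
    """Embodied carbon of memory or storage: capacity (GB) times CPA (kgCO2/GB)."""
    return capacity_gb * cpa_kg_per_gb


# H100 row: 814 mm² at 1.8 kgCO2/cm², roughly 14.65 kgCO2 per die.
h100_kg = chip_embodied_kgco2(814, 1.8)

# DRAM row: 256 GB at 0.4 kgCO2/GB, roughly 102.4 kgCO2.
dram_kg = capacity_embodied_kgco2(256, 0.4)
```

Summing these per-component values across every CPU, GPU, and memory module in a server gives the chip-level portion of that server's embodied emissions.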
Energy use
The energy calculation uses data derived from the cluster definition:
- The TDP of the GPU (provided by the manufacturer)
- The TDP of the CPU (provided by the manufacturer)
- The TDP of the memory (provided by the manufacturer)
- The idle power draw of the server (see Cloud Carbon Footprint for common cloud instances). This power draw should include NIC, SSD, and other components in the server. Boavizta has some tools to help model this.
The energy use of the cluster E based on the GPU utilization G and the CPU utilization C is:
Note that this must be multiplied by the datacenter PUE to obtain total facility energy, or by the WUE to obtain operational water use!
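A minimal sketch of this calculation follows. It assumes a linear form suggested by the usage energy coefficients listed under the outputs: idle cluster power, plus CPU utilization C times the number of CPUs times net CPU TDP, plus GPU utilization G times the number of GPUs times net GPU TDP. That form is an assumption, as are the function names, the treatment of memory power as part of idle power, and the numeric inputs, which are placeholders rather than measured values.

```python
def cluster_power_w(idle_w, n_cpus, net_cpu_tdp_w, n_gpus, net_gpu_tdp_w,
                    cpu_util, gpu_util):
    """IT power of the cluster in watts under the assumed linear model."""
    for name, u in (("cpu_util", cpu_util), ("gpu_util", gpu_util)):
        if not 0.0 <= u <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {u}")
    return (idle_w
            + cpu_util * n_cpus * net_cpu_tdp_w
            + gpu_util * n_gpus * net_gpu_tdp_w)


def facility_energy_kwh(it_power_w, hours, pue):
    """IT energy (kWh) scaled by PUE to include datacenter overhead."""
    return it_power_w * hours / 1000.0 * pue


def operational_water_l(it_power_w, hours, wue_l_per_kwh):
    """IT energy (kWh) times WUE, with WUE in liters per kWh of IT energy."""
    return it_power_w * hours / 1000.0 * wue_l_per_kwh


# Placeholder inputs for illustration only.
power_w = cluster_power_w(idle_w=1000.0, n_cpus=2, net_cpu_tdp_w=100.0,
                          n_gpus=8, net_gpu_tdp_w=300.0,
                          cpu_util=0.5, gpu_util=1.0)
energy_kwh = facility_energy_kwh(power_w, hours=1.0, pue=1.2)
```

With these placeholder inputs the cluster draws 3500 W of IT power, which over one hour at a PUE of 1.2 comes to about 4.2 kWh of facility energy.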
Energy per GPU-hour
The energy use for one GPU hour assuming 100% GPU and no incremental CPU would be:
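One way to sketch this, assuming the idle power of the cluster is apportioned evenly across its GPUs (an assumption made here, not stated in the methodology), is to add each GPU's share of idle power to its net GPU TDP and integrate over one hour. The function name and the numeric inputs are hypothetical placeholders.

```python
def energy_per_gpu_hour_kwh(idle_w, n_gpus, net_gpu_tdp_w, pue=1.0):
    """Energy for one GPU-hour at 100% GPU utilization and no incremental CPU.

    Assumes idle cluster power is shared evenly by all GPUs in the cluster.
    """
    if n_gpus <= 0:
        raise ValueError("n_gpus must be positive")
    watts_per_gpu = idle_w / n_gpus + net_gpu_tdp_w
    # One hour of operation; divide by 1000 to convert Wh to kWh.
    return watts_per_gpu * 1.0 / 1000.0 * pue


# Placeholder inputs: 1000 W idle, 8 GPUs, 300 W net TDP per GPU.
gpu_hour_kwh = energy_per_gpu_hour_kwh(idle_w=1000.0, n_gpus=8,
                                       net_gpu_tdp_w=300.0)
```

Here each GPU carries 125 W of idle power plus 300 W of net TDP, giving 0.425 kWh per GPU-hour before PUE is applied.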
Embodied emissions per hour reserved
Using:
- The embodied emissions of the server (see Towards Green AI for an example PCF)
- The embodied emissions of the GPU
- The projected service life of the server (up to 6 years for cloud platforms, though 4 years is suggested for AI instances given the pace of change)
- The projected utilization of the servers, noting that utilization means “time reserved” not “time active”
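These inputs can be combined into a per-hour figure by amortizing total embodied emissions over the hours the server is expected to be reserved. The sketch below is one plausible reading, assuming the server figure excludes its GPUs and that reserved hours equal service life times utilization; the function name and all numeric inputs are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def embodied_kgco2_per_reserved_hour(server_kgco2, gpu_kgco2, gpus_per_server,
                                     life_years=4.0, utilization=1.0):
    """Amortize server plus GPU embodied emissions over reserved hours.

    utilization is the fraction of the service life that is reserved,
    not the fraction during which the hardware is actively computing.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    total_kg = server_kgco2 + gpus_per_server * gpu_kgco2
    reserved_hours = life_years * HOURS_PER_YEAR * utilization
    return total_kg / reserved_hours


# Placeholder inputs: 1000 kgCO2 server, 8 GPUs at 150 kgCO2 each,
# 4-year life, reserved 80% of the time.
per_hour = embodied_kgco2_per_reserved_hour(1000.0, 150.0, 8,
                                            life_years=4.0, utilization=0.8)
```

Lower projected utilization raises the per-hour figure, since the same embodied total is spread over fewer reserved hours.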
Embodied water use
The embodied water use of the CPU, GPU, and memory chips can be derived from manufacturer sustainability reporting or industry averages, generally based on die size. See NVIDIA A100 as an example.
Using:
- The manufacturing water use of the CPU
- The manufacturing water use of the GPU
- The manufacturing water use of the memory chips
The embodied water use is:
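A simple sketch, assuming the embodied water use of a server is the sum of the manufacturing water of its chips, with per-chip values multiplied by the chip counts from the server definition. The function name, the additive form, and the numeric inputs are assumptions for illustration.

```python
def server_embodied_water_l(cpu_water_l, n_cpus, gpu_water_l, n_gpus,
                            memory_water_l):
    """Total manufacturing water (liters) for one server's chips.

    cpu_water_l and gpu_water_l are per-chip values; memory_water_l is the
    total for all memory chips in the server.
    """
    return n_cpus * cpu_water_l + n_gpus * gpu_water_l + memory_water_l


# Placeholder per-chip values in liters, not manufacturer data.
water_l = server_embodied_water_l(cpu_water_l=500.0, n_cpus=2,
                                  gpu_water_l=1500.0, n_gpus=8,
                                  memory_water_l=2000.0)
```

To express this as manufacturing water consumption per hour reserved, the total can be divided by the reserved hours over the server's service life, in the same way as embodied emissions.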