Overview

A server cluster is a group of servers in a single datacenter or cloud platform. Aggregating emissions at the cluster level creates a logical entity that can be used to model how software uses the cluster. A key assumption is that utilization is uniform across the cluster. Modeling the relationships between the different components of a computing system requires this higher-level abstraction.

As an example of what constitutes an AI cluster, Meta has documented its genAI infrastructure, which serves as a good illustration of a scaled, purpose-built training cluster.

Inputs: Defining a cluster

A cluster is defined by:

  • Number of servers/instances in the cluster (if static)
  • Cloud instance type or server details (see below)
  • Cloud region or datacenter details (see below)
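
As a rough sketch, these inputs could be captured in a small data structure like the one below. All names here (ClusterDefinition, num_servers, and so on) are illustrative choices, not part of the methodology; ServerDetails and DatacenterDetails are sketched after the corresponding lists below.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ClusterDefinition:
    """Illustrative container for the cluster inputs listed above."""
    num_servers: int                                  # number of servers/instances (if static)
    server: Optional["ServerDetails"] = None          # server details, sketched below
    datacenter: Optional["DatacenterDetails"] = None  # datacenter details, sketched below
    instance_type: Optional[str] = None               # cloud instance type, if on a cloud platform
    cloud_region: Optional[str] = None                # cloud region, if on a cloud platform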

Server details

A server is defined by:

  • CPU manufacturer and model
  • GPU manufacturer and model (see gpu specs)
  • Memory in GB
  • Number of CPUs
  • Number of GPUs
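
A matching sketch of the server definition as a data structure; field names and the example values in the comments are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ServerDetails:
    """Illustrative server definition matching the bullets above."""
    cpu_manufacturer: str   # e.g. "AMD"
    cpu_model: str          # e.g. "EPYC 7763"
    gpu_manufacturer: str   # e.g. "Nvidia"
    gpu_model: str          # e.g. "A100 80GB"
    memory_gb: int          # installed memory in GB
    num_cpus: int           # CPUs (sockets) per server
    num_gpus: int           # GPUs per server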

Datacenter details

A datacenter is defined by:

  • PUE
  • WUE
  • Grid region
  • On-site or dedicated renewable energy by hour
  • Overhead equipment (racks, networking gear, etc.) embodied emissions per server-hour (its usage energy is already included in PUE) - see this cool tour of a Meta datacenter
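
And a corresponding sketch for the datacenter inputs; again, the names and units are assumptions made for illustration.

from dataclasses import dataclass
from typing import List

@dataclass
class DatacenterDetails:
    """Illustrative datacenter definition matching the bullets above."""
    pue: float                          # Power Usage Effectiveness, e.g. 1.1
    wue: float                          # Water Usage Effectiveness, in L/kWh
    grid_region: str                    # identifier used to look up grid carbon intensity
    hourly_renewables_kwh: List[float]  # on-site or dedicated renewable energy by hour
    overhead_embodied_kgco2_per_server_hour: float  # racks, networking gear, etc.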

Example

Component           Disclosed data
GPU                 Nvidia A100 80GB
Server              HPE Apollo 6500 Gen10 Plus
Number of GPUs      384
Number of servers   48
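
Using the hypothetical dataclasses sketched above, the disclosed values could be recorded as follows; fields not shown in the table are marked as unknown rather than guessed.

# Hypothetical encoding of the disclosed example above.
example_server = ServerDetails(
    cpu_manufacturer="unknown",   # not disclosed
    cpu_model="unknown",          # not disclosed
    gpu_manufacturer="Nvidia",
    gpu_model="A100 80GB",
    memory_gb=0,                  # not disclosed
    num_cpus=0,                   # not disclosed
    num_gpus=384 // 48,           # 384 GPUs across 48 servers = 8 GPUs per server
)
example_cluster = ClusterDefinition(num_servers=48, server=example_server)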

Outputs: Calculating cluster impact

The cluster methodology produces the following outputs:

  • Embodied emissions per hour reserved
  • Manufacturing water consumption per hour reserved
  • Usage energy coefficients, per the energy equation below:
    • idle cluster power
    • net CPU TDP (CPU max power - CPU idle power)
    • net GPU TDP (GPU max power - GPU idle power)
    • number of CPUs
    • number of GPUs
  • Peak throughput-α (as described by OpenCarbonEval)
  • Peak TFLOP/s
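
These outputs could be collected in a simple record like the sketch below; all names are assumptions chosen for illustration.

from dataclasses import dataclass

@dataclass
class ClusterImpact:
    """Illustrative record of the cluster outputs listed above."""
    embodied_kgco2_per_hour_reserved: float
    manufacturing_water_l_per_hour_reserved: float
    # usage energy coefficients for the energy equation below
    idle_cluster_power_w: float
    net_cpu_tdp_w: float           # CPU max power - CPU idle power
    net_gpu_tdp_w: float           # GPU max power - GPU idle power
    num_cpus: int
    num_gpus: int
    peak_throughput_alpha: float   # as described by OpenCarbonEval
    peak_tflops: float             # peak TFLOP/s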

Embodied emissions

From LLMCarbon: Modeling the End-to-end Carbon Footprint of Large Language Models, the embodied carbon from a chip can be estimated based on its area: “The Carbon emitted Per unit Area (CPA) is contingent on various semiconductor fabrication parameters, including yield, energy consumption per unit area during manufacturing, emissions from chemicals utilized in hardware production, and emissions associated with raw material sourcing for fabrication.”

These are the representative values shared by the article. By aggregating the embodied emissions of all of the components of the technical infrastructure used to train or operate a model, the total embodied emissions can be calculated.

Hardware   Description    Unit      CPA
CPU        TSMC 16nm      147 mm²   1 kgCO2/cm²
DRAM       Micron 18nm    256 GB    0.4 kgCO2/GB
SSD        Samsung 20nm   32 TB     0.018 kgCO2/GB
TPUv3      TSMC 16nm      700 mm²   1 kgCO2/cm²
TPUv4      TSMC 7nm       400 mm²   1.6 kgCO2/cm²
V100       TSMC 12nm      815 mm²   1.2 kgCO2/cm²
H100       TSMC 4nm       814 mm²   1.8 kgCO2/cm²
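
As a worked example of the area-based estimate, the small helper below (name and signature are illustrative) applies a CPA value to a die area, shown with the H100 row from the table.

def embodied_carbon_from_area(die_area_mm2: float, cpa_kgco2_per_cm2: float) -> float:
    """Estimate embodied carbon (kgCO2) of a chip from its die area and CPA."""
    die_area_cm2 = die_area_mm2 / 100.0  # 100 mm² per cm²
    return die_area_cm2 * cpa_kgco2_per_cm2

# H100 row from the table: 814 mm² at 1.8 kgCO2/cm² ≈ 14.7 kgCO2 per chip
h100_embodied = embodied_carbon_from_area(814, 1.8)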

Energy use

The energy calculation uses derived data from the cluster definition:

  • The TDP of the GPU (provided by the manufacturer)
  • The TDP of the CPU (provided by the manufacturer)
  • The TDP of the memory (provided by the manufacturer)
  • The idle power draw of the server (see Cloud Carbon Footprint for common cloud instances). This power draw should include NIC, SSD, and other components in the server. Boavizta has some tools to help model this.

The energy use of the cluster E, in watt-hours per hour of operation, given the GPU utilization G and the CPU utilization C (both expressed as percentages), is:

E(G,C) = (idle cluster power) + (memory TDP)
         + (C/100) x (net CPU TDP) x (number of CPUs)
         + (G/100) x (net GPU TDP) x (number of GPUs)

Note that this must be multiplied by the datacenter PUE (for total facility energy) or WUE (for on-site water consumption)!
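
A minimal sketch of that calculation, assuming G and C are given as percentages and all power figures are in watts; the function and parameter names are illustrative, not part of the methodology.

def cluster_power_w(
    idle_cluster_power_w: float,
    memory_tdp_w: float,
    net_cpu_tdp_w: float,
    num_cpus: int,
    net_gpu_tdp_w: float,
    num_gpus: int,
    gpu_util_pct: float,   # G, 0-100
    cpu_util_pct: float,   # C, 0-100
    pue: float = 1.0,      # set to the datacenter PUE to include facility overhead
) -> float:
    """E(G, C): cluster power draw in watts, optionally scaled by PUE."""
    it_power = (
        idle_cluster_power_w
        + memory_tdp_w
        + (cpu_util_pct / 100.0) * net_cpu_tdp_w * num_cpus
        + (gpu_util_pct / 100.0) * net_gpu_tdp_w * num_gpus
    )
    return it_power * pue

To get water use instead, convert the IT power to kWh over the period of interest and multiply by WUE rather than PUE.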

Energy per GPU-hour

The energy use, in kWh, for one GPU-hour, assuming 100% GPU utilization and no incremental CPU load, would be:

E(gpu-hour) = E(100,0) / 1000 / (number of GPUs)
            = ((idle cluster power + memory TDP) / (number of GPUs) + (net GPU TDP)) / 1000
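
A sketch of the per-GPU-hour figure, reusing the assumed names above; the numbers in the usage line are placeholders for illustration only, not measured values.

def energy_per_gpu_hour_kwh(
    idle_cluster_power_w: float,
    memory_tdp_w: float,
    net_gpu_tdp_w: float,
    num_gpus: int,
) -> float:
    """kWh consumed per GPU-hour at 100% GPU utilization and no incremental CPU load."""
    return ((idle_cluster_power_w + memory_tdp_w) / num_gpus + net_gpu_tdp_w) / 1000.0

# Placeholder inputs: 10 kW idle cluster power, 2 kW memory TDP, 300 W net GPU TDP, 384 GPUs
example_kwh = energy_per_gpu_hour_kwh(10_000, 2_000, 300, 384)  # ≈ 0.33 kWh per GPU-hour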

Embodied emissions per hour

Using:

  • The embodied emissions of the server (see Towards Green AI for an example PCF)
  • The embodied emissions of the GPU
  • The projected use life of the server (up to 6 years for cloud platforms, though 4 years is suggested for AI instances given the pace of change)
  • The projected utilization of the servers, noting that utilization means “time reserved” not “time active”

The embodied emissions per hour reserved are:

EmbEm(h) = ((number of GPUs) x (GPU embodied emissions) +
            (number of servers) x (server embodied emissions))
           / (use life in hours)
           / (utilization)
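
A sketch of EmbEm(h) with illustrative names; here the use life is supplied in years and converted to hours, and utilization is the fraction of time the hardware is reserved.

def embodied_emissions_per_hour_kgco2(
    num_gpus: int,
    gpu_embodied_kgco2: float,
    num_servers: int,
    server_embodied_kgco2: float,
    use_life_years: float = 4.0,   # suggested 4 years for AI instances
    utilization: float = 1.0,      # fraction of time reserved
) -> float:
    """Embodied emissions amortized per reserved hour."""
    use_life_hours = use_life_years * 365 * 24
    total_embodied = num_gpus * gpu_embodied_kgco2 + num_servers * server_embodied_kgco2
    return total_embodied / use_life_hours / utilization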

Embodied water use

The embodied water use of the CPU, GPU, and memory chips can be derived from manufacturer sustainability reporting or industry averages, generally based on die size. See NVIDIA A100 as an example.

Using:

  • The manufacturing water use of the CPU
  • The manufacturing water use of the GPU
  • The manufacturing water use of the memory chips

The embodied water use is:

EmbH2O(h) = ((number of GPUs) x (water use per GPU) +
             (number of CPUs) x (water use per CPU) +
             (number of memory chips) x (water use per memory chip))
            / (use life in hours)
            / (utilization)

where the manufacturing water use per chip can be estimated as:

(manufacturing water use per chip) = (water use per mask layer, per wafer) x (number of mask layers) / (chips per wafer)
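
A corresponding sketch for the water figures, again with illustrative function and parameter names.

def water_per_chip_liters(
    water_per_mask_layer_per_wafer_l: float,
    num_mask_layers: int,
    chips_per_wafer: int,
) -> float:
    """Manufacturing water use attributed to a single chip."""
    return water_per_mask_layer_per_wafer_l * num_mask_layers / chips_per_wafer

def embodied_water_per_hour_liters(
    num_gpus: int, water_per_gpu_l: float,
    num_cpus: int, water_per_cpu_l: float,
    num_memory_chips: int, water_per_memory_chip_l: float,
    use_life_hours: float,
    utilization: float,
) -> float:
    """EmbH2O(h): manufacturing water use amortized per reserved hour."""
    total_water = (
        num_gpus * water_per_gpu_l
        + num_cpus * water_per_cpu_l
        + num_memory_chips * water_per_memory_chip_l
    )
    return total_water / use_life_hours / utilization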