Overview

A server cluster is a group of servers in a single datacenter or cloud platform. Aggregating emissions at the cluster level creates a logical entity that can be used to model how software uses the cluster. A key assumption is that utilization is uniform across the cluster. Modeling the relationships between the different components of a computing system requires this higher-level abstraction.

As an example of what constitutes an AI cluster, Meta has documented its genAI infrastructure, which serves as a good illustration of a scaled, purpose-built training cluster.

Inputs: Defining a cluster

A cluster is defined by:

  • Number of servers/instances in the cluster (if static)
  • Cloud instance type or server details (see below)
  • Cloud region or datacenter details (see below)
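
As a rough sketch, these inputs could be captured in a small data structure like the one below. All names here (ClusterDefinition, num_servers, and so on) are illustrative choices, not part of the methodology; ServerDetails and DatacenterDetails are sketched after the corresponding lists below.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ClusterDefinition:
    """Illustrative container for the cluster inputs listed above."""
    num_servers: int                                  # number of servers/instances (if static)
    server: Optional["ServerDetails"] = None          # server details, sketched below
    datacenter: Optional["DatacenterDetails"] = None  # datacenter details, sketched below
    instance_type: Optional[str] = None               # cloud instance type, if on a cloud platform
    cloud_region: Optional[str] = None                # cloud region, if on a cloud platform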

Server details

A server is defined by:

  • CPU manufacturer and model
  • GPU manufacturer and model (see gpu specs)
  • Memory in GB
  • Number of CPUs
  • Number of GPUs
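
A matching sketch of the server definition as a data structure; field names and the example values in the comments are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ServerDetails:
    """Illustrative server definition matching the bullets above."""
    cpu_manufacturer: str   # e.g. "AMD"
    cpu_model: str          # e.g. "EPYC 7763"
    gpu_manufacturer: str   # e.g. "Nvidia"
    gpu_model: str          # e.g. "A100 80GB"
    memory_gb: int          # installed memory in GB
    num_cpus: int           # CPUs (sockets) per server
    num_gpus: int           # GPUs per server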

Datacenter details

A datacenter is defined by:

  • PUE
  • WUE
  • Grid region
  • On-site or dedicated renewable energy by hour
  • Overhead equipment (racks, networking gear, etc.) embodied emissions per server-hour (its usage energy is already included in PUE) - see this cool tour of a Meta datacenter
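
And a corresponding sketch for the datacenter inputs; again, the names and units are assumptions made for illustration.

from dataclasses import dataclass
from typing import List

@dataclass
class DatacenterDetails:
    """Illustrative datacenter definition matching the bullets above."""
    pue: float                          # Power Usage Effectiveness, e.g. 1.1
    wue: float                          # Water Usage Effectiveness, in L/kWh
    grid_region: str                    # identifier used to look up grid carbon intensity
    hourly_renewables_kwh: List[float]  # on-site or dedicated renewable energy by hour
    overhead_embodied_kgco2_per_server_hour: float  # racks, networking gear, etc.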

Example

Component           Disclosed data
GPU                 Nvidia A100 80GB
Server              HPE Apollo 6500 Gen10 Plus
Number of GPUs      384
Number of servers   48
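
Using the hypothetical dataclasses sketched above, the disclosed values could be recorded as follows; fields not shown in the table are marked as unknown rather than guessed.

# Hypothetical encoding of the disclosed example above.
example_server = ServerDetails(
    cpu_manufacturer="unknown",   # not disclosed
    cpu_model="unknown",          # not disclosed
    gpu_manufacturer="Nvidia",
    gpu_model="A100 80GB",
    memory_gb=0,                  # not disclosed
    num_cpus=0,                   # not disclosed
    num_gpus=384 // 48,           # 384 GPUs across 48 servers = 8 GPUs per server
)
example_cluster = ClusterDefinition(num_servers=48, server=example_server)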

Outputs: Calculating cluster impact

The cluster methodology produces the following outputs:

  • Embodied emissions per hour reserved
  • Manufacturing water consumption per hour reserved
  • Usage energy coefficients, per the energy equation below:
    • idle cluster power
    • net CPU TDP (CPU max power - CPU idle power)
    • net GPU TDP (GPU max power - GPU idle power)
    • number of CPUs
    • number of GPUs
  • Peak throughput-α (as described by OpenCarbonEval)
  • Peak TFLOP/s
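
These outputs could be collected in a simple record like the sketch below; all names are assumptions chosen for illustration.

from dataclasses import dataclass

@dataclass
class ClusterImpact:
    """Illustrative record of the cluster outputs listed above."""
    embodied_kgco2_per_hour_reserved: float
    manufacturing_water_l_per_hour_reserved: float
    # usage energy coefficients for the energy equation below
    idle_cluster_power_w: float
    net_cpu_tdp_w: float           # CPU max power - CPU idle power
    net_gpu_tdp_w: float           # GPU max power - GPU idle power
    num_cpus: int
    num_gpus: int
    peak_throughput_alpha: float   # as described by OpenCarbonEval
    peak_tflops: float             # peak TFLOP/s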

Embodied emissions

From LLMCarbon: Modeling the End-to-end Carbon Footprint of Large Language Models, the embodied carbon from a chip can be estimated based on its area: “The Carbon emitted Per unit Area (CPA) is contingent on various semiconductor fabrication parameters, including yield, energy consumption per unit area during manufacturing, emissions from chemicals utilized in hardware production, and emissions associated with raw material sourcing for fabrication.”

These are the representative values shared by the article. By aggregating the embodied emissions of all of the components of the technical infrastructure used to train or operate a model, the total embodied emissions can be calculated.

Hardware   Description    Unit      CPA
CPU        TSMC 16nm      147 mm²   1 kgCO2/cm²
DRAM       Micron 18nm    256 GB    0.4 kgCO2/GB
SSD        Samsung 20nm   32 TB     0.018 kgCO2/GB
TPUv3      TSMC 16nm      700 mm²   1 kgCO2/cm²
TPUv4      TSMC 7nm       400 mm²   1.6 kgCO2/cm²
V100       TSMC 12nm      815 mm²   1.2 kgCO2/cm²
H100       TSMC 4nm       814 mm²   1.8 kgCO2/cm²
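
As a worked example of the area-based estimate, the small helper below (name and signature are illustrative) applies a CPA value to a die area, shown with the H100 row from the table.

def embodied_carbon_from_area(die_area_mm2: float, cpa_kgco2_per_cm2: float) -> float:
    """Estimate embodied carbon (kgCO2) of a chip from its die area and CPA."""
    die_area_cm2 = die_area_mm2 / 100.0  # 100 mm² per cm²
    return die_area_cm2 * cpa_kgco2_per_cm2

# H100 row from the table: 814 mm² at 1.8 kgCO2/cm² ≈ 14.7 kgCO2 per chip
h100_embodied = embodied_carbon_from_area(814, 1.8)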

Energy use

The energy calculation uses derived data from the cluster definition:

  • The TDP of the GPU (provided by the manufacturer)
  • The TDP of the CPU (provided by the manufacturer)
  • The TDP of the memory (provided by the manufacturer)
  • The idle power draw of the server (see Cloud Carbon Footprint for common cloud instances). This power draw should include NIC, SSD, and other components in the server. Boavizta has some tools to help model this.

The energy use of the cluster E, in watt-hours per hour of operation, given the GPU utilization G and the CPU utilization C (both expressed as percentages), is:

E(G,C) = (idle cluster power) + (memory TDP)
         + (C/100) x (net CPU TDP) x (number of CPUs)
         + (G/100) x (net GPU TDP) x (number of GPUs)

Note that this must be multiplied by the datacenter PUE (for total facility energy) or WUE (for on-site water consumption)!
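
A minimal sketch of that calculation, assuming G and C are given as percentages and all power figures are in watts; the function and parameter names are illustrative, not part of the methodology.

def cluster_power_w(
    idle_cluster_power_w: float,
    memory_tdp_w: float,
    net_cpu_tdp_w: float,
    num_cpus: int,
    net_gpu_tdp_w: float,
    num_gpus: int,
    gpu_util_pct: float,   # G, 0-100
    cpu_util_pct: float,   # C, 0-100
    pue: float = 1.0,      # set to the datacenter PUE to include facility overhead
) -> float:
    """E(G, C): cluster power draw in watts, optionally scaled by PUE."""
    it_power = (
        idle_cluster_power_w
        + memory_tdp_w
        + (cpu_util_pct / 100.0) * net_cpu_tdp_w * num_cpus
        + (gpu_util_pct / 100.0) * net_gpu_tdp_w * num_gpus
    )
    return it_power * pue

To get water use instead, convert the IT power to kWh over the period of interest and multiply by WUE rather than PUE.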

Energy per GPU-hour

The energy use, in kWh, for one GPU-hour, assuming 100% GPU utilization and no incremental CPU load, would be:

E(gpu-hour) = E(100,0) / 1000 / (number of GPUs)
            = ((idle cluster power + memory TDP) / (number of GPUs) + (net GPU TDP)) / 1000
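
A sketch of the per-GPU-hour figure, reusing the assumed names above; the numbers in the usage line are placeholders for illustration only, not measured values.

def energy_per_gpu_hour_kwh(
    idle_cluster_power_w: float,
    memory_tdp_w: float,
    net_gpu_tdp_w: float,
    num_gpus: int,
) -> float:
    """kWh consumed per GPU-hour at 100% GPU utilization and no incremental CPU load."""
    return ((idle_cluster_power_w + memory_tdp_w) / num_gpus + net_gpu_tdp_w) / 1000.0

# Placeholder inputs: 10 kW idle cluster power, 2 kW memory TDP, 300 W net GPU TDP, 384 GPUs
example_kwh = energy_per_gpu_hour_kwh(10_000, 2_000, 300, 384)  # ≈ 0.33 kWh per GPU-hour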

Embodied emissions per hour

Using:

  • The embodied emissions of the server (see Towards Green AI for an example PCF)
  • The embodied emissions of the GPU
  • The projected use life of the server (up to 6 years for cloud platforms, though 4 years is suggested for AI instances given the pace of change)
  • The projected utilization of the servers, noting that utilization means “time reserved” not “time active”

The embodied emissions per hour reserved are:

EmbEm(h) = ((number of GPUs) x (GPU embodied emissions) +
            (number of servers) x (server embodied emissions))
           / (use life in hours)
           / (utilization)
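
A sketch of EmbEm(h) with illustrative names; here the use life is supplied in years and converted to hours, and utilization is the fraction of time the hardware is reserved.

def embodied_emissions_per_hour_kgco2(
    num_gpus: int,
    gpu_embodied_kgco2: float,
    num_servers: int,
    server_embodied_kgco2: float,
    use_life_years: float = 4.0,   # suggested 4 years for AI instances
    utilization: float = 1.0,      # fraction of time reserved
) -> float:
    """Embodied emissions amortized per reserved hour."""
    use_life_hours = use_life_years * 365 * 24
    total_embodied = num_gpus * gpu_embodied_kgco2 + num_servers * server_embodied_kgco2
    return total_embodied / use_life_hours / utilization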

Embodied water use

The embodied water use of the CPU, GPU, and memory chips can be derived from manufacturer sustainability reporting or industry averages, generally based on die size. See NVIDIA A100 as an example.

Using:

  • The manufacturing water use of the CPU
  • The manufacturing water use of the GPU
  • The manufacturing water use of the memory chips

The embodied water use is:

EmbH2O(h) = ((number of GPUs) x (water use per GPU) +
             (number of CPUs) x (water use per CPU) +
             (number of memory chips) x (water use per memory chip))
            / (use life in hours)
            / (utilization)

where the manufacturing water use per chip can be estimated as:

(manufacturing water use per chip) = (water use per mask layer, per wafer) x (number of mask layers) / (chips per wafer)
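
A corresponding sketch for the water figures, again with illustrative function and parameter names.

def water_per_chip_liters(
    water_per_mask_layer_per_wafer_l: float,
    num_mask_layers: int,
    chips_per_wafer: int,
) -> float:
    """Manufacturing water use attributed to a single chip."""
    return water_per_mask_layer_per_wafer_l * num_mask_layers / chips_per_wafer

def embodied_water_per_hour_liters(
    num_gpus: int, water_per_gpu_l: float,
    num_cpus: int, water_per_cpu_l: float,
    num_memory_chips: int, water_per_memory_chip_l: float,
    use_life_hours: float,
    utilization: float,
) -> float:
    """EmbH2O(h): manufacturing water use amortized per reserved hour."""
    total_water = (
        num_gpus * water_per_gpu_l
        + num_cpus * water_per_cpu_l
        + num_memory_chips * water_per_memory_chip_l
    )
    return total_water / use_life_hours / utilization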