Month: February 2023

EC2 Trainium UltraClusters

Each EC2 Trn1 instance has

  • up to 16 AWS Trainium accelerators purpose built to accelerate DL training and deliver up to 3.4 petaflops of FP16/BF16 compute power. Each accelerator includes two second-generation NeuronCores
  • 512 GB of shared accelerator memory (HBM) with 9.8 TB/s of total memory bandwidth
  • 1600 Gbps of Elastic Fabric Adapter (EFAv2)

An EC2 Trn1 UltraCluster, consists of densely packed, co-located racks of Trn1 compute instances interconnected by non-blocking petabyte scale networking. It is our largest UltraCluster to date, offering 6 exaflops of compute power on demand with up to 30,000 Trainium chips.

https://aws.amazon.com/blogs/machine-learning/scaling-large-language-model-llm-training-with-amazon-ec2-trn1-ultraclusters/