Amazon EC2 Trn1 Instances
Amazon Elastic Compute Cloud (EC2) Trn1 instances, powered by AWS Trainium chips, are purpose-built for high-performance deep learning training of generative AI models, including large language models and latent diffusion models. Trn1 instances offer up to 50% cost-to-train savings over other comparable Amazon EC2 instances. You can use Trn1 instances to train 100B+ parameter DL and generative AI models across a broad set of applications, such as text summarization, code generation, question answering, image and video generation, recommendation, and fraud detection. The AWS Neuron SDK helps developers train models on AWS Trainium (and deploy models on the AWS Inferentia chips). It integrates natively with frameworks such as PyTorch and TensorFlow so that you can continue using your existing code and workflows to train models on Trn1 instances.
Learn more
CoreWeave
CoreWeave is a cloud infrastructure provider specializing in GPU-based compute solutions tailored for AI workloads. The platform offers scalable, high-performance GPU clusters that optimize the training and inference of AI models, making it ideal for industries like machine learning, visual effects (VFX), and high-performance computing (HPC). CoreWeave provides flexible storage, networking, and managed services to support AI-driven businesses, with a focus on reliability, cost efficiency, and enterprise-grade security. The platform is used by AI labs, research organizations, and businesses to accelerate their AI innovations.
Learn more
Amazon EC2 Trn2 Instances
Amazon EC2 Trn2 instances, powered by AWS Trainium2 chips, are purpose-built for high-performance deep learning training of generative AI models, including large language models and diffusion models. They offer up to 50% cost-to-train savings over comparable Amazon EC2 instances. Trn2 instances support up to 16 Trainium2 accelerators, providing up to 3 petaflops of FP16/BF16 compute power and 512 GB of high-bandwidth memory. To facilitate efficient data and model parallelism, Trn2 instances feature NeuronLink, a high-speed, nonblocking interconnect, and support up to 1600 Gbps of second-generation Elastic Fabric Adapter (EFAv2) network bandwidth. They are deployed in EC2 UltraClusters, enabling scaling up to 30,000 Trainium2 chips interconnected with a nonblocking petabit-scale network, delivering 6 exaflops of compute performance. The AWS Neuron SDK integrates natively with popular machine learning frameworks like PyTorch and TensorFlow.
Learn more
AWS Deep Learning AMIs
AWS Deep Learning AMIs (DLAMI) provides ML practitioners and researchers with a curated and secure set of frameworks, dependencies, and tools to accelerate deep learning in the cloud. Built for Amazon Linux and Ubuntu, Amazon Machine Images (AMIs) come preconfigured with TensorFlow, PyTorch, Apache MXNet, Chainer, Microsoft Cognitive Toolkit (CNTK), Gluon, Horovod, and Keras, allowing you to quickly deploy and run these frameworks and tools at scale. Develop advanced ML models at scale to develop autonomous vehicle (AV) technology safely by validating models with millions of supported virtual tests. Accelerate the installation and configuration of AWS instances, and speed up experimentation and evaluation with up-to-date frameworks and libraries, including Hugging Face Transformers. Use advanced analytics, ML, and deep learning capabilities to identify trends and make predictions from raw, disparate health data.
Learn more