Alternatives to NVIDIA HPC SDK
Compare NVIDIA HPC SDK alternatives for your business or organization using the curated list below. SourceForge ranks the best alternatives to NVIDIA HPC SDK in 2025. Compare features, ratings, user reviews, pricing, and more from NVIDIA HPC SDK competitors and alternatives in order to make an informed decision for your business.
-
1
Rocky Linux
Ctrl IQ, Inc.
CIQ empowers people to do amazing things by providing innovative and stable software infrastructure solutions for all computing needs. From the base operating system, through containers, orchestration, provisioning, computing, and cloud applications, CIQ works with every part of the technology stack to drive solutions for customers and communities with stable, scalable, secure production environments. CIQ is the founding support and services partner of Rocky Linux, and the creator of the next generation federated computing stack. - Rocky Linux, open, Secure Enterprise Linux - Apptainer, application Containers for High Performance Computing - Warewulf, cluster Management and Operating System Provisioning - HPC2.0, the Next Generation of High Performance Computing, a Cloud Native Federated Computing Platform - Traditional HPC, turnkey computing stack for traditional HPC -
2
Arm Allinea Studio is a suite of tools for developing server and HPC applications on Arm-based platforms. It contains Arm-specific compilers and libraries, and debug and optimization tools. Arm Performance Libraries provide optimized standard core math libraries for high-performance computing applications on Arm processors. The library routines, which are available through both Fortran and C interfaces. Arm Performance Libraries are built with OpenMP across many BLAS, LAPACK, FFT, and sparse routines in order to maximize your performance in multi-processor environments.
-
3
Arm Forge
Arm
Build reliable and optimized code for the right results on multiple Server and HPC architectures, from the latest compilers and C++ standards to Intel, 64-bit Arm, AMD, OpenPOWER, and Nvidia GPU hardware. Arm Forge combines Arm DDT, the leading debugger for time-saving high-performance application debugging, Arm MAP, the trusted performance profiler for invaluable optimization advice across native and Python HPC codes, and Arm Performance Reports for advanced reporting capabilities. Arm DDT and Arm MAP are also available as standalone products. Efficient application development for Linux Server and HPC with Full technical support from Arm experts. Arm DDT is the debugger of choice for developing of C++, C, or Fortran parallel, and threaded applications on CPUs, and GPUs. Its powerful intuitive graphical interface helps you easily detect memory bugs and divergent behavior at all scales, making Arm DDT the number one debugger in research, industry, and academia. -
4
NVIDIA GPU-Optimized AMI
Amazon
The NVIDIA GPU-Optimized AMI is a virtual machine image for accelerating your GPU accelerated Machine Learning, Deep Learning, Data Science and HPC workloads. Using this AMI, you can spin up a GPU-accelerated EC2 VM instance in minutes with a pre-installed Ubuntu OS, GPU driver, Docker and NVIDIA container toolkit. This AMI provides easy access to NVIDIA's NGC Catalog, a hub for GPU-optimized software, for pulling & running performance-tuned, tested, and NVIDIA certified docker containers. The NGC catalog provides free access to containerized AI, Data Science, and HPC applications, pre-trained models, AI SDKs and other resources to enable data scientists, developers, and researchers to focus on building and deploying solutions. This GPU-optimized AMI is free with an option to purchase enterprise support offered through NVIDIA AI Enterprise. For how to get support for this AMI, scroll down to 'Support Information'Starting Price: $3.06 per hour -
5
Bright Cluster Manager
NVIDIA
NVIDIA Bright Cluster Manager offers fast deployment and end-to-end management for heterogeneous high-performance computing (HPC) and AI server clusters at the edge, in the data center, and in multi/hybrid-cloud environments. It automates provisioning and administration for clusters ranging in size from a couple of nodes to hundreds of thousands, supports CPU-based and NVIDIA GPU-accelerated systems, and enables orchestration with Kubernetes. Heterogeneous high-performance Linux clusters can be quickly built and managed with NVIDIA Bright Cluster Manager, supporting HPC, machine learning, and analytics applications that span from core to edge to cloud. NVIDIA Bright Cluster Manager is ideal for heterogeneous environments, supporting Arm® and x86-based CPU nodes, and is fully optimized for accelerated computing with NVIDIA GPUs and NVIDIA DGX™ systems. -
6
NVIDIA NGC
NVIDIA
NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing. NGC manages a catalog of fully integrated and optimized deep learning framework containers that take full advantage of NVIDIA GPUs in both single GPU and multi-GPU configurations. NVIDIA train, adapt, and optimize (TAO) is an AI-model-adaptation platform that simplifies and accelerates the creation of enterprise AI applications and services. By fine-tuning pre-trained models with custom data through a UI-based, guided workflow, enterprises can produce highly accurate models in hours rather than months, eliminating the need for large training runs and deep AI expertise. Looking to get started with containers and models on NGC? This is the place to start. Private Registries from NGC allow you to secure, manage, and deploy your own assets to accelerate your journey to AI. -
7
Intel oneAPI HPC Toolkit
Intel
High-performance computing (HPC) is at the core of AI, machine learning, and deep learning applications. The Intel® oneAPI HPC Toolkit (HPC Kit) delivers what developers need to build, analyze, optimize, and scale HPC applications with the latest techniques in vectorization, multithreading, multi-node parallelization, and memory optimization. This toolkit is an add-on to the Intel® oneAPI Base Toolkit, which is required for full functionality. It also includes access to the Intel® Distribution for Python*, the Intel® oneAPI DPC++/C++ C¿compiler, powerful data-centric libraries, and advanced analysis tools. Get what you need to build, test, and optimize your oneAPI projects for free. With an Intel® Developer Cloud account, you get 120 days of access to the latest Intel® hardware, CPUs, GPUs, FPGAs, and Intel oneAPI tools and frameworks. No software downloads. No configuration steps, and no installations. -
8
CUDA
NVIDIA
CUDA® is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs). With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs. In GPU-accelerated applications, the sequential part of the workload runs on the CPU – which is optimized for single-threaded performance – while the compute intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python and MATLAB and express parallelism through extensions in the form of a few basic keywords. The CUDA Toolkit from NVIDIA provides everything you need to develop GPU-accelerated applications. The CUDA Toolkit includes GPU-accelerated libraries, a compiler, development tools and the CUDA runtime.Starting Price: Free -
9
TrinityX
Cluster Vision
TrinityX is an open source cluster management system developed by ClusterVision, designed to provide 24/7 oversight for High-Performance Computing (HPC) and Artificial Intelligence (AI) environments. It offers a dependable, SLA-compliant support system, allowing users to focus entirely on their research while managing complex technologies such as Linux, SLURM, CUDA, InfiniBand, Lustre, and Open OnDemand. TrinityX streamlines cluster deployment through an intuitive interface, guiding users step-by-step to configure clusters for diverse uses like container orchestration, traditional HPC, and InfiniBand/RDMA architectures. Leveraging the BitTorrent protocol, enables rapid deployment of AI/HPC nodes, accommodating setups in minutes. The platform provides a comprehensive dashboard offering real-time insights into cluster metrics, resource utilization, and workload distribution, facilitating the identification of bottlenecks and optimization of resource allocation.Starting Price: Free -
10
TotalView
Perforce
TotalView debugging software provides the specialized tools you need to quickly debug, analyze, and scale high-performance computing (HPC) applications. This includes highly dynamic, parallel, and multicore applications that run on diverse hardware — from desktops to supercomputers. Improve HPC development efficiency, code quality, and time-to-market with TotalView’s powerful tools for faster fault isolation, improved memory optimization, and dynamic visualization. Simultaneously debug thousands of threads and processes. Purpose-built for multicore and parallel computing, TotalView delivers a set of tools providing unprecedented control over processes and thread execution, along with deep visibility into program states and data. -
11
Arm MAP
Arm
No need to change your code or the way you build it. Profiling for applications running on more than one server and multiple processes. Clear views of bottlenecks in I/O, in computing, in a thread, or in multi-process activity. Deep insight into actual processor instruction types that affect your performance. View memory usage over time to discover high watermarks and changes across the complete memory footprint. Arm MAP is a unique scalable low-overhead profiler, available standalone or as part of the Arm Forge debug and profile suite. It helps server and HPC code developers to accelerate their software by revealing the causes of slow performance. It is used from multicore Linux workstations through to supercomputers. You can profile realistic test cases that you care most about with typically under 5% runtime overhead. The interactive user interface is clear and intuitive, designed for developers and computational scientists. -
12
Amazon EC2 G4 Instances
Amazon
Amazon EC2 G4 instances are optimized for machine learning inference and graphics-intensive applications. It offers a choice between NVIDIA T4 GPUs (G4dn) and AMD Radeon Pro V520 GPUs (G4ad). G4dn instances combine NVIDIA T4 GPUs with custom Intel Cascade Lake CPUs, providing a balance of compute, memory, and networking resources. These instances are ideal for deploying machine learning models, video transcoding, game streaming, and graphics rendering. G4ad instances, featuring AMD Radeon Pro V520 GPUs and 2nd-generation AMD EPYC processors, deliver cost-effective solutions for graphics workloads. Both G4dn and G4ad instances support Amazon Elastic Inference, allowing users to attach low-cost GPU-powered inference acceleration to Amazon EC2 and reduce deep learning inference costs. They are available in various sizes to accommodate different performance needs and are integrated with AWS services such as Amazon SageMaker, Amazon ECS, and Amazon EKS. -
13
Intel Tiber AI Cloud
Intel
Intel® Tiber™ AI Cloud is a powerful platform designed to scale AI workloads with advanced computing resources. It offers specialized AI processors, such as the Intel Gaudi AI Processor and Max Series GPUs, to accelerate model training, inference, and deployment. Optimized for enterprise-level AI use cases, this cloud solution enables developers to build and fine-tune models with support for popular libraries like PyTorch. With flexible deployment options, secure private cloud solutions, and expert support, Intel Tiber™ ensures seamless integration, fast deployment, and enhanced model performance.Starting Price: Free -
14
Amazon S3 Express One Zone
Amazon
Amazon S3 Express One Zone is a high-performance, single-Availability Zone storage class purpose-built to deliver consistent single-digit millisecond data access for your most frequently accessed data and latency-sensitive applications. It offers data access speeds up to 10 times faster and requests costs up to 50% lower than S3 Standard. With S3 Express One Zone, you can select a specific AWS Availability Zone within an AWS Region to store your data, allowing you to co-locate your storage and compute resources in the same Availability Zone to further optimize performance, which helps lower compute costs and run workloads faster. Data is stored in a different bucket type, an S3 directory bucket, which supports hundreds of thousands of requests per second. Additionally, you can use S3 Express One Zone with services such as Amazon SageMaker Model Training, Amazon Athena, Amazon EMR, and AWS Glue Data Catalog to accelerate your machine learning and analytics workloads. -
15
NVIDIA DGX Cloud
NVIDIA
NVIDIA DGX Cloud offers a fully managed, end-to-end AI platform that leverages the power of NVIDIA’s advanced hardware and cloud computing services. This platform allows businesses and organizations to scale AI workloads seamlessly, providing tools for machine learning, deep learning, and high-performance computing (HPC). DGX Cloud integrates seamlessly with leading cloud providers, delivering the performance and flexibility required to handle the most demanding AI applications. This service is ideal for businesses looking to enhance their AI capabilities without the need to manage physical infrastructure. -
16
Amazon EC2 P5 Instances
Amazon
Amazon Elastic Compute Cloud (Amazon EC2) P5 instances, powered by NVIDIA H100 Tensor Core GPUs, and P5e and P5en instances powered by NVIDIA H200 Tensor Core GPUs deliver the highest performance in Amazon EC2 for deep learning and high-performance computing applications. They help you accelerate your time to solution by up to 4x compared to previous-generation GPU-based EC2 instances, and reduce the cost to train ML models by up to 40%. These instances help you iterate on your solutions at a faster pace and get to market more quickly. You can use P5, P5e, and P5en instances for training and deploying increasingly complex large language models and diffusion models powering the most demanding generative artificial intelligence applications. These applications include question-answering, code generation, video and image generation, and speech recognition. You can also use these instances to deploy demanding HPC applications at scale for pharmaceutical discovery. -
17
AWS Elastic Fabric Adapter (EFA)
United States
Elastic Fabric Adapter (EFA) is a network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communications at scale on AWS. Its custom-built operating system (OS) bypass hardware interface enhances the performance of inter-instance communications, which is critical to scaling these applications. With EFA, High-Performance Computing (HPC) applications using the Message Passing Interface (MPI) and Machine Learning (ML) applications using NVIDIA Collective Communications Library (NCCL) can scale to thousands of CPUs or GPUs. As a result, you get the application performance of on-premises HPC clusters with the on-demand elasticity and flexibility of the AWS cloud. EFA is available as an optional EC2 networking feature that you can enable on any supported EC2 instance at no additional cost. Plus, it works with the most commonly used interfaces, APIs, and libraries for inter-node communications. -
18
Amazon EC2 P4 Instances
Amazon
Amazon EC2 P4d instances deliver high performance for machine learning training and high-performance computing applications in the cloud. Powered by NVIDIA A100 Tensor Core GPUs, they offer industry-leading throughput and low-latency networking, supporting 400 Gbps instance networking. P4d instances provide up to 60% lower cost to train ML models, with an average of 2.5x better performance for deep learning models compared to previous-generation P3 and P3dn instances. Deployed in hyperscale clusters called Amazon EC2 UltraClusters, P4d instances combine high-performance computing, networking, and storage, enabling users to scale from a few to thousands of NVIDIA A100 GPUs based on project needs. Researchers, data scientists, and developers can utilize P4d instances to train ML models for use cases such as natural language processing, object detection and classification, and recommendation engines, as well as to run HPC applications like pharmaceutical discovery and more.Starting Price: $11.57 per hour -
19
Intel offers a comprehensive suite of development tools tailored for designing with Altera FPGAs, CPLDs, and SoC FPGAs, catering to hardware engineers, software developers, and system architects. The Quartus Prime Design Software serves as a multiplatform environment encompassing all necessary features for FPGA, SoC FPGA, and CPLD design, including synthesis, optimization, verification, and simulation. For high-level design, Intel provides tools such as the Altera FPGA Add-on for oneAPI Base Toolkit, DSP Builder, High-Level Synthesis (HLS) Compiler, and the P4 Suite for FPGA, facilitating efficient development in areas like digital signal processing and high-level synthesis. Embedded developers can utilize the Nios V soft embedded processors and a range of embedded design tools, including the Ashling RiscFree IDE and Arm Development Studio (DS) for Altera SoC FPGAs, to streamline software development for embedded systems.
-
20
Fuzzball
CIQ
Fuzzball accelerates innovation for researchers and scientists by eliminating the burdens of infrastructure provisioning and management. Fuzzball streamlines and optimizes high-performance computing (HPC) workload design and execution. A user-friendly GUI for designing, editing, and executing HPC jobs. Comprehensive control and automation of all HPC tasks via CLI. Automated data ingress and egress with full compliance logs. Native integration with GPUs and both on-prem and cloud storage on-prem and cloud storage. Human-readable, portable workflow files that execute anywhere. CIQ’s Fuzzball modernizes traditional HPC with an API-first, container-optimized architecture. Operating on Kubernetes, it provides all the security, performance, stability, and convenience found in modern software and infrastructure. Fuzzball not only abstracts the infrastructure layer but also automates the orchestration of complex workflows, driving greater efficiency and collaboration. -
21
AWS HPC
Amazon
AWS High Performance Computing (HPC) services empower users to execute large-scale simulations and deep learning workloads in the cloud, providing virtually unlimited compute capacity, high-performance file systems, and high-throughput networking. This suite of services accelerates innovation by offering a broad range of cloud-based tools, including machine learning and analytics, enabling rapid design and testing of new products. Operational efficiency is maximized through on-demand access to compute resources, allowing users to focus on complex problem-solving without the constraints of traditional infrastructure. AWS HPC solutions include Elastic Fabric Adapter (EFA) for low-latency, high-bandwidth networking, AWS Batch for scaling computing jobs, AWS ParallelCluster for simplified cluster deployment, and Amazon FSx for high-performance file systems. These services collectively provide a flexible and scalable environment tailored to diverse HPC workloads. -
22
AWS ParallelCluster
Amazon
AWS ParallelCluster is an open-source cluster management tool that simplifies the deployment and management of High-Performance Computing (HPC) clusters on AWS. It automates the setup of required resources, including compute nodes, a shared filesystem, and a job scheduler, supporting multiple instance types and job submission queues. Users can interact with ParallelCluster through a graphical user interface, command-line interface, or API, enabling flexible cluster configuration and management. The tool integrates with job schedulers like AWS Batch and Slurm, facilitating seamless migration of existing HPC workloads to the cloud with minimal modifications. AWS ParallelCluster is available at no additional charge; users only pay for the AWS resources consumed by their applications. With AWS ParallelCluster, you can use a simple text file to model, provision, and dynamically scale the resources needed for your applications in an automated and secure manner. -
23
QumulusAI
QumulusAI
QumulusAI delivers supercomputing without constraint, combining scalable HPC with grid-independent data centers to break bottlenecks and power the future of AI. QumulusAI is universalizing access to AI supercomputing, removing the constraints of legacy HPC and delivering the scalable, high-performance computing AI demands today. And tomorrow too. No virtualization overhead, no noisy neighbors, just dedicated, direct access to AI servers optimized with NVIDIA’s latest GPUs (H200) and Intel/AMD CPUs. QumulusAI offers HPC infrastructure uniquely configured around your specific workloads, instead of legacy providers’ one-size-fits-all approach. We collaborate with you through design, deployment, to ongoing optimization, adapting as your AI projects evolve, so you get exactly what you need at each step. We own the entire stack. That means better performance, greater control, and more predictable costs than with other providers who coordinate with third-party vendors. -
24
Google Cloud GPUs
Google
Speed up compute jobs like machine learning and HPC. A wide selection of GPUs to match a range of performance and price points. Flexible pricing and machine customizations to optimize your workload. High-performance GPUs on Google Cloud for machine learning, scientific computing, and 3D visualization. NVIDIA K80, P100, P4, T4, V100, and A100 GPUs provide a range of compute options to cover your workload for each cost and performance need. Optimally balance the processor, memory, high-performance disk, and up to 8 GPUs per instance for your individual workload. All with the per-second billing, so you only pay only for what you need while you are using it. Run GPU workloads on Google Cloud Platform where you have access to industry-leading storage, networking, and data analytics technologies. Compute Engine provides GPUs that you can add to your virtual machine instances. Learn what you can do with GPUs and what types of GPU hardware are available.Starting Price: $0.160 per GPU -
25
Amazon EC2 UltraClusters
Amazon
Amazon EC2 UltraClusters enable you to scale to thousands of GPUs or purpose-built machine learning accelerators, such as AWS Trainium, providing on-demand access to supercomputing-class performance. They democratize supercomputing for ML, generative AI, and high-performance computing developers through a simple pay-as-you-go model without setup or maintenance costs. UltraClusters consist of thousands of accelerated EC2 instances co-located in a given AWS Availability Zone, interconnected using Elastic Fabric Adapter (EFA) networking in a petabit-scale nonblocking network. This architecture offers high-performance networking and access to Amazon FSx for Lustre, a fully managed shared storage built on a high-performance parallel file system, enabling rapid processing of massive datasets with sub-millisecond latencies. EC2 UltraClusters provide scale-out capabilities for distributed ML training and tightly coupled HPC workloads, reducing training times. -
26
AWS Parallel Computing Service (AWS PCS) is a managed service that simplifies running and scaling high-performance computing workloads and building scientific and engineering models on AWS using Slurm. It enables the creation of complete, elastic environments that integrate computing, storage, networking, and visualization tools, allowing users to focus on research and innovation without the burden of infrastructure management. AWS PCS offers managed updates and built-in observability features, enhancing cluster operations and maintenance. Users can build and deploy scalable, reliable, and secure HPC clusters through the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDK. The service supports various use cases, including tightly coupled workloads like computer-aided engineering, high-throughput computing such as genomics analysis, accelerated computing with GPUs, and custom silicon like AWS Trainium and AWS Inferentia.Starting Price: $0.5977 per hour
-
27
HPE Performance Cluster Manager
Hewlett Packard Enterprise
HPE Performance Cluster Manager (HPCM) delivers an integrated system management solution for Linux®-based high performance computing (HPC) clusters. HPE Performance Cluster Manager provides complete provisioning, management, and monitoring for clusters scaling up to Exascale sized supercomputers. The software enables fast system setup from bare-metal, comprehensive hardware monitoring and management, image management, software updates, power management, and cluster health management. Additionally, it makes scaling HPC clusters easier and efficient while providing integration with a plethora of 3rd party tools for running and managing workloads. HPE Performance Cluster Manager reduces the time and resources spent administering HPC systems - lowering total cost of ownership, increasing productivity and providing a better return on hardware investments. -
28
Azure CycleCloud
Microsoft
Create, manage, operate, and optimize HPC and big compute clusters of any scale. Deploy full clusters and other resources, including scheduler, compute VMs, storage, networking, and cache. Customize and optimize clusters through advanced policy and governance features, including cost controls, Active Directory integration, monitoring, and reporting. Use your current job scheduler and applications without modification. Give admins full control over which users can run jobs, as well as where and at what cost. Take advantage of built-in autoscaling and battle-tested reference architectures for a wide range of HPC workloads and industries. CycleCloud supports any job scheduler or software stack—from proprietary in-house to open-source, third-party, and commercial applications. Your resource demands evolve over time, and your cluster should, too. With scheduler-aware autoscaling, you can fit your resources to your workload.Starting Price: $0.01 per hour -
29
Azure HPC
Microsoft
Azure high-performance computing (HPC). Power breakthrough innovations, solve complex problems, and optimize your compute-intensive workloads. Build and run your most demanding workloads in the cloud with a full stack solution purpose-built for HPC. Deliver supercomputing power, interoperability, and near-infinite scalability for compute-intensive workloads with Azure Virtual Machines. Empower decision-making and deliver next-generation AI with industry-leading Azure AI and analytics services. Help secure your data and applications and streamline compliance with multilayered, built-in security and confidential computing. -
30
Kombyne
Kombyne
Kombyne™ is an innovative new SaaS high-performance computing (HPC) workflow tool, initially developed for customers in the defense, automotive, and aerospace industries and academic research. It allows users to subscribe to a range of workflow solutions for HPC CFD jobs, from on-the-fly extract generation and rendering to simulation steering. Interactive monitoring and control are also available, all with minimal simulation disruption and no reliance on VTK. The need for large files is eliminated via extract workflows and real-time visualization. An in-transit workflow uses a separate process that quickly receives data from the solver code and performs visualization and analysis without interfering with the running solver. This process, called an endpoint, can directly output extracts, cutting planes or point samples for data science and can render images as well. The Endpoint can also act as a bridge to popular visualization codes. -
31
NVIDIA Modulus
NVIDIA
NVIDIA Modulus is a neural network framework that blends the power of physics in the form of governing partial differential equations (PDEs) with data to build high-fidelity, parameterized surrogate models with near-real-time latency. Whether you’re looking to get started with AI-driven physics problems or designing digital twin models for complex non-linear, multi-physics systems, NVIDIA Modulus can support your work. Offers building blocks for developing physics machine learning surrogate models that combine both physics and data. The framework is generalizable to different domains and use cases—from engineering simulations to life sciences and from forward simulations to inverse/data assimilation problems. Provides parameterized system representation that solves for multiple scenarios in near real time, letting you train once offline to infer in real time repeatedly. -
32
NVIDIA Base Command Manager
NVIDIA
NVIDIA Base Command Manager offers fast deployment and end-to-end management for heterogeneous AI and high-performance computing clusters at the edge, in the data center, and in multi- and hybrid-cloud environments. It automates the provisioning and administration of clusters ranging in size from a couple of nodes to hundreds of thousands, supports NVIDIA GPU-accelerated and other systems, and enables orchestration with Kubernetes. The platform integrates with Kubernetes for workload orchestration and offers tools for infrastructure monitoring, workload management, and resource allocation. Base Command Manager is optimized for accelerated computing environments, making it suitable for diverse HPC and AI workloads. It is available with NVIDIA DGX systems and as part of the NVIDIA AI Enterprise software suite. High-performance Linux clusters can be quickly built and managed with NVIDIA Base Command Manager, supporting HPC, machine learning, and analytics applications. -
33
NVIDIA Magnum IO
NVIDIA
NVIDIA Magnum IO is the architecture for parallel, intelligent data center I/O. It maximizes storage, network, and multi-node, multi-GPU communications for the world’s most important applications, using large language models, recommender systems, imaging, simulation, and scientific research. Magnum IO utilizes storage I/O, network I/O, in-network compute, and I/O management to simplify and speed up data movement, access, and management for multi-GPU, multi-node systems. It supports NVIDIA CUDA-X libraries and makes the best use of a range of NVIDIA GPU and networking hardware topologies to achieve optimal throughput and low latency. In multi-GPU, multi-node systems, slow CPU, single-thread performance is in the critical path of data access from local or remote storage devices. With storage I/O acceleration, the GPU bypasses the CPU and system memory, and accesses remote storage via 8x 200 Gb/s NICs, achieving up to 1.6 TB/s of raw storage bandwidth. -
34
Samadii Multiphysics
Metariver Technology Co.,Ltd
Metariver Technology Co., Ltd. is developing innovative and creative computer-aided engineering (CAE) analysis S/W based on the latest HPC technology and S/W technology including CUDA technology. We will change the paradigm of CAE technology by applying particle-based CAE technology and high-speed computation technology using GPUs to CAE analysis software. Here is an introduction to our products. 1. Samadii-DEM (the discrete element method): works with the discrete element method and solid particles. 2. Samadii-SCIV (Statistical Contact In Vacuum): working with high vacuum system gas-flow simulation. Using Monte Carlo simulation. 3. Samadii-EM (Electromagnetics): For full-field interpretation 4. Samadii-Plasma: Plasma simulation for Analysis of ion and electron behavior in an electromagnetic field. 5. Vampire (Virtual Additive Manufacturing System): Specializes in transient heat transfer analysis. additive manufacturing and 3D printing simulation software -
35
FieldView
Intelligent Light
Over the past two decades, software technologies have advanced greatly and HPC computing has scaled by orders of magnitude. Our human ability to comprehend simulation results has remained the same. Simply making plots and movies in the traditional way does not scale when dealing with multi-billion cell meshes or ten’s of thousands of timesteps. Automated solution assessment is accelerated when features and quantitative properties can be produced directly via eigen analysis or machine learning. Easy-to-use industry standard FieldView desktop coupled to the powerful VisIt Prime backend. -
36
ScaleCloud
ScaleMatrix
Data-intensive AI, IoT and HPC workloads requiring multiple parallel processes have always run best on expensive high-end processors or accelerators, such as Graphic Processing Units (GPU). Moreover, when running compute-intensive workloads on cloud-based solutions, businesses and research organizations have had to accept tradeoffs, many of which were problematic. For example, the age of processors and other hardware in cloud environments is often incompatible with the latest applications or high energy expenditure levels that cause concerns related to environmental values. In other cases, certain aspects of cloud solutions have simply been frustrating to deal with. This has limited flexibility for customized cloud environments to support business needs or trouble finding right-size billing models or support. -
37
The Nimbix Supercomputing Suite is a set of flexible and secure as-a-service high-performance computing (HPC) solutions. This as-a-service model for HPC, AI, and Quantum in the cloud provides customers with access to one of the broadest HPC and supercomputing portfolios, from hardware to bare metal-as-a-service to the democratization of advanced computing in the cloud across public and private data centers. Nimbix Supercomputing Suite allows you access to HyperHub Application Marketplace, our high-performance marketplace with over 1,000 applications and workflows. Leverage powerful dedicated BullSequana HPC servers as bare metal-as-a-service for the best of infrastructure and on-demand scalability, convenience, and agility. Federated supercomputing-as-a-service offers a unified service console to manage all compute zones and regions in a public or private HPC, AI, and supercomputing federation.
-
38
Qlustar
Qlustar
The ultimate full-stack solution for setting up, managing, and scaling clusters with ease, control, and performance. Qlustar empowers your HPC, AI, and storage environments with unmatched simplicity and robust capabilities. From bare-metal installation with the Qlustar installer to seamless cluster operations, Qlustar covers it all. Set up and manage your clusters with unmatched simplicity and efficiency. Designed to grow with your needs, handling even the most complex workloads effortlessly. Optimized for speed, reliability, and resource efficiency in demanding environments. Upgrade your OS or manage security patches without the need for reinstallations. Regular and reliable updates keep your clusters safe from vulnerabilities. Qlustar optimizes your computing power, delivering peak efficiency for high-performance computing environments. Our solution offers robust workload management, built-in high availability, and an intuitive interface for streamlined operations.Starting Price: Free -
39
Warewulf
Warewulf
Warewulf is a cluster management and provisioning system that has pioneered stateless node management for over two decades. It enables the provisioning of containers directly onto bare metal hardware at massive scales, ranging from tens to tens of thousands of compute systems while maintaining simplicity and flexibility. The platform is extensible, allowing users to modify default functionalities and node images to suit various clustering use cases. Warewulf supports stateless provisioning with SELinux, per-node asset key-based provisioning, and access controls, ensuring secure deployments. Its minimal system requirements and ease of optimization, customization, and integration make it accessible to diverse industries. Supported by OpenHPC and contributors worldwide, Warewulf stands as a successful HPC cluster platform utilized across various sectors. Minimal system requirements, easy to get started, and simple to optimize, customize, and integrate.Starting Price: Free -
40
Azure FXT Edge Filer
Microsoft
Create cloud-integrated hybrid storage that works with your existing network-attached storage (NAS) and Azure Blob Storage. This on-premises caching appliance optimizes access to data in your datacenter, in Azure, or across a wide-area network (WAN). A combination of software and hardware, Microsoft Azure FXT Edge Filer delivers high throughput and low latency for hybrid storage infrastructure supporting high-performance computing (HPC) workloads.Scale-out clustering provides non-disruptive NAS performance scaling. Join up to 24 FXT nodes per cluster to scale to millions of IOPS and hundreds of GB/s. When you need performance and scale in file-based workloads, Azure FXT Edge Filer keeps your data on the fastest path to processing resources. Managing data storage is easy with Azure FXT Edge Filer. Shift aging data to Azure Blob Storage to keep it easily accessible with minimal latency. Balance on-premises and cloud storage. -
41
Ansys HPC
Ansys
With the Ansys HPC software suite, you can use today’s multicore computers to perform more simulations in less time. These simulations can be bigger, more complex and more accurate than ever using high-performance computing (HPC). The various Ansys HPC licensing options let you scale to whatever computational level of simulation you require, from single-user or small user group options for entry-level parallel processing up to virtually unlimited parallel capacity. For large user groups, Ansys facilitates highly scalable, multiple parallel processing simulations for the most challenging projects when needed. Apart from parallel computing, Ansys also offers solutions for parametric computing, which enables you to more fully explore the design parameters (size, weight, shape, materials, mechanical properties, etc.) of your product early in the development process. -
42
Lustre
OpenSFS and EOFS
The Lustre file system is an open-source, parallel file system that supports many requirements of leadership class HPC simulation environments. Whether you’re a member of our diverse development community or considering the Lustre file system as a parallel file system solution, these pages offer a wealth of resources and support to meet your needs. The Lustre file system provides a POSIX-compliant file system interface, which can scale to thousands of clients, petabytes of storage, and hundreds of gigabytes per second of I/O bandwidth. The key components of the Lustre file system are the Metadata Servers (MDS), the Metadata Targets (MDT), Object Storage Servers (OSS), Object Server Targets (OST), and the Lustre clients. Lustre is purpose-built to provide a coherent, global POSIX-compliant namespace for very large-scale computer infrastructure, including the world's largest supercomputer platforms. It can support hundreds of petabytes of data storage.Starting Price: Free -
43
NVIDIA TensorRT
NVIDIA
NVIDIA TensorRT is an ecosystem of APIs for high-performance deep learning inference, encompassing an inference runtime and model optimizations that deliver low latency and high throughput for production applications. Built on the CUDA parallel programming model, TensorRT optimizes neural network models trained on all major frameworks, calibrating them for lower precision with high accuracy, and deploying them across hyperscale data centers, workstations, laptops, and edge devices. It employs techniques such as quantization, layer and tensor fusion, and kernel tuning on all types of NVIDIA GPUs, from edge devices to PCs to data centers. The ecosystem includes TensorRT-LLM, an open source library that accelerates and optimizes inference performance of recent large language models on the NVIDIA AI platform, enabling developers to experiment with new LLMs for high performance and quick customization through a simplified Python API.Starting Price: Free -
44
NVIDIA DRIVE
NVIDIA
Software is what turns a vehicle into an intelligent machine. The NVIDIA DRIVE™ Software stack is open, empowering developers to efficiently build and deploy a variety of state-of-the-art AV applications, including perception, localization and mapping, planning and control, driver monitoring, and natural language processing. The foundation of the DRIVE Software stack, DRIVE OS is the first safe operating system for accelerated computing. It includes NvMedia for sensor input processing, NVIDIA CUDA® libraries for efficient parallel computing implementations, NVIDIA TensorRT™ for real-time AI inference, and other developer tools and modules to access hardware engines. The NVIDIA DriveWorks® SDK provides middleware functions on top of DRIVE OS that are fundamental to autonomous vehicle development. These consist of the sensor abstraction layer (SAL) and sensor plugins, data recorder, vehicle I/O support, and a deep neural network (DNN) framework. -
45
NVIDIA Parabricks
NVIDIA
NVIDIA® Parabricks® is the only GPU-accelerated suite of genomic analysis applications that delivers fast and accurate analysis of genomes and exomes for sequencing centers, clinical teams, genomics researchers, and high-throughput sequencing instrument developers. NVIDIA Parabricks provides GPU-accelerated versions of tools used every day by computational biologists and bioinformaticians—enabling significantly faster runtimes, workflow scalability, and lower compute costs. From FastQ to Variant Call Format (VCF), NVIDIA Parabricks accelerates runtimes across a series of hardware configurations with NVIDIA A100 Tensor Core GPUs. Genomic researchers can experience acceleration across every step of their analysis workflows, from alignment to sorting to variant calling. When more GPUs are used, a near-linear scaling in compute time is observed compared to CPU-only systems, allowing up to 107X acceleration. -
46
NVIDIA Isaac
NVIDIA
NVIDIA Isaac is an AI robot development platform that comprises NVIDIA CUDA-accelerated libraries, application frameworks, and AI models to expedite the creation of AI robots, including autonomous mobile robots, robotic arms, and humanoids. The platform features NVIDIA Isaac ROS, a collection of CUDA-accelerated computing packages and AI models built on the open source ROS 2 framework, designed to streamline the development of advanced AI robotics applications. Isaac Manipulator, built on Isaac ROS, enables the development of AI-powered robotic arms that can seamlessly perceive, understand, and interact with their environments. Isaac Perceptor facilitates the rapid development of advanced AMRs capable of operating in unstructured environments like warehouses or factories. For humanoid robotics, NVIDIA Isaac GR00T serves as a research initiative and development platform for general-purpose robot foundation models and data pipelines. -
47
NVIDIA Isaac Sim
NVIDIA
NVIDIA Isaac Sim is an open source reference robotics simulation application built on NVIDIA Omniverse, enabling developers to design, simulate, test, and train AI-driven robots in physically realistic virtual environments. It is built atop Universal Scene Description (OpenUSD), offering full extensibility so developers can create custom simulators or seamlessly integrate Isaac Sim's capabilities into existing validation pipelines. The platform supports three essential workflows; large-scale synthetic data generation for training foundation models with photorealistic rendering and automatic ground truth labeling; software-in-the-loop testing, which connects actual robot software with simulated hardware to validate control and perception systems; and robot learning through NVIDIA’s Isaac Lab, which accelerates training of behaviors in simulation before real-world deployment. Isaac Sim delivers GPU-accelerated physics (via NVIDIA PhysX) and RTX-enabled sensor simulation.Starting Price: Free -
48
NVIDIA Morpheus
NVIDIA
NVIDIA Morpheus is a GPU-accelerated, end-to-end AI framework that enables developers to create optimized applications for filtering, processing, and classifying large volumes of streaming cybersecurity data. Morpheus incorporates AI to reduce the time and cost associated with identifying, capturing, and acting on threats, bringing a new level of security to the data center, cloud, and edge. Morpheus also extends human analysts’ capabilities with generative AI by automating real-time analysis and responses, producing synthetic data to train AI models that identify risks accurately and run what-if scenarios. Morpheus is available as open-source software on GitHub for developers interested in using the latest pre-release features and who want to build from source. Get unlimited usage on all clouds, access to NVIDIA AI experts, and long-term support for production deployments with a purchase of NVIDIA AI Enterprise. -
49
NVIDIA RAPIDS
NVIDIA
The RAPIDS suite of software libraries, built on CUDA-X AI, gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar DataFrame API that integrates with a variety of machine learning algorithms for end-to-end pipeline accelerations without paying typical serialization costs. RAPIDS also includes support for multi-node, multi-GPU deployments, enabling vastly accelerated processing and training on much larger dataset sizes. Accelerate your Python data science toolchain with minimal code changes and no new tools to learn. Increase machine learning model accuracy by iterating on models faster and deploying them more frequently. -
50
PowerFLOW
Dassault Systèmes
By leveraging our unique, inherently transient Lattice Boltzmann-based physics PowerFLOW CFD solution performs simulations that accurately predict real world conditions. Using the PowerFLOW suite, engineers evaluate product performance early in the design process prior to any prototype being built — when the impact of change is most significant for design and budgets. PowerFLOW imports fully complex model geometry and accurately and efficiently performs aerodynamic, aeroacoustic and thermal management simulations. Automated domain discretization and turbulence modeling with wall treatment eliminates the need for manual volume meshing and boundary layer meshing. Confidently run PowerFLOW simulations using large number of compute cores on common High Performance Computing (HPC) platforms.