Securing Supply Chains for GenAI Hardware and Models

Today, at RSA Conference 2024, we’re announcing new capabilities to help secure the fundamental layers of the GenAI tech stack. First, we’re adding continuous monitoring support for NVIDIA hardware used in training, fine-tuning, and running GenAI models, such as the NVIDIA H100 Tensor Core GPU. Second, we have added integrity verification for GenAI foundation models to our supply chain intelligence offering, allowing organizations to understand and compare the risks of various closed- and open-source GenAI foundation models.

As the GenAI technology stack becomes more established, the IT industry must focus on addressing the new risks it poses. Some solutions are emerging for the user, data, and application layers, but not for the underlying infrastructure:

  • Infrastructure layer: At the most fundamental level, the optimized hardware used to train and run inference for GenAI models must be protected against exploits, implants, and tampering. This means being able to inventory the hardware, firmware, and software components that make up the compute and network infrastructure, quickly remediate vulnerabilities and insecure configurations, and detect compromise, including the presence of implants and backdoors. Cybercriminals have already been observed hiding malware code in GPUs, so any GenAI threat model should include this layer.
  • Foundation model layer: When choosing which foundation models to use, GenAI application teams must understand and weigh the latent risks of closed- and open-source models. They need visibility into the supply chain for these foundation models and the associated risks that come with training datasets, formats, architecture, and other characteristics. Highlighting the risk of using unverified models, recent research found more than 100 models on the HuggingFace platform that contained malware.
Eclypsium provides supply chain security for the fundamental layers of the GenAI tech stack. Credit to a16z for this diagram concept.

Protection for GenAI Hardware Infrastructure

We’re adding support for popular NVIDIA hardware used to train and run inference for GenAI models into the Eclypsium platform, which provides continuous monitoring and remediation for IT infrastructure assets. As with standard server infrastructure, the systems used to run GenAI workloads are composed of hardware and firmware components, each with complex supply chains. Threats at this fundamental level are virtually impossible to detect and remediate without the type of visibility that Eclypsium offers. 

As an example of the type of threat to GenAI hardware infrastructure, NVIDIA in January 2024 patched a vulnerability (CVE-2023-31029) in the NVIDIA DGX A100 baseboard management controller (BMC). The vulnerability received a CVSS score of 9.3 and allowed an unauthenticated attacker to execute arbitrary code and tamper with data, according to NVIDIA’s security advisory. An adversary could exploit this vulnerability remotely to plant a persistent backdoor in the BMC firmware of NVIDIA DGX servers. That backdoor could then control all software executing on the server, including the LLMs themselves, the datasets used to train them, and the outputs they produce.

The Eclypsium platform provides the third-party assurance and validation that datacenter operators need to ensure that their GenAI hardware infrastructure can be trusted. Support for NVIDIA hardware is the beginning—we plan to expand our coverage to include other hardware components and accelerators to provide this protection to all types of GenAI hardware infrastructure.
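At its core, this kind of firmware integrity monitoring means comparing a measured hash of a firmware image against a known-good baseline. The following is a minimal, illustrative sketch in Python, not Eclypsium's actual detection logic; the file path and expected digest are hypothetical:

```python
import hashlib
import hmac

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a firmware image through SHA-256 so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

def firmware_matches_baseline(path: str, expected_sha256: str) -> bool:
    """Compare a measured hash to a known-good value, timing-safely."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())
```

A bare hash comparison only detects drift from one expected image; production platforms additionally validate vendor signatures and check measurements against databases of known-good and known-bad firmware.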

Supply Chain Integrity for GenAI Foundation Models

Verification of the supply chain for LLMs, including vulnerabilities and model characteristics, features prominently among the checklist items for LLM application teams recommended by OWASP. In the same vein, recent AI risk mitigation guidelines from CISA include supply chain security recommendations: “Review AI vendor supply chains for security and safety risks. This review should include vendor-provided hardware, software, and infrastructure to develop and host an AI system and, where possible, should incorporate vendor risk assessments and documents, such as software bills of materials (SBOMs), AI system bills of materials (AIBOMs), data cards, and model cards.”
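As an illustration of what such a vendor review can consume, an SBOM in CycloneDX JSON format can be parsed in a few lines to enumerate the components a vendor ships. The document shape below follows the CycloneDX standard; the specific components listed are invented for the example:

```python
import json

def list_components(sbom_json: str) -> list:
    """Return (name, version) pairs from a CycloneDX-format SBOM document."""
    bom = json.loads(sbom_json)
    return [(c.get("name", "?"), c.get("version", "?"))
            for c in bom.get("components", [])]

# Hypothetical SBOM fragment for an AI software stack.
example_sbom = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {"type": "library", "name": "torch", "version": "2.1.0"},
        {"type": "library", "name": "transformers", "version": "4.38.2"},
    ],
})
```

A real review would go further, cross-referencing each component against vulnerability and license databases, and consuming AIBOMs, data cards, and model cards alongside the SBOM.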

Currently, there’s no way to validate the integrity and authenticity of LLMs, or to easily compare their relative risks. Using untrusted models exposes organizations to supply chain risk. Eclypsium’s supply chain intelligence capabilities fill this gap, allowing users to easily verify the integrity of external models and to evaluate the risks of models used within GenAI applications. This enables teams building and using GenAI applications (for both internal and commercial purposes) to make risk-informed decisions when selecting which model to use.
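Many of the malicious models found on model hubs abuse Python's pickle serialization, which can execute arbitrary code when a checkpoint is loaded. As a rough illustration of one screening technique (not Eclypsium's detection logic), a pickle stream can be scanned for the opcodes that import modules and invoke callables:

```python
import pickletools

# Opcodes that allow a pickle to import modules and call functions on load.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ"}

def risky_pickle_opcodes(data: bytes) -> set:
    """Return the set of code-execution-capable opcodes in a pickle stream."""
    return {op.name for op, arg, pos in pickletools.genops(data)
            if op.name in RISKY_OPCODES}
```

Note that legitimate model checkpoints can also contain some of these opcodes, so practical scanners additionally inspect which globals are imported (for example, `os.system` or `builtins.eval`); serialization formats such as safetensors sidestep the problem entirely by carrying no executable content.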

Extending Trust to GenAI Infrastructure

Eclypsium is on a mission to protect the digital supply chain so organizations can trust their technology. Our new capabilities provide integrity and security monitoring for GenAI hardware infrastructure, filling a critical gap that attackers are already targeting. In addition, we make it simple for organizations to verify the integrity of GenAI models so that they can easily evaluate risks.

If you’re building out GenAI applications or infrastructure, we’d love to provide you with a live demo and discuss your use case.

Learn more: