CASE STUDIES

Leading Cloud Computing Company Relies on Eclypsium to Protect AI Data Center Infrastructure

This organization provides a new breed of cloud services, purpose-built for the needs of AI and the next generation of computing. With industry-leading innovation, close ties to industry titans such as NVIDIA, Microsoft and OpenAI, and meteoric growth across the US and Europe, they have become one of hottest cloud platform providers in the industry. 

But this rise naturally comes with challenges. The company’s services rely on a rapidly growing fleet of highly specialized server hardware and AI GPUs, and it is up to the security team to ensure the integrity and security posture of the hardware and components at the heart of the platform. 

By partnering with Eclypsium, the organization was able to harness a turn-key solution for the assessment and ongoing monitoring of their servers and internal components such as UEFI firmware, NVIDIA GPUs, Intel CPUs, and more. This enabled their team to proactively verify technology supply chains while ensuring the highest levels of security for their customers.

  • Verify the integrity of firmware in AI server infrastructure and components and proactively alert on any changes.
  • Firmware-specific vulnerability management, updating, and configuration management.
     
  • Support fast, efficient rollout while minimizing costs and impact to staff.
  • Deep vendor-agnostic visibility into AI servers and components including  NVIDIA GPUs
  • Automated assessments to identify vulnerabilities, threats, or changes to device firmware and proactively verify firmware and supply chain integrity.
  • Simple, easy-to-deploy solution at a significantly lower cost than developing custom solutions for each vendor.

Enabling the Age of AI

To efficiently harness the power of AI, organizations need to focus on their business without getting bogged down in the minutia of building out AI infrastructure. As a new class of cloud platform, this Eclypsium customer provides enterprises and AI labs with easy, cost-effective access to next-generation software and infrastructure designed specifically for the most compute-hungry projects. 

Shared Responsibility for Security

Like other platforms-as-a-service (PaaS), this technology offering has a shared responsibility model when it comes to security. Customers are responsible for securing their own code and projects, while the cloud hosting provider is responsible for the security of the infrastructure from the VMs and containers all the way down to the physical servers and components that run the workloads. 

Supply Chain Risks and Challenges

As a result, this organization knew early on that firmware and supply chain security would play a critical role in their security program. The team needed to audit and monitor a wide range of highly specialized assets. This included AI servers and all their critical code and components such as UEFI firmware and bootloaders, the NVIDIA GPUs that power AI workloads, Intel CPUs, external PCIe devices, and other system components. 

Teams needed to know that every piece of code was valid, unaltered, and free of threats or implants. They needed to know if there were vulnerabilities and to apply updates if needed. And on an ongoing basis, they needed to know if there were any changes, whether malicious or otherwise that could put a device or component at risk. And they needed to do all of this without slowing down their massive growth and expansion or overburdening technical or security staff. They were able to leverage Eclypsium to validate that all systems are running the level of code they are expecting to ensure devices are running the “gold” firmware binaries. 

The Eclypsium Solution

Using the Eclypsium Platform, they were able to quickly roll out a comprehensive strategy for firmware and supply chain security. Eclypsium was able to provide value in the following key areas.

  • Firmware integrity checks – Eclypsium scans were used to verify that all firmware was valid, unaltered, and free from threats such as implants and backdoors. Systems were also checked for threats such as malicious bootloaders.
  • Firmware baselines and change monitoring – Next, Eclypsium was able to establish an auditable baseline of all critical firmware and components. Any changes to these established baselines would then trigger alerts to the firm’s SIEM for further investigation and response.
  • Vulnerability management and flaw remediation – Eclypsium was able to provide simple automated vulnerability assessments for low-level components that are typically missed by traditional scans. This included detecting misconfiguration or missing protections between various system components. When problems were identified, Eclypsium helped the team to apply corrective changes or updates.
  • Additional context and enrichment for investigations – Using the Eclypsium API, the team was able to pull in low-level device details for servers and components to assist with other investigations. 
  • Simple evaluation and deployment – Within just a few hours, the team was able to install the Eclypsium platform and begin seeing results on their server infrastructure.
  • Broad support for the latest AI technology – Eclypsium provided consistent coverage across a wide range of vendors and asset types.
  • Automated assessments and guidance – Eclypsium assessments were able to automatically identify and explain firmware and supply chain risks without the need to hire firmware specialists. Staff could simply scan their devices and quickly detect and remediate threats.

To learn more, visit our AI Data Center Security solution page or watch a quick demo video of our AI Data Center Security solution.