BMC&C: Lights Out Forever
Introduction
Earlier this year, Eclypsium Research discovered and reported 5 vulnerabilities in American Megatrends (AMI) MegaRAC Baseboard Management Controller (BMC) software. MegaRAC BMC is a critical supply chain component found in millions of devices worldwide and used by multiple top-tier manufacturers to deliver “lights-out” management for servers.
Today, Eclypsium Research is disclosing a pair of additional BMC&C vulnerabilities in the same AMI MegaRAC BMC software. These new vulnerabilities range in severity from High to Critical, including unauthenticated remote code execution and unauthorized device access with superuser permissions. They can be exploited by any local or remote attacker having access to the Redfish management interface.
It is worth noting that this new and prior research was the result of ongoing analysis of information leaked as part of a prior ransomware incident in the supply chain. This is significant because it means that threat actors have access to the same source code we used in our research, making it a straightforward exercise to find these and other vulnerabilities. Notably, BMC firmware images can also be decompiled in enough detail to reveal the same vulnerabilities discovered in this research, even without direct access to source code. The impact of exploiting these vulnerabilities includes remote control of compromised servers; remote deployment of malware, ransomware, and firmware implants; bricking of motherboard components (BMC or potentially BIOS/UEFI); potential physical damage to servers (over-voltage / firmware bricking); and indefinite reboot loops that a victim organization cannot interrupt. Lights out, indeed.
In disruptive attacks, attackers can leverage the often homogeneous environments in data centers to potentially send malicious commands to every other BMC on the same management segment, forcing all devices to continually reboot in such a way that victim operators are unable to stop the behavior. In extreme scenarios, the net impact could be indefinite, unrecoverable downtime until and unless devices are fully re-provisioned. These post-exploitation impact scenarios will be further explored below.
CISA's recent Binding Operational Directive 23-02 highlights the urgent need to secure these interfaces from active exploitation in the wild. Accompanying this, CISA has also published extensive hardening guidelines for BMCs, aligned to their Cross-Sector Cybersecurity Performance Goals. Attacks against data centers have been causing tremendous impact of late, affecting many third parties, and in many cases attackers specifically target the lights-out management interfaces and credentials. A growing body of research has been drawing attention to the risks of remote management interfaces and the server supply chain in general, including an upcoming DEF CON talk by NVIDIA researchers. The BMC vulnerabilities discussed in this blog serve as quintessential examples of why these interfaces need to be hardened, protected, monitored, and patched for the latest vulnerabilities discovered.
Affected Parties Notification Status
Eclypsium Research has been communicating with AMI to confirm the full scope of the vulnerabilities’ footprint. Additionally, Eclypsium has reached out to multiple parties who are working to determine the scope of impacted products and services, including top-tier OEM vendors, affected IT supply chain parties, and large cloud infrastructure providers.
Lights Out Forever
These vulnerabilities range in severity from High to Critical, including unauthenticated remote code execution and unauthorized device access with superuser permissions. They can be exploited by remote attackers having access to Redfish remote management interfaces, or from a compromised host operating system. Redfish is the successor to traditional IPMI and provides an API standard for the management of a server’s infrastructure and other infrastructure supporting modern data centers. Redfish is supported by virtually all major server and infrastructure vendors, as well as the OpenBMC firmware project often used in modern hyperscale environments.
These vulnerabilities pose a major risk to the technology supply chain that underlies cloud computing. In short, vulnerabilities in a component supplier affect many hardware vendors, and can in turn be passed on to many cloud services. As such, these vulnerabilities can pose a risk to servers and hardware that an organization owns directly, as well as to the hardware that supports the cloud services it uses. They can also impact upstream suppliers to organizations and should be discussed with key 3rd parties as part of general supply chain risk management due diligence.
BMCs are designed to provide administrators with near total and remote control over the servers they manage. AMI is a leading provider of BMCs and BMC firmware to a wide range of hardware vendors and cloud service providers. As a result, these vulnerabilities affect a very large number of devices, and could enable attackers to gain control of or cause damage not only to devices but to data centers and cloud service infrastructure. The same logic flaws may affect devices in fall-back data centers that belong to the same service provider but sit in different geographic regions, and can challenge assumptions cloud providers (and their customers) often make in the context of risk management and continuity of operations.
The vulnerabilities discovered are addressed by the following CVEs:
- CVE-2023-34329 – Authentication Bypass via HTTP Header Spoofing
- CVE-2023-34330 – Code injection via Dynamic Redfish Extension interface
The impact of exploiting these vulnerabilities includes remote control of compromised servers; remote deployment of malware, ransomware, and firmware implants; bricking of motherboard components (BMC or potentially BIOS/UEFI); potential physical damage (over-voltage / bricking); and indefinite reboot loops that a victim cannot stop.
These risks are magnified by MegaRAC’s position as the world’s leading provider of BMC remote management firmware, sitting at the top of the BMC supply chain. This firmware is a foundational component of modern computing found in hundreds of thousands of servers in data centers, server farms, and cloud infrastructure around the world. And since devices in these environments typically standardize on a hardware configuration, a vulnerable configuration could likely be shared across thousands of devices. Additionally, much of this research was enabled by the discovery of a substantial amount of AMI intellectual property that was leaked on the Internet. The availability of this information (including firmware source code) naturally increases the likelihood of attackers developing exploits and implants similar to those our research team has been able to develop.
Eclypsium Research has been following a Coordinated Vulnerability Disclosure process, including AMI and other affected parties. Additionally, AMI and Eclypsium have reached out to multiple parties who are working to determine the scope of impacted products and services.
Discovery Process
In August 2022, Eclypsium Research was made aware of an online leak of intellectual property (which included AMI source code) related to a ransomware incident involving one of AMI’s supply chain partners, Gigabyte. The leak purportedly contained sensitive IP under NDA from AMD, Intel, and AMI.
After downloading and reviewing the data, we confirmed it to be legitimate. Since there was a chance others had accessed it, we decided to look for vulnerabilities in case malicious actors were doing the same, and to alert AMI via a responsible disclosure process. The download was present on the leak website for more than a year. However, there are two reasons it may not have been downloaded and analyzed during that time:
- The threat actors forgot to link the last piece of the multipart archive, which we noticed was missing due to all parts being exactly the same size, which is abnormal for such archives. We were able to append an incremented numeral to a guessed filename URI in order to retrieve the final archive and re-assemble all the parts.
- The download was extremely unstable, so we had to further script the effort against network and onion site instabilities.
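The size heuristic and filename guess described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the tooling used in the research: the function names and the example filenames are hypothetical, and the heuristic assumes a split archive whose parts are all filled to a fixed size except the last.

```python
import re


def last_part_looks_missing(part_sizes: list[int]) -> bool:
    """Heuristic: a split archive normally ends with a smaller remainder part.

    If every known part has exactly the same size, the real final part
    may not have been linked. This is a hint, not proof.
    """
    if len(part_sizes) < 2:
        return False
    return len(set(part_sizes)) == 1


def next_part_name(name: str) -> str:
    """Increment the last numeral in a filename, preserving zero padding."""
    matches = list(re.finditer(r"\d+", name))
    if not matches:
        raise ValueError("no numeral found in filename")
    last = matches[-1]
    digits = last.group()
    incremented = str(int(digits) + 1).zfill(len(digits))
    return name[:last.start()] + incremented + name[last.end():]


# Hypothetical example values.
print(last_part_looks_missing([1000, 1000, 1000]))  # True
print(next_part_name("archive.7z.004"))             # archive.7z.005
```

Equal part sizes only suggest a missing tail, since a final part can coincidentally fill the full part size.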
We would like to highlight that our experience working with AMI PSIRT has been exceptional. They have been highly professional throughout the disclosure process, reproduced our report quickly and clearly communicated their remediation process. This engagement exemplified how a mature vulnerability response program should operate and we look forward to future collaboration with AMI to protect customers.
Supply Chain Impacts
It is important to note that this research is the result of our ongoing analysis of source code that was made public as part of a previous supply chain leak resulting from a ransomware event; one of many that have targeted and similarly impacted the IT supply chain. While this was a challenging effort as described in the previous section, there is no reason to believe that other threat actors, whether nation-state or criminal, would not be able to perform a similar analysis and develop exploit code like our research team was able to in a matter of hours.
We have seen no evidence that these or our previously disclosed BMC&C vulnerabilities are being exploited in the wild. However, because threat actors have access to the same source data, the risk of these vulnerabilities being weaponized is significantly higher. This fact has driven the urgency of our analysis to ensure we can find and address problems before threat actors can exploit them.
Organizations should consider the risks stemming from attacks on the IT supply chain (e.g. Western Digital, MSI, Acer, and others) when conducting tabletop exercises and assessing impact scenarios that can result in catastrophic or material impact. A single breach can expose the secrets of many upstream and downstream partners. For example, AMI was not the original victim of the ransomware attack and subsequent leak, but instead had their source code leaked due to an attack on a supply chain partner (Gigabyte). Source code leaks in the IT supply chain tend to have a long tail in terms of impact. As in this case, ongoing analysis often reveals more vulnerabilities. This makes it critical for organizations to patch and closely monitor any assets and components affected by incidents targeting the IT supply chain.
Vulnerability Details – Exploit Phase
- CVE-2023-34329 – Authentication Bypass via HTTP Header Spoofing
- CVE-2023-34330 – Code injection via Dynamic Redfish Extension interface
The full attack scenario consists of two vulnerabilities that, when chained, lead to a combined CVSS score of 10. We will first describe them separately, followed by the chained attack scenario.
CVE-2023-34329 – Authentication Bypass via HTTP Header Spoofing
CVSS Score: 9.1 Critical (CVSS:3.1/AV:N/AC:L/PR:H/UI:N/S:C/C:H/I:H/A:H)
The Redfish host interface handler allows for two authentication options: Basic Auth, which requires support from the BIOS, or No Auth, which verifies that the communication is coming from the internal Host Interface or USB0 network IP address. By spoofing certain HTTP headers, an attacker can trick the BMC into believing that external communication is coming in from the USB0 internal interface. On a system shipped with the No Auth option configured, the attacker can then bypass authentication and perform Redfish API actions. For example, an attacker can create an account using the usual /redfish/v1/AccountService/Accounts API, just as a legitimate administrator would.
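The general class of flaw, trusting a client-supplied header to decide where a request came from, can be illustrated with a generic sketch. This is not AMI's code and does not reflect the actual headers or addresses involved: the header name, the network range (a documentation-only range), and both function names are hypothetical placeholders chosen for illustration.

```python
import ipaddress

# Hypothetical placeholders; not taken from any real firmware.
INTERNAL_NET = ipaddress.ip_network("192.0.2.0/24")
HYPOTHETICAL_HEADER = "X-Example-Source-IP"


def is_internal_flawed(headers: dict[str, str], peer_ip: str) -> bool:
    """Flawed pattern: a client-controlled header overrides the socket peer."""
    claimed = headers.get(HYPOTHETICAL_HEADER, peer_ip)
    return ipaddress.ip_address(claimed) in INTERNAL_NET


def is_internal_safer(headers: dict[str, str], peer_ip: str) -> bool:
    """Safer pattern: decide only on the address the connection came from."""
    return ipaddress.ip_address(peer_ip) in INTERNAL_NET


# An external client that simply claims to be internal.
external_peer = "203.0.113.10"
spoofed = {HYPOTHETICAL_HEADER: "192.0.2.17"}

print(is_internal_flawed(spoofed, external_peer))  # True: treated as internal
print(is_internal_safer(spoofed, external_peer))   # False: header is ignored
```

The design point is that any value a remote client can set must not feed an authentication decision; only properties of the connection itself should.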
CVE-2023-34330 – Code injection via Dynamic Redfish Extension
CVSS Score: 8.2 High (CVSS:3.1/AV:L/AC:L/PR:H/UI:N/S:C/C:H/I:H/A:H)
AMI's Redfish implementation offers functionality that makes it possible to dynamically POST (HTTP POST) a full piece of code that will be executed along with the Redfish service, as the root user on the BMC chip. Normally, this development functionality is not meant to be enabled on a production device, as users should not be allowed to execute arbitrary code on the BMC chip itself. However, it is enabled by default and is allowed on the Host Interface.
When combined with the No Auth option as described in CVE-2023-34329, any attacker on the host machine where the BMC chip resides can POST arbitrary code (effectively achieving code execution). Without the No Auth option enabled, the attacker would also need BMC credentials.
When combined with the prior CVE-2023-34329 vulnerability, an attacker can POST arbitrary code remotely, as we will discuss in the next section:
CVE-2023-34329 + CVE-2023-34330
CVSS Score: 10
When both of these vulnerabilities are chained together, even a remote attacker with network access to the BMC management interface and no BMC credentials can achieve remote code execution by tricking the BMC into believing that the HTTP request is coming from the internal interface. As a result, the attacker can remotely upload and execute arbitrary code, possibly from the Internet if the interface is exposed to it.
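For readers who want to see how the individual scores follow from the vector strings quoted above, here is a minimal sketch of the CVSS v3.1 base score formula. The weights and rounding rule come from the CVSS v3.1 specification; the function names are our own. The post does not publish a vector for the chained scenario, so the third vector below is an assumption: one unauthenticated, network-reachable vector that happens to produce 10.0.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1 Appendix A."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS v3.1 base score for a vector string."""
    # Drop the "CVSS:3.1" prefix, then map metric names to their values.
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]

    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))


print(base_score("CVSS:3.1/AV:N/AC:L/PR:H/UI:N/S:C/C:H/I:H/A:H"))  # 9.1
print(base_score("CVSS:3.1/AV:L/AC:L/PR:H/UI:N/S:C/C:H/I:H/A:H"))  # 8.2
# Assumed vector for the chain (not stated in this post):
print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H"))  # 10.0
```

The only differences between the three vectors are the attack vector (AV) and privileges required (PR) metrics, which is where chaining the two issues changes the picture.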
There are also multiple sub-scenarios. For example, the device might not ship with the No Auth option, in which case the attacker would need credentials first (any target account with any access level, including the lowest ‘callback’ account). Such credentials are often stolen via any number of common tactics on the network and hosts, including brute force, compromise of platform monitoring systems, credential reuse across systems, etc.
In addition, the original BMC&C vulnerabilities can be used in combination with the two described above. For example, CVE-2022-40258 (Weak password hashes for Redfish & API) can be used to simplify cracking administrator passwords for the admin accounts on the BMC chip.
We also need to emphasize that such an implant can be extremely hard to detect, and is extremely easy to recreate for any attacker in the form of a one-line exploit, which is why we have chosen not to describe specific values, header-names, URLs, etc. in this blog.
Post-Exploit Attack Scenarios
These vulnerabilities can pose serious risks in any scenario in which an attacker has access to an affected server’s BMC. They can be exploited by an attacker that has gained initial access into a data center or administrative network, and in many cases even from the Internet directly, as many architectures are misconfigured to allow direct access. Finally, attackers can also leverage these vulnerabilities directly from a compromised operating system down to the BMC on the same device.
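As a defensive starting point, the exposure cases above can be triaged from an asset inventory. The sketch below is a rough illustration under stated assumptions: the function name, the category labels, and the example addresses and subnet are hypothetical, and it only classifies addresses rather than testing actual reachability.

```python
import ipaddress


def classify_bmc_address(addr: str, mgmt_networks: list[str]) -> str:
    """Classify a BMC address relative to the dedicated management networks."""
    ip = ipaddress.ip_address(addr)
    if any(ip in ipaddress.ip_network(net) for net in mgmt_networks):
        return "management"
    if ip.is_global:
        # Publicly routable: potentially reachable from the Internet.
        return "internet-routable"
    # Private or otherwise non-global, but outside the management segment.
    return "internal-unsegmented"


# Hypothetical inventory and management subnet.
mgmt = ["10.20.0.0/24"]
for bmc in ["10.20.0.15", "10.1.2.3", "8.8.8.8"]:
    print(bmc, classify_bmc_address(bmc, mgmt))
```

An address in the management range still needs ACL or firewall checks; this only flags BMCs that are clearly in the wrong place.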
As data centers tend to standardize on specific hardware platforms, any BMC-level vulnerability may apply to large numbers of devices and could potentially affect an entire data center and the services that it delivers, up to the point of catastrophic impact and indefinite downtime.
Due to the nature and location of BMC vulnerabilities, detecting exploitation and post-exploit activity is complex, as standard EDR & AV products focus on the operating system, not the underlying firmware. Further, network security controls likely will not detect traffic going to or from BMCs as malicious.
As we explore the post-exploit attack scenarios below, it is worth mentioning that these actions can be performed by an attacker that has exploited either our previously disclosed BMC&C vulnerabilities, these newly disclosed ones, or in some cases combinations of both:
“Infinite shutdown” Loop
The attacker can leverage existing BMC functionality to continuously shut down the host in a loop, every X number of seconds, as well as block any and all administrator access to legitimate BMC management functionality. This forms the basis of a simple and efficient extortion scenario. It can also be used to outright disrupt environments. In either scenario the victim administrators are effectively unable to restore systems to a functional state of operation, regardless of disaster recovery procedures in place. Moreover:
- The attacker does not need to have C2 back connect. They only need to communicate from any internal attacker-controlled machine to the BMC (or from the Host itself to the BMC via the internal interface)
- In the scenario in which an attacker implants the BMC chip, it is very hard to diagnose as a BMC implant versus a hardware malfunction. This is compounded should this scenario play out across many devices in the environment.
- Even if an implant is successfully detected, it is difficult to remedy for administrators, requiring specialized tools and even soldering, depending on the vendor. Perhaps ironically, the BMC is normally the last line of resort for administrators to restore a downed system, and here, it is no longer available.
- Finally, because there is an internal host-to-BMC interface, the attack can be performed even if the BMC management interface is not connected to the network via ethernet. All that is required is the attacker compromising the host via any number of common methods.
- Should the attacker’s motive center on the ability to periodically disrupt devices across entire management networks, the attacker has two options at their disposal: they can either push an implant to all other hosts or, from a single compromised BMC, push the continuous-reboot command to all vulnerable BMCs on the same segment.
When this happens to a small number of machines, the impact may be limited in scale. However, should the same vulnerabilities be exploited across an entire BMC management segment and affect hundreds or thousands of devices at once, the impact can be catastrophic to operations and result in indefinite downtime with no ability to recover.
Long-term Espionage
BMCs provide onboard KVM functionality that allows administrators to remotely manage the device. Post-exploitation, the attacker has root access to the BMC, and now effectively has a stealthy way to tap into that KVM functionality and monitor all admin actions. Even worse, the attacker can also provide inputs to the KVM, which allows them to act as if they were at the keyboard. This evades host EDR solutions, as it is impossible to distinguish between legitimate admin input and attacker input.
“Patch” the BIOS
BMCs allow administrators to perform BIOS updates as well as configure the BIOS. Thus, an attacker with control of the BMC can also perform BIOS updates and reconfigurations. Potentially, such an update can contain implanted EFI binaries, and as such the attacker can effectively disable secure boot to place a malicious bootloader, or move up into the OS stack and bypass or disable host OS security controls and 3rd party EDR solutions.
Physical Destruction
As evidenced by 3rd-party research, such access is enough to deploy malware payloads that would quite literally fry the CPU:
“This latest power management tampering, or PMFault, can be carried out by a privileged software adversary who doesn’t have access to Board Management Controller (BMC) login credentials. It allows the same data extraction as its predecessor attacks, but through the BMC flash memory chip. In other words, you need to be able to update the BMC firmware to include malicious code to perform the attack, which means you’ll need root access pretty much.”
In our case, the scenario is even more profound, as the attacker can “update the BMC firmware” directly from the network, without any authentication. Quoting the article further:
“By then overvolting – sending 2.84 volts to the 1.52 spec’d CPU – the pair [of researchers] permanently bricked two separate Xeon CPUs used in the experiment. This was done by a malicious software update.”
This brings a whole new meaning to the term “Lights Out Management”, especially should this simple exploit be scaled to entire management segments.
Lateral Movement to other BMCs and Network Devices
As the BMC in this case is technically a small Linux installation, the attacker gets the full network stack, as well as a scripting language bundled in it to work with, allowing them to move throughout the management network to other BMCs, or perform any other style of network attack in the attacker’s arsenal.
Lateral Movement to AD
BMC firmware allows the integration of its authentication with Active Directory. If this option is in use by the victim organization, the attacker can gain an initial set of AD credentials, and further expand their operation.
Cloudborne Style Attacks in Virtual Hosting Environments
BMCs can present a significant attack vector to virtual cloud environments. As extensively covered in past Eclypsium research (Cloudborne), attackers can move from guest images to the BMC, and from there either persist on the BMC to access other 3rd party guest images, or move laterally on BMC management networks to attack other host devices. Given the post-exploit attack scenarios outlined above, attackers may be able to spin up an instance in a cloud environment, and from that instance, access and exploit vulnerabilities in the BMC of the host device in order to achieve disruption, physical destruction, or long term espionage objectives.
The correct question concerning the potential post-exploit impact scenarios is: what can’t be achieved by the attacker? The attacker can do nearly anything, from lateral network movement to literal hardware destruction.
Mitigations
- Ensure that all remote server management interfaces (e.g. Redfish, IPMI) and BMC subsystems in your environment are on dedicated management networks and are not exposed externally, and ensure internal BMC interface access is restricted to administrative users with ACLs or firewalls per Zero Trust Architecture principles.
- U.S. Government agencies must adhere to CISA’s recent Binding Operational Directive 23-02, requiring:
  - The BMC interface is removed from the internet by making it only accessible from an internal enterprise network (CISA recommends an isolated management network);
  - The BMC interface is protected by capabilities, as part of a Zero Trust Architecture, that enforce access control to the interface through a policy enforcement point separate from the interface itself (preferred action).
- Review vendor default configurations of device firmware to identify and disable built-in administrative accounts and/or use remote authentication where available. Change default BMC credentials as soon as possible and establish unique user accounts for administrators.
- Perform regular software and firmware updates on critical servers.
- Consult vendor guides and recommendations for hardening BMCs against unauthorized access and supply-chain threats. Note that UEFI hardening configuration guidance may apply to many BMC settings, as there is direct access to the UEFI via the BMC.
- Ensure that vulnerability assessments and red team fixed scope engagements include remote server management subsystems (like MegaRAC, iDRAC, iLO, etc.) and critical firmware.
- Ensure that all critical firmware in servers is regularly monitored for indicators of compromise or unauthorized modifications. This includes monitoring for outbound traffic coming from any BMCs.
- Perform supply chain checks of new equipment. Assess that all new servers have major vulnerabilities patched and the latest firmware updates installed.
- Organizations should revisit threat modeling, table-topping, DFIR playbooks and DRP planning to incorporate IT supply chain threats and related impact scenarios, to include site-wide catastrophic scenarios especially in homogeneous environments.
- Monitor BMCs for changes in integrity. Some BMCs report integrity data to a root of trust (RoT) which can be a TPM, dedicated security chip or coprocessor, or a central processing unit (CPU) secure memory enclave. Monitor integrity features for unexpected changes and platform alerts.
- For Eclypsium customers, the platform will provide coverage of recently discovered vulnerabilities through a forthcoming functionality that will dynamically update scan results. This functionality will align to the CISA BMC guidance; in particular:
“Use firmware scanning tools periodically. […] Establish a schedule to collect and inspect BMC firmware for integrity and unexpected changes. Include firmware audits in comprehensive anti-malware scanning tasks.”
- Never ignore a BMC, even if it has been disabled. Keep all BMCs updated and configured securely. Over time the device may be relocated or repurposed, and the BMC may be turned back on.
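The integrity-monitoring guidance above can be approximated with a simple digest comparison against a known-good baseline. This is a minimal sketch, not a description of any product: the function names and device IDs are hypothetical, and how firmware images are collected from each BMC is assumed to happen elsewhere.

```python
import hashlib


def sha256_of(image: bytes) -> str:
    """Return the hex SHA-256 digest of a firmware image."""
    return hashlib.sha256(image).hexdigest()


def find_integrity_changes(baseline: dict[str, str],
                           current: dict[str, bytes]) -> list[str]:
    """Return device IDs whose image differs from, or lacks, a baseline digest."""
    flagged = []
    for device, image in sorted(current.items()):
        if baseline.get(device) != sha256_of(image):
            flagged.append(device)
    return flagged


# Hypothetical example data standing in for collected firmware images.
baseline = {
    "bmc-01": sha256_of(b"image-a"),
    "bmc-02": sha256_of(b"image-b"),
}
current = {
    "bmc-01": b"image-a",
    "bmc-02": b"image-b-modified",
    "bmc-03": b"never-baselined",
}
print(find_integrity_changes(baseline, current))  # ['bmc-02', 'bmc-03']
```

A digest match only shows the image equals the baseline, so the baseline itself should come from a trusted vendor release rather than from a device already in service.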
Conclusions
This latest pair of BMC vulnerabilities further reinforces the need for organizations to proactively identify and patch vulnerable BMC interfaces. Given that one of these vulnerabilities doesn’t even require authentication, there is even greater urgency to address them. For data centers, hyperscale environments, and critical servers, post-exploit impact can include long-term persistence or disruptive and destructive scenarios on multiple devices across a single environment, or across multiple environments exposed to the same vulnerabilities and to threat actors able to exploit them.
Securing the firmware supply chain is a complex problem, and vulnerabilities found at the top of the chain present substantial risk due to the way OEMs integrate code into their products. Firmware vulnerabilities are non-trivial to remediate due to the fact that their location in the computing stack is not optimized for patching at scale. Furthermore, standardization of hosting & cloud providers on server components means these vulnerabilities can easily impact hundreds of thousands, possibly millions, of systems. As attackers shift their focus from user-facing operating systems to the lower-level embedded code on which hardware and computing trust relies, compromise becomes harder to detect and far more complex to remediate. While compromise of a server OS can be resolved with a wipe & reinstallation, firmware compromise can persist beyond OS reinstallation or hard drive replacement. Devices can also be physically destroyed or rendered indefinitely inoperable. Security research into this area is imperative to stay a step ahead of the attacks and protect the foundation upon which modern computing relies.