Artificial intelligence (AI) is now squarely on the front lines of information security. However, as is often the case when technological innovation moves quickly, security ends up being a secondary consideration. This is increasingly evident from the ad-hoc nature of many implementations, where organizations lack a clear strategy for responsible AI use.
Attack surfaces aren’t expanding only because of risks and vulnerabilities in AI models themselves, but also because of those in the underlying infrastructure that supports them. Many foundation models, as well as the data sets used to train them, are open-source and readily available to developers and adversaries alike.
Unique risks to AI models
According to Ruben Boonen, CNE Capability Development Lead at IBM: “One problem is that you have these models hosted on giant open-source data stores. You don’t know who created them or how they were modified, and there are a number of issues that can occur here. For example, let’s say you use PyTorch to load a model hosted on one of these data stores, but it has been changed in a way that’s undesirable. It can be very hard to tell because the model might behave normally in 99% of cases.”
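One partial mitigation is to treat downloaded model files like any other untrusted artifact. Below is a minimal sketch, assuming the model ships as a PyTorch checkpoint and the publisher provides a known-good SHA-256 digest; the file path and digest are illustrative placeholders, not real values.

```python
import hashlib
import torch

# Illustrative values: replace with the real artifact path and the
# publisher's known-good digest.
MODEL_PATH = "models/example-model.pt"
EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large checkpoints fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Refuse to load anything whose digest doesn't match the published value.
if sha256_of(MODEL_PATH) != EXPECTED_SHA256:
    raise RuntimeError("Model file does not match the expected checksum")

# weights_only=True (available in recent PyTorch releases) restricts
# unpickling to tensor data, blocking the arbitrary-code-execution path
# abused by malicious pickle payloads.
state_dict = torch.load(MODEL_PATH, weights_only=True)
```

A checksum only proves the file is the one the publisher intended to ship, so it complements, rather than replaces, vetting the publisher itself.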
Recently, researchers discovered thousands of malicious files hosted on Hugging Face, one of the largest repositories for open-source generative AI models and training data sets. These included around a hundred models capable of injecting malicious code onto users’ machines. In one case, hackers set up a fake profile masquerading as the genetic testing startup 23andMe to deceive users into downloading a compromised model designed to steal AWS credentials. It was downloaded thousands of times before finally being reported and removed.
In another recent case, red team researchers discovered vulnerabilities in ChatGPT’s API, in which a single HTTP request elicited two responses, indicating an unusual code path that could theoretically be exploited if left unaddressed. This, in turn, could lead to data leakage, denial-of-service attacks and even privilege escalation. The team also discovered vulnerabilities in plugins for ChatGPT that could potentially result in account takeover.
While open-source licensing and cloud computing are key drivers of innovation in the AI space, they’re also a source of risk. On top of these AI-specific risk areas, general infrastructure security concerns also apply, such as vulnerabilities in cloud configurations or poor monitoring and logging processes.
AI models are the new frontier of intellectual property theft
Imagine pouring huge amounts of financial and human resources into building a proprietary AI model, only to have it stolen or reverse-engineered. Unfortunately, model theft is a growing problem, not least because AI models often contain sensitive information and can potentially reveal an organization’s secrets should they end up in the wrong hands.
One of the most common mechanisms for model theft is model extraction, whereby attackers access and exploit models through API vulnerabilities. This can potentially grant them access to black-box models — like ChatGPT — at which point they can strategically query the model to collect enough data to reverse engineer it.
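To make the mechanics concrete, here is a stripped-down sketch of what extraction looks like from the attacker’s side, assuming a hypothetical classification endpoint that returns model outputs as JSON; the URL, payload format and probe inputs are all illustrative.

```python
import requests

# Hypothetical black-box endpoint; real targets, authentication and payload
# formats will differ.
API_URL = "https://api.example.com/v1/classify"

def query_model(text: str) -> dict:
    """Send one input to the black-box model and return its raw output."""
    response = requests.post(API_URL, json={"input": text}, timeout=10)
    response.raise_for_status()
    return response.json()

# Attacker-chosen probe inputs; in practice these are generated at scale.
candidate_inputs = ["example input 1", "example input 2", "example input 3"]

# The attacker harvests input/output pairs, then trains a local surrogate
# model on them. Richer outputs (full probability vectors rather than a
# single label) let the surrogate converge with far fewer queries.
training_pairs = [(text, query_model(text)) for text in candidate_inputs]
```

This is why rate limiting and response filtering, discussed below, matter: they raise the number of queries an attacker needs and reduce how much each response reveals.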
In most cases, AI systems run on cloud architecture rather than local machines. After all, the cloud provides the scalable data storage and processing power required to run AI models easily and accessibly. However, that accessibility also increases the attack surface, allowing adversaries to exploit vulnerabilities like misconfigurations in access permissions.
“When companies provide these models, there are usually client-facing applications delivering services to end users, such as an AI chatbot. If there’s an API that tells it which model to use, attackers could attempt to exploit it to access an unreleased model,” says Boonen.
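A simple server-side defense is to never trust the client-supplied model name. The sketch below shows the kind of allowlist check Boonen’s scenario calls for, assuming a hypothetical chat service where the client names the model it wants; the model names and function are illustrative.

```python
# Models the client-facing application is allowed to serve. Anything not on
# this list, including internal or unreleased models, is rejected outright.
ALLOWED_MODELS = {"assistant-small", "assistant-large"}
DEFAULT_MODEL = "assistant-small"

def resolve_model(requested: str | None) -> str:
    """Map a client-supplied model name onto the allowlist instead of trusting it."""
    if requested is None:
        return DEFAULT_MODEL
    if requested not in ALLOWED_MODELS:
        # Fail closed rather than passing the raw value to the model router.
        raise ValueError(f"Unknown model requested: {requested!r}")
    return requested
```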
Red teams keep AI models secure
Protecting against model theft and reverse engineering requires a multifaceted approach that combines conventional security measures, such as secure containerization practices and access controls, with offensive security measures.
The latter is where red teaming comes in. Red teams can proactively address several aspects of AI model theft, such as:
- API attacks: By systematically querying black-box models in the same way adversaries would, red teams can identify vulnerabilities like suboptimal rate limiting or insufficient response filtering (see the sketch after this list).
- Side-channel attacks: Red teams can also carry out side-channel analyses, in which they monitor metrics like CPU and memory usage in an attempt to glean information about the model size, architecture or parameters.
- Container and orchestration attacks: By assessing containerized AI dependencies like frameworks, libraries, models and applications, red teams can identify orchestration vulnerabilities, such as misconfigured permissions and unauthorized container access.
- Supply chain attacks: Red teams can probe entire AI supply chains spanning multiple dependencies hosted in different environments to ensure that only trusted components like plugins and third-party integrations are being used.
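As an example of the first item, here is a minimal sketch of a rate-limiting probe a red team might run against a model-serving API, assuming a hypothetical chat endpoint and an authorized engagement; the URL, payload and burst size are illustrative.

```python
import time
import requests

# Hypothetical endpoint under test; real engagements target authorized systems only.
API_URL = "https://api.example.com/v1/chat"

def probe_rate_limit(burst_size: int = 200) -> None:
    """Fire a burst of identical requests and record when (or whether) HTTP 429 appears."""
    throttled_at = None
    start = time.monotonic()
    for i in range(burst_size):
        response = requests.post(API_URL, json={"input": "ping"}, timeout=10)
        if response.status_code == 429 and throttled_at is None:
            throttled_at = i + 1
    elapsed = time.monotonic() - start
    if throttled_at is None:
        print(f"No throttling after {burst_size} requests in {elapsed:.1f}s: "
              "rate limiting may be missing or too permissive")
    else:
        print(f"Throttled after {throttled_at} requests")
```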
A thorough red teaming strategy can simulate the full scope of real-world attacks against AI infrastructure to reveal gaps in security and incident response plans that could lead to model theft.
Mitigating the problem of excessive agency in AI systems
Most AI systems have a degree of autonomy with regard to how they interface with different systems and respond to prompts. After all, that’s what makes them useful. However, if systems have too much autonomy, functionality or permissions — a concept OWASP calls “excessive agency” — they can end up triggering harmful or unpredictable outputs and processes or leaving gaps in security.
Boonen warns that the components multimodal systems rely on to process inputs, such as optical character recognition (OCR) for PDF files and images, “can introduce vulnerabilities if they’re not properly secured.”
Granting an AI system excessive agency also expands the attack surface unnecessarily, thus giving adversaries more potential entry points. Typically, AI systems designed for enterprise use are integrated into much broader environments spanning multiple infrastructures, plugins, data sources and APIs. Excessive agency is what happens when these integrations result in an unacceptable trade-off between security and functionality.
Consider an example where an AI-powered personal assistant has direct access to an individual’s Microsoft Teams meeting recordings stored in OneDrive for Business, with the purpose of summarizing the content of those meetings in a readily accessible written format. Now imagine that the plugin can read not only meeting recordings but everything else stored in the user’s OneDrive account, where many confidential information assets are also kept. Perhaps the plugin even has write capabilities, in which case a security flaw could grant attackers an easy pathway for uploading malicious content.
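One way to make that trade-off visible is to write the plugin’s requested permissions down explicitly and review them against what the use case actually needs. Below is a minimal sketch of such a review, assuming a hypothetical plugin manifest; the scope names loosely mirror Microsoft Graph-style permissions but are illustrative placeholders.

```python
# Hypothetical scopes for the meeting-summary plugin. The point of the review
# is that summarizing recordings needs narrow read access, not write access
# to the whole drive.
REQUESTED_SCOPES = {"Files.ReadWrite.All"}        # everything, read and write
APPROVED_SCOPES = {"Files.Read.Selected"}         # read-only, limited to selected items

def review_scopes(requested: set[str], approved: set[str]) -> set[str]:
    """Return any requested scopes that exceed what the use case was approved for."""
    return requested - approved

excess = review_scopes(REQUESTED_SCOPES, APPROVED_SCOPES)
if excess:
    print(f"Plugin requests more access than it needs: {sorted(excess)}")
```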
Once again, red teaming can help identify flaws in AI integrations, especially in environments where many different plugins and APIs are in use. Red teams’ simulated attacks and comprehensive analyses can identify vulnerabilities and inconsistencies in access permissions, as well as cases where access rights are unnecessarily lax. Even if they don’t find any security vulnerabilities, they can still provide insight into how to reduce the attack surface.