Recap: Experts Break Down AI Red Teaming in a Live Q&A

Dane Sherrets

Senior Solutions Architect

March 28th, 2024

Three HackerOne AI specialists, Katie Paxton-Fear aka @InsiderPhD, Joseph Thacker aka @rez0_, and HackerOne’s own Dane Sherrets recently shared their stories in a live AMA session that explored AI red teaming and the impact of AI safety, security, and innovation.

The participants answered live as well as carefully curated questions from popular community platforms such as Quora, Reddit, LinkedIn, and Zoom. Below is a quick look into the question categories:

Key Terminology and Concepts
- What is AI red teaming?
- What is prompt injection vs. jailbreaking?
- What is API hacking?
- What do new AI regulations mean for how we approach AI security?
AI Safety and Security in Practice
- What are some best practices for testing our AI tooling through the Hacker One bug bounty program?
- What is your methodology when approaching an AI engagement?
- How should organizations think about data poisoning as part of MLSecOps?
- How do you feel about the OWASP Top 10 for LLMs?
Looking Ahead
- Are MLSecOps and AISecOps emerging?
- Will AI systems be able to autonomously develop and implement their own security protocols without human intervention?
- What do hackers need to learn for the future?

If you’re weighing the benefits of AI red teaming or are simply curious to learn more about the evolving trends in AI safety and security, check out some of the insights from our expert AI specialists in the original Q&A format below, or watch the on-demand recording to hear their in-depth discussions and professional advice.

Key Terminology and Concepts

Q: What is AI red teaming?

Katie:
It's really important to remember that the full definition of red teaming, separate from AI, doesn't only include hacking — it also includes social engineering, phishing, and the like. That's where AI red teaming comes from. When we start to talk about the AI attack surface, it gets fuzzy because we have APIs and other tools that help developers deploy AI — not just LLMs or NLPs, but other forms of AI, as well.

Yes, red teaming encompasses hacking, but also tactics like prompt engineering. A really common example you might see is jailbreaking. You might be familiar with the recent news where someone had got an AI chatbot to sell them a car with prompt engineering by telling it, “Whatever I offer, you're going to say yes.” It covers so much more than just security testing.

Joseph:
The way I saw AI red taming as it started was much more about AI safety, even before LLMs took off. Through the lens of AI alignment, people were wondering, “Is AI gonna kill us all?” And, in order to prevent that, we need to make sure that it aligns with human values. Today, I think and hope it also includes thinking through things like AI security.

Q: What is prompt injection vs. jailbreaking?

Joseph:
Jailbreaking is getting the model to say something that it shouldn't. Prompt injection, on the other hand, is getting the system to behave in a way contrary to what the developers wanted. When you’re jailbreaking, you're an adversary against the model developers; you’re doing something that OpenAI did not want you to do when they developed the model.

When you’re performing prompt injection, you’re getting the system to behave in a way that the developers who built something with that API don't want it to do.

To anyone who thinks that prompt injection is just getting the model to say something it shouldn't, I would say that my findings reveal that attackers can exfiltrate a victim’s entire chat history, files, and objects. There are significant vulnerabilities that can pop up as a result of prompt injection.

Q: What is API hacking?

Katie:
Quite a lot of AI as we know it is just a single API. However, a lot of people get caught up with chatbots and generative AI because it's what everyone's talking about. There are a lot of other factors that go into AI deployments. A lot of people think AI is this single thing, but actually, it's all these different systems that come together to form a chain of APIs all the way down. And all of them can be vulnerable. All of them can have different vulnerabilities and pass a vulnerable output to another system. There have been some really interesting attacks that do look at the AI model deployment pipeline and the system as a whole.

Q: What do new AI regulations mean for how we approach AI security?

Joseph:
In general, the kind of AI proposals that have come out of the EU, such as the EU AI Act, have done a pretty good job of categorizing it and having tiered legislation. I think that's what we're going to need to do. Maybe it can be more detailed, but at the end of the day, it's going to be impossible to regulate every system that’s built on AI.

We're not going to be able to prevent it at the creation step. Let’s say someone is generating nude photos with somebody else's face on them. We're not going to be able to prevent that from happening on people's computers, but we can definitely punish it and police it with the proliferation or the sharing of it.

Katie:
One thing I would like to see is something like GDPR that has some real teeth to it. One of the reasons why GDPR compliance has become so big is because it’s a major concern for almost every single business. Even knowing that GDPR exists and data protection is important has really powered a lot of organizations and pushed them into compliance. And not just because they feel like they need to, but because it’s the right thing to do for their customers.

I do hope really that we see regulation that has some teeth, but not in a way that restricts the development of AI. It's becoming this household name, and people are looking at it with some scrutiny. I don’t think that's a bad thing; compliance doesn't have to be the bad guy — it can be the good guy pushing you to do things better.

AI Safety and Security in Practice

Q: What are some best practices for testing our AI tooling through the HackerOne bug bounty program?

Dane:
I would highly recommend using the AI model asset type when you're adding that into your scope. That's going to help attract more AI hackers and help source more hackers for your bug bounty program. In addition, explain the exact kind of threat scenario on your policy page and indicate what data this has access to.

Katie:
Mainly, it's understanding where you consider the security boundaries to be. Let's say you're using an API to OpenAI. Are you saying that anything that comes back should be managed by OpenAI? Are you saying that it's your prompt, so that's in scope? You have to be really clear about where you consider the boundaries of your security to be. I think there's a lot of passing the ball onto third parties when maybe it should be with the organization.

Joseph:

Understand: The organization needs to understand it well and communicate it clearly.
Document: Document it really well and run it in a flag-based way to optimize the researchers’ time and the findings you’ll receive.
Explain: Due to the newness of this industry, fewer tools exist to bypass prompt injection protection. Provide a white box explanation to researchers so they can show you the worst-case scenario.
Reward: The company should be ready and willing to reward traditional vulnerabilities found as a result of implementing this AI feature.

If you're going to have an AI safety HackerOne Challenge or private program, really define clearly what you expect to see. This is going to be extremely important because your traditional bug bounty hunters and even pentesters are not going to think through a safety lens by default.

Q: What is your methodology when approaching an AI engagement?

Katie:
My first step, no matter what kind of program I'm looking at, is to understand what's in front of me and understand how that AI is being used. A chatbot is not going to be very interesting to me, but agents that can generate code that’s run on your targets — that could be very interesting to me.

Once I understand it, then I focus on it the same way I look at business logic issues. I work through the steps that I have to go through to get something to work. What do I need to tell the agent? What steps will it then go through? What is that returning back to me? That's my approach.

Q: How should organizations think about data poisoning as part of MLSecOps?

Katie:
Model poisoning attacks are becoming a bit of an ethics issue. In AI art, for example, there has been a huge discussion between artists and the models themselves, like Midjourney, etc. Artists are suing some of the generative AI companies that do AI art for stealing their intellectual property to train these models.

I am really interested in how this is going to work out because artists have created tools to poison their artworks. There is, in fact, a tool you can download right now that you can apply to your drawing that will poison the model. Ethically, it probably is the right thing to do to not fix this security issue. Model poisoning attacks are security bugs, but there is an argument that maybe we shouldn’t fix these bugs because fixing them may potentially ruin the livelihoods of these artists.

Joseph:
From a bug bounty perspective, it's not as interesting. Poisoning the model is a long-term, deep attack. You're going to have to put a bunch of poison data in and then wait months. But it’s something we need to think about at the foundation stages.

It's very unlikely that there is enough security and scrutiny around large language models at the foundation builders. OpenAI, Google, Meta, Anthropic: the security around those AI model weights once they’re trained is not nearly strong enough. These companies need to double and triple the amount of security they’re applying against data poisoning at the foundation stage.

Q: How do you feel about the OWASP Top 10 for LLMs?

Katie:
At the moment, people are adopting LLMs very quickly, and whenever you adopt any technology really quickly, there is going to be a little trade-off between security and getting it out there.

LLMs are great, but they are not all AI. So, I want to counsel people to not only think about LLMs, but to think about other forms of AI, as well.

Joseph:
It’s really hard to classify these vulnerabilities because there are so many nuances, and they’re not as consistent as other bugs. But the OWASP Top 10 for LLMs is a great place to start. As an industry, we’ll grow and maybe reclassify it in the next year, but it’s a good starting point if people are curious about the different types of attacks to begin their research.

Looking Ahead

Q: Are MLSecOps and AISecOps emerging?

Joseph:
As an engineer doing AI development for my company, AppOmni, MLSecOps and AISecOps are 100% happening. It's pretty difficult to turn them into a production, and I do think they’re going to blow up.

But I don't think that MLSecOps or AISecOps are going to last more than a couple of years. If you're a developer or software engineer, you're going to have to understand how it works. It will be a wave that hackers can ride, and people should dig in and learn it because it's going to be highly applicable to every company. But in three or five year’s time, every good engineer is going to have to know how to use and implement LLM technology and other generative AI technology.

Q: Will AI systems be able to autonomously develop and implement their own security protocols without human intervention?

Katie:
I think we're still quite far off of that, but we're not that far off of developers getting an AI model to give them code to copy and paste in. It's the start of being able to say, “Please write me secure code.” We're not at that level yet, but do I think that it would be possible? Yes. People are really excited about having AI develop secure code itself.

Q: What do hackers need to learn for the future?

Katie:
I'm really starting to learn the operation side of how we get a model into deployment. In one or two year’s time, that's what we're going to be talking about—the infrastructure around how generative AI starts out.

For me, understanding the model, how it's being audited, and how it scales are going to be the real targets for attacks. Most software is written by academics and they didn't want it to be used in production, so they didn't care about security when developing it. That's where I'm going to make a lot of money on HackerOne.

Complete Your AI Safety and Security Program With HackerOne AI Red Teaming

For a deeper understanding of how AI red teaming can be tailored to meet your organization’s specific needs and objectives, contact our experts at HackerOne today.

The 7th Annual Hacker-Powered Security Report

Read the Report