Putting New AI Lab Commitments in Context
This article was originally published on the GovAI blog.
On July 21, in response to emerging risks from AI, the Biden administration
announced
a set of voluntary commitments from seven leading AI companies: the established tech giants Amazon, Google, Meta, and Microsoft and the AI labs OpenAI, Anthropic, and Inflection.
In addition to bringing together these major players, the announcement is notable for explicitly targeting frontier models: general-purpose models that the full text of the commitments defines as being “overall more powerful than the current industry frontier.” While the White House has previously made announcements on AI – for example, VP Harris’s meeting with leading lab CEOs in May 2023 – this is one of its most explicit calls for ways to manage these systems.
Below, we summarize the most significant takeaways from the announcement and comment on some notable omissions – for instance, what the announcement does not say about open sourcing models or about principles for model release decisions. While potentially valuable, it remains to be seen whether the commitments will be a building block for, or a blocker to, regulation of AI, including frontier models.
Putting safety first
The voluntary commitments identify safety, security, and trust as top priorities, calling them “three principles that must be fundamental to the future of AI.” The emphasis on safety and security foregrounds the national security implications of frontier models, which often sit alongside other regulatory concerns such as privacy and fairness in documents like the National Institute of Standards and Technology AI Risk Management Framework (NIST AI RMF).
- On safety, the commitments explicitly identify cybersecurity and
biosecurity as priority areas and recommend use of internal and external
red-teaming to anticipate these risks. Senior US cybersecurity
officials have voiced
concern about how malicious actors could use future AI models to plan
cyberattacks or interfere with elections, and in Congress, two
Senators have proposed bipartisan legislation to examine whether advanced AI
systems could facilitate the development of bioweapons and novel pathogens.
- On security, the commitments recognize model weights – the core
intellectual property behind AI systems – as particularly important to
protect. Insider threats are one concern that the commitments
identify. But leading US officials like National Security Agency head Paul
Nakasone and cyberspace ambassador Nathaniel
Fick have also warned that adversaries, such as China, may try to steal
leading AI companies’ models to get ahead.
According to White House advisor Anne Neuberger, the US government has already conducted cybersecurity briefings for leading AI labs to pre-empt these threats. The emphasis on frontier AI model protection in the White House voluntary commitments suggests that AI labs may be open to collaborating further with US agencies, such as the Cybersecurity and Infrastructure Security Agency.
Information sharing and transparency
Another theme running through the announcement is the commitment to more
information sharing between companies and more transparency to the
public. Companies promised to share among themselves best practices for
safety as well as findings on how malicious users could circumvent AI system
safeguards. Companies also promised to publicly release more details on the
capabilities and limitations of their models. The White House’s endorsement of
this information sharing may help to allay concerns researchers have previously
raised about antitrust law potentially limiting cooperation on AI safety and
security, and open the door for greater technical collaboration in the
future.
- Some of the companies have already launched a new industry body to
share best practices, lending weight to the voluntary commitments. Five
days after the White House announcement, Anthropic, Google, Microsoft, and
OpenAI launched the Frontier Model Forum,
“an industry body focused on ensuring safe and responsible development of
frontier AI models.” Among other things, the forum aims to “enable independent,
standardized evaluations of capabilities and safety,” and to identify best
practices for responsible development and deployment.
- However, the new forum is missing three of the seven companies that
agreed to the voluntary commitments – Amazon, Meta, and Inflection – and it is
unclear whether they will join in the future. Nonetheless, these three could
plausibly share information on a more limited or ad hoc basis. How the new forum
will interact with other multilateral and multi-stakeholder initiatives like the
G7 Hiroshima process or the Partnership on AI will also be something to
watch.
- The companies committed to developing technical mechanisms to identify AI-generated audio or visual content, but (apparently) not text. Although the White House announcement refers broadly to helping users “know when content is AI generated,” the detailed statement covers only audio and visual content. From a national security perspective, this means that AI-generated text-based disinformation campaigns could continue to be a concern. While there are technical barriers to watermarking AI-generated text, it is unclear whether these barriers, or political ones, were behind the decision not to discuss watermarking text.
Open sourcing and deployment decisions
Among the most notable omissions from the announcement was the lack of detail on how companies will ultimately decide whether to open source or otherwise deploy their models. On these questions, companies differ substantially in approach; for example, while Meta has chosen to open source some of its most advanced models (i.e., allow users to freely download and modify them), most of the other companies have been more reluctant to open source their models and have sometimes cited concerns about open-source models enabling misuse. Unsurprisingly, the companies have not arrived at a consensus in the announcement.
- For the seven companies, open sourcing remains an open
question. Though the commitments say that AI labs will release AI model
weights “only when intended,” the announcement provides no details on how
decisions around intentional model weight release should be made. This
choice involves a trade-off between openness and security. Advocates of open
sourcing argue that it facilitates accountability and helps crowdsource
safety, while advocates of structured access raise concerns, including about
misuse by malicious actors. (There are also business incentives on both
sides.)
- The commitments also do not explicitly say how results from red-teaming will inform decisions around model deployment. While it is natural to assume that these risk assessments will ultimately inform the decision to deploy or not, the commitments are not explicit about formal processes – for example, whether senior stakeholders must be briefed with red-team results when making a go/no-go decision or whether external experts will be able to red-team versions of the model immediately pre-deployment (as opposed to earlier versions that may change further during the training process).
Conclusion
The voluntary commitments may be an important step toward ensuring that frontier AI models remain safe, secure, and trustworthy. However, they also raise a number of questions and leave many details to be decided. It also remains unclear how forceful voluntary lab commitments will ultimately be without new legislation to back them up.
Acknowledgements
This is a blog post from Rethink Priorities, a think tank dedicated to informing decisions made by high-impact organizations and funders across various cause areas. The authors are Shaun Ee and Joe O’Brien. Thanks to Tim Fist, Tony Barrett, and reviewers at GovAI for feedback and advice.
If you are interested in RP’s work, please visit our research database and subscribe to our newsletter.