
Microsoft’s new safety system can catch hallucinations in its customers’ AI apps


Sarah Bird, Microsoft’s chief product officer of responsible AI, tells The Verge in an interview that her team has designed several new safety features that will be easy to use for Azure customers who aren’t hiring groups of red teamers to test the AI services they built. Microsoft says these LLM-powered tools can detect potential vulnerabilities, monitor for hallucinations “that are plausible yet unsupported,” and block malicious prompts in real time for Azure AI customers working with any model hosted on the platform. 

“We know that customers don’t all have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the outcomes,” she says. 

Three features are now available in preview on Azure AI: Prompt Shields, which blocks prompt injections or malicious prompts from external documents that instruct models to go against their training; Groundedness Detection, which finds and blocks hallucinations; and safety evaluations, which assess model vulnerabilities. Two other features, one for directing models toward safe outputs and one for tracking prompts to flag potentially problematic users, will be coming soon.

Whether a user is typing in a prompt or the model is processing third-party data, the monitoring system evaluates the input to see if it triggers any banned words or contains hidden prompts before deciding whether to send it to the model. The system then examines the model’s response and checks whether the model hallucinated information that is not in the document or the prompt.
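The two-stage flow described above (screen the input, then check the response against its sources) can be illustrated with a minimal Python sketch. This is not Microsoft's implementation or the Azure AI API: the function names, the banned-word list, the injection pattern, and the word-overlap threshold are all hypothetical stand-ins, and the real features are described as LLM-powered rather than rule-based.

```python
import re
from typing import Callable

# Hypothetical placeholders; a real system would use far richer signals.
BANNED_WORDS = {"forbiddenword"}
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
]


def screen_input(text: str) -> bool:
    """Return True if the text passes screening (no banned words, no hidden prompt)."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & BANNED_WORDS:
        return False
    return not any(p.search(text) for p in INJECTION_PATTERNS)


def is_grounded(response: str, sources: list[str], threshold: float = 0.6) -> bool:
    """Crude proxy for groundedness: share of response words found in the sources."""
    source_words = set(re.findall(r"[a-z0-9']+", " ".join(sources).lower()))
    response_words = re.findall(r"[a-z0-9']+", response.lower())
    if not response_words:
        return True
    supported = sum(w in source_words for w in response_words)
    return supported / len(response_words) >= threshold


def guarded_call(prompt: str, document: str, model: Callable[[str, str], str]) -> str:
    """Screen the prompt and the third-party document, call the model, then check the output."""
    if not (screen_input(prompt) and screen_input(document)):
        return "blocked: input failed screening"
    response = model(prompt, document)
    if not is_grounded(response, [prompt, document]):
        return "blocked: response not grounded"
    return response
```

Here `model` is any callable that takes a prompt and a document and returns text. Word overlap is only a toy substitute for a real groundedness check, but it shows where each check sits relative to the model call.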

In the case of the Google Gemini images, filters made to reduce bias had unintended effects, which is an area where Microsoft says its Azure AI tools will allow for more customized control. Bird acknowledges that there is concern Microsoft and other companies could be deciding what is or isn’t appropriate for AI models, so her team added a way for Azure customers to toggle the filtering of hate speech or violence that the model sees and blocks. 

In the future, Azure users will also be able to get a report of users who attempt to trigger unsafe outputs. Bird says this allows system administrators to figure out which users are part of their own team of red teamers and which could be people with more malicious intent.

Bird says the safety features are immediately “attached” to GPT-4 and other popular models like Llama 2. However, because Azure’s model garden contains many AI models, users of smaller, less-used open-source systems may have to manually point the safety features to the models.


