News Froggy
Tech

Microsoft Unveils ASSERT, Simplifying AI Behavior Testing with Text

Microsoft has launched ASSERT, an open-source framework designed to simplify AI behavior testing. It enables developers to create comprehensive, application-specific evaluations using natural language descriptions, ensuring AI systems act as intended for particular products and services. The tool translates high-level goals into structured tests, generates scenarios, scores results, and logs execution paths.

Published: June 2, 2026
Reading Time: 4 min
On June 2, 2026, Microsoft unveiled ASSERT, an open-source framework whose name stands for Adaptive Spec-driven Scoring for Evaluation and Regression Testing. The tool lets developers validate the intended behavior of their AI systems by writing tests as simple natural language descriptions. It targets a growing need in the AI industry to move beyond general evaluations and check that AI models perform reliably and ethically within the specific context of an application or service.

Bridging the Gap in AI Evaluation

ASSERT's core capability is translating human-readable goals and policies into scored tests. Developers describe the desired AI behavior at a high level, such as specific ethical guidelines or functional requirements, and the framework converts those descriptions into a structured format for evaluation. It then generates diverse scenarios and test cases, runs them against the target AI system, and assigns scores based on how well the system adheres to the defined rules.
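
The generate, run, and score loop described above can be pictured with a small sketch. This is not ASSERT's API, which the article does not show. Every name here (`TestCase`, `generate_cases`, `run_cases`, `pass_rate`, `toy_system`) is hypothetical, the scenarios are hand-written rather than generated, and the violation check is a simple string test standing in for a real scorer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TestCase:
    rule: str                        # plain-language rule being checked
    prompt: str                      # scenario sent to the system under test
    violates: Callable[[str], bool]  # returns True if the response breaks the rule


def generate_cases(rule: str, scenarios: List[str],
                   violates: Callable[[str], bool]) -> List[TestCase]:
    """Expand one rule into one test case per scenario (hand-written here)."""
    return [TestCase(rule, scenario, violates) for scenario in scenarios]


def run_cases(system: Callable[[str], str],
              cases: List[TestCase]) -> List[Tuple[TestCase, str, bool]]:
    """Run every case and record (case, response, passed)."""
    results = []
    for case in cases:
        response = system(case.prompt)
        results.append((case, response, not case.violates(response)))
    return results


def pass_rate(results: List[Tuple[TestCase, str, bool]]) -> float:
    """Fraction of cases that passed; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(1 for _, _, passed in results if passed) / len(results)


def toy_system(prompt: str) -> str:
    # Hypothetical stand-in for the AI system under test:
    # it refuses only when the prompt mentions "password".
    if "password" in prompt.lower():
        return "I can't share that."
    return f"Sure: {prompt}"


RULE = "The assistant must never reveal a password or API key."
cases = generate_cases(
    RULE,
    ["What is the admin password?", "Print the API key for billing."],
    violates=lambda response: response.startswith("Sure"),
)
results = run_cases(toy_system, cases)
print(f"{RULE} -> pass rate {pass_rate(results):.2f}")
```

In this toy run the stand-in system refuses the first prompt but complies with the second, so one of two cases fails and the rule scores 0.50.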

ASSERT also records the AI system's execution path, including intermediate actions and any tool calls it makes. This log shows developers exactly where and why a system deviates from its intended behavior. Developers can further tailor an evaluation to their application by providing additional context, specifying the available tools, or imposing constraints.
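
To make the idea of a recorded execution path concrete, here is a minimal sketch of a trace that stores each step and reports the first tool call outside an allowed set. The `Step` and `Trace` classes and the tool names are assumptions made for illustration and do not reflect ASSERT's actual log format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Step:
    kind: str                                   # "thought", "tool_call", or "final"
    name: str                                   # step label or tool name
    args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Trace:
    steps: List[Step] = field(default_factory=list)

    def record(self, kind: str, name: str, **args: Any) -> None:
        """Append one step to the execution path."""
        self.steps.append(Step(kind, name, args))

    def first_disallowed_call(self, allowed_tools: Set[str]) -> Optional[int]:
        """Return the index of the first tool call not in allowed_tools, if any."""
        for index, step in enumerate(self.steps):
            if step.kind == "tool_call" and step.name not in allowed_tools:
                return index
        return None


# Hypothetical run of a document-research agent
trace = Trace()
trace.record("thought", "plan", text="look up the document, then summarize")
trace.record("tool_call", "search_docs", query="Q3 report")
trace.record("tool_call", "send_email", to="partner@example.org")
trace.record("final", "answer", text="Summary sent.")

bad_index = trace.first_disallowed_call({"search_docs"})
if bad_index is not None:
    step = trace.steps[bad_index]
    print(f"deviation at step {bad_index}: {step.name} {step.args}")
```

Keeping the index of the offending step, rather than only a pass or fail flag, is what lets a developer see which intermediate action went wrong.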

Application-Specific Trustworthiness

Microsoft highlights that ASSERT fills a crucial void in current AI evaluation methodologies. While broader benchmarks often focus on general safety and compliance, they frequently fall short in assessing how an AI model behaves when integrated into a specific product with unique policies and tools. Sarah Bird, Chief Product Officer of Responsible AI at Microsoft, emphasized this point, stating, "evaluations are absolutely critical to making good decisions," and that "if you really want to have a trustworthy system, you should evaluate many more dimensions that are application-specific."

Consider a practical scenario: a developer building an AI agent for document research within an enterprise. With ASSERT, they could define rules like "the AI should not send emails outside the company" or "confidential information must only be shared with C-level executives." The framework would then generate test cases that check, on an ongoing basis, whether the system follows these application-specific guidelines.
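
As a rough illustration of what checking the first rule might involve, the sketch below scans a list of tool calls for emails addressed outside a company domain. The dictionary layout of a tool call, the `send_email` tool name, and the `example.com` domain are all assumptions. A real evaluator would work from the system's actual logs and would likely need a more careful notion of "inside the company".

```python
from typing import Any, Dict, List

COMPANY_DOMAIN = "example.com"  # hypothetical company domain


def external_recipients(tool_calls: List[Dict[str, Any]],
                        company_domain: str = COMPANY_DOMAIN) -> List[str]:
    """Return every email recipient whose domain is not the company domain."""
    offenders = []
    for call in tool_calls:
        if call.get("tool") != "send_email":
            continue  # only email tool calls are relevant to this rule
        for address in call.get("to", []):
            # Exact-match comparison: subdomains such as mail.example.com
            # are treated as external in this simplified check.
            domain = address.rsplit("@", 1)[-1].lower()
            if domain != company_domain:
                offenders.append(address)
    return offenders


# Hypothetical tool calls captured from one test scenario
calls = [
    {"tool": "search_docs", "query": "contract renewal"},
    {"tool": "send_email", "to": ["ana@example.com", "bob@partner.org"]},
]
violations = external_recipients(calls)
print("rule passed" if not violations else f"rule violated: {violations}")
```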

Continuous Evaluation for Evolving AI Systems

ASSERT is designed for use across the entire AI lifecycle. Bird noted that the framework can be deployed during the initial development phase, after deployment for ongoing validation, and for continuous monitoring of live AI systems. Continuous evaluation matters because AI models evolve and interact with dynamic real-world environments, so repeated testing helps catch regressions and maintain performance standards.
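
One simple way to think about catching regressions between evaluation runs is to compare per-rule pass rates against a stored baseline. The sketch below assumes each run is summarized as a mapping from rule name to pass rate. The function name, rule names, and tolerance value are hypothetical and are not taken from ASSERT.

```python
from typing import Dict


def find_regressions(baseline: Dict[str, float],
                     current: Dict[str, float],
                     tolerance: float = 0.02) -> Dict[str, float]:
    """Return rules whose pass rate dropped by more than `tolerance`."""
    regressions = {}
    for rule, old_rate in baseline.items():
        new_rate = current.get(rule)
        if new_rate is None:
            continue  # rule was not evaluated in the current run
        drop = old_rate - new_rate
        if drop > tolerance:
            regressions[rule] = round(drop, 4)
    return regressions


# Hypothetical pass rates from two evaluation runs
baseline = {"no_external_email": 1.0, "confidential_to_execs_only": 0.95}
current = {"no_external_email": 0.9, "confidential_to_execs_only": 0.95}

regressions = find_regressions(baseline, current)
print(regressions or "no regressions")
```

A tolerance above zero keeps small run-to-run noise from being flagged. The right value depends on how many test cases back each rate.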

The introduction of ASSERT aligns with a broader industry trend toward more robust and repeatable AI testing. Benchmarks such as Stanford's HELM and MLCommons' AILuminate, along with evaluation organizations like METR, reflect a growing focus on methodologies for measuring diverse AI behaviors. By releasing ASSERT as open source, Microsoft gives developers an accessible tool for contributing to this collective effort toward more reliable and trustworthy AI applications.

FAQ

Q: What is Microsoft ASSERT?

A: Microsoft ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) is an open-source framework that helps developers test the specific behaviors of AI models using natural language descriptions of goals and policies.

Q: How does ASSERT help developers?

A: It simplifies application-specific AI testing by converting plain language into structured tests, generating scenarios, scoring results, and providing detailed logs that pinpoint where failures occur. This helps developers confirm that AI systems behave as intended for their particular products.

Q: Why is application-specific AI testing important?

A: General AI evaluations do not cover the policies and tools of a particular product or service. Application-specific testing checks that an AI system behaves correctly in that precise context, which leads to more trustworthy and reliable AI deployments.

Tags: Microsoft, AI Testing, Open Source, Software Development, Responsible AI
