MYSTIC DEPOT - Vendor-Agnostic AI Evaluation Infrastructure

Below is a brief summary. Please read the full solicitation before applying (linked under Additional Resources).

Executive Summary:

The Defense Innovation Unit (DIU) is seeking commercial solutions for MYSTIC DEPOT: Vendor-Agnostic AI Evaluation Infrastructure, a capability to rigorously evaluate artificial intelligence systems used in national security contexts. The government wants infrastructure that can continuously test new AI models, agents, and human-AI teaming workflows against mission-specific benchmarks as AI capabilities evolve.

The solicitation closes on 2026-03-24 23:59:59 US/Eastern Time, and companies must submit a short solution brief through DIU’s Commercial Solutions Opening (CSO) portal before that deadline.

The government is seeking two types of capabilities:

  • Evaluation Harness Infrastructure that connects AI models to benchmarks and generates standardized evaluation results.

  • Benchmark Development and Methodology that defines how government-specific AI capabilities should be tested across classified and unclassified mission environments.

Prototype awards may lead directly to follow-on production contracts without further competition, which could significantly expand the size of the opportunity.

How much funding would I receive?

The solicitation states that awards will be issued as Prototype Other Transaction (OT) agreements under 10 U.S.C. 4022. Awards typically range from $500K to $5M.

What could I use the funding for?

Funding would support the development and demonstration of solutions addressing one or both of the following Lines of Effort (LOE):

LOE 1: Evaluation Harness

Infrastructure that enables standardized, reproducible evaluation of AI systems.

Key capabilities include:

  • Model Interface to connect diverse AI systems to the evaluation harness

  • Execution Engine for orchestrating evaluation workflows

  • Measurement and Scoring System for benchmarking model outputs

  • Human-in-the-loop evaluation to measure performance of human-AI teams

  • Output and reporting tools that export results in open, non-proprietary formats

  • Continuous monitoring and analytics for ongoing model performance tracking

  • Benchmark configuration management

  • Simulation of degraded or denied environments (DDIL)

  • Agentic AI evaluation for multi-step autonomous behavior

  • Adversarial testing and red-teaming

  • Multimodal evaluation including video and audio inputs
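
To make the first few capabilities concrete, here is a minimal sketch of how a model interface, an execution engine, a scoring function, and an open-format export might fit together. It is an illustration only, not the architecture the solicitation prescribes. Every name in it (ModelAdapter, StubModel, Task, exact_match, run_benchmark) is hypothetical, and a real harness would wrap actual vendor APIs rather than a stub.

```python
import json
from dataclasses import dataclass
from typing import Callable, Protocol


class ModelAdapter(Protocol):
    """Hypothetical model interface: each vendor's system is wrapped to expose this."""
    name: str

    def generate(self, prompt: str) -> str: ...


@dataclass
class StubModel:
    """Stand-in model for illustration; it always returns the same answer."""
    name: str
    answer: str

    def generate(self, prompt: str) -> str:
        return self.answer


@dataclass
class Task:
    """One benchmark item: a prompt and the reference answer it is scored against."""
    task_id: str
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Simplest possible scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_benchmark(
    model: ModelAdapter,
    tasks: list[Task],
    scorer: Callable[[str, str], float] = exact_match,
) -> dict:
    """Execution engine: run every task through the model and score each output."""
    results = []
    for task in tasks:
        output = model.generate(task.prompt)
        results.append(
            {"task_id": task.task_id, "output": output, "score": scorer(output, task.expected)}
        )
    mean = sum(r["score"] for r in results) / len(results) if results else 0.0
    return {"model": model.name, "mean_score": mean, "results": results}


tasks = [
    Task("t1", "What is 2 + 2?", "4"),
    Task("t2", "What is the capital of France?", "Paris"),
]
report = run_benchmark(StubModel(name="vendor-a-stub", answer="4"), tasks)

# Export in an open, non-proprietary format (JSON here, as one possible choice).
print(json.dumps(report, indent=2))
```

Because the engine depends only on the generate method, swapping in a different vendor's model means writing a new adapter, not changing the benchmark or the scoring code.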

Solutions should also support:

  • Modular architecture

  • Containerized deployment

  • Deployment across unclassified, classified cloud, and air-gapped environments

  • Interoperability between evaluation infrastructure and benchmark content

  • Access controls and sensitive data protection

LOE 2: Benchmark Development and Methodology

Creation of mission-relevant AI evaluation benchmarks across Unclassified, Secret, and Top Secret workflows.

Benchmark development should address:

  • Mission capability requirements

  • Task decomposition into measurable evaluations

  • Realistic operational scenarios

  • Scoring criteria and interpretability

  • Baseline model performance

  • Validation of reliability and fairness

  • Resistance to benchmark gaming

  • Ongoing benchmark maintenance

Vendors must also provide training materials so government personnel can maintain benchmarks independently.

Are there any additional benefits I would receive?

Prototype awards may lead to direct follow-on production contracts without additional competition.

Potential follow-on activities include:

  • Deployment across additional classification levels and environments

  • Expansion of benchmark suites for new mission areas

  • Ongoing system maintenance and capability upgrades

  • Training and support for government personnel

The solicitation states that the follow-on production award may be significantly larger than the prototype OT agreement.

What is the timeline to apply and when would I receive funding?

Submission deadline:
2026-03-24 23:59:59 US/Eastern Time

Application process:

  1. Companies submit a solution brief through the DIU submission portal.

  2. DIU reviews submissions and may invite selected companies to provide a pitch and full proposal.

  3. If selected, companies will negotiate the terms of a prototype OT agreement.

The solicitation states that DIU aims to respond within 30 days if it is interested in moving forward with a pitch.

Where does this funding come from?

The opportunity is issued by the Defense Innovation Unit (DIU) using the Commercial Solutions Opening (CSO) process.

Awards are made under the authority of 10 U.S.C. 4022, which allows the Department of Defense to issue Other Transaction (OT) agreements for prototype projects.

The solicitation references the DIU CSO HQ0845-20-S-C001, originally posted to SAM.gov on 23 March 2020.

The program is conducted in partnership with the Office of the Director of National Intelligence (ODNI).

Who is eligible to apply?

Vendors that are eligible to receive an Other Transaction award in accordance with 10 U.S.C. 4022 may apply.

Companies should demonstrate expertise in areas such as:

  • AI evaluation methodology

  • Benchmark design and measurement

  • Security testing and adversarial AI evaluation

Preferred qualifications include:

  • Published research on evaluation methodologies

  • Contributions to AI evaluation frameworks or benchmarks

  • Collaboration with frontier AI labs

  • Experience working with government AI evaluation initiatives

  • Personnel who hold at least a Secret clearance (TS/SCI preferred) or are able to obtain one

  • Experience deploying systems in DoD or Intelligence Community environments

  • Familiarity with national security mission contexts

  • Experience evaluating human-machine teaming performance

Vendors may apply individually or in partnership.

What companies and projects are likely to win?

The government is seeking solutions that demonstrate:

  • Proven expertise in AI evaluation infrastructure or benchmark development

  • Ability to support vendor-agnostic evaluation of diverse AI systems

  • Experience deploying technology in secure government environments

  • Capability to evaluate human-AI team performance

  • Infrastructure that supports agentic AI evaluation, adversarial testing, and multimodal inputs

Solutions should be designed for broad applicability across government programs, rather than optimized for a single use case.

Are there any restrictions I should know about?

Key restrictions include:

  • Submissions must be unclassified and contain no data above Controlled Unclassified Information (CUI).

  • Solution briefs must be PDF files under 10MB.

  • Briefs should be no longer than approximately:

    • 5 pages, or

    • 15 slides.

  • Vendors must comply with Section 889 of the John S. McCain National Defense Authorization Act for Fiscal Year 2019.
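
The file-format and size limits above are easy to verify before uploading. The following is a small, hypothetical pre-submission check, not a DIU tool. It assumes "10MB" means 10,000,000 bytes (the stricter reading, since the summary does not say which definition applies), and the function name check_brief is invented for illustration. It cannot check page count or classification markings.

```python
from pathlib import Path

# Assumption: treat "10MB" as 10 million bytes, the more conservative interpretation.
MAX_BYTES = 10_000_000


def check_brief(path: str) -> list[str]:
    """Return a list of problems found with the brief file; an empty list means none."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a file"]

    problems = []
    if p.suffix.lower() != ".pdf":
        problems.append("file extension is not .pdf")

    # Every PDF file begins with the bytes "%PDF-".
    with p.open("rb") as f:
        if f.read(5) != b"%PDF-":
            problems.append("file does not start with the PDF header")

    size = p.stat().st_size
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes, which is not under {MAX_BYTES}")

    return problems
```

Calling check_brief("solution_brief.pdf") (a placeholder filename) returns an empty list when the file passes all three checks.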

How long will it take me to prepare an application?

The application requires a solution brief describing your technology and how it meets the desired solution attributes.

Because the submission is limited to approximately 5 pages or 15 slides, most qualified teams can prepare a competitive submission within a short timeframe.

How can BW&CO help?

BW&CO helps startups and technology companies develop competitive DIU solution briefs and prototype proposals.

Support typically includes:

  • Interpreting the solicitation requirements

  • Positioning your technology against LOE 1 or LOE 2

  • Writing and designing the solution brief

  • Preparing technical narratives and evaluation plans

  • Preparing teams for DIU pitch sessions

  • Supporting negotiations for prototype OT agreements

Our goal is to translate your technology into language that aligns with DIU mission priorities and evaluation criteria.

How much would BW&CO charge?

We have both fractional engagements ($250 an hour) and full engagements ($13,000 + 5%) available.

Additional Resources

Review the solicitation here.
