MYSTIC DEPOT - Vendor-Agnostic AI Evaluation Infrastructure
Below is a brief summary. Please check the full solicitation before applying (link in resources section).
Executive Summary:
The Defense Innovation Unit (DIU) is seeking commercial solutions for MYSTIC DEPOT: Vendor-Agnostic AI Evaluation Infrastructure, a capability to rigorously evaluate artificial intelligence systems used in national security contexts. The government wants infrastructure that can continuously test new AI models, agents, and human-AI teaming workflows against mission-specific benchmarks as AI capabilities evolve.
The solicitation closes on 2026-03-24 23:59:59 US/Eastern Time, and companies must submit a short solution brief through DIU’s Commercial Solutions Opening (CSO) portal before that deadline.
The government is seeking two types of capabilities:
Evaluation Harness Infrastructure that connects AI models to benchmarks and generates standardized evaluation results.
Benchmark Development and Methodology that defines how government-specific AI capabilities should be tested across classified and unclassified mission environments.
Prototype awards may lead directly to follow-on production contracts without further competition, which could significantly expand the size of the opportunity.
How much funding would I receive?
The solicitation states that awards will be issued as Prototype Other Transaction (OT) agreements under 10 U.S.C. 4022. Awards typically range from $500K to $5M.
What could I use the funding for?
Funding would support the development and demonstration of solutions addressing one or both of the following Lines of Effort (LOE):
LOE 1: Evaluation Harness
Infrastructure that enables standardized, reproducible evaluation of AI systems.
Key capabilities include:
Model Interface to connect diverse AI systems to the evaluation harness
Execution Engine for orchestrating evaluation workflows
Measurement and Scoring System for benchmarking model outputs
Human-in-the-loop evaluation to measure performance of human-AI teams
Output and reporting tools that export results in open, non-proprietary formats
Continuous monitoring and analytics for ongoing model performance tracking
Benchmark configuration management
Simulation of degraded or denied environments (DDIL: denied, disrupted, intermittent, and limited)
Agentic AI evaluation for multi-step autonomous behavior
Adversarial testing and red-teaming
Multimodal evaluation including video and audio inputs
Solutions should also support:
Modular architecture
Containerized deployment
Deployment across unclassified, classified cloud, and air-gapped environments
Interoperability between evaluation infrastructure and benchmark content
Access controls and sensitive data protection
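As a rough illustration of how the core LOE 1 pieces fit together (model interface, execution engine, scoring, and open-format output), here is a minimal Python sketch. It is not taken from the solicitation: every name in it (ModelAdapter, BenchmarkItem, run_evaluation, LookupModel) is hypothetical, the exact-match scorer is a toy stand-in for a real measurement system, and JSON is assumed as one example of an open, non-proprietary format.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Protocol


class ModelAdapter(Protocol):
    """Hypothetical model interface: any vendor's system wrapped to expose generate()."""
    name: str

    def generate(self, prompt: str) -> str: ...


@dataclass
class BenchmarkItem:
    item_id: str
    prompt: str
    expected: str


@dataclass
class ItemResult:
    item_id: str
    output: str
    score: float


def exact_match(output: str, expected: str) -> float:
    """Toy scoring rule: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_evaluation(
    model: ModelAdapter,
    items: list[BenchmarkItem],
    scorer: Callable[[str, str], float] = exact_match,
) -> dict:
    """Minimal execution engine: run every item through the model and score it."""
    results = []
    for item in items:
        output = model.generate(item.prompt)
        results.append(ItemResult(item.item_id, output, scorer(output, item.expected)))
    mean_score = sum(r.score for r in results) / len(results) if results else 0.0
    return {
        "model": model.name,
        "mean_score": mean_score,
        "results": [asdict(r) for r in results],
    }


class LookupModel:
    """Stand-in model for the demo; a real adapter would call a vendor's system."""
    name = "lookup-demo"

    def __init__(self, answers: dict[str, str]):
        self._answers = answers

    def generate(self, prompt: str) -> str:
        return self._answers.get(prompt, "")


items = [
    BenchmarkItem("q1", "2 + 2 = ?", "4"),
    BenchmarkItem("q2", "Capital of France?", "Paris"),
]
model = LookupModel({"2 + 2 = ?": "4", "Capital of France?": "Lyon"})

report = run_evaluation(model, items)
report_json = json.dumps(report, indent=2)  # export in an open format
print(report_json)
```

Because the engine depends only on the generate() method, swapping in a different vendor's model means writing a new adapter rather than changing the benchmark or scoring code, which is the sense in which such a harness would be vendor-agnostic.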
LOE 2: Benchmark Development and Methodology
Creation of mission-relevant AI evaluation benchmarks across Unclassified, Secret, and Top Secret workflows.
Benchmark development should address:
Mission capability requirements
Task decomposition into measurable evaluations
Realistic operational scenarios
Scoring criteria and interpretability
Baseline model performance
Validation of reliability and fairness
Resistance to benchmark gaming
Ongoing benchmark maintenance
Vendors must also provide training materials so government personnel can maintain benchmarks independently.
Are there any additional benefits I would receive?
Prototype awards may lead to direct follow-on production contracts without additional competition.
Potential follow-on activities include:
Deployment across additional classification levels and environments
Expansion of benchmark suites for new mission areas
Ongoing system maintenance and capability upgrades
Training and support for government personnel
The solicitation states that the follow-on production award may be significantly larger than the prototype OT agreement.
What is the timeline to apply and when would I receive funding?
Submission deadline:
2026-03-24 23:59:59 US/Eastern Time
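For teams working outside the Eastern time zone, the deadline can be converted to UTC with a few lines of Python. This is only a sketch: it assumes the IANA zone America/New_York (which US/Eastern is an alias for) and requires time zone data to be available on the machine. The variable and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "US/Eastern" corresponds to the IANA zone America/New_York.
eastern = ZoneInfo("America/New_York")

# Deadline as stated in the summary: 2026-03-24 23:59:59 Eastern.
deadline_local = datetime(2026, 3, 24, 23, 59, 59, tzinfo=eastern)
deadline_utc = deadline_local.astimezone(timezone.utc)

# Daylight saving time is in effect on this date, so the offset is UTC-4.
print(deadline_utc.isoformat())  # 2026-03-25T03:59:59+00:00


def time_remaining(now: datetime) -> timedelta:
    """Time left before the deadline; 'now' must be timezone-aware."""
    return deadline_local - now


print(time_remaining(datetime.now(timezone.utc)))
```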
Application process:
Companies submit a solution brief through the DIU submission portal.
DIU reviews submissions and may invite selected companies to provide a pitch and full proposal.
If selected, companies will negotiate the terms of a prototype OT agreement.
The solicitation states that DIU aims to respond within 30 days if it is interested in moving forward with a pitch.
Where does this funding come from?
The opportunity is issued by the Defense Innovation Unit (DIU) using the Commercial Solutions Opening (CSO) process.
Awards are made under the authority of 10 U.S.C. 4022, which allows the Department of Defense to issue Other Transaction (OT) agreements for prototype projects.
The solicitation references the DIU CSO HQ0845-20-S-C001, originally posted to SAM.gov on 23 March 2020.
The program is conducted in partnership with the Office of the Director of National Intelligence (ODNI).
Who is eligible to apply?
Any vendor eligible to receive an Other Transaction award in accordance with 10 U.S.C. 4022 may apply.
Companies should demonstrate expertise in areas such as:
AI evaluation methodology
Benchmark design and measurement
Security testing and adversarial AI evaluation
Preferred qualifications include:
Published research on evaluation methodologies
Contributions to AI evaluation frameworks or benchmarks
Collaboration with frontier AI labs
Experience working with government AI evaluation initiatives
Personnel holding at least a Secret clearance (TS/SCI preferred), or the ability to obtain one
Experience deploying systems in DoD or Intelligence Community environments
Familiarity with national security mission contexts
Experience evaluating human-machine teaming performance
Vendors may apply individually or in partnership.
What companies and projects are likely to win?
The government is seeking solutions that demonstrate:
Proven expertise in AI evaluation infrastructure or benchmark development
Ability to support vendor-agnostic evaluation of diverse AI systems
Experience deploying technology in secure government environments
Capability to evaluate human-AI team performance
Infrastructure that supports agentic AI evaluation, adversarial testing, and multimodal inputs
Solutions should be designed for broad applicability across government programs, rather than optimized for a single use case.
Are there any restrictions I should know about?
Key restrictions include:
Submissions must be unclassified and contain no data above Controlled Unclassified Information (CUI).
Solution briefs must be PDF files under 10 MB.
Briefs should be approximately:
5 pages or fewer, or
15 slides or fewer.
Vendors must comply with Section 889 of the John S. McCain National Defense Authorization Act for Fiscal Year 2019.
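The file-format and size restrictions above are easy to check before uploading. The Python sketch below is a hypothetical pre-submission check, not an official DIU tool: the check_brief function and the file name are made up for illustration, and the 10 MB limit is read conservatively as 10,000,000 bytes, which is an assumption. It does not check page count, classification markings, or Section 889 compliance.

```python
import tempfile
from pathlib import Path

# Assumption: "10 MB" is interpreted conservatively as 10,000,000 bytes.
MAX_BYTES = 10 * 1000 * 1000


def check_brief(path: Path) -> list[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    if path.suffix.lower() != ".pdf":
        problems.append("file extension is not .pdf")
    with path.open("rb") as fh:
        # PDF files begin with the marker "%PDF-".
        if fh.read(5) != b"%PDF-":
            problems.append("file does not start with the PDF header")
    size = path.stat().st_size
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes; it must be under {MAX_BYTES}")
    return problems


# Demo with a tiny placeholder file standing in for a real brief.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "solution_brief.pdf"
    demo.write_bytes(b"%PDF-1.7\n% placeholder content, not a real brief\n")
    print(check_brief(demo))  # prints []
```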
How long will it take me to prepare an application?
The application requires a solution brief describing your technology and how it meets the desired solution attributes.
Because the submission is limited to approximately 5 pages or 15 slides, a qualified team can usually prepare a competitive brief quickly.
How can BW&CO help?
BW&CO helps startups and technology companies develop competitive DIU solution briefs and prototype proposals.
Support typically includes:
Interpreting the solicitation requirements
Positioning your technology against LOE 1 or LOE 2
Writing and designing the solution brief
Preparing technical narratives and evaluation plans
Preparing teams for DIU pitch sessions
Supporting negotiations for prototype OT agreements
Our goal is to translate your technology into language that aligns with DIU mission priorities and evaluation criteria.
How much would BW&CO charge?
We have both fractional engagements ($250 an hour) and full engagements ($13,000 + 5%) available.