Collaborative Aided Target Recognition Using Next-Generation Foundation Models for Multi-Domain Unmanned Systems - STTR Topic OSW26TZ04-NV001
Funding Amount:
Est. $314,363
Deadline to Apply:
August 19th, 2026
ITAR:
The technology within this topic is restricted under the International Traffic in Arms Regulation (ITAR), 22 CFR Parts 120-130, which controls the export and import of defense-related material and services, including export of sensitive technical data, or the Export Administration Regulation (EAR), 15 CFR Parts 730-774, which controls dual use items. Offerors must disclose any proposed use of foreign nationals (FNs), their country(ies) of origin, the type of visa or work permit possessed, and the statement of work (SOW) tasks intended for accomplishment by the FN(s) in accordance with section 3.5 of the Announcement. Offerors are advised foreign nationals proposed to perform on this topic may be restricted due to the technical data under US Export Control Laws.
Objective:
Develop and demonstrate a collaborative Aided Target Recognition (AiTR) capability for multi-domain unmanned systems (UAS, UGV, USV) that leverages next-generation pre-trained foundation models—including Vision-Language Models (VLMs), Vision-Language-Action models (VLAs), and/or modern State-Space Models (e.g., S4, S5, Mamba, Mamba-2)—to achieve robust, accurate, and predictive multi-platform, multi-modal target disambiguation in cluttered, contested, and partially observed operational environments.
Description:
Modern military operations across air, land, and sea domains increasingly rely on heterogeneous teams of unmanned platforms tasked with detecting, identifying, and disambiguating targets at long range, often under appearance ambiguity, viewpoint and scale variation, sensor heterogeneity, and degraded Positioning, Navigation, and Timing (PNT).
Today, reconciling observations across platforms is largely a manual, communication-intensive process that is slow, cognitively demanding, and error-prone. Single-platform AiTR cannot resolve perceptual aliasing (e.g., identical vehicle classes in coordinated formations) or fragmented observations caused by occlusion and partial visibility.
Recent advances in foundation models offer transformative potential. Vision-Language Models (e.g., CLIP, LLaVA, Florence-2, Qwen-VL) provide view-invariant semantic representations; Vision-Language-Action models (e.g., RT-2, OpenVLA) couple perception with embodied decision-making; and State-Space Models (S4, S5, Mamba, Mamba-2) enable efficient long-sequence spatiotemporal reasoning with linear complexity—well-suited to multi-platform sensor fusion.
However, these models are not yet adapted for collaborative, multi-platform military AiTR under realistic battlefield constraints.
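As a rough illustration of the linear-complexity property attributed to State-Space Models above, the following minimal sketch runs a scalar linear recurrence over a sequence. The function name ssm_scan and the coefficients a, b, and c are hypothetical values chosen for illustration; actual S4, S5, or Mamba layers use learned, structured (and in Mamba's case input-dependent) parameters and are not reproduced here.

```python
# Toy scalar state-space recurrence. ASSUMPTION: 1-D state and hand-picked
# coefficients; this is not the S4/S5/Mamba parameterization.

def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Apply x_k = a * x_{k-1} + b * u_k and y_k = c * x_k to a sequence.

    Each element costs one constant-time update, so total work grows
    linearly with the sequence length.
    """
    x = 0.0  # hidden state carried across time steps
    outputs = []
    for u in inputs:
        x = a * x + b * u   # state update
        outputs.append(c * x)  # readout
    return outputs
```

Because the state has a fixed size, memory per step does not grow with the length of the observation history, which is one reason such models are considered for long streams of fused sensor data.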
Offerors are encouraged to propose innovative technology-agnostic approaches built around next-generation foundation models (no legacy CNN-only solutions).
Development should address:
Collaborative Multi-Platform Target Disambiguation (foundation-model-based representations enabling cross-platform target correspondence under appearance ambiguity, viewpoint and scale variation, and clutter)
Geometry-Consistent Multi-Modal Sensor Fusion (heterogeneous modalities such as EO/IR, SAR, LiDAR, RF across platforms with diverse resolution, FoV, and noise characteristics)
Spatiotemporal World Modeling for Predictive AiTR (collective world models that maintain persistent target identity and forecast scene evolution under asynchronous, partial, intermittent data)
Friend-Foe-Neutral (Gray) Classification (robust discrimination among blue, red, and gray entities)
Resilient Operation in Degraded Environments (desirable capability to operate in GPS/PNT-denied, comms-degraded conditions)
An Agentic Autonomy Stack (a foundation-model-based framework orchestrating perception, reasoning, and tasking across heterogeneous platforms)
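To make the cross-platform correspondence item above more concrete, here is a minimal sketch of one possible approach: each platform summarizes its tracks as embedding vectors, and tracks are paired by cosine similarity. The track IDs, the toy 3-dimensional embeddings, the 0.8 threshold, and the greedy matching rule are all assumptions for illustration and are not specified by the topic. Appearance similarity alone cannot resolve the perceptual aliasing the topic describes (for example, identical vehicles in formation), so a real system would presumably combine it with geometric and temporal cues.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def match_tracks(platform_a, platform_b, min_similarity=0.8):
    """Greedy one-to-one matching of tracks by embedding similarity.

    platform_a, platform_b: dict mapping track id -> embedding (list of floats).
    Returns a list of (id_a, id_b, similarity), most similar pairs first.
    ASSUMPTION: greedy matching with a fixed threshold; an optimal
    assignment solver could be substituted.
    """
    scored = sorted(
        ((cosine(ea, eb), ia, ib)
         for ia, ea in platform_a.items()
         for ib, eb in platform_b.items()),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for sim, ia, ib in scored:
        if sim < min_similarity:
            break  # remaining pairs are even less similar
        if ia in used_a or ib in used_b:
            continue  # each track may be matched at most once
        used_a.add(ia)
        used_b.add(ib)
        matches.append((ia, ib, sim))
    return matches

# Hypothetical embeddings from an aerial and a ground platform.
uas_tracks = {"uas-1": [1.0, 0.0, 0.1], "uas-2": [0.0, 1.0, 0.1]}
ugv_tracks = {"ugv-7": [0.1, 0.9, 0.0], "ugv-9": [0.9, 0.1, 0.2]}
pairs = match_tracks(uas_tracks, ugv_tracks)
```

With these toy vectors, uas-1 pairs with ugv-9 and uas-2 pairs with ugv-7. Sharing compact embeddings rather than raw imagery is one way such a scheme might stay within a tight inter-platform bandwidth budget.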
Tri-Service Relevance:
Army applications include multi-domain operations with ground/aerial scout teams; Navy/USMC applications include distributed maritime operations and expeditionary littoral reconnaissance; Air Force/Space Force applications include collaborative combat aircraft (CCA) and ISR drone swarms of varying size classes.
Special Considerations:
Cybersecurity (model integrity, adversarial robustness against evasion/poisoning), supply-chain provenance of pre-training data, and explainability of model outputs shall be addressed.
PHASE I:
Develop the conceptual design and demonstrate technical feasibility of a collaborative AiTR system based on next-generation foundation models.
Phase I shall include:
Selection and justification of foundation model architecture(s) (VLM/VLA/SSM or hybrid).
Preliminary algorithmic design for multi-platform target disambiguation, sensor fusion, and predictive world modeling.
Feasibility demonstration using simulated or limited real-world multi-platform datasets.
Initial assessment of computational requirements and PNT-denied operation.
A Phase II development plan with measurable milestones.
Phase I deliverables: feasibility study report, preliminary software prototype, and Phase II proposal.
PHASE II:
Develop, integrate, and validate a fully functional prototype across at least three (3) heterogeneous unmanned platforms representing multiple domains (air/ground/sea).
Conduct field demonstrations in operationally relevant environments.
Phase II deliverables: integrated software prototype with documented APIs; comprehensive technical report covering architecture, training data provenance, and limitations; demonstration in a relevant operational environment (TRL 6); and a transition plan addressing at least two of the three services.
Desired Key Performance Parameters (KPPs):
Multi-platform target disambiguation precision/recall/F1 > 0.85 across >= 3 collaborating platforms
Cross-platform correspondence accuracy > 90% under viewpoint variation >= 60 degrees
Friend-foe-neutral classification accuracy > 95% (friend) and > 90% (foe)
Predictive AiTR horizon >= 30 seconds with <= 2 m positional error
Latency for collaborative inference (per platform) < 500 ms end-to-end
Communication bandwidth requirement < 1 Mbps per inter-platform link
Desirable operation under PNT-denied conditions with < 10% degradation
Demonstrable model adaptability to new sensor modalities within 30 days of integration
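As a worked example of how two of the KPPs above might be checked, the sketch below computes precision, recall, and F1 from detection counts and estimates the payload rate of one inter-platform link. The counts, track numbers, embedding size, and update rate are hypothetical, and the sketch assumes the 0.85 threshold applies to each of the three metrics; the topic does not define the evaluation protocol.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def link_rate_mbps(tracks, embedding_dim, bytes_per_value, updates_per_s):
    """Payload-only data rate in Mbps for sharing per-track embeddings.

    ASSUMPTION: ignores packet headers, encryption overhead, and retransmits.
    """
    bits_per_update = tracks * embedding_dim * bytes_per_value * 8
    return bits_per_update * updates_per_s / 1e6

# Hypothetical evaluation counts: 90 correct, 10 false, 12 missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=12)
meets_disambiguation_kpp = min(p, r, f1) > 0.85

# Hypothetical link: 20 tracks, 256-value embeddings, 2 bytes each, 5 updates/s.
rate = link_rate_mbps(tracks=20, embedding_dim=256, bytes_per_value=2, updates_per_s=5)
meets_bandwidth_kpp = rate < 1.0
```

With these numbers, precision is 0.90, recall is about 0.88, F1 is about 0.89, and the link carries roughly 0.41 Mbps of payload, so both illustrative checks pass.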
PHASE III DUAL USE APPLICATIONS:
Defense:
Multi-domain ISR
Collaborative combat aircraft (CCA)
Distributed maritime operations
Ground reconnaissance swarms
Commercial:
Autonomous vehicle fleets (cross-vehicle perception sharing)
Search-and-rescue
Wildfire monitoring with cooperative UAS
Precision agriculture
Port and border security
Infrastructure inspection
Phase III refers to work that derives from, extends, or completes an effort made under prior SBIR/STTR funding agreements, but is funded by sources other than the SBIR/STTR Program.
Who will win?
If you can achieve the objective above better than any other company on the market, you have a very high likelihood of success and should apply.
Who is eligible to apply?
Any company that meets the following criteria:
For-profit company
U.S.-owned and controlled
500 or fewer employees (including affiliates)
How Can BW&CO Help?
1) End-to-end support, including strategy, writing of the full proposal, and administrative & compliance support.
2) Proposal strategy and review.
3) Administrative & compliance support.
Request to talk with a member of our team by completing the form below: