Acoustic Imaging for UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Belman Jahir Rodriguez, Sergio F. Chevtchenko, Marcelo Herrera Martinez, Yeshwanth Bethi, Saeed Afshar
Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.
We introduce a U-net model for 360° acoustic source localization formulated as a spherical semantic segmentation task. Rather than regressing discrete direction-of-arrival (DoA) angles, our model segments beamformed audio maps (azimuth & elevation) into regions of active sound presence. Using delay-and-sum (DAS) beamforming on a custom 24-microphone array, we generate signals aligned with drone GPS telemetry to creat ...
e binary supervision masks. A modified U-Net, trained on frequency-domain representations of these maps, learns to identify spatially distributed source regions while addressing class imbalance via the Tversky loss. Because the network operates on beamformed energy maps, the approach is inherently array-independent and can adapt to different microphone configurations and can be transferred to different microphone configurations with minimal adaptation. The segmentation outputs are post-processed by computing centroids over activated regions, enabling robust DoA estimates. Our dataset includes real-world open-field recordings of a DJI Air 3 drone, synchronized with 360° video and flight logs across multiple dates and locations. Experimental results show that U-net generalizes across environments, providing improved angular precision, offering a new paradigm for dense spatial audio understanding beyond traditional Sound Source Localization (SSL). We additionally validate the same beamforming-plus-segmentation formulation on the DCASE 2019 TAU Spatial Sound Events benchmark, showing that the approach generalizes beyond drone acoustics to multiclass Sound Event Localization and Detection (SELD) scenarios.
Results & Benchmarks
Benchmark evidence drill-down
Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.
| Task | Dataset | Metric | Value | Source | Evidence refs |
|---|---|---|---|---|---|
| Detection | SELDnet | F1 | 94.7 | paper-derived | No explicit refs |
| Detection | Baseline SELDnet | F1 | 83.1 | paper-derived | No explicit refs |
| Detection | w2v-SELD | F1 | 94.7 | paper-derived | No explicit refs |
| Computer vision | SELDnet | F1 | 94.7 | paper-derived | No explicit refs |
| Computer vision | Baseline SELDnet | F1 | 83.1 | paper-derived | No explicit refs |
We introduce a U-net model for 360° acoustic source localization formulated as a spherical semantic segmentation task.
Implementation Evidence Summary
Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.
Reproduction Risks
- Estimate is based on paper-only reproduction flow
Hardware Notes
Expect multi-day setup/compute for meaningful reproduction based on current guidance.
Evidence disclosure
Evidence graph: 2 refs, 1 links.
Utility signals: depth 95/100, grounding 68/100, status medium.
Implementation Status
There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.
- No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
- Track assumptions and missing details in an experiment log before coding.
Reproduction readiness
Hardware requirements
- Expect multi-day setup/compute for meaningful reproduction based on current guidance.
No verified implementation available
- · No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.
Hugging Face artifacts
No trustworthy direct or curated related Hugging Face artifacts were found yet.
Continue with targeted Hugging Face searches derived from the paper title and method context:
Models
Datasets
Spaces
Tip: start with models, then check datasets/spaces if you need evaluation data or demos.
Direct artifact matches are currently sparse. Use targeted Hugging Face searches to quickly locate candidate models, datasets, and demos.
Research context
Tasks
Computer vision, Detection
Methods
None detected
Domains
Computer vision
Evaluation & Human Feedback Data
Open this paper in HFEPX to review benchmark signals, evaluation modes, and human-feedback protocol context.
Open in HFEPXExplore Similar Papers
Jump to Paper2Code search queries derived from this paper's research context.
Need human evaluators for your AI research? Scale annotation with expert AI Trainers.