Semantic speech retrieval with a visually grounded model of untranscribed speech

Q: How reproducible is "Semantic speech retrieval with a visually grounded model of untranscribed speech"?

Estimated time to first reproduction: a few days. Risk flags: Adjacent implementations are not paper-verified. No maintained paper-verified implementation was found; start with the closest related repositories below.

Q: What framework is used to implement "Semantic speech retrieval with a visually grounded model of untranscribed speech"?

No framework has been identified for the primary implementation.

Published: Oct 1, 2017

Historical official implementation (not recommended for new builds)

Evidence: Historical

Domain fit: AI-adjacent

Verified repos: 2

Top repo stars: 7

Paper appears method- or tooling-adjacent to AI workflows with partial ecosystem coverage.

Framework: none

Time to first repro: a few days

1 risk flag

Technical details

Canonical key: arxiv-1710.01949

Generated at: Jun 17, 2026, 10:11 PM

Artifact coverage: sparse

PWC source used: Yes

implementation starting point

Benchmarks: thin evidence

Results & Benchmarks

Freshness tier: cold

Direct + Inferred Evidence

Retrieval / indexing

PerfectASR (0%)

60.1

Source: paper fulltext

Retrieval / indexing

GoogleASR (8.6%)

58.8

Source: paper fulltext

Retrieval / indexing

SimASR (8.6%)

56.0

Source: paper fulltext

Benchmark evidence drill-down

3 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Retrieval / indexing	PerfectASR (0%)	AP	60.1	paper-derived	No explicit refs
Retrieval / indexing	GoogleASR (8.6%)	AP	58.8	paper-derived	No explicit refs
Retrieval / indexing	SimASR (8.6%)	AP	56.0	paper-derived	No explicit refs
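
When auditing the AP values above, it can help to recompute the metric on your own ranked outputs. The following is a minimal sketch, assuming AP denotes average precision and that the reported values are percentages; the paper's exact evaluation protocol is not given on this page, and `average_precision` is a hypothetical helper name.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: booleans ordered from highest to lowest score,
    True where the retrieved item is relevant to the query.
    Assumes every relevant item appears somewhere in the ranking.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Relevant items at ranks 1 and 3: (1/1 + 2/3) / 2
score = average_precision([True, False, True, False])
print(f"AP = {100 * score:.1f}%")
```

Averaging this value over all queries gives a single figure that can be compared against the table, provided the query set and relevance labels match the paper's setup.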

"Semantic speech retrieval with a visually grounded model of untranscribed speech" focuses on retrieval / indexing.

Use This Implementation Because…

Confidence: low

Aryia-Behroziuan/References is the closest maintained adjacent implementation; it matches the contextual method/domain keyword "information retrieval". It is not paper-verified, so validate the algorithm and evaluation setup against the paper before trusting reported metrics. Community adoption signal: 61 GitHub stars.

Reproduction Risks

Adjacent implementations are not paper-verified
Recommended repository is adjacent and not paper-verified.
Adjacent implementation match confidence is low.
No direct maintained implementation is currently verified.

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 3 links.

Utility signals: depth 100/100, grounding 85/100, status high.

Implementation Comparison

Top 2 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

kamperh/semantic_flickraudio

historical official

Maintenance: Stale

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 7
Last push: Oct 6, 2017 (3179d ago)

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases
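
The "No push in 12+ months" flag comes down to date arithmetic on the last push date. The sketch below is illustrative only: it assumes 12 months means 365 days and takes the reference date as an explicit argument, since the page does not say which date its day counts were computed against. `days_since` and `is_stale` are hypothetical helper names.

```python
from datetime import date


def days_since(last_push, today):
    """Number of whole days between the last push and the reference date."""
    return (today - last_push).days


def is_stale(last_push, today, threshold_days=365):
    """True when the repository has had no push for more than threshold_days."""
    return days_since(last_push, today) > threshold_days


# Last push dates as listed on this page, checked against the "Last checked" date.
checked = date(2026, 6, 17)
for repo, pushed in [
    ("kamperh/semantic_flickraudio", date(2017, 10, 6)),
    ("kamperh/recipe_semantic_flickraudio", date(2019, 2, 25)),
]:
    print(repo, "stale" if is_stale(pushed, checked) else "active")
```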

kamperh/recipe_semantic_flickraudio

alternative

Maintenance: Stale

Confidence: High

Reproducibility: Limited

Official implementation from Papers with Code · Repository link is mentioned in the paper metadata

Stars: 4
Last push: Feb 25, 2019 (2672d ago)

Risk flags

No push in 12+ months
No CI pipeline detected
No tagged releases

Best implementation now

Only a historical official implementation is available.

Use with caution for new projects; verify against current tooling and maintained community alternatives.

kamperh/semantic_flickraudio

Historical official

Stars: 7

Last push: Oct 6, 2017

Only a historical official repository was found: kamperh/semantic_flickraudio.
No maintained paper-verified implementation met reliability thresholds.

Reproduction readiness

Major Work

Time to first repro: a few days

Last checked: Jun 17, 2026

No dependency manifest: manual reconstruction required

· kamperh/semantic_flickraudio has no requirements.txt, environment.yml, pyproject.toml, or Dockerfile.
· You will need to reverse-engineer dependencies from import statements in the source code.
· Last push was 3179 days ago.
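
One way to start reconstructing the dependencies is to scan a local clone for import statements. The sketch below is a starting point under stated assumptions, not a complete resolver: it assumes the repository is Python and falls back to a regular expression for files that do not parse under Python 3 (likely for code of this age). The function names are hypothetical, and imported module names do not always match the package names used by installers.

```python
import ast
import re
import sys
from pathlib import Path

# Fallback for sources that fail to parse under Python 3,
# e.g. Python 2 files that use print statements.
# Limitation: captures only the first module of "import a, b".
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_]\w*)", re.MULTILINE)


def imports_in_source(source):
    """Return the top-level module names imported by one source string."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return set(IMPORT_RE.findall(source))
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def third_party_imports(repo_dir):
    """List imported modules under repo_dir, minus stdlib and local modules."""
    files = list(Path(repo_dir).rglob("*.py"))
    local = {p.stem for p in files} | {p.parent.name for p in files}
    # sys.stdlib_module_names exists on Python 3.10+; older versions filter nothing.
    stdlib = set(getattr(sys, "stdlib_module_names", ()))
    found = set()
    for path in files:
        found |= imports_in_source(path.read_text(errors="ignore"))
    return sorted(found - local - stdlib)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_imports.py path/to/local/clone
    print("\n".join(third_party_imports(sys.argv[1])))
```

The resulting list still needs version pinning by hand; the last push date is a reasonable guide to which releases of each package were current at the time.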

Closest related implementations

These are not paper-verified. Use them as reference points when no direct implementation is available.

Aryia-Behroziuan/References

Adjacent

Confidence: Low

Stars: 61

Matches contextual method/domain keyword: information retrieval

Additional implementations

Official

kamperh/recipe_semantic_flickraudio
Confidence: High

Semantic speech retrieval with a visually grounded model of untranscribed speech.

Stars: 4

Forks: 4

Last push: Feb 25, 2019