CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

Q: How reproducible is "CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation"?

Estimated time to first reproduction: a few days. Risk flags: No repository-level reproducibility signals are currently available, Estimate is based on paper-only reproduction flow. No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.

Q: Are there pretrained models available for "CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation"?

Yes, 1 Hugging Face model found. The top result is georgexin/cointeract with 80 downloads.

Q: What framework is used to implement "CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation"?

The primary implementation uses Hugging Face Transformers training guide.

Xiangyang Luo, Xiaozhe Xin, Tao Feng, Xu Guo, Meiguang Jin, Junfeng Ma

Published: Apr 21, 2026

No direct paper-linked artifacts found; showing strongest related artifacts

Evidence: Curated Related

Domain fit: AI-core

Verified repos: 1

Core AI workload signals detected from paper context and implementation/artifact evidence.

Framework: Hugging Face Transformers training guide

Time to first repro: a few days

2 risk flags

arXiv PDF

Synthesizing human--object interaction (HOI) videos has broad practical value in e-commerce, digital advertising, and virtual marketing. However, current diffusion models, despite their photorealistic rendering capability, still frequently fail on (i) the structural stability of sensitive regions such as hands and faces and (ii) physically plausible contact (e.g., avoiding hand--object interpenetration). We present C ...

Read full abstract

oInteract, an end-to-end framework for HOI video synthesis conditioned on a person reference image, a product reference image, text prompts, and speech audio. CoInteract introduces two complementary designs embedded into a Diffusion Transformer (DiT) backbone. First, we propose a Human-Aware Mixture-of-Experts (MoE) that routes tokens to lightweight, region-specialized experts via spatially supervised routing, improving fine-grained structural fidelity with minimal parameter overhead. Second, we propose Spatially-Structured Co-Generation, a dual-stream training paradigm that jointly models an RGB appearance stream and an auxiliary HOI structure stream to inject interaction geometry priors. During training, the HOI stream attends to RGB tokens and its supervision regularizes shared backbone weights; at inference, the HOI branch is removed for zero-overhead RGB generation. Experimental results demonstrate that CoInteract significantly outperforms existing methods in structural stability, logical consistency, and interaction realism.

Technical details

Canonical key: arxiv-2604.19636

Cache status: Fresh

Generated at: Jun 20, 2026, 6:45 AM

Artifact coverage: curated_related

HF provider: ok (token)

PWC source used: No

LLM status: not_generated

LLM model: n/a

LLM generated: Unknown

LLM content type: n/a

HF policy: hf-relevance-v27

context only

Benchmarks: thin evidence

Time to repro: a few days

2 risk flags

Hugging Face Transformers training guide

Results & Benchmarks

Freshness tier: hot

Direct + Inferred Evidence

Question answering

w/o MoE

Reference

0.993

Source: paper fulltext

Benchmark evidence drill-down

1 findings

Audit each benchmark finding before selecting an implementation path. Evidence refs map to the disclosure section below.

Task	Dataset	Metric	Value	Source	Evidence refs
Question answering	w/o MoE	Reference	0.993	paper-derived	No explicit refs

Synthesizing human--object interaction (HOI) videos has broad practical value in e-commerce, digital advertising, and virtual marketing.

Implementation Evidence Summary

Confidence: low

Recommendation evidence is currently too limited for a maintained-repo choice. Use Implementation Status and Reproduction Path for a practical baseline plan.

Reproduction Risks

Estimate is based on paper-only reproduction flow

Hardware Notes

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

Evidence disclosure

Evidence graph: 3 refs, 2 links.

Utility signals: depth 95/100, grounding 78/100, status high.

Implementation Comparison

Top 1 paths

Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.

luoxyhappy/CoInteract

alternative

Maintenance: Recently updated

Confidence: Medium

Reproducibility: Moderate

Matched via arXiv identifier search · Strong overlap with paper title keywords

Stars: 158
Last push: May 7, 2026 (44d ago)

Dependencies

Risk flags

No CI pipeline detected
No tagged releases
No Docker setup

Implementation Status

No verified maintained repo

There is no verified maintained implementation yet. Use this baseline plan to decide whether to prototype now or defer.

No direct maintained implementation was found. Use the paper PDF and citation graph to design a baseline reproduction.
Start from this likely method family: Transformer.
Track assumptions and missing details in an experiment log before coding.

Time to first repro: a few days

Best available artifact: georgexin/cointeract

Reproduction readiness

No Repo

Time to first repro: days

Last checked: Jun 20, 2026

Hardware requirements

Expect multi-day setup/compute for meaningful reproduction based on current guidance.

No verified implementation available

· No maintained repository has been identified for this paper. Check adjacent implementations or HF artifacts below.

Framework baselines

Hugging Face Transformers training guide
Modern transformer training baseline.
PyTorch nn.Transformer docs
Reference transformer building block implementation.
Hugging Face Diffusers training guide
Practical baseline for diffusion model reproduction.

Additional implementations

Official

No additional official repositories detected.

Community

luoxyhappy/CoInteract
Confidence: Medium

Official Implementation of CoInteract: Spatially-Structured Co-Generation for Interactive Human-Object Video Synthesis

Stars: 158

Last push: May 7, 2026

License: Apache-2.0