Official implementation from Papers with Code · Repository link is mentioned in the paper metadata
- Stars: 2,252
- Last push: Mar 6, 2025 (373d ago)
Risk flags
- No push in 12+ months
- No Docker setup
Jiaqi Xu, Kunzhe Huang, Xinyi Zou, Yunkuo Chen, Bo Liu, MengLi Cheng, Jun Huang, Xing Shi
This paper introduces EasyAnimate, an efficient, high-quality video generation framework built on diffusion transformers that covers data processing, model training, and end-to-end inference. Despite substantial advances in video diffusion models, existing video generation models still struggle with slow generation speeds and less-than-ideal video quality. To improve training and inference efficiency without compromising performance, we propose Hybrid Window Attention. Within Hybrid Window Attention, we design multidirectional sliding window attention, which provides a stronger receptive field across the three video dimensions than the naive version while reducing the model's computational complexity as the video sequence length increases. To enhance video generation quality, we optimize EasyAnimate with reward backpropagation to better align it with human preferences. As a post-training method, it greatly improves the model's performance while preserving efficiency. Beyond these improvements, EasyAnimate integrates a series of further refinements that significantly improve both computational efficiency and model performance. We introduce a new training strategy, Training with Token Length, which resolves uneven GPU utilization when training on videos of varying resolutions and lengths, thereby improving efficiency. Additionally, we use a multimodal large language model as the text encoder to improve the model's text comprehension. Experiments demonstrate significant gains from these improvements. EasyAnimate achieves state-of-the-art performance on both the VBench leaderboard and human evaluation. Code and pre-trained models are available at https://github.com/aigc-apps/EasyAnimate.
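The abstract describes Training with Token Length only at a high level. A minimal sketch, assuming the idea is to size each batch so it carries a roughly fixed number of transformer tokens: long, high-resolution clips get small batches and short clips or still images get large ones, so per-GPU work stays comparable. The compression factors (4x temporal and 8x spatial VAE downsampling, 2x2 patches), the causal first-frame convention, and every name below are hypothetical illustrations, not taken from the EasyAnimate code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VideoShape:
    frames: int
    height: int
    width: int


def token_length(shape: VideoShape, t_stride: int = 4, s_stride: int = 8, patch: int = 2) -> int:
    """Transformer tokens for one clip under hypothetical VAE/patch compression factors."""
    # Assumed causal-VAE convention: the first frame gets its own latent,
    # every further t_stride frames share one latent frame.
    latent_t = (shape.frames - 1) // t_stride + 1
    latent_h = shape.height // (s_stride * patch)
    latent_w = shape.width // (s_stride * patch)
    return latent_t * latent_h * latent_w


def batch_size_for(shape: VideoShape, token_budget: int) -> int:
    """Largest batch size whose total token count fits the budget (never below 1)."""
    return max(1, token_budget // token_length(shape))


def make_token_balanced_batches(
    samples: List[Tuple[str, VideoShape]], token_budget: int
) -> List[List[str]]:
    """Group samples by shape, then chunk each group to roughly token_budget tokens per batch."""
    groups: Dict[VideoShape, List[str]] = defaultdict(list)
    for sample_id, shape in samples:
        groups[shape].append(sample_id)

    batches: List[List[str]] = []
    for shape, ids in groups.items():
        size = batch_size_for(shape, token_budget)
        for start in range(0, len(ids), size):
            batches.append(ids[start:start + size])
    return batches


if __name__ == "__main__":
    clip = VideoShape(frames=49, height=512, width=512)   # 13 * 32 * 32 = 13312 tokens
    image = VideoShape(frames=1, height=512, width=512)   # 1 * 32 * 32 = 1024 tokens
    budget = 2 * token_length(clip)                       # room for two clips per batch
    samples = [(f"clip{i}", clip) for i in range(4)] + [(f"img{i}", image) for i in range(30)]
    for batch in make_token_balanced_batches(samples, budget):
        print(len(batch), batch[0])
```

Grouping by exact shape keeps every batch stackable into one tensor. The last batch of each group can fall below the budget, which a real distributed sampler would still need to handle.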
Researcher verdict
This page has evidence-backed benchmark findings and a concrete implementation recommendation anchored on aigc-apps/easyanimate. Use it as an implementation baseline, then validate benchmark parity before adapting it.
Why this page is still worth reading
Benchmark trust
Some benchmark signal exists in the extracted evidence, but it is not structured strongly enough yet for a confident benchmark decision.
Use this page as
Start here when you need the most practical implementation path quickly.
aigc-apps/easyanimate is the strongest maintained implementation based on ranking signals. CI workflows are present. License is declared (Apache-2.0).
Compare maintenance quality, reproducibility coverage, and evidence confidence before choosing a reproduction baseline.
AI-generated summary grounded in paper metadata and artifact signals.
EasyAnimate is an efficient high-quality video generation framework built on diffusion transformers, covering data processing, model training, and end-to-end inference. This page includes benchmark evidence for text-to-video generation on VBench. Reproduction guidance focuses on implementation viability and concrete risk controls.
Use aigc-apps/easyanimate first because deterministic ranking and extracted evidence align on implementation viability. Start with the repo setup path, then validate benchmark reproduction before adaptation.
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Follow the direct implementation path
Start with aigc-apps/easyanimate and validate the setup instructions in its README.
Reproduce the baseline result with the provided defaults before modifying hyperparameters.
Log exact dependency versions and runtime environment for reproducibility.
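For the logging step, a minimal stdlib-only sketch that records the interpreter version, OS, and installed package versions as JSON. The package list in the example is an assumption; take the real list from the repository's requirements.

```python
import json
import platform
from importlib import metadata
from typing import Dict, List, Optional


def environment_report(packages: List[str]) -> Dict[str, object]:
    """Collect Python/OS info plus installed versions (None if a package is missing)."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Assumed package list; check the repo's requirements file for the authoritative one.
    report = environment_report(["torch", "diffusers", "transformers", "accelerate"])
    print(json.dumps(report, indent=2))
```

Saving this JSON next to each run's outputs makes later dependency drift easy to spot when a reproduced number does not match.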
No direct paper-linked artifacts were found. Showing the strongest curated related artifacts for faster exploration.
No trustworthy dataset matches right now.
Tasks
None detected
Methods
Transformer, Diffusion
Domains
Computer Vision, Natural Language Processing