Table of contents
- What is the HunyuanVideo model?
- How does it differ from competitors?
- Architectural novelties
- How to install
- How to run?
- Docker recipe
What is the HunyuanVideo model?
HunyuanVideo is a new text-to-video generation model from Tencent that reportedly outperforms Sora, the current state of the art among video models.
How does it differ from competitors?
HunyuanVideo is open source and, at 13 billion parameters, is the largest open video model to date; in its authors' evaluations it also ranks above closed-source competitors.
Architectural novelties
To achieve this, HunyuanVideo innovates in these areas:
- Unified Dual-to-Single-Stream Design: the model first processes video and text tokens in independent streams, then merges them into a single stream for deeper multimodal fusion. This improves how visual and semantic information are aligned (see the sketch after this list).
- Multimodal Large Language Model (MLLM) as Text Encoder: instead of traditional encoders such as CLIP or T5, HunyuanVideo uses an MLLM with a decoder-only structure, improving text-video alignment, complex reasoning, and the capture of image detail.
- 3D VAE Compression: a CausalConv3D-based VAE compresses high-resolution videos into a compact latent space, allowing training on full-resolution, high-frame-rate videos without excessive computational cost.
- Prompt Rewrite Mechanism: user prompts are rewritten into model-optimized instructions, favoring better comprehension (Normal mode) or higher visual quality (Master mode), improving generation accuracy and adaptability.
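To make the dual-to-single-stream idea concrete, here is a minimal, hypothetical PyTorch sketch. This is not the actual HunyuanVideo code: the block types, depths, and dimensions are placeholders chosen purely for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a dual-to-single-stream transformer: video and text
# tokens are refined in separate streams, then concatenated and processed
# jointly so self-attention can fuse the two modalities.
class DualToSingleStream(nn.Module):
    def __init__(self, dim=512, heads=8, n_dual=2, n_single=2):
        super().__init__()
        make_block = lambda: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.video_blocks = nn.ModuleList([make_block() for _ in range(n_dual)])
        self.text_blocks = nn.ModuleList([make_block() for _ in range(n_dual)])
        self.joint_blocks = nn.ModuleList([make_block() for _ in range(n_single)])

    def forward(self, video_tokens, text_tokens):
        # Dual-stream phase: each modality is processed independently.
        for vb, tb in zip(self.video_blocks, self.text_blocks):
            video_tokens = vb(video_tokens)
            text_tokens = tb(text_tokens)
        # Single-stream phase: merge the streams for multimodal fusion.
        tokens = torch.cat([video_tokens, text_tokens], dim=1)
        for jb in self.joint_blocks:
            tokens = jb(tokens)
        return tokens

model = DualToSingleStream()
out = model(torch.randn(1, 16, 512), torch.randn(1, 8, 512))
print(out.shape)  # torch.Size([1, 24, 512])
```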
How to install
First, clone the repo:
git clone https://github.com/tencent/HunyuanVideo
and enter it:
cd HunyuanVideo
Prepare the conda environment:
conda env create -f environment.yml
Activate it:
conda activate HunyuanVideo
Install the dependencies:
python -m pip install -r requirements.txt
Install flash attention v2 for acceleration (requires CUDA 11.8 or above):
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.5.9.post1
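If everything installed correctly, an optional sanity check from Python confirms that PyTorch sees the GPU and that flash-attention imports cleanly:

```python
# Optional sanity check for the environment set up above.
import torch
import flash_attn

print(torch.cuda.is_available())  # expect True on a CUDA 11.8+ machine
print(flash_attn.__version__)     # expect 2.5.9.post1, as pinned above
```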
How to run?
Run this from inside the repo folder:
python sample_video.py \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 30 \
--prompt "a cat is running, realistic." \
--flow-reverse \
--seed 0 \
--use-cpu-offload \
--save-path ./results
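A note on the flags: --video-length 129 is not arbitrary; frame counts of the form 4n+1 map cleanly through the 3D VAE, which (per the HunyuanVideo report) compresses time by 4x and each spatial dimension by 8x. A rough estimate of the latent the diffusion model actually works on:

```python
# Rough latent-shape estimate for the command above, assuming the 4x
# temporal / 8x spatial compression described in the HunyuanVideo report.
frames, height, width = 129, 720, 1280
latent_t = (frames - 1) // 4 + 1               # 33 latent frames (129 = 4*32 + 1)
latent_h, latent_w = height // 8, width // 8   # 90 x 160 spatial latent
print(latent_t, latent_h, latent_w)            # 33 90 160
```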
Docker recipe
For CUDA 12, you can run the following:
wget https://aivideo.hunyuan.tencent.com/download/HunyuanVideo/hunyuan_video_cu12.tar
docker load -i hunyuan_video_cu12.tar
docker image ls
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged docker_image_tag
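Here docker_image_tag is a placeholder: use the tag reported by docker image ls above. Once the container is running, open a shell inside it (the run command named it hunyuanvideo) and use the same sampling command from the "How to run?" section:
docker exec -it hunyuanvideo bash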