VOID – Video Object and Interaction Deletion

🌐 Project Page | 💻 GitHub

Upload a video and its quadmask, enter a prompt describing the scene after removal, and VOID will erase the object along with its physical interactions.

Built on CogVideoX-Fun-V1.5-5B, fine-tuned for interaction-aware video inpainting.

Inputs: the input video, the quadmask video, and a prompt describing the scene after removal. Output: the inpainted video.

Quadmask format

The quadmask is a grayscale video where each pixel value encodes what role that region plays:

Pixel value	Meaning
0 (black)	Primary object to remove
63 (dark grey)	Overlap of primary object and affected region
127 (mid grey)	Affected region — shadows, reflections, new and old trajectories
255 (white)	Background — keep as-is
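
To make the encoding concrete, here is a minimal sketch of composing one quadmask frame by hand from two binary masks, one for the primary object and one for the affected region. The function name `build_quadmask` and the toy masks are hypothetical and not part of the VOID repo; only the four grey levels come from the table above.

```python
import numpy as np

# Grey levels taken from the quadmask table.
PRIMARY, OVERLAP, AFFECTED, BACKGROUND = 0, 63, 127, 255


def build_quadmask(primary: np.ndarray, affected: np.ndarray) -> np.ndarray:
    """Combine two same-shaped boolean masks into one uint8 quadmask frame.

    Hypothetical helper for illustration; not the repo's own API.
    """
    primary = primary.astype(bool)
    affected = affected.astype(bool)
    if primary.shape != affected.shape:
        raise ValueError("masks must have the same shape")

    # Start from background, then paint the more specific roles on top.
    quad = np.full(primary.shape, BACKGROUND, dtype=np.uint8)
    quad[affected] = AFFECTED
    quad[primary] = PRIMARY
    quad[primary & affected] = OVERLAP
    return quad


# Toy 4x6 frame: the object covers columns 1-2, the affected zone columns 2-4.
primary = np.zeros((4, 6), dtype=bool)
primary[1:3, 1:3] = True
affected = np.zeros((4, 6), dtype=bool)
affected[1:3, 2:5] = True

frame = build_quadmask(primary, affected)
print(frame[1].tolist())  # [255, 0, 63, 127, 127, 255]
```

Applying the same function to every frame and writing the results as a grayscale video would give a mask in the format described here, assuming the encoder preserves the grey levels.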

Use the VLM-Mask-Reasoner pipeline included in the repo to generate quadmasks automatically.
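
However a quadmask is produced, lossy video encoding can shift grey values slightly away from the four levels. The sketch below snaps each pixel of a decoded frame back to the nearest level. It is an assumption that such cleanup is useful here: the source does not say whether VOID tolerates or corrects off-level values itself, and `snap_to_levels` is a hypothetical helper, not a function from the repo.

```python
import numpy as np

# The four quadmask grey levels; int16 avoids uint8 wrap-around on subtraction.
LEVELS = np.array([0, 63, 127, 255], dtype=np.int16)


def snap_to_levels(frame: np.ndarray) -> np.ndarray:
    """Map every pixel of a grayscale frame to the nearest quadmask level."""
    # Distance from each pixel to each level, shape (..., 4).
    diffs = np.abs(frame.astype(np.int16)[..., None] - LEVELS)
    # Pick the closest level per pixel and return to uint8.
    return LEVELS[diffs.argmin(axis=-1)].astype(np.uint8)


# First row simulates compression noise; second row is already clean.
noisy = np.array([[2, 60, 130, 250],
                  [0, 63, 127, 255]], dtype=np.uint8)
clean = snap_to_levels(noisy)
print(clean.tolist())  # [[0, 63, 127, 255], [0, 63, 127, 255]]
```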
