Abstract: Deep learning (DL) algorithms are swiftly finding applications in computer vision and natural language processing. Nonetheless, they can also be employed for creating convincing deepfakes, ...
We introduce JavisDiT, a novel, state-of-the-art Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG) from open-ended user prompts. We hope to set a new standard for ...