xAI's Aurora-powered video model. Text or image in — cinematic video with native audio out. Dialogue, music, and sound effects generated in ~30 seconds.
Built on the Aurora engine — xAI's autoregressive video architecture trained on 110,000 NVIDIA GB200 GPUs.
Generate landscape, vertical, or square video — Grok Imagine supports all three major aspect ratios natively.
Two duration options, 6 seconds and 10 seconds, cover everything from punchy hooks to longer narrative sequences.
6 seconds: perfect for social hooks, product reveals, reaction clips, and punchy visual statements that grab attention instantly.
10 seconds: room for narrative beats, character moments, multi-scene pacing, and story arcs with a beginning, middle, and end.
Two ways to create. Describe a scene from scratch — or animate a photo you already have.
Write a text prompt describing your scene — characters, setting, camera angle, mood, lighting. Grok Imagine generates the full video with synchronized audio from words alone.
Upload a reference image and describe how it should move. Grok Imagine animates the photo while preserving the original composition, colors, and subject identity.
Built on xAI's Aurora engine with native audio generation, temporal coherence, and cinematic shot understanding.
Dialogue with lip-sync, contextual background music, and ambient sound effects — all generated natively alongside the video. No separate audio pipeline.
Aurora maintains frame-to-frame coherence across the full clip. Characters stay consistent, objects persist, and camera motion flows without artifacts or flickering.
Describe camera movements in your prompt — tracking shots, close-ups, panning, aerial views — and Grok Imagine executes them with professional-grade framing.
Three creative modes — Normal, Fun, and Spicy — let you dial the aesthetic from photorealistic to highly stylized. Works across live-action, animation, and abstract styles.
Upload any image and Grok Imagine brings it to life. The model preserves subject identity, composition, and visual style while generating natural, fluid motion.
Every Grok Imagine video ships with three layers of audio — dialogue, music, and sound effects — generated natively alongside the visuals.
Characters speak with natural voice and precise lip synchronization. The model generates speech that matches mouth movements frame-by-frame — no manual dubbing needed.
Background music adapts to scene mood and tempo automatically. Action scenes get intensity and drive; quiet moments get ambient, atmospheric scoring.
Footsteps on gravel, rain on windows, engine rumble, wind through trees — environmental audio is generated and precisely timed to match the visual content.
xAI's proprietary autoregressive video architecture, trained on the largest known infrastructure for a video model.
Aurora is trained on the largest known GPU cluster dedicated to video generation — 110,000 NVIDIA GB200 GPUs. This massive compute enables the model to learn complex temporal dynamics, audio-visual synchronization, and physically plausible motion at scale.
From prompt to finished video with audio in under a minute.
For text-to-video (T2V), describe the scene, characters, camera angle, and mood. For image-to-video (I2V), upload a reference image and describe how it should animate. Choose Normal, Fun, or Spicy mode.
Choose 16:9 for landscape (YouTube, desktop), 9:16 for vertical (TikTok, Reels, Shorts), or 1:1 for square (Instagram). Pick a 6-second or 10-second duration.
Aurora renders your video with fully synchronized audio — dialogue, music, and sound effects all included. Download and use immediately, no post-processing required.
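For teams that script their generation settings, the choices in the steps above can be captured in a small validated settings object. The sketch below is illustrative only: this page does not describe a programmatic API, so `VideoRequest`, `request_for_platform`, and every field name are hypothetical. Only the allowed values (three modes, three aspect ratios, two durations) and the platform suggestions come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout: these mirror the options listed on this page,
# not a published xAI interface.
MODES = {"normal", "fun", "spicy"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
DURATIONS_S = {6, 10}

# Platform-to-ratio suggestions taken from the aspect ratio step above.
PLATFORM_RATIO = {
    "youtube": "16:9",
    "desktop": "16:9",
    "tiktok": "9:16",
    "reels": "9:16",
    "shorts": "9:16",
    "instagram": "1:1",
}


@dataclass(frozen=True)
class VideoRequest:
    """Settings for one generation, checked against the documented options."""

    prompt: str
    mode: str = "normal"
    aspect_ratio: str = "16:9"
    duration_s: int = 6
    image_path: Optional[str] = None  # set for image-to-video, None for text-to-video

    def __post_init__(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {sorted(MODES)}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
        if self.duration_s not in DURATIONS_S:
            raise ValueError(f"duration_s must be one of {sorted(DURATIONS_S)}")

    @property
    def kind(self) -> str:
        """Return 'I2V' when a reference image is attached, otherwise 'T2V'."""
        return "I2V" if self.image_path else "T2V"


def request_for_platform(prompt: str, platform: str, **options) -> VideoRequest:
    """Build a request using the aspect ratio suggested for a target platform."""
    ratio = PLATFORM_RATIO[platform.lower()]  # KeyError for platforms not listed above
    return VideoRequest(prompt=prompt, aspect_ratio=ratio, **options)


if __name__ == "__main__":
    req = request_for_platform(
        "A fox runs through fresh snow, low tracking shot, soft morning light",
        "TikTok",
        mode="fun",
        duration_s=10,
    )
    print(req.kind, req.aspect_ratio, req.duration_s)
```

Validating at construction time means an unsupported duration or aspect ratio is caught before anything is submitted, which is convenient when batching many prompts.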
From content creators to marketing teams — Grok Imagine accelerates every video workflow.
Create vertical 9:16 videos for TikTok, Reels, and Shorts — complete with music and sound effects, ready to post.
Rapid-prototype video ads and promotional clips with cinematic quality. Test concepts before committing to production.
Animate product images into dynamic showcase videos. Turn static product shots into engaging motion content.
Draft short scenes, music video concepts, or story sequences with character consistency and natural dialogue.
Fun and Spicy modes produce playful, exaggerated, or surreal results — ideal for meme-worthy, shareable video content.
Visualize ideas before investing in production. Generate quick visual references for pitches, mood boards, and storyboards.
Everything you need to know about generating video with Grok Imagine.
Generate your first video with native audio today. Text or image in — cinematic output in ~30 seconds.
No credit card required for free tier · Cancel anytime