AI video generation is becoming mainstream, thanks in part to OpenAI's Sora. The older version, Sora 1, is still available to paid ChatGPT subscribers, but the current version, Sora 2, is rolling out to a wider audience after initially being invite-only. If you don't have access yet, you likely will soon.
Sora can create videos, complete with audio, from just about any prompt. Results sometimes seriously impress, but they can also disappoint, especially without careful prompt calibration and multiple iterations. Sora isn't just a video generation model, either: it's also a TikTok-like social platform for sharing AI videos.
To evaluate their video generation abilities, I gave ChatGPT (Sora 2) and Gemini (Veo 3.1: Quality) three prompts, starting with: “Somebody going about their daily life in a trendy apartment with rustic decor.” ChatGPT's video doesn't impress: it treats a levitating cup as a pour-over coffee maker, and afterward the person in the video awkwardly crouches in front of a table. Veo's video isn't great either. In it, a person cooking grabs a spoon, but the spoon duplicates, leaving one in its original position on the table and one in the person's hand. Oddly, a record player also sits on the kitchen counter. In both videos, the audio is slightly distorted, doesn't sync perfectly, and is missing certain sounds.
To test the chatbots' ability to handle complex motion, I asked them to create a video of somebody solving a Rubik's Cube in a competitive setting with the following prompt: “Show me a pro Rubik’s Cube solver solving a cube.” Once again, neither video is especially good. Both feature distorted cubes, which also means the audio doesn't quite sync with what's on screen. ChatGPT's timer doesn't make sense, while Veo's camera zoom is distracting. The voice in ChatGPT's video also has a slight distortion that makes it feel AI-generated.
My final test was for text generation within a video: “Generate me a video of a teacher in front of a class writing down y = mx+b on a whiteboard while explaining the concept.” Unsurprisingly, there are significant issues with these videos as well. ChatGPT's text is nonsense, and the voice of its teacher is, again, distorted. Veo's video, confusingly, starts with “y = __ + b” already on the whiteboard, and the teacher fills in only the “mx” portion, while most of what the teacher actually says is garbled nonsense. Neither delivers on my prompt.
Even though my tests suggest otherwise, you can generate impressive videos with both ChatGPT and Gemini. However, doing so requires numerous prompt tweaks, multiple generations, and considerable time. If you pay for ChatGPT's expensive $200-per-month Pro subscription, you can use Sora 2 Pro instead of Sora 2. I ran the same prompts through Sora 2 Pro, and although the quality does seem slightly higher, the videos still feature various errors and distortions.
ChatGPT Pro also unlocks Sora 2's Storyboard feature, which breaks a video into individual scenes that you can script. Storyboard videos can be up to 25 seconds long, compared with 10 seconds for standard generations. Although the feature is useful for creating more complex videos, it didn't result in meaningfully fewer errors or distortions in my testing. Veo offers a somewhat similar tool, Flow, for editing videos and stitching them together, but it doesn't avoid these issues either.

