Sora

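OpenAI has introduced Sora, its new text-to-video AI model. Given text instructions, Sora can create videos of realistic and imaginative scenes up to one minute long.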

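OpenAI reports that its vision is to build AI systems that understand and simulate the physical world in motion, and to train models that solve problems requiring real-world interaction.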

Capabilities

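Sora can generate videos that maintain high visual quality while closely following the user's prompt. It can also generate complex scenes with multiple characters, different types of motion, and detailed backgrounds, and it understands how these elements relate to one another. Other capabilities include creating multiple shots within a single video while keeping characters and visual style consistent across them. Below are a few examples of videos generated by Sora.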

Prompt:

A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Prompt:

A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

Video source: https://openai.com/sora

Methods

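Sora is reported to be a diffusion model that can generate entire videos or extend videos it has already generated. It also uses a Transformer architecture, which allows its performance to scale. Videos and images are represented as patches, analogous to tokens in GPT. This yields a unified video generation system that supports longer durations, higher resolutions, and more flexible aspect ratios. OpenAI applies the recaptioning technique from DALL·E 3 so that Sora follows text instructions more closely. Sora can also generate a video from a given image, which lets the system accurately animate that image.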
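To make the patch idea concrete, here is a minimal pure-Python sketch that cuts a video into non-overlapping spacetime patches and flattens each one into a token-like vector. It is an illustration under stated assumptions, not Sora's implementation: the helper names `patchify` and `make_video`, the patch sizes, and the use of raw pixel values are all hypothetical. OpenAI's technical report describes patching a compressed latent representation of the video rather than raw pixels, and it does not publish patch sizes.

```python
from typing import List

# Hypothetical sketch: a video is a nested list with shape [frames][height][width][channels].
# Raw pixels stand in for the compressed latent representation Sora reportedly patches.
Video = List[List[List[List[float]]]]


def patchify(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a video into non-overlapping spacetime patches and flatten each one.

    pt, ph, pw are hypothetical patch sizes along time, height, and width.
    Each returned inner list is one "token" of length pt * ph * pw * channels.
    """
    t, h, w = len(video), len(video[0]), len(video[0][0])
    if t % pt or h % ph or w % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    # Walk the patch grid in time-major, then row, then column order.
    for t0 in range(0, t, pt):
        for y0 in range(0, h, ph):
            for x0 in range(0, w, pw):
                patch: List[float] = []
                # Flatten every pixel (all channels) inside this spacetime block.
                for dt in range(pt):
                    for dy in range(ph):
                        for dx in range(pw):
                            patch.extend(video[t0 + dt][y0 + dy][x0 + dx])
                patches.append(patch)
    return patches


def make_video(t: int, h: int, w: int, c: int = 3) -> Video:
    """Build a dummy zero-filled video of the given shape (hypothetical helper)."""
    return [[[[0.0] * c for _ in range(w)] for _ in range(h)] for _ in range(t)]


# Clips with different durations, resolutions, and aspect ratios
# simply produce different numbers of equally sized tokens.
for shape in [(4, 8, 8), (8, 8, 16), (16, 16, 8)]:
    tokens = patchify(make_video(*shape), pt=2, ph=4, pw=4)
    print(shape, "->", len(tokens), "tokens of length", len(tokens[0]))
```

With a 2×4×4 patch and 3 channels, the three clips yield 8, 32, and 64 tokens, each of length 96. The design point is that videos of any shape become variable-length sequences of same-sized vectors, which a Transformer can process much as GPT processes text tokens.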

Limitations and Safety

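According to the report, Sora's limitations include accurately simulating the physics of the real world and a lack of understanding of cause and effect. Sora also sometimes misinterprets spatial details and events described in a prompt (for example, a specified camera trajectory). OpenAI reports that it is making Sora available to red teamers and creators to assess its potential harms and capabilities.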

Prompt:

Step-printing scene of a person running, cinematic film shot in 35mm.

Video source: https://openai.com/sora

More examples of videos generated by Sora are available at: https://openai.com/sora
