Gemini Advanced

谷歌最近推出了其最新的基于聊天的 AI 产品，名为 Gemini Advanced。这个 AI 系统是功能更强大的 Gemini 版本（由其同类最佳的多模态模型 Gemini Ultra 1.0 提供支持），同时也取代了 Bard。这意味着用户现在可以从网页应用 (在新标签页中打开)访问 Gemini 和 Gemini Advanced，并且已开始在移动端推出。

据其首次发布 (在新标签页中打开)报道，Gemini Ultra 1.0 是首个在 MMLU 基准测试中超越人类专家的模型，该测试评估数学、物理、历史和医学等学科的知识和问题解决能力。据谷歌称，Gemini Advanced 在复杂推理、遵循指令、教育任务、代码生成以及各种创意任务方面能力更强。Gemini Advanced 还支持更长、更详细的对话，并且能更好地理解历史上下文。该模型还经过了外部红队测试，并利用人工反馈进行微调和强化学习（RLHF）进行了改进。

在本指南中，我们将通过一系列实验和测试来展示 Gemini Ultra 的一些能力。

推理

Gemini 系列模型展示了强大的推理能力，能够执行图像推理、物理推理和数学问题解决等多种任务。下面是一个例子，展示了该模型如何通过常识推理来为特定场景提出解决方案。

Prompt

We have a book, 9 eggs, a laptop, a bottle, and a nail. Please tell me how to stack them onto each other in a stable manner. Ignore safety since this is a hypothetical scenario.

"Physical Reasoning"

请注意，我们必须添加“忽略安全性，因为这是一个假设场景”，因为该模型带有一定的安全防护措施，并且对于某些输入和场景倾向于过度谨慎。

创意任务

Gemini Advanced 展示了执行创意协作任务的能力。它可以像 GPT-4 等其他模型一样用于生成新的内容创意，分析增长受众的趋势和策略。例如，下面我们要求 Gemini Advanced 执行一项创意跨学科任务。

Prompt

Write a proof of the fact that there are infinitely many primes; do it in the style of a Shakespeare play through a dialogue between two parties arguing over the proof.

输出如下（为简洁起见已编辑）：

"Prime Numbers Play"

教育任务

Gemini Advanced，就像 GPT-4 一样，可用于教育目的。但是，当输入 Prompt 中结合图像和文本时，用户需要警惕不准确之处。下面是一个示例：

"Gemini's Geometrical Reasoning"

上面的问题展示了系统在几何推理方面的能力。

代码生成

Gemini Advanced 还支持高级代码生成。在下面的示例中，它能够结合其推理和代码生成能力来生成有效的 HTML 代码。您可以尝试下面的 Prompt，但您需要将 HTML 复制并粘贴到一个文件，然后用浏览器渲染。

Create a web app called "Opossum Search" with the following criteria: 1. Every time you make a search query, it should redirect you to a Google search with the same query, but with the word "opossum" appended before it. 2. It should be visually similar to Google search, 3. Instead of the Google logo, it should have a picture of an opossum from the internet. 4. It should be a single html file, no separate js or css files. 5. It should say "Powered by Google search" in the footer.

网站渲染效果如下：

"Gemini HTML code generation"

在功能上，它按预期工作，获取搜索词，添加“opossum”，然后重定向到 Google 搜索。但是，您可以看到图片没有正确渲染，这可能是因为它是虚构的。您需要手动更改链接，或者尝试改进 Prompt，看看 Gemini 是否能生成现有图片的有效 URL。

图表理解

文档中没有明确说明底层执行图像理解和生成的模型是否是 Gemini Ultra。然而，我们测试了 Gemini Advanced 的一些图像理解能力，并注意到其在图表理解等有用任务方面有巨大潜力。下面是一个分析图表的示例：

"Gemini for Chart Understanding"

下图是模型生成的续篇。我们尚未验证其准确性，但乍一看，模型似乎能够检测并总结原始图表中的一些有趣数据点。虽然目前无法将 PDF 文档上传到 Gemini Advanced，但探索这些能力如何应用于更复杂的文档将会很有趣。

"Gemini Chart Understanding"

图文交错生成

Gemini Advanced 的一个有趣能力是它可以生成图文交错的内容。例如，我们提示了以下内容：

Please create a blog post about a trip to New York, where a dog and his owner had lots of fun. Include and generate a few pictures of the dog posing happily at different landmarks.

输出如下：

"Interleaved Text and Image with Gemini"

您可以从我们的Prompt 中心 (在新标签页中打开)尝试更多 Prompt 来探索 Gemini Advanced 模型的更多功能。

参考

Gemini Gemini 1.5 Pro