Evaluate Plato's Dialogue

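Background

The following prompt tests an LLM's ability to evaluate the outputs of two different models, much as a teacher would when grading assignments.

First, give two models (e.g., ChatGPT and GPT-4) the following prompt to generate the outputs: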

Plato’s Gorgias is a critique of rhetoric and sophistic oratory, where he makes the point that not only is it not a proper form of art, but the use of rhetoric and oratory can often be harmful and malicious. Can you write a dialogue by Plato where instead he criticizes the use of autoregressive language models?

Then, evaluate those outputs using the evaluation prompt below.

Prompt

Can you compare the two outputs below as if you were a teacher?

Output from ChatGPT: {output 1}

Output from GPT-4: {output 2}
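The `{output 1}` and `{output 2}` placeholders above have to be replaced with the two generated dialogues before the prompt is sent. A minimal sketch of that substitution follows; the names `EVAL_TEMPLATE`, `build_eval_prompt`, `output_1`, and `output_2` are illustrative assumptions, not part of the original prompt.

```python
# Evaluation template from this page, with the placeholders renamed to
# valid Python identifiers (output_1 / output_2 are assumed names).
EVAL_TEMPLATE = (
    "Can you compare the two outputs below as if you were a teacher?\n\n"
    "Output from ChatGPT:\n{output_1}\n\n"
    "Output from GPT-4:\n{output_2}"
)


def build_eval_prompt(output_1: str, output_2: str) -> str:
    """Fill the evaluation template with the two model outputs."""
    # Only the template is parsed by str.format, so braces that appear
    # inside the model outputs themselves are inserted verbatim.
    return EVAL_TEMPLATE.format(
        output_1=output_1.strip(),
        output_2=output_2.strip(),
    )
```

Calling `build_eval_prompt(dialogue_a, dialogue_b)` returns a single string that can be used as the `content` of a user message.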

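Code / API

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Replace {output 1} and {output 2} in the message content with the two
# generated dialogues before sending the request.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "Can you compare the two outputs below as if you were a teacher?\n\nOutput from ChatGPT:\n{output 1}\n\nOutput from GPT-4:\n{output 2}"
        }
    ],
    temperature=1,
    max_tokens=1500,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0
)

# The teacher-style comparison is the text of the first choice.
print(response.choices[0].message.content)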
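The request above covers only the evaluation step. As a rough sketch of the whole workflow (generate with two models, then ask a third call to compare), the helper below takes any client object that exposes `chat.completions.create`, such as an `OpenAI()` instance. The function names `ask` and `compare_models`, and the choice of `gpt-3.5-turbo` to stand in for ChatGPT, are assumptions made for illustration.

```python
def ask(client, model: str, prompt: str) -> str:
    """Send one user message and return the text of the first choice."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def compare_models(
    client,
    generation_prompt: str,
    model_a: str = "gpt-3.5-turbo",  # assumed stand-in for ChatGPT
    model_b: str = "gpt-4",
    judge: str = "gpt-4",
) -> str:
    """Generate two outputs, then ask the judge model to compare them."""
    output_1 = ask(client, model_a, generation_prompt)
    output_2 = ask(client, model_b, generation_prompt)
    eval_prompt = (
        "Can you compare the two outputs below as if you were a teacher?\n\n"
        f"Output from ChatGPT:\n{output_1}\n\n"
        f"Output from GPT-4:\n{output_2}"
    )
    return ask(client, judge, eval_prompt)


# Example usage (requires the openai package and an API key):
#   from openai import OpenAI
#   print(compare_models(OpenAI(), "<the Plato generation prompt>"))
```

Passing the client in as an argument keeps the sketch independent of any particular SDK setup and makes it easy to substitute a stub when experimenting offline.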

Reference

Sparks of Artificial General Intelligence: Early experiments with GPT-4 (13 April 2023)
