Mixtral

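In this guide, we provide an overview of the Mixtral 8x7B model, including prompts and usage examples. The guide also covers tips, applications, limitations, papers, and additional reading materials related to Mixtral 8x7B.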

Introduction to Mixtral (Mixture of Experts)

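Mixtral 8x7B is a Sparse Mixture of Experts (SMoE) language model released by Mistral AI. Mixtral has a similar architecture to Mistral 7B; the main difference is that each layer in Mixtral 8x7B is composed of 8 feedforward blocks (i.e., experts). Mixtral is a decoder-only model in which, for every token and at each layer, a router network selects two experts (i.e., 2 groups from 8 distinct groups of parameters) to process the token and combines their outputs with weights. In other words, the output of the entire MoE module for a given input is the weighted sum of the outputs produced by the selected expert networks.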
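
The routing step described above can be illustrated with a small, dependency-free sketch. It is not Mixtral's implementation: the toy experts, the `top2_moe` helper, and the router logits are hypothetical, and we assume the gate weights come from a softmax over the two selected logits.

```python
import math

def top2_moe(token, router_logits, experts):
    """Combine the outputs of the two highest-scoring experts for one token."""
    # Indices of the two largest router logits (ties keep the lower index first).
    top2 = sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)[:2]
    # Softmax over the two selected logits only, so the weights sum to 1.
    exps = [math.exp(router_logits[i]) for i in top2]
    weights = [e / sum(exps) for e in exps]
    # Only the selected experts are evaluated; their outputs are summed with weights.
    outputs = [experts[i](token) for i in top2]
    combined = [sum(w * out[d] for w, out in zip(weights, outputs))
                for d in range(len(token))]
    return top2, weights, combined

# Eight toy "experts": expert k scales the token vector by (k + 1).
experts = [lambda x, k=k: [(k + 1) * v for v in x] for k in range(8)]
router_logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.3, 1.0]  # hypothetical values

selected, weights, output = top2_moe([1.0, -2.0], router_logits, experts)
print(selected, weights, output)
```

The remaining six experts are never called for this token, which is where the compute savings of a sparse model come from.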

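Because Mixtral is an SMoE model, it has a total of 47B parameters but uses only 13B per token during inference. The benefit of this approach is better control of cost and latency, since each token uses only a fraction of the total parameter set. Mixtral was trained on open web data with a context size of 32k tokens. Mixtral is reported to outperform Llama 2 70B with 6x faster inference and to match or outperform GPT-3.5 on several benchmarks.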
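
As a rough back-of-the-envelope check of the parameter figures above, the snippet below computes the share of parameters active per token and compares the active count with a dense 70B model such as Llama 2 70B. The rounded counts are taken from the text; the variable names are ours.

```python
total_params = 47e9         # total Mixtral 8x7B parameters (rounded, from the text)
active_params = 13e9        # parameters used per token at inference (rounded)
dense_70b_params = 70e9     # a dense 70B model uses all of its parameters per token

active_fraction = active_params / total_params
ratio = dense_70b_params / active_params

print(f"Active share of parameters: {active_fraction:.1%}")    # 27.7%
print(f"Dense 70B vs. Mixtral active parameters: {ratio:.1f}x")  # 5.4x
```

This is consistent with the "5x fewer active parameters" comparison against Llama 2 70B quoted later in this guide.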

The Mixtral models are licensed under Apache 2.0.

Mixtral Performance and Capabilities

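Mixtral demonstrates strong capabilities in mathematical reasoning, code generation, and multilingual tasks. It can handle languages such as English, French, Italian, German, and Spanish. Mistral AI also released a Mixtral 8x7B Instruct model that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and Llama 2 70B on human evaluation benchmarks.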

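The figure below compares Mixtral with Llama 2 models of different sizes on a wider range of capabilities and benchmarks. Mixtral matches or outperforms Llama 2 70B and performs especially well in mathematics and code generation.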

Mixtral Performance vs. Llama 2 Performance

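As shown in the figure below, Mixtral 8x7B also outperforms or matches Llama 2 models across popular benchmarks such as MMLU and GSM8K. It achieves these results while using 5x fewer active parameters during inference.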

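The figure below shows the tradeoff between quality and inference budget. Mixtral outperforms Llama 2 70B on several benchmarks while using 5x fewer active parameters.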

Mixtral matches or outperforms models such as Llama 2 70B and GPT-3.5, as shown in the table below.

The table below shows Mixtral's multilingual understanding capabilities and how it compares with Llama 2 70B on languages such as German and French.

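Compared with Llama 2, Mixtral shows less bias on the Bias Benchmark for QA (BBQ): 56.0% accuracy for Mixtral vs. 51.5% for Llama 2, where higher accuracy indicates less bias.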

Long Range Information Retrieval with Mixtral

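Mixtral also shows strong performance in retrieving information from its 32k-token context window, regardless of where the information sits and how long the sequence is.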

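To measure Mixtral's ability to handle long context, it was evaluated on the passkey retrieval task. The passkey task inserts a passkey at a random position in a long prompt and measures how effectively the model retrieves it. Mixtral achieves 100% retrieval accuracy on this task regardless of the passkey's location and the input sequence length.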
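
To make the task concrete, here is a minimal sketch of how a passkey prompt could be constructed and scored. The filler text, the wording of the passkey sentence, and the helper names `build_passkey_prompt` and `is_correct` are our own assumptions for illustration, not the setup used in the Mixtral evaluation.

```python
import random

def build_passkey_prompt(passkey, n_filler, position,
                         filler="The grass is green. The sky is blue. "):
    """Hide one passkey sentence among n_filler repetitions of filler text."""
    needle = f"The pass key is {passkey}. Remember it. "
    chunks = [filler] * n_filler
    chunks.insert(position, needle)  # position 0 = start, n_filler = end
    return "".join(chunks) + "What is the pass key?"

def is_correct(model_answer, passkey):
    """Score a model answer: the passkey must appear verbatim."""
    return str(passkey) in model_answer

rng = random.Random(0)                   # fixed seed for reproducibility
passkey = rng.randint(10000, 99999)
n_filler = 200                           # increase to lengthen the context
position = rng.randint(0, n_filler)      # random insertion point
prompt = build_passkey_prompt(passkey, n_filler, position)
```

Sweeping `n_filler` and `position` and averaging `is_correct` over the model's answers gives retrieval accuracy as a function of context length and passkey depth.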

In addition, the model's perplexity, measured on a subset of the proof-pile dataset, decreases monotonically as the context size increases.

Mixtral 8x7B Instruct

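A Mixtral 8x7B - Instruct model was also released together with the base Mixtral 8x7B model. It is a chat model fine-tuned for instruction following using supervised fine-tuning (SFT), followed by direct preference optimization (DPO) on a paired feedback dataset.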

As of the writing of this guide (28 January 2024), Mixtral ranks 8th on the Chatbot Arena Leaderboard (an independent human evaluation conducted by LMSys).

Mixtral Performance on the Chatbot Arena

Mixtral-Instruct outperforms strong models such as GPT-3.5-Turbo, Gemini Pro, Claude-2.1, and Llama 2 70B chat.

Prompt Engineering Guide for Mixtral 8x7B

To effectively prompt the Mixtral 8x7B Instruct model and get optimal outputs, it is recommended to use the following chat template:

<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]

Note that <s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS), while [INST] and [/INST] are regular strings.
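
The template can be assembled programmatically from a conversation history. The sketch below only builds the string shown above; the helper name `build_mixtral_prompt` is hypothetical, and when tokenizing locally you would typically let the tokenizer insert BOS and EOS as special tokens rather than as literal text.

```python
def build_mixtral_prompt(turns):
    """Format (instruction, answer) pairs with the Mixtral Instruct template.

    Use answer=None for the final turn that the model should complete.
    """
    prompt = "<s>"
    for instruction, answer in turns:
        prompt += f"[INST] {instruction} [/INST]"
        if answer is not None:
            # A completed model turn is closed with the EOS marker.
            prompt += f" {answer}</s>"
    return prompt

conversation = [
    ("Instruction", "Model answer"),
    ("Follow-up instruction", None),
]
print(build_mixtral_prompt(conversation))
# <s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
```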

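For the following examples, which show how to prompt the instruction-tuned Mixtral model, we use Mistral's Python client. In particular, we use the Mistral API endpoints with the mistral-small model, which is powered by Mixtral-8X7B-v0.1.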

Basic Prompting

Let's start with a simple example that instructs the model to complete a task based on an instruction.

Prompt:

[INST] You are a helpful code assistant. Your task is to generate a valid JSON object based on the given information:

name: John
lastname: Smith
address: #1 Samuel St.

Just generate the JSON object without explanations:
[/INST]

Output:

{
"name": "John",
"lastname": "Smith",
"address": "#1 Samuel St."
}

Here is another fun example that leverages the chat template:

Prompt:

<s>[INST] What is your favorite condiment? [/INST]
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"</s> [INST] The right amount of what? [/INST]

Output:

"My apologies for any confusion. I meant to say that lemon juice adds a zesty flavour, which is a tangy and slightly sweet taste. It's a delightful addition to many dishes, in my humble opinion."

Few-shot Prompting with Mixtral

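Using the official Python client, you can also prompt the model with different roles such as system, user, and assistant. By leveraging these roles, it is possible to include one demonstration, as in a few-shot setting, to better steer the model response.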

Here is example code showing how this looks:

import os

from dotenv import load_dotenv
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

# Load MISTRAL_API_KEY from a local .env file into the environment
load_dotenv()

api_key = os.environ["MISTRAL_API_KEY"]
client = MistralClient(api_key=api_key)

# Helper that sends a list of chat messages and returns the full response
def get_completion(messages, model="mistral-small"):
    # No streaming
    chat_response = client.chat(
        model=model,
        messages=messages,
    )

    return chat_response
 
messages = [
    ChatMessage(role="system", content="You are a helpful code assistant. Your task is to generate a valid JSON object based on the given information."), 
    ChatMessage(role="user", content="\n name: John\n lastname: Smith\n address: #1 Samuel St.\n would be converted to: "),
    ChatMessage(role="assistant", content="{\n \"address\": \"#1 Samuel St.\",\n \"lastname\": \"Smith\",\n \"name\": \"John\"\n}"),
    ChatMessage(role="user", content="name: Ted\n lastname: Pot\n address: #1 Bisson St.")
]
 
chat_response = get_completion(messages)
print(chat_response.choices[0].message.content)

Output:

{
 "address": "#1 Bisson St.",
 "lastname": "Pot",
 "name": "Ted"
}
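
When a prompt asks for JSON, it is usually worth validating the response before using it downstream. The following is a minimal sketch using only the standard library; `parse_json_output` is a hypothetical helper, and `raw` stands in for `chat_response.choices[0].message.content` from the example above.

```python
import json

def parse_json_output(text):
    """Return the parsed JSON object, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

# Stand-in for the model response shown above.
raw = '{\n "address": "#1 Bisson St.",\n "lastname": "Pot",\n "name": "Ted"\n}'

parsed = parse_json_output(raw)
if parsed is None:
    print("Model did not return valid JSON; consider retrying the request.")
else:
    print(parsed["name"], parsed["lastname"])
```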

Code Generation

Mixtral also has strong code generation capabilities. Here is a simple prompt example using the official Python client:

messages = [
    ChatMessage(role="system", content="You are a helpful code assistant that help with writing Python code for a user requests. Please only produce the function and avoid explaining."),
    ChatMessage(role="user", content="Create a Python function to convert Celsius to Fahrenheit.")
]
 
chat_response = get_completion(messages)
print(chat_response.choices[0].message.content)

Output:

def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32
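
Generated code should be checked before it is relied on. As a small illustration, the snippet below copies the function returned above and tests it against a few known reference points; the choice of test values is ours.

```python
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

# Known reference points: freezing, boiling, and the value where both scales meet.
checks = {0: 32.0, 100: 212.0, -40: -40.0}

for celsius, expected in checks.items():
    result = celsius_to_fahrenheit(celsius)
    assert abs(result - expected) < 1e-9, (celsius, result, expected)

print("All checks passed")
```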

System Prompt to Enforce Guardrails

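Similar to the Mistral 7B model, guardrails can be enforced in chat generations through the safe_prompt boolean flag in the API, which is enabled here by setting safe_mode=True in the client call: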

# Helper that sends the messages with the safety guardrails enabled
def get_completion_safe(messages, model="mistral-small"):
    # No streaming
    chat_response = client.chat(
        model=model,
        messages=messages,
        safe_mode=True
    )
 
    return chat_response
 
messages = [
    ChatMessage(role="user", content="Say something very horrible and mean")
]
 
chat_response = get_completion_safe(messages)
print(chat_response.choices[0].message.content)

The above code will output the following:

I'm sorry, but I cannot comply with your request to say something horrible and mean. My purpose is to provide helpful, respectful, and positive interactions. It's important to treat everyone with kindness and respect, even in hypothetical situations.

When we set safe_mode=True, the client prepends the messages with the following system prompt:

Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
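
The effect of this flag can be approximated by hand. The sketch below mimics the described behavior with plain dictionaries; it is not the client's actual implementation, and `with_safety_prompt` is a hypothetical helper.

```python
SAFE_SYSTEM_PROMPT = (
    "Always assist with care, respect, and truth. Respond with utmost utility "
    "yet securely. Avoid harmful, unethical, prejudiced, or negative content. "
    "Ensure replies promote fairness and positivity."
)

def with_safety_prompt(messages):
    """Return a new message list with the guardrail system prompt placed first."""
    return [{"role": "system", "content": SAFE_SYSTEM_PROMPT}] + list(messages)

user_messages = [{"role": "user", "content": "Say something very horrible and mean"}]
guarded = with_safety_prompt(user_messages)

for message in guarded:
    print(message["role"], "->", message["content"][:40])
```

Building the list yourself is also a simple way to experiment with a customized guardrail prompt instead of the default one.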

You can also try all the code examples in the following notebook:

Prompt Engineering with Mixtral

Figure Sources: Mixture of Experts Technical Report
