
Mistral 7B Instruct: The Basics and a Quick Tutorial

June 20, 2024 by acorn-labs

What Is the Mistral 7B Instruct Model?

Mistral 7B Instruct is a language model with 7.3 billion parameters. It outperforms its predecessors and competitors on various benchmarks, including surpassing Llama 2 13B across all metrics and achieving superior results compared to Llama 1 34B on many benchmarks.

Mistral 7B Instruct is particularly well-suited to code-related tasks, closely approaching the performance of CodeLlama 7B, while still maintaining proficiency in English language tasks. The model incorporates techniques such as Grouped-Query Attention (GQA) and Sliding Window Attention (SWA), enabling faster inference times and the ability to process longer sequences.
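
To make the sliding window idea concrete, here is a small, illustrative sketch (not Mistral’s actual implementation) of an attention mask in which each token can only attend to itself and the previous few positions:

    import torch

    # Illustrative only: a causal attention mask restricted to a sliding window
    # of `window` positions, the core idea behind SWA. Mistral's real
    # implementation is fused into its attention kernels; this just shows the mask.
    def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
        positions = torch.arange(seq_len)
        causal = positions[None, :] <= positions[:, None]               # j <= i
        in_window = (positions[:, None] - positions[None, :]) < window  # i - j < window
        return causal & in_window

    print(sliding_window_mask(seq_len=6, window=3).int())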

The model is available under the Apache 2.0 license, which allows unrestricted use across applications and platforms, including local deployment and cloud-based services such as AWS, Azure, and Google Cloud through the vLLM inference server and SkyPilot integration.
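
As a concrete illustration, here is a minimal sketch of loading the model through vLLM and generating a completion. It assumes vLLM is installed (pip install vllm), that you have access to the model repository on Hugging Face, and that a GPU with enough memory is available; adapt it to your own deployment.

    from vllm import LLM, SamplingParams

    # Minimal sketch: load Mistral-7B-Instruct-v0.3 with vLLM and generate a reply.
    llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")
    params = SamplingParams(temperature=0.0, max_tokens=128)

    # The instruct models expect prompts wrapped in [INST] ... [/INST] tags.
    outputs = llm.generate(["[INST] What is sliding window attention? [/INST]"], params)
    print(outputs[0].outputs[0].text)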

This is part of a series of articles about Mistral AI.

Mistral 7B Instruct Models

Here’s an overview of the different versions of Mistral 7B Instruct.

Mistral-7B-Instruct-v0.1

This was the earliest iteration of Mistral-7B with instruction-based fine-tuning. It uses publicly available conversational datasets to refine its capabilities. To facilitate instruction-based interactions, Mistral-7B-Instruct-v0.1 uses a formatting approach where prompts are enclosed within [INST] and [/INST] tokens.
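
For example, a single-turn prompt in this format is the raw string below (the instruction itself is only an illustration); the model’s reply follows the closing [/INST] tag and is terminated with an end-of-sequence token:

    <s>[INST] Summarize the plot of Hamlet in two sentences. [/INST]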

Mistral-7B-Instruct-v0.2

This version offers significant improvements in performance and utility compared to its predecessor. It expands the context window from 8k to 32k tokens, allowing users to provide much more data within their prompts. Additionally, the removal of Sliding Window Attention streamlines its architecture, improving efficiency without compromising output quality.

Mistral-7B-Instruct-v0.3

This version introduces important enhancements, including a vocabulary extended to 32,768 tokens, broadening the model’s linguistic range and enabling it to understand and generate a wider array of texts. It also adds support for the v3 tokenizer, improving efficiency in processing inputs and generating outputs. The model now supports function calling, allowing it to emit structured calls to external functions based on the instructions in the conversation.

Quick Tutorial: Mistral 7B Instruct v0.3 Setup, Chat and Function Calling

Here’s an overview of how to use the Mistral 7B Instruct model. The code is adapted from the Mistral 7B Instruct v0.3 model card on Hugging Face.

Note: These examples require a GPU machine. The following examples were tested on a g6.xlarge AWS instance with a single GPU.
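
If you want to confirm that a CUDA GPU is visible before continuing, a quick check looks like this (assuming PyTorch is installed):

    import torch

    # Quick sanity check that a CUDA-capable GPU is visible to PyTorch.
    if torch.cuda.is_available():
        print("GPU found:", torch.cuda.get_device_name(0))
    else:
        print("No CUDA GPU visible; the examples below will not run as-is.")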

Install the Model

To install the necessary components for working with Mistral-7B-Instruct-v0.3, install the mistral_inference package by executing the following command in the terminal:

    pip install mistral_inference

This command installs the mistral_inference library, required for interacting with Mistral models. It enables tasks such as loading models, generating responses, and handling inference-related functionalities.

We also need to install the Hugging Face Hub Python package using this command:

    pip install huggingface_hub

Download and Agree to Terms and Conditions

To download the Mistral-7B-Instruct-v0.3 model, use the snapshot_download function from the huggingface_hub package, as demonstrated in the following Python code snippet:

    from huggingface_hub import snapshot_download
    from pathlib import Path

    mistral_models_path = Path.home().joinpath('mistral_models', '7B-Instruct-v0.3')
    mistral_models_path.mkdir(parents=True, exist_ok=True)

    snapshot_download(
        repo_id="mistralai/Mistral-7B-Instruct-v0.3",
        allow_patterns=["params.json", "consolidated.safetensors", "tokenizer.model.v3"],
        local_dir=mistral_models_path,
    )

Note: The model is over 15 GB in size. Ensure you have sufficient free disk space.

This code first imports the necessary libraries and then creates a directory at the specified path to store the downloaded model files. The snapshot_download function is called with the model’s repository ID, a list of patterns for the files to download, and the local target directory. These files are essential for initializing and running the model locally.

Next, you’ll need to log in and agree to the model’s terms and conditions on Hugging Face. The easiest way is to install the Hugging Face CLI using this command:

pip install -U "huggingface_hub[cli]"

And then log in with your Hugging Face credentials:

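The standard command from the CLI package installed above is:

    huggingface-cli login

It prompts for an access token, which you can create in your Hugging Face account settings. You also need to accept the model’s terms on its Hugging Face model page before the download will succeed.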

Chat with the Model

To initiate a conversation with Mistral-7B-Instruct-v0.3, run the mistral-chat command in the terminal:

    mistral-chat $HOME/mistral_models/7B-Instruct-v0.3 --instruct --max_tokens 256


This command activates the chat interface, allowing users to interact directly with Mistral-7B-Instruct-v0.3. The flag --instruct specifies that the model should interpret inputs as instructions. The argument --max_tokens 256 limits each response to 256 tokens, ensuring conversations remain concise.

Let the Model Follow Instructions

To execute instructions using Mistral-7B-Instruct-v0.3, use the following Python code snippet. Change mistral_models_path to the path where you installed the model.

    from mistral_inference.model import Transformer
    from mistral_inference.generate import generate

    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest

    # Path where the model was downloaded
    mistral_models_path = "/root/mistral_models/7B-Instruct-v0.3"

    # Load the tokenizer and the model weights
    tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
    model = Transformer.from_folder(mistral_models_path)

    # Build an instruction-style chat request
    completion_request = ChatCompletionRequest(
        messages=[UserMessage(content="Explain how transformers work in a short paragraph.")]
    )

    tokens = tokenizer.encode_chat_completion(completion_request).tokens

    # Generate and decode the response
    out_tokens, _ = generate(
        [tokens], model, max_tokens=64, temperature=0.0,
        eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id
    )
    result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

    print(result)

This code demonstrates how to instruct the model to generate a response based on a specific prompt:

  • The MistralTokenizer is initialized with the path to the tokenizer model file, and the Transformer model is loaded from a specified directory.
  • A ChatCompletionRequest object is created with a message asking for a short explanation of how transformers work.
  • The message is then encoded into tokens using the tokenizer, which are passed to the generate function along with the model object and parameters controlling generation behavior (e.g., maximum token count and temperature).
  • Finally, the output tokens are decoded back into human-readable text and printed.

The output should be a short paragraph in which the model explains how transformers work, truncated after 64 tokens because of the max_tokens setting.

Call Functions

To let the model call specific functions from a third-party system, use a Python script like the following example. In this example, Mistral 7B Instruct is told about a third-party function that provides up-to-date stock prices and is asked a question that requires it. Change mistral_models_path to the path where you installed the model.

    from mistral_common.protocol.instruct.tool_calls import Function, Tool
    from mistral_inference.model import Transformer
    from mistral_inference.generate import generate

    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage
    from mistral_common.protocol.instruct.request import ChatCompletionRequest

    mistral_models_path = "/root/mistral_models/7B-Instruct-v0.3"

    tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
    model = Transformer.from_folder(mistral_models_path)

    completion_request = ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_stockmarket_prices",
                    description="Get the current stock market prices",
                    parameters={
                        "type": "object",
                        "properties": {
                            "ticker": {
                                "type": "string",
                                "description": "The stock ticker symbol, e.g. AAPL for Apple Inc.",
                            },
                            "exchange": {
                                "type": "string",
                                "description": "The stock exchange, e.g. NASDAQ, NYSE",
                            },
                        },
                        "required": ["ticker", "exchange"],
                    }
                )
            )
        ],
        messages=[
            UserMessage(content="What's the current price of AAPL on NASDAQ?"),
        ],
    )

    tokens = tokenizer.encode_chat_completion(completion_request).tokens

    out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
    result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

    print(result)

This code demonstrates how to use function calling in the Mistral Instruct Large Language Model (LLM) to retrieve stock market prices:

  • It begins by importing necessary modules and initializing the tokenizer and model.
  • A tool named get_stockmarket_prices is declared, with a JSON schema describing its ticker and exchange parameters.
  • A ChatCompletionRequest object is created, including the defined function and a user message asking for a certain stock price.
  • The request is encoded into tokens, processed by the model to generate output tokens, which are then decoded back into text.
  • Finally, the result is printed. Because the function itself is never executed in this snippet, the result contains the model’s tool call (the function name and its arguments) rather than an actual stock price.

The output should contain a tool call to get_stockmarket_prices, with the ticker and exchange arguments filled in from the user’s question.
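
In a real application, your code is responsible for executing the requested function and feeding its result back to the model. The following is a minimal, illustrative sketch of that dispatch step; the JSON tool-call shape shown in the comments and the hard-coded get_stockmarket_prices implementation are assumptions for illustration, not part of the mistral_inference API:

    import json

    # Hypothetical implementation of the tool the model was told about.
    # A real version would query a market-data API.
    def get_stockmarket_prices(ticker: str, exchange: str) -> str:
        return f"Price lookup for {ticker} on {exchange} goes here"

    # Assume the decoded model output contains a JSON list of tool calls such as:
    # [{"name": "get_stockmarket_prices",
    #   "arguments": {"ticker": "AAPL", "exchange": "NASDAQ"}}]
    def dispatch_tool_calls(tool_call_json: str) -> list:
        responses = []
        for call in json.loads(tool_call_json):
            if call["name"] == "get_stockmarket_prices":
                responses.append(get_stockmarket_prices(**call["arguments"]))
        return responses

    print(dispatch_tool_calls(
        '[{"name": "get_stockmarket_prices", '
        '"arguments": {"ticker": "AAPL", "exchange": "NASDAQ"}}]'
    ))

The function’s result would then typically be sent back to the model in a follow-up message so it can phrase a natural-language answer.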

Generate Text with the Transformers Library

Before proceeding, install the Python libraries Transformers and Protocol Buffers:

pip install transformers
pip install protobuf

Mistral-7B-Instruct-v0.3 also supports text generation through Hugging Face’s transformers library. To generate text this way, use the following code snippet:

    from transformers import pipeline

    messages = [
        {"role": "system", "content": "You are a helpful customer service chatbot who always responds in the style of Yoda."},
        {"role": "user", "content": "What are your opening hours?"}
    ]
    chatbot = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.3")
    chatbot(messages)

Key points about this code:

  • It imports the pipeline function from the transformers library, which is designed to handle various NLP tasks, including text generation.
  • The messages list contains structured data specifying roles and content that guide the model’s responses—here instructing it to act as a customer service chatbot and adopt the persona of Yoda.
  • The pipeline function is then called with "text-generation" as its task and the model identifier, creating a chatbot object ready to generate responses based on the input messages.

The output is a list containing the conversation, with the model’s reply appended as an assistant message written in Yoda’s style.
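
To control the reply length and print only the assistant’s message, you can pass generation arguments and index into the returned structure; the exact shape of the output can vary between transformers versions, so treat this as a sketch:

    # Sketch: limit the reply length and print only the assistant's message.
    # With recent transformers versions, chat-style input returns a list of dicts
    # whose "generated_text" field holds the conversation including the new reply.
    result = chatbot(messages, max_new_tokens=128)
    print(result[0]["generated_text"][-1]["content"])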

Build LLM Applications with Mistral 7B and Acorn

Visit https://gptscript.ai to download GPTScript and start building today. As we expand on the capabilities with GPTScript, we are also expanding our list of tools. With these tools, you can create any application imaginable: check out tools.gptscript.ai to get started.
