Unveiling the Power of “Azure OpenAI on Your Data”

Azure OpenAI on your data is an empowering capability that lets you harness powerful language models such as GPT-4 and ChatGPT (gpt-35-turbo) by grounding their responses in your own data, without retraining or fine-tuning the models. The service is designed to boost the accuracy and speed of data analysis, ultimately leading to valuable business insights, streamlined operations, and better business decisions.

This blog post provides a detailed technical exploration of Azure OpenAI on your data, covering its concept, data source options, recommended settings, and best practices.

A Deep Dive into Azure OpenAI on Your Data

Azure OpenAI on your data is a service that enables these powerful language models to provide responses based on your specific data. This service is accessible via a REST API or through a web-based interface in Azure OpenAI Studio, creating a platform for enhanced chat experiences.

One of the distinctive features of Azure OpenAI on your data is its ability to retrieve and use data from designated sources to ground the model’s output. Working with Azure Cognitive Search, the system determines which data to retrieve from the source based on the user input and conversation history. The retrieved data is then added to the prompt and submitted to the OpenAI model, which processes it like any other prompt to produce a completion.
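To make this flow concrete, here is a minimal sketch of calling the service through its REST API from Python. It assumes the 2023-06-01-preview extensions endpoint and uses placeholder resource names, deployment names, index names, and keys; parameter names and response shapes vary across API versions, so check the current reference for the version you use.

```python
import os
import requests

# Placeholder values -- substitute your own resource names, deployment, index, and keys.
AOAI_ENDPOINT = "https://<your-openai-resource>.openai.azure.com"
DEPLOYMENT = "gpt-35-turbo"          # or a gpt-4 deployment
SEARCH_ENDPOINT = "https://<your-search-resource>.search.windows.net"
SEARCH_INDEX = "<your-index-name>"

url = (
    f"{AOAI_ENDPOINT}/openai/deployments/{DEPLOYMENT}"
    "/extensions/chat/completions?api-version=2023-06-01-preview"
)

payload = {
    # The dataSources block tells the service which Cognitive Search index to ground on.
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": SEARCH_ENDPOINT,
                "key": os.environ["AZURE_SEARCH_KEY"],
                "indexName": SEARCH_INDEX,
            },
        }
    ],
    "messages": [
        {"role": "user", "content": "What were the key takeaways from the Q2 earnings call?"}
    ],
}

response = requests.post(
    url,
    headers={"api-key": os.environ["AZURE_OPENAI_KEY"], "Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

# The response shape differs by API version; in the 2023 preview, choices[0] carries a list
# of messages (a tool message with citations plus the assistant answer), so print it whole.
print(response.json()["choices"][0])
```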

It’s important to note that to start using Azure OpenAI on your data, you need approved access to Azure OpenAI, and an Azure OpenAI Service resource must already be deployed with either the gpt-35-turbo or the gpt-4 model.

Data Source Options and Preparation

Azure OpenAI on your data uses an Azure Cognitive Search index to determine what data to retrieve. The recommended way to create this index from blob storage or local files is through Azure OpenAI Studio.

For long text documents and large datasets, a data preparation script is available to ingest the data into Azure Cognitive Search. The script supports .txt, .md, and .html files as well as Microsoft Word, PowerPoint, and PDF files, and can even handle scanned PDF files and images by using Form Recognizer.
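If you would rather script the ingestion yourself than run the provided data preparation script, the sketch below illustrates the general idea: split a long document into chunks and upload them to an existing Azure Cognitive Search index with the azure-search-documents package. The index name, document field names, and the naive fixed-size chunking are illustrative assumptions, not the behavior of the official script.

```python
import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

SEARCH_ENDPOINT = "https://<your-search-resource>.search.windows.net"
SEARCH_INDEX = "<your-index-name>"   # the index must already exist


def chunk_text(text: str, max_chars: int = 4000) -> list[str]:
    """Naive fixed-size chunking; the official script uses smarter, token-aware splitting."""
    return [text[i : i + max_chars] for i in range(0, len(text), max_chars)]


client = SearchClient(
    endpoint=SEARCH_ENDPOINT,
    index_name=SEARCH_INDEX,
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

with open("earnings_call_q2.txt", encoding="utf-8") as f:
    chunks = chunk_text(f.read())

# Field names (id, content, title, filepath) are illustrative -- match them to your index schema.
documents = [
    {
        "id": f"earnings-q2-{i}",
        "content": chunk,
        "title": "Q2 earnings call",
        "filepath": "earnings_call_q2.txt",
    }
    for i, chunk in enumerate(chunks)
]
client.upload_documents(documents=documents)
```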

Keep in mind that the structure and quality of your documents can affect the quality of responses from the model. For instance, if a document is a PDF file, the text contents are extracted as a pre-processing step. If your document contains images, graphs, or other visual content, the model’s response quality depends on the quality of the text that can be extracted from them.

Configuring Azure OpenAI on Your Data for Optimal Results

To optimize the output of Azure OpenAI on your data, several recommended settings and best practices can be followed (the request sketch after this list shows how they map onto the API):

  • System Message: This is an instruction to the model about how it should behave and what context it should reference when generating a response. For instance, if you’re creating a financial chatbot where the data consists of transcriptions of quarterly financial earnings calls, you might use a system message such as “You are a financial chatbot useful for answering questions from financial reports. You are given excerpts from the earnings call. Please answer the questions by parsing through all dialogue.”
  • Maximum Response: You can set a limit on the number of tokens per model response. The upper limit for Azure OpenAI on Your Data is 1500 tokens.
  • Limit Responses to Your Data: This option encourages the model to respond using your data only. It is selected by default, but if you unselect it, the model may more readily apply its internal knowledge to respond.
  • Semantic Search: Currently, Azure OpenAI on your data supports semantic search for English data only. If enabled for your Azure Cognitive Search service, it can improve response and citation quality.
  • Index Field Mapping: If you’re using your own index, you will be prompted in Azure OpenAI Studio to define which fields you want to map for answering questions when you add your data source. Mapping these fields correctly helps ensure better response and citation quality.
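The same settings exposed in Azure OpenAI Studio can also be supplied when calling the REST API directly. The fragment below extends the earlier request sketch; the parameter names (roleInformation, inScope, queryType, semanticConfiguration, fieldsMapping) reflect the 2023 preview extensions API, and the values shown are placeholders, so verify both against the API version you are using.

```python
# Extends the earlier request sketch; SEARCH_ENDPOINT and SEARCH_INDEX are placeholders.
payload = {
    "dataSources": [
        {
            "type": "AzureCognitiveSearch",
            "parameters": {
                "endpoint": SEARCH_ENDPOINT,
                "key": os.environ["AZURE_SEARCH_KEY"],
                "indexName": SEARCH_INDEX,
                # System Message
                "roleInformation": (
                    "You are a financial chatbot useful for answering questions "
                    "from financial reports."
                ),
                # Limit Responses to Your Data
                "inScope": True,
                # Semantic Search (requires the semantic search feature on your
                # Azure Cognitive Search service; English data only)
                "queryType": "semantic",
                "semanticConfiguration": "default",
                # Index Field Mapping for a custom index
                "fieldsMapping": {
                    "contentFields": ["content"],
                    "titleField": "title",
                    "filepathField": "filepath",
                    "urlField": "url",
                },
            },
        }
    ],
    "messages": [{"role": "user", "content": "Summarize revenue growth year over year."}],
    # Maximum Response (capped at 1500 tokens for Azure OpenAI on your data)
    "max_tokens": 800,
}
```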

Best Practices for Interacting with the Model

To achieve the best results when interacting with the model, certain practices are recommended:

  • Conversation History: It is best to clear the chat history before starting a new conversation or asking a question unrelated to the previous ones. Because the conversation history is sent along with each new question, it influences the response, so the same question may receive different answers on the first conversational turn and on subsequent turns.
  • Model Response: If you are not satisfied with the model’s response to a specific question, try reframing the question to be either more specific or more generic; this often changes how the model responds.
  • Question Length: The GPT models have limits on the number of tokens they can accept. To avoid exceeding these limits, avoid asking very long questions and break them into multiple shorter questions where possible (a quick way to check question length is shown in the sketch after this list).
  • Multi-lingual Support: Azure OpenAI on your data supports queries in the same language as the documents. For instance, if your data is in Japanese, then queries need to be in Japanese too. If you have documents in multiple languages, it is recommended to build a new index for each language and connect them separately to Azure OpenAI.
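One simple way to keep questions within the token limits mentioned above is to measure them before sending. The sketch below uses the tiktoken package with the cl100k_base encoding shared by gpt-35-turbo and gpt-4; the token budget shown is an illustrative threshold, not an official limit.

```python
import tiktoken

# gpt-35-turbo and gpt-4 use the cl100k_base encoding.
encoding = tiktoken.get_encoding("cl100k_base")

question = "Compare operating margin across the last four quarterly earnings calls."
token_count = len(encoding.encode(question))
print(f"Question length: {token_count} tokens")

# Illustrative budget, not an official limit: if the question is long, split it up.
MAX_QUESTION_TOKENS = 500
if token_count > MAX_QUESTION_TOKENS:
    print("Consider breaking this question into multiple shorter questions.")
```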

In conclusion, Azure OpenAI on your data is a powerful tool that allows businesses to leverage the capabilities of advanced language models to gain insights from their data. With the right configuration, it can be a game-changer for data analysis and decision-making processes.
