Step-by-Step Guide: Building a RAG System using Azure AI Search
While Large Language Models (LLMs) have demonstrated impressive capabilities in generating human-like text, they can sometimes fall short when faced with highly specific or domain-dependent queries. This is where Retrieval-Augmented Generation (RAG) comes in. RAG enhances LLMs by grounding their responses in external knowledge sources.
But how do we efficiently find and retrieve the right information to feed these powerful models? The answer lies in a robust search solution. In our own experiments we found that the search part of RAG is the most important to get right. To optimize this, we built our own custom RAG solution with one of the best search engines for LLMs. However, we are well aware that other search engines are a better fit for many business use-cases - and many people frankly don't require the pin-point accuracy of our custom solution. One of these products is Azure AI Search, a managed indexing and search solution hosted and run by Microsoft Azure that integrates easily with services already running on Azure.
In this step-by-step guide, we'll explore how Azure AI Search, with its powerful vector search and hybrid search capabilities, forms the backbone of an effective RAG system. We'll dive into the process of setting up Azure AI Search to store and retrieve the contextual data your LLMs need to deliver insightful and accurate responses.
What is Azure AI Search?
Azure AI Search (formerly known as Azure Cognitive Search) is a cloud-based search-as-a-service solution that provides developers with APIs and tools to add a rich search experience over private, heterogeneous content in web, mobile, and enterprise applications.
Key Features of Azure AI Search Relevant to RAG
- Vector Search: This is potentially the most important feature for RAG. Azure AI Search supports vector storage and various vector search algorithms (like HNSW, Hierarchical Navigable Small World), which allow you to perform similarity searches based on the semantic meaning of queries and documents, rather than just keyword matching. You can store embeddings (vector representations of your data) directly in your search index.
- Hybrid Search: Azure AI Search supports hybrid search, combining traditional keyword-based search with vector search. This offers the best of both worlds, providing accurate results even when users search with keywords that might not be an exact match for the content in the vector space - which is often the case for domain-specific language.
- Semantic Ranking: Azure AI Search provides semantic ranking, which uses a deep learning model to re-rank the results based on its understanding of the search query's meaning, bringing the most semantically relevant items to the top.
- Data Ingestion and Indexing: Azure AI Search can index a wide variety of data sources, including Azure Blob Storage, Azure SQL Database, Azure Cosmos DB, and more. It supports various file formats (PDF, Word, HTML, etc.). You can create an indexing pipeline that includes steps to chunk your documents and generate the embeddings required for RAG.
- AI Enrichment (Skillsets): This allows you to integrate AI capabilities into your indexing pipeline. While less directly relevant to the core RAG pattern, it can be helpful for pre-processing data, like performing OCR on images before generating embeddings, or translating documents into a single language for consistent embedding.
Note: admittedly, "Skillsets" is a rather odd name for this feature.
- Scalability and Security: As a cloud service, Azure AI Search is designed for scalability and reliability. It offers enterprise-grade security features, including role-based access control (RBAC) and encryption.
How to use Azure AI Search for RAG
Now that we know roughly what Azure AI Search provides, let's quickly create a high-level plan on how to use it for RAG. There are three main steps in RAG:
1. Data Preparation and Indexing
- Chunking: Large documents are divided into smaller, manageable chunks of text.
- Embedding Generation: An embedding model (e.g., OpenAI's text-embedding-3-large, or a model you deploy to Azure OpenAI or Azure Machine Learning) is used to convert each chunk into a vector representation (embedding).
- Indexing: These chunks, along with their corresponding embeddings, are indexed in Azure AI Search. You'll create an index with fields for the text content, the vector embedding, and any other relevant metadata.
Azure AI Search provides tools to perform all of these steps. You just need to provide the data and the embedding model (which can easily be deployed in the same Azure environment). The sketch below shows what the resulting indexed chunks could look like.
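To make this concrete, here is a hedged sketch of uploading pre-chunked, pre-embedded documents through the Azure AI Search REST API. The index name rag-index, the field names, and the shortened embedding vector are illustrative assumptions, not output of the import wizard we use later:

```bash
# Upload pre-chunked, pre-embedded documents to an (assumed) index "rag-index".
# The field names and the shortened vector are placeholders; a real embedding
# has as many dimensions as your model produces (e.g., 1536).
curl -X POST "https://<your-service>.search.windows.net/indexes/rag-index/docs/index?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <admin-key>" \
  -d '{
    "value": [
      {
        "@search.action": "upload",
        "chunk_id": "manual-page-12-chunk-3",
        "content": "To lubricate the spindle, first power off the machine ...",
        "content_vector": [0.013, -0.021, 0.094],
        "source_file": "milling-machine-manual.pdf"
      }
    ]
  }'
```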
2. Retrieval (during query time)
- Query Embedding: When a user enters a query, the same embedding model is used to convert the query into a vector.
- Vector Search: Azure AI Search performs a vector similarity search against the indexed embeddings to find the chunks most semantically similar to the query. You can use algorithms like HNSW for efficient similarity searches.
- Hybrid Search (Optional): You can also combine vector search with traditional keyword search to ensure relevant results even if the query doesn't perfectly match the embedding space.
- Semantic Ranking (Optional): You might apply semantic ranking to improve the ordering of results.
- Retrieval: The top N most relevant text chunks are retrieved.
Again, Azure AI Search provides the tools to perform these steps. You just need to set up the search index and query the API with the appropriate parameters - see the sketch below.
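As an illustration, a hybrid retrieval query against such an index might look like the following. The index and field names carry over from the sketch above, and the shortened query vector stands in for a real embedding of the user's question:

```bash
# Hybrid retrieval: "search" drives the keyword part, "vectorQueries" the
# semantic part. The query vector must come from the same embedding model
# that was used at indexing time.
curl -X POST "https://<your-service>.search.windows.net/indexes/rag-index/docs/search?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <query-key>" \
  -d '{
    "search": "how to maintain the machine",
    "vectorQueries": [
      {
        "kind": "vector",
        "vector": [0.018, -0.044, 0.007],
        "fields": "content_vector",
        "k": 5
      }
    ],
    "top": 5,
    "select": "content,source_file"
  }'
```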
3. Generation (LLM answer generation)
- Prompt Engineering: The retrieved text chunks, along with the original user query, are used to create a prompt for a large language model (LLM). The prompt is designed to instruct the LLM to answer the user's query based on the provided context.
- LLM Processing: The LLM (e.g., GPT-4o, hosted on Azure OpenAI) processes the prompt and generates a response, hopefully answering the user's query.
The generation process is not directly part of Azure AI Search; however, we'll demonstrate how to combine Azure AI Search with GPT-4o to complete the RAG pipeline.
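A minimal sketch of this generation step, assuming a GPT-4o deployment on Azure OpenAI (the deployment name, api-version, and the context placeholders are assumptions to adapt):

```bash
# Generation step: the retrieved chunks are pasted into the prompt so the
# model answers from the provided context only.
curl -X POST "https://<your-openai-resource>.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <azure-openai-key>" \
  -d '{
    "messages": [
      {"role": "system", "content": "Answer only based on the provided context. Context: <retrieved chunk 1> <retrieved chunk 2> ..."},
      {"role": "user", "content": "How do I maintain the machine?"}
    ]
  }'
```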
Step-by-Step Guide to Implement RAG with Azure AI Search
Enough talk, let's dive right in.
- Head over to the Azure Portal and create a new Azure AI Search service.
Create new Azure AI Search service
Set the "pricing tier" to "Free" for testing purposes. For your production considerations, read up about the pricing tiers here.
Azure AI Search pricing
- After you click "Review + Create" and "Create", the resources will be deployed. Click on "Go to resource" once the deployment is complete.
Azure AI Search resource
Now before we continue, we need to learn about the different elements the search service provides.
Assets of Azure AI Search
1. Indexes:
- What they are: Indexes are the heart of Azure AI Search. They are persistent storage structures that hold your searchable data. Think of an index like a sophisticated database table optimized for search.
- How they work: You define the schema of your index, specifying the fields you want to store (e.g., title, content, author, URL, embeddings), their data types (string, integer, boolean, collection, Collection(Edm.Single) for vector embeddings), and their attributes (searchable, filterable, sortable, facetable, retrievable). When you index data, it's analyzed and stored in the index in a way that enables fast and efficient retrieval.
- Relevance to RAG: In a RAG system, your index will contain chunks of text from your knowledge base, along with their corresponding vector embeddings (generated by an embedding model). These embeddings are crucial for performing semantic similarity searches.
- Key Considerations:
- Schema Design: Carefully planning your index schema is critical for performance and relevance. Choose appropriate data types and attributes for each field.
- Vector Fields: When using RAG, you will have one or more fields of type Collection(Edm.Single) to store the vector embeddings (see the example index definition after this list).
- Analyzers: Azure AI Search uses analyzers to process text during indexing and querying. You can choose from built-in analyzers or create custom ones for specific language or domain needs. These are relevant for keyword search but not for vector search.
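For illustration, a minimal index definition with a vector field might look like this. The names, the HNSW profile, and the 1536 dimensions (correct for text-embedding-ada-002; other models differ) are assumptions to adapt:

```bash
# Create an index with a text field and a vector field backed by an HNSW profile.
curl -X PUT "https://<your-service>.search.windows.net/indexes/rag-index?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <admin-key>" \
  -d '{
    "name": "rag-index",
    "fields": [
      {"name": "chunk_id", "type": "Edm.String", "key": true, "filterable": true},
      {"name": "content", "type": "Edm.String", "searchable": true},
      {"name": "content_vector", "type": "Collection(Edm.Single)", "searchable": true,
       "dimensions": 1536, "vectorSearchProfile": "default-profile"},
      {"name": "source_file", "type": "Edm.String", "filterable": true, "retrievable": true}
    ],
    "vectorSearch": {
      "algorithms": [{"name": "hnsw-default", "kind": "hnsw"}],
      "profiles": [{"name": "default-profile", "algorithm": "hnsw-default"}]
    }
  }'
```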
2. Indexers:
- What they are: Indexers automate the process of ingesting data from various data sources into your Azure AI Search indexes. They act as data connectors, extracting and transforming data before sending it to the index.
- How they work: You configure an indexer to connect to a specific data source (e.g., Azure Blob Storage, Azure SQL Database), define a schedule for indexing (on-demand or recurring), and map the fields in your data source to the fields in your index.
- Relevance to RAG: Indexers can be used to automatically ingest text data, chunk it, generate embeddings (using a custom skill, as explained below), and populate the index. This streamlines the process of keeping your RAG system's knowledge base up-to-date.
- Key Considerations:
- Change Tracking: Indexers can be configured to detect and process only new, modified, or deleted documents, making updates efficient.
- Data Transformation: You can use indexers to perform basic data transformations, such as field mapping or data type conversions.
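A hedged sketch of such an indexer definition, running hourly against a blob data source (all names are illustrative, and the wizard used later in this guide creates something similar for you):

```bash
# Indexer that pulls from a blob data source into the index every hour and
# maps a blob metadata field onto an index field.
curl -X PUT "https://<your-service>.search.windows.net/indexers/rag-indexer?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <admin-key>" \
  -d '{
    "name": "rag-indexer",
    "dataSourceName": "rag-blob-datasource",
    "targetIndexName": "rag-index",
    "schedule": {"interval": "PT1H"},
    "fieldMappings": [
      {"sourceFieldName": "metadata_storage_name", "targetFieldName": "source_file"}
    ]
  }'
```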
3. Data Sources:
- What they are: Data sources are the repositories where your raw data resides. Azure AI Search supports a wide range of Azure data sources.
- How they work: Indexers connect to data sources to retrieve data for indexing.
- Relevance to RAG: Your data source could be a collection of documents in Azure Blob Storage, a database of product information in Azure SQL Database, or any other supported source containing the knowledge you want your RAG system to access.
- Supported Data Sources:
- Azure Blob Storage
- Azure SQL Database
- Azure Cosmos DB (SQL API and MongoDB API)
- Azure Table Storage
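For example, a data source definition for a blob container might look like this (the container name and connection string are placeholders):

```bash
# Register a blob container named "manuals" as a data source for indexers.
curl -X PUT "https://<your-service>.search.windows.net/datasources/rag-blob-datasource?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <admin-key>" \
  -d '{
    "name": "rag-blob-datasource",
    "type": "azureblob",
    "credentials": {"connectionString": "<storage-connection-string>"},
    "container": {"name": "manuals"}
  }'
```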
4. Aliases:
- What they are: An alias is an alternative name that can be used to refer to a search index. They are basically pointers or references to an index.
- How they work: You create an alias and point it to a specific index. Then, you can use the alias name in place of the index name in your search requests, index updates, and other API operations.
- Relevance to RAG: Although less directly relevant to the core RAG logic, aliases are very useful for managing index updates in production environments. You can use them to achieve zero-downtime index updates. You would create a new version of your index (e.g., `myindex-v2` when your current index is `myindex-v1`), fully index it, and then switch the alias from the old index to the new one in a single atomic operation.
- Key Benefits:
- Zero-Downtime Updates: Perform index updates without interrupting your application's search functionality.
- Index Versioning: Easily switch between different versions of an index.
- Simplified Management: Update the alias target instead of modifying application code when index names change.
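A sketch of the swap via the REST API. Note that the aliases API may require a newer or preview api-version than the rest of this guide, so treat the version below as an assumption:

```bash
# Create the alias, initially pointing at v1:
curl -X PUT "https://<your-service>.search.windows.net/aliases/rag-alias?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <admin-key>" \
  -d '{"name": "rag-alias", "indexes": ["myindex-v1"]}'

# Later, atomically repoint it at the fully indexed v2:
curl -X PUT "https://<your-service>.search.windows.net/aliases/rag-alias?api-version=2024-07-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <admin-key>" \
  -d '{"name": "rag-alias", "indexes": ["myindex-v2"]}'
```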
5. Skillsets:
- What they are: Skillsets are a powerful feature that allows you to integrate AI-powered enrichment steps into your indexing pipeline. They define a sequence of operations called "skills" that are applied to your data before it's indexed.
- How they work: You create a skillset and attach it to an indexer. Each skill in the skillset performs a specific enrichment task, such as image analysis, text translation, entity recognition, or sentiment analysis. Cognitive skills are built-in skills that leverage Azure Cognitive Services. Custom skills allow you to integrate your own code or models.
- Relevance to RAG:
- Embedding Generation: You can create a custom skill that calls an embedding model (e.g., hosted on Azure Machine Learning or Azure OpenAI) to generate vector embeddings for your text data during indexing. This is a critical step for implementing RAG.
- Data Preprocessing: Skillsets can be used to perform other preprocessing tasks that might be helpful for RAG, such as cleaning up text or extracting metadata.
- Key Considerations:
- Custom Skills: You'll need to write and deploy code to handle embedding generation for your custom skill.
- Performance: Be mindful of the performance impact of complex skillsets on your indexing process.
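As a rough sketch, a skillset with a custom Web API skill that calls your own embedding endpoint could be defined like this (the endpoint URI, field names, and output shape are placeholders; your indexer additionally needs an output field mapping to write the enrichment into the index):

```bash
# Skillset with a custom Web API skill: the indexer sends each document's
# content to your endpoint and stores the returned vector as an enrichment.
curl -X PUT "https://<your-service>.search.windows.net/skillsets/rag-skillset?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <admin-key>" \
  -d '{
    "name": "rag-skillset",
    "skills": [
      {
        "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
        "uri": "https://<your-embedding-endpoint>/embed",
        "httpMethod": "POST",
        "context": "/document",
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "vector", "targetName": "content_vector"}]
      }
    ]
  }'
```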
6. Debug Sessions:
- What they are: Debug sessions provide a way to inspect and troubleshoot the execution of your indexing pipeline, including the behavior of your skillsets.
- How they work: You can initiate a debug session and step through the indexing process, examining the input and output of each skill at each stage.
- Relevance to RAG: Debug sessions are invaluable when developing and debugging custom skills, such as those used for generating embeddings. They help you identify errors and ensure that your skills are transforming the data as expected.
- Key Benefits:
- Transparency: Gain insights into how your data is being processed.
- Error Detection: Identify and fix issues in your skillsets.
- Optimization: Analyze skill execution to improve performance.
7. Semantic Ranker:
- What it is: Semantic ranker is a feature that uses deep learning models to improve the ranking of search results based on their semantic relevance to the query, going beyond keyword matching. It's a second-stage ranking process that re-ranks the results produced by the initial (BM25) ranking algorithm.
- How it works: After the initial search results are retrieved, the semantic ranker analyzes them and assigns a new `@search.rerankerScore` to each result. The higher the score, the more semantically relevant the result is deemed to be. You can then sort your results by this reranker score. Semantic ranker can also be used to generate captions and highlights semantically related to the query.
- Relevance to RAG: Semantic ranker can help to further improve the quality of the retrieved context for RAG by bringing the most semantically relevant chunks to the top. However, it's important to note that semantic ranker currently works on text fields, not vector fields.
- Key Considerations:
- Availability: Semantic ranker is available on Standard tier search services and above, and only in specific regions. So we can't use it in our free-tier example - but its usage will still be clear by following this guide.
- Language Support: Semantic search supports a wide range of languages.
- Impact: It can add some latency to search requests due to the extra processing involved.
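On a tier where it is available, enabling the semantic ranker is a matter of two extra query parameters. The semantic configuration name below is an assumption, as it is defined on your index:

```bash
# Keyword query re-ranked by the semantic ranker; results carry a
# @search.rerankerScore in addition to the BM25 score.
curl -X POST "https://<your-service>.search.windows.net/indexes/rag-index/docs/search?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: <query-key>" \
  -d '{
    "search": "how to maintain the machine",
    "queryType": "semantic",
    "semanticConfiguration": "default-semantic-config",
    "top": 5
  }'
```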
Continue with our setup
Now that we know the tools at our disposal, let's continue.
- First, we need a set of data to index. For this example, we'll use our very own sample machine description - a user manual for a hypothetical milling machine. You can download it here.
- Next, upload the PDF file to an Azure Blob Storage container. Make sure to remember the folder name where you put your file. We assume you are familiar with Azure Blob Storage and how to upload files, so we'll skip this step for brevity.
- Now we can use this blob storage container to create a new index. We can either create an index manually or use an import wizard. We'll use the latter for now. Head over to "Overview" in Azure AI Search and click on "Import data".
- In the "Import and vectorize data" window, select "Azure Blob Storage" as the data source and enter your blob storage details. Enable deletion tracking, as this will automatically remove deleted files from the index.
Import data
- Now that we know where to get our data from, we need to use an embedding model to create the vector index. Select your existing embedding model. If you don't have one yet, please create one following this guide.
Select embedding model
- In the next step you can optionally apply some image OCR or enrichment "Skills". We'll skip that for now.
- The next screen allows us to define our search index. In short, you can define which of the main data (like the content of files) and metadata (like the file author) you want to index, search, and/or retrieve.
This step is quite important, so make sure to click "Preview and edit" to make necessary adjustments to the index.
One nice thing is that Azure automatically extracts metadata from the files it finds.
In our case we uploaded a PDF file to Azure Blob Storage. We want to use the "Add field" button to add the 'metadata_storage_last_modified' field as well as 'metadata_author'.
Edit default index
Click "Save" to continue.
- In the final step, select how often to update the index. Set the "Schedule" option to your desired value.
Click "Create" to start the indexing process.
We've created all the resources we need. To check the index creation progress, head over to "Indexers", select the indexer you just created and check the Execution history. Wait until the status is "Succeeded".
Testing the search
Now we should be ready to use our index. Head over to "Indexes", select the auto-generated index and use the search bar. Enter "maintain the machine", for example, to get some guaranteed results. In the results pane, you should see the values returned by the search.
Search results
Using the Azure AI Search with Azure AI Studio
Our RAG system is almost done - we created an index, and Azure AI Search automatically provides our retrieval part. All we need now is to connect our LLM to the search index and generate the response.
Note: To use Azure AI Search with Azure AI Studio, the embedding model used during indexing must be OpenAI's ada v2 embedding model (text-embedding-ada-002). No other model is supported in Azure AI Studio as of this writing.
Navigate to the Azure AI Studio. Click "Create project" to create a new Azure AI Studio project. In the "Create project" dialog, make sure to select "Customize" to customize the AI Studio resources. In the customize screen, select an existing Azure OpenAI resource (you already needed one for the embedding model anyway) and select the newly created Azure AI Search instance in the last drop-down.
Create Azure AI Studio project
Click "Next" and "Create" to create the project.
In the left-hand menu select "Data + Indexes" and "New Index" to import the Azure AI Search index.
In the next dialog, select "Azure AI Search" as the data source and select "Next". Select your Azure AI Search instance and the index you created.
Connect the Azure AI Search instance
Click "Next" and select your "Azure OpenAI" instance. Make sure to leave 'Add vector search to this search resource' checked.
Connect OpenAI instance
Click "Next", "Next", and "Finish", then wait for the index to be connected.
We are now ready to use Azure AI Search in our Azure AI Studio project. Head over to "Playground" and click "Try Chat playground". Select a generation model (gpt-4o in the example below) and enter a search query.
Please note that the search index should automatically be added to the "Add your data" section. If it is not, please add it manually.
Now you are finally ready to run your first search query. Enter "how to maintain the machine" in the search bar and see the results.
Azure AI Studio Playground
Using the Azure AI Search with REST API
As you most probably don't want to use the Azure AI Studio playground for your RAG endeavours forever, you also have the option to use Azure AI Search via its REST API.
In the Azure portal, on the Azure AI Search resource screen, select "Overview" and note down the "Url".
Azure AI Search URL
Select "Settings" and "Keys" to find the key section. Select one of the "admin keys".
Navigate to your Azure OpenAI resource in the Azure portal. Click on "Keys and Endpoint". Get one of the keys and the endpoint.
Azure OpenAI Key and Endpoint
Also get the "Deployment Name" from the Azure OpenAI chat model you want to use.
In the end, you should have these variables:
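Collected as shell variables, they might look like this (all values and variable names are placeholders to fill in with your own):

```bash
# Placeholder values - substitute your own.
export SEARCH_URL="https://<your-search-service>.search.windows.net"
export SEARCH_ADMIN_KEY="<azure-ai-search-admin-key>"
export SEARCH_INDEX="<your-index-name>"
export AZURE_OPENAI_ENDPOINT="https://<your-openai-resource>.openai.azure.com"
export AZURE_OPENAI_KEY="<azure-openai-key>"
export DEPLOYMENT_NAME="<chat-model-deployment-name>"
```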
To run a search query, you can use the following curl command (or equivalent REST API call with any other tool):
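The sketch below uses the Azure OpenAI "on your data" chat completions API, which lets the chat model query the search index for you. The api-version is an assumption, and the variable names are the placeholders defined above:

```bash
# RAG in one call: the chat model retrieves from the Azure AI Search index
# defined in "data_sources" and grounds its answer in the results.
curl -X POST "$AZURE_OPENAI_ENDPOINT/openai/deployments/$DEPLOYMENT_NAME/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_KEY" \
  -d '{
    "messages": [
      {"role": "user", "content": "How do I maintain the machine?"}
    ],
    "data_sources": [
      {
        "type": "azure_search",
        "parameters": {
          "endpoint": "'"$SEARCH_URL"'",
          "index_name": "'"$SEARCH_INDEX"'",
          "authentication": {"type": "api_key", "key": "'"$SEARCH_ADMIN_KEY"'"}
        }
      }
    ]
  }'
```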
Interested in how to train your very own Large Language Model?
We prepared a well-researched guide on how to use the latest advancements in open-source technology to fine-tune your own LLM. This has many advantages, like:
- Cost control
- Data privacy
- Excellent performance - adjusted specifically for your intended use