Indexers in Azure AI Search

Indexers are Azure AI Search components that pull data from a data source and map it to an index. They automate the process of ingesting and indexing data from various sources.

What are Indexers?

Indexers are the backbone of data ingestion in Azure AI Search. They connect to your data sources (such as Azure Blob Storage, Azure SQL Database, or Azure Cosmos DB), extract the relevant data, optionally enrich it through a skillset, and load it into your Azure AI Search index.

By automating this process, indexers ensure your search index is kept up-to-date with minimal manual intervention. This is crucial for providing users with the most current and relevant search results.

How Indexers Work

An indexer operates in several stages:

Data Source Connection: The indexer connects to a configured data source.
Data Extraction: It queries the data source to retrieve documents or records.
Data Transformation (Optional): If a skillset is attached, data is enriched with AI capabilities (e.g., OCR, entity recognition, language detection).
Field Mapping: Data is mapped from the source fields to the fields defined in your Azure AI Search index schema.
Indexing: The processed data is sent to the Azure AI Search service to be indexed.
Scheduling: This is not a processing stage but the trigger for the stages above. An indexer runs on demand or on a recurring schedule, and scheduled runs keep the index synchronized with the source data.
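
The stages above can be pictured as a small loop over source records. The following is a conceptual sketch only, not how the service is implemented; run_indexer, add_language, and the sample record are hypothetical names made up for illustration.

```python
from typing import Callable, Iterable, Optional

Document = dict


def run_indexer(
    source_rows: Iterable[Document],
    field_mappings: dict,
    enrich: Optional[Callable[[Document], Document]] = None,
) -> list:
    """Conceptual model of one indexer run (hypothetical helper)."""
    indexed = []
    for row in source_rows:                  # data extraction
        doc = dict(row)
        if enrich is not None:               # optional skillset enrichment
            doc = enrich(doc)
        mapped = {                           # field mapping: rename where a mapping exists
            field_mappings.get(name, name): value
            for name, value in doc.items()
        }
        indexed.append(mapped)               # indexing
    return indexed


def add_language(doc: Document) -> Document:
    # Stand-in for a real skill such as language detection.
    return {**doc, "language": "en"}


rows = [{"metadata_storage_name": "a.pdf", "content": "hello"}]
result = run_indexer(rows, {"metadata_storage_name": "file_name"}, add_language)
print(result)
```

Fields without an explicit mapping keep their source name in this sketch, which mirrors the idea that only fields whose names differ need a mapping.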

Key Components

When defining or working with an indexer, you'll encounter these core components:

Data Source

This defines the connection details for where your data resides. It includes information like the connection string, the specific container or table to access, and any credentials.

{
    "name": "my-blob-data-source",
    "type": "azureblob",
    "credentials": {
        "connectionString": "DefaultEndpointsProtocol=https;..."
    },
    "container": {
        "name": "my-docs"
    }
}
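
As a rough sketch of how a definition like this could be submitted over the REST API, the snippet below prepares a create-or-update request without sending it. The service name, the API version string, and the build_put_request helper are assumptions for illustration; substitute the values that apply to your service.

```python
import json
import urllib.request

# Hypothetical values for illustration only.
SERVICE_ENDPOINT = "https://my-search-service.search.windows.net"
API_VERSION = "2024-07-01"  # assumed; use a version your service supports


def build_put_request(collection: str, definition: dict, admin_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a create-or-update request for a named object."""
    url = f"{SERVICE_ENDPOINT}/{collection}/{definition['name']}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


data_source = {
    "name": "my-blob-data-source",
    "type": "azureblob",
    "credentials": {"connectionString": "<connection-string>"},
    "container": {"name": "my-docs"},
}

request = build_put_request("datasources", data_source, admin_key="<admin-key>")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The same helper would work for an indexer definition by passing "indexers" as the collection, since both objects are addressed by name under the service endpoint.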

Indexer

This is the main object that orchestrates the data ingestion. It specifies the data source to use, the index to target, the mapping between source and index fields, and optionally, a skillset.

{
    "name": "my-document-indexer",
    "dataSourceName": "my-blob-data-source",
    "targetIndexName": "my-search-index",
    "skillsetName": "my-cognitive-skillset",
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "document_id",
            "mappingFunction": { "name": "base64Encode" }
        },
        {
            "sourceFieldName": "content",
            "targetFieldName": "content"
        }
    ],
    "schedule": {
        "interval": "P1D"
    }
}

Field Mapping

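This part of the indexer definition tells Azure AI Search how to map source fields to index fields. Fields whose names and types already match are mapped automatically, so explicit mappings are needed only to rename a field, to route a source field to a differently named target, or to apply a built-in mapping function. For example, base64Encode makes a blob path safe to use as a document key.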
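
To make the mapping behavior concrete, here is a minimal local sketch of what applying field mappings to one source record might look like. It is an approximation for illustration: apply_field_mappings is a hypothetical helper, and base64_encode uses plain URL-safe Base64, whereas the service's base64Encode function has encoding options that this sketch ignores.

```python
import base64


def base64_encode(value: str) -> str:
    # Approximation: plain URL-safe Base64. The service's own function may
    # handle padding differently, so the exact output can differ.
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


MAPPING_FUNCTIONS = {"base64Encode": base64_encode}


def apply_field_mappings(source_doc: dict, field_mappings: list) -> dict:
    """Copy each mapped source field to its target field, applying any function."""
    target = {}
    for mapping in field_mappings:
        source_name = mapping["sourceFieldName"]
        # Assumed here: the target name falls back to the source name when omitted.
        target_name = mapping.get("targetFieldName", source_name)
        value = source_doc[source_name]
        function = mapping.get("mappingFunction")
        if function is not None:
            value = MAPPING_FUNCTIONS[function["name"]](value)
        target[target_name] = value
    return target


blob = {
    "metadata_storage_path": "https://example.blob.core.windows.net/my-docs/a.pdf",
    "content": "hello",
}
mappings = [
    {
        "sourceFieldName": "metadata_storage_path",
        "targetFieldName": "document_id",
        "mappingFunction": {"name": "base64Encode"},
    },
    {"sourceFieldName": "content", "targetFieldName": "content"},
]
doc = apply_field_mappings(blob, mappings)
print(doc)
```

Encoding the path matters because a raw URL contains characters such as "/" and ":" that are not valid in a document key.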

Skillset (Optional)

A skillset lets you apply AI capabilities from Azure AI services (such as Azure AI Vision and Azure AI Language) to enrich your data before it is indexed. Typical enrichments include OCR, language detection, entity recognition, and sentiment detection. Enriched values reach the index through the indexer's outputFieldMappings, which route skill outputs to index fields.
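
As an illustration, a small skillset that detects the language and then extracts organization names might be shaped like the dictionary below. Treat it as a sketch: the skill type names and input/output names are written from memory of the built-in skill reference and should be checked against it, and the "organizations" and "language" index fields are hypothetical.

```python
import json

skillset = {
    "name": "my-cognitive-skillset",
    "description": "Detect language, then extract organizations",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
            "context": "/document",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "languageCode", "targetName": "language"}],
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
            "context": "/document",
            "categories": ["Organization"],
            "inputs": [
                {"name": "text", "source": "/document/content"},
                # Consumes the output of the first skill.
                {"name": "languageCode", "source": "/document/language"},
            ],
            "outputs": [{"name": "organizations", "targetName": "organizations"}],
        },
    ],
}

# These would go in the indexer definition to route enriched values into
# index fields (hypothetical field names).
output_field_mappings = [
    {"sourceFieldName": "/document/organizations", "targetFieldName": "organizations"},
    {"sourceFieldName": "/document/language", "targetFieldName": "language"},
]

print(json.dumps(skillset, indent=2))
```

The second skill reads /document/language, so the order of enrichment follows from the data dependency rather than from the position of the skills in the list.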

Schedule (Optional)

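You can define a schedule for your indexer to run automatically. A schedule consists of an interval, written as an ISO 8601 duration (for example, PT1H for every hour or P1D for every day), and an optional start time. The interval must be between five minutes and one day; cron expressions are not supported.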
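
Because the interval is written as an ISO 8601 duration rather than a shorthand like "24h", it can be worth validating the string before submitting an indexer definition. The sketch below is a hypothetical local check, not part of any SDK; it handles only the day, hour, minute, and second components and assumes the service's limits are five minutes and one day.

```python
import re
from datetime import timedelta

# Matches durations such as P1D, PT2H, PT30M, or P1DT0H (days and time parts only).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)

MIN_INTERVAL = timedelta(minutes=5)  # assumed service minimum
MAX_INTERVAL = timedelta(days=1)     # assumed service maximum


def parse_interval(text: str) -> timedelta:
    """Parse a schedule interval and check it against the assumed limits."""
    match = _DURATION.match(text)
    parts = {}
    if match is not None:
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not parts or text.endswith("T"):
        raise ValueError(f"not an ISO 8601 duration: {text!r}")
    interval = timedelta(**parts)
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError(f"interval out of range: {text!r}")
    return interval


print(parse_interval("PT2H"))
print(parse_interval("P1D"))
```

A value such as "24h" or "PT1M" raises ValueError here, the first because it is not a duration string and the second because it is shorter than five minutes.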

Creating and Managing Indexers

You can create and manage indexers using:

Azure Portal: A user-friendly graphical interface for configuring and monitoring indexers.
Azure CLI: Command-line tools for provisioning and managing the search service itself. To script indexer operations from a shell, call the REST API.
Azure SDKs: Programmatic access from your applications using languages like Python, C#, Java, and JavaScript.
REST API: Direct interaction with the Azure AI Search service.

Learn more about creating an indexer and monitoring its status.
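
For a sense of what running and monitoring looks like over the REST API, the sketch below prepares the run and status requests and summarizes a status response. The endpoint, API version, helper names, and the sample payload are assumptions for illustration; the payload is hand-written to resemble a status response and is not captured from a real service.

```python
import urllib.request

# Hypothetical values for illustration only.
SERVICE_ENDPOINT = "https://my-search-service.search.windows.net"
API_VERSION = "2024-07-01"  # assumed; use a version your service supports

_METHODS = {"run": "POST", "status": "GET"}


def indexer_request(name: str, action: str, admin_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a run or status request for an indexer."""
    url = f"{SERVICE_ENDPOINT}/indexers/{name}/{action}?api-version={API_VERSION}"
    return urllib.request.Request(
        url, headers={"api-key": admin_key}, method=_METHODS[action]
    )


def summarize_status(payload: dict) -> str:
    """Condense the last execution result into one line."""
    last = payload.get("lastResult") or {}
    status = last.get("status", "never run")
    processed = last.get("itemsProcessed", 0)
    failed = last.get("itemsFailed", 0)
    return f"last run {status}: {processed} processed, {failed} failed"


run = indexer_request("my-document-indexer", "run", "<admin-key>")
status = indexer_request("my-document-indexer", "status", "<admin-key>")
print(run.get_method(), run.full_url)
print(status.get_method(), status.full_url)

# Hand-written sample shaped like a status response (assumed fields).
sample = {
    "status": "running",
    "lastResult": {"status": "success", "itemsProcessed": 12, "itemsFailed": 0},
}
print(summarize_status(sample))
```

A nonzero failed count in the summary is usually the cue to inspect the per-document errors in the full status response.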

Common Scenarios

Document Search: Indexing PDFs, Word documents, and other text files from Blob Storage for full-text search.
Product Catalog Search: Ingesting product data from a database into an index for e-commerce search.
Customer Support Data: Indexing support tickets and knowledge base articles to help agents find solutions faster.
Log Analysis: Processing and indexing logs for quick searching and troubleshooting.

Important: Store your data source credentials securely, and grant the indexer only the permissions it needs to read your data.