Indexers in Azure AI Search

Indexers are Azure AI Search components that pull data from a data source and map it to an index. They automate the process of ingesting and indexing data from various sources.

What are Indexers?

Indexers are the backbone of data ingestion in Azure AI Search. They connect to your data sources (such as Azure Blob Storage, Azure SQL Database, and Azure Cosmos DB), extract the relevant data, optionally enrich it through a skillset, and load it into your Azure AI Search index.

By automating this process, indexers ensure your search index is kept up-to-date with minimal manual intervention. This is crucial for providing users with the most current and relevant search results.

How Indexers Work

An indexer operates in several stages:

  1. Data Source Connection: The indexer connects to a configured data source.
  2. Data Extraction: It queries the data source to retrieve documents or records.
  3. Data Transformation (Optional): If a skillset is attached, data is enriched with AI capabilities (e.g., OCR, entity recognition, language detection).
  4. Field Mapping: Data is mapped from the source fields to the fields defined in your Azure AI Search index schema.
  5. Indexing: The processed data is sent to the Azure AI Search service to be indexed.
  6. Scheduling: Indexers can be run on demand or scheduled to run periodically, ensuring your index stays synchronized with your data.
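
The stages above can be pictured with a small in-memory model. This is only a sketch for intuition, not how the service is implemented: `run_pipeline`, `add_word_count`, and the sample rows are hypothetical names invented for this illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

Document = Dict[str, object]


def run_pipeline(
    source: Iterable[Document],
    enrich: Optional[Callable[[Document], Document]],
    field_map: Dict[str, str],
    index: List[Document],
) -> int:
    """Toy model of one indexer run; returns the number of documents indexed."""
    count = 0
    for raw in source:  # stages 1-2: connect and extract
        doc = enrich(raw) if enrich else raw  # stage 3: optional enrichment
        # stage 4: copy each mapped source field to its target field name
        mapped = {target: doc[src] for src, target in field_map.items() if src in doc}
        index.append(mapped)  # stage 5: indexing
        count += 1
    return count


def add_word_count(doc: Document) -> Document:
    """Hypothetical stand-in for a skill: adds a word count for 'content'."""
    return {**doc, "word_count": len(str(doc.get("content", "")).split())}


source_rows = [{"path": "a.txt", "content": "hello indexer world"}]
search_index: List[Document] = []
indexed = run_pipeline(
    source_rows,
    add_word_count,
    {"path": "id", "content": "content", "word_count": "word_count"},
    search_index,
)
```

Scheduling (stage 6) is left out here because it only decides when such a run starts.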

Key Components

When defining or working with an indexer, you'll encounter these core components:

Data Source

This defines the connection details for where your data resides. It includes information like the connection string, the specific container or table to access, and any credentials.

{
    "name": "my-blob-data-source",
    "type": "azureblob",
    "credentials": {
        "connectionString": "DefaultEndpointsProtocol=https;..."
    },
    "container": {
        "name": "my-docs"
    }
}

Indexer

This is the main object that orchestrates the data ingestion. It specifies the data source to use, the index to target, the mapping between source and index fields, and optionally, a skillset.

{
    "name": "my-document-indexer",
    "dataSourceName": "my-blob-data-source",
    "targetIndexName": "my-search-index",
    "skillsetName": "my-cognitive-skillset",
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "document_id",
            "mappingFunction": { "name": "base64Encode" }
        },
        {
            "sourceFieldName": "content",
            "targetFieldName": "content"
        }
    ],
    "schedule": {
        "interval": "P1D"
    }
}
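
A definition like the one above is typically submitted to the service over REST. The following is a minimal sketch that builds the request without sending it. It assumes the URL shape `https://<service>.search.windows.net/indexers/<name>`, an `api-key` header, and API version `2023-11-01`; the service name `my-service` and the helper `build_put_request` are placeholders invented for this example.

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # assumption: use a version your service supports


def build_put_request(service_name, collection, definition, api_key):
    """Build (but do not send) a PUT request that creates or updates a resource.

    `collection` is the REST collection, e.g. "datasources" or "indexers".
    """
    url = (
        f"https://{service_name}.search.windows.net/"
        f"{collection}/{definition['name']}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )


indexer_definition = {
    "name": "my-document-indexer",
    "dataSourceName": "my-blob-data-source",
    "targetIndexName": "my-search-index",
}
request = build_put_request(
    "my-service", "indexers", indexer_definition, "<admin-api-key>"
)
# To actually send it: urllib.request.urlopen(request)
```

The same helper can submit the data source definition by passing `"datasources"` as the collection.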

Field Mapping

Field mappings tell Azure AI Search how to map data from your source fields to your index fields when names or types differ; fields whose names and types already match are mapped automatically. Use them to rename fields, route a source field to a specific index field, and apply mapping functions such as base64Encode.
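
To make the effect of a field mapping concrete, here is a minimal local sketch that applies explicit mappings to one source document. It is an illustration only: `apply_field_mappings` and `url_safe_base64` are hypothetical helpers, the service's own `base64Encode` function may differ in padding details, and the implicit same-name mapping the service performs is not modeled.

```python
import base64


def url_safe_base64(value: str) -> str:
    """Approximation of a URL-safe base64 encoding for use in document keys."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


# Only the one function used in this example is modeled.
MAPPING_FUNCTIONS = {"base64Encode": url_safe_base64}


def apply_field_mappings(source_doc, field_mappings):
    """Return a new document containing only the explicitly mapped fields."""
    result = {}
    for mapping in field_mappings:
        source_name = mapping["sourceFieldName"]
        if source_name not in source_doc:
            continue
        # The target name defaults to the source name when omitted.
        target_name = mapping.get("targetFieldName", source_name)
        value = source_doc[source_name]
        function = mapping.get("mappingFunction")
        if function:
            value = MAPPING_FUNCTIONS[function["name"]](value)
        result[target_name] = value
    return result


mapped = apply_field_mappings(
    {"metadata_storage_path": "https://x/a.pdf", "content": "hi", "extra": 1},
    [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "document_id",
            "mappingFunction": {"name": "base64Encode"},
        },
        {"sourceFieldName": "content", "targetFieldName": "content"},
    ],
)
```

Encoding the path matters because a storage path contains characters such as `/` and `:` that are not valid in a document key.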

Skillset (Optional)

A skillset allows you to integrate AI capabilities from Azure AI services (like Azure AI Vision, Azure AI Language) to enrich your data before indexing. This is powerful for extracting insights, detecting sentiment, recognizing entities, and more.

Schedule (Optional)

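You can define a schedule for your indexer to run automatically. The schedule takes an interval expressed as an ISO 8601 duration (for example, PT1H for every hour or P1D for every day, within a range of 5 minutes to 24 hours) and an optional start time. Cron expressions are not supported.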
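
Schedule intervals are written as ISO 8601 durations, which are easy to get wrong by hand. The helper below is a small sketch that formats a `timedelta` as such a duration and rejects values outside the 5-minute to 1-day range. The name `to_schedule_interval` is hypothetical, and the restriction to whole minutes is a simplification made for this example.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(days=1)


def to_schedule_interval(interval: timedelta) -> str:
    """Format a timedelta as an ISO 8601 duration such as 'PT2H' or 'P1D'."""
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 5 minutes and 1 day")
    total_minutes, seconds = divmod(int(interval.total_seconds()), 60)
    if seconds:
        raise ValueError("this sketch only supports whole minutes")
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    text = "P" + (f"{days}D" if days else "")
    if hours or minutes:
        text += "T" + (f"{hours}H" if hours else "") + (f"{minutes}M" if minutes else "")
    return text


# Build the "schedule" fragment of an indexer definition.
schedule = {"interval": to_schedule_interval(timedelta(hours=2))}
```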

Creating and Managing Indexers

You can create and manage indexers using the Azure portal, the REST API, or the Azure SDKs.

Learn more about creating an indexer and monitoring its status.
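
For on-demand runs and status checks over REST, a sketch along the following lines may help. It builds the requests without sending them and assumes the URL shapes `/indexers/<name>/run` (POST) and `/indexers/<name>/status` (GET), an `api-key` header, and API version `2023-11-01`. `indexer_request` and `my-service` are placeholders invented for this example, so check the REST reference for your API version before relying on them.

```python
import urllib.request

API_VERSION = "2023-11-01"  # assumption: use a version your service supports

# Assumed HTTP method for each indexer action.
ACTION_METHODS = {"run": "POST", "status": "GET"}


def indexer_request(service_name, indexer_name, action, api_key):
    """Build (but do not send) a request to run an indexer or read its status."""
    url = (
        f"https://{service_name}.search.windows.net/"
        f"indexers/{indexer_name}/{action}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        headers={"api-key": api_key},
        method=ACTION_METHODS[action],
    )


run_request = indexer_request("my-service", "my-document-indexer", "run", "<admin-api-key>")
status_request = indexer_request("my-service", "my-document-indexer", "status", "<admin-api-key>")
```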

Common Scenarios

Important: Ensure your data source credentials are secure and that the indexer has the necessary permissions to access your data.