Azure AI Skillsets Documentation

Introduction to Azure AI Skillsets

Azure AI Skillsets extend Azure Cognitive Search (since renamed Azure AI Search) by letting you enrich unstructured data with AI-generated insights during indexing. Skills call Azure Cognitive Services (now Azure AI services) or your own custom models to extract text and entities, analyze content, and generate metadata from documents, images, and other data sources.

Skillsets turn raw content into structured, searchable fields that you can use in search applications, analytics, AI-driven applications, and intelligent content management systems.

Getting Started

To begin using Azure AI Skillsets, you'll need:

  • An Azure subscription.
  • An Azure Cognitive Search service instance.
  • Optionally, an Azure Cognitive Services resource for pre-built skills (e.g., Language, Computer Vision).

The process typically involves defining a skillset: a JSON document that lists the skills to apply and connects each skill's inputs to content from the source document or to the outputs of other skills.

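Note: Skillset execution adds processing time to indexing, so make sure your search service tier can handle the workload. Any Cognitive Services resource you attach must be properly configured, accessible to the search service, and located in the same region as the search service.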

Skillset Components

A skillset is composed of the following main parts:

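  • Name: A unique name for the skillset, which indexers use to reference it.
  • Description: A human-readable description of the skillset.
  • Skills: An array of skills to run. Each skill defines its inputs, outputs, context, and skill-specific parameters. Execution order is determined by the dependencies between skill inputs and outputs, not by position in the array.
  • Cognitive Services: (Optional) Configuration for the Azure Cognitive Services resource that billable built-in skills use.
  • Indexers: (Linked, not embedded) A skillset runs only when an indexer references it by name through the indexer's skillsetName property.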

Built-in Skills

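Azure Cognitive Search provides a set of built-in skills. The text and image skills call Azure Cognitive Services behind the scenes, while the document skills are utilities that run within the search service itself. They include: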

  • Text Skills:
    • Key Phrase Extraction
    • Language Detection
    • Sentiment Analysis
    • Entity Recognition (Person, Location, Organization, etc.)
    • Personally Identifiable Information (PII) Detection
    • Text Translation
  • Image Skills:
    • Optical Character Recognition (OCR)
    • Image Analysis (visual features such as image description, tags, and face detection)
  • Document Skills:
    • Split Skill (for splitting large documents)
    • Shaper Skill (for reshaping data)
    • Conditional Skill (for selecting one of two values based on a condition)

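Each skill reads its inputs from nodes in the enriched document (for example, /document/content) and writes its outputs to new nodes named by targetName, so the outputs of one skill can become the inputs of another. These mappings define how data flows through the pipeline.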
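Because skills are connected through these input and output paths, the order in which they run can be derived from the paths themselves rather than from their position in the skills array. The sketch below illustrates that idea with a deliberately simplified model: a skill's output lands at context + "/" + targetName, and a skill depends on any other skill whose output path is equal to, or a parent of, one of its input sources or its context. The helper execution_order and this matching rule are hypothetical simplifications for illustration, not the service's actual scheduler.

```python
from graphlib import TopologicalSorter


def produced_paths(skill: dict) -> list[str]:
    """Paths a skill writes to, simplified as <context>/<targetName>."""
    return [f"{skill['context']}/{out['targetName']}" for out in skill["outputs"]]


def consumed_paths(skill: dict) -> list[str]:
    """Paths a skill reads: its input sources plus the context it iterates over."""
    return [inp["source"] for inp in skill["inputs"]] + [skill["context"]]


def depends_on(path: str, produced: str) -> bool:
    # '/document/pages/*' and '/document/pages/*/x' both depend on '/document/pages'.
    return path == produced or path.startswith(produced + "/")


def execution_order(skills: list[dict]) -> list[str]:
    """Hypothetical helper: order skill names so that producers come before consumers."""
    graph: dict[str, set[str]] = {s["name"]: set() for s in skills}
    for consumer in skills:
        for producer in skills:
            if producer is consumer:
                continue
            if any(depends_on(c, p)
                   for c in consumed_paths(consumer)
                   for p in produced_paths(producer)):
                graph[consumer["name"]].add(producer["name"])
    return list(TopologicalSorter(graph).static_order())


# The key phrase skill is listed first but consumes the Split skill's output.
skills = [
    {"name": "keyPhrases", "context": "/document/pages/*",
     "inputs": [{"name": "text", "source": "/document/pages/*"}],
     "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}]},
    {"name": "split", "context": "/document",
     "inputs": [{"name": "text", "source": "/document/content"}],
     "outputs": [{"name": "textItems", "targetName": "pages"}]},
]
print(execution_order(skills))  # ['split', 'keyPhrases']
```

Skills with no dependency on each other can run in any order (or in parallel), which is why a topological order rather than the array order is the useful mental model.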

Custom Skills

For scenarios that require logic or models not covered by the built-in skills, you can create custom skills. These are typically hosted as Azure Functions or other web APIs and are called from the skillset through the Custom Web API skill (#Microsoft.Skills.Custom.WebApiSkill).

Custom skills allow you to:

  • Integrate with proprietary AI models.
  • Perform specialized data processing.
  • Connect to external services.
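
Tip: Design your custom skill to follow the Web API skill interface: it receives a JSON body with a values array of records (each with a recordId and a data object) and must return a values array containing the same recordIds, each with its data, errors, and warnings.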
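A minimal, framework-agnostic sketch of that request/response contract follows. It assumes the skill declares an input named data and an output named processed_data, matching the Python SDK example later on this page; process_text and handle_batch are hypothetical names, and the upper-casing is only a placeholder for real processing. Wiring the function into an Azure Function or web framework is left out.

```python
import json


def process_text(text: str) -> str:
    """Placeholder for your own logic, such as calling a proprietary model."""
    return text.upper()


def handle_batch(request_body: str) -> str:
    """Process every record in a Web API skill request and build the response body."""
    request = json.loads(request_body)
    results = []
    for record in request.get("values", []):
        record_id = record.get("recordId")
        try:
            # "data" is the input name declared in the skill definition (assumed).
            text = record["data"]["data"]
            if not isinstance(text, str):
                raise TypeError("input 'data' must be a string")
            results.append({
                "recordId": record_id,
                "data": {"processed_data": process_text(text)},
                "errors": [],
                "warnings": [],
            })
        except (KeyError, TypeError) as exc:
            # Report a per-record error instead of failing the whole batch.
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": f"Missing or invalid input: {exc}"}],
                "warnings": [],
            })
    return json.dumps({"values": results})


body = json.dumps({"values": [
    {"recordId": "1", "data": {"data": "hello"}},
    {"recordId": "2", "data": {}},
]})
print(handle_batch(body))
```

Returning errors per record lets the indexer mark only the affected documents as failed while the rest of the batch is enriched normally.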

Creating a Skillset

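You can create a skillset using the Azure portal, the REST API, or the Azure SDKs (for example, .NET, Python, Java, and JavaScript). The Azure CLI's az search commands manage the search service itself rather than skillsets, so scripted creation typically uses the REST API or an SDK.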

Here's a simplified example of a skillset definition:


{
  "name": "my-document-enrichment-skillset",
  "description": "Extracts key phrases and sentiment from documents.",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
      "name": "keyPhraseExtractor",
      "description": "Extracts key phrases",
      "context": "/document",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "keyPhrases",
          "targetName": "documentKeyPhrases"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.V3.SentimentSkill",
      "name": "sentimentAnalyzer",
      "description": "Analyzes sentiment",
      "context": "/document",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "sentiment",
          "targetName": "documentSentiment"
        }
      ]
    }
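  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "description": "my-cognitive-services-resource",
    "key": "<your-cognitive-services-key>"
  }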
}

Once defined, a skillset is attached to an indexer, which orchestrates the data ingestion and enrichment process.
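An indexer references the skillset by name, and enriched values only reach the search index through the indexer's outputFieldMappings. The sketch below builds such an indexer definition as a Python dictionary for the skillset defined above; the data source name, index name, and the keyPhrases index field are hypothetical, and the dictionary is meant as the body of a Create Indexer REST request (the SDKs expose equivalent model classes).

```python
import json

SKILLSET_NAME = "my-document-enrichment-skillset"


def build_indexer(name: str, data_source: str, index: str) -> dict:
    """Build an indexer definition that runs the skillset and maps an enrichment to an index field."""
    return {
        "name": name,
        "dataSourceName": data_source,   # hypothetical data source
        "targetIndexName": index,        # hypothetical index
        "skillsetName": SKILLSET_NAME,   # attaches the skillset to this indexer
        # Enrichments live under /document/<targetName>; map them to index fields.
        "outputFieldMappings": [
            {
                "sourceFieldName": "/document/documentKeyPhrases",
                "targetFieldName": "keyPhrases",  # hypothetical index field
            }
        ],
    }


print(json.dumps(build_indexer("my-indexer", "my-blob-datasource", "my-index"), indent=2))
```

Without an output field mapping, an enrichment is computed during indexing but never stored in the index.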

Examples

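# Example using the REST API (curl) to create or update a skillset from skillset.json
# The "name" property in skillset.json must match the skillset name in the URL.
curl -X PUT "https://my-search-service.search.windows.net/skillsets/my-skillset?api-version=2024-07-01" \
    -H "Content-Type: application/json" \
    -H "api-key: <admin-api-key>" \
    --data-binary @skillset.json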

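# Example using the Python SDK (azure-search-documents) to create a skillset with a custom Web API skill
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    SearchIndexerSkillset,
    WebApiSkill,
)

skillset = SearchIndexerSkillset(
    name="my-python-skillset",
    skills=[
        WebApiSkill(
            name="my-custom-skill",
            description="Calls my custom API",
            context="/document",
            uri="https://my-custom-skill.azurewebsites.net/api/process",
            http_method="POST",
            batch_size=10,
            inputs=[InputFieldMappingEntry(name="data", source="/document/content")],
            outputs=[OutputFieldMappingEntry(name="processed_data", target_name="customMetadata")],
        )
    ],
)

# Create or update the skillset on the search service (endpoint and key are placeholders)
client = SearchIndexerClient(
    endpoint="https://my-search-service.search.windows.net",
    credential=AzureKeyCredential("<admin-api-key>"),
)
client.create_or_update_skillset(skillset)
# ... (then reference the skillset from an indexer through its skillset_name)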

Best Practices

  • Modular Design: Break down complex enrichment into smaller, reusable skills.
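  • Context Management: Set each skill's context to the level it should operate on; for example, a context of /document/pages/* runs the skill once per page produced by a Split skill rather than once per document.
  • Error Handling: Return per-record errors and warnings from custom skills, use the indexer's maxFailedItems and maxFailedItemsPerBatch parameters to control how many failures a run tolerates, and consider retry policies for calls your custom skills make to other services.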
  • Performance Tuning: Monitor skillset execution times and optimize skill chaining for efficiency.
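  • Cost Management: Billable skills are charged through the attached Cognitive Services resource; without one, enrichment is limited to a small number of free documents per indexer per day. Monitor usage and consider incremental enrichment (caching) so unchanged documents are not reprocessed.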
  • Testing: Thoroughly test skillsets with diverse data samples to ensure accuracy and completeness.
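To make the context practice concrete: when a Split skill writes its chunks to /document/pages, a downstream skill whose context is /document/pages/* runs once per chunk, and its outputs are written under each chunk. The sketch below builds both skill definitions as Python dictionaries; the skill names are hypothetical and the textSplitMode and maximumPageLength values are illustrative choices, not recommendations.

```python
import json

split_skill = {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "name": "splitPages",                 # hypothetical name
    "context": "/document",              # runs once per document
    "textSplitMode": "pages",
    "maximumPageLength": 4000,            # illustrative value
    "inputs": [{"name": "text", "source": "/document/content"}],
    "outputs": [{"name": "textItems", "targetName": "pages"}],
}

page_key_phrases = {
    "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
    "name": "pageKeyPhrases",             # hypothetical name
    "context": "/document/pages/*",      # runs once per page produced by splitPages
    "inputs": [{"name": "text", "source": "/document/pages/*"}],
    "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
}


def output_path(skill: dict, target: str) -> str:
    """Where a skill's output lands in the enriched document: <context>/<targetName>."""
    return f"{skill['context']}/{target}"


print(output_path(page_key_phrases, "keyPhrases"))  # /document/pages/*/keyPhrases
print(json.dumps({"skills": [split_skill, page_key_phrases]}, indent=2))
```

Choosing the per-page context keeps each call within input size limits and produces key phrases per chunk; a /document context would instead produce a single list for the whole document.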

Troubleshooting Common Issues

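  • Skill Execution Errors: Check the indexer's execution history (the errors and warnings returned by Get Indexer Status or shown in the portal) for specific error messages, and review the skill configurations. Debug sessions in the portal let you inspect a skill's inputs and outputs for a single document.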
  • Data Mismatch: Verify that input/output field mappings are correct and data types are compatible.
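  • Performance Bottlenecks: Identify slow-running skills and optimize them; for custom skills, tune the batch size and degree of parallelism, and make sure the hosting service can scale with the indexer's request rate.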
  • API Permissions: Ensure the Cognitive Search service has the necessary permissions to access Cognitive Services or custom APIs.

Important: Always consult the official Azure documentation and logs for detailed troubleshooting steps.
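When many documents fail, grouping the errors from the indexer status response by the skill or operation that reported them makes recurring problems easier to spot. The sketch below assumes the Get Indexer Status response (GET /indexers/{name}/status) has already been fetched and parsed into a dictionary, and it reads only the name, key, and errorMessage fields of the lastResult.errors entries; treat that response shape as an assumption to verify against the API version you use. The sample payload is hand-written for illustration and is not real service output.

```python
from collections import defaultdict


def summarize_errors(status: dict) -> dict[str, list[str]]:
    """Group the last run's indexer errors by the skill or operation that reported them."""
    grouped: dict[str, list[str]] = defaultdict(list)
    last_result = status.get("lastResult") or {}
    for error in last_result.get("errors") or []:
        source = error.get("name") or "<unknown>"
        grouped[source].append(f"{error.get('key')}: {error.get('errorMessage')}")
    return dict(grouped)


# Hand-written sample payload for illustration only; not real service output.
sample_status = {
    "lastResult": {
        "errors": [
            {"key": "doc-1", "name": "my-custom-skill", "errorMessage": "example message"},
            {"key": "doc-2", "name": "my-custom-skill", "errorMessage": "example message"},
            {"key": "doc-3", "errorMessage": "example message"},
        ]
    }
}

for source, messages in summarize_errors(sample_status).items():
    print(f"{source}: {len(messages)} error(s)")
```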