Azure AI Documentation - API Reference

Azure AI Vision API

Azure AI Vision analyzes images and videos to extract information from them. Capabilities include object detection, face recognition, OCR, and content moderation.

POST

Analyze Image

/vision/v3.2/analyze

Analyzes an image and returns detailed information about its content, including tags, descriptions, and objects.

Parameters:

visualFeatures (string, optional): Comma-separated list of visual features to return (e.g., Categories, Description, Objects, Faces).
details (string, optional): Comma-separated list of domain-specific details to return (e.g., Celebrities, Landmarks).

Response: A JSON object containing the requested features, such as tags, descriptions, and detected objects.
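
As a minimal sketch of how the two query parameters above might be combined into a request, the following builds (but does not send) a POST to /vision/v3.2/analyze. The host name, the Ocp-Apim-Subscription-Key header, and the {"url": ...} body are assumptions not stated in this reference; adjust them to your resource.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_analyze_request(endpoint, key, image_url,
                          visual_features=None, details=None):
    """Build (but do not send) a POST request for /vision/v3.2/analyze."""
    query = {}
    if visual_features:
        # Multiple values travel as one comma-separated string.
        query["visualFeatures"] = ",".join(visual_features)
    if details:
        query["details"] = ",".join(details)

    url = endpoint.rstrip("/") + "/vision/v3.2/analyze"
    if query:
        # safe="," keeps the commas readable instead of %2C.
        url += "?" + urlencode(query, safe=",")

    # Assumption: the image is passed by URL in a JSON body.
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed auth header
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


req = build_analyze_request(
    "https://example.cognitiveservices.azure.com",  # placeholder host
    "YOUR_KEY",
    "https://example.com/cat.jpg",
    visual_features=["Description", "Objects"],
    details=["Landmarks"],
)
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; building the request separately keeps the parameter handling easy to inspect.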

POST

Read Text (OCR)

/vision/v3.2/ocr

Extracts printed text from an image.

Parameters:

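language (string, optional): Language code of the text in the image (e.g., en). If omitted, the service attempts to detect the language.
detectOrientation (boolean, optional): Whether to detect the orientation of the text in the image.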

Response: A JSON object with recognized text organized by regions, lines, and words.
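
The nested response can be flattened into plain lines of text. The sketch below assumes a shape of regions containing lines containing words, each word carrying a text field; those key names are assumptions and should be checked against a real response.

```python
def ocr_lines(response):
    """Return each recognized line as one string, in document order."""
    lines = []
    # Assumed nesting: regions -> lines -> words -> text.
    for region in response.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


# Hand-written sample in the assumed shape, not real service output.
sample = {
    "language": "en",
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hello"}, {"text": "world"}]},
                {"words": [{"text": "Azure"}]},
            ]
        }
    ],
}
print(ocr_lines(sample))
```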

POST

Detect Objects

/vision/v3.2/detect

Detects objects within an image and returns their bounding boxes and confidence scores.

Parameters:

maxCandidates (integer, optional): The maximum number of objects to return.

Response: A JSON object whose objects array lists each detected object with its bounding box and confidence score.
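
A common follow-up step is to drop low-confidence detections and convert each bounding box to corner coordinates. This sketch assumes the detections sit under an objects key and that each entry has object, confidence, and a rectangle with x, y, w, and h; these field names are assumptions, and min_confidence is a hypothetical client-side threshold rather than an API parameter.

```python
def confident_boxes(response, min_confidence=0.5):
    """Return (label, left, top, right, bottom) for confident detections."""
    boxes = []
    for obj in response.get("objects", []):
        if obj["confidence"] < min_confidence:
            continue  # skip detections below the client-side threshold
        rect = obj["rectangle"]
        boxes.append((
            obj["object"],
            rect["x"],
            rect["y"],
            rect["x"] + rect["w"],  # right edge
            rect["y"] + rect["h"],  # bottom edge
        ))
    return boxes


# Hand-written sample in the assumed shape, not real service output.
sample = {
    "objects": [
        {"object": "dog", "confidence": 0.92,
         "rectangle": {"x": 10, "y": 20, "w": 100, "h": 50}},
        {"object": "cat", "confidence": 0.30,
         "rectangle": {"x": 0, "y": 0, "w": 5, "h": 5}},
    ]
}
print(confident_boxes(sample))
```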

Azure AI Language API

Azure AI Language applies natural language processing (NLP) to text. Capabilities include sentiment analysis, key phrase extraction, entity recognition, and more.

POST

Sentiment Analysis

/language/:analyze-text

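Analyzes the sentiment of input text, classifying it as positive, negative, neutral, or mixed. Several text analysis tasks share this endpoint; the kind field in the request body (SentimentAnalysis here) selects the task.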

Parameters:

documents (array): Text documents to analyze, each with an id and its text.
opinionMining (boolean, optional): Whether to enable opinion mining.

Response: A JSON object containing a sentiment label and confidence scores for each document.
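
The request body for this call can be assembled from plain strings. The sketch below assumes a layout with a kind field, the documents under analysisInput, and opinionMining under parameters; that nesting and the per-document language field are assumptions to verify against the service's schema.

```python
import json


def build_sentiment_body(texts, language="en", opinion_mining=False):
    """Wrap plain strings in the assumed analyze-text request layout."""
    documents = [
        # Each document needs an id that is unique within the request.
        {"id": str(i), "language": language, "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    return {
        "kind": "SentimentAnalysis",  # assumed task selector
        "analysisInput": {"documents": documents},
        "parameters": {"opinionMining": opinion_mining},
    }


body = build_sentiment_body(
    ["The room was great.", "The service was slow."],
    opinion_mining=True,
)
print(json.dumps(body, indent=2))
```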

POST

Key Phrase Extraction

/language/:analyze-text

Identifies and extracts the main points or key phrases from text.

Parameters:

documents (array): Array of text documents.

Response: A JSON object with an array of key phrases for each document.
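
To consume the response, it helps to index the key phrases by document id and to collect per-document failures separately. The sketch assumes results sit under results.documents with a keyPhrases array, and that failures sit under results.errors with a nested error.message; these key names are assumptions.

```python
def key_phrases_by_document(response):
    """Return ({doc_id: [phrases]}, {doc_id: error_message})."""
    results = response.get("results", {})
    phrases = {
        doc["id"]: doc.get("keyPhrases", [])
        for doc in results.get("documents", [])
    }
    # Assumed error layout: one entry per document that failed.
    failed = {
        err["id"]: err["error"]["message"]
        for err in results.get("errors", [])
    }
    return phrases, failed


# Hand-written sample in the assumed shape, not real service output.
sample = {
    "results": {
        "documents": [
            {"id": "1", "keyPhrases": ["hotel room", "city view"]},
        ],
        "errors": [
            {"id": "2", "error": {"message": "Document text is empty."}},
        ],
    }
}
phrases, failed = key_phrases_by_document(sample)
print(phrases)
print(failed)
```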

POST

Named Entity Recognition (NER)

/language/:analyze-text

Identifies and categorizes entities in text, such as people, organizations, locations, and dates.

Parameters:

documents (array): Text documents to analyze.
categories (array, optional): Entity categories to extract (e.g., Person, Organization).

Response: A JSON object listing recognized entities with their types and confidence scores.
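
Recognized entities are often easier to work with when grouped by type. This sketch assumes each document under results.documents has an entities array whose items carry text, category, and confidenceScore; those key names are assumptions. The filtering here happens on the client and is independent of the categories request parameter.

```python
from collections import defaultdict


def entities_by_category(response, min_confidence=0.0, categories=None):
    """Group entity texts by category, optionally filtering both ways."""
    grouped = defaultdict(list)
    for doc in response.get("results", {}).get("documents", []):
        for entity in doc.get("entities", []):
            if entity["confidenceScore"] < min_confidence:
                continue
            if categories and entity["category"] not in categories:
                continue
            grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


# Hand-written sample in the assumed shape, not real service output.
sample = {
    "results": {
        "documents": [
            {"id": "1", "entities": [
                {"text": "Satya Nadella", "category": "Person",
                 "confidenceScore": 0.99},
                {"text": "Microsoft", "category": "Organization",
                 "confidenceScore": 0.97},
                {"text": "spring", "category": "DateTime",
                 "confidenceScore": 0.40},
            ]},
        ]
    }
}
print(entities_by_category(sample, min_confidence=0.5))
```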

Azure AI Speech API

Azure AI Speech converts speech into text and text into natural-sounding speech. It includes real-time and batch transcription, speaker recognition, and speech translation.

POST

Speech to Text

/speech/v3.1/transcriptions

Converts spoken audio into written text. It supports multiple languages and real-time transcription.

Parameters:

content (audio stream): The audio data to transcribe.
language (string): The language of the audio, given as a locale (e.g., en-US).

Response: Transcribed text in JSON format.
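
The following sketch shows one way the two parameters above might be packaged: the audio as the raw request body and language as a query parameter. That placement, the audio/wav content type, the placeholder host, and the Ocp-Apim-Subscription-Key header are all assumptions, since this reference does not say how the parameters are transmitted. A one-second silent WAV is generated locally so the example needs no audio file.

```python
import io
import wave
from urllib.parse import urlencode
from urllib.request import Request


def silent_wav(seconds=1, rate=16000):
    """Create an in-memory 16-bit mono WAV containing only silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    return buffer.getvalue()


def build_transcription_request(endpoint, key, audio_bytes, language):
    """Build (but do not send) a POST for /speech/v3.1/transcriptions."""
    url = (endpoint.rstrip("/") + "/speech/v3.1/transcriptions?"
           + urlencode({"language": language}))  # assumed placement
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed auth header
        "Content-Type": "audio/wav",       # assumed content type
    }
    return Request(url, data=audio_bytes, headers=headers, method="POST")


audio = silent_wav()
req = build_transcription_request(
    "https://example.cognitiveservices.azure.com",  # placeholder host
    "YOUR_KEY",
    audio,
    "en-US",
)
print(req.full_url, len(req.data))
```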

POST

Text to Speech

/cognitiveservices/v1.0/tts.json

Synthesizes text into lifelike speech using various voices and languages.

Parameters:

text (string): The text to synthesize.
voice (string, optional): The desired voice (e.g., en-US-JennyNeural).
audioFormat (string, optional): The output audio format (e.g., riff-16khz-16bit-mono-pcm).

Response: Audio stream in the requested format.
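
As a rough sketch, the three parameters above can be collected into a payload that omits unset optional fields, and the returned audio stream can be written to disk chunk by chunk. Sending the parameters as a JSON-style dictionary is an assumption (this reference does not specify the encoding), and build_tts_payload and save_audio are hypothetical helper names.

```python
import os
import tempfile


def build_tts_payload(text, voice=None, audio_format=None):
    """Collect the documented parameters, omitting unset optional ones."""
    if not text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": text}
    if voice is not None:
        payload["voice"] = voice
    if audio_format is not None:
        payload["audioFormat"] = audio_format
    return payload


def save_audio(chunks, path):
    """Write an iterable of byte chunks to a file; return bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            total += out.write(chunk)
    return total


payload = build_tts_payload(
    "Hello from Azure.",
    voice="en-US-JennyNeural",
    audio_format="riff-16khz-16bit-mono-pcm",
)
print(payload)

# Stand-in chunks; a real response body would be read in pieces instead.
with tempfile.TemporaryDirectory() as tmp:
    written = save_audio([b"RIFF", b"\x00" * 8], os.path.join(tmp, "out.wav"))
print(written)
```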