API Reference
Azure AI Vision API
Explore the capabilities of Azure AI Vision to analyze images and videos, extract information, and gain insights. This includes object detection, face recognition, OCR, and content moderation.
Analyze Image
Analyzes an image and returns detailed information about its content, including tags, descriptions, and objects.
visualFeatures (string, optional): Specifies the visual features to return (e.g., Categories, Description, Objects, Faces).
details (string, optional): Specifies the details to return (e.g., Celebrities, Landmarks).
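As a rough illustration of how the two optional parameters might be combined into a request URL, here is a minimal sketch. The path `/vision/analyze`, the host `example.invalid`, and the comma-separated encoding of multiple values are assumptions made for the example; this reference does not specify them.

```python
from urllib.parse import urlencode

# Hypothetical path; the reference above does not give an endpoint URL.
ANALYZE_PATH = "/vision/analyze"


def build_analyze_url(base_url, visual_features=None, details=None):
    """Return an Analyze Image URL, adding only the parameters that were given."""
    params = {}
    if visual_features:
        # Assumption: several values are sent as one comma-separated string.
        params["visualFeatures"] = ",".join(visual_features)
    if details:
        params["details"] = ",".join(details)
    # safe="," keeps the commas readable instead of percent-encoding them.
    query = urlencode(params, safe=",")
    url = base_url.rstrip("/") + ANALYZE_PATH
    return f"{url}?{query}" if query else url


print(build_analyze_url("https://example.invalid", ["Categories", "Objects"], ["Landmarks"]))
```

Leaving both arguments unset yields the bare URL, which matches the "optional" marking on both parameters.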
Read Text (OCR)
Extracts printed and handwritten text from an image.
language (string, optional): The language of the text.
detectOrientation (boolean, optional): Whether to detect orientation.
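Because both Read Text parameters are optional, a caller usually needs to tell "not set" apart from "set to false". The sketch below shows one way to do that; the helper name `build_read_params` and the lowercase `"true"`/`"false"` string encoding are assumptions, not something this reference states.

```python
def build_read_params(language=None, detect_orientation=None):
    """Collect the optional Read Text parameters, skipping any left unset."""
    params = {}
    if language is not None:
        params["language"] = language
    if detect_orientation is not None:
        # Assumption: booleans travel as lowercase "true"/"false" strings.
        params["detectOrientation"] = "true" if detect_orientation else "false"
    return params


print(build_read_params(language="en", detect_orientation=True))
```

Using `None` as the "unset" marker lets an explicit `detect_orientation=False` still be sent to the service.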
Detect Objects
Detects objects within an image and returns their bounding boxes and confidence scores.
maxCandidates (integer, optional): The maximum number of objects to return.
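To make the idea of a candidate cap concrete, the following sketch applies an equivalent limit on the client side. The `DetectedObject` type, its field layout, and the choice to rank by confidence are all hypothetical; this reference does not say how the service orders or truncates its results.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str
    confidence: float
    box: tuple  # assumed layout: (x, y, width, height)


def top_candidates(objects, max_candidates):
    """Keep the max_candidates detections with the highest confidence."""
    if max_candidates < 0:
        raise ValueError("max_candidates must not be negative")
    ranked = sorted(objects, key=lambda o: o.confidence, reverse=True)
    return ranked[:max_candidates]


# Made-up detections, for illustration only.
detections = [
    DetectedObject("cat", 0.40, (10, 10, 50, 50)),
    DetectedObject("dog", 0.90, (80, 20, 60, 70)),
    DetectedObject("car", 0.70, (5, 100, 120, 60)),
]
print([d.label for d in top_candidates(detections, 2)])
```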
Azure AI Language API
Use natural language processing (NLP) to analyze text and extract insights through sentiment analysis, key phrase extraction, entity recognition, and more.
Sentiment Analysis
Analyzes the sentiment of input text, classifying it as positive, negative, neutral, or mixed.
documents (array): Array of text documents to analyze.
opinionMining (boolean, optional): Whether to enable opinion mining.
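A request body for these two parameters might be assembled as in the sketch below. The per-document shape (an object with `id` and `text` keys) and the helper name are assumptions; this reference only says that `documents` is an array.

```python
import json


def build_sentiment_body(texts, opinion_mining=False):
    """Serialize a Sentiment Analysis request body as a JSON string."""
    # Assumption: each document is an object with "id" and "text" keys.
    documents = [{"id": str(i), "text": t} for i, t in enumerate(texts, start=1)]
    body = {"documents": documents}
    if opinion_mining:
        # The flag is optional, so it is only sent when enabled.
        body["opinionMining"] = True
    return json.dumps(body)


print(build_sentiment_body(["The room was great.", "Service was slow."], opinion_mining=True))
```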
Key Phrase Extraction
Identifies and extracts the main points or key phrases from text.
documents (array): Array of text documents.
Named Entity Recognition (NER)
Identifies and categorizes entities in text, such as people, organizations, locations, and dates.
documents (array): Array of text documents.
categories (array, optional): Specifies the categories to extract (e.g., Person, Organization).
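The effect of restricting categories can be pictured with a small client-side helper that groups entities and keeps only the requested categories. The entity shape (dictionaries with `text` and `category` keys) and the sample data are invented for illustration; the actual response format is not described in this reference.

```python
from collections import defaultdict


def group_entities(entities, categories=None):
    """Group entity texts by category, optionally keeping only some categories."""
    wanted = set(categories) if categories else None
    grouped = defaultdict(list)
    for entity in entities:
        if wanted is None or entity["category"] in wanted:
            grouped[entity["category"]].append(entity["text"])
    return dict(grouped)


# Hypothetical entities, shaped the way this sketch assumes a response might look.
sample = [
    {"text": "Ada Lovelace", "category": "Person"},
    {"text": "London", "category": "Location"},
    {"text": "Royal Society", "category": "Organization"},
]
print(group_entities(sample, categories=["Person", "Organization"]))
```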
Azure AI Speech API
Transform speech into text and text into natural-sounding speech with Azure AI Speech. Includes real-time and batch transcription, speaker recognition, and speech translation.
Speech to Text
Converts spoken audio into written text. Supports multiple languages and real-time transcription.
content (audio stream): The audio data to transcribe.
language (string): The language of the audio.
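Since `content` is described as an audio stream, a client would typically read the audio in pieces rather than load it all at once. The generator below is a minimal sketch of that idea; the function name, the 4096-byte default, and the zero-filled stand-in data are assumptions, and how the chunks are actually sent to the service is outside what this reference covers.

```python
import io


def iter_audio_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from a binary file-like object."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        yield chunk


# Stand-in for real audio bytes; any binary file object works the same way.
fake_audio = io.BytesIO(b"\x00" * 10000)
print([len(chunk) for chunk in iter_audio_chunks(fake_audio)])
```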
Text to Speech
Synthesizes text into lifelike speech using various voices and languages.
text (string): The text to synthesize.
voice (string, optional): The desired voice (e.g., en-US-JennyNeural).
audioFormat (string, optional): The output audio format (e.g., riff-16khz-16bit-mono-pcm).
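The three parameters can be modeled as a small request object that omits whichever optional fields were not set. This is only a sketch: the `SpeechRequest` class, the `to_payload` method, and the empty-text check are hypothetical and not part of the documented API.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SpeechRequest:
    text: str
    voice: Optional[str] = None
    audioFormat: Optional[str] = None  # field name mirrors the parameter name

    def to_payload(self):
        """Return the request as a dict, leaving out unset optional fields."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        return {key: value for key, value in asdict(self).items() if value is not None}


request = SpeechRequest("Hello, world.", voice="en-US-JennyNeural")
print(request.to_payload())
```

Dropping unset fields lets the service apply its own defaults for voice and audio format, whatever those may be.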