Azure Speech Translation SDK for JavaScript
This documentation provides detailed information and examples for using the Azure Speech Translation SDK with JavaScript. Enable real-time speech-to-speech and speech-to-text translation in your applications.
Key Features
- Real-time Speech Translation: Translate spoken language in near real-time.
- Multiple Languages: Support for a wide range of source and target languages.
- Speech-to-Speech Translation: Convert spoken audio in one language to spoken audio in another.
- Speech-to-Text Translation: Transcribe spoken audio and translate the text into another language.
- Customizable Models: Integrate with Azure Custom Speech for domain-specific translation.
- Cross-platform Compatibility: Works seamlessly in web browsers and Node.js environments.
Prerequisites
- An Azure subscription.
- A Speech service resource created in your Azure subscription.
- Node.js and npm installed (for Node.js usage).
- The Azure Speech SDK for JavaScript package installed.
Installation
To install the SDK, use npm or yarn:
npm install microsoft-cognitiveservices-speech-sdk
yarn add microsoft-cognitiveservices-speech-sdk
Getting Started
Here's a basic example of how to perform speech-to-text translation using the SDK in a Node.js environment (the example reads from stdin to stop recognition).
1. Initialize the Speech Translator
You'll need your Azure Speech service subscription key and region.
// In your JavaScript file (e.g., app.js)
import { SpeechTranslator, AudioConfig, TranslationRecognitionResult, SpeechConfig, CancellationDetails, CancellationReason } from "microsoft-cognitiveservices-speech-sdk";
// Replace with your subscription key and region
const subscriptionKey = "YOUR_SPEECH_SUBSCRIPTION_KEY";
const serviceRegion = "YOUR_SPEECH_SERVICE_REGION";
// Create speech configuration
const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfig.speechRecognitionLanguage = "en-US"; // Set the source language for recognition
// Create audio configuration (e.g., from microphone)
const audioConfig = AudioConfig.fromDefaultMicrophoneInput();
// Create the speech translator
const translator = new SpeechTranslator(speechConfig, audioConfig);
2. Perform Translation
Use the translateSpeechToText method to initiate the translation process. You need to specify the target languages for translation.
// Add target languages (e.g., French and Spanish)
translator.addTargetLanguage("fr-FR");
translator.addTargetLanguage("es-ES");
console.log("Start speaking. Press Enter to stop.");
translator.translateSpeechToText(
    function (result) {
        // Process translation results
        console.log("RECOGNIZED: Text: " + result.transcriptions[0].locale + ": " + result.transcriptions[0].transcript);
        for (var key in result.translations) {
            if (result.translations.hasOwnProperty(key)) {
                console.log("TRANSLATED into " + key + ": " + result.translations[key]);
            }
        }
        if (result.reason === TranslationRecognitionResult.Reason.TranslatedSpeech) {
            // Handle successful translation
        } else if (result.reason === TranslationRecognitionResult.Reason.RecognizingSpeech) {
            // Handle intermediate recognition results
        } else if (result.reason === TranslationRecognitionResult.Reason.NoMatch) {
            console.log("NOMATCH: Speech could not be recognized.");
        } else if (result.reason === TranslationRecognitionResult.Reason.Canceled) {
            const cancellation = CancellationDetails.fromResult(result);
            console.log("CANCELED: Reason=" + cancellation.reason);
            if (cancellation.reason === CancellationReason.Error) {
                console.log("CANCELED: ErrorCode=" + cancellation.errorCode);
                console.log("CANCELED: ErrorDetails=" + cancellation.errorDetails);
            }
        }
    },
    function (err) {
        console.error("ERROR: " + err);
    }
);
// Stop translation on Enter key press
process.stdin.on('data', function (input) {
    if (input.toString().trim() === '') { // a bare Enter press sends only a newline
        translator.stopContinuousRecognitionAsync();
    }
});
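The result-logging loop above can be factored into a small, testable helper. A minimal sketch, assuming `result.translations` is a plain object mapping language tags to translated text, as in the callback above (`formatTranslations` is an illustrative name, not part of the SDK):

```javascript
// Format a translations map (language tag -> translated text) into log lines.
// Assumes the plain-object shape used by the onEvent callback above.
function formatTranslations(translations) {
    const lines = [];
    for (const key of Object.keys(translations)) {
        lines.push("TRANSLATED into " + key + ": " + translations[key]);
    }
    return lines;
}

// Example with a hypothetical result payload:
console.log(formatTranslations({ "fr-FR": "Bonjour", "es-ES": "Hola" }).join("\n"));
```

Keeping formatting separate from the recognition callback makes the callback easier to unit-test without a live audio stream.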
API Reference
SpeechTranslator Class
The main class for performing speech translation.
Constructor
new SpeechTranslator(speechConfig: SpeechConfig, audioConfig: AudioConfig)
Initializes a new instance of the SpeechTranslator class.
Parameters:
- speechConfig: An instance of SpeechConfig with your service details and recognition language.
- audioConfig: An instance of AudioConfig specifying the audio input source.
Methods
addTargetLanguage(language: string): void
Adds a target language for translation. The language string should be a valid BCP-47 language tag (e.g., "fr-FR", "es-ES").
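A malformed tag is typically only rejected once the service is called; a lightweight client-side sanity check can catch obvious typos earlier. This is a simplified sketch of the BCP-47 shape (language subtag plus optional region subtag), not the SDK's own validation:

```javascript
// Rough BCP-47 shape check: a 2-3 letter lowercase language subtag,
// optionally followed by a 2-letter uppercase region subtag
// (e.g. "fr-FR", "es", "zh-CN"). Not a full BCP-47 validator.
function looksLikeLanguageTag(tag) {
    return /^[a-z]{2,3}(-[A-Z]{2})?$/.test(tag);
}
```

For example, `looksLikeLanguageTag("fr-FR")` passes while `looksLikeLanguageTag("french")` does not.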
translateSpeechToText(onEvent: (result: TranslationRecognitionResult) => void, onError: (error: string) => void): Promise<void>
Starts continuous speech-to-text translation. The provided callbacks will be invoked with translation results or errors.
Parameters:
- onEvent: A callback function that receives TranslationRecognitionResult objects for each recognized or translated utterance.
- onError: A callback function that receives error messages.
stopContinuousRecognitionAsync(): Promise<void>
Stops the continuous speech recognition and translation process.
Advanced Scenarios
Speech-to-Speech Translation
For speech-to-speech translation, you'll need to configure the SpeechConfig with synthesis enabled and use the SpeechSynthesizer in conjunction with the translator.
// Example snippet for speech-to-speech (requires additional setup for the synthesizer)
// import { SpeechSynthesizer } from "microsoft-cognitiveservices-speech-sdk";
// ... (SpeechTranslator initialization as above)
// Configure the synthesizer (reusing the same subscription key and region)
const speechConfigSynthesize = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfigSynthesize.speechSynthesisVoiceName = "en-US-JennyNeural"; // Example voice
const synthesizer = new SpeechSynthesizer(speechConfigSynthesize);
// In your onEvent callback for translator:
// if (result.reason === TranslationRecognitionResult.Reason.TranslatedSpeech) {
// const translatedText = result.translations["fr-FR"]; // Example target language
// synthesizer.speakTextAsync(translatedText);
// }
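When synthesizing, you'll often want a specific target language but should handle the case where it's missing from the result. A small sketch, again assuming the plain-object translations shape shown earlier (`pickTranslation` is an illustrative helper, not an SDK function):

```javascript
// Pick the translation for a preferred language tag, falling back to the
// first available translation, or null if there are none.
function pickTranslation(translations, preferred) {
    if (Object.prototype.hasOwnProperty.call(translations, preferred)) {
        return translations[preferred];
    }
    const keys = Object.keys(translations);
    return keys.length > 0 ? translations[keys[0]] : null;
}
```

In the onEvent callback you would then pass `pickTranslation(result.translations, "fr-FR")` to the synthesizer instead of indexing the map directly.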
Customization
Integrate with custom speech models for improved accuracy in specific domains. This typically involves training and deploying a model in Speech Studio (in the Azure portal), then referencing the deployment's endpoint ID in your SpeechConfig.
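As a configuration sketch (the endpoint ID is an assumed placeholder; you obtain the real value from your Custom Speech deployment in Speech Studio):

```javascript
// Reference a Custom Speech deployment by its endpoint ID
const customConfig = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
customConfig.endpointId = "YOUR_CUSTOM_SPEECH_ENDPOINT_ID";
// Pass customConfig to the SpeechTranslator constructor as usual
```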