Azure Speech Translation SDK for JavaScript

This documentation provides detailed information and examples for using the Azure Speech Translation SDK with JavaScript. Enable real-time speech-to-speech and speech-to-text translation in your applications.

Note: This SDK is part of the broader Azure AI Speech services, offering powerful capabilities for voice-enabled applications.

Key Features

  • Real-time speech-to-text translation into one or more target languages.
  • Speech-to-speech translation when paired with the SpeechSynthesizer.
  • Custom speech model support for improved accuracy in specific domains.

Prerequisites

  • An Azure subscription.
  • A Speech service resource created in your Azure subscription.
  • Node.js and npm installed (for Node.js usage).
  • The Azure Speech SDK for JavaScript package installed.

Installation

To install the SDK, use npm or yarn:

npm install microsoft-cognitiveservices-speech-sdk
yarn add microsoft-cognitiveservices-speech-sdk
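
Once installed, load the SDK in the style that matches your environment. A minimal sketch (use one form, not both):

// Node.js (CommonJS)
const sdk = require("microsoft-cognitiveservices-speech-sdk");

// Browser bundlers or Node.js ES modules
import * as sdk from "microsoft-cognitiveservices-speech-sdk";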

Getting Started

Here's a basic example of how to perform speech-to-text translation with the SDK in a Node.js environment.

1. Initialize the Speech Translator

You'll need your Azure Speech service subscription key and region.

// In your JavaScript file (e.g., app.js)
import {
    SpeechTranslator,
    AudioConfig,
    TranslationRecognitionResult,
    SpeechConfig,
    CancellationDetails,
    CancellationReason
} from "microsoft-cognitiveservices-speech-sdk";

// Replace with your subscription key and region
const subscriptionKey = "YOUR_SPEECH_SUBSCRIPTION_KEY";
const serviceRegion = "YOUR_SPEECH_SERVICE_REGION";

// Create speech configuration
const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfig.speechRecognitionLanguage = "en-US"; // Set the source language for recognition

// Create audio configuration (e.g., from the default microphone)
const audioConfig = AudioConfig.fromDefaultMicrophoneInput();

// Create the speech translator
const translator = new SpeechTranslator(speechConfig, audioConfig);

2. Perform Translation

Use the translateSpeechToText method to initiate the translation process. You need to specify the target languages for translation.

// Add target languages (e.g., French and Spanish)
translator.addTargetLanguage("fr-FR");
translator.addTargetLanguage("es-ES");

console.log("Start speaking. Press Enter to stop.");

translator.translateSpeechToText(
    function (result) {
        // Process translation results
        console.log("RECOGNIZED: Text: " + result.transcriptions[0].locale + ": " + result.transcriptions[0].transcript);
        for (var key in result.translations) {
            if (result.translations.hasOwnProperty(key)) {
                console.log("TRANSLATED into " + key + ": " + result.translations[key]);
            }
        }

        if (result.reason === TranslationRecognitionResult.Reason.TranslatedSpeech) {
            // Handle successful translation
        } else if (result.reason === TranslationRecognitionResult.Reason.RecognizingSpeech) {
            // Handle intermediate recognition results
        } else if (result.reason === TranslationRecognitionResult.Reason.NoMatch) {
            console.log("NOMATCH: Speech could not be recognized.");
        } else if (result.reason === TranslationRecognitionResult.Reason.Canceled) {
            const cancellation = CancellationDetails.fromResult(result);
            console.log("CANCELED: Reason=" + cancellation.reason);
            if (cancellation.reason === CancellationReason.Error) {
                console.log("CANCELED: ErrorCode=" + cancellation.errorCode);
                console.log("CANCELED: ErrorDetails=" + cancellation.errorDetails);
            }
        }
    },
    function (err) {
        console.error("ERROR: " + err);
    }
);

// Stop translation when the user presses Enter (stdin delivers a bare newline)
process.stdin.on('data', function (input) {
    if (input.toString().trim() === '') {
        translator.stopContinuousRecognitionAsync();
    }
});

API Reference

SpeechTranslator Class

The main class for performing speech translation.

Constructor

new SpeechTranslator(speechConfig: SpeechConfig, audioConfig: AudioConfig)

Initializes a new instance of the SpeechTranslator class.

Parameters:

  • speechConfig: An instance of SpeechConfig with your service details and recognition language.
  • audioConfig: An instance of AudioConfig specifying the audio input source.
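
For example, reusing the configuration objects from the Getting Started section:

const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
const audioConfig = AudioConfig.fromDefaultMicrophoneInput();
const translator = new SpeechTranslator(speechConfig, audioConfig);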

Methods

addTargetLanguage(language: string): void

Adds a target language for translation. The language string should be a valid BCP-47 language tag (e.g., "fr-FR", "es-ES").
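
For example, to translate into French, Spanish, and German:

translator.addTargetLanguage("fr-FR");
translator.addTargetLanguage("es-ES");
translator.addTargetLanguage("de-DE");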

translateSpeechToText(onEvent: (result: TranslationRecognitionResult) => void, onError: (error: string) => void): Promise<void>

Starts continuous speech-to-text translation. The provided callbacks will be invoked with translation results or errors.

Parameters:

  • onEvent: A callback function that receives TranslationRecognitionResult objects for each recognized or translated utterance.
  • onError: A callback function that receives error messages.
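
In its simplest form:

translator.translateSpeechToText(
    function (result) { /* inspect result.reason and result.translations */ },
    function (err) { console.error("ERROR: " + err); }
);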

stopContinuousRecognitionAsync(): Promise<void>

Stops the continuous speech recognition and translation process.
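
Because the method returns a promise, you can wait for the stop to complete before releasing resources. A minimal sketch:

translator.stopContinuousRecognitionAsync()
    .then(function () {
        console.log("Translation stopped.");
    })
    .catch(function (err) {
        console.error("Failed to stop: " + err);
    });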

Advanced Scenarios

Speech-to-Speech Translation

For speech-to-speech translation, create a second SpeechConfig for synthesis and use a SpeechSynthesizer alongside the translator to speak the translated text.

// Example snippet for speech-to-speech (requires additional setup for the synthesizer)
import { SpeechSynthesizer } from "microsoft-cognitiveservices-speech-sdk";

// ... (SpeechTranslator initialization as above)

// Configure the synthesizer (assuming the same subscription key and region)
const speechConfigSynthesize = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfigSynthesize.speechSynthesisVoiceName = "en-US-AriaNeural"; // Example voice

const synthesizer = new SpeechSynthesizer(speechConfigSynthesize);

// In your onEvent callback for the translator:
// if (result.reason === TranslationRecognitionResult.Reason.TranslatedSpeech) {
//     const translatedText = result.translations["fr-FR"]; // Example target language
//     synthesizer.speakTextAsync(translatedText);
// }

Customization

Integrate with custom speech models for improved accuracy in specific domains. This typically involves training a model in the Azure portal and referencing its deployment ID in your SpeechConfig.
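
A minimal sketch, assuming the SpeechConfig exposes an endpointId property that accepts the deployment ID (verify the property name against your SDK version):

// Point recognition at a custom speech model deployment
const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfig.endpointId = "YOUR_CUSTOM_MODEL_DEPLOYMENT_ID"; // assumption: endpointId carries the deployment ID
speechConfig.speechRecognitionLanguage = "en-US";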

Explore More Examples on Microsoft Docs