Azure Speech Translation SDK for JavaScript

This documentation provides detailed information and examples for using the Azure Speech Translation SDK with JavaScript. Enable real-time speech-to-speech and speech-to-text translation in your applications.

Note: This SDK is part of the broader Azure AI Speech services, offering powerful capabilities for voice-enabled applications.

Key Features

  • Real-time speech-to-text translation into one or more target languages.
  • Speech-to-speech translation when paired with the SpeechSynthesizer.
  • Custom speech model support for improved accuracy in specific domains.

Prerequisites

  • An Azure subscription.
  • A Speech service resource created in your Azure subscription.
  • Node.js and npm installed (for Node.js usage).
  • The Azure Speech SDK for JavaScript package installed.

Installation

To install the SDK, use npm or yarn:

npm install microsoft-cognitiveservices-speech-sdk
yarn add microsoft-cognitiveservices-speech-sdk
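
Once installed, load the SDK in the style that matches your environment. A minimal sketch (use one form, not both):

// Node.js (CommonJS)
const sdk = require("microsoft-cognitiveservices-speech-sdk");

// Browser bundlers or Node.js ES modules
import * as sdk from "microsoft-cognitiveservices-speech-sdk";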

Getting Started

Here's a basic example of how to perform speech-to-text translation with the SDK in a Node.js environment.

1. Initialize the Speech Translator

You'll need your Azure Speech service subscription key and region.

// In your JavaScript file (e.g., app.js)
import {
    SpeechTranslator,
    AudioConfig,
    TranslationRecognitionResult,
    SpeechConfig,
    CancellationDetails,
    CancellationReason
} from "microsoft-cognitiveservices-speech-sdk";

// Replace with your subscription key and region
const subscriptionKey = "YOUR_SPEECH_SUBSCRIPTION_KEY";
const serviceRegion = "YOUR_SPEECH_SERVICE_REGION";

// Create speech configuration
const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfig.speechRecognitionLanguage = "en-US"; // Set the source language for recognition

// Create audio configuration (e.g., from the default microphone)
const audioConfig = AudioConfig.fromDefaultMicrophoneInput();

// Create the speech translator
const translator = new SpeechTranslator(speechConfig, audioConfig);

2. Perform Translation

Use the translateSpeechToText method to initiate the translation process. You need to specify the target languages for translation.

// Add target languages (e.g., French and Spanish)
translator.addTargetLanguage("fr-FR");
translator.addTargetLanguage("es-ES");

console.log("Start speaking. Press Enter to stop.");

translator.translateSpeechToText(
    function (result) {
        // Process translation results
        console.log("RECOGNIZED: Text: " + result.transcriptions[0].locale + ": " + result.transcriptions[0].transcript);
        for (var key in result.translations) {
            if (result.translations.hasOwnProperty(key)) {
                console.log("TRANSLATED into " + key + ": " + result.translations[key]);
            }
        }

        if (result.reason === TranslationRecognitionResult.Reason.TranslatedSpeech) {
            // Handle successful translation
        } else if (result.reason === TranslationRecognitionResult.Reason.RecognizingSpeech) {
            // Handle intermediate recognition results
        } else if (result.reason === TranslationRecognitionResult.Reason.NoMatch) {
            console.log("NOMATCH: Speech could not be recognized.");
        } else if (result.reason === TranslationRecognitionResult.Reason.Canceled) {
            const cancellation = CancellationDetails.fromResult(result);
            console.log("CANCELED: Reason=" + cancellation.reason);
            if (cancellation.reason === CancellationReason.Error) {
                console.log("CANCELED: ErrorCode=" + cancellation.errorCode);
                console.log("CANCELED: ErrorDetails=" + cancellation.errorDetails);
            }
        }
    },
    function (err) {
        console.error("ERROR: " + err);
    }
);

// Stop translation when the user presses Enter (stdin delivers a bare newline)
process.stdin.on('data', function (input) {
    if (input.toString().trim() === '') {
        translator.stopContinuousRecognitionAsync();
    }
});

API Reference

SpeechTranslator Class

The main class for performing speech translation.

Constructor

new SpeechTranslator(speechConfig: SpeechConfig, audioConfig: AudioConfig)

Initializes a new instance of the SpeechTranslator class.

Parameters:

  • speechConfig: An instance of SpeechConfig with your service details and recognition language.
  • audioConfig: An instance of AudioConfig specifying the audio input source.
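
For example, reusing the configuration objects from the Getting Started section:

const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
const audioConfig = AudioConfig.fromDefaultMicrophoneInput();
const translator = new SpeechTranslator(speechConfig, audioConfig);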

Methods

addTargetLanguage(language: string): void

Adds a target language for translation. The language string should be a valid BCP-47 language tag (e.g., "fr-FR", "es-ES").
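
For example, to translate into French, Spanish, and German:

translator.addTargetLanguage("fr-FR");
translator.addTargetLanguage("es-ES");
translator.addTargetLanguage("de-DE");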

translateSpeechToText(onEvent: (result: TranslationRecognitionResult) => void, onError: (error: string) => void): Promise<void>

Starts continuous speech-to-text translation. The provided callbacks will be invoked with translation results or errors.

Parameters:

  • onEvent: A callback function that receives TranslationRecognitionResult objects for each recognized or translated utterance.
  • onError: A callback function that receives error messages.
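
In its simplest form:

translator.translateSpeechToText(
    function (result) { /* inspect result.reason and result.translations */ },
    function (err) { console.error("ERROR: " + err); }
);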

stopContinuousRecognitionAsync(): Promise<void>

Stops the continuous speech recognition and translation process.
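
Because the method returns a promise, you can wait for the stop to complete before releasing resources. A minimal sketch:

translator.stopContinuousRecognitionAsync()
    .then(function () {
        console.log("Translation stopped.");
    })
    .catch(function (err) {
        console.error("Failed to stop: " + err);
    });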

Advanced Scenarios

Speech-to-Speech Translation

For speech-to-speech translation, create a second SpeechConfig for synthesis and use a SpeechSynthesizer alongside the translator to speak the translated text.

// Example snippet for speech-to-speech (requires additional setup for the synthesizer)
import { SpeechSynthesizer } from "microsoft-cognitiveservices-speech-sdk";

// ... (SpeechTranslator initialization as above)

// Configure the synthesizer (assuming the same subscription key and region)
const speechConfigSynthesize = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfigSynthesize.speechSynthesisVoiceName = "en-US-AriaNeural"; // Example voice

const synthesizer = new SpeechSynthesizer(speechConfigSynthesize);

// In your onEvent callback for the translator:
// if (result.reason === TranslationRecognitionResult.Reason.TranslatedSpeech) {
//     const translatedText = result.translations["fr-FR"]; // Example target language
//     synthesizer.speakTextAsync(translatedText);
// }

Customization

Integrate with custom speech models for improved accuracy in specific domains. This typically involves training a model in the Azure portal and referencing its deployment ID in your SpeechConfig.
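
A minimal sketch, assuming the SpeechConfig exposes an endpointId property that accepts the deployment ID (verify the property name against your SDK version):

// Point recognition at a custom speech model deployment
const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, serviceRegion);
speechConfig.endpointId = "YOUR_CUSTOM_MODEL_DEPLOYMENT_ID"; // assumption: endpointId carries the deployment ID
speechConfig.speechRecognitionLanguage = "en-US";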

Explore More Examples on Microsoft Docs