Download: google-stt.zip

This example demonstrates how to implement speech-to-text in C#, converting audio data to text messages. The conversion is based on the Google Cloud Speech API. The input can be an audio file, an audio stream, or real-time human voice. Any audio supported by Ozeki VoIP SIP SDK is accepted. To understand this article, please read the following tutorial as well:
How to configure Google Cloud Platform to your Ozeki VoIP SDK projects

You can choose from any of the languages supported by the Google Cloud Speech API.

Internet access is required. To try this example, you need to have Ozeki VoIP SIP SDK installed, and a reference to OzekiSDK.dll must be added to your Visual Studio project.

Figure 1 - Speech to Text conversion

What is Speech-to-Text used for?

A speech-to-text (STT) system converts spoken language into text. Users can provide speech input and save the result as text files, so the files can be read or analysed later. You can use the text results for several purposes. For example, you could store phone conversations in written form. You can also store the text in an SQL database, forward it by e-mail or SMS, or search it for keywords.

Speech-to-Text refers to the ability to listen to an audio stream and convert it to a text message. STT engines for different languages, dialects and specialized vocabularies are available through the Google Cloud Speech API. Check whether the language you need is supported.
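The Google Cloud Speech API identifies languages by BCP-47 tags such as en-GB. The sample project wraps these tags in its own GoogleLanguage type, whose definition is not shown in this article, so the following is only a minimal sketch of how such a mapping might look. The names DemoLanguage, DemoLanguageExt, GetCode and FromCode are hypothetical and are not part of Ozeki VoIP SIP SDK or the Google libraries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the sample's GoogleLanguage enum.
enum DemoLanguage
{
    English_United_Kingdom,
    English_United_States,
    German_Germany
}

static class DemoLanguageExt
{
    // BCP-47 tags of the kind the Cloud Speech API expects.
    static readonly Dictionary<DemoLanguage, string> Codes =
        new Dictionary<DemoLanguage, string>
        {
            { DemoLanguage.English_United_Kingdom, "en-GB" },
            { DemoLanguage.English_United_States, "en-US" },
            { DemoLanguage.German_Germany, "de-DE" }
        };

    // Enum value -> language tag.
    public static string GetCode(this DemoLanguage language)
    {
        return Codes[language];
    }

    // Language tag -> enum value (case-insensitive); throws for unknown tags.
    public static DemoLanguage FromCode(string code)
    {
        foreach (var pair in Codes)
        {
            if (string.Equals(pair.Value, code, StringComparison.OrdinalIgnoreCase))
                return pair.Key;
        }
        throw new ArgumentException("Unsupported language code: " + code, nameof(code));
    }
}

class DemoLanguageProgram
{
    static void Main()
    {
        // Prints: en-GB
        Console.WriteLine(DemoLanguage.English_United_Kingdom.GetCode());
    }
}
```

Keeping the mapping in one table means an unsupported tag fails early with a clear exception instead of being sent to the server.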

How to implement Google speech-to-text feature in your Ozeki VoIP SIP SDK project?

First you need to register with the Google Cloud Platform, then set the API access credentials on your operating system and install the Google Cloud Speech SDK. After the installation has finished, you need to reboot your computer before testing the example code in Ozeki VoIP SIP SDK. Here is a detailed tutorial on how you can set up and try your examples.
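Before running the samples, it can help to confirm that the credentials are visible to your process. The sketch below assumes the credentials were configured through the GOOGLE_APPLICATION_CREDENTIALS environment variable, which the Google Cloud client libraries read by default and which should point to a service-account JSON key file. The class and method names (CredentialCheck, Describe) are made up for this illustration.

```csharp
using System;
using System.IO;

static class CredentialCheck
{
    // Variable the Google Cloud client libraries read by default.
    public const string VariableName = "GOOGLE_APPLICATION_CREDENTIALS";

    // Returns a short diagnostic for the given variable value.
    public static string Describe(string path)
    {
        if (string.IsNullOrWhiteSpace(path))
            return "not set";
        if (!File.Exists(path))
            return "file not found";
        return "ok";
    }
}

class CredentialCheckProgram
{
    static void Main()
    {
        string path = Environment.GetEnvironmentVariable(CredentialCheck.VariableName);
        Console.WriteLine(CredentialCheck.VariableName + ": " + CredentialCheck.Describe(path));
    }
}
```

If the result is "not set" right after configuring the variable, the process was probably started before the change took effect, which is why a reboot (or at least restarting Visual Studio) is recommended.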

The sample projects can be downloaded from here (google-stt.zip). Each project contains a basic example that combines the functionality of our SDK with the features provided by the Google Cloud Speech API, presented in a simple C# class, GoogleSTT. The GoogleSTT class demonstrates how to implement Speech-to-Text functionality with OzekiSDK on top of the Google Cloud Speech API. The language is given as a constructor parameter, and a wide variety of languages is available (for example, an instance is created at lines 25 to 27 of the Program.cs example below). This instance can be attached to a call through the appropriate sender object; in this example it is connected to the microphone (line 29). The instance in the current example recognises United Kingdom English speech arriving through the microphone and converts it to text messages.

Microphone signals are converted to text in C# using the Google Cloud Speech API

'Program.cs'

using Ozeki.Media;
using System;

namespace Google_Speech_To_Text_V1
{
    class Program
    {
        static MediaConnector connector;

        static Microphone microphone;

        static GoogleSTT googleSTT;

        public static void Main(string[] args)
        {
            Console.OutputEncoding = System.Text.Encoding.UTF8;

            connector = new MediaConnector();
            microphone = Microphone.GetDefaultDevice();
         
            var format = new WaveFormat(48000, 16, 1);

            microphone.ChangeFormat(format);

            googleSTT = new GoogleSTT(
                GoogleLanguage.English_United_Kingdom,
                format.AsVoIPMediaFormat());

            connector.Connect(microphone, googleSTT);

            microphone.Start();

            googleSTT.Start();

            Console.WriteLine("Speak !!");

            Console.ReadLine();

            Console.WriteLine("Disconnect");

            connector.Disconnect(microphone, googleSTT);
           
            Console.WriteLine("Google dispose");

            googleSTT.Dispose();
            googleSTT = null;

            Console.WriteLine("microphone dispose");

            microphone.Dispose();
            microphone = null;

            Console.WriteLine("connector dispose");

            connector.Dispose();
            connector = null;
        }
    }
}
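Program.cs configures the microphone with WaveFormat(48000, 16, 1). Assuming the three arguments are sample rate, bits per sample and channel count, this is 48 kHz, 16-bit, mono PCM, which matches the Linear16 encoding requested in GoogleSTT.cs. The following sketch only works out how much data such a stream produces; PcmMath and the 20 ms chunk length are illustrative assumptions, not values taken from the SDK.

```csharp
using System;

static class PcmMath
{
    // Bytes of uncompressed PCM audio produced per second.
    public static int BytesPerSecond(int sampleRate, int bitsPerSample, int channels)
    {
        return sampleRate * (bitsPerSample / 8) * channels;
    }

    // Bytes in a chunk of the given duration in milliseconds.
    public static int BytesPerChunk(int sampleRate, int bitsPerSample, int channels, int milliseconds)
    {
        return BytesPerSecond(sampleRate, bitsPerSample, channels) * milliseconds / 1000;
    }
}

class PcmMathProgram
{
    static void Main()
    {
        // 48000 samples/s * 2 bytes * 1 channel = 96000 bytes/s
        Console.WriteLine(PcmMath.BytesPerSecond(48000, 16, 1));

        // A hypothetical 20 ms chunk: 96000 * 20 / 1000 = 1920 bytes
        Console.WriteLine(PcmMath.BytesPerChunk(48000, 16, 1, 20));
    }
}
```

The sample rate passed to the recognizer has to match the rate of the audio actually being sent, which is why the same format object is given to both the microphone and GoogleSTT.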

'GoogleSTT.cs'

The 'GoogleSTT.cs' example class provides Speech-to-Text functionality through the Google Cloud Speech API. You can write your own classes along the same lines.

Lines 80 to 86 handle the results of the speech-to-text conversion. 'result.Alternatives' is a list of objects containing every candidate transcript together with its confidence level. When the Google Cloud servers convert speech, they may interpret it as several different texts and assign each of them a confidence level from 0.0 to 1.0.

This example selects the text with the highest confidence value and writes it to the console.

using Ozeki.Media;
using Google.Cloud.Speech.V1Beta1;

using System;
using System.Threading.Tasks;
using System.Threading;
using System.Linq;

namespace Google_Speech_To_Text_V1
{
    class GoogleSTT : AudioReceiver
    {
        SpeechClient speech;
        SpeechClient.StreamingRecognizeStream streamingCall;

        Task printResponses;

        AudioFormat _format;

        private string _languageCode;

        private GoogleLanguage _language;
        public GoogleLanguage Language
        {
            get { return _language; }
            set
            {
                _language = value;
                _languageCode = _language.GetCode();
            }
        }

        public GoogleSTT(string languageCode)
            : this(GoogleLanguageExt.GetGoogleLanguageFromCode(languageCode),
            		new AudioFormat())
        { }

        public GoogleSTT(GoogleLanguage languageCode, AudioFormat format)
        {
            Language = languageCode;

            SetReceiveFormats(format);

            _format = format;

            Init();
        }

        private void Init()
        {
            speech = SpeechClient.Create();

            streamingCall = speech.StreamingRecognize();

            streamingCall.WriteAsync(
               new StreamingRecognizeRequest()
               {
                   StreamingConfig = new StreamingRecognitionConfig()
                   {
                       Config = new RecognitionConfig()
                       {
                           Encoding =
                           RecognitionConfig.Types.AudioEncoding.Linear16,
                           SampleRate = _format.SampleRate,
                           LanguageCode = _languageCode,
                           MaxAlternatives = 5
                       },
                       InterimResults = true,
                   }
               }).Wait(); // finish sending the config before audio is written

            printResponses = Task.Run(async () =>
            {
                while (await streamingCall.ResponseStream.MoveNext(
                    default(CancellationToken)))
                {
                    foreach (var result in streamingCall.ResponseStream
                        .Current.Results)
                    {
                        if (result.IsFinal)
                        {
                            var top = result.Alternatives
                                .OrderByDescending(x => x.Confidence).First();

                            Console.WriteLine(top.Transcript);
                        }
                    }
                }
            });
        }

        object writeLock = new object();

        public bool IsRunning { get; private set; }

        public void Stop()
        {
            IsRunning = false;
        }

        public void Start()
        {
            IsRunning = true;
        }

        protected override void OnDataReceived(object sender, AudioData data)
        {
            if (!IsRunning) return;

            lock (writeLock)
            {
                var request = new StreamingRecognizeRequest();
                request.AudioContent = Google.Protobuf.ByteString
                            .CopyFrom(data.Data, 0, data.Data.Length);

                try
                {
                    streamingCall.WriteAsync(request).Wait();
                }
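                catch (Exception)
                {
                    // The write failed (for example, the stream was closed by
                    // the server), so close this stream and open a new one.
                    streamingCall.WriteCompleteAsync();
                    Init();
                }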
            }
        }

        protected override void Dispose(bool disposing)
        {
            Stop();

            if (printResponses != null)
            {
                printResponses = null;
            }

            if (streamingCall != null)
            {
                streamingCall = null;
            }

            if (speech != null)
            {
                speech = null;
            }

            base.Dispose(disposing);
        }
    }
}
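The confidence-based selection described for GoogleSTT.cs can be tried in isolation, without a Google account or a microphone. The sketch below uses a hypothetical DemoAlternative class in place of the API's alternative objects, and the transcripts and confidence values are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one recognition alternative.
class DemoAlternative
{
    public string Transcript { get; set; }
    public float Confidence { get; set; }
}

static class AlternativePicker
{
    // Returns the alternative with the highest confidence,
    // or null when the list is empty.
    public static DemoAlternative Best(IEnumerable<DemoAlternative> alternatives)
    {
        return alternatives
            .OrderByDescending(a => a.Confidence)
            .FirstOrDefault();
    }
}

class AlternativePickerProgram
{
    static void Main()
    {
        // Invented sample data.
        var alternatives = new List<DemoAlternative>
        {
            new DemoAlternative { Transcript = "wreck a nice beach", Confidence = 0.41f },
            new DemoAlternative { Transcript = "recognize speech", Confidence = 0.87f },
            new DemoAlternative { Transcript = "recognise peach", Confidence = 0.12f }
        };

        var best = AlternativePicker.Best(alternatives);

        // Prints: recognize speech
        Console.WriteLine(best.Transcript);
    }
}
```

Sorting in descending order matters here: an ascending OrderBy followed by First would return the least likely transcript instead.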
