
Navigation in the IVR tree using Voice recognition

Download: VoiceNavIVR.zip

Besides DTMF navigation, voice recognition is another way to navigate the IVR tree. You can choose either option, because Ozeki VoIP SIP SDK supports both. This guide shows how to implement navigation in the IVR tree using voice recognition with Ozeki VoIP SIP SDK.

Introduction

An IVR (Interactive Voice Response) system is a telecommunication software solution that uses a tree-structured menu to guide the caller towards the solution to their problem. The main idea is that some common problems and questions can be answered without the help of a human operator.


Figure 1 - IVR voice navigation

The IVR tree can be navigated with keypad buttons (DTMF signals) or with voice recognition. In the latter case the user navigates the IVR tree with spoken words, so the program needs to implement voice recognition that supports the navigation.

Ozeki VoIP SIP SDK supports the necessary techniques, so you can easily create your own voice-navigated IVR tree system.

The following program code relies on Ozeki VoIP SIP SDK, so you need to download and install the SDK on your computer before using it. You also need Visual Studio 2010 or a compatible IDE and the .NET Framework installed on your system, as the program code below is written in C#.

The example program introduced in this article is an auto-answering softphone that registers a phone line in the PBX with the SIP data stored in the application configuration file. The IVR tree is built by parsing a well-structured XML file, and the IVR node messages are read out using text-to-speech conversion. The IVR navigation is implemented using speech-to-text voice recognition.

The user interface of the example program is shown in Figure 2. The TreeView on the left side shows the IVR tree. The currently selected node is displayed in red on the GUI.


Figure 2 - Graphical User Interface for the voice recognition IVR example program

The SIP registration information of this sample application is stored in the application configuration (app.config) file. Code 1 shows the structure of the app.config file and how the SIP data is added to it.
Note that you need to replace these values with your own SIP registration data for the program to work properly.

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="IsRegRequired" value="true"/>
    <add key="displayname" value="oz879"/>
    <add key="username" value="oz879"/>
    <add key="registername" value="oz879"/>
    <add key="regpass" value="oz879"/>
    <add key="domainhost" value="192.168.112.100"/>
    <add key="port" value="5060"/>
    <add key="nat" value="0"/>
  </appSettings>
</configuration>

Code 1 - The application configuration file

The application configuration file can be read in a C# program with the built-in ConfigurationManager class. This class is in the System.Configuration namespace, which you need to add to the using directives (the project also needs a reference to the System.Configuration assembly).

ConfigurationManager.AppSettings is a NameValueCollection that returns the values for the keys defined in the app.config file. The returned values are strings, so in some cases you need to parse them to bool or Int32.

When the parsing is done, you can register the phone line in the PBX. First, create an IPhoneLine instance using the previously initialized softphone object. Then subscribe to the phone line's state change event, and finally register the line, also through the softphone object. This means that the register() method must be called after the softphone initialization. If there is no initialized softphone object, this method throws a NullReferenceException.

var cm = ConfigurationManager.AppSettings;
IsRegRequired = bool.Parse(cm["IsRegRequired"]);
displayname = cm["displayname"];
username = cm["username"];
registername = cm["registername"];
regpass = cm["regpass"];
domainhost = cm["domainhost"];
port = Int32.Parse(cm["port"]);

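// Map the numeric "nat" setting to a NAT traversal method.
switch (Int32.Parse(cm["nat"]))
{
    case 0:
        natTraversal = NatTraversalMethod.None;
        break;
    case 1:
        natTraversal = NatTraversalMethod.STUN;
        break;
    case 2:
        natTraversal = NatTraversalMethod.TURN;
        break;
    default:
        // Unknown values fall back to no NAT traversal.
        natTraversal = NatTraversalMethod.None;
        break;
}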

SIPAccount sa = new SIPAccount(IsRegRequired, displayname, username, registername, regpass, domainhost, port);
NatConfiguration nc = new NatConfiguration(natTraversal);

phoneLine = softPhone.CreatePhoneLine(sa, nc);
phoneLine.PhoneLineStateChanged += new EventHandler<VoIPEventArgs<PhoneLineState>>(phoneLine_PhoneLineInformation);

softPhone.RegisterPhoneLine(phoneLine);

Code 2 - Reading the registration data
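
Code 2 assumes that every key is present and well formed: bool.Parse and Int32.Parse throw an exception on a missing or malformed value. The following is a minimal sketch of a more defensive way to read the settings. The SettingsReader class and its method names are hypothetical helpers written for this illustration and are not part of Ozeki VoIP SIP SDK.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical helper for illustration; not part of Ozeki VoIP SIP SDK.
public static class SettingsReader
{
    // Returns the value stored under key, or throws a descriptive
    // error when the key is missing or empty.
    public static string GetRequired(NameValueCollection settings, string key)
    {
        string value = settings[key];
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("Missing configuration key: " + key);
        return value;
    }

    // Parses an integer setting; falls back to defaultValue when the
    // key is absent or the text is not a valid integer.
    public static int GetInt(NameValueCollection settings, string key, int defaultValue)
    {
        int parsed;
        return int.TryParse(settings[key], out parsed) ? parsed : defaultValue;
    }

    // Parses a boolean setting ("true"/"false"); falls back to
    // defaultValue when the key is absent or malformed.
    public static bool GetBool(NameValueCollection settings, string key, bool defaultValue)
    {
        bool parsed;
        return bool.TryParse(settings[key], out parsed) ? parsed : defaultValue;
    }
}
```

Because ConfigurationManager.AppSettings is a NameValueCollection, it can be passed in directly, for example SettingsReader.GetInt(ConfigurationManager.AppSettings, "port", 5060).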

The IVR tree structure is defined in the IVRTree.xml file. This is a convenient approach, as you only need to parse the XML file to build the tree, and you can modify the tree structure by changing the XML file.

The tree structure definition .xml file can be seen in Code 3. An IVR node always has a name, a message and a navigation instruction word, and it can have any number of children, which are also node elements in the .xml file.

<?xml version="1.0" encoding="utf-8" ?>
<node>
  <name>Node name</name>
  <message>Node message</message>
  <instruction>Instruction word</instruction>
  <children>
    <node>...

Code 3 - The tree structure definition in .xml file

The tree structure is built by a special XML parser class that uses the standard XElement object for the parsing. The parser method shown in Code 4 is recursive: it reads the XML element it receives as a parameter and builds a tree structure of IVRNode objects. The IVRNode class is also defined in this example project.

// XElement.Value is already a string, so no ToString() call is needed.
IVRNode node = new IVRNode(element.Element("name").Value, element.Element("message").Value, parentNode);
foreach (XElement childnode in element.Element("children").Elements("node"))
{
        try
        {
            // Read the navigation word once and reuse it.
            string instruction = childnode.Element("instruction").Value;
            node.AddChild(instruction, parse(childnode, node));
            if (!instructions.Contains(instruction))
                instructions.Add(instruction);
        }
        catch (Exception ex)
        {
           ...
        }
}
return node;

Code 4 - XML parsing method
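
The recursive parsing idea can also be tried without the GUI or the SDK. The sketch below is a self-contained variant under stated assumptions: MenuNode and MenuXmlParser are hypothetical stand-ins for the article's IVRNode and TreeXMLParser, the vocabulary list is passed in as a parameter instead of being a static field, and a node without a children element is treated as a leaf.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical stand-in for the article's IVRNode, without the GUI TreeNode.
public class MenuNode
{
    public string Name;
    public string Message;
    public MenuNode Parent;
    public Dictionary<string, MenuNode> Children = new Dictionary<string, MenuNode>();
}

// Hypothetical stand-in for the article's TreeXMLParser.
public static class MenuXmlParser
{
    // Recursively converts a <node> element into a MenuNode subtree and
    // records every instruction word it meets in the vocabulary list.
    public static MenuNode Parse(XElement element, MenuNode parent, List<string> vocabulary)
    {
        MenuNode node = new MenuNode();
        // The explicit string conversion yields null for a missing element.
        node.Name = (string)element.Element("name");
        node.Message = (string)element.Element("message");
        node.Parent = parent;

        // Assumption: a leaf may omit <children>; treat that as "no children".
        XElement children = element.Element("children");
        if (children == null)
            return node;

        foreach (XElement child in children.Elements("node"))
        {
            string instruction = (string)child.Element("instruction");
            // Skip children that cannot be addressed unambiguously.
            if (string.IsNullOrEmpty(instruction) || node.Children.ContainsKey(instruction))
                continue;

            node.Children.Add(instruction, Parse(child, node, vocabulary));
            if (!vocabulary.Contains(instruction))
                vocabulary.Add(instruction);
        }
        return node;
    }
}
```

A call such as MenuXmlParser.Parse(XElement.Load("IVRTree.xml"), null, vocabulary) would then return the root node and fill the vocabulary list with the navigation words.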

The IVRNode class (Code 5) defines the nodes of the IVR menu tree in this example program. A custom class is needed because the TreeNode used for displaying the structure on the GUI cannot store all the information this example program keeps for an IVR menu node.

Each IVRNode object stores the node name and message read from the .xml file. The node also stores its parent node, and keeps its child nodes in a Dictionary. The Dictionary key is the navigation word for the child node and the value is an IVRNode object. The node object also contains a TreeNode object for displaying and handling the node on the GUI.

The node can be initialized with only the name and the message. In that case the parent of the node is set to null, which means that the node is the root element of the tree. You can also specify the parent node when creating the tree node. In this case you define a tree node that is not the root.

You can add child nodes to the tree node by calling the AddChild method and get a specific child by calling the getChild method. The getChild method takes a navigation string as a parameter and returns the child node that has the given string as its key in the children Dictionary, or null if there is no such child.

class IVRNode
    {
        public TreeNode treeNode;
        IVRNode parent;
        Dictionary<string, IVRNode> children = new Dictionary<string,IVRNode>();
        string name;
        public string message;

        public IVRNode(string name, string message)
        {
            this.name = name;
            this.message = message;
            this.parent = null;
            treeNode = new TreeNode(name);
        }

        public IVRNode(string name, string message, IVRNode parent)
        {
            this.name = name;
            this.message = message;
            this.parent = parent;
            treeNode = new TreeNode(name);
        }

        public void AddChild(string instruction, IVRNode child)
        {
            children.Add(instruction, child);
            treeNode.Nodes.Add(child.treeNode);
        }

        public IVRNode getChild(string instruction)
        {
            if (children.ContainsKey(instruction))
                return children[instruction];
            return null;
        }


    }

Code 5 - IVR tree node definition
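
Two details of the children Dictionary in Code 5 are worth noting: Dictionary.Add throws an ArgumentException when the same navigation word is added twice, and the default string comparison is case sensitive. The sketch below shows one possible way to handle both. ChildIndex is a hypothetical helper written for this illustration, and whether the recognizer can return words in a different letter case than the vocabulary is an assumption, not something the article states.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; TNode stands for the article's IVRNode.
public class ChildIndex<TNode> where TNode : class
{
    // OrdinalIgnoreCase makes "Sales" and "sales" address the same child.
    private readonly Dictionary<string, TNode> children =
        new Dictionary<string, TNode>(StringComparer.OrdinalIgnoreCase);

    // Returns false (instead of throwing like Dictionary.Add) when the
    // instruction word is missing or already used by a sibling.
    public bool TryAdd(string instruction, TNode child)
    {
        if (string.IsNullOrEmpty(instruction) || children.ContainsKey(instruction))
            return false;
        children.Add(instruction, child);
        return true;
    }

    // Performs a single lookup; returns null when no child matches the word.
    public TNode Get(string instruction)
    {
        TNode child;
        if (instruction != null && children.TryGetValue(instruction, out child))
            return child;
        return null;
    }
}
```

Using TryGetValue also avoids the double lookup of ContainsKey followed by the indexer, although for a small menu tree the difference is negligible.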

The IVR tree is built by a simple method in the Softphone class (Code 6). This method is called before the softphone initialization. It builds the tree and sets up the TreeView GUI element too.

The method uses the XML parser class to build the tree from the .xml file. The parsing also collects the possible navigation words in the static instructions List, which is the basis for the speech-to-text recognition. When the tree is built, the instruction words need to be extended with two special navigation words, "again" and "main", which can be used in every tree node for repeating the node message or returning to the main menu.

After building the tree and setting the navigation words, the voice recognizer object can be initialized. The program also needs to subscribe to the recognizer's WordsRecognized event, whose event handler does the actual IVR tree navigation.

The method also sets the GUI TreeView according to the built up IVR tree and displays the root element as the selected one.

root = TreeXMLParser.parse(TreeXMLParser.treedoc, null);
TreeXMLParser.instructions.Add("again");
TreeXMLParser.instructions.Add("main");

navigator = SpeechToText.CreateInstance(new AudioFormat(8000, 1, 16, 20), TreeXMLParser.instructions.ToArray<string>());
navigator.WordsRecognized += (SpeechToText_WordsRecognized);

treeView1.Nodes.Add(root.treeNode);
selectedNode = root;
selectedTreeNode = root.treeNode;
try
{
        treeView1.SelectedNode = selectedTreeNode;
        treeView1.SelectedNode.ForeColor = Color.Red;
}
catch (Exception)
{
...
}

Code 6 - Building the IVR tree

This IVR example is a standard softphone program with some changes in the code, so you need to handle calls and call states. The program automatically accepts incoming calls. As for the call states, in the InCall state you need to make the proper MediaHandler connections for the program to work properly (Code 7).

The PhoneCallAudioSender object needs to be connected to the TextToSpeech object that reads out the IVR node messages, and the PhoneCallAudioReceiver and the SpeechToText objects need to be connected to each other. When both the sender and the receiver are attached to the call, the TextToSpeech object can be started and the main menu message is read out.

You can see that there is no microphone or speaker used in this example program. This is because the program works automatically and does not need human interaction. If it were a fully featured IVR program, it would contain call transferring to hand the line over to a human operator, but this example only demonstrates how you can navigate the IVR tree with voice instructions.

private void call_CallStateChanged(object sender, VoIPEventArgs<CallState> e)
        {
            InvokeGUIThread(() => { label1.Text = e.Item.ToString(); });

            switch (e.Item)
            {
                case CallState.InCall:

                    connector.Connect(ivrReader, mediaSender);
                    connector.Connect(mediaReceiver, navigator);

                    mediaSender.AttachToCall(call);
                    mediaReceiver.AttachToCall(call);

                    ivrReader.AddAndStartText(selectedNode.message);

                    break;

Code 7 - Making MediaHandler connections

The actual IVR tree navigation is implemented in the method shown in Code 8. This is the event handler for the speech-to-text recognizer object's WordsRecognized event. The method works with the vocabulary set for the speech-to-text recognizer during the initialization and steps to the instructed node. It also sets the selected node on the GUI.

You can always make the program repeat the current node's message by saying "again". If you are not in the main menu, you can return to it by saying "main" at any time.

When you want to step to a child node, you need to say the navigation instruction word for that node clearly. If there is a child node with that navigation word, the program jumps to it. If there is not, the message that is being read out continues.

When a new tree node is selected, the TreeView object on the GUI will also change the selected node. The selected node is always shown by setting the text color for that node to red.

void SpeechToText_WordsRecognized(object sender, VoIPEventArgs<IEnumerable<string>> e)
{
        foreach (string word in e.Item)
        {
            InvokeGUIThread(() =>
                {
                    label1.Text = "Recognized incoming word " + word;
                });
            //you can restart the actual message by saying again
            if (word == "again")
            {
                ivrReader.StopStreaming();
                ivrReader.AddAndStartText(selectedNode.message);
            }
            //saying main gets you back to the main menu
            if (word == "main" && !atRoot)
            {
                atRoot = true;
                selectedNode = root;
                selectedTreeNode = root.treeNode;
                InvokeGUIThread(() =>
                {
                    treeView1.SelectedNode.ForeColor = Color.Black;
                    treeView1.SelectedNode = selectedTreeNode;
                    treeView1.SelectedNode.ForeColor = Color.Red;
                });
                ivrReader.StopStreaming();
                ivrReader.AddAndStartText(selectedNode.message);
            }
            //stepping into a submenu if exists
            if (selectedNode.getChild(word) != null)
            {
                atRoot = false;
                ivrReader.StopStreaming();
                selectedNode = selectedNode.getChild(word);
                selectedTreeNode = selectedNode.treeNode;
                InvokeGUIThread(() =>
                {
                    treeView1.SelectedNode.ForeColor = Color.Black;
                    treeView1.SelectedNode = selectedTreeNode;
                    treeView1.SelectedNode.ForeColor = Color.Red;
                });

                ivrReader.AddAndStartText(selectedNode.message);
            }
        }
}

Code 8 - Voice recognition for IVR navigation
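
The decision logic of Code 8 can be separated from the GUI updates and the text-to-speech calls, which makes it easier to test on its own. The following is a minimal sketch of that idea. NavNode and IvrNavigator are hypothetical names introduced for this illustration; the rules mirror Code 8, including that "main" does nothing when the caller is already in the main menu.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's IVRNode.
public class NavNode
{
    public string Message;
    public Dictionary<string, NavNode> Children = new Dictionary<string, NavNode>();
}

// Hypothetical helper holding only the navigation rules of Code 8.
public static class IvrNavigator
{
    // Returns the node that should be selected after hearing word.
    // mustReplay tells the caller whether the selected node's message
    // should be (re)started by the text-to-speech reader.
    public static NavNode Step(NavNode root, NavNode current, string word, out bool mustReplay)
    {
        // "again" repeats the message of the current node.
        if (word == "again")
        {
            mustReplay = true;
            return current;
        }

        // "main" returns to the root, but is ignored when already there.
        if (word == "main")
        {
            mustReplay = !object.ReferenceEquals(current, root);
            return root;
        }

        // Any other word selects a child node if one is registered for it.
        NavNode child;
        if (word != null && current.Children.TryGetValue(word, out child))
        {
            mustReplay = true;
            return child;
        }

        // Unknown word: stay where we are and let the message continue.
        mustReplay = false;
        return current;
    }
}
```

The event handler would then only call Step for each recognized word, update the TreeView selection, and restart the reader when mustReplay is true.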

This IVR navigation example program demonstrated how you can navigate the IVR tree using voice recognition with Ozeki VoIP SIP SDK. The program is only a sample application. You can extend it with a human operator interface or with any other tool you need in your own solution, but the main concept can be clearly seen in the sample.

If you have any questions or need assistance, please contact us at info@voip-sip-sdk.com

You can find a suitable Ozeki VoIP SIP SDK license for your IVR system on the Pricing and licensing information page.

Prerequisites

Operating system: Windows 8, Windows 7, Vista, 200x, XP
System memory: 512 MB+
Free disk space: 100 MB+
Development environment: Visual Studio 2010 (Recommended), Visual Studio 2008, Visual Studio 2005
Programming language: C#.NET
Supported .NET framework: .NET Framework 4.5, .NET Framework 4.0, .NET Framework 3.5 SP1
Software development kit: OZEKI VoIP SIP SDK (Download)
VoIP connection: 1 SIP account