Creating an AI Chatbot Powered by IBM Watson

Author: Kit Armstrong, Full Stack Developer

The Problem

Lately, it seems that everywhere you turn companies are introducing chatbots to handle interactions with their clients. While chatbots were mainstreamed by big-name companies like Facebook and Slack, the recent explosion of AI services means it's easier than ever for any company to introduce its own chatbots on its own platforms and websites.

Case in point: I recently found myself working on a project where the client wanted to combine several AI services into a chatbot that would help users fill out the forms required for legal proceedings. Because these forms are difficult and use challenging legal vocabulary, the chatbot had to provide assistance to a wide variety of users.

The main focus of this chatbot is to allow users to complete the required forms by interacting with the bot in their preferred language. Along with completing the form, the chatbot needed to be able to answer general questions that users may have about the legal process they are working through. To create this chatbot I opted to use three services provided by IBM Watson: Assistant, Speech-to-Text, and Text-to-Speech.

AI Chatbot Powered by IBM Watson

Development

Assistant Configuration

Watson Assistant is IBM’s platform for creating conversation-based interactions. It leverages natural language processing and continually improves its interactions the more it is used. From a developer’s perspective, Watson Assistant is extremely easy to work with when creating custom applications.

A new instance of Assistant can be created through IBM Bluemix (since rebranded as IBM Cloud), IBM’s cloud and AI platform. Once you have an Assistant workspace up and running, it is time to start shaping the interactions your users will have within it. A conversation in Assistant is composed of three components: intents, entities, and dialogues.

Intents capture the purpose or goal of a user’s input, and are trained with example phrasings a user might type. When Assistant recognizes an intent, it can select the correct dialogue to respond with and shape the flow of the conversation accordingly.

Intent for a user requesting store hours.

Entities represent specific pieces of information in the user’s input that are relevant to the request. While intents represent the action a user wants to take, entities represent the specific nouns associated with the action. For instance, if a user’s intent is to get the date of a holiday, then the relevant holiday entity is required.

Holiday entity

Dialogues use intents and entities to build the responses returned to the user. Sometimes a single intent does not carry all of the information a dialogue needs to form a response. In these cases, dialogues can be nested as children of other dialogues: when a dialogue is missing required information, it responds with a child dialogue that requests the additional details. This is how the general flow of a conversation is created.

Hours of operation dialogue

Hours of operation dialogue

Those are the three general components required to create a conversation with Assistant. Of course, there are many other configurations that can be added to really enhance the ability of your conversation, but that is a topic for another article!
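To make the three components concrete, here is a sketch of how they appear together in a workspace definition, following the shape of the Assistant v1 workspace JSON (the intent, entity, and node names below are illustrative, not from the actual project):

```json
{
  "intents": [
    {
      "intent": "store_hours",
      "examples": [
        { "text": "When are you open?" },
        { "text": "What are your hours today?" }
      ]
    }
  ],
  "entities": [
    {
      "entity": "holiday",
      "values": [
        { "value": "Christmas", "synonyms": ["Xmas", "Christmas Day"] }
      ]
    }
  ],
  "dialog_nodes": [
    {
      "dialog_node": "node_store_hours",
      "conditions": "#store_hours",
      "output": {
        "text": { "values": ["We are open 9am to 5pm, Monday to Friday."] }
      }
    }
  ]
}
```

The dialog node's condition (`#store_hours`) fires when the matching intent is recognized; entity references in conditions use the `@` prefix instead.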

Assistant SDK

Once you have configured your Assistant, it is time to start interacting with it. IBM provides SDKs in a number of different languages that handle communication with Watson. As I was going to be using React to create the chatbot, with a Node backend, I opted to use the Node.js SDK. Creating a new connection and sending a message is easy with the SDK:

const AssistantV1 = require('watson-developer-cloud/assistant/v1');
const assistant = new AssistantV1({
  username: process.env.ASSISTANT_ENG_USER,
  password: process.env.ASSISTANT_ENG_PASS,
  version:  process.env.ASSISTANTVERSION
});
const sendMessage = (message, context) => {
  return new Promise((resolve, reject) => {
    const payload = {
      workspace_id: process.env.ASSISTANT_ENG_WORKSPACE,
      input: {
        text: message
      },
      context: context || undefined
    };
    assistant.message(payload, (err, response) => {
      if (err) {
        reject(err);
      } else {
        resolve(response);
      }
    });
  });
};

Then I created a route in Express that would allow me to communicate with the sendMessage() function from my React app. At that point, I was ready to start creating the frontend.
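The route itself is not shown above; as a minimal sketch, the handler might look like this, written as a plain function so it can be mounted with `app.post('/api/message', makeChatHandler(sendMessage))` (the route path and function names are hypothetical):

```javascript
// Hypothetical Express-style route handler wrapping the sendMessage()
// function from the previous snippet. Takes the Assistant client function
// as a parameter so it is easy to test in isolation.
function makeChatHandler(sendMessage) {
  return (req, res) => {
    const { message, context } = req.body;
    // Forward the message to Assistant and relay its JSON response,
    // surfacing failures as a 500 instead of hanging the request.
    sendMessage(message, context)
      .then(response => res.json(response))
      .catch(err => res.status(500).json({ error: err.message }));
  };
}
```

Because the handler only depends on `req.body` and the two response methods it calls, it can be exercised with stub objects before wiring it into Express.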

React Frontend

I opted to use React to create the frontend as it allowed me to easily create a small, standalone app that could be deployed over any existing application. I also used Redux in order to easily maintain the conversation history and language selection.

Sending a message to the endpoint I created earlier was easy:

sendMessage(message, language) {
  // Get the latest conversation context if it exists.
  const context = this.getLastContext();

  // Add the outgoing message to storage if it is not blank.
  if (message) {
    this.addMessage(message, context, OUTGOING_MESSAGE);
  }
  const data = {
    message: message,
    language: language,
    context: context
  };
  this.setState({unsentMessage: '', loadingResponse: true});
  fetch(ASSISTANT_URL, {
    method: 'POST',
    headers: {
      'Accept': 'application/json',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(data)
  })
  .then(response => response.json())
  .then(json => {
    this.setState({loadingResponse: false});
    this.handleChatResponse(json);
  })
  .catch(err => {
    // Clear the loading state on network or parsing failures
    // instead of leaving the spinner up indefinitely.
    this.setState({loadingResponse: false});
    console.error(err);
  });
}

This function sends a message in the user’s desired language and then waits for a response from Assistant. Once a response is received, it is passed to handleChatResponse(), which looks at the type of response and renders the correct components accordingly.

Assistant sends all responses in JSON format regardless of the type of response. Text, images, and lists are the main types of responses that Assistant will send. It is up to the programmer to determine how to display each response type. My chatbot, for example, displays text and images inline, and displays lists as buttons or a dropdown depending on the number of returned options.
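The dispatch logic inside handleChatResponse() is not reproduced here; as a hedged sketch, it might classify each response like this (the output field names and the three-option cutoff are assumptions, not the project's actual values):

```javascript
// Hypothetical dispatcher: maps an Assistant output object to the kind of
// component the chat window should render.
function classifyResponse(output) {
  if (Array.isArray(output.list)) {
    // Short lists render as buttons; longer ones collapse into a dropdown.
    return output.list.length <= 3 ? 'buttons' : 'dropdown';
  }
  if (output.image) {
    return 'image';
  }
  return 'text';
}
```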

A context object is used by Assistant to determine which conversation a request belongs to. The first response returned from Assistant will contain this context as part of the response object and must be sent with every subsequent request. This is how Assistant knows the history of the conversation.
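The round-trip can be sketched as a small helper that builds each outgoing payload (function and parameter names are illustrative):

```javascript
// The context object from the previous response is echoed back with the
// next request so Assistant can tie the messages into one conversation.
// `lastResponse` is null on the first turn, so no context is sent.
function nextPayload(message, lastResponse) {
  return {
    input: { text: message },
    context: lastResponse ? lastResponse.context : undefined
  };
}
```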

Here is an example of a response that contains the conversation context:

{
  "context": {
    "conversation_id": "73e3756c-6847-453c-8363-f6bc3496056f",
    "attempt": 0,
    "marriage_city": "Toronto",
    "marriage_country": "Canada",
    "marriage_date": "1998-06-25",
    "marriage_province": "ON",
    "question": 4,
    "spouse_moveindate": "1998-06-23",
    "step": 4
  },
  "output": {
    "list": ["Never married", "Divorced", "Widowed"],
    "log_messages": [],
    "questioncomplete": {
      "question3c": {
        "answered": true,
        "marriage_country": "Canada"
      }
    },
    "text": [
      "Okay, so you were married in Toronto, ON, Canada. Let's go to question 4.",
      "Okay, next question: before you got married to your spouse, what was your previous marital status?"
    ],
    "success": true
  }
}

Text-to-Speech and Speech-to-Text

Along with the Watson Assistant service, I also took advantage of Speech-to-Text and Text-to-Speech to make the chatbot more accessible.

Implementing text-to-speech with Watson is easy. Simply send the text you wish to synthesize in a request and Watson will respond with a blob of audio. The audio blob can then be added to an Audio object and played through the user’s output device.
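On the browser side, playing the returned blob is a few lines; a minimal sketch (the function name is mine, and this assumes the endpoint returns the audio as a Blob):

```javascript
// Wrap a synthesized-audio blob in an Audio object and play it through the
// user's output device. Browser-only: relies on URL and Audio globals.
function playAudioBlob(blob) {
  const url = URL.createObjectURL(blob);
  const audio = new Audio(url);
  // Release the object URL once playback has finished.
  audio.addEventListener('ended', () => URL.revokeObjectURL(url));
  return audio.play();
}
```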

Watson’s speech-to-text service is also quite easy to use. The challenge is not using the service, but capturing the audio from the user’s input device. To accomplish this I used a library called RecordRTC. This library makes it easy to create a stream of audio from the user’s input that works across multiple browsers.


Once the user starts recording, they must remember to click the button a second time to stop the recording. When the recording finished, I took the audio stream, turned it into a blob, and sent it to Watson. Watson would then process the speech and return the text.
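The click-to-start, click-to-stop flow can be sketched as a toggle around RecordRTC (browser-only; assumes RecordRTC is available as a global via its UMD build, and the wrapper name is mine):

```javascript
// Returns a toggle function suitable for a record button's click handler.
// First click: request the microphone and start recording.
// Second click: stop, hand the finished blob to `onAudio`, and reset.
function createRecorder(onAudio) {
  let recorder = null;
  return async function toggle() {
    if (!recorder) {
      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
      recorder = RecordRTC(stream, { type: 'audio' });
      recorder.startRecording();
    } else {
      recorder.stopRecording(() => {
        onAudio(recorder.getBlob());
        recorder = null;
      });
    }
  };
}
```

The blob passed to `onAudio` is what gets posted to the speech-to-text endpoint.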

See the Chatbot in action in the demo video.

Conclusion

A Watson-powered chatbot for your users to interact with can be a powerful tool. By providing an easy, intuitive avenue for users to get answers to their questions, get help with their problems, or interact with your application, adding a chatbot service can bring you into the twenty-first century.