Building a Custom Serverless Chatbot — Pieces & Parts

Video demo of some of the key features in action

Implemented as a travel destination chatbot (https://app.spotsave.io)

Getting a handle on what it takes to build a chatbot, and on all its pieces and parts, is often difficult; overwhelm is the more common initial reaction. The following aims to orient both techies and tech-interested business folks to the main components. This is clearly an AWS-centric architecture, but many of the concepts apply to other infrastructure approaches.

But aren’t there a million ways to build a chatbot these days (“and it only takes 10 minutes!”)? Maybe, but when a certain level of sophistication and control is needed, e.g. API integrations with varied data sources, complex business logic or custom UI widgets, it’s a different ballgame.

Infrastructure Components — Summary

Partial List of Managed Services



The result?

Example: the user entered “sightseeing in seattle” and received the following combination of images, content, icons and buttons.

“sightseeing in seattle”

Typical response time for the above is ~2 seconds, depending mostly on the third-party APIs at that moment.

Fundamentals in Approach

  • Chat Platform Independence: no reliance on the established chat platforms (Facebook Messenger, Slack, Telegram etc.), given the inherent complexity of supporting them all and the lack of control.
  • Serverless Architecture: use fully managed services as the bedrock, so infrastructure scale and stability are baked in.
  • Chat-ish Approach: provide text-based input as well as context-sensitive buttons for easy navigation.
  • Progressive Web Application (PWA): can be installed on the user’s device and looks and feels like a traditional app, while being way easier to maintain. Paves the way for app-like features such as push notifications and a full-screen experience. Not a traditional mobile app!
  • Support Mixed Audience: deliver an intuitive and useful experience for 20-somethings and grandma alike!

Client Side Application Components

  • React.js: webapp built in React.js, using the Amplify CLI and Apollo GraphQL for deployment and integration.
  • Custom UI Widgets: custom implementations of the UI widgets you’d expect (image carousel, slideshow, quick reply etc.), with semantic-ui.css as the CSS and icon framework.
  • “Rich” Message Types: send and receive both (1) text requests and (2) rich requests for message passing (buttons, slideshows, carousels, forms and responses, links).

Sample Rich Message (from server to client) for generating a Slideshow (abbreviated).

{
  "richmsg": {
    "type": "slideshow",
    "slider": {
      "urls": [
        { "url": "https://images..1.jpg",
          "caption": "What a great image",
          "attribution": "© John Smith Photography, 2019"
        },
        { "url": "https://images..2.jpg",
          "caption": "What a fab image",
          "attribution": "© Jane Doe Photography, 2018"
        }
      ]
    }
  }
}
  • Rich requests: JSON objects that carry richer, structured content about the user’s request or response. For example, when a button is clicked, send an id, a description and a few fields along with the click; clicking the location icon sends lat/long in a rich message called “coords”.

Sample Rich Request (from Client to Server) representing a button push.

{ "richReq": 
{ "inputCode": "bookable:tours" }
}
  • Progressive Webapp (PWA): requires a service worker (a JS file) plus a manifest.json to enable the PWA goodies and an app-like interface.
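To keep server and client agreeing on one structure, it can help to build these messages through small helpers rather than by hand. The following is a minimal sketch that produces the slideshow shape shown above and the button-push request; the function names are hypothetical, and the layout of the “coords” request is an assumption, since the article only says that lat/long travel in a rich message of that name.

```javascript
// Hypothetical builders for the rich message / rich request shapes.
// Builds the server-to-client slideshow message shown above.
function slideshowMessage(images) {
  return {
    richmsg: {
      type: "slideshow",
      slider: {
        // Default missing caption/attribution to "" so the client never sees undefined
        urls: images.map(({ url, caption = "", attribution = "" }) => ({
          url,
          caption,
          attribution,
        })),
      },
    },
  };
}

// Client-to-server rich request for a button push, e.g. "bookable:tours"
function buttonRequest(inputCode) {
  return { richReq: { inputCode } };
}

// ASSUMED shape for the "coords" rich request (not specified in the article)
function coordsRequest(lat, long) {
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) throw new RangeError("lat out of range");
  if (!Number.isFinite(long) || long < -180 || long > 180) throw new RangeError("long out of range");
  return { richReq: { coords: { lat, long } } };
}

const example = slideshowMessage([
  { url: "https://example.com/1.jpg", caption: "What a great image" },
]);
console.log(JSON.stringify(example));
```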

Server Side Application Components

High Level Summary Flow:

Client (React.js) → AppSync (GraphQL) → DynamoDB → Lambda → AppSync → Client

  • AWS Lambda: receives user messages (from AppSync via DynamoDB), executes all business logic, interfaces with third-party APIs, and returns the response to the user via AppSync.
  • DynamoDB: initial data store for all user messages; integrates with AWS AppSync. Messages are set to expire (auto-deleted), but in the meantime are pushed to Kinesis/MySQL for analysis. DynamoDB is also used to cache user data that provides context, e.g. last destination searched or form values entered, to be used as defaults.
  • Kinesis + RDS MySQL: messages are pushed to Kinesis and saved asynchronously to MySQL for further analysis and reporting. NOTE: this does not add latency to the user response.
  • S3 / CloudFront: CloudFront serves the app itself for fast global access by users. S3 holds various files whose changes need to be available immediately, e.g. site config files and custom CSS overrides.
  • 3rd Party API Integration: fulfilling user requests depends on access to varied (reliable) sources of content: RapidAPI (quotes, trivia, viral content etc.), Azure (images and videos), Google (lat/long to address mapping), Priceline (hotels), Triposo (destination content), Auth0 (authentication), Ticketmaster/Eventful (events), AWS, Yelp (reviews)… you get the idea.
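For the “messages set to expire” part, DynamoDB’s Time to Live (TTL) feature deletes items whose designated Number attribute holds an epoch time, in seconds, that has passed; TTL must be enabled on the table and pointed at that attribute, and deletion happens in the background some time after expiry rather than at the exact second. Items removed this way also show up in the table’s stream as REMOVE events. A minimal sketch of building such an item, where the attribute names and the 24-hour window are assumptions:

```javascript
const MESSAGE_TTL_HOURS = 24; // ASSUMED retention window

// Builds a message item for the (hypothetical) messages table.
function buildMessageItem(userId, text, nowMs = Date.now(), ttlHours = MESSAGE_TTL_HOURS) {
  const nowSec = Math.floor(nowMs / 1000);
  return {
    userId, // hypothetical partition key
    createdAt: nowMs, // hypothetical sort key (epoch milliseconds)
    text,
    // TTL attribute: DynamoDB expects a Number holding epoch time in *seconds*
    expiresAt: nowSec + ttlHours * 3600,
  };
}

console.log(buildMessageItem("u-123", "sightseeing in seattle", 1700000000000));
// expiresAt: 1700086400 (24 hours after createdAt, in seconds)
```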

Server side notables

  • Mise en place: say what?! Since fast response is critical, minimize database lookups by caching as much as practical in the app itself, e.g. profanity filter words, core intents etc.
  • Graceful Termination*: structure the Lambda so that each user message in a batch (batches of up to 100 messages, from various users) can fail gracefully without affecting the others.
  • Timeouts: set timeouts on the HTTP client library (e.g. axios) so the Lambda invocation finishes gracefully and the user doesn’t wait too long for something/anything!
  • AWS X-Ray**: from time to time, enable X-Ray to identify potential performance issues.
  • Structured logging: use structured logging so that intermittent error conditions can be found and addressed in the sea of messages flying around.
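The timeout and structured-logging points combine naturally. With axios you would set its own `timeout` option (in milliseconds), e.g. `axios.get(url, { timeout: 1500 })`; as an extra safety net, any promise can be raced against a timer. The sketch below assumes a fallback reply is acceptable when a third-party call is slow; the helper names are hypothetical. Note that the race stops waiting but does not cancel the underlying request.

```javascript
// Log one JSON object per line so log search tools (e.g. CloudWatch Logs Insights,
// logz.io) can filter on fields such as level, userId or label.
function logJson(level, message, fields = {}) {
  const line = JSON.stringify({ ts: new Date().toISOString(), level, message, ...fields });
  console.log(line);
  return line;
}

// Resolve with `fallback` if `promise` hasn't settled within `ms` milliseconds.
// This only stops waiting; it does not abort the underlying request.
function withTimeout(promise, ms, fallback, label = "api-call") {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => {
      logJson("warn", "timeout", { label, ms });
      resolve(fallback);
    }, ms);
  });
  // Whichever settles first wins; always clear the timer so it can't fire later
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Usage might look like `withTimeout(fetchEvents(city), 1500, [], "events")`, where `fetchEvents` stands in for a hypothetical third-party call: the user gets an empty result plus a searchable `timeout` log line instead of a stalled reply.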

Lambda Message handling + graceful termination*

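// Lambda triggered by the DynamoDB Stream on the messages table
module.exports.handler = async (event) => {
  // Process every record in the DynamoDB Stream batch concurrently
  const promises = event.Records.map(async (record) => {
    try {
      // NewImage uses DynamoDB's typed format, e.g. { text: { S: "..." } },
      // and is absent for REMOVE events (such as TTL expirations)
      const data = record.dynamodb.NewImage;
      if (!data) return;
      await process(data); // <-- business logic & response here (the app's own function, defined elsewhere)
    } catch (error) {
      // Log and swallow per-record failures so the other users' messages still complete
      console.error('Doh: ', JSON.stringify(record), error);
    }
  });
  await Promise.all(promises);
  // ... End function when all responses processed
};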
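The `NewImage` in each stream record arrives in DynamoDB’s typed attribute format, so the business logic usually converts it to plain JavaScript first. The AWS SDK v3 ships `unmarshall` in `@aws-sdk/util-dynamodb` for exactly this; the following is a minimal hand-rolled sketch of the same idea, covering only the common types (binary types are omitted, and large `N` values lose precision when converted with `Number`). The function names are hypothetical.

```javascript
// Convert one typed attribute, e.g. { S: "hi" } or { N: "3" }, to a plain JS value.
function fromAttr(attr) {
  const entries = Object.entries(attr);
  if (entries.length !== 1) throw new TypeError("Expected exactly one type key");
  const [type, value] = entries[0];
  switch (type) {
    case "S": return value;
    case "N": return Number(value); // precision loss beyond 2^53
    case "BOOL": return value;
    case "NULL": return null;
    case "M": return unmarshallImage(value); // nested map
    case "L": return value.map(fromAttr); // list of typed attributes
    case "SS": return new Set(value);
    case "NS": return new Set(value.map(Number));
    default: throw new TypeError(`Unsupported DynamoDB type: ${type}`);
  }
}

// Convert a whole NewImage / OldImage into a plain object.
function unmarshallImage(image) {
  return Object.fromEntries(
    Object.entries(image).map(([key, attr]) => [key, fromAttr(attr)])
  );
}

const item = unmarshallImage({
  userId: { S: "u-123" },
  text: { S: "sightseeing in seattle" },
  expiresAt: { N: "1700086400" },
});
// item → { userId: "u-123", text: "sightseeing in seattle", expiresAt: 1700086400 }
```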

Some X-Ray** metrics as presented in dashbird.io

In this case, no intent was found, so the bot fell back to the machine learning model for an answer. The ML response is typically much faster than shown, closer to ~125ms.

The Lambda function ran for 673ms. X-Ray coupled with Dashbird is super useful for seeing what’s going on.

Client side notables

  • Chunked Application: split the deployment package into multiple chunks, especially the JS files. Typically the “vendor” files (libraries) rarely change, but the application code does. Separating them means an application change only requires the user’s browser to pull what changed. Webpack is one of the tools that enables this.
  • Startup Performance: thanks to the Google Chrome team, Lighthouse is a great way to audit your app. How fast does it load under various conditions (3G, 4G, direct connection)? Google Chrome → Developer Tools → Audits. The audit offers some great suggestions.
  • Register/Login/Authentication: way harder than it looks to do well, which is why companies like Auth0 exist and are very well funded. It’s an obvious pain point (for users too), so it needs to be solid and smooth. See the Authentication section below for more.
  • Multi-platform CSS: getting a webapp working on all devices, especially iPhone, is no gimme. You’ll need someone to run down the many annoying nuances and wrestle this bear to the ground!!

Quick Reminder

Authorization: what you have access to.

Authentication: who you are.

A Few Key Subjects

The following are called out specifically since they deserve extra attention.

1. Intent Mapping/NLU (Natural Language Understanding)

The holy grail of chatbots is seamless understanding of what the user wants. The state of the art is not there yet, and the problem is super-difficult regardless, so some design decisions have to be made to get as close as possible for the domain in question (travel destinations in this case). The goal is to answer the user’s question directly, or smoothly guide them to a useful response, in a non-frustrating way!

Determine Intent

  • Machine Learning Model: assign an intent ID to a set of user utterances and build a multi-class classification model in AWS ML. Used primarily for (1) small talk/random user utterances, e.g. “what’s up”, “how do I look”, “what did you do today” and (2) more complex, offbeat or colloquial utterances that may map to a core intent, e.g. “I fancy a ruby murray” could map to dining → curry. This approach can also be used to implement client FAQs and custom content to offload customer service requests, e.g. “how do i pay my bill online” or “what is my points balance”.
  • Keywords: known, standard words, e.g. dining, sightseeing, activities.
  • Keyphrases: combinations of words that individually may not expose a keyword intent but together map to one, e.g. “popular with locals”.
  • Regular Expressions: capture specific variations of words and phrases.
  • Single Words: the very specific case where one word is used on its own rather than embedded in a phrase, e.g. “history” shows the user’s command history, whereas “history in seattle” returns culture and museums.
  • Profanity Filter: 500+ words. If any of them is hit, the user gets poop-emoji’d, an admonishment, and no results. Go ahead, try it 🙂
  • Stop Words: words that do not materially contribute to intent (“is”, “or”, “but” etc.). These are ignored when the user’s input is evaluated.
  • Entities: in this case countries, states/provinces and cities (destinations) are the core entities. Countries and states/provinces are cached directly in the app for instant lookup; cities (6000+) are stored in DynamoDB for selective, fast lookup.
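The techniques above can be chained into a single pass over the user’s input. The following is a minimal sketch with toy word lists, assuming the checks run in this order: profanity, single words, keyphrases, keywords (stop words skipped), regular expressions, then a hand-off to the ML model. The ordering, intent names and list contents are assumptions for illustration. It reproduces the “history” vs. “history in seattle” distinction described above.

```javascript
// Toy lists; the real ones are far larger (500+ profanity words, 6000+ cities in DynamoDB).
const PROFANITY = new Set(["darn"]); // placeholder entry
const STOP_WORDS = new Set(["is", "or", "but", "in", "the", "a", "for", "show", "me"]);
const SINGLE_WORDS = new Map([["history", "command_history"]]);
const KEYPHRASES = [{ phrase: "popular with locals", intent: "local_favorites" }];
const KEYWORDS = new Map([
  ["dining", "dining"],
  ["sightseeing", "sightseeing"],
  ["activities", "activities"],
  ["history", "culture"],
]);
const REGEXES = [{ re: /\bmuseums?\b/, intent: "culture" }];
const CITIES = new Set(["seattle", "portland"]);

function detectIntent(input) {
  // Normalize: lowercase, turn punctuation into spaces, split into words
  const text = input.toLowerCase().replace(/[^a-z0-9\s']/g, " ").trim();
  const words = text.split(/\s+/).filter(Boolean);
  const destination = words.find((w) => CITIES.has(w)) || null;

  // 1. Profanity short-circuits everything
  if (words.some((w) => PROFANITY.has(w))) return { intent: "profanity", destination: null };

  // 2. A lone word can mean something different from the same word in a phrase
  if (words.length === 1 && SINGLE_WORDS.has(words[0])) {
    return { intent: SINGLE_WORDS.get(words[0]), destination: null };
  }

  // 3. Keyphrases are matched against the full normalized text
  const kp = KEYPHRASES.find((k) => text.includes(k.phrase));
  if (kp) return { intent: kp.intent, destination };

  // 4. Keywords, skipping stop words
  const content = words.filter((w) => !STOP_WORDS.has(w));
  const kw = content.find((w) => KEYWORDS.has(w));
  if (kw) return { intent: KEYWORDS.get(kw), destination };

  // 5. Regular expressions for variations
  const rx = REGEXES.find((r) => r.re.test(text));
  if (rx) return { intent: rx.intent, destination };

  // 6. Nothing matched: hand off to the ML classifier (not shown)
  return { intent: "ml_fallback", destination };
}
```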

Approx Timings

  • Parsing the user’s input, profanity filter, core intent mapping: a few milliseconds.
  • DynamoDB lookups: 10–30 milliseconds.
  • Machine Learning inference (prediction): typically 70–400 milliseconds.

Known Intent

To cater to all types of users, we can’t rely on fully understanding every textual input, so in the following cases the intent is already known via structured submissions.

  • Buttons: asking for a specific thing, e.g. Seattle Videos.
  • Forms: e.g. searching for hiking trails in the area.
  • Deeplinks: sharing direct access to a specific thing, e.g. “sightseeing in seattle”.
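Because these submissions are structured, they can skip NLU entirely. A minimal sketch of that routing, under two assumptions: button codes follow a `category:item` convention inferred from the `bookable:tours` example earlier, and deeplinks use a hypothetical `/go/<intent>/<destination>` path layout. Neither convention is confirmed by the article.

```javascript
// Button clicks: split "category:item" on the first colon.
function intentFromButton(richReq) {
  const code = richReq && richReq.inputCode;
  if (typeof code !== "string" || code === "") return null;
  const i = code.indexOf(":");
  return i === -1
    ? { source: "button", category: code, item: null }
    : { source: "button", category: code.slice(0, i), item: code.slice(i + 1) };
}

// Deeplinks: hypothetical path layout /go/<intent>/<destination>
function intentFromDeeplink(href) {
  let parts;
  try {
    parts = new URL(href).pathname.split("/").filter(Boolean).map((p) => decodeURIComponent(p));
  } catch {
    return null; // invalid URL or malformed percent-encoding
  }
  if (parts[0] !== "go" || parts.length < 3) return null;
  return { source: "deeplink", intent: parts[1], destination: parts[2] };
}
```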

Quick Question

Why not use Google Dialogflow, AWS Lex or something else to build some or all of the bot?

  • Dialogflow has a callback mechanism that is cumbersome as an add-on to the Lambda processing.
  • Lex has a nice, fast request/response API, but with a 100-intent limit (actually fewer in real life), and it was constantly rate-limiting (re)builds of the model. Too limiting.
  • Rasa is a nice tool, but we didn’t want the additional overhead of yet another thing to integrate, although we may move to it at some point.
  • In the end, to keep control and flexibility, we created a home-grown solution, supplemented by an AWS Machine Learning multi-class classification model.

2. Rich UI Widgets & Messages

Since we’re not leaning on an existing chat platform (e.g. FB Messenger), we needed to build a set of client-side widgets to present data in an attractive way, e.g. carousels, quick replies, slideshows, HTML content, images, videos etc.

When our chat platform responds to the user, the response is structured to signal to the client app which type of rich UI widget (“rich message”) should be used for display. We leaned on Semantic UI (via React.js) for the core widgets and icons.
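On the client, that signal can drive a simple lookup from `richmsg.type` to a renderer, with plain text as the fallback for anything unrecognized. In the sketch below, the renderers return strings so it runs in Node; in the real app they would return React/Semantic UI components. The `cards` and `options` field names for carousels and quick replies are assumptions, as is the text fallback behavior.

```javascript
// Hypothetical renderers keyed by richmsg.type (strings stand in for React components).
const renderers = {
  slideshow: (m) => `Slideshow with ${m.slider.urls.length} image(s)`,
  carousel: (m) => `Carousel with ${m.cards.length} card(s)`,
  quickreply: (m) => `Quick replies: ${m.options.map((o) => o.label).join(" | ")}`,
};

function renderMessage(message) {
  const rich = message && message.richmsg;
  // Own-property check so types like "toString" don't hit Object.prototype
  const known = rich && Object.prototype.hasOwnProperty.call(renderers, rich.type);
  if (known) return renderers[rich.type](rich);
  // Unknown or missing rich type: fall back to plain text
  return message && message.text != null ? String(message.text) : "";
}
```

A registry like this keeps adding a widget to a two-step change: one new renderer on the client and one new `type` value on the server.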

3. Authentication

For such a simple concept, this can be a bear to get right, especially while delivering a frictionless user experience.

Some of the ways to login:

  • Userid / Password: least desirable, requiring the user to create yet another combination.
  • Social Logins: leverage the user’s most-used accounts (e.g. Facebook, Twitter, Amazon) as the trusted validator of identity. Relatively simple for the user.
  • Passwordless: email or SMS a code or link that serves as the password component of a login. The user provides their email or phone number as the “userid”.
  • Autologin: in coordination with the client, create a handshake in which each side knows enough about the other, along with shared secrets, to encrypt/decrypt and validate logins. The assumption is that the user is already logged into the client’s member/employee site, where an autologin URL is made available, eliminating the need for another login. This creates a seamless user experience with consistent branding.
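One way to realize the shared-secret handshake is an HMAC-signed, short-lived token in the autologin URL. The sketch below uses Node’s built-in `crypto` (HMAC-SHA256 plus a constant-time comparison). The token layout and the `sub`/`exp` field names (borrowed from JWT conventions) are assumptions, not the article’s actual scheme, and it only signs: the payload is readable base64url, so the encryption side the article mentions would be an extra layer (e.g. AES-GCM) not shown here.

```javascript
const crypto = require("crypto");

// Hypothetical token format: base64url(JSON payload) + "." + base64url(HMAC-SHA256(payload)).
function signAutologin(payload, secret) {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const sig = crypto.createHmac("sha256", secret).update(body).digest("base64url");
  return `${body}.${sig}`;
}

// Returns the payload if the signature matches and it hasn't expired, else null.
function verifyAutologin(token, secret, nowSec = Math.floor(Date.now() / 1000)) {
  const [body, sig] = String(token).split(".");
  if (!body || !sig) return null;
  const expected = crypto.createHmac("sha256", secret).update(body).digest("base64url");
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  // Constant-time comparison; timingSafeEqual requires equal lengths
  if (a.length !== b.length || !crypto.timingSafeEqual(a, b)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  if (typeof payload.exp !== "number" || payload.exp < nowSec) return null; // expired link
  return payload;
}

// The client site would mint a short-lived link, e.g. valid for 5 minutes:
const demo = signAutologin({ sub: "member-42", exp: Math.floor(Date.now() / 1000) + 300 }, "shared-secret");
console.log(verifyAutologin(demo, "shared-secret"));
```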

Certain content suppliers require, as part of the license agreement, that users are validated/known before they can consume content.

Our approach is to support Social Logins, Passwordless and Autologin. The first two are delivered via an integration with Auth0.com. The following is an example of the Social Login experience:

A Word About Branding (White label)

The system is aimed at member-based organizations, which offer it as a member benefit.

The following is a fictitious white label with varying colors, fonts and some icons. The URL’s domain name is used as a key to retrieve site config info plus custom CSS overrides.

Masthead, font, red location icon, menu icon uniqueness
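The domain-keyed lookup can be sketched as a merge of site-specific values over defaults. In the article, the per-site config and CSS overrides are files in S3; here an in-memory map stands in for them, and every field name and value is a hypothetical placeholder.

```javascript
// Hypothetical defaults and per-domain overrides (in practice fetched from S3).
const DEFAULT_CONFIG = {
  brandName: "SpotSave",
  primaryColor: "#2185d0",
  font: "Lato",
  cssOverrides: null,
};

const SITE_CONFIGS = {
  "travel.example-club.org": {
    brandName: "Example Club Travel",
    primaryColor: "#db2828",
    cssOverrides: "example-club.css",
  },
};

function siteConfigFor(href) {
  const host = new URL(href).hostname; // the URL parser lower-cases the hostname
  const overrides = Object.prototype.hasOwnProperty.call(SITE_CONFIGS, host) ? SITE_CONFIGS[host] : {};
  // Site-specific values win; anything unspecified falls back to the defaults
  return { ...DEFAULT_CONFIG, ...overrides, host };
}
```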

The Landing Page

It stands to reason that if a client wants a white-labeled implementation, they probably also want a customizable landing page to introduce what is behind the curtain. The approach is to let the client build their own landing page (if they want), completely independent of the chat platform; we then pull it into the beginning of the chat experience.

We built one in Bootstrap 4 as an example to demonstrate the approach, having learned how via a free Udemy course (see the Reference Material section). It looks like the one below, but could look like anything. The photo is one I took with a Google Pixel phone: Tillamook Rock Lighthouse, Cannon Beach. #random

Brandable Landing Page

Bothead — The Game!

I bring this up because it uses a couple of interesting NLP-related features: Word2Vec and entity extraction. Just for a bit of fun…

Premise: you enter a word or phrase, and the bot tells you the first things that “pop into its (bot)head”.

Word2Vec: a word2vec model pre-trained on the 2017 Wikipedia dump is used to extract single words contextually similar to the word entered.
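“Contextually similar” in word2vec terms usually means nearest neighbors by cosine similarity between word vectors. The sketch below shows that computation on made-up 3-dimensional toy vectors; a real pre-trained model has vectors with hundreds of dimensions over a large vocabulary, typically queried through a library or vector index rather than a linear scan.

```javascript
// Toy vectors standing in for real word2vec embeddings (values are made up).
const VECTORS = {
  beach: [0.9, 0.1, 0.0],
  ocean: [0.85, 0.2, 0.05],
  sand: [0.7, 0.1, 0.3],
  museum: [0.05, 0.9, 0.2],
  art: [0.1, 0.85, 0.3],
};

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank every other word by cosine similarity to `word`, highest first.
function mostSimilar(word, topN = 3) {
  if (!Object.prototype.hasOwnProperty.call(VECTORS, word)) return [];
  const target = VECTORS[word];
  return Object.entries(VECTORS)
    .filter(([w]) => w !== word)
    .map(([w, v]) => ({ word: w, score: cosine(target, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topN);
}

// mostSimilar("beach", 2) → ocean (~0.99), then sand (~0.92)
```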

Entity Extraction: if a phrase is entered, a single-result web search combined with AWS Comprehend entity extraction is used to capture entities such as ORGANIZATION, PERSON, LOCATION, DATE etc. from the search results. Various filters and exclusions are layered on to provide some variety.
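Amazon Comprehend’s DetectEntities returns a list of entities, each with `Text`, `Type`, `Score`, `BeginOffset` and `EndOffset`. The “filters and exclusions” step might look like the following sketch over that response shape; the allowed types, score threshold, exclusion list and result limit are all assumptions.

```javascript
// Post-filter for a DetectEntities-style response:
// { Entities: [{ Text, Type, Score, BeginOffset, EndOffset }, ...] }
const ALLOWED_TYPES = new Set(["ORGANIZATION", "PERSON", "LOCATION", "DATE", "EVENT"]);
const EXCLUDED_TEXT = new Set(["wikipedia", "google"]); // e.g. noise from the search page itself

function pickEntities(response, { minScore = 0.8, limit = 5 } = {}) {
  const seen = new Set();
  const picked = [];
  for (const entity of (response && response.Entities) || []) {
    const text = entity.Text.trim();
    const key = text.toLowerCase();
    if (!ALLOWED_TYPES.has(entity.Type)) continue; // drop QUANTITY, OTHER, ...
    if (entity.Score < minScore || EXCLUDED_TEXT.has(key) || seen.has(key)) continue;
    seen.add(key); // case-insensitive de-duplication
    picked.push({ text, type: entity.Type });
    if (picked.length >= limit) break;
  }
  return picked;
}
```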

Using the above, the bot pretty much always has something to say, mostly relevant, when playing the game. To engage, just enter: bothead

In Closing …

That’s it for now — hope it was informative. Always open to constructive ideas and feedback.

About the Author

2x CTO Co-Founder in Travel Tech Space. Sold equity in both.

Computer Science degree from Mississippi State University, which I attended from Yorkshire, England on a tennis scholarship (can’t make this up!). Interests and activities: making dinner and cocktails for friends; weight lifting and spinning to build up an appetite.

Paul Heath, pheath@ownerservices.com

Bellevue, Washington, USA

https://www.linkedin.com/in/pheath

Recognition

Had help building this from REEA Consulting (React.js front-end dev) and a brief consulting project with Yan Cui (theburningmonk.com) for Lambda/Architecture expertise. Both provided valuable contributions.

Reference Material

Several resources I found useful along the way:

Complete Node.js developer course — Andrew Mead/Udemy

Just how expensive is the full AWS SDK — Yan Cui

Production Ready Serverless — Yan Cui/Manning Publication

Progressive Webapps — Google. How-to’s and checklists

Bootstrap 4: Create a Landing Page — Jeppe Schaumburg Jensen/Udemy

Products

Authentication as a Service — Auth0

Performance Analysis & Centralized Logging — https://dashbird.io & https://logz.io

Lighthouse in Chrome Dev Tools — Google, for webapp performance analysis.

Semantic UI — Front-end components and styling

https://app.spotsave.io

Building a Custom Serverless Chatbot — Pieces & Parts was originally published in Chatbots Life on Medium, where people are continuing the conversation by highlighting and responding to this story.
