AI technologies, especially chatbots and personal assistants (like Google Assistant, Siri, or Cortana), have been taking an important place in our everyday devices. However, the features these conversational interfaces offer are limited. They don't seem to understand much, really: programmers using services like Dialogflow or Wit.ai must spend hours just validating expressions (and will keep doing so for as long as the bot exists). Essentially, these tools are intent matchers: they match the user's input against a given intent, but they don't really understand what the user is talking about.
The next generation of chatbots should overcome this limitation and understand the knowledge itself, through grammatical structure, semantic analysis, and machine learning.
When programmers come to Natural Language Processing (NLP) from scratch to apply it to chatbots, it is really difficult to identify a precise set of tools or services that will work for a given use case. There are no clear guides on how to combine the existing resources into a working solution, and there are tons of scientific articles that would take too long to read even to tell whether they could be of any help.
In this article, I would like to share a set of tools and services appropriate for achieving this: a graph to represent natural language properly, and available web services for grammatical and semantic disambiguation of the information provided to the chatbot.
I will then provide some examples of how to put this information together into a graph, and how to use it to query useful information (that is, to get answers about what the bot has learned).
Tools and Services
Neo4j
Neo4j is a graph database, where nodes and relationships are used to represent the data. This representation is highly useful for NLP itself: it allows us to store all the relationships between the entities mentioned in a given context, as well as the relationships within previously acquired knowledge. It is also very natural for us, since it represents knowledge close to how we map our own knowledge in our everyday language.
Neo4j uses a query language called Cypher, which shares some similarities with SQL but is optimized for interacting with a graph. One Cypher command can create the data, and another can query it and create new connections in it.
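As a small illustration (the original post showed its queries only as images), here is a sketch of a Cypher create statement followed by a query that adds a new connection. The node labels, property names, and relationship types are invented for this example, not taken from the post; the strings could be executed with the official neo4j Python driver against a running database.

```python
# A minimal Cypher illustration: one statement creates data, another
# matches existing nodes and adds a new connection between them.
# The schema (:Person, KNOWS) is an invented example.
create = (
    "CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})"
)
# MATCH finds the existing nodes; MERGE then adds the reverse relationship.
connect = (
    "MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'}) "
    "MERGE (b)-[:KNOWS]->(a)"
)
print(create)
print(connect)
```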
Google Cloud NL
There are several tools and services out there for performing a complete syntactic analysis and extracting entities from a given text; some of them are free, many of them paid. Still, I see some advantages in using Google Cloud NL:
- It can be used freely for development, since it offers a quota of free API requests which is typically enough during the development phase.
- Once the free quota is exceeded, the price is not that high, at least for a chatbot; it only becomes significant once many users are consuming the service.
- It uses the same architecture as Google Assistant. Given the good work Google has been doing on its assistant, it is fair to assume the tool will keep improving over time. (A good example: they have been adding several new APIs; recently they added Google Cloud AutoML Natural Language, which comes in very handy for machine-learning classification tasks.)
The syntactic analysis returns two important sets of information about the text passed to the API: the part of speech of each word and, more importantly, the dependency labels, which follow the Stanford-style dependencies. This information is enough to extract each of the noun phrases and grammatical clauses in the text, which makes it possible to identify the entities mentioned and the actions (verbs) applied by or to those entities. All of this is crucial for applying grammatical rules to understand the text given by the user. They also provide an Entity Analysis API to detect the entities mentioned in the text, very handy for rapidly detecting proper names, places, organizations, events, persons and others.
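To sketch how dependency labels can be put to work, the snippet below walks a hand-written, simplified parse and extracts subject-verb-object triples using the NSUBJ and DOBJ labels. The token structure here is a simplified assumption modeled loosely on what a syntax-analysis response contains; a real parse from Google Cloud NL would be richer.

```python
# Toy extraction of (subject, verb, object) triples from a simplified
# dependency parse. The `tokens` list is hand-written for illustration;
# a real parse would come from Google Cloud NL's syntax analysis.

tokens = [
    {"text": "Bob",   "pos": "NOUN", "label": "NSUBJ", "head": 1},
    {"text": "sends", "pos": "VERB", "label": "ROOT",  "head": 1},
    {"text": "an",    "pos": "DET",  "label": "DET",   "head": 3},
    {"text": "email", "pos": "NOUN", "label": "DOBJ",  "head": 1},
]

def extract_triples(tokens):
    """Pair each NSUBJ token with its head verb and that verb's DOBJ."""
    triples = []
    for tok in tokens:
        if tok["label"] == "NSUBJ":
            verb = tokens[tok["head"]]
            obj = next((t for t in tokens
                        if t["label"] == "DOBJ" and t["head"] == tok["head"]),
                       None)
            triples.append((tok["text"], verb["text"],
                            obj["text"] if obj else None))
    return triples

print(extract_triples(tokens))  # -> [('Bob', 'sends', 'email')]
```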
In the examples below, only the result of Google Cloud NL's syntax analysis will be used to create a knowledge graph, and no semantic analysis will be explained in detail. But a syntactic analysis alone is not enough to fully understand what a user means; it is also necessary to understand the semantics of the words and the expressions used.
Firstly, we should know the real meaning of a word, since each one can have several senses. For instance, take two sentences like "You need to eat to live" and "They live in a beautiful house".
Both of them use the verb "live", but they mean different things. We could define their meanings like this; in the first sentence it means:
- survive, live, subsist, exist (support oneself)
In the second one, the meaning of the verb “live” is more like:
- inhabit, dwell, live, populate (be an inhabitant of or reside in)
The task of telling apart the different meanings of a word is called word-sense disambiguation. Knowing the exact sense of a word is useful in many different tasks, from matching what the user says to a given intent, to offering translations and language-independent understanding of a text.
There are several services/resources which try to deal with this problem; I would like to mention BabelNet here. In BabelNet, each "sense" of a word is linked to a unique synset, which basically denotes a set of words/expressions sharing the same meaning. It aggregates several sources, such as Wiktionary, DBpedia, VerbNet, WordNet, and others. So we could say it is a multilingual dictionary and a very powerful semantic network, linking many resources available on the web.
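As a toy illustration of word-sense disambiguation (not BabelNet's actual algorithm), the sketch below picks, in the spirit of the classic Lesk algorithm, the sense of "live" whose gloss and synonyms overlap most with the surrounding context. The sense data is just the two senses quoted above; the sense ids are made up.

```python
# Toy Lesk-style disambiguation for the verb "live": choose the sense
# whose signature (gloss words + synonyms) shares the most words with
# the context. Real systems built on BabelNet are far more sophisticated.

SENSES = {
    "live.v.subsist": {"gloss": "support oneself",
                       "synonyms": {"survive", "subsist", "exist"}},
    "live.v.inhabit": {"gloss": "be an inhabitant of or reside in",
                       "synonyms": {"inhabit", "dwell", "populate"}},
}

def disambiguate(context: str) -> str:
    """Return the sense id whose signature best overlaps the context."""
    context_words = set(context.lower().split())
    def score(sense):
        signature = set(sense["gloss"].split()) | sense["synonyms"]
        return len(signature & context_words)
    return max(SENSES, key=lambda sid: score(SENSES[sid]))

print(disambiguate("they live in a beautiful house"))   # -> live.v.inhabit
print(disambiguate("they survive and live on little"))  # -> live.v.subsist
```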
Converting Natural Language to a Graph Representation
The key to the proposed system is the ability to convert any given meaningful text into a graph representation, which can be queried to obtain any learned information, or explored to find patterns which help to infer new relations.
Once a syntactic analysis is done using Google Cloud NL, the result is inserted into the graph following grammatical rules derived from the labeled dependencies.
For instance, let's say we want to add the expression "Bob sends an email". This would basically be translated into a Cypher create query.
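The original query appeared in the post as an image; a plausible naive reconstruction could look like the following, where the node labels and the relationship type are my own assumptions:

```python
# Naive reconstruction of a create query for "Bob sends an email".
# The labels (:Person, :Email) and relationship type SENDS are assumed;
# the article showed its actual query only as an image.
naive_create = (
    "CREATE (:Person {name: 'Bob'})-[:SENDS]->(:Email)"
)
print(naive_create)
```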
Now, when we use natural language, "Bob" (or "email") are words we use to refer to real-life entities or abstract concepts (yet very real in our minds). When we keep talking about these objects, using anaphora to corefer to them, we are always talking about the same entities, and creating them explicitly is key to being able to modify the same object later. We can consider these entities as "instances" of a type which is represented by the word.
Create queries that model each mention as an entity node linked to its type node are more appropriate, and we'll see the importance of these entity nodes in the next example.
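A sketch of such an instance-based create query follows; again, the node labels, property names, and relationship types are illustrative assumptions rather than the article's exact schema:

```python
# Instance-based reconstruction: "Bob" and the email become entity nodes
# ("instances"), each linked to a type node for its word, so later
# sentences can refer back to the same entities.
instance_create = """
CREATE (bob:Entity {name: 'Bob'})-[:IS_A]->(:Type {word: 'person'}),
       (email:Entity)-[:IS_A]->(:Type {word: 'email'}),
       (bob)-[:SENDS]->(email)
"""
print(instance_create)
```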
Let’s say the following text is given to our AI:
“There is a rich merchant, who had two daughters. They live in a beautiful house”
Running the syntactic analysis on this text yields a dependency tree linking each word to its grammatical head through labeled dependencies.
The first task is to convert these relations into Cypher queries, which create the corresponding graph in the Neo4j database.
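The article's actual queries were shown as images; a possible reconstruction for the example text is sketched below. The modeling choices (a :Type node per word, IS_A, HAS, and LIVES_IN relationships) are my own assumptions:

```python
# Possible create query for "There is a rich merchant, who had two
# daughters. They live in a beautiful house": one entity node per
# real-world object, each linked to the type node for its word.
create_story = """
CREATE (m:Entity)-[:IS_A]->(:Type {word: 'merchant'}),
       (d1:Entity)-[:IS_A]->(t:Type {word: 'daughter'}),
       (d2:Entity)-[:IS_A]->(t),
       (m)-[:HAS]->(d1), (m)-[:HAS]->(d2),
       (h:Entity)-[:IS_A]->(:Type {word: 'house'}),
       (d1)-[:LIVES_IN]->(h), (d2)-[:LIVES_IN]->(h)
"""
print(create_story)
```

Note that both daughter entities share a single type node (`t`), which is what later lets a query treat them as a "group of entities".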
In the resulting graph, the entities are represented in white and the types in blue, and the connections representing the actions are clearly visible. We can then convert questions into graph queries, which give us any requested information. For instance, we could ask:
“Where do they live?” or “How many daughters does the merchant have?” or “Who lives in that house?”
These questions can be translated into Cypher queries which directly return the requested information.
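Possible translations of the three questions are sketched below. The label and relationship names (:Entity, :Type, IS_A, HAS, LIVES_IN) are assumptions about how the graph was modeled; the article's own queries were shown only as images:

```python
# Candidate Cypher translations of the three natural-language questions.
queries = {
    "Where do they live?":
        "MATCH (d:Entity)-[:IS_A]->(:Type {word: 'daughter'}), "
        "(d)-[:LIVES_IN]->(place) RETURN DISTINCT place",
    "How many daughters does the merchant have?":
        "MATCH (:Entity)-[:IS_A]->(:Type {word: 'merchant'}), "
        "(:Entity)-[:HAS]->(d:Entity)-[:IS_A]->(:Type {word: 'daughter'}) "
        "RETURN count(DISTINCT d)",
    "Who lives in that house?":
        "MATCH (who:Entity)-[:LIVES_IN]->(:Entity)-[:IS_A]->"
        "(:Type {word: 'house'}) RETURN who",
}
for question, cypher in queries.items():
    print(question, "->", cypher)
```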
Note that we haven't made any semantic analysis of the input text yet. We could, for instance, add a unique identifier to our types and relationships: we could then have several "live in" relationships in our graph, each uniquely identified by its ID (for instance, the unique synset obtained from BabelNet), and in fact each could be exchanged for any word/expression with the same meaning. This would allow us to uniquely identify the sense of the words, easily finding the same meaning when synonyms are used; and, as stated earlier, it would make our understanding language-independent, since each "synset" for a given word/relationship identifies the same "sense" in any language.
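A sketch of such sense-tagged relationships follows. The id format imitates BabelNet identifiers, but the value used here is a placeholder for illustration, not a real BabelNet synset id:

```python
# Tagging a relationship with a synset identifier, then matching by
# sense rather than by surface word. The synset value is a made-up
# placeholder in the style of BabelNet ids.
tagged = (
    "CREATE (d:Entity)-[:LIVES_IN {synset: 'bn:00000000v'}]->(h:Entity)"
)
# Matching on the synset property finds the same sense even if the
# relationship was created from a synonym like "inhabit" or "dwell":
by_sense = (
    "MATCH (a)-[r {synset: 'bn:00000000v'}]->(b) RETURN a, b"
)
print(tagged)
print(by_sense)
```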
Finally, it would be good to know a little more about resolving references to entities in a given context.
In the examples above, we noted that the entities involved were of type "person" or "place". This information can be obtained directly from Google Cloud NL using its Entity Analysis API.
But this is not enough to resolve coreferences. Let's say that the conversation/narration continues like this:
“The eldest loved to go out to party, while the youngest loved to stay at home with her old father”
To understand that "the eldest" and "the youngest" are the daughters, we should know that these expressions always refer to elements of a "group of entities" mentioned before. Knowing that "the merchant" is their "parent" would suffice to know that "the father" refers to the same entity. And resolving "her" in "her father" implies knowing that "daughter" is "feminine" and a "person".
At this point we could consult BabelNet, or we could keep our own copy of a semantic network in the graph beforehand, in order to try to infer all these relations. Moreover, we could learn while reading any given text, or from the user's input, automatically creating new connections which will later help in the resolution of coreferences.
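A sketch of how such lexical knowledge could live in the graph and support pronoun resolution follows; the relationship names and the `gender` property are illustrative assumptions:

```python
# Storing lexical knowledge needed for coreference ("daughter" IS_A
# "person" and is feminine), then using it to narrow the candidates
# for the pronoun "her".
add_knowledge = """
MERGE (d:Type {word: 'daughter'})
MERGE (p:Type {word: 'person'})
MERGE (d)-[:IS_A]->(p)
SET d.gender = 'feminine'
"""
# "her" can only refer to a feminine person already present in the context:
resolve_her = """
MATCH (e:Entity)-[:IS_A]->(t:Type)
WHERE t.gender = 'feminine'
RETURN e
"""
print(add_knowledge)
print(resolve_her)
```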
Note that coreference resolution is one of the most challenging tasks in computational linguistics. Its difficulty is such that one of the hardest tests proposed for an AI, as an alternative to the Turing test, is the Winograd Schema Challenge, where sentence pairs with a specific structure are given, and the computer is asked to resolve references in the text which are obvious to a human reader.
Consider a classic pair of this kind: "The hunters shot at the birds, and they flew away" versus "The hunters shot at the birds, and they ran out of bullets". The pronoun "they" refers to a completely different entity depending on which sentence is used. The solution is simple for a human: only birds can fly, and only hunters can run out of bullets.
A knowledge graph is a powerful foundation for the next generation of chatbots, and using available web services we could create a powerful application able to obtain a near-human level of understanding.
But there are many other approaches that could help as well. The best scores in coreference resolution have been obtained with machine learning, by training a neural network to decide when a given element refers to another in a text. I would note, though, a huge advantage of using a graph: we know exactly why a given solution was produced, which can lead to a greater ability to infer other kinds of information and actually "reason" about it. A solution that merges machine learning with a graph-based approach like the one described here should be more solid.
Sometimes deep learning may be the only way to go: there are situations in which grammar and even semantics are not enough to figure out what the user is talking about. In those cases, we could use neural networks to make decisions once we have narrowed down the set of candidate references using our graph.
Another important point is how to detect "intents" in what the user says. This is the problem of action selection: we should be able to know when the user is requesting something and what to do afterwards. But once the knowledge is in a graph, it is not difficult to store intents in it as well: intents could be stored as Cypher queries to be matched, instantly indicating what action to take when one of them is triggered, or they could be embedded in the graph's hierarchy, allowing the system to match any new content and infer new information.
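The idea of intents stored as queries can be sketched like this: after a new utterance has been inserted into the graph, each intent's pattern query is run, and a non-empty result triggers the associated action. The intent names and patterns below are illustrative assumptions; the executor would in practice be the neo4j driver's session:

```python
# Intents stored as Cypher patterns; an intent fires when its pattern
# matches the current graph. All names here are illustrative.
INTENTS = {
    "send_email":
        "MATCH (u:Entity {name: 'user'})-[:WANTS]->(a:Entity)"
        "-[:IS_A]->(:Type {word: 'email'}) RETURN a",
    "book_table":
        "MATCH (u:Entity {name: 'user'})-[:WANTS]->(a:Entity)"
        "-[:IS_A]->(:Type {word: 'reservation'}) RETURN a",
}

def triggered_intents(run_query):
    """`run_query` executes Cypher (e.g. via the neo4j driver) and
    returns the result rows; any non-empty result fires the intent."""
    return [name for name, cypher in INTENTS.items() if run_query(cypher)]

# With a stubbed executor that pretends only the email pattern matched:
print(triggered_intents(lambda q: ["row"] if "email" in q else []))
# -> ['send_email']
```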
We're living through a new revolution in the way we interact with computers, and we developers should ride this wave by making our programs smarter in the real sense of the word. In the near future, we'll be talking to our electronic devices the same way we talk to each other. There is still a lot to do before our applications can learn by themselves while talking to us, but we are starting to have the tools and the computational power to make it possible.
At DSpot Sp. z o.o., we are developing tools which would speed up the creation of the next generation of chatbots. If you would like to cooperate with us on this challenge, please feel free to contact us.
Next generation of chatbots with NLP services and Graphs was originally published in Chatbots Life on Medium.