Conversational AI with Rasa: Entities

Rasa
24 Aug 2021 | 09:56

Summary

TL;DR: In this video from the Conversational AI with Rasa series, Yusta explains how entities improve your assistant's understanding and response accuracy. The video covers three methods for entity extraction: pre-built models such as Duckling and spaCy, regex patterns, and machine learning. It also covers the additional features of synonyms, lookup tables, and entity roles and groups, with practical guidance on using each of them in a Rasa assistant.

Takeaways

  • 🧠 Entities are pieces of information extracted from user messages that help an AI assistant understand the request and respond appropriately.
  • 📝 Common entities include numbers, dates, country names, and other relevant information that can be contextualized in responses or actions.
  • 📚 The training data for entity extraction is stored in the 'nlu.yaml' file, with entities labeled using a simple square bracket and label convention.
  • 🚀 Rasa offers pre-built models like Duckling and spaCy for extracting entities without extensive labeled data, useful for dates, numbers, and more.
  • 🔍 Regex is a method in Rasa for extracting entities that follow a deterministic pattern, such as user IDs, enhancing the assistant's ability to identify specific details.
  • 🤖 Machine learning is employed for custom entities that don't have pre-built models or don't follow a regex pattern, requiring more training data for better results.
  • 📊 Rasa's DIET classifier is a powerful machine learning model for entity extraction and is configured as part of the NLU model pipeline.
  • 📈 The output of entity extraction by Rasa is in JSON format, providing detailed information about the entity, its value, location in the sentence, and the extraction method used.
  • 🔄 Synonyms in Rasa allow normalization of entity values, useful for mapping different user terms to a single value for consistent data usage.
  • 📋 Lookup tables in Rasa are used to enhance extraction for entities with known possible values, like country names, by generating case-insensitive regex patterns.
  • 🎯 Entity roles and groups in Rasa add additional context to entities, helping distinguish between different types of information, such as origin and destination in flight booking.
  • 📘 Entity roles can influence the conversation flow and are configured in training data stories and referenced in the domain file for more natural and context-aware responses.

Q & A

  • What are entities in the context of Rasa?

    -Entities are pieces of information that your assistant can extract from user inputs to better understand what is being asked and use those details in a specific context.

  • What are some common examples of entities?

    -Common examples of entities include numbers, dates, country names, and any kind of relevant information, such as destinations in a flight booking assistant.

  • Where should the training data for entity extraction be stored in Rasa?

    -The training data for entity extraction should be stored inside the nlu.yaml file.

  • How should entities be labeled in the training data?

    -Entities should be surrounded by square brackets and the label should be included inside parentheses next to the word.

  • What are the different methods for entity extraction in Rasa?

    -The different methods for entity extraction in Rasa include using pre-built models, regex, and machine learning.

  • What is Duckling and how is it used in Rasa?

    -Duckling is a tool for extracting entities like numbers, dates, URLs, email addresses, etc. It requires no training data for your assistant to extract these details.

  • What is spaCy and how does it enhance entity extraction in Rasa?

    -spaCy is a powerful library that enables the use of pre-built models to enhance entity extraction for details like person names and locations.

  • How does regex help in entity extraction in Rasa?

    -Regex allows defining a specific pattern that the entities should follow, making it suitable for extracting entities that follow a deterministic pattern, such as user IDs.

  • When should machine learning be used for entity extraction in Rasa?

    -Machine learning should be used for extracting custom entities that don't have pre-built models or don't follow specific patterns.

  • What is the Rasa DIET classifier?

    -The Rasa DIET classifier is a powerful machine learning model used for entity extraction that requires a lot of good quality training data to achieve the best results.

  • What additional features does Rasa offer for enhancing entity extraction?

    -Rasa offers features such as synonyms, lookup tables, and entity roles and groups to enhance entity extraction.

  • How do synonyms work in Rasa?

    -Synonyms allow mapping extracted entity values to a different value to normalize the data, which is useful when users refer to the same thing using different terms.

  • What are lookup tables in Rasa and how are they used?

    -Lookup tables are lists of words used to generate case-insensitive regex patterns, enhancing entity extraction for details with a set of known possible values.

  • What are entity roles and groups in Rasa?

    -Entity roles and groups allow assigning additional details to entities, such as distinguishing between origin and destination in a flight booking assistant.

Outlines

00:00

🔍 Understanding Entities in Rasa AI

This paragraph introduces the concept of entities in the context of conversational AI with Rasa. Entities are vital pieces of information extracted from user inputs that assist the AI in comprehending the query and responding appropriately. The paragraph explains that entities can range from simple details like numbers and dates to more complex and context-specific information. It also touches on the training data requirements for entity extraction, which should be formatted within the 'nlu.yaml' file using a specific labeling convention. The paragraph further discusses various methods for entity extraction in Rasa, including the use of pre-built models like Duckling for common entities and spaCy for names and locations, regex for deterministic patterns, and machine learning for custom entities. The importance of quality training data for machine learning models is emphasized, along with the inner workings and considerations when building these models.

05:03

🛠️ Enhancing Entity Extraction with Rasa Features

The second paragraph delves into additional features of Rasa that can enhance entity extraction. It begins by discussing synonyms, which help in mapping different user expressions to a single normalized value, facilitating consistent data usage across the system. The paragraph explains how synonyms can be added either in the 'nlu.yaml' file or inline with training examples. It also introduces lookup tables, which are used to create regular expression patterns for known sets of values, such as country names. Furthermore, the paragraph covers entity roles and groups, which allow for the differentiation of entities within a context, such as distinguishing between an origin and a destination in a flight booking scenario. The importance of providing varied examples for roles and groups is highlighted to enable the AI to learn effectively. The paragraph concludes by noting that entity roles can influence conversation flow and must be integrated into training data stories and the domain file.

Keywords

💡Entities

Entities in the context of conversational AI refer to the specific pieces of information that an assistant extracts from user inputs to better understand the request and respond appropriately. They are crucial for context understanding and action execution. In the script, entities such as 'destination' are extracted from user statements like 'book a ticket to Sydney', where 'Sydney' is identified as an entity of the type 'destination'.
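The square-bracket labeling convention described in this summary can be illustrated with a short sketch. The Python below is not Rasa's own parser; it is a minimal re-implementation, written for illustration, that handles only the simple `[text](label)` form and returns the plain sentence plus the character offsets of each labeled entity. The function name `parse_annotated` and the example sentence are assumptions.

```python
import re

# Matches the simple annotation form: [entity text](entity_label)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(example: str):
    """Return (plain_text, entities) for one annotated training example."""
    plain_parts = []
    entities = []
    cursor = 0      # position in the annotated string
    plain_len = 0   # length of the plain text built so far
    for match in ANNOTATION.finditer(example):
        # Copy the unannotated text that precedes this entity.
        before = example[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)

        value, label = match.group(1), match.group(2)
        entities.append({
            "entity": label,
            "value": value,
            "start": plain_len,
            "end": plain_len + len(value),
        })
        plain_parts.append(value)
        plain_len += len(value)
        cursor = match.end()
    # Copy whatever follows the last annotation.
    plain_parts.append(example[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_annotated(
    "I would like to book a ticket to [Sydney](destination)"
)
print(text)
print(entities)
```

The offsets refer to the plain sentence with the annotation markup removed.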

💡Conversational AI

Conversational AI, the technology behind chatbots and virtual assistants, uses natural language processing to simulate human-like conversations with users. The video script discusses how entities play a vital role in enhancing the functionality of these AI systems. For example, the assistant's ability to understand and act upon a user's request to book a flight is facilitated by the extraction of entities like 'Sydney'.

💡Rasa

Rasa is an open-source framework used for building conversational AI. It is mentioned in the script as the platform that developers use to implement entity extraction and other NLU (Natural Language Understanding) functionalities. The script explains various methods of entity extraction available in Rasa, such as using pre-built models, regex, and machine learning.

💡NLU (Natural Language Understanding)

NLU is a subfield of AI that focuses on enabling computers to understand human language. In the video, NLU is integral to the process of entity extraction, where the assistant interprets user inputs to identify and utilize entities. The script describes how training data for entity extraction should be formatted within the 'nlu.yaml' file in Rasa.

💡Duckling

Duckling is a pre-built entity extraction tool that Rasa can use to extract entities such as numbers, dates, URLs, and email addresses. The script highlights that it requires no training data, which simplifies entity extraction for these common data types.

💡Regex

Regex, short for regular expressions, is a sequence of characters that defines a search pattern. In the context of the video, Regex is used to extract entities that follow a deterministic pattern, such as user IDs. The script explains how to define and include Regex patterns in the 'nlu.yaml' file for custom entity extraction.
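To make the idea of a deterministic pattern concrete, here is a minimal sketch in plain Python. It does not use Rasa's regex components; it only shows how a named pattern can pull a value and its position out of a message. The assumption that a user ID is exactly eight digits, the entity name `user_id`, and the function name `extract_with_regex` are all hypothetical.

```python
import re

# Hypothetical pattern: assume a user ID is exactly 8 digits.
PATTERNS = {"user_id": re.compile(r"\b\d{8}\b")}


def extract_with_regex(text: str):
    """Return every pattern match as an entity dict with character offsets."""
    entities = []
    for entity_name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "entity": entity_name,  # the entity is named after its pattern
                "value": match.group(0),
                "start": match.start(),
                "end": match.end(),
            })
    return entities


message = "my user id is 12345678, please check my order"
found = extract_with_regex(message)
print(found)
```

Keying the patterns by entity name mirrors the advice in the video to give the entity the same name as its regex pattern.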

💡Machine Learning

Machine learning is an AI technique that allows computers to learn from and make decisions based on data. The script discusses using machine learning for extracting custom entities that do not have pre-built models or do not follow a specific pattern. It emphasizes the need for substantial, high-quality training data to achieve accurate results.

💡DIET Classifier

The DIET (Dual Intent and Entity Transformer) classifier is a machine learning model within Rasa that is particularly powerful for entity extraction. The script mentions it as one of the models that can be trained on custom data to extract entities that do not fit standard patterns or have no pre-built models available.

💡Synonyms

Synonyms in the context of conversational AI allow the system to map different user expressions to a single, normalized value. This is important for consistency when querying databases or making API calls. The script describes how synonyms can be added in the 'nlu.yaml' file or inline with training examples to ensure entity values are standardized.
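The normalization step can be sketched as a lookup applied to values that have already been extracted. This is an illustration of the idea only, not Rasa's synonym mapper; the synonym table, the lower-casing of values before lookup, and the function name `normalize` are assumptions made for the example.

```python
# Hypothetical synonym table: canonical value -> ways users might say it.
SYNONYMS = {
    "New York City": ["NYC", "nyc", "the big apple", "new york"],
}

# Invert the table once so each variation points at its canonical value.
LOOKUP = {
    variation.lower(): canonical
    for canonical, variations in SYNONYMS.items()
    for variation in variations
}


def normalize(entities):
    """Replace each extracted value with its canonical value, if one is known."""
    normalized = []
    for entity in entities:
        canonical = LOOKUP.get(entity["value"].lower(), entity["value"])
        # Build a new dict so the extractor's original output is left untouched.
        normalized.append({**entity, "value": canonical})
    return normalized


extracted = [
    {"entity": "city", "value": "NYC"},
    {"entity": "city", "value": "Boston"},
]
result = normalize(extracted)
print(result)
```

Because the mapping runs on extracted values, it cannot help with an entity the assistant failed to extract in the first place, which is why training data for extraction is still needed.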

💡Lookup Tables

Lookup tables are lists of words used to create regular expression patterns for entity extraction. They are particularly useful for extracting entities with a known set of possible values, such as country names. The script explains how to include a lookup table in the 'nlu.yaml' file to enhance entity extraction for specific types of entities.
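How a list of words becomes a regular expression can be sketched in a few lines. The code below is an illustration, not Rasa's implementation: the short country list is a stand-in for a full one, the function name `build_lookup_pattern` is hypothetical, and ignoring letter case is a choice made for this sketch.

```python
import re

# A short, illustrative lookup table; a real one would list every country.
COUNTRIES = ["Australia", "United States", "United Kingdom", "Lithuania"]


def build_lookup_pattern(words):
    """Combine the words into one alternation, longest first, ignoring case."""
    escaped = [re.escape(word) for word in sorted(words, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


country_pattern = build_lookup_pattern(COUNTRIES)
matches = [
    m.group(0)
    for m in country_pattern.finditer("I am flying from lithuania to the united states")
]
print(matches)
```

Sorting the words longest first keeps a shorter entry from matching ahead of a longer entry that begins with the same text.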

💡Entity Roles and Groups

Entity roles and groups are features that allow developers to add context to entities, helping the assistant understand the relationship between different entities in a user's request. For instance, in a flight booking scenario, the script mentions distinguishing between 'origin' and 'destination' by using roles. Groups can also be used to categorize entities, such as grouping the toppings that belong to a specific order in a pizza ordering assistant.
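Once entities carry a role or a group, downstream code can use that extra field to tell them apart. The sketch below assumes extractor output shaped as dictionaries with optional `role` and `group` keys; the sample values, the entity names `size` and `topping`, and the helper names `by_role` and `by_group` are hypothetical.

```python
from collections import defaultdict

# Illustrative extractor output; the role and group fields are what the
# annotations in the training data would teach the model to predict.
flight_entities = [
    {"entity": "location", "value": "New York", "role": "origin"},
    {"entity": "location", "value": "Boston", "role": "destination"},
]
pizza_entities = [
    {"entity": "size", "value": "small", "group": "1"},
    {"entity": "topping", "value": "mushrooms", "group": "1"},
    {"entity": "size", "value": "large", "group": "2"},
    {"entity": "topping", "value": "pepperoni", "group": "2"},
]


def by_role(entities):
    """Map each role to the value of the entity that carries it."""
    return {e["role"]: e["value"] for e in entities if "role" in e}


def by_group(entities):
    """Collect entities that share a group label into one order each."""
    orders = defaultdict(dict)
    for e in entities:
        orders[e["group"]][e["entity"]] = e["value"]
    return dict(orders)


trip = by_role(flight_entities)
orders = by_group(pizza_entities)
print(trip)
print(orders)
```

Both location entities have the same type here; only the role separates the origin from the destination, and only the group separates one pizza from the other.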

Highlights

Entities are vital for assistants to understand user inputs and respond contextually.

Entities can be any relevant piece of information, such as numbers, dates, or country names.

Entity extraction training data should be stored in the nlu.yaml file.

Entities are labeled in training data by surrounding the word with square brackets and adding the label in parentheses.

Rasa offers pre-built models like Duckling for extracting numbers, dates, URLs, and email addresses without extensive training data.

spaCy is another library for enhancing entity extraction with pre-built models for names and locations.

To use pre-built models, reference them in the NLU model configuration file.

Regex is useful for extracting entities that follow a deterministic pattern, like user IDs.

Machine learning is employed for custom entities without pre-built models or specific patterns.

Rasa's DIET classifier is a powerful model for entity extraction using machine learning.

High-quality training data is crucial for better machine learning model results.

Rasa's output for entity extraction is in JSON format, detailing the entity, value, location, and extraction method.

Synonyms help normalize extracted entity values for consistent database querying or API calls.

Synonyms can be added via the nlu.yaml file or inline with NLU training examples.

Lookup tables enhance extraction for entities with known possible values, like country names.

Entity roles and groups allow distinguishing between different types of entities, such as origins and destinations in flight booking.

Entity roles can influence the conversation flow and must be included in training data stories and the domain file.

Entities are essential for assistants to comprehend user requests and utilize them contextually in later interactions.
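The highlights mention that Rasa reports extracted entities as JSON with the entity, its value, its location, and the extraction method. The sketch below shows how such a payload could be read. The payload is an illustrative example built from the details named in the video (an entity "city" with the value "New York City", extracted by the DIET classifier); the sample sentence and the exact field names are assumptions, not captured Rasa output.

```python
import json

# Illustrative payload; field names and values are assumptions for this sketch.
raw = """
{
  "text": "I want to fly to New York City",
  "entities": [
    {
      "entity": "city",
      "value": "New York City",
      "start": 17,
      "end": 30,
      "extractor": "DIETClassifier"
    }
  ]
}
"""

parsed = json.loads(raw)

# start/end are character offsets into the original message text.
spans = [
    parsed["text"][entity["start"]:entity["end"]]
    for entity in parsed["entities"]
]

for entity, span in zip(parsed["entities"], spans):
    print(f'{entity["entity"]} = {entity["value"]!r}, '
          f'found at {entity["start"]}-{entity["end"]} as {span!r}, '
          f'extracted by {entity["extractor"]}')
```

Slicing the message with the reported offsets is a quick way to check that the extracted value lines up with the text the user actually typed.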

Transcripts

00:01

[Music]

00:05

Hi everyone, and welcome back to the Conversational AI with Rasa series. My name is Yusta, and I'm extremely excited to continue your developer education journey with you. In this video we will talk about entities.

00:18

Entities are the pieces of information that your assistant can extract from user inputs. Those details can help your assistant better understand what is being asked and, later on, use those details in a specific context when responding back to the user or running specific actions.

00:35

The most common examples of entities are details like numbers, dates, country names, and similar, but an entity can really be any piece of information that is relevant and important to your assistant. For example, for a flight booking assistant it would be really useful to know which detail in the user's input is a destination. That's why, in this example where the user says that they would like to book a ticket to Sydney, "Sydney" is being extracted as an entity of type destination.

01:09

Now, a quick note on what the training data for entity extraction should look like and where it should be stored. The training data for entity extraction should be stored inside your nlu.yaml file. Now, how should this data be labeled? It's a very simple convention: the word that should be extracted as an entity should be surrounded by square brackets, and next to it you should include the label for this entity inside parentheses.

01:38

There are a few ways entities can be extracted using Rasa. Different methods work best for specific entity types, but the most common approach that we see being used by developers is using a mix of different methods to achieve the best results. Let's talk about those methods one by one.

01:58

The first approach is using pre-built models. Rasa enables you to use pre-built models to enhance the entity extraction results without requiring you to have loads of labeled data. For example, a tool called Duckling is an extremely powerful approach for extracting entities like numbers, dates, URLs, email addresses, and similar. The best part about Duckling is that it requires no training data for your assistant to extract those details.

02:29

Another really powerful library, spaCy, enables you to use pre-built models to enhance the entity extraction for details like person names, locations, and similar. To enable your Rasa assistant to use pre-built models, all you have to do is reference those models in your NLU model configuration file. We will talk about those details in the later episodes of the series.

02:56

Another approach for extracting entities with Rasa is using regex. Regex allows you to define a specific pattern that the entities you would like to extract should follow. It means that regex is the best approach for extracting entities that follow a specific deterministic pattern, for example user IDs and similar details.

03:18

To enable your Rasa assistant to use regex, you will have to define the regex pattern and include it in your nlu.yaml file, in addition to a few training examples for that specific entity. Don't forget to name the entity you would like your assistant to extract the same way as you named your regex pattern.

03:38

And last but not least, the third approach for extracting entities with Rasa is using machine learning. If some of the entities that you would like your assistant to extract don't have pre-built models, or if they don't follow a specific pattern, so using regex is not really an option, you should use machine learning to extract those entities. Machine learning models are very powerful for extracting what we call custom entities.

04:05

To extract entities using machine learning models you will need some training data. In fact, you will need a lot more training data than for the previous approaches, but the more good-quality training data you have, the better results you will achieve.

04:19

Rasa comes with a few machine learning models that you can use and train on your own training data. For example, one of the most powerful models for entity extraction is Rasa's DIET classifier. When extracting entities using machine learning models, there are lots of things happening under the hood, and there are quite a few things you have to take into account when building those models. We will talk about this topic a lot more in the later episodes of the series, when we dive deeper into the NLU model pipeline configurations.

04:52

When entities are extracted, under the hood Rasa produces an output in JSON format. That output is rich in detail about what kind of entity has been extracted, what the value of that entity is, where in the sentence that detail was found, and also what method or model was used for extracting it. So, for example, here you can see an entity "city" being extracted with the value "New York City", and the method that has been used for extracting this entity is the DIET classifier.

05:26

In addition to just extracting entities, Rasa comes with a few additional features that can help you enhance the entity extraction even more. Let's talk about them.

05:37

The first one is called synonyms. Synonyms allow you to map an extracted entity value to a value different from the one extracted. Now, where is this useful? In some cases users will refer to the same thing using lots of different terms, but you as a developer will want to use the values of the entities for, let's say, querying a database, making API calls, or using those details for something else. This means that you will need the extracted entity values to be normalized and mapped to one specific value, and this is exactly what you can achieve with synonyms.

06:18

There are two ways synonyms can be added to your Rasa assistant. One of them is by adding a new section to your nlu.yaml file called synonym: you have to define the actual value that extracted values will be mapped to, and then provide examples of how users might refer to that specific synonym. The second approach is adding them inline with your NLU training examples: all you have to do is add another parameter called value, which references the value that extracted entities will be mapped to.

06:55

A very important note about synonyms: synonym mapping happens after entities are extracted, which means that you will need some training data to enable your assistant to extract entities first.

07:07

Another very powerful feature of Rasa is lookup tables. Lookup tables are lists of words that can be used to generate case-insensitive regular expression patterns. With lookup tables you can enhance entity extraction for details that have a set of known possible values. For example, you can use a lookup table to enhance the entity extraction for country names. To achieve that, include the list of all countries in the world in your nlu.yaml file under the section lookup.

07:40

And the last very powerful feature of Rasa for entity extraction is entity roles and groups. If you are building an assistant for flight booking or something similar, you will quickly realize that you want to add additional details for specific entities. For example, if the user says that they would like to book a ticket from New York to Boston, in a very simple scenario your assistant will extract "New York" and "Boston" as an entity location. But for your assistant to be natural in how it responds, and in general to be able to run the necessary actions to book a ticket, your assistant will have to distinguish which detail is the origin and which one is the destination.

08:24

You can enable your assistant to do that by defining entity roles. To do that, you will have to include an additional parameter called role in your entity-labeled data and define which role a specific entity should correspond to.

08:42

Entity groups allow you to group specific entities under a specific category. So, for example, in a pizza booking assistant you can enable your assistant to group specific toppings corresponding to a specific order.

08:57

An important thing about entity roles and groups is that you have to include quite a few different examples for your assistant to really learn, so make sure to include examples of different variations of roles and groups.

09:11

Entity roles can also be configured to influence the flow of the conversation. To do that, you will have to include roles in your training data stories, just like you see in the example here, and also reference them in your domain file.

09:25

Entities are extremely important pieces of information that can enable your assistant to better understand what is being asked and to use them later on in a specific context. We will dive even deeper into the topic of collecting information and using it in a specific context in the later episodes of the Conversational AI with Rasa series. I hope I'll see you there.

Related Tags
Conversational AI, Entity Extraction, Rasa Framework, Developer Education, NLU Configuration, Duckling Tool, Regex Patterns, Machine Learning, JSON Output, Synonyms Mapping, Lookup Tables, Entity Roles