
Conversational AI Best Practices with Cathy Pearl and Jessica Dene Earley-Cha: GCPPodcast 195



[MUSIC PLAYING] MARK MIRCHANDANI:
Hi, and welcome to Episode 195 of the weekly
“Google Cloud Platform” podcast. I’m Mark, and I’m here with
my colleague and new host, Priyanka. Hey, Priyanka. PRIYANKA VERGADIA: Hi. MARK MIRCHANDANI:
How are you doing? PRIYANKA VERGADIA: I’m super
excited to be a part of “Google Cloud Podcast” from now on. MARK MIRCHANDANI: Absolutely. And I know you have
been not just a host, but also a guest on
an episode previously. But now you’ll be joining
us as a host as well. PRIYANKA VERGADIA: Yes, yes. I’m super excited about that. MARK MIRCHANDANI:
It’s super awesome. And of course, in our
interview today, you and I got a chance to sit down
with Cathy and Jessica about some best practices
for conversation AI. They’re both Googlers
here, and they kind of work on different teams. But it’s really cool to hear
the different perspectives on what’s important and how do
you design good conversation systems. PRIYANKA VERGADIA: Yeah,
and the more exciting part is Jessica brings the
developer perspective, and then Cathy brings
her conversational design experience from years of building IVRs and then moving into conversational design. So I’m super excited
for them to be here. MARK MIRCHANDANI: So
it’s a lot of fun. But we also have a cool
question of the week. I saw that you put
out a blog post about, how do you integrate
Dialogflow and BigQuery? Those are two very
different tools. PRIYANKA VERGADIA: Yeah. MARK MIRCHANDANI: So we’re going
to be getting into what that looks like a little bit later. But before we do that, let’s
talk about our cool things of the week. [MUSIC PLAYING] PRIYANKA VERGADIA: Yeah,
so I have something cool. I read this blog post
that Thomas Kurian himself had published about
Google and Mayo Clinic’s new revolutionary partnership
in terms of AI in health care. And it’s really
exciting to see how we are able to take
the innovations that we do at Google and apply them
to some of the practical applications, like the
patient experience, clinical experience, and then
diagnostics, and even research. So there’s going to
be some more stuff as a part of this
partnership coming up, which will be a combination of the
security and the compliance that Google Cloud
offers in the platform with its infrastructure. So check out the blog
post, but in general, it seems to be a very exciting
venture, where we’re going to explore
possibilities in health care with the Google Cloud Platform. MARK MIRCHANDANI: Yeah,
it’s always super cool to hear about different
companies that are using GCP. But I think a lot of
it, like you mentioned, is really based on what the
technologies are enabling them to do, and then kind of
all the cool research and future potential
that comes out of that. PRIYANKA VERGADIA:
Yeah, very exciting. MARK MIRCHANDANI:
Super, super cool. Well, I actually have two
cool things of the week. The first one is some
really, really big memory-optimized virtual machines. And when I say big, I mean big. Because apparently– and this is news– these are 6- and 12-terabyte memory virtual machines– PRIYANKA VERGADIA: Wow. MARK MIRCHANDANI: –which
just blows my mind in every possible way. Because I remember the day
of trying to get my hands on, like, 256 meg chips and putting
them into an old desktop computer, probably
cutting myself on the RAM one too many times. So to think about
the scale of how it’s changed from
back in the day all the way to these 12
terabytes of memory, I mean, it’s astonishing. PRIYANKA VERGADIA: Yeah. MARK MIRCHANDANI: The blog
post is talking a little bit about how these are
certified for SAP Hana and using it for those
types of workloads. But I guess you
can probably just use them for any other
application that might need 12 terabytes of memory to run. [LAUGH TRACK] PRIYANKA VERGADIA: Yeah, this
is some cool exciting stuff. MARK MIRCHANDANI:
That’s the kind of stuff that I think that when
customers come up and they see, oh, look, we have this
technology or these tools available, that’s
when they can say, well, we’re really good
at our subject matter. What can we do with
these new tools? What new things can we find out? And just like Mayo
Clinic, I think there are some cool
opportunities there. PRIYANKA VERGADIA: Yeah. What was your second
cool thing of the week? MARK MIRCHANDANI: My second
cool thing of the week is a great little
educational hands-on lab. Actually, it’s a quest. So it’s a full set
of videos and labs and then a little
quiz at the end, all called Understanding
Your GCP Costs. So I’ve been working
on this for a while. I’ve talked about
it in a couple of the previous podcast episodes. And you can also see these
videos on the Google Cloud Platform YouTube channel. But it’s kind of a
good starting place to look at understanding
your GCP costs, right? I mean, it’s
incredibly important to understand what tools
are available for you to measure your costs. And then starting with this
quest and then the next content that will be coming
out, it gives you a little bit more
hands-on guidance for how to actually
control your costs. So things you can
put into place, there’s a bunch of different
discount tools like sustained use discounts,
committed use discounts, but then also identifying
resources that you’re not using, and then putting
budgets and quotas into place so you can control it. It’s actually really
super cool because when you think about it, I
think a lot of people just kind of use the cloud,
they use the machines, they go for it. And then they’re like, OK,
well, I have to pay x amount. Well, as you get to
be a big business, that doesn’t make as much sense. You need to plan. You need to control
your costs effectively. So this is a really cool
way to start with that. And I wanted to tease
out that there is a code. I don’t know how long it’ll
last for, so if it doesn’t work, ping me or something. We’ll figure something out. But there’s a code– 1Q-COSTS-626. So I’m sure that will be in the
description of the show notes. But if you use that, you can
get the labs part for free. [CASH REGISTER TRACK] So Qwiklabs I think
normally has credits that you need to do
the labs, and this will let you bypass that. I really, really recommend
it for people to– especially people
who are in finance, to get a better grasp on
what the cost controls are. But I think this is
introductory level enough that everyone maybe
should look at getting a baseline understanding of
where those costs are coming from and how to analyze
them, so that they can kind of tackle the cloud
a lot more responsibly. PRIYANKA VERGADIA: Do you also
talk a little bit about maybe, like, how do I set up my
organization so the costs are properly limited and
stuff in that video series that you talked about earlier? MARK MIRCHANDANI: Absolutely. So there’s a video or
two on resource hierarchy and organizations. And again, these are things that
the average individual user, someone who’s not either
using it as part of a business or as part of a small
business, may not run into. But especially for large
businesses and enterprises and so forth, the organization
is a very, very important part of how you structure your
Google Cloud Platform resources. And learning about that
can be really, really handy for scaling out. PRIYANKA VERGADIA: Yeah,
I remember those questions coming up a lot with enterprise
customers that I spoke with. So I think it’s amazing
to have that resource. And we’ll make sure that
the link for the videos goes in the description. MARK MIRCHANDANI: Absolutely. Well, with all that cool
stuff out of the way, let’s get right
into our interview with Cathy and Jessica. [MUSIC PLAYING] Cathy, Jessica, thanks so
much for joining us today. First things first, tell us
who you are and what you do. CATHY PEARL: Yeah, so I am
head of conversation design outreach here at Google. And basically, that
means I talk a lot about conversation
design, what it is, and try to bring awareness. JESSICA DENE EARLEY-CHA:
So thank you so much. I’m a developer advocate
for Actions on Google, which means I get to build
a lot of voice applications. And through that, I build a lot
of content from the learnings that I get, so we could
teach other third party developers how to build
for Google Assistant. And then with that, I
also get all the feedback from developers
on what goes great and what goes not so great and
bring that back to the product team so we can make it better. MARK MIRCHANDANI: Gotcha. So first things
first, Cathy, what is conversation design, right? You mentioned that you
kind of focus on that. I don’t know what that is. CATHY PEARL: That is
an excellent question that I’m asked many times. Basically, you can
think of it as we are trying to teach computers to
communicate like humans and not the other way around. So basically,
rather than have you learn some complex way of either
speaking or typing or tapping or swiping, we’d rather leverage
how humans have been speaking for, what, 150,000 years
and make it that much easier to get things done, get
the information you need, just the way you’re
naturally used to doing. And so we want to
work in a world where we’re thinking about
how humans communicate and then applying that within
the technical constraints that exist today. PRIYANKA VERGADIA: So I
have a question about that. So I think we got
the definition of how to work with
conversation design, but Jessica, I have
this question for you. Is this any different than how
we interact with maybe, say, apps or websites? And do developers have
to do something specific when they develop for voice? JESSICA DENE EARLEY-CHA:
Oh, definitely. Developing for voice is very
different than something with a GUI. Because there’s a lot of
control as a developer when you have a GUI. There’s only so many
buttons a user could push. Versus a conversation–
at any point, someone could say anything
and you’re supposed to support the
user through that. And it’s not just necessarily
that they could say anything at any time. Because most people– what
I’ve learned from Cathy– they’re cooperative. They want to support you and
be able to talk through things. They’re not going to be
just bringing something crazy into the conversation. It’s more on how
people explain things or how they communicate
could be slightly different. And our voice applications,
or actions, as we call them in Google Assistant– we call
those actions instead of apps– we have to figure out how to
capture those different types of utterances that users
can say and give them the appropriate response. So it’s definitely a
different mental shift when it comes to thinking
about how you build. And something that I’ve
seen and I actually experienced when I made my first
application was I built it, and as I was going to
the docs, it said Design. And I was like, that’s
cool, I’ll do that later. Because as a web developer,
that’s what I would do. I would kind of build
all the scaffolding, then I bring in all
the design after. And it turns out, nope,
can’t do that this time. You have to have the
conversation design first, and then you build after. MARK MIRCHANDANI: So are there
like a list of best practices? Are there a bunch of
design docs on how to do conversation design? CATHY PEARL: Yeah,
we actually have established a lot of best
practices and principles. You can go to, for example,
actions.google.com/design. Get a whole ton of information
on what it is and how to get started. The first thing we always
recommend to people, whether you’re a
developer, or a marketer, or whoever, whatever
role you have, is to start with something
called sample dialogues. And what that
means is basically, you’re creating a conversation. It’s like a movie script. It’s a back and forth between
the user and the action. And that just really
helps you sort out the complexities of what
this is going to look like. Because a lot of times,
like Jessica mentioned, people think, oh,
yeah, the words, we’ll just fix those later. But in fact, the words
are the structure. And it’s going to impact
the flow and the wording and everything throughout
your whole development cycle. So you might as well start
with design in the beginning and really save yourself
a lot of headaches. PRIYANKA VERGADIA: So can anyone
write these conversation design practices? Or when we start to develop,
is it just the developer? Who are the
stakeholders involved when somebody is thinking
about designing something, like a conversation? CATHY PEARL: I think
in the ideal world, a team will have a conversation
designer as part of it. And that person can really
help steer the direction of how things should sound, what
things we need to keep in mind, what are we going to need
from the back end, all that kind of stuff. Now I understand
in the real world, not everybody is going
to have a conversation designer to work with. So instead, in those
cases, I definitely encourage the developer or the
project manager, whoever it is, to really study up
on best practices and then write sample dialogues. And then what you do is just
like in Hollywood with a movie script, you do a table read. You read them out loud. Because sometimes
we write formally. And when we actually hear
those things out loud, they sound kind of
stiff and awkward. So you really want to read
out loud with other people, and you’ll quickly get a feel
for where things are going well and where things
are going wrong. And that can really help
anybody really get started and do a better job. MARK MIRCHANDANI: Yeah,
I think the last time we chatted with Priyanka
a little bit about some of the other chat options
specifically built into Google, we kind of brought up the idea
of people being frustrated by being very locked into
kind of an older school version of a voice system, where
it wants you to say something very exactly. Like, I want one small cup
of coffee that is dark brew, please. And it adds a lot more
effort into it, as opposed to how people might
more naturally speak. You mentioned earlier,
Jessica, that you thought about the
code part of it first and the design part of it later. And that can lead to a lot of
challenges using these best practices. How do you kind of
convince somebody to actually take
them into account and actually spend the
time upfront working with them, before just kind of
trotting off and doing the code part? JESSICA DENE EARLEY-CHA:
Oh, yeah, that’s something that I constantly
see whenever I go out, especially when I do
a one-day workshop. It’s really unrealistic to have
a great design and great voice application built in one day. It takes a long time. And so I definitely try to do my
little soapbox speech of like, oh my gosh, I spent
hours thinking these are technical
issues when I was building prior to having a design. It turns out it was a design issue. I didn’t know where I was going
in regards to the conversation flow. So I think that’s
kind of my mantra. That’s what I’m constantly
trying to figure out. I don’t think I have a great
kind of convincing argument besides sharing my story and
sharing kind of the statistics on, like, it takes
more time, and you’re going to end up spending more
developer time in the long run trying to fix those mistakes. And because voice
is still fairly new, I think it’s more people are
going to be experiencing that. And then they realize and they
go, oh, Jessica and Cathy, they were right. MARK MIRCHANDANI: So
is there an example you can share of
where someone might have run into that or
an example that you’ve run into specifically? JESSICA DENE EARLEY-CHA: Oh
yeah, my very first voice application that I built.
The point of the application was to mock, or mimic,
technical interviews. Because the hard part when
it comes to interviewing is not so much being able to
solve the technical problems, it’s hearing the problems and
practicing it, and hearing it from someone else. I was like, cool. I’ll build this thing. And it’s a really cool tool. I would have loved
to have had this when I was going through my
process of looking for a job. And I would get into these
weird kind of scuffles where I would start,
but how would I bring the user from giving
them the information, bringing them back, and letting
them know how to come back. And I knew how to
do that technically, but when I sent it out
and had people try it out, it wasn’t natural. It felt weird. I was like, well, how can I be
clever and put in little flags and try to bring them back? And it’s like, oh my gosh. Why am I doing all this overhead
to solve a problem where I could have just
stopped and went OK, let me talk to other people. Let me figure out and build
the actual conversation design by using sample dialogues. And I think something that
could be really challenging or something that– I know when I mention to
folks, especially developers, build some sample dialogues,
I get the, well, what is that? It’s like, it’s
literally writing down– pretend you’re the action
and pretend someone else is the user and having
that conversation. So I usually like telling
people, just grab two friends. Tell one of the friends
you are the action. This is what you do. The other friend,
you are the user. Now talk. And just let them naturally
talk and write that down, and how they’re doing it is
the best free way to do that. CATHY PEARL: Yeah,
and just to give you a really basic example, I think
a lot about yes/no questions and how often we fail
the user on those. Because a lot of
times, people will say, well, I asked them
a yes/no question. So let’s say I asked
you, do you like candy? And you might say yes,
and you might say no. MARK MIRCHANDANI:
Yes, I do like candy. CATHY PEARL: So you said,
yes, I do like candy. You didn’t say, yes. MARK MIRCHANDANI: I
just broke the chat bot. CATHY PEARL: Well,
that’s the point. You shouldn’t have broken it
if somebody designed it well. Or I might even say, like,
well, do you count chocolate? Which is neither a yes nor a no,
but it’s a very natural thing for someone to say. And so someone who’s
really thinking things through and following
best practices is not just going
to be like, oh, I told them to say yes or no. That’s all I’m going to accept. They’re going to
say, hm, what are related things they might say? Or if maybe you’re booking
a table at a restaurant, and you say, how many
people in your party? And maybe I say, well, do
you have outdoor seating? Now that’s a related question,
but it’s not a number, like you asked me for. That being said, I’m not
going to suddenly say, how tall is Barack
Obama when you asked me, how many people
are in your party? So that’s what we call
sort of adjacent topics. So the user might not answer
the exact question, but the user will say
something related, and you need to
anticipate those, or things are going
to fail very quickly. MARK MIRCHANDANI: Yeah, I think
this is something, Priyanka, you brought up the last
time we chatted about. People don’t know exactly
how to have a conversation. PRIYANKA VERGADIA: Yeah. MARK MIRCHANDANI: I think,
which is, first of all, a fascinating subject. I mean, Jessica, you
were mentioning earlier, if you tell two people to
have a conversation, boy, they get really awkward really
quickly, don’t they? JESSICA DENE
EARLEY-CHA: Oh, yeah. MARK MIRCHANDANI: Because they
think about everything you say. But people don’t normally think
about everything they say. They just kind of keep talking,
and eventually, their brain might catch up. But for people
like me, no, words are way faster than thought. So you can just keep on talking. But it’s an interesting thing. Because when you
actually call it out, I don’t think a lot of people
do think about how much we use, especially contextually, right? Which is one of the
hardest things to grab, but one of the most important
pieces of a conversation. Because you can ask a
question like exactly that– do you have outdoor seating? It’s like, oh, well, you’re
talking about this restaurant. You’re talking about this. How do you handle
that kind of idea that people don’t know how
to have a conversation? PRIYANKA VERGADIA: Yeah,
I think it goes back to the point where you don’t
just start coding, right? You start to think
about how it’s going to have interaction
with not just one person, but different people, right? Because I could talk about
things in a different way. And I think we talked
about this last time when we were like,
if I want coffee, depending on the time of the day
and the day and the frustration amount, I would ask for
coffee in different ways. When I’m happy, I’d be like,
wow, coffee, this is amazing. Oh my gosh, can I just
get coffee right now? I’m so frustrated. Like, things like
that, I think we have to incorporate
how a user could react in different
situations and ask for the same thing
in different ways. And then I guess, people
coming from different places with different accents
could ask for certain things in different ways,
so things like that. But Cathy, I would
like to ask you, what are the things that are
hard to deal with when we are working with conversation? CATHY PEARL: I think one of
the things that’s difficult– and Jessica was talking
about the difference between something like a GUI versus
a VUI, as we call it, Voice User Interface– is the fact that with a
GUI, you know what they did. They pressed a button. They swiped. They chose a menu item. With voice and natural
language understanding, it’s a little bit more
of an educated guess. You know, speech
recognition accuracy has improved
dramatically in the time I’ve been working in this
field, but it’s not perfect. And human speech, we don’t get
it right all the time either. But when humans have
a misunderstanding, we are really good at what we
call conversational repair. If you say something
like, oh, I went to the such and such restaurant
last night, I didn’t hear that. I’d say, what restaurant? And you’d say, oh, this
one, and we’d move on. And we’re very good at
that kind of repair. And that’s something you really
have to anticipate when you’re developing these
conversations, which is, no matter how good your
design is, no matter how good your
prompts are, people are going to say things
that you did not expect. And you will get an error. And you need to spend actual
time thinking through, what is my strategy for handling
when things go off the rails? How do I get somebody
back on track? And just doing the
default of like, sorry, I didn’t understand, please
say that again, is terrible. It won’t work. But we see it all the time. MARK MIRCHANDANI:
It’s a great way to get your blood
pressure up real quick. CATHY PEARL: Exactly. And so we were able to think. We were like, well,
what’s the user doing? What did we just ask them? Let’s rephrase the
question a bit. Oh, we have another error. OK, let’s give them
more information. Maybe they can’t find
their account number, and we need to tell them more. You have to know upfront
you’re going to spend time on error handling or repair. And that is going to
be the make it or break it difference between your
users having a successful experience or not. MARK MIRCHANDANI:
Yeah, it was kind of mentioned earlier
that sometimes people will say something
completely unrelated to what you were asking about. Is that handled in the same way? Do you try to handle it in a,
like, context shifting way, and you say now we’re
talking about this, so the AI just has
to keep up, or do you try to keep them
in line with what the original conversation was? CATHY PEARL: It kind
of depends on the goal. It’s like, if you’re
trying to book a table, then you do want to get them
through those questions. But you should be flexible. They shouldn’t have to answer the questions
in the exact same order. On the other hand, if they’re
saying something completely out of context, you may
not be able to help. But another thing
I want to mention is basically, if somebody says
something that you can’t do, acknowledgment is
really important. So for example, if I go to a
concierge at a hotel and I say, hey, can you rent a car for me? And let’s say they can’t. Let’s say they only do
dinner reservations. If the person just said,
I don’t understand. Do you want dinner reservations? I’d be like, no, I
want to rent a car. What the human will
actually say is, I’m sorry. I can’t rent a car. Would you like
dinner reservations? So our systems, if you look in
the logs, and a lot of people are requesting a feature
that you do not have, don’t just say, I
don’t understand, you should do this other thing. Instead, say, oh, sorry, we don’t
do that feature yet. I can do this or that. People like to be acknowledged. When you have a frustrating
day and you’re talking about it with somebody, you don’t
want them to say, well, I don’t know how to help you. You want them to
say, wow, I hear you’re saying that you
had a difficult talk with your coworker. People want to be acknowledged,
and this is true even for a voice user interface. MARK MIRCHANDANI: So
it’s incredibly important to make sure that the
conversation system is very clear about what
it can and can’t handle, but also that it still tries
to keep users down that path while matching all of the
kind of requests of that. I mean, I’m assuming
this is kind of covered in those best
practices of things to keep in mind when
you’re designing a system. CATHY PEARL: Yeah, for sure. Like you said, you want to
further them towards the goal while, at the same
time, keeping them from having a frustrating
experience and sort of gently guiding them back on track. JESSICA DENE EARLEY-CHA: Yeah,
and most third party voice applications don’t
do everything. They don’t necessarily and
shouldn’t replicate the Google Assistant because
Google Assistant could do all the things. Generally, a lot
of what web developers are building is specific to
a niche or to a certain topic. And so that’s where I know– I’ve built server
applications– and there are times where users
are not as knowledgeable, that I can’t tell a
joke because that’s not what the action’s about. But most likely, the user
is thinking that they’re talking to Google Assistant. And so I have an intent in
several of my actions going, hey, you might be
wanting Google Assistant. Just say goodbye, and
it’ll get you out. And when you want to
chat about x, say, hey, Google, talk to blah. And that way, it can get the
user to what they want to do. And it’s not forcing me
as a developer trying to solve all the problems
because that’s not the purpose of
individual actions. PRIYANKA VERGADIA: Yeah. And so what are some of the
common pitfalls that developers actually come across when
they’re trying to develop? Because I think we talked a
little bit with Cathy about how do you actually
repair a conversation with the conversation
design aspect of it? But we didn’t touch a little
bit on the developer aspects, and what do they
see as challenges? JESSICA DENE
EARLEY-CHA: I would say the biggest challenge is not
choosing the right use case. Doing things through
voice is great, but it’s not helpful when
you’re searching for something particular, and
there’s lots of options and you’re trying
to narrow it down. It might be helpful to
narrow it down using voice, but then maybe at
the end, the user might want to be able to see the
last three options because they want to get into the details. And that’s where you
could use multi-modal. Having a smart display or being
able to show it on a smartphone might be a helpful setup,
instead of just depending all on voice because that might
not be necessarily the best use case for that. And so I think the
best use case– and I usually tell
folks, if it’s something you’re already
doing with your voice, that’s a great thing to
make into an action. If it’s something like
you’re using, let’s say, spreadsheets, that might be a
little more challenging to go, OK, if I want to get
column 4 and Row B and trying to figure that out,
if your spreadsheet is smart and you have actual labels on
things and it could make sense, then that would make
a lot more sense. PRIYANKA VERGADIA: You
brought up a great point there about multi-modality, and with
the new devices out there, we are getting into that world. So Cathy, would you mind
touching a little bit on how do you design properly
for those multi-modal devices? MARK MIRCHANDANI:
And just for anyone who may not be sure
what multi-modal means, a definition would be fantastic. Not for me, but
for everyone else. CATHY PEARL: So
multi-modal, we’re generally referring
to the case where there are different ways you can
interact with the technology. So for example, Jessica
mentioned smart displays, like the Google Home Hub. So that is what we call
a voice forward device. So you can do a lot of
things with your voice, but there is the
addition of a screen. And so you can see things. Like, if you’re shopping
for a blue shirt, you can see the
pictures on there. And something like
a phone, of course, is also multi-modal because
you can type, you can speak, you can tap, you can swipe. So when we’re thinking
about designing for these new surfaces,
it’s kind of exciting because something like the
smart display is very new. And we haven’t quite all figured
out exactly what all the best use cases are yet. But generally,
our guidelines are that when you are
thinking about this, the first thing you
need to understand is that although it would
be great, unfortunately, you can’t just design once
and be done for all surfaces. Obviously, there
are things that are in common across all
of them, but there will be some differences as well. So our advice is to start with
the voice first experience, like the smart speaker. Because believe it or not,
that is often the most complex or maybe most constrained case. And if you get the
conversation done right for the smart speaker,
then what you do is you go to something
like the smart display and you say, OK, now we have
the option for visual elements. Where would it benefit
the conversation to add visual components
at this point? And then you can do the
same for the phone as well. So start with the
voice only experience and expand from there. A lot of times,
people go with, oh, we’re going to have the
pretty pictures first, and then they get back to
smart speaker and they’re like, I don’t know what to do. So definitely start with voice. MARK MIRCHANDANI: Well, it
also adds a tremendous level of complexity when
you’re talking about mixing and matching all
these different interfaces. Not only are you now
doing conversation design, you’re also doing all the
actual interface design. And then I have to imagine
it’s exponential in terms of difficulty because
you’re trying to mix them. And there’s enough ’80s
sci-fi shows out there that kind of showed people
interacting with computers and then talking
to the computers, and maybe even using some
kind of third party tool, all of these things mixing
together to kind of create one single experience. That’s got to be hell
for UI designers, whether it’s graphical,
visual, or a combination. What does that look
like right now? CATHY PEARL: I don’t know if
I’d use the words that it’s hell, but, um. MARK MIRCHANDANI: It’s
a significant challenge. CATHY PEARL: No, you’re right. And I think it’s important
to acknowledge that. And if you are, for
example, running a business or you’re the person who’s
the decision maker about we’re going to do this
conversation action, you just have to keep
in mind that it’s going to take some time. And you have to have these
different skill sets. So you have to have
a visual designer. You have to have these
different folks who are going to work together. And we have established
some of these nice processes where there’s handoffs. And so we really are
establishing these best practices to deal with these
new multi-modal services because as you say,
it does add complexity when you’ve got to be thinking
about all these things. But again, if you go through
the process in the right order, you’ll save yourself a lot
of headaches and hassles. And it’ll be a little smoother. JESSICA DENE EARLEY-CHA:
Mm-hmm, definitely. And I would say even just
building a voice conversation application, it’s not where
you could just work in a silo all by yourself. It’s really working with
the different parties and working together. So I definitely think
leading with the voice first, building that
experience first lays down a lot of great groundwork,
and then adding the visuals, the complement, the support. And even the visuals,
we do have a new tool called Canvas, where
you can bring in HTML, CSS, JavaScript into play,
which is a totally completely new experience. And you can play
games and do lots of fun interactive
simulations, too, to really make the
experience more rich. But definitely want
to start with voice, and then you can add in
those fun features later. But I wanted to bring up
the Canvas because even with Canvas, you
don’t necessarily have it set up where there’s
buttons because it’s a surface. You could just use your voice. And so you don’t need
a green button to go. You just say go and just
kind of make that more part of the visual elements. And so an example
would be let’s say you have a game where
there’s a little character. Instead of pushing buttons that
say attack, or go home, or X, Y, Z, just push on the
monster you want to attack. And push on the Home icon thing
in the background, so you could go and do it that way
instead, or, of course, use your voice to
control your character. So it’s a different type
of UI experience as well.
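For readers who want to see roughly what that looks like in code, here is a sketch of the two halves Jessica describes: a fulfillment webhook that hands the Assistant a web page through the actions-on-google library's HtmlResponse, and the page's own JavaScript sending a query back when the user taps instead of speaks. The welcome intent text, web app URL, element ID, and query text are hypothetical placeholders, not details from the episode.

```javascript
// --- Fulfillment side (Node.js webhook): a sketch only, not the game
// from the episode. HtmlResponse comes from the actions-on-google library.
const functions = require('firebase-functions');
const {dialogflow, HtmlResponse} = require('actions-on-google');

const app = dialogflow();

app.intent('Default Welcome Intent', (conv) => {
  conv.ask('Welcome! Say attack, or just tap the monster.');
  conv.ask(new HtmlResponse({
    url: 'https://example.com/canvas/index.html', // hypothetical web app URL
    data: {scene: 'start'},                       // state pushed to the page
  }));
});

exports.fulfillment = functions.https.onRequest(app);

// --- Web-app side (JavaScript running inside the Canvas web view).
// Assumes the Interactive Canvas client library is loaded on the page
// and that an element with id="monster" exists; both are placeholders.
document.getElementById('monster').addEventListener('click', () => {
  // Tapping the monster stands in for the user saying it out loud.
  interactiveCanvas.sendTextQuery('attack the monster');
});

interactiveCanvas.ready({
  onUpdate(data) {
    // Receives whatever the webhook put in HtmlResponse's data field.
    console.log('state from webhook:', data);
  },
});
```

Either way, the tap and the spoken command land in the same conversation flow, which is why the voice-first design still comes first.

PRIYANKA VERGADIA: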
So if you were to pinpoint some of the use
cases for the listeners today, what would be some
of the best use cases to kind of handle or
tackle as the beginners who are starting to think about
creating something for voice? CATHY PEARL: Yeah, I mean,
as Jessica mentioned, one of the good
indicators is, do you have conversations with other
humans about this thing? Because that’s often
a good indicator. Also of course, the context. Where is the person? Are they in the kitchen,
and their hands are busy? Or they’re a new parent
and they’re always holding their baby, so they
don’t have use of their hands. Or they’re in the
car and you really need to reduce the
cognitive load. But you really have
to ask yourself, what is better
about someone doing this task through something like
a smart speaker than on an app? Maybe it’s not. And I think a lot of companies
get excited and like, we’re going to build for voice. It’s going to be awesome. And then it fails, and
they’re like, I don’t understand. Why didn’t it work? And it’s, well, because
it’s faster for me to do it on my phone. So you need to think through
some of these things, like what’s the value add,
truly, for doing something on a smart speaker? The other thing I
was going to say is that a wonderful place to
be thinking about conversation design is really through the
idea of inclusive design, which is we want to make sure
that whatever your product or feature is or the
information you have available, can everyone access it? And voice technology is
really reaching the point finally where it’s
getting mature enough to help really the people
who need it the most. Maybe somebody who is
visually impaired, somebody who has muscular
dystrophy and can’t use their hands to do things,
but they can use their voice. And that’s really
one of the places I think right now voice
is the most powerful. But to me, this idea
of inclusive design isn’t just about this one
group or that one group. It’s really about all of
us, because all of us, throughout our lives, have
situational impairments, where maybe you’re
walking in the door and you’re holding
your groceries. And you briefly don’t
have use of your hands. Or I can’t find my
reading glasses, and I can’t read this
important text message. Or I broke my arm– all of these things. So really, it’s about
building for everybody. PRIYANKA VERGADIA: And do I
need to be a machine learning expert, and do I need
to know natural language processing in order to
start building for voice? JESSICA DENE EARLEY-CHA:
No, which is fantastic. I know when I first
started in this field, I was really
intimidated and scared that I would have to
learn all these really interesting concepts and
really challenging concepts. And the time to get up to speed
on that would take a while. And what’s great is we provide
a tool called Dialogflow that does the NLU for you. And so what’s great is that
you just, as a developer, identify the different
intents or functionalities that the user could potentially
have within your voice application and just provide
training phrases to specify for each of those
possible interactions. And it handles the
machine learning for you. So what’s really great is
you don’t need to do that, but if you’re
interested in that, you can also connect Google
Assistant to your own NLU as well. So you don’t have
to use Dialogflow. So if you do have that
background and that knowledge, great. Connect it to any other
NLU that you have. But if you don’t,
don’t worry about it.
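To make "intents plus training phrases" concrete, here is a small illustrative sketch that creates an intent programmatically with the @google-cloud/dialogflow Node.js client; in practice most people type the same phrases into the Dialogflow console instead. The project ID, intent name, and phrases below are all made up.

```javascript
// A sketch of "intents plus training phrases" using the
// @google-cloud/dialogflow client library. Everything named here
// (project ID, intent, phrases) is hypothetical.
const dialogflow = require('@google-cloud/dialogflow');

async function createScheduleIntent() {
  const intentsClient = new dialogflow.IntentsClient();
  const agentPath = intentsClient.projectAgentPath('my-gcp-project-id');

  const [intent] = await intentsClient.createIntent({
    parent: agentPath,
    intent: {
      displayName: 'schedule.appointment',
      // Each training phrase is one example of how a user might say it;
      // Dialogflow's NLU generalizes from these, so no ML code is needed.
      trainingPhrases: [
        {type: 'EXAMPLE', parts: [{text: 'I need an appointment tomorrow at 3'}]},
        {type: 'EXAMPLE', parts: [{text: 'Can I come in on Friday morning?'}]},
        {type: 'EXAMPLE', parts: [{text: 'Book me a slot for a checkup'}]},
      ],
      messages: [{text: {text: ['Sure, what day and time work for you?']}}],
    },
  });

  console.log(`Created intent: ${intent.displayName}`);
}

createScheduleIntent().catch(console.error);
```

Dialogflow trains on those example phrases behind the scenes, which is the part Jessica is saying you do not have to build yourself.

MARK MIRCHANDANI: So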
it sounds like there’s a bunch of cool tools
out there if people want to get started, including
Dialogflow and, of course, Priyanka, you’ve got the
Deconstructing Chat Bot series that goes into a
little bit of this and links out to a bunch
of other useful resources. PRIYANKA VERGADIA: Yeah. We also have a
bunch of code labs to refer to while you do the
Deconstructing Chat Bot videos. MARK MIRCHANDANI: So if people
are really interested in that, that sounds like a
good starting point. If people want to learn more
about the best practices, you mentioned the link earlier. Are there other good
resources for those best practices if we’re thinking
about conversation design? CATHY PEARL: We’ve also
got some videos out there on things like
conversation repair. Also, Persona, we didn’t
really touch on that. But when you are
designing, you want to be thinking about
the personality and the characteristics
of your action or your conversational
experience. So Persona is another
important thing. So we have some videos
resources for that as well. MARK MIRCHANDANI: So
what’s in the future for conversation design? Because I saw this little
thing about Alter Ego. CATHY PEARL: There’s some
pretty interesting prototyping work going on right now. Alter Ego is one example,
which is out of MIT Media Lab. And it’s all around
the idea of a lot of us might be comfortable talking
to our smart speakers in our own home, but maybe
not when we’re on the bus, or out and about on the street,
or even in our office, it could be a little
annoying to hear everybody talking to their computer. So there’s this idea of silent
speech or sub vocalization, which sounds very sci-fi, where
essentially, you are wearing something along your jaw line. And what it’s doing is it’s
picking up pre-speech signals. So as you start to form words,
before the words actually come out of your mouth, you’re
doing these tiny little signals that are actually
indicating what it is that you’re about to speak. And so this technology
is picking up on those pre-speech signals
and translating them silently into words, which could
then be passed along to a voice assistant. And of course, the
responses could be done in your headphones, so
no one would actually hear it. And I encourage people to
look at the Alter Ego video that MIT Media Lab put
out to check it out. MARK MIRCHANDANI: So that’s
like an example of someone taking this conversation
design and just shortcutting the actual
needing to talk part. CATHY PEARL: Well,
you’re still talking. It’s just that the
words are not audible. MARK MIRCHANDANI:
Which to me is what I would define talking as, but. CATHY PEARL: Speaking, maybe
I should say, or conversing. I don’t know what
the right verb is, but you’re still
having a conversation. But you can think
of it just like when you’re typing to somebody,
you’re talking to somebody. But the words are not
spoken out loud, so. MARK MIRCHANDANI: I mean, it
could open up a lot of avenues, it sounds like, for– almost
like an additional interface to communicate
with these things. But I’m guessing it
would still follow a lot of the same conversation
best practices and guides. CATHY PEARL: For sure,
and even if you’re talking to, say, a chat
bot that’s typing only, you’re still following
the same best practices. And when you type
to a chat bot, it’s much more like
spoken words than it is if you’re writing a formal
essay or something like that. So even though it’s
a different modality, you’re still following
conversational principles. MARK MIRCHANDANI: And what
about for cool tools coming up or any other future
products that are really cool for people to check out? JESSICA DENE EARLEY-CHA: Yeah. We recently just
announced Canvas, and that’s, like, the cool new
thing right now, where, again, you’re bringing in web
technologies into voice and making that voice forward. So that is really exciting. And I believe that just got
announced for GA this month. So that’s, like,
hot off the presses, the newest thing right now. PRIYANKA VERGADIA: Awesome. Well, I guess we did
talk a little bit about who you are and
what you do initially, but if people want to
find you, what would you like to share where you’re
going to be in the next few days or your Twitter handles? CATHY PEARL: So I am
cpearl42 on Twitter. I also blog on Medium
from time to time. And I’ll be speaking in
January at Project Voice. MARK MIRCHANDANI: And
what is Project Voice? CATHY PEARL: Project
Voice is going to get together a whole lot of
voice developer and designer and biz dev folks, too,
for a week of talking about all things voice
and conversation, how to build the products,
use cases, everybody together to really talk
through those things. JESSICA DENE EARLEY-CHA:
You can definitely follow me on Twitter. My handle is chatasweetie,
all one word. And I also blog a lot
on Medium as well, so you can definitely check that
out on the Google Developers Medium page. Right now, I’m really
releasing Season 2 of Assistant on Air, which is a
video series where we have conversations with folks
who built for Google Assistant. So Season 1 was all about
talking to Googlers internally, how they build
and why they build Google Assistant or actions on
Google in that particular way or how they could use the tools. And Season 2 is all
about our GDEs, which are Google Developer Experts. So there are folks
globally around the world who are experts within
building for Google Assistant. And I sit down and
chat with them. MARK MIRCHANDANI: Awesome. Well, thank you all
so much for coming in. I think we had some really
cool moments of understanding. I mean, I definitely
walked in here without knowing what
conversation design was. And now I know a
little bit more. So thank you all for coming in. CATHY PEARL: Thanks very much. JESSICA DENE
EARLEY-CHA: Awesome. Thanks for having us. MARK MIRCHANDANI: Thanks so
much to Cathy and Jessica for coming in and
talking about, like I said before, the best practices
around conversation AI. I really like that note you
said, Priyanka, about how it’s a good balance between
the design of conversation AI and the experience there,
but also the development aspect of it and
how do you combine those to make a good system. PRIYANKA VERGADIA: Yeah. It was very enlightening to
hear not just from the Google Assistant perspective, but the
general aspect of designing any experience from the
conversational point of view. MARK MIRCHANDANI: And
speaking of which, I think it’s about time to get
into our question of the week. [MUSIC PLAYING] For this, you
published a blog post. I wanted to ask
you more about it. You have a solution
here to combine Dialogflow and BigQuery. And I mean, my first
question, of course, is, why would you do that? What tools are available in
there that make that useful, but then also, how do you do it? PRIYANKA VERGADIA:
So when I have been speaking with customers
and users of Dialogflow, one of the cool questions
that always comes up is, I want to take
some of the data that I’m getting from the users. Like, I take an example of
an appointment schedule, and the time and
the date and why people want to set an
appointment for coming to whatever it is that I
serve as a service, right? And I just wrote
this cool little tool. There is the
reporting side of things that Dialogflow offers, but
there’s always an analysis that you may want to do as a
business in terms of what you want to make decisions off of. So I take these appointment
things from the users– date, time, and why
they want to come in. And then I want to
do my own analysis to understand when do
people come in more, or what are the different
requests that I’m getting more versus less. And all of that is very
easily available as variables, and I can pass
them into BigQuery and I can run SQL
commands on it. So it’s a very good way to find
out how I can provide a better customer experience. And what are some
of the questions I should even automate or
not automate is an indication that you can get with an
integration like this. So that’s the why
aspect of why I came to think that this
could be a good tool. And how do you do it? It’s actually pretty simple. In Dialogflow, you
have the fulfillment where you can write
Node.js code to make a connection to any API. So in this case, I’m using
the BigQuery API to take those variables that I’m capturing
as a part of Dialogflow– the dates, and the times,
and why people want to come in for this appointment– and then just pass those
variables into a table by calling my BigQuery
API from the Cloud Function that is inside the
fulfillment from Dialogflow. So it’s pretty
simple integration. If you want to look at a sample
or want to do it yourself, I have linked some of
the code into GitHub. And you can download that
from the repository as well.
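As a rough sketch of what that fulfillment looks like, with a hypothetical dataset, table, intent, and parameter names (the real sample lives in the linked blog post and repository):

```javascript
// A minimal sketch of the fulfillment described here: a Cloud Function
// webhook that writes Dialogflow parameters into BigQuery. The dataset,
// table, intent, and parameter names are placeholders.
const functions = require('firebase-functions');
const {WebhookClient} = require('dialogflow-fulfillment');
const {BigQuery} = require('@google-cloud/bigquery');

const bigquery = new BigQuery();

exports.dialogflowFirebaseFulfillment = functions.https.onRequest((request, response) => {
  const agent = new WebhookClient({request, response});

  async function scheduleAppointment(agent) {
    // Parameters captured by the intent's entities (placeholder names).
    const {date, time, reason} = agent.parameters;

    // Stream one row into BigQuery for later SQL analysis.
    await bigquery
      .dataset('appointments')
      .table('requests')
      .insert([{date, time, reason}]);

    agent.add(`Got it. You're booked for ${date} at ${time}.`);
  }

  const intentMap = new Map();
  intentMap.set('Schedule Appointment', scheduleAppointment);
  agent.handleRequest(intentMap);
});
```

From there, the analysis described above is ordinary SQL against that table, for example grouping appointment requests by reason or by day of the week.

MARK MIRCHANDANI: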
So you still get to use the dialog
aspect of Dialogflow in creating the chat
bot, but then you’re easily able to take a
question that somebody asks or some information
that somebody provides, and then basically query against
BigQuery, which is really good at, well, big queries,
and then take that, get an answer back, and then use
that back in your Dialogflow application. PRIYANKA VERGADIA:
You could either use that back in your
Dialogflow application, or you could take
that information and create your business
strategy off of it and be like, OK, now I should change my bot,
or I should add these questions because people are asking for
these more, that type of stuff. So you can decide where to
focus more of your energy on. But as you said, you
can also process it and send the response
back to the Dialogflow to have further
conversation, so both ways. MARK MIRCHANDANI: Yeah, would
you recommend that people generally take the
conversations that users are having with their
Dialogflow apps and record all that information
in BigQuery, so they can make decisions? PRIYANKA VERGADIA:
It is a good practice to do that if they are
thinking about ML and AI aspects of taking this
huge amount of data and making sense of it
and then utilizing it to either better improve the
conversational experience or to provide more experiences
that they may not be thinking of providing today,
just based on what the users are asking for. MARK MIRCHANDANI: Super cool. So we’ve got that
article down there, and it sounds like the
code’s on GitHub as well. PRIYANKA VERGADIA: Yep. MARK MIRCHANDANI: So it’ll
be a great opportunity for people to actually
pretty easily try it out. PRIYANKA VERGADIA: Yeah. MARK MIRCHANDANI: Awesome. I think we’re just about out
of time, but before we go, Priyanka, do you have any
cool fun trips coming up? Any big events? PRIYANKA VERGADIA: Yeah. I have a lot of
travel coming up. In October, I will be in Europe. I’ll be at GOTO Berlin, doing
a talk on natural language processing and conversational
AI, and then similar in Milan. And then later,
in November, I’ll be in Copenhagen doing NLP and
conversational AI talks again. MARK MIRCHANDANI: Woo, that’s
a lot of travel coming up for you. PRIYANKA VERGADIA: Yeah,
and then after that, I am going to be on
vacation in December. So I’m looking forward to that. MARK MIRCHANDANI: It sounds
like you might need it. Maybe a staycation
would be better. PRIYANKA VERGADIA: Exactly. Yeah, with family. MARK MIRCHANDANI: Yeah, I think
I’ll have a vacation coming up as well, so definitely
looking forward to unplug for a little bit. And maybe I won’t even
bring my laptop with me. I’ve been juggling that idea. We’ll see if it actually
happens or not, but. PRIYANKA VERGADIA:
You should definitely try to do that because without
a laptop, life can be good. [LAUGHTER] MARK MIRCHANDANI: Well,
thanks, everyone, so much for tuning in, whether you’re
tuning in from your phone or from your laptop. Hopefully, life can still
be good while listening to a podcast. So we’ll see you all next week. [MUSIC PLAYING] PRIYANKA VERGADIA: I can’t
even find my window, sorry. MARK MIRCHANDANI: Too many tabs? PRIYANKA VERGADIA:
Too many tabs. MARK MIRCHANDANI:
Somehow the world has gotten into a scenario
where having 150 tabs open is acceptable. PRIYANKA VERGADIA: I know. MARK MIRCHANDANI:
I don’t get it. I don’t understand how
people stay organized with so many [BLEEP] tabs open.
