Generative Interfaces Beyond Chat // Linus Lee // LLMs in Production Conference

MLOps.community
2 May 2023 · 24:48

Summary

TL;DR: Linus discusses the evolution of generative interfaces beyond chatbots, emphasizing the importance of context in conversational AI. He explores how interfaces can be improved by integrating more intuitive interactions like point-and-select, and suggests using constraints to balance user-friendliness with the power of language models. Linus also touches on the need for interfaces to provide guidance and feedback loops to enhance user experience.

Takeaways

  • 💡 **Chat-style interfaces are useful but not the end-all**: Recognizing that while chat interfaces are valuable, there's room for evolution towards more intuitive and flexible interfaces.
  • 🌐 **Importance of context in conversations**: Emphasizing the need for language models to leverage context effectively to interpret user intent accurately.
  • 🔍 **Incremental evolution of interfaces**: Discussing how to move from chat-repeat style interactions to more engaging and intuitive interfaces.
  • đŸ› ïž **UI design and interaction**: Linus' background in UI/UX and creative tools informs his perspective on building interfaces that balance flexibility and ease of use.
  • 🔗 **Conversation anatomy**: Breaking down the structure of a conversation into context, intent, internal monologue, and action to better understand language model interactions.
  • 🎯 **Point and select interfaces**: Introducing the concept of using selections and pointing to clarify context and direct model actions within an interface.
  • 📋 **Co-inhabiting workspaces**: The idea that interfaces should allow models to 'live' within the same workspace as the user to enhance interaction.
  • 🔑 **Using names to reference objects**: Suggesting that naming elements within an interface can help users interact more naturally with language models.
  • 🔄 **Iterative feedback loops**: Encouraging the design of interfaces that allow for quick feedback and iteration to improve user experience.
  • 🔑 **Predictive interfaces**: The potential of predictive models to suggest actions and make interfaces more intuitive, reducing the 'blank page' problem.

Q & A

  • What is the main topic of the discussion led by Linus?

    -The main topic of the discussion is the evolution of generative interfaces beyond chat, exploring how to create more intuitive and flexible interfaces that balance the capabilities of language models with ease of use.

  • What does Linus suggest as a problem with current chat-style interfaces?

    -Linus suggests that current chat-style interfaces often start with zero context, which leads to difficulties in prompting and a need for users to specify every detail of their intent explicitly.

  • What is the importance of context in conversation according to Linus?

    -Context is crucial as it determines how everything said in a conversation is interpreted. Linus emphasizes that context is often undervalued in chat interfaces, leading to less effective communication with language models.

  • What is the role of 'intent' in a conversation as described by Linus?

    -In a conversation, 'intent' is what the speaker implies, which could be explicit, direct, or implicit. It's a key component that language models need to interpret correctly, often relying heavily on context.

  • How does Linus propose to improve the interaction with language models?

    -Linus proposes using point and select interfaces, where users can directly interact with objects in an application, and language models can observe these actions to better understand user intent and provide relevant responses.

  • What are the different ways Linus discusses for language models to observe user actions?

    -Linus discusses several ways including an omniscience model where the model has access to all application states, 'call by name' for environments with named objects, direct mention of objects by users, and contextual actions similar to right-click menus.

  • Why does Linus think that adding constraints to interfaces can be beneficial?

    -Adding constraints can make interfaces more intuitive and easier to use without sacrificing the power of language models. It provides a structured way for users to interact with models, reducing the 'blank page syndrome' and learning curve.

  • What is the concept of 'call by name' as mentioned by Linus?

    -'Call by name' is a method where users can name different parts of an application, like artboards in a design app, and then refer to them directly in their prompts to the language model for more contextual interactions.

  • What does Linus mean by 'closing feedback loops' in interface design?

    -Closing feedback loops refers to minimizing the time and effort it takes for users to evaluate system responses and iterate on their inputs. This can be achieved by providing multiple options, interactive components, or predictive actions.

  • What is the 'Holy Grail' of user interface design according to Linus?

    -The 'Holy Grail' of user interface design is to balance an intuitive UI that's easy to learn and progressively understand with the flexibility to handle complex tasks, without sacrificing the power of language models.

  • How does Linus view the future of interfaces that incorporate AI and language models?

    -Linus views the future of interfaces as incorporating AI and language models that co-inhabit the user's workspace, take advantage of rich shared context, predict happy path actions, and allow for a fast iteration loop, possibly including interactive components.

Outlines

00:00

🌟 Introduction to Generative Interfaces

Linus introduces the topic of generative interfaces beyond chat, acknowledging the usefulness of chat interfaces but also recognizing their limitations. He discusses the need to evolve beyond text-based interactions and integrate language models with more intuitive interfaces. Linus shares his background in UI design and his work as a research engineer at Notion, focusing on building creative and productivity tools. He outlines the key areas of conversation: understanding the parts of a conversation important for language models, the role of context and selection in dialogue, and the benefits of adding constraints to interfaces to balance ease of use with the power of language models.

05:00

đŸ—Łïž Anatomy of a Conversation

Linus breaks down the components of a conversation, emphasizing the importance of context in communication. He explains how context shapes the interpretation of language and gives examples of how it varies in different scenarios. He also discusses the role of intent in a conversation, which can be explicit or implicit, and how it's crucial for language models to understand this intent. Linus mentions 'Chain of Thought' as an internal monologue for the model to interpret the user's intent within the given context. He uses examples like Co-pilot in VS Code to illustrate how context is already integrated in some applications, and how intent is clear when the user's environment is known.

10:02

🎯 Point and Select Interfaces

Linus explores the concept of 'point and select' interfaces, suggesting that they can help clarify context and direct the model's actions. He explains how pointing and selecting are natural actions in the physical world that can be translated into digital interactions. Linus proposes several ways for language models to observe user actions, such as the omniscience model where the model has access to all application states, 'call by name' for applications with named objects, direct mention of objects, and contextual actions similar to right-click menus. He argues that these methods can make conversational interfaces feel more integrated and intuitive.

15:03

🔄 Balancing Intuition and Flexibility

Linus discusses the goal of balancing intuitiveness and flexibility in user interface design. He suggests that while chat interfaces offer flexibility, they often lack intuitiveness, leading to a steep learning curve for users. He proposes that by adding constraints and suggestions, interfaces can become more intuitive without sacrificing the power of language models. Linus also talks about the importance of closing feedback loops in creative and productivity applications, allowing users to quickly iterate and refine their requests. He gives examples of how providing a range of options or interactive components can enhance the user experience.

20:03

🚀 Conclusion and Future Outlook

In conclusion, Linus summarizes the key ideas from the discussion, emphasizing the potential for dialogue interfaces built on language models to co-inhabit the user's workspace and take advantage of rich shared context. He suggests that interfaces should start with constrained actions but allow for advanced use through chat. Linus also reiterates the importance of speeding up the user's iteration loop through direct manipulation and interactive feedback. He expresses optimism about the future of intuitive and powerful conversational applications and encourages the exploration of predictive interfaces, given the advancements in language models.

Keywords

💡Generative Interfaces

Generative Interfaces refer to systems that can create content based on user input or interactions. In the video, this concept is explored in the context of evolving beyond traditional chat interfaces to more intuitive and flexible systems that can leverage the power of language models. An example mentioned is how generative interfaces could allow for more natural interactions with AI, similar to how humans interact in the real world.

💡Chat-style Interactions

Chat-style interactions are a form of dialogue where users communicate with a system by exchanging text back and forth, similar to a conversation. The video discusses how these are currently dominant but may not represent the ultimate interface for interacting with AI, suggesting the need for more advanced interfaces that can better utilize contextual information.

💡Context

Context, in the video, refers to the environmental or situational information that influences the interpretation of communication. It is crucial for understanding user intent in conversations with AI. The speaker argues that current chat interfaces often start with little context, which can lead to difficulties in prompting the AI effectively.

💡Intent

Intent in the video represents the purpose or goal behind a user's action or communication with an AI system. It can be explicit or implicit and is important for AI to interpret correctly. The script discusses how additional context can help in accurately interpreting intent, which is vital for the AI to provide the appropriate response or action.

💡Co-pilot

Co-pilot, as used in the script, refers to AI systems that assist users in performing tasks, often by providing suggestions or automating parts of the workflow. An example given is copilot chat inside VS Code, where the AI is aware of the user's current coding context and can generate responses or actions based on that context.

💡Point and Select

Point and Select is a user interaction pattern where users direct the AI's attention to a specific part of the interface by selecting or pointing at elements. The video suggests that this can make interactions more intuitive by allowing users to interact with the AI in a way that mimics natural human behavior, such as pointing at an object to refer to it.

💡Constraints

Constraints in the video refer to limitations or guidelines that are deliberately introduced in an interface to make it more intuitive and easier to use. By adding constraints, interfaces can guide users towards common actions or happy paths, reducing the complexity of interactions with AI and making them more accessible.

💡Conversational Interfaces

Conversational Interfaces are systems that allow users to interact with technology using natural language in a conversational manner. The video discusses the evolution of these interfaces, suggesting that while they are powerful, they can also benefit from additional context and constraints to become more intuitive.

💡Intuition

Intuition, in the context of the video, refers to the natural, instinctive understanding that allows users to interact with an interface without needing extensive instruction. The speaker argues for the importance of creating interfaces that are intuitive, suggesting that they should guide users towards actions rather than presenting a blank page.

💡Feedback Loops

Feedback Loops are cycles of user action and system response that allow for continuous improvement and iteration. The video discusses the importance of closing feedback loops in creative and productivity applications, suggesting that interfaces should provide immediate, interactive feedback to facilitate user exploration and refinement of ideas.

💡Interactive Components

Interactive Components are elements of a user interface that respond to user interaction, such as sliders, buttons, or widgets. The video suggests that incorporating these components into conversational interfaces can provide users with more direct ways to explore options and receive feedback from the AI, enhancing the interaction experience.

Highlights

Introduction to generative interfaces beyond chat

The limitations of current chat-style interfaces

The importance of context in language model interactions

The role of intent in conversational interfaces

The concept of 'internal monologue' for language models

The action phase in conversational AI interfaces

Example of context usage in Co-pilot for VS Code

The need for more intuitive interfaces beyond chat

The idea of 'point and select' interfaces

Different ways for models to observe user actions

The concept of 'call by name' for object interaction

The potential of drag-and-drop for interface interaction

The role of contextual actions in interfaces

The importance of balancing intuitiveness and flexibility in UI design

The challenge of 'blank page syndrome' in chat interfaces

The idea of using predictive interfaces to aid user interaction

The concept of interactive components in chat interfaces

The goal of tightening feedback loops in creative applications

Final thoughts on balancing power and intuitiveness in interface design

Transcripts

[00:00] Alrighty, there we go, cool. Between Harrison and that song, I don't know how to top that, but I'll make my best attempt. I'm happy to be here to talk about generative interfaces beyond chat. I'm Linus, and I'll do my intro in a bit, but where I want to start today is: I think we all generally have a sense that ChatGPT-style chat is super useful, super valuable, and you can do a lot with it, but at this point we all kind of accept that it's not the end of the road for these interfaces. I had a tweet a while ago where I said: you're telling me we're going to invent literal superintelligence, and we're going to interact with this thing by sending text back and forth? It's obvious that's not the end of the road. But chat is here today: most usages of language models in interfaces that I've seen in production are built on chat-style, turn-by-turn, dialogue-style interactions, with interesting experiments on the fringes, some of which I'll hopefully mention later. So the leading question I want to spend our time on is: given that chat is where we are, and given the possibilities for other things to come, how do we incrementally evolve what we have, ChatGPT-style chat, toward more interesting interfaces that balance the flexibility and power of language models with the ease of use and intuitiveness of the other kinds of interfaces we can build?

[01:34] A little bit about me: I think a lot about UI design, interfaces, interaction design, and AI, and I've spent a bunch of time thinking about that in the context of building creative tools and productivity tools. So it makes sense that I'm currently a research engineer at Notion. Before that I spent a couple of years working independently, pursuing these same ideas and building a lot of prototypes, some of which are going to be linked somewhere in the chat, and before that I worked at other productivity-tool companies and apps.

[02:06] As a roadmap, there are three big buckets. First, I want to lay the groundwork for how we should think about conversations and dialogue: what are the parts of a conversation that matter when we build language models into conversations? Second, I'll talk specifically about the ideas of context and selection in a conversation, which will keep coming up. Third, I want to land on this idea of adding constraints: the benefits they can have, and how we can balance adding constraints to make interfaces more intuitive and easier to use without sacrificing power.

[02:40] So let's talk about conversations. Let's say you and your buddy are about to talk about something. Even before you say anything at all, the communication channel has already opened, because in any conversation, in any kind of linguistic act, you start with a shared context. That context might be that your friend just pulled up a chair next to your office and you're about to pair program; it might be that you're in a supermarket checking something out; it might be that you're collaborating with a co-worker, or that a friend or a stranger walked up to you on the street. That context determines how everything you say, and everything your interlocutor says back to you, is interpreted. So I think context is important, and notably, in applications like ChatGPT, context gets very low billing: you basically start with zero context, and you have to embed all the context you want the model to use to interpret your words in the prompt itself, which is where I think a lot of the difficulties of prompting come from.

[03:38] So you start with some context, and then the speaker will imply some intent. It could be explicit and direct, like "hey, can you pass me a glass of water." It could be a little more implicit: if I'm a construction worker building a house, or I'm assembling a Lego kit, I might say "oh, the blue brick," and that's not a complete thought, but in the context it can be interpreted to figure out exactly what I'm looking for. Or it could even be just me pointing at something, and my partner in the conversation can interpret the intent out of what I'm doing. That's the speaker's role. Once you have the intent, something especially important for language models is giving the model time to think. I'm abusing some reinforcement-learning terminology here in calling this a rollout, but people call it chain of thought, or a scratchpad: some internal monologue for the model, or for the recipient of the message, to figure out exactly what you mean and do the interpretation of your intent, of what you said, within the context you have. And then once the model, or the recipient of the message, is done thinking, there's some action. The action might be answering the question, so maybe just textual, but more exciting and more often, I think we're seeing lots of apps where the action is some combination of a response back and an action the model is taking, whether in an application or by integrating with an API.

[05:03] So that's the anatomy of a conversation, if you really break it down, in a typical language-model usage style.
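To pin that anatomy down, here is a minimal sketch of the four parts as a data shape in TypeScript. This is only an illustration of the structure described above, not any real API; every name in it is invented:

```ts
// A hypothetical data shape for one conversational turn, mirroring the
// anatomy above: shared context, the speaker's intent, the model's
// internal monologue (chain of thought / scratchpad), and the action.
interface Turn {
  context: string[];          // shared state both parties can already see
  intent: string;             // explicit ("pass the water") or implicit ("the blue brick")
  internalMonologue: string;  // the model's hidden scratchpad for interpretation
  action: {
    reply: string;            // the textual response
    toolCalls?: string[];     // optional side effects: app actions, API calls
  };
}
```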

[05:11] Let's take Copilot as an example. This is a screenshot from Copilot Next, or Copilot X chat (all these names are insane): Copilot chat inside VS Code. This is one of those cases where, very clearly, you're already starting with some context. If you're building something like this, you wouldn't just build the chat; you would want the language model to be aware of as much context as you can get out of the application. The context includes things like which files you have open; whether you have a terminal open, and what the last few commands and their outputs were, because maybe the error message in the terminal can inform what the model can do for the user; and things like what line the cursor is on, or which lines the user has selected, because selection is actually a really strong signal for what the user is thinking about and looking at. It's kind of like pointing, but on a screen, or in a text editor. So you start with some context, and then there's the intent, which in this case is "write a set of unit test functions for the selected code." You can see that in interfaces like this you really need the context to interpret the intent correctly, and the more context you have, the better that interpretation usually is. Then presumably there's some internal monologue, some thinking, for the model, and after that we get the model's action back out.
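As a rough sketch of what that context-assembly step might look like in code (none of these names come from Copilot's actual implementation; they are hypothetical, just to show how open files, terminal state, and selection can travel with the intent):

```ts
// Hypothetical sketch: assembling editor context into a prompt, in the
// spirit of the Copilot example above.
interface EditorContext {
  openFiles: { path: string; contents: string }[];
  terminalHistory: string[]; // last few commands and their output
  cursorLine: number;        // where the user's attention likely is
  selection: string | null;  // selection is a strong "pointing" signal
}

function buildPrompt(ctx: EditorContext, intent: string): string {
  const parts: string[] = [];
  for (const f of ctx.openFiles) {
    parts.push(`// File: ${f.path}\n${f.contents}`);
  }
  if (ctx.terminalHistory.length > 0) {
    parts.push(`// Recent terminal output:\n${ctx.terminalHistory.join("\n")}`);
  }
  if (ctx.selection !== null) {
    parts.push(`// The user has selected (cursor at line ${ctx.cursorLine}):\n${ctx.selection}`);
  }
  parts.push(`// Request, interpreted against the context above:\n${intent}`);
  return parts.join("\n\n");
}
```

Because the selection and terminal state ride along with the request, the typed intent itself can stay short.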

[06:27] When we think about prompting these conversational interfaces, I think we usually focus on the intent (the prompt) and then the completion (the output, the action), but there are these other elements we should also think about: the context in which the user's intent is being interpreted, and the internal monologue.

[06:49] This is a screenshot of Microsoft's Copilot for Excel. I think this is an interesting example of a really rich, valuable application where there is a chat interface, but where there's clearly a lot more we could do. In this case there is so much context the model could use to figure out exactly what the user is doing and maybe what they're trying to accomplish, and yet a chat sidebar like this sort of exists in a totally different universe than the spreadsheet. The model can theoretically look at everything in the spreadsheet the user has open, but in this screenshot the user is having to refer to specific columns and specific parts of the spreadsheet verbally, by naming things: in this case, the column about last month's sales data. Why can't I just point and select, and then say, "what about this column?" In the real world, that's kind of how we work: if I want to refer to something that's in front of me, I'll just point to it, I'll just look at it. So having the chat agent be co-located, co-inhabiting the same workspace the user is in, is, I think, a key part of how to make these interfaces gel a little better. Without that, these conversational interfaces start to feel more like the command-line interface of generative AI, where you have to specify every possible bit of information about your intent and your action explicitly in the prompt, rather than being able to work more fluidly in a shared working environment.

[08:15] So where do we go from here? I keep talking about this idea of pointing, using the context, and selecting things, so one really powerful technique we can look to is using our hands: using selections and pointing. When you point at things in the context, or when you select things, there are a few different ways for the language model to observe what you're pointing at or what you're doing, sorted roughly from the most grounded in reality to the most interesting and out-there.

[08:47] One way to think of point-and-select is as breaking your action down into nouns and then verbs. What I mean by that is: if you're in a spreadsheet, the noun might be the column I want to manipulate, so you select the noun, and then the verb might be "I'm going to filter it," or "I want to aggregate it," or hide it, delete it, duplicate it. If you're in a writing app, the noun might be a single block, like the title block, or it could be the entire page, or a new file. In the real world, this point-and-select mechanic is built into every object and every material: if I want to take action on some object, I first have to grab the thing and then do something with it. But in chat-style interfaces, I think it's less obvious. This point-and-select mechanic is also what makes the web great for a lot of applications, because there's an existing materiality built into everything on the web: every bit of text on the web is selectable by default, you can select anything, copy-paste anything, and often drag and drop files into web pages. There's all this noun-and-verb-based mechanic built into the materials you use to build apps on the web, and in chat, all of those affordances around selecting objects and then applying actions to them are kind of gone. We could think about how to bring that back to chat-style interfaces. Point and select are, I think, most useful for helping clarify context and focus the model on something you're looking at, or, alternatively, for directing the action stream or the output stream of the model. If you're in a writing app, you could select something and say, "summarize this bit, then put it here at the top of the page," or "make a new page over here." Pointing and selecting are useful for directing the output as well.

[10:34] So there are a few ways we can get the model to observe what you're doing or what you're pointing at. The most common one currently, and I think the most obvious, is what I'm calling the omniscience model: the model can look at everything, everywhere, visible all the time. It just knows the entire state of the application, but it's up to the model to figure out what to query and what to look at. The context is technically fully accessible, but the model doesn't know exactly what you want it to look at.

[11:03] The next level up from that is what I'm calling call-by-name, which I think is interesting for certain types of applications, especially pro applications where there's a lot of flexibility and customization. If you have an application like a design app, like Figma or Sketch, you could imagine naming different artboards or different panels, and then being able to @-mention them and say, "hey, can you clean up panel two," or "can you clean up the timeline panel." This only really makes sense for environments where it makes sense to name objects and refer to them by handles or names, but if that's the case, then I think this is an interesting way to incorporate context and directly point at things, but with your words, using the names.
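A minimal sketch of how call-by-name could work under the hood, assuming a workspace that maps names to objects (all of this is illustrative, not taken from any shipping product):

```ts
// Hypothetical sketch of call-by-name: resolve @-mentions of named
// objects (say, artboards in a design app) before sending the prompt.
interface NamedObject {
  name: string;
  serialize(): string; // turn the object's state into model-readable text
}

function resolveMentions(
  prompt: string,
  workspace: Map<string, NamedObject>
): { text: string; attachments: string[] } {
  const attachments: string[] = [];
  // Match handles like "@panel-2" or "@timeline".
  const text = prompt.replace(/@([\w-]+)/g, (match: string, name: string) => {
    const obj = workspace.get(name);
    if (!obj) return match;            // unknown name: leave the mention as-is
    attachments.push(obj.serialize()); // ship the object's state as context
    return `"${name}" (state attached)`;
  });
  return { text, attachments };
}
```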

[11:41] There's also this really interesting kind of interface that I don't think anybody's really seen in production, which is what I'm calling literally mentioning something. This in particular is a screenshot from a paper from a project called Sikuli, from, I believe, an MIT lab, where they had a programming language that interleaved icons and images with a way to program around the UI. You could imagine an interface where, if I wanted to refer to a paragraph, I could start writing "summarize" and then literally drag and drop the paragraph into the prompt box; or if I wanted to transform an image, I could drag and drop an image; or if I want to talk to a person in my contact list, I could grab that person's icon, say "hey, can you call this person," and just drag and drop that image or that object. Having support for these rich objects inside the prompt box is, I think, a really interesting possibility.
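One way to sketch that idea is to stop treating the prompt as a flat string and treat it as a sequence of tokens, where a dragged-in object is a first-class token. The schema below is invented purely for illustration:

```ts
// Hypothetical schema: a prompt as a sequence of tokens, so rich objects
// can sit inline next to text, as in "summarize <this paragraph>".
type PromptToken =
  | { kind: "text"; value: string }
  | { kind: "object"; mimeType: string; payload: string };

// e.g. dropping a paragraph into the prompt box mid-sentence:
const prompt: PromptToken[] = [
  { kind: "text", value: "Summarize " },
  { kind: "object", mimeType: "text/plain", payload: "...the dragged paragraph..." },
  { kind: "text", value: " in two sentences." },
];

// Before sending, each object token is serialized into model-readable context.
const flattened = prompt
  .map((t) => (t.kind === "text" ? t.value : `[attached ${t.mimeType} object]`))
  .join("");
console.log(flattened); // "Summarize [attached text/plain object] in two sentences."
```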

[12:35] And then the last one is what I'm calling contextual actions, and a great example of this is the right click. An example of the right click is these context menus: on the left is Notion, on the right is Figma. You grab an object, sort of metaphorically, and then you can see all the options available to you, all the things you might want to do with that object. In a lot of current applications these are hard-coded, but you could imagine using a language model to say: okay, here's the object the user has in their hands; given the full context of the application, and maybe even their history of actions, and the title of the file, what are the most likely actions they might want to take? You could have the model select from a list, or you could have the model generate possible trajectories the user might want to take. So the context menu, I think, is an interesting way to reveal actions to the user without forcing them to type the instruction out fully.

[13:31] Another kind of context-menu pattern is this autocomplete-driven programming, which I think is the analog of the right click, but with text. If I'm typing in a text editor or code editor and I hit dot, like "document.body.", it'll show me all the autocomplete options. That's kind of like saying: I'm holding this object in my hand, what are all the things accessible to me from it, what are the actions I can take? In the other panel there's tab completion: I'm working inside a terminal, I have this CLI in my hand, what are the things I can do with it, tell me the possibilities. This is another way of grabbing an object and being shown what's possible, and you can imagine powering something like this with a language model as well.

[14:10] And lastly, a slightly more complex pattern: if the user selects an object, you could materialize an entire piece of UI, like a side panel or an overlay. On the left again is Notion AI; on the right is Keynote, which is what I was using to make this deck. In either case, you select an object and you can see a whole host of options for how you want to control it. This gives the user a lot of extra power, at the cost of it maybe not being obvious exactly what the user wants to do or what they should take action on.
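Here is one way such a model-driven context menu could be sketched. `llmComplete` below is a stand-in for whatever completion API you use, and the whole flow is an assumption about how this could work, not a description of how Notion's or Figma's menus are built:

```ts
// Hypothetical sketch of a model-driven context menu: given the selected
// object and surrounding state, ask a language model to rank the most
// likely next actions. `llmComplete` is a stand-in, not a real library call.
declare function llmComplete(prompt: string): Promise<string>;

async function suggestActions(
  selectedObject: string,
  recentActions: string[],
  allActions: string[]
): Promise<string[]> {
  const prompt = [
    `The user selected: ${selectedObject}`,
    `Their recent actions: ${recentActions.join(", ")}`,
    `Available actions: ${allActions.join(", ")}`,
    `List the 5 most likely next actions, one per line.`,
  ].join("\n");
  const response = await llmComplete(prompt);
  // Keep only suggestions that map to real, hard-coded capabilities: the
  // model constrains and orders the menu; it does not invent operations.
  return response
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => allActions.includes(line))
    .slice(0, 5);
}
```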

[14:40] So in all these cases we have this noun-then-verb pattern: choose the object, then choose the action you want to take. That lets the system constrain the action space the user might want to take, and maybe even come up with follow-ups or suggestions for the best actions to take. Given all of this, and given what we talked about around the anatomy of a conversation, when you look back at something like ChatGPT, ChatGPT is really just: okay, you have this tiny prompt box, and you have to cram all of the context and all of the intent in there, plus everything you want the model to know about where it should take its action. That, I think, is a good place to start, but it is limiting, and there are ways we can expand out of it.

[15:26] One way to summarize the ground we've covered is that the Holy Grail, or at least one powerful goal, of user interface design is to balance intuition, building an intuitive UI that's easy to learn and progressively understand, with flexibility, and flexibility is the strength of language models. In chat-style interfaces you have access to the full capabilities of a model: you can ask it to do anything the model could possibly do, including using APIs and tools, or fully specifying a programming language you want it to use. That's the strength of falling back to chat. But by adding constraints, where you start with something in your hand, and then recommend, suggest, or follow up and say, "given that this is what you're looking at, given that this is the locus of your attention right now, here are the things you can do," and maybe predict some actions and add some guardrails, some structure, to the instruction, I think that's where we can bring back the intuitiveness of graphical user interfaces without sacrificing the power of language models.

[16:34] Open-ended natural language interfaces, I think, trade off too much of that intuitiveness for flexibility. In an app like ChatGPT you have this blank-page syndrome, where the user doesn't know exactly what they're supposed to type in. Maybe they have a sense that they want a summary, or a conversation of a certain style, but there are no affordances in the UI to give them hints: these are the things the model is good at, these are the ways you might want to phrase your request. None of that exists, so I think it adds a huge learning curve and is detrimental to ease of discovery. By bringing back some of these graphical affordances, I think we can improve that situation a bit.

[17:13] Lastly, since I'm closing in on time, I wanted to add one more note about another frequent goal of interface design, which is closing feedback loops. Particularly in creative applications, and sometimes also in productivity applications, you want to tighten the feedback loop between the user attempting something, maybe with something in their mind they want to see, then looking at the result, evaluating it, and figuring out, "okay, this is how I need to iterate, this is the fix I need to apply to get the model to generate what I want."

[17:45] There are a few ways to do this. One is that instead of the model generating one output, if you're, say, generating images, it could generate a range of outputs, and that allows users to pick: here are maybe four different ways of looking at this answer, four different images you could generate, this is the one I like, now iterate on that. Again, this only really works if the output is easy to evaluate; if I ask the model to write an essay and it gives me four different essays, that would be pretty difficult to use.
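The range-of-outputs idea is simple to sketch: sample several candidates in parallel and let the user pick. `llmComplete` is again a hypothetical stand-in for a real completion call:

```ts
// Hypothetical: sample n candidates so the user can shop among outputs
// instead of iterating blind on a single one.
declare function llmComplete(prompt: string): Promise<string>;

async function generateCandidates(prompt: string, n = 4): Promise<string[]> {
  // Independent samples; a real API might expose a temperature setting or
  // an n-completions parameter to get this diversity more cheaply.
  return Promise.all(Array.from({ length: n }, () => llmComplete(prompt)));
}
```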

[18:12] Going along with that idea, whenever possible you want to lean on what I've heard phrased as "people like to shop more than they like to create." They like having a range of options they can choose from, maybe even in a swipe-left-and-right style: do I like this, do I not? That kind of interface is easier to use, more intuitive, and more engaging than "here's a blank page, tell me exactly what you want." And again, powering that kind of thing comes back to coming up with options, predictions of actions, and suggestions that you plan out for the user in case they want them, and then the user can make the selection.

[18:54] Lastly, I've seen some prototypes of this thing I'm calling interactive components. What I'm referring to by interactive components is: if you're in a chat kind of interface and you ask a question, say, "what's the weather in New York," then instead of responding with a paragraph of an answer, the model says "the temperature tomorrow is 85," and there's a little weather widget with a slider for time, or with buttons for looking at precipitation and other things. Maybe the model will be able to synthesize little interactive components, little widgets, on the fly, and that again helps me close my feedback loop: these are other options of information I can look at, and I can explore them really directly, without having to re-prompt and retype my queries.
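A sketch of what that could look like if the model returned a small declarative widget spec for the client to render as real controls. The schema here is invented purely for illustration:

```ts
// Hypothetical "interactive components": the model answers with a
// declarative widget spec instead of prose alone.
type WidgetSpec =
  | { kind: "text"; value: string }
  | { kind: "slider"; label: string; min: number; max: number; value: number }
  | { kind: "buttons"; options: string[] };

function renderAnswer(specs: WidgetSpec[]): void {
  for (const spec of specs) {
    switch (spec.kind) {
      case "text":    console.log(spec.value); break;
      case "slider":  console.log(`[slider] ${spec.label}: ${spec.value}`); break;
      case "buttons": console.log(`[buttons] ${spec.options.join(" | ")}`); break;
    }
  }
}

// e.g. for "what's the weather in New York tomorrow":
renderAnswer([
  { kind: "text", value: "Tomorrow's high is 85." },
  { kind: "slider", label: "time of day", min: 0, max: 23, value: 12 },
  { kind: "buttons", options: ["temperature", "precipitation", "wind"] },
]);
```

The design point is that the spec is data, not code: the client decides how to draw the controls, and each control manipulation replaces a round of re-prompting.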

[19:41] Bringing it all back, I wanted to close out with one of my favorite quotes, from one of my favorite papers and essays for thinking about creative tools, by Kate Compton, in the essay called "Casual Creators." I think this quote is great, so I'm going to quote at length: the possibility space of creative tools, the action space of what you can do, should be narrow enough to exclude broken artifacts, like models that fall over or break when you're in a 3D-printing app, but it should be broad enough to contain surprising artifacts as well; the surprising quality of the artifacts motivates the user to explore the possibility space in search of new discoveries, new use cases, a motivation which disappears if that space is too uniform. So again, she's talking about this balance: you want to constrain just a bit, just enough that the user never gets stuck in that blank-page state, so that there's always some option they can take, always some suggested action that seems interesting, while preserving the power, the flexibility, and the sometimes surprising quality of these language models. Striking that balance is, I think, the primary challenge of building interfaces for these models.

[20:49] Oh, something's happening... there we go, okay. So, last slide, just to sum up: five big ideas that it'd be great for you to take away from this conversation. I think good dialogue interfaces built on LLMs can have agents that co-inhabit your workspace, that are there and can see what you're doing in full detail, including where your attention is. They should take full advantage of the rich shared context you have with the model to interpret your actions, so that you don't have to cram everything into a prompt. I think these interfaces can lead initially with constrained, happy-path actions, which you can use language models and other predictive models to try to predict, and then, if the user wants to do something more advanced or different, we can always fall back to chat as an escape hatch, because of the power and flexibility of language models. And lastly, whether you're building a chat interface or something a little more direct-manipulation and graphical, I think it's always good to think about how to speed up that iteration loop, especially by not forcing the user to type text, but letting them respond more directly with a mouse or a touchscreen, to close that feedback loop. With that, I hope this was interesting and useful, and I hope you can build some great conversational applications.

[22:10] Wow, wow. I mean, so many questions, there's so much stuff going through my mind, and I love the idea of how you're helping guide people. That is so nice to think about, instead of just leaving this open space and making people try to figure it out: hey, can we suggest things so that people can figure it out with us, as opposed to just letting their imagination go wild, where it may or may not turn out okay?

[22:41] Yeah, exactly. There's some history of predictive interfaces like this, and I think in the design world, collectively, our tastes have been soured a bit on predictive interfaces, because the models we've used in the past have not been that good: we couldn't predict that far ahead, and we could really only predict simple actions. But I've seen prototypes of programming interfaces where, given the full file context, you can predict not only code but things like, "hey, do you want to refactor this function," or "do you want to rewrite this type into this other type." Or, if you're in a creative app, you could predict fairly complex trajectories for the user, like, "do you want to take this drawing and recolor it in this way," or "do you want to apply this filter and then this other filter." Given the power of these models, I think it's worth taking another look at these predictive interfaces as well, obviously leaving the escape hatch that is just normal chat.
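As a closing sketch, a predictive interface in this spirit might propose whole trajectories rather than single actions, with chat kept as the escape hatch. Everything here is hypothetical:

```ts
// Hypothetical predictive interface: propose multi-step trajectories
// from the current app state, and keep free-form chat as the fallback.
interface Trajectory {
  label: string;   // e.g. "Recolor this drawing, then apply a blur filter"
  steps: string[]; // concrete actions the app already knows how to execute
}

declare function predictTrajectories(appState: unknown): Promise<Trajectory[]>;

async function showSuggestions(appState: unknown): Promise<void> {
  const suggestions = await predictTrajectories(appState);
  for (const t of suggestions.slice(0, 3)) {
    console.log(`Suggested: ${t.label} (${t.steps.length} steps)`);
  }
  console.log("Or describe what you want in chat."); // the escape hatch
}
```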

[23:35] Yes! So I'm excited for the day that Notion automatically knows I want to create a table with something and populates it with exactly what I want, and I'm guessing you're going to be one of the people making that the reality of the future.

[23:52] Perhaps, one day.

[23:54] Sweet, man, well, this was awesome. There are so many incredible questions for you happening in the chat, so if you all want to continue the conversation, I'm pretty sure he's on Slack, and it's not @Linus, it's...

[24:11] Yeah, it's my internet name, I guess.

[24:37] Thank you.
