LLMs: A Hacker's Guide

Hrishi Olickel
24 Mar 2024 · 30:08

Summary

TLDR: The talk covers working with AI and generating documents. The speaker walks through his experience with various AI projects, such as building a communications-automation tool and a visual RAG system. He stresses the importance of an iterative development loop and of using a range of different models. He also presents resources and tools for building with AI, along with debugging tips and a look at upcoming developments in the AI industry.

Takeaways

  • 🚀 Working with AI, and generative AI in particular, has changed how we build and demands a new way of thinking.
  • 💡 Large language models (LLMs) have dramatically improved natural language processing (NLP), making all English text and messages machine-accessible.
  • 📈 Applying AI to commercial shipping, including multi-modal RAG (RAG over visual information), has improved the answering of mission-critical questions.
  • 🔄 The iterative loop (CPLN: Chat, Prompt, Loop, Nest) is a key part of AI work: continually developing and testing new approaches.
  • 📚 Document creation can be automated with AI tools, cutting down on the same information being repeated across meetings and documents.
  • 🎨 Solving complex problems requires a range of AI tools and methods, drawing on every available modality and structure.
  • 🛠️ Structured inputs and outputs are important for improving the accuracy and reliability of AI models.
  • 🔄 Constant iteration and improvement of prompts is necessary for the best results and should take up a large share of working time.
  • 🔍 Debugging techniques are essential for identifying and fixing problems in AI systems, in particular by locating error sources and applying targeted fixes.
  • 📈 Future AI will become many times faster and cheaper through hardware and memory optimizations as well as quantization.
  • 💬 The use of AI in coding and programming will keep growing and will strongly shape how developers work.

Q & A

  • What is the main topic of the conversation in the transcript?

    -The main topic is working with artificial intelligence (AI), in particular generative AI and the use of prompts. The talk also covers applying AI in different domains and the challenges of building with AI.

  • How does the speaker define the 'iterative loop' in the context of AI development?

    -The 'iterative loop' refers to the process of continuously testing and changing prompts and approaches when working with AI models. The goal is to explore and refine different approaches in order to get better results.

  • What is the main difference between traditional programming and working with AI models?

    -Traditional programming is deterministic, while working with AI models is non-deterministic and relies more on experimenting and finding new approaches. In AI it is important to adopt new patterns of thinking rather than sticking to traditional workflows.

  • What is the significance of the 'CPLN' pattern in AI development?

    -CPLN stands for 'Chat, Prompt, Loop, Nest' and is a pattern the speaker introduces for working with AI. It emphasizes chatting with models, changing prompts, adding data and test cases, and breaking tasks down into smaller subtasks.

  • What role does structure in input and output play when working with AI?

    -Structure in input and output makes it easier to steer and guide the models. Structured inputs increase the relevance and correctness of requests, while structured outputs make the models' answers more precise and easier to interpret.
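The pattern described above can be made concrete with a small sketch. This is a hypothetical example, not code from the talk: the `InvoiceExtract` type, the prompt wording, and the parsing helper are all invented for illustration; only the pattern (declare the output shape, wrap the input in labeled sections, validate the reply) comes from the talk.

```typescript
// Hypothetical sketch: steer a model with a declared output shape,
// then validate what comes back instead of trusting it.

// The output shape we want the model to fill in.
interface InvoiceExtract {
  vendor: string;
  totalUsd: number;
  lineItems: { description: string; amountUsd: number }[];
}

// Structured input: wrap the raw text in labeled sections so the model
// can tell instructions, schema, and data apart.
function buildPrompt(rawEmail: string): string {
  return [
    "Extract the invoice details from the email below.",
    "Reply with ONLY a JSON object matching this TypeScript type:",
    "type InvoiceExtract = { vendor: string; totalUsd: number;",
    "  lineItems: { description: string; amountUsd: number }[] }",
    "<email>",
    rawEmail,
    "</email>",
  ].join("\n");
}

// Structured output: parse and check the reply before using it.
function parseReply(reply: string): InvoiceExtract {
  const parsed = JSON.parse(reply);
  if (
    typeof parsed.vendor !== "string" ||
    typeof parsed.totalUsd !== "number" ||
    !Array.isArray(parsed.lineItems)
  ) {
    throw new Error("model reply did not match InvoiceExtract");
  }
  return parsed as InvoiceExtract;
}
```

The validation step is what turns a free-form model reply into something the rest of the system can rely on.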

  • What resources does the speaker recommend for further learning in AI development?

    -The speaker recommends looking to the AI community for projects and resources. He points to his own articles as well as articles by other experts that he has made available on his website.

  • What is the advantage of using different AI models?

    -Every AI model has different training data and methods, and therefore different strengths and weaknesses. Using several models lets you get better results and adapt to different use cases.

  • How can the effectiveness of AI models at problem-solving be improved?

    -Effectiveness can be improved by changing the input data, adding structure to the output, testing different models, and continuously experimenting with different approaches.

  • What kinds of errors can occur when working with AI?

    -Errors can occur at the app level, in data processing, or in instruction following. Each error type calls for a different kind of fix to improve the performance of the AI models.

  • What does the future of AI development look like in the speaker's view?

    -The future of AI development will be shaped by faster hardware, better optimizations, and the ability to build more complex projects with fewer resources. The speaker also notes that the cost of using AI will drop significantly in the near future.

Outlines

00:00

🌟 Introduction and background

The speaker opens this section by explaining that he did not prepare specifically for this session. He discusses frequently asked questions about artificial intelligence (AI), in particular generative AI and the use of prompts. He shares that he runs a company called Greywing in Singapore, focused on commercial shipping, which does a great deal of AI and data work. He also describes his earlier projects, including a communications-automation tool and an assistant of his own, as well as his work with multi-modal RAG technology for processing complex information and answering mission-critical questions.

05:01

🔄 Iterative loop and prompt development

In this section the speaker stresses the importance of the iterative loop and of continuously improving prompts. He explains that working with AI models requires new patterns, which he calls CPLN (Chat, Prompt, Loop, Nest). He argues for more experimentation and for finding new approaches to solving problems, and warns against stopping development too early. He closes with the idea that some tools and technologies are better suited to certain kinds of tasks, and that it is important to identify and apply them.

10:01

🛠️ Tools and methods for AI development

The speaker focuses on the tools and methods used in AI development. He discusses using structure in input and output, the value of TypeScript and Zod for type specifications, and how SQL can serve as an aid to better model search queries. He also stresses the need to change the size and shape of data to solve problems, and warns against abstractions that get in the way of understanding the AI models themselves.
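The SQL idea in this section can be sketched as follows. This is a hypothetical illustration: the `vessels` table and its columns are invented, not taken from the talk. The point is only that asking the model to express a request as SQL forces a precise, structured reading of it, even if the query is never executed.

```typescript
// Hypothetical sketch of "SQL as a thinking aid": describe a schema and
// ask the model to express the user's request as a query against it.
// The query may never be run; it exists to structure the model's output.

const SCHEMA_DESCRIPTION = `
CREATE TABLE vessels (
  id INTEGER PRIMARY KEY,
  name TEXT,
  flag_state TEXT,
  last_port TEXT,
  eta DATE
);`;

function searchAsSqlPrompt(userRequest: string): string {
  return [
    "Given this schema:",
    SCHEMA_DESCRIPTION,
    "Express the request below as a single SELECT statement.",
    "Reply with only the SQL, no commentary.",
    `Request: ${userRequest}`,
  ].join("\n");
}
```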

15:03

🚫 Dos and don'ts in AI development

This section presents some important dos and don'ts of AI development. The speaker encourages using all available modalities, including speech, vision, and code, to get the best results. He warns against piling up abstractions and recommends using different models, since they differ greatly from one another. He also gives tips on avoiding AI hallucinations through structured outputs, and encourages paying attention to the size and shape of input and output to get better results.

20:04

🔍 Debugging and problem-solving in AI

The speaker discusses the challenges of debugging AI systems and how to identify and solve problems effectively. He recommends always going back to the prompt level when nothing works, and offers several approaches for locating the sources of problems. He stresses the importance of structure in input and output for identifying and fixing problems, and shares tips on classifying errors and resolving them effectively.

25:04

📈 Project example and future developments

In this final section the speaker presents a project example that starts from a specific problem and shows how iterative experimentation and added structure lead, step by step, to a complete solution. He also discusses the future development of AI technology, particularly with respect to cost and speed, and encourages the audience to think about the possibilities these coming changes will open up. He closes by pointing to available resources and offering to remain available for questions.

30:05

🙏 Thanks and closing

To close, the speaker thanks the audience for their attention and opens the floor for questions and discussion. He emphasizes his willingness to keep helping and continuing the conversation, and asks for questions or feedback.

Keywords

💡AI

Artificial intelligence (AI) refers to computers and other machines performing tasks that would otherwise require human thinking and action. AI is the central topic of the video, since it is used in many domains to solve problems and improve processes. One example is its use in the commercial shipping sector to process complex information and answer mission-critical questions.

💡GenAI

GenAI is an acronym for generative artificial intelligence and refers to using AI to create new, original content. The video notes that working with GenAI requires new patterns of thinking, since the models behave non-deterministically and working with prompts only superficially resembles coding.

💡Prompting

Prompting refers to the technique of steering and guiding AI models by providing instructions or cues. The video stresses that prompting is a key part of interacting with AI models and that it is necessary to experiment with different approaches to get optimal results.

💡Iterative Loop

An iterative loop refers to the process of continuously making changes to a project in order to improve or refine it. The video presents the iterative loop as a crucial tool for working with AI models, since it makes it possible to test different approaches and achieve the best results.

💡CPLN

CPLN is an acronym introduced in the video as shorthand for a new pattern of interacting with AI models. It stands for Chat, Prompt, Loop, Nest and refers to the steps you work through to use AI models effectively.
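A minimal sketch of the "Nest" step, under the assumption that the model is reached through some `callModel` function (a stand-in, not a real API): each subtask gets its own small, single-purpose prompt instead of one long prompt that tries to do everything at once.

```typescript
// Hypothetical sketch of nesting: chain small single-purpose prompt
// steps rather than one monolithic prompt. `callModel` is injected so
// the pipeline can be tested without any real model behind it.

type CallModel = (prompt: string) => string;
type Step = (input: string, callModel: CallModel) => string;

const summarize: Step = (text, callModel) =>
  callModel(`Summarize in two sentences:\n${text}`);

const extractActions: Step = (text, callModel) =>
  callModel(`List the action items, one per line:\n${text}`);

// Run the nested steps; each subtask stays small enough to iterate on
// and debug independently, which is the point of nesting.
function runPipeline(
  transcript: string,
  callModel: CallModel
): { summary: string; actions: string } {
  const summary = summarize(transcript, callModel);
  const actions = extractActions(transcript, callModel);
  return { summary, actions };
}
```

Because each step is its own function, a failing subtask can go back through the chat-playground-loop cycle on its own without touching the rest of the pipeline.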

💡Playground

In the context of the video, a 'playground' is an environment or tool for running experiments with AI and trying out different approaches without having to worry about production environments. It enables learning and finding solutions in a safe setting.

💡Diarization

Diarization is the processing of audio data to identify and separate the individual speakers. In the video it is mentioned as a problem that has since been solved, and as an example of how some problems are specific to AI models and how important it is to keep focusing on new approaches.

💡Structured Data

Structured data is data that exists in a predefined format and can easily be processed by computers. The video stresses the importance of using structured data to improve interaction with AI models and to steer their results.

💡Debugging

Debugging refers to identifying and fixing errors or problems in code or in a system. The video notes that debugging in AI development requires a new paradigm, since the way AI models work is complex and many of the processes involved are non-deterministic.

💡Latent Space

The latent space is a term from mathematics and AI describing a level of abstraction in which certain properties of data or models live. The video discusses how to 'surf' the latent space by making retroactive changes to prompts and conversation histories, and thereby improve the interaction with AI models.
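The 'surfing' described above presupposes that the conversation history is plain, editable data. The following is a minimal hypothetical sketch; the `Turn` shape is invented for illustration, though real chat APIs use similar message arrays.

```typescript
// Hypothetical sketch of retroactively editing a conversation history:
// keep the history as plain data so any earlier turn (including the
// assistant's) can be rewritten before re-running the model from there.

interface Turn {
  role: "system" | "user" | "assistant";
  content: string;
}

// Return a copy of the history with one turn rewritten. Everything
// after the edited turn is dropped, since it would be regenerated
// from the new prefix on the next model call.
function editTurn(history: Turn[], index: number, content: string): Turn[] {
  const edited = history.slice(0, index + 1).map((t) => ({ ...t }));
  edited[index].content = content;
  return edited;
}
```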

💡TypeScript and Zod

TypeScript is a language that extends JavaScript with static typing, while Zod is a schema validator for TypeScript. The video introduces both as tools that help steer and improve interaction with AI models by making it possible to formulate clear, precise specifications for them.
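One dependency-free way to sketch the idea behind Zod here (this is not Zod's actual API, just a miniature of the same trick): define the schema once as a value, then derive both the prompt text and a runtime check from that single definition. The field names below are invented.

```typescript
// Miniature of the schema-as-value trick that Zod enables: one schema
// definition drives both the type spec shown to the model and the
// runtime check applied to its reply.

type FieldKind = "string" | "number";

const portCallSchema: Record<string, FieldKind> = {
  vessel: "string",
  port: "string",
  berthHours: "number",
};

// Render the schema as a TypeScript-style spec to embed in a prompt.
function renderSpec(schema: Record<string, FieldKind>): string {
  const fields = Object.entries(schema)
    .map(([name, kind]) => `  ${name}: ${kind};`)
    .join("\n");
  return `{\n${fields}\n}`;
}

// Check a model reply against the same schema definition.
function conforms(schema: Record<string, FieldKind>, value: unknown): boolean {
  if (typeof value !== "object" || value === null) return false;
  return Object.entries(schema).every(
    ([name, kind]) => typeof (value as Record<string, unknown>)[name] === kind
  );
}
```

Because prompt and validator come from one definition, they cannot drift apart as the schema evolves; real code would get the same guarantee from Zod's `z.object()` plus schema-to-text tooling.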

💡NLP

NLP (Natural Language Processing) refers to the processing and analysis of natural language by computers. The video highlights NLP as a foundational part of AI development, since it enables classifying documents, extracting and transforming information, and performing a range of other tasks.

💡HTTP

HTTP (Hypertext Transfer Protocol) is a protocol used to exchange data between computers on the World Wide Web. It is mentioned in the video in the context of how different technologies and modalities such as audio, vision, and speech have been integrated into the development of AI models.

Highlights

The speaker runs a company called Greywing in Singapore focused on commercial shipping and AI.

Greywing's initial launch was a communications automation tool utilizing AI.

The company also developed an assistant with multi-tool usage capabilities before OpenAI offered them.

Significant work is done with multi-modal RAG (Visual RAG) for processing complex information.

The importance of output in shipping and the use of AI to handle critical data sets is emphasized.

The speaker's company has created a tool to generate documentation from meetings and conversations.

The iterative loop (CPLN) is introduced as a key pattern for working with AI.

The importance of chatting with AI models and iterating prompts is highlighted.

The concept of 'playgrounds' for testing AI tools and the value of retroactive editing of prompts is discussed.

The strategy of adding more data and test cases to loop and refine AI models is mentioned.

Nesting prompts by breaking them into smaller subsegments is recommended for clarity and simplicity.

The idea of using AI models in an iterative process from zero to 0.1 is introduced.

The importance of trying new approaches with AI, as these models are still very new, is emphasized.

The speaker advises using all modalities (speech, vision, text) when working with AI.

The use of structured input and output in AI interactions is recommended to reduce errors and hallucinations.

The potential cost reduction and speed increase in AI technology is discussed, suggesting a rethink of what can be built.

Debugging AI systems by examining prompts, model levels, and input transformations is advised.

The speaker shares personal experiences and learnings from working with AI in a commercial setting.

Transcripts

play00:00

So this is just meant to be relatively casual,

play00:02

I didn't really prep for this.

play00:03

This is just a template I threw together to get a bunch of stuff in.

play00:07

But a lot of it is just a lot of questions

play00:10

that I've been getting over the last couple of years,

play00:12

just about AI and now about prompting and everything else.

play00:15

Because a lot of the work that we now do with GenAI

play00:18

kind of requires that you think about things differently.

play00:20

So this is just a collection of all of those things

play00:22

and as we go through them, we'll see.

play00:25

Just tiny bit about me.

play00:26

I run a company called Greywing here in Singapore.

play00:28

We're focused on commercial shipping.

play00:30

We do a ton of AI work.

play00:32

We do a ton of data work.

play00:33

When LLMs came up, we started looking at them

play00:36

as NLP effectively on steroids.

play00:39

Because suddenly all of the English language

play00:42

and people's messages, emails were all accessible.

play00:45

So we started working there.

play00:47

Out of that work came that initial launch,

play00:50

which was a comms automation tool.

play00:52

That's doing pretty well.

play00:54

Another one was an assistant of our own.

play00:55

Far before OpenAI had multi-tool usage or any of those things,

play00:59

we were doing charts and a bunch of things.

play01:01

Then came sort of RAG.

play01:03

We do a ton of work with multi-modal RAG,

play01:06

which is visual RAG.

play01:07

So being able to process sort of complex information,

play01:10

how to visually index it and sort of how to use that

play01:12

to answer mission critical questions.

play01:14

So that is, for example, a DRAM data sheet from Samsung.

play01:17

So they were testing the product.

play01:19

And in shipping, all of these data sets are really important

play01:21

and the importance of the output is also very high.

play01:24

So this is just to say,

play01:27

like we've done quite a bit of work

play01:28

across different parts of AI,

play01:30

and this is just everything we've learned.

play01:32

Also, I'll put up like little QR codes

play01:34

just in lieu of links.

play01:36

The slides are gonna be up later.

play01:37

So if you see something and you're like,

play01:38

oh, I wanna know a little bit more,

play01:39

there's gonna be a QR code.

play01:42

So this is, in the interest of hacking,

play01:45

everything today is gonna be about going from,

play01:46

let's say, zero to 0.1.

play01:49

Right, where do you start?

play01:50

How do you start?

play01:51

So this is just an open source project

play01:52

that I recently released.

play01:53

I'll use that as an example later on,

play01:55

how to go from like script to project to release.

play01:59

This is effectively a tool to generate docs.

play02:01

So we had a docs problem.

play02:03

Everybody's got a docs problem.

play02:04

But what we did have is a ton of meetings.

play02:06

We'd have tons of meetings,

play02:08

repeated meetings about the same things,

play02:10

explaining the same things.

play02:12

So now we have Whisper,

play02:13

we have all of these things accessible.

play02:15

Can we just make docs out of things

play02:17

we've explained like 10 times before?

play02:19

So this was launched like two days ago.

play02:21

This has already had like, I don't know what,

play02:23

50 or 60 projects already make docs using it.

play02:26

Because it turns out, people, it's easy to talk.

play02:28

It's very hard to write docs, right?

play02:30

So that's just an example.

play02:33

So the first kind of thing I wanna talk about really,

play02:35

and this is, you know, it took me a long time to discover

play02:37

and it made a huge difference,

play02:38

is just the iterative loop, right?

play02:40

And when I say iterative loop,

play02:41

I mean what is your process to build stuff, right?

play02:45

So long time ago, you know,

play02:46

if you were coding that long ago,

play02:48

we had write compile run, right?

play02:50

We basically would write code,

play02:51

it would take a long time to compile.

play02:53

People still working on Xcode

play02:54

would have that same problem today.

play02:56

We'd run it, we'd go back, write, compile and run.

play02:59

And then we had sort of interpretive languages come along

play03:01

and then we now have like the REPL,

play03:03

which is effectively the read eval print loop, right?

play03:05

Which is, you write code, runs instantly,

play03:08

you keep changing it, you keep changing it,

play03:09

something happens.

play03:12

And then, you know, a bunch of guys came along.

play03:14

I don't fully agree with these guys,

play03:15

but you know, they had a good point.

play03:16

You know, you have test-driven development,

play03:18

test build, test, that kind of thing.

play03:20

But I think AI really,

play03:22

because the way these models work is not deterministic

play03:25

and prompting can feel kind of code-like

play03:28

and working with prompts can feel kind of code-like,

play03:30

but they're really the furthest thing.

play03:31

So they really need new patterns.

play03:33

So the pattern that, you know, me and my team

play03:34

and a lot of other people that I know have fallen into

play03:36

is, you know, what I'm calling CPLN.

play03:39

So we'll see what that means.

play03:42

So the first one's just chat, right?

play03:44

Whatever your problem is, whatever you're trying to do,

play03:46

just chat with models,

play03:48

make more and more and more examples,

play03:50

and just keep changing prompts,

play03:52

keep changing what you're doing, right?

play03:53

A lot of people, and myself included,

play03:55

get into this habit because we have that habit from coding

play03:58

and sort of building things of doing it once

play04:00

and it sort of works.

play04:02

And then forever, you're iterating on that particular prompt.

play04:05

You make very small changes,

play04:07

you keep fixing things and you keep fixing things.

play04:09

But I think really where you should be spending

play04:11

most of your time really is just changing

play04:13

and finding new approaches to solve things

play04:15

because that makes a huge, huge difference.

play04:16

And that's unheard of in code, right?

play04:18

You wouldn't write something with the intention

play04:20

of rewriting it seven times before you got to the end, right?

play04:23

You'd write something, something be broken,

play04:25

you'd fix that broken thing,

play04:27

you'd fix the next broken thing,

play04:28

and then you'd just, you'd be done.

play04:30

So the most important thing is just chat.

play04:32

And I'm still surprised,

play04:34

and I still talk to the people on my team, me included,

play04:37

have this problem where we just don't play,

play04:39

we don't chat enough, really, right?

play04:42

Once we get to a system that sort of gets to like 40%,

play04:45

we're like, okay, we're now going to production.

play04:47

We're just, this is almost good.

play04:49

And so that's a big problem, right?

play04:51

The next one's just, you know,

play04:52

take whatever you've learned, go to the playground.

play04:54

There's a lot of tools and some of them are really good,

play04:57

but in most cases, 90% of what you want

play04:59

is still just in the playground.

play05:00

Everyone's got a playground.

play05:02

All you're really looking for is the ability

play05:03

to retroactively edit prompts and conversation histories.

play05:06

You know, some people call it surfing the latent space

play05:08

and sort of make changes, right?

play05:10

So this is where you'd spend maybe 20% of your time.

play05:13

Once you've got that working, right,

play05:15

let's say you've got one of it working,

play05:16

the next in most cases,

play05:18

and this is just examples from like Lamentis, is loop, right?

play05:22

Add more data, add more test cases, a lot more.

play05:25

See how solid your hypotheses were when you started, right?

play05:29

And always reset if it doesn't work.

play05:33

Once you're done with that, right, nest, right?

play05:37

Once you're done with that,

play05:38

you've got a general sense of the approach you want to take.

play05:41

99.999% of the time,

play05:43

and I almost dared a few people to do it,

play05:45

and I so far haven't seen a single prompt

play05:47

or a single approach where it couldn't be nested.

play05:50

And by that, I mean effectively break the prompt,

play05:53

break the work you're doing

play05:54

into smaller and smaller and smaller subsegments.

play05:56

We'll go into it later, right?

play05:58

But if you're not gonna accept

play05:59

like a 700-line code file as good,

play06:03

you shouldn't accept a 100-line prompt as good, right?

play06:06

Or a 50-line prompt as good.

play06:08

It could always be made simpler.

play06:09

It could always be broken down.

play06:11

So really, you just wanna keep doing that, right?

play06:13

And once you've gotten this far,

play06:15

and luck would have it,

play06:16

if you go to production and you've got users,

play06:18

you've got subtasks now,

play06:19

and they can go through the exact same loop, right?

play06:22

You run into a problem,

play06:23

you've got a new customer with new kind of data,

play06:25

new problems, new things,

play06:26

you wanna go back to the original loop.

play06:30

So this is kind of where you wanna be spending

play06:32

or where I found the best division of your time being, right?

play06:36

This entire blue segment is just try new approaches, right?

play06:40

Because these models, they've been around for about a year,

play06:43

but they are so new,

play06:44

and we are still finding new ways to use them,

play06:46

that you might try something,

play06:48

you might be the first person in the world to have tried it.

play06:50

You might genuinely be the first person

play06:51

to have thought of that particular way

play06:53

of solving a problem with a model, right?

play06:55

So I really can't emphasize that enough.

play06:58

And you probably wanna spend about 20% of your time

play06:59

tuning the prompts.

play07:01

Almost everything usually is a prompt issue,

play07:03

because I'm presuming that the people you work with

play07:05

and the things you're building,

play07:06

that you guys are good at coding.

play07:07

If you're not, there's tons of ways to get better at it.

play07:10

Computers are really good at coding.

play07:11

So in most cases, it's your prompt, right?

play07:13

If it's not your prompt, it's your input.

play07:16

It's the data you're providing.

play07:17

Ideally, see if you can change the size

play07:19

and shape of that data.

play07:20

And in most cases, that fixes your problem.

play07:25

So a couple of do's and don'ts.

play07:28

So the first one, and I still have this problem,

play07:32

although it's wonderful to know that someone in this room

play07:34

has solved my biggest problem,

play07:35

which is diarization, which is awesome.

play07:39

But really, just use all modalities, right?

play07:42

I think a lot of people kind of forgot

play07:44

that when we got to HTTP and in short order,

play07:47

we also got audio, we got vision, right?

play07:50

And we got speech to text

play07:51

and all of these different modalities.

play07:53

And even just the input modality of text,

play07:55

you can transform it into so many things, right?

play07:58

You can take text and transform that into code

play08:00

to get a more structured representation.

play08:02

You can get structured data.

play08:03

You can do language transformations.

play08:04

So use all of the tools that you've got, right?

play08:08

So let's, yeah.

play08:10

So speech, for example, here's where you'd use each one.

play08:13

Speech is verbose.

play08:15

If you've got anything dealing with users,

play08:16

we love to talk, right?

play08:18

This entire talk is probably gonna be,

play08:20

I don't know, I'm hoping not, like about 8,000 words, right?

play08:23

If you ask me to type out 8,000 words,

play08:25

it would take me far longer.

play08:27

I would be far less likely to do it.

play08:28

And I'd probably tell you no, right?

play08:30

If you present the users with a text box,

play08:32

they'll give you five words.

play08:33

If you ask them to just press a button and talk,

play08:35

they'll give you 200 words, right?

play08:37

And these models, the things that we work with,

play08:39

they love context.

play08:40

The more context you can provide, the better.

play08:44

Vision is insanely useful, right?

play08:47

There's a lot of relationships that you can capture

play08:50

with a picture that you can't with text.

play08:52

Like we know this, right?

play08:54

It's a thousand words.

play08:55

Anytime you write as a person,

play08:56

you wanna put pictures in for the same reason, right?

play08:59

So all of that can be captured.

play09:00

And now we're getting smarter and smarter and smarter models

play09:03

that can understand that information.

play09:04

You can use it as really expensive OCR if you want to.

play09:07

We do in some places.

play09:09

But in a lot of cases, it's also far more dense, right?

play09:12

Even if the diagram on the top right, top left,

play09:16

that's an actual diagram that we use, by the way,

play09:17

were to be represented in text,

play09:19

that would be far more token heavy than that picture,

play09:22

right, can encode.

play09:26

Code is awesome for structure, both for input and output.

play09:30

Like you almost always wanna be using structure

play09:32

both on the input and the output, right?

play09:34

Use structured output whenever possible.

play09:37

Structure your input whenever possible, right?

play09:39

Almost everything humans ever touch

play09:41

usually has some structure, right?

play09:43

Like when I talk, my talk has a structure.

play09:45

When you write a paragraph, there's a topic sentence.

play09:47

Everything humans ever do usually has structure.

play09:50

And if you're leaving it out, if you're not extracting it,

play09:52

it's a lot harder to control.

play09:56

Yeah, this is just stuff that we use, right?

play09:58

So we use TypeScript and Zod to build type specs,

play10:01

and that makes it so much easier to steer these models.

play10:03

We use SQL when we wanna express something as a search query.

play10:06

Even if we never run that SQL, it helps the model think,

play10:09

it helps the AI system sort of better guide these things.

play10:13

Yeah, same thing here.

play10:15

Use structured output as often as you can,

play10:17

far easier to guide.

play10:18

It's also far less prone to hallucinations

play10:20

because you've got a type spec on the inside,

play10:22

and structured output usually constrains the output

play10:24

that's coming out of it, that you see far fewer generation.

play10:27

Sorry, far fewer hallucinations with structured output.

play10:31

And I can talk about that more if we have time at the end,

play10:34

but it usually has to do with token probabilities

play10:35

and the output set.

play10:38

The same thing is, again, use as much as you can

play10:42

because you got this massive model for free, kind of.

play10:45

Commoditized down, and you got this massive model

play10:48

that had two trillion, three trillion tokens

play10:50

of human information thrown into it.

play10:53

Use that as much as you can, lean into it.

play10:56

There's a lot of libraries, let's say projects

play10:59

that I've either consulted or advised with,

play11:02

where they're inventing their own DSLs,

play11:03

they're inventing their own languages

play11:05

to express what they want.

play11:06

When ideally, if they expressed it as a superset

play11:09

of something that existed, say TypeScript, Python,

play11:12

English, Hindi, whatever's in there,

play11:15

you'd get a lot more benefit out of that.

play11:21

Cool, so this is a bunch of don'ts.

play11:23

None of these are hard rules,

play11:25

but they're general rules of thumb,

play11:27

especially when you start out.

play11:29

In AI, I mean, this is a meme at this point,

play11:32

but we are still very, very early.

play11:34

This is not very early days of development

play11:37

or very early days of design.

play11:39

If you wanted to get into design

play11:41

and you wanted to be a good painter or a good designer,

play11:43

you wouldn't use DALL-E.

play11:46

You wouldn't add an abstraction between you and the thing.

play11:48

You would learn how to paint

play11:49

because you want that knowledge.

play11:51

You actually want that harder knowledge

play11:53

of how these things work, how they behave,

play11:56

what the actual nature of these things are.

play11:58

The more abstractions and toolkits and libraries

play12:00

you put between yourself and the model

play12:02

when you're developing, the less you learn.

play12:04

Some of them, honestly, are really good,

play12:07

but that's also a problem because they're really good

play12:10

and they have this little circle of things

play12:11

that they do really well.

play12:13

And very quickly, if you're lucky, somewhat slower,

play12:17

you'll wanna step out of it,

play12:18

and then it's just a wasteland.

play12:21

If you've ever built something with WordPress or Squarespace

play12:24

and then just wanted to do one thing that it didn't do,

play12:27

you know what I'm talking about.

play12:29

That's impossible.

play12:30

Everything will fight you.

play12:31

So ideally, don't add abstractions.

play12:33

I know it can be, especially people

play12:35

with a coding background,

play12:36

kind of sometimes I've seen want to distance themselves

play12:38

from prompting, distance themselves

play12:39

from that non-deterministic nature of these things.

play12:43

Bad instinct.

play12:45

Don't look away from it.

play12:47

The next one is also, I know we've got credits to OpenAI,

play12:51

but everyone wants to give you free money.

play12:52

Everyone wants to give you free credits these days

play12:54

if you're a provider.

play12:55

It's too much investor money in this space.

play12:58

Don't stick to one model.

play12:59

They're all very different.

play13:01

They were all kind of similar when they came out

play13:03

because everyone was working with the same information set,

play13:05

but things have diverged massively.

play13:07

They're all practically different people.

play13:09

It's almost like if you gave some work

play13:11

to someone on your team and they couldn't do it,

play13:14

you wouldn't go, oh, this is undoable.

play13:16

You'd probably give it to someone else.

play13:17

Same thing, work with different models.

play13:19

They're all very, very different, differently trained.

play13:23

There's even different personalities in there.

play13:26

This one is kind of easy to keep track of.

play13:30

Basically, have a general rule of thumb

play13:32

that your outputs are not gonna be that much bigger

play13:35

than your inputs in most cases.

play13:37

Again, rule of thumb.

play13:38

That's not gonna end up well.

play13:40

If you're looking to generate, let's say,

play13:42

20 paragraphs of an article from five words of input,

play13:45

you're usually just gonna get very generic,

play13:47

not so good input.

play13:48

Not so good output.

play13:50

So try and keep those ratios relatively the same

play13:52

if you can.
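
[Editor's note: the input/output ratio rule of thumb above can be made into a cheap guardrail. The threshold here is arbitrary and just for illustration; token counts would come from a real tokenizer in practice.]

```typescript
// Rough guardrail: warn when the requested output is much larger
// than the input, which tends to produce generic text.

function ratioWarning(
  inputTokens: number,
  outputTokens: number,
  maxRatio = 4, // arbitrary illustrative threshold
): boolean {
  return outputTokens / inputTokens > maxRatio;
}

const risky = ratioWarning(5, 400);  // 20 paragraphs from five words
const fine = ratioWarning(500, 600); // output roughly matches input
```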

play13:56

Cool.

play13:57

Some smaller FAQs,

play14:00

because these questions get asked a lot.

play14:02

So agents, a lot of people have asked me about agents.

play14:05

The simple answer there is anything

play14:07

with looping and termination is usually considered an agent.

play14:10

So anytime you've got a system

play14:12

and it basically loops on the same prompt

play14:14

or some set of prompts,

play14:15

and it basically has the ability to continue execution

play14:17

and then decide when it wants to stop,

play14:19

that's usually an agent.

play14:23

This one is really helpful.

play14:25

When you run into problems

play14:26

or when you start working on a project,

play14:27

or you're just looking for a project to work on,

play14:29

it's useful to know what capabilities

play14:30

just got added to the tool set, right, within AI.

play14:33

These are four of the biggest ones.

play14:36

The first one is just plain NLP.

play14:38

If you've done NLP or anything close to it,

play14:40

it just got way better.

play14:41

We can classify documents,

play14:43

we can classify information all sorts of ways,

play14:45

we can label them,

play14:46

and we can do all sorts of things with them

play14:48

that previously NLP really couldn't do.

play14:50

The second one's filtering and extraction, right?

play14:52

So you can pull information out, right?

play14:56

And the next one's sort of transformation.

play14:58

So anytime you've got RAG, summarization,

play14:59

that's a transformation, right?

play15:01

If you're doing code generation,

play15:02

a lot of cases that's transformation.

play15:04

If you're doing translation, that's transformation, right?

play15:07

So oftentimes it's useful to look at your problem, right,

play15:10

in an industry or your problems that are in front of you,

play15:11

or you're just looking for ideas.

play15:13

If you look for one of these four things,

play15:14

if you look for one of these four classes,

play15:16

it's an easier way to structure,

play15:17

maybe that's where you wanna go,

play15:19

instead of where to put things.

play15:21

The final one, and I think some people are using it,

play15:23

but I've seen that use case sort of go down for some reason,

play15:25

is just general purpose generation, right?

play15:27

You want it to write things no one's ever written before.

play15:30

You want it to make things up.

play15:34

So some resources.

play15:36

I'm not gonna be talking about prompting,

play15:37

not gonna be talking about RAG.

play15:39

These are just some articles.

play15:40

These are my articles.

play15:42

If you don't like me,

play15:42

the top of it has people that I respect

play15:46

that are far smarter than me,

play15:47

so click the links and go there and read those.

play15:55

Cool.

play15:56

The next one, and this might be the final one,

play15:58

is debugging, right?

play16:00

I don't think I've heard that many people talk about it.

play16:02

I mean, among people who work with AI,

play16:04

this is a massive conversation, right?

play16:06

How do you debug?

play16:07

Because the sort of curse and sort of the benefit

play16:11

that we got with modern AI things

play16:13

is that it's very easy to build a demo.

play16:14

It's very easy to get to something that sort of works,

play16:16

but it's very hard to debug things when they go wrong.

play16:19

That's, again, almost a new paradigm.

play16:23

So what is happening to you?

play16:25

If nothing works, always go down to the prompt level,

play16:29

and if you can't, then get rid of your abstractions

play16:31

and work up from there.

play16:33

Try a different model.

play16:34

Try going up a level of intelligence and see if it fixes it.

play16:37

That should tell you where your problems are.

play16:39

Or try going down a level of intelligence

play16:40

and see what happens.

play16:42

The next one is transform the input.

play16:44

In most cases, it's your input that's the issue.

play16:47

Either it's too verbose, it's not the right transformation,

play16:50

it's not structured the right way.

play16:51

So any transformations you can do on the input

play16:53

is gonna make a massive difference, right?

play16:56

And finally, if you're not doing this already,

play16:57

add more structure to the output, right?

play16:59

More structure is gonna help you point out

play17:00

where your problems are.

play17:02

More structure is gonna tell you,

play17:03

sort of expose some of the big issues there.
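
[Editor's note: the "try a different model, go up or down a level of intelligence" move above can be scripted as a harness. Everything here is a placeholder: the tier names, the fake runner, and the JSON check stand in for real API clients and real acceptance tests.]

```typescript
// Sketch: run the same prompt across model tiers and stop at the
// first one whose output passes a cheap check. Where it first
// succeeds tells you whether the problem is prompt or model.

type Runner = (model: string, prompt: string) => string;

function escalate(
  run: Runner,
  tiers: string[], // e.g. ["small", "medium", "large"]
  prompt: string,
  looksGood: (out: string) => boolean,
): { model: string; output: string } {
  for (const model of tiers) {
    const output = run(model, prompt);
    if (looksGood(output)) return { model, output };
  }
  throw new Error("no tier produced acceptable output");
}

// Fake runner: only the "large" tier returns valid JSON.
const fakeRun: Runner = (model) =>
  model === "large" ? '{"ok":true}' : "sorry, as an AI...";

const result = escalate(fakeRun, ["small", "large"], "extract JSON", (o) => {
  try { JSON.parse(o); return true; } catch { return false; }
});
```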

play17:07

Okay, so this is, you know,

play17:08

this doesn't usually happen to people.

play17:10

This usually does, right?

play17:12

Is it's kind of working, it's kind of working.

play17:16

And I can spend another, you know, like two weeks on it

play17:18

and it'll get a bit further down the line

play17:20

of kind of working, but it's not working necessarily, right?

play17:24

So again, I'm gonna go back to data.

play17:26

In most cases, you wanna find out

play17:28

what separates your offending data,

play17:30

which is where it doesn't work,

play17:31

to the stuff that does work, right?

play17:33

Try all sorts of transformations.

play17:34

One of those is gonna point to some sort of difference

play17:37

between the stuff that works and the stuff that doesn't,

play17:39

right?

play17:40

If you do, that's a prompt, right?

play17:42

More validation is always gonna help.
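
[Editor's note: one concrete way to hunt for what separates the offending data from the working data is to compute cheap features over both sets and compare. The features and the sample inputs here are invented for illustration.]

```typescript
// Sketch: compare cheap features of inputs that work vs. ones that
// fail, looking for a transformation that separates them.

function features(text: string) {
  return {
    chars: text.length,
    lines: text.split("\n").length,
    hasHeadings: /^#+\s/m.test(text),
  };
}

function avg(nums: number[]): number {
  return nums.reduce((a, b) => a + b, 0) / nums.length;
}

const working = ["# Intro\nshort doc", "# Notes\nok"];
const failing = ["a".repeat(2000), "b".repeat(3000)];

const report = {
  workingAvgChars: avg(working.map((t) => features(t).chars)),
  failingAvgChars: avg(failing.map((t) => features(t).chars)),
};
// Here the separator is obvious: the failing inputs are far longer,
// which suggests chunking or trimming before prompting.
```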

play17:44

And then we saw the classification before, right?

play17:47

If you're trying to do more than one of those things

play17:49

inside the same system, inside the same,

play17:51

with the same model, usually separate it out, right?

play17:55

And it makes a huge difference.

play17:58

Finally, yeah, just classify your errors.

play18:01

Most errors I've seen sort of fall into these three issues.

play18:05

You've either got app level issues

play18:06

in terms of how that data is being fed in and fed out

play18:08

and how models are orchestrated once things get too large,

play18:12

or you get factuality issues, right?

play18:14

It's just making things up that don't exist

play18:16

or it's just giving you information

play18:18

that it really shouldn't

play18:19

or pulling out the wrong information.

play18:20

That's a factuality issue.

play18:22

The third one's just instruction following.

play18:24

Is it just not listening to the specific instructions

play18:26

that you're giving it, right?

play18:27

And this is at the model level,

play18:29

but it happens at the meta level as well.

play18:31

Even if you're working with, say, three models,

play18:32

and 300 prompts, all of these things still apply.
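
[Editor's note: the three error classes just described (app-level, factuality, instruction following) are easy to track if failures get tagged as they are found. The log entries below are made up; the point is the bookkeeping, not the data.]

```typescript
// Sketch: bucket observed failures into the three classes above so
// you can see where to spend debugging time.

type ErrorClass = "app" | "factuality" | "instruction-following";

interface FailureLog {
  note: string;
  cls: ErrorClass;
}

function tally(logs: FailureLog[]): Record<ErrorClass, number> {
  const counts: Record<ErrorClass, number> = {
    app: 0,
    factuality: 0,
    "instruction-following": 0,
  };
  for (const log of logs) counts[log.cls]++;
  return counts;
}

const counts = tally([
  { note: "wrong chunk fed to model", cls: "app" },
  { note: "cited a vessel that does not exist", cls: "factuality" },
  { note: "ignored 'JSON only' instruction", cls: "instruction-following" },
  { note: "orchestrator dropped second prompt", cls: "app" },
]);
```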

play18:35

Okay, so what do you do, right?

play18:41

The first one is whatever you're doing, right?

play18:45

Whatever you're doing as far as prompting

play18:46

and working with models go,

play18:48

you're almost always too verbose

play18:49

because in most cases it's English,

play18:51

and once we start adding things, they kind of work.

play18:54

So you get to this sort of Pareto level of, you know,

play18:57

it works, but it just doesn't.

play18:59

It's almost how humans behave.

play19:00

Cut them down.

play19:01

There's usually space to cut them down.

play19:03

Cut them again.

play19:04

The lower your task level, the more you can do it.

play19:05

The lower your task complexity per prompt,

play19:06

or per task, or per function, the better, right?

play19:10

The easier it is to debug,

play19:12

the easier it is for you to have, you know,

play19:13

things with defined blast radiuses,

play19:15

where if something goes wrong,

play19:16

you can swap it out and fix it.

play19:18

Otherwise, you know, something goes wrong someday,

play19:20

you're gonna have a problem.

play19:23

So, how much time have we got left?

play19:25

Okay, 10 minutes, perfect.

play19:27

So this is just an example of that particular project

play19:30

that I mentioned at the beginning, right?

play19:31

So it started with just a specific issue, you know?

play19:35

Like, you know, honestly, it wasn't even me.

play19:37

It was Hiby, who's actually here,

play19:38

who had a transcript for me, and she was like,

play19:40

okay, can we make docs out of this, right?

play19:41

Or I think it came partly from that.

play19:43

So there was a lot of talking.

play19:45

There was a lot of trying to figure out

play19:46

what we can pull out,

play19:47

what it understood out of the transcript.

play19:49

You're trying to look for understanding.

play19:51

You're trying to see if this can even be done.

play19:52

You're just testing very high level hypothesis, right?

play19:55

Some of the things I tested

play19:55

were sort of trying to pull out structure directly.

play19:58

Some of the other ones were trying to classify that data

play20:00

before pulling out structure.

play20:01

So you learn just a lot about what it is.

play20:03

You figure out where you want to put the transcript,

play20:05

whether chunking is a valid strategy,

play20:07

all of that you can learn from just talking, right?

play20:11

The next one is talk, but then start changing things, right?

play20:14

Now you start adding steps.

play20:15

Now you start adding structure.

play20:17

You start getting information out.

play20:19

And once you're done with that, you know,

play20:21

the entire thing, and this actually worked,

play20:23

was just this one script, right?

play20:25

Really, I mean, you don't have to read that.

play20:27

It's actually in the repo.

play20:28

It's just this one script, right?

play20:30

And really all it did was just loop, you know,

play20:34

twice over everything,

play20:35

and then break it down into sections

play20:37

and use different models to write different things, right?

play20:39

So there's one model, you know,

play20:41

that's generating the structure.

play20:42

There's another model that's actually doing

play20:43

the long form writing.

play20:46

And then the final one is just breaking it down

play20:48

into smaller and smaller and smaller functions.

play20:50

So if you look in the repo, still not that big, right?

play20:52

But there's a lot more state management.

play20:54

There's a lot more state management.

play20:55

There's a lot of self healing.

play20:56

There's a lot of correction.

play20:57

All of that stuff can go in after,

play20:59

like you've proven the thesis.
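
[Editor's note: a sketch of the shape of the script just described, one model proposing the section structure and another writing each section. Both models are stand-in functions here, not the actual repo code or real API calls.]

```typescript
// Sketch of the two-model pipeline: pass 1 gets an outline from a
// structure model, pass 2 has a writer model expand each section.

type Model = (prompt: string) => string;

function generateDoc(
  transcript: string,
  structureModel: Model,
  writerModel: Model,
): string {
  // Pass 1: section outline, one section name per line.
  const sections = structureModel(transcript).split("\n").filter(Boolean);
  // Pass 2: long-form writing for each section.
  return sections
    .map(
      (s) =>
        `## ${s}\n` +
        writerModel(`Write the "${s}" section of: ${transcript}`),
    )
    .join("\n\n");
}

const doc = generateDoc(
  "kickoff meeting transcript",
  () => "Overview\nDecisions",
  (p) => `(draft based on: ${p.slice(0, 20)}...)`,
);
```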

play21:02

Cool, actually I'm ahead of time.

play21:04

I didn't think I would be.

play21:05

So the final thing, and I will say this,

play21:08

is a lot of people I speak to

play21:10

are still very concerned about cost, right?

play21:12

I don't know how many of you guys watched

play21:13

the NVIDIA keynote that happened like a couple of days ago,

play21:17

but long story short, everything you're using now

play21:21

is gonna get at least 10x, if not 50x cheaper

play21:24

in very short order, right?

play21:26

It's gonna get 10x, if not 50x faster in very short order.

play21:31

So what would you build if you were building

play21:33

for say six months from now, or what would you make

play21:35

if you just presumed that today, right?

play21:38

And it's a different way of working with these things.

play21:40

You know, if something costs 10 bucks,

play21:42

that's a different system than if it costs one cent, right?

play21:45

If something takes an hour,

play21:47

that's different from if it takes six minutes, right?

play21:49

So I would say this is a valid presumption to make, right?

play21:53

When you're building something,

play21:54

is what more can you do if you just presume

play21:56

that about the future, immediate future, right?

play21:58

Because we still haven't even gotten

play22:00

hardware level optimizations,

play22:01

that's what NVIDIA is doing now.

play22:03

You know, that's a 10x.

play22:04

Memory level optimizations, again, still coming up.

play22:07

That's a 10x.

play22:08

Quantization, that's probably another 10x.

play22:10

So all of these things are almost being done now,

play22:13

and they're very comparatively easy engineering-wise.

play22:16

It's just incremental optimization to get there.

play22:19

Okay, that's everything.

play22:21

Feel free to find me after, or just reach out on Twitter.

play22:23

I'm happy to help.

play22:25

Do you have some questions?

play22:30

Oh, sure, yeah.

play22:30

You're asking about long context models, and embedding models.

play22:42

Okay, so long context is tough, right?

play22:45

Because I might say something

play22:46

where I don't know what I'm talking about.

play22:48

That said, this has been my question as well.

play22:50

The problem with context windows is

play22:52

our algorithm for attention is quadratic.

play22:55

What I mean by that is it scales quadratically.

play22:59

To get twice as much context out of something,

play23:01

you've gotta spend four times the amount of memory

play23:03

and compute.

play23:04

We still have that curse.
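
[Editor's note: the "twice the context, four times the memory and compute" point is just cost growing with the square of context length. A one-line check, with the constant factor dropped:]

```typescript
// Quadratic attention as arithmetic: cost ∝ n^2, so doubling the
// context length n quadruples the cost.

function attentionCost(contextTokens: number): number {
  return contextTokens * contextTokens; // constant factor dropped
}

const ratio = attentionCost(8192) / attentionCost(4096);
// Doubling 4096 -> 8192 tokens multiplies the cost by 4.
```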

play23:06

There's no way to, we still don't know

play23:07

a good way to get around it, right?

play23:09

So what that means, effectively,

play23:12

is to get really long context windows, you have to cheat.

play23:15

You effectively have to say,

play23:16

okay, I'm gonna have something,

play23:17

before I run the model, that's gonna kind of figure out

play23:20

which part of the context to actually pay attention to.

play23:22

So you don't actually get the full context window, right?

play23:25

You kind of do, but if you take the full context window,

play23:27

and you're trying to use every single token in it

play23:29

to compute an answer, it's not gonna work.

play23:31

So that is still very much a problem.

play23:34

That could be solved, that's one of those open problems,

play23:35

I think it's an open problem,

play23:37

that could be solved tonight by someone

play23:38

that's working somewhere, or 10 years from now.

play23:40

We just don't know, right?

play23:44

You've mentioned a bunch about transforming the input.

play23:48

How do you go about doing that?

play23:50

Do you use AI to transform the user input?

play23:53

In most cases, yes, you're gonna be using AI

play23:55

to transform it, but there's also just a ton

play23:57

of structured stuff you can do, right?

play23:59

Very easily, like most documents.

play24:00

Let's say I've got a PDF, or I've got,

play24:02

let's say these slides, or I've got one of my documents

play24:05

is in Markdown, there's a ton of structure in there

play24:07

you can just grep for, right?

play24:09

Because I can very quickly figure out

play24:11

what the sections are, I can very easily

play24:12

separate by sentences, that's all stuff

play24:14

that you can do today, right?

play24:16

So even just knowing that that's got 300 sentences in it,

play24:19

that's a transformation of the input that is valuable.

play24:22

Super valuable, right?

play24:23

Because we can make assumptions already, right?

play24:26

If I give you a document that someone's written,

play24:27

I can presume that the title is probably

play24:30

the highest compressed information in there, right?

play24:33

That is a good enough thing.

play24:35

I can presume that the first section

play24:36

will have some sort of intro of what the thing is.

play24:39

Right, those are all transformations.
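
[Editor's note: the grep-level transformations just described, sections, sentence counts, title-as-summary, need no AI at all. The sample document is invented for the example.]

```typescript
// Sketch: cheap, non-AI transformations of a markdown input.

function sections(markdown: string): string[] {
  return markdown
    .split("\n")
    .filter((line) => line.startsWith("#"))
    .map((line) => line.replace(/^#+\s*/, ""));
}

function sentenceCount(text: string): number {
  return text.split(/[.!?]+/).filter((s) => s.trim().length > 0).length;
}

const md =
  "# Crew Change Guide\nPlan early. File visas.\n## Ports\nCheck local rules!";
const outline = sections(md);
const nSentences = sentenceCount(md);
const title = outline[0]; // heuristic: the title is the densest summary
```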

play24:40

But yes, usually you use AI.

play24:43

I had a question.

play24:44

So how do you think about leaving the new,

play24:48

the new feature?

play24:49

Devin?

play24:50

Yeah.

play24:51

I actually haven't used Devin.

play24:53

I just have not had the time,

play24:54

but I've had people tell me that it's good.

play24:56

Look, coding is gonna be where these models

play24:58

make just a massive, massive difference, right?

play25:00

I already use Cursor, which can understand

play25:04

just a massive amount of context and sort of work.

play25:07

It has been six months since I wrote any code

play25:10

that wasn't at least partially AI generated.

play25:13

So it's just gonna keep getting bigger and bigger and bigger.

play25:16

That said, I will say the time that most devs that I know

play25:20

and most companies that I know spend

play25:21

is in business logic, maintenance,

play25:24

and sort of really trying to transform customer input

play25:29

to really massive systems with a ton of legacy code.

play25:32

Like we're a long way away from that, right?

play25:34

What I mean is it's getting easier and easier

play25:37

for you to spin up a more and more

play25:39

and more complex project from scratch, right?

play25:42

But the massive dev work that sort of sits,

play25:45

kind of sits past that, right?

play25:48

That still hasn't been touched.

play25:53

The efforts to do something there,

play25:57

because that's where the money is in a lot of ways,

play25:58

because that's where most enterprises are, right?

play26:00

If you look at SAP or you look at most of these guys,

play26:03

have not so far borne fruit.

play26:05

Like I know most of the companies in that space,

play26:08

they're still having trouble getting it to work

play26:10

with very large code bases, right?

play26:12

Like let's say anything above like a 50 person company

play26:15

that's existed for more than three years.

play26:17

That code base, so far, AI hasn't been able to touch, right?

play26:21

Yeah.

play26:23

Okay, I had a question.

play26:24

So I recently read a, not read,

play26:26

I wouldn't say read the paper, I read the abstract, right?

play26:29

So where it was like, I think from Amazon

play26:32

or from somewhere like, or Netflix perhaps,

play26:34

that getting cosine similarities between embeddings,

play26:38

it's not really a good measure

play26:40

for getting the meaning of things, right?

play26:43

And this is a preface for my question.

play26:46

And also when we do like vector searches

play26:49

and just try to pull relevant information,

play26:52

I don't know, it feels like it doesn't work.

play26:58

I'm trying to figure out like what am I doing wrong?

play27:00

How to do it better?

play27:01

I watched Jerry Liu's talk from LlamaIndex,

play27:05

is it 18 minute talk or something?

play27:07

It's a very nice talk, but it just kind of flew over.

play27:10

So like, what's your recommendation?

play27:12

I think the problem here is embeddings

play27:17

are sort of fuzzy search on steroids.

play27:20

If you're using them for anything more,

play27:23

I think even today you have a problem, right?

play27:25

Because the couple of things,

play27:26

one, these are really tiny models comparatively, right?

play27:30

Big brain, small brain, tiny brain,

play27:32

these are really tiny models.

play27:33

In most cases, they don't have a good understanding

play27:35

of the underlying text.

play27:36

That's why long context embeddings never made sense, right?

play27:39

The longer the context, it just doesn't really make sense.

play27:42

Not to mention in most cases,

play27:44

that's a transformation of the input, right?

play27:45

What Hiby was saying,

play27:46

that's a transformation of the input,

play27:48

is you're transforming it,

play27:49

but you're transforming it into a space

play27:51

where it's a lot harder for you to work with it, right?

play27:53

You're transforming it to a set of numbers.

play27:55

And now the only thing you have is cosine similarity.

play27:57

You can have a bias matrix,

play27:59

you can push that math a little bit more,

play28:01

but because that model is unknown to you,

play28:04

the model's workings are unknown to you,

play28:05

those are forever gonna be a bunch of numbers, right?

play28:10

In some insanely high dimensional space.

play28:12

So there's not a lot to do there, right?

play28:14

What is becoming very possible now

play28:16

that I see a lot of companies switching to

play28:18

is just use the whole brain.

play28:20

Use the LLM, right?

play28:23

Like whatever you're using embeddings for,

play28:25

you can use an LLM, right?

play28:27

It's just more expensive.

play28:30

Right, in most cases, you can use an LLM for that.

play28:32

Like let's say you're using,

play28:33

I'll give you the most brute force example of this.

play28:37

Let's say using embeddings to take 100,000 items

play28:39

and see which ones are similar

play28:41

or which ones closest to your query.

play28:42

You can take an LLM,

play28:44

run it through every single one of those documents

play28:45

and ask, hey, is this close?

play28:47

Is this close?

play28:48

Is this close?

play28:48

And you'll get an answer, right?

play28:49

That is not a good way to do it, do not do it this way.

play28:51

But you see what I mean, right?

play28:53

So they are kind of, you know,

play28:54

you can substitute one for the other just a little bit.

play28:57

I think embeddings have a place, right?

play28:58

But they should always be the last step in your pipeline.

play29:01

You should cut down the search space as much as possible

play29:04

with structured search, transformations,

play29:07

say, BM25, there's a bunch of stuff you can do, right?

play29:11

You should never be searching your search space

play29:13

with embeddings, right?

play29:14

You should always be searching some reduced search space

play29:16

where, you know, hey, last 20 things, you know?

play29:19

And I know these are relevant because keywords,

play29:21

I know these are relevant because location,

play29:23

I know these are relevant because an LLM told me

play29:25

after transformation, whatever.

play29:27

Now I can embed, that's fine, right?

play29:29

But if you embed at the beginning,

play29:31

like in most cases, it just doesn't work at scale.
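
[Editor's note: a sketch of "embed last": narrow the candidate set with a cheap keyword filter first, then rank only the survivors by cosine similarity. The 2-d vectors and documents are toy stand-ins for real embeddings.]

```typescript
// Two-stage retrieval: cheap structured filter first, embeddings
// only on the reduced search space.

interface Doc {
  text: string;
  vec: number[];
}

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function search(docs: Doc[], keyword: string, queryVec: number[]): Doc[] {
  // Stage 1: reduce the search space with a keyword filter.
  const candidates = docs.filter((d) => d.text.includes(keyword));
  // Stage 2: rank only the survivors by embedding similarity.
  return candidates.sort(
    (x, y) => cosine(y.vec, queryVec) - cosine(x.vec, queryVec),
  );
}

const docs: Doc[] = [
  { text: "visa rules for crew", vec: [1, 0] },
  { text: "crew schedule update", vec: [0, 1] },
  { text: "engine maintenance log", vec: [1, 1] },
];
const ranked = search(docs, "crew", [0.9, 0.1]);
```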

play29:34

So it's more like to get the results and then sort it.

play29:37

Is that where embeddings come in?

play29:39

More like to get the results and yes, kind of to sort it,

play29:43

but kind of also to identify like useful parts

play29:45

of those results.

play29:46

Let's say the results you got were pages,

play29:47

but you want sentences, right?

play29:49

You wanna know which part of it is, you know,

play29:51

heat map wise, the most important.

play29:53

You can use embeddings for that, right?

play30:02

All right, thank you so much.

play30:03

Yeah.

play30:05

Thank you.

play30:06

Thank you.
