Code-First LLMOps from prototype to production with GenAI tools | BRK110

Microsoft Developer
23 May 2024 · 39:38

Summary

TLDR: In this video, Cassie and Dan from the Azure AI team walk through a code-first development process for LLM (large language model) applications. They show how developers can use AI to improve customer experiences, and they introduce new tools and features that address the challenges of improving model performance and debugging. The session covers the new Prompty tool, AI Studio integration, evaluation and automation with GitHub Actions, and tracing and debugging in a local development environment with the prompt flow tracing SDK. It also covers evaluation in a CI/CD environment and production monitoring with App Insights, including visualizing evaluation scores.

Takeaways

  • 🌟 Cassie and Dan from the Azure AI team present a talk on taking code-first LLM applications from prototype to production.
  • 🛠️ Based on customer feedback, they built new tools that address the challenges of LLM application development: getting models to perform well, debugging locally and in production, and handling nondeterminism.
  • 🔄 The LLM application development process has three iterative loops: ideation, development, and production, each with a different focus.
  • 🚀 ASOS is highlighted as an example of using AI to improve the customer experience; they developed an AI-powered experience that befriends customers and learns their preferences.
  • 🛠️ New tooling includes AI Studio integration for the Azure Developer CLI, Prompty, a VS Code extension, and GitHub Actions, all of which make AI more approachable for developers.
  • 📝 Prompty is a language-agnostic prompt asset designed to make prompt engineering easy right inside VS Code.
  • 🔧 Prompty files let developers kick off the prototyping loop and quickly see how the AI responds.
  • 🔄 Developers can quickly generate code from Prompty and execute prompts in languages such as C#.
  • 📈 With the Azure Developer CLI, you can use AI Studio templates for infrastructure as code and automation.
  • 🤖 The creative agent example shows AI using web search and vector database search to produce an article.
  • 🔧 When articles get cut off mid-generation, developers can dig into the application code and adjust parameters.
  • 📊 Evaluation metrics include relevance, fluency, coherence, and groundedness; the Prompt Flow Evaluators SDK can score application quality.
  • 👀 Tracing and debugging are extremely helpful for understanding how AI models behave and for quickly identifying and resolving issues.
  • 📊 A monitoring dashboard built on App Insights lets you track LLM application performance in production.
  • 🔄 CI/CD with GitHub Actions automatically tests changes and continuously improves application quality and performance.

Q & A

  • What challenges have customers reported when building code-first language model (LLM) applications?

    - Getting models to perform well, debugging both locally and in production, and the nondeterministic nature of LLMs, which is a new challenge for developers used to true/false behavior.

  • What are the three iterative loops in the language model application development process?

    - Ideation, where you build an initial proof of concept and check the output for a single user input; development, where you build the application out beyond the proof of concept; and production, where you move the application into the real world and use feedback from real users to improve quality.

  • How does ASOS use AI to improve the customer experience?

    - ASOS has developed an AI-powered experience that befriends customers and learns their preferences.

  • What is the Prompty tool and what are its benefits?

    - Prompty is a language-agnostic prompt asset that makes prompt engineering easy: you can iterate on prompts and interact with the language model right inside VS Code.

  • What is the benefit of the AI Studio integration added to the Azure Developer CLI?

    - You can use AI Studio templates to treat your infrastructure as code and automate it.

  • How are GitHub Actions used for evaluation and automation?

    - GitHub Actions are used to evaluate prompts and automate workflows: they test code and infrastructure changes so that you are always testing the latest version of the application.

  • What is the main change AI tooling brings to developers?

    - AI has become another tool in the developer's tool belt; it has moved out of the hands of researchers and data scientists and into the hands of all developers building applications.

  • What is prompt flow tracing and how does it work?

    - Prompt flow tracing records the inputs and outputs of specific functions in your code, making it easier to understand how the application behaves. It is built on OpenTelemetry, so you can import the Azure Monitor trace exporter and send the information to Application Insights (see the sketch below).
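
    The decorator and setup call described above take very little code. The following is a minimal, hedged sketch assuming the promptflow-tracing package; the agent function names are illustrative placeholders, not the template's actual code.

    ```python
    # Minimal tracing sketch, assuming the promptflow-tracing package.
    # The agent function names below are illustrative placeholders.
    from promptflow.tracing import start_trace, trace


    @trace  # records this function's inputs and outputs in the trace view
    def get_research(request: str, instructions: str) -> dict:
        # ... call Bing / the LLM here ...
        return {"request": request, "instructions": instructions, "web_results": []}


    @trace
    def write_article(research: dict) -> str:
        # ... call the writer prompt here ...
        return "draft article about " + research["request"]


    if __name__ == "__main__":
        start_trace()  # starts the local trace collector and prints a link to the local trace viewer
        article = write_article(get_research("latest camping trends", "focus on winter"))
        print(article)
    ```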

  • What is the benefit of sending application trace information to Application Insights?

    - By sending trace information to Application Insights, developers can monitor application performance, troubleshoot issues, and make improvements.

  • What is the Prompt Flow Evaluators SDK and how is it integrated into an application?

    - The Prompt Flow Evaluators SDK scores the results of the generated article. Running evaluation inside the application automatically assesses article quality and provides feedback.

  • What is the benefit of using GitHub Actions in a CI/CD environment?

    - Every time code changes are pushed, infrastructure is provisioned and the application is deployed automatically, so you are always testing the latest version of the application.

  • Should you use Azure Monitor or AI Studio to monitor your application, and why?

    - The choice is up to the developer: Azure Monitor provides advanced monitoring capabilities, while AI Studio has evaluation and tracing built in, along with tools for comparing different runs.

  • What is important for understanding how prompt changes affect the application's evaluation scores?

    - After a prompt change, run a batch evaluation through GitHub Actions and monitor how the application's performance has changed.

  • Why are tracing and debugging important in AI application development?

    - Tracing and debugging are extremely helpful for understanding what is sent to the language model, what comes back, and why the application behaves a particular way.

  • What is the main benefit developers get from adjusting prompt parameters?

    - By adjusting prompt parameters, developers can optimize application performance and improve the quality of the generated content.

  • What is the benefit of creating custom evaluators with the prompt flow SDK?

    - Custom evaluators let developers tailor the evaluation process to their specific business needs (see the sketch below).
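
    For illustration, here is a hedged sketch of a simple custom evaluator. The built-in evaluators are backed by prompty files you can copy and adapt, but a custom evaluator can also be an ordinary Python callable that returns a score; the word-count rule below is a made-up business rule, not something from the session.

    ```python
    # A made-up custom evaluator for illustration: any callable that takes the
    # fields you care about and returns a score can sit alongside the built-in ones.
    class ArticleLengthEvaluator:
        """Scores 5 when the article lands in the requested word range, lower otherwise."""

        def __init__(self, min_words: int = 300, max_words: int = 500):
            self.min_words = min_words
            self.max_words = max_words

        def __call__(self, *, answer: str) -> dict:
            words = len(answer.split())
            if self.min_words <= words <= self.max_words:
                score = 5
            else:
                # drop one point per ~100 words outside the requested range, floor at 1
                distance = min(abs(words - self.min_words), abs(words - self.max_words))
                score = max(1, 5 - distance // 100)
            return {"article_length": score, "word_count": words}


    length_eval = ArticleLengthEvaluator()
    print(length_eval(answer="word " * 420))  # {'article_length': 5, 'word_count': 420}
    ```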

  • What should you pay attention to when adjusting prompt parameters?

    - Make sure the model's maximum token count and the requested article length are set consistently with each other (see the sketch below).
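
    To make the point concrete: roughly 800 English words is well over 1,000 tokens, so a 512-token limit will truncate the article. Below is an illustrative sketch (not the session's template code) using the OpenAI Python client against an Azure OpenAI deployment; the endpoint, deployment name, and limits are placeholders.

    ```python
    # Illustrative sketch: raising max_tokens so an ~800-word article is not truncated.
    # Endpoint, key, and deployment name are placeholders read from the environment.
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )

    response = client.chat.completions.create(
        model="gpt-4",  # your Azure OpenAI deployment name
        messages=[
            {"role": "system", "content": "You are an expert copywriter."},
            {"role": "user", "content": "Write a fun and engaging article between 300 and 500 words about winter camping."},
        ],
        max_tokens=1200,  # 800 English words is roughly 1,000+ tokens, so 512 would cut the article off
    )
    print(response.choices[0].message.content)
    ```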

Outlines

00:00

😀 Introducing AI tools for developers

In this session led by Cassie and Dan, the Azure AI team introduces new tools and features. They focus on why language model (LLM) applications have been hard to build: improving model performance, debugging, and dealing with nondeterminism. The new tooling is intended to simplify the development process for organizations. The process consists of three iterative loops: ideation and an initial proof of concept, the development phase, and deployment to production; the main activities in each phase are described.

05:04

🛠️ Prompt development with the Prompty tool

Prompty is a language-agnostic prompt asset that makes it easy to iterate on and develop prompts inside VS Code. It consists of YAML front matter plus a prompt template and integrates with orchestration frameworks. With a Prompty file, developers can quickly run a prototype and see how the AI responds. The Prompty VS Code extension lets you create new Prompty files and modify and test existing prompts. A minimal code sketch follows.
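
As a rough idea of what building on a Prompty file from code looks like, the sketch below assumes the promptflow Prompty loader (promptflow.core.Prompty); the file name and template inputs are placeholders, and the exact loader API may differ between promptflow releases.

```python
# A rough sketch, assuming the promptflow.core Prompty loader; the file name and
# template inputs are placeholders for whatever your .prompty file defines.
from promptflow.core import Prompty

# Load the prompt asset (YAML front matter + prompt template) from disk.
flow = Prompty.load(source="chat.prompty")

# Call it like a function; keyword arguments fill the template variables.
result = flow(firstName="Cassie", question="Give me more information about the features.")
print(result)
```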

10:05

🌐 Azure Developer CLI and AI Studio integration

AI Studio integration has been added to the Azure Developer CLI, enabling infrastructure as code and automation. This lets you deploy applications quickly from AI Studio templates. A demonstration of evaluation and automation with GitHub Actions is also shown.

15:07

📝 An example of code development with Prompty

The session also walks through concrete code development with Prompty: exporting a Prompty file, iterating on it in VS Code, and converting from Prompty to code. An example implementation of a prompt in Semantic Kernel using C# is also shown.

20:08

🔍 Debugging and tracing the application

The importance of debugging and tracing is emphasized: prompt flow tracing lets you follow each step of the application. Tracing is built on OpenTelemetry and can send data to Application Insights, and the sampling rate is configurable.

25:09

📈 Evaluating and improving with prompt flow

The Prompt Flow Evaluators SDK is used to assess the quality of generated articles. Articles are scored on metrics such as relevance, fluency, coherence, and groundedness, and the scores are used to improve application quality. A sketch of the evaluator calls follows.
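
A hedged sketch of what calling the evaluators can look like, based on the preview promptflow-evals package shown at the time; class and parameter names may differ in later releases, and the endpoint and deployment values are placeholders.

```python
# A hedged sketch of inline evaluation with the preview promptflow-evals SDK;
# class and parameter names follow the preview package and may have changed.
import os

from promptflow.core import AzureOpenAIModelConfiguration
from promptflow.evals.evaluators import GroundednessEvaluator, RelevanceEvaluator

model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_deployment="gpt-4",      # judge-model deployment name (placeholder)
    api_version="2024-02-01",
)

relevance = RelevanceEvaluator(model_config)
groundedness = GroundednessEvaluator(model_config)

# query / context / response are the three buckets the session describes.
query = "Can you find the latest camping trends and what folks are doing in the winter?"
context = "<research results and product information gathered by the agents>"
response = "<the generated article>"

print(relevance(question=query, answer=response, context=context))  # e.g. {'gpt_relevance': 4.0}
print(groundedness(answer=response, context=context))               # e.g. {'gpt_groundedness': 5.0}
```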

30:09

🚀 Automation with CI/CD and GitHub Actions

CI/CD automation with GitHub Actions is explained: pushing code changes automatically provisions the infrastructure and deploys the application, and an evaluation action scores the model's performance so quality can keep improving. A simplified batch-evaluation sketch follows.
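
The template ships its own evaluate action; as a simplified stand-in, a CI job could run a script along these lines, where run_creative_agent and the evaluator object are hypothetical placeholders for the application's orchestrator and the evaluators configured earlier.

```python
# A simplified stand-in for the template's batch-evaluation step: run a set of
# test requests through the app and average an evaluator score. The imports
# below are hypothetical placeholders for your own orchestrator and evaluators.
import json
import statistics

from my_app.orchestrator import run_creative_agent   # hypothetical entry point
from my_app.evaluation import relevance              # hypothetical: a RelevanceEvaluator instance

with open("test_articles.jsonl") as f:
    test_cases = [json.loads(line) for line in f]

scores = []
for case in test_cases:
    result = run_creative_agent(case["request"], case["instructions"])
    scores.append(
        relevance(question=case["request"],
                  answer=result["article"],
                  context=result["context"])["gpt_relevance"]
    )

print(f"Average relevance over {len(scores)} test articles: {statistics.mean(scores):.2f}")
```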

35:09

📊 Monitoring dashboard and production visibility

The importance of monitoring and evaluation in production is emphasized: Application Insights shows average evaluation scores and per-model token usage. From the dashboard you can check each model's performance and optimize cost and performance. A sketch of exporting traces with a sampling rate follows.
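
Because prompt flow tracing is built on OpenTelemetry, production export with a sampling rate can be wired up with standard OpenTelemetry APIs plus the Azure Monitor exporter, roughly as sketched below; the connection string and the 25% ratio are placeholders, and the template's own wiring may differ.

```python
# A minimal sketch of sending traces to Application Insights with a sampling
# rate, using standard OpenTelemetry APIs plus the Azure Monitor exporter.
import os

from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

provider = TracerProvider(sampler=TraceIdRatioBased(0.25))  # export ~25% of traces
provider.add_span_processor(
    BatchSpanProcessor(
        AzureMonitorTraceExporter(
            connection_string=os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
        )
    )
)
trace.set_tracer_provider(provider)
```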

🎉 Recap of new features and what's next

The session closes with a recap of the new features and a look ahead: Prompty, prompt flow tracing and debugging, GenAI monitoring, and azd integration for AI Studio are all available to developers. Links to upcoming sessions and further resources are also provided.


Keywords

💡Azure AI

Azure AI is Microsoft's artificial intelligence platform. In the video, tools built by the Azure AI team are used to build a language model (LLM) application and take it from prototype to production.

💡Prompt Engineering

Prompt engineering is the practice of designing the input (prompt) given to a language model to elicit the best possible output. The video shows how to iterate on prompts locally with the Prompty tool and assess the AI's responses.

💡Prototyping

Prototyping is the early-stage process of validating a language model's capabilities and building a proof of concept. The video shows how prototyping is used to judge whether a language model can make an idea feasible.

💡Development Tools

Development tools are the tools software developers use to build applications. The video introduces tools that help with language model application development, including the Azure Developer CLI, Prompty, and a VS Code extension.

💡AI Studio

AI Studio is Azure's AI development environment. The video covers the integration between AI Studio and the Azure Developer CLI, which lets you manage infrastructure as code and automate deployment.

💡Prompt Flow

Prompt flow is a framework for defining the flow of prompts used when interacting with a language model. The video shows how prompt flow is used to work effectively with language models and to develop and debug applications.

💡Tracing

Tracing is a software development technique for following a running application's behavior to identify problems. The video shows how tracing language model calls and responses helps pinpoint issues quickly and improve application quality.

💡CI/CD

CI/CD stands for continuous integration and continuous delivery, the automated build and deployment process in software development. The video shows a CI/CD pipeline built with GitHub Actions that evaluates application quality and performance.

💡Application Insights

Application Insights is Azure's application performance management (APM) service. The video uses Application Insights to monitor and evaluate the language model application, collecting and analyzing real-time performance data.

💡Prompt Flow Evaluators

Prompt flow evaluators are tools for scoring language model outputs. The video shows how evaluators automatically score an article's relevance, fluency, coherence, and groundedness to improve application quality.

Highlights

Introduction to the session on transitioning language model (LLM) applications from prototype to production using Azure AI tools.

Challenges in building LLM applications, such as model performance, debugging, and nondeterministic behavior.

The importance of new tooling to assist in the development and performance of LLM applications.

The three iterative loops of building LLM applications: ideation, development, and production.

Step-by-step process for developing LLM applications, from identifying business use cases to deploying and monitoring in production.

Case study of ASOS using AI to improve shopping experiences and create an AI-powered customer preference learning system.

Introduction of new tooling like Azure Developer CLI with AI Studio integration, Prompty, and VS Code extensions for prompt engineering.

Explanation of Prompty as a language-agnostic prompt asset for iterative prompt engineering in VS Code.

Demonstration of using Prompty files for quick iteration and prototyping of LLM applications.

Showcasing the Azure Developer CLI for streamlined deployment of applications to Azure with AI Studio support.

Discussion on the integration of AI into traditional development practices, making AI a common tool for developers.

Introduction of the creative agent solution using function calling and AI search for information retrieval.

Deployment of a creative agent application using a single command with the Azure Developer CLI.

Use of GitHub Codespaces for setting up a development environment with all dependencies installed.

Demonstration of the creative writing team application showcasing the interaction between different AI agents.

Analysis of an issue with article cutoff and exploration of potential solutions by adjusting LLM parameters.

Integration of prompt flow tracing for better observability and debugging of LLM applications.

Introduction of the Prompt Flow Evaluators SDK for evaluating the quality of LLM outputs with scores.

Customization of evaluation metrics and prompts to tailor to specific company needs.

Use of GitHub Actions for CI/CD to automate testing and deployment of LLM applications.

Monitoring and evaluation of LLM applications in production using Azure Application Insights.

Closing remarks highlighting the transformative impact of the new tools and methodologies on generative AI application development.

Transcripts

play00:13

Oh, they're already up. OK, Good.

play00:16

Good, good, good.

play00:17

So is it just one that ohh, there we go.

play00:20

Alright, welcome everyone, welcome to Code-First LLMOps from

play00:24

prototype to production with GenAI tools. I'm Cassie Breviu

play00:28

and I'm a Senior Technical Program Manager on the Azure

play00:31

AI team.

play00:32

And hi everyone, I'm Dan Taylor. I'm a Principal Product

play00:35

Architect working on our code first experiences in the Azure

play00:37

AI platform.

play00:38

So we have a lot of cool stuff to show

play00:41

you today. And we've been hearing from our customers that

play00:44

LM applications are difficult to build. There's a lot of

play00:47

new challenges

play00:48

that we're faced with, like how to get models to

play00:50

perform well debugging both locally and in production, and then

play00:54

also the fact that they're nondeterministic when we're used to

play00:57

true and false. And there's there's different ways that we

play01:00

can mitigate and work with models in order to get

play01:02

them to perform well.

play01:04

So we decided that we needed to make new and

play01:07

updated tooling to help make it real for organizations.

play01:12

So if you think about the process of building LM

play01:14

applications, it works a little bit differently. And we talk

play01:17

about these three different iterative loops. There's a lot more

play01:20

experimentation that's happening as you're developing the applications. First, you

play01:24

might start ideating, building an initial proof of concept. You

play01:28

just start with some LM calls with some hard coded

play01:30

information and some prompts just to see what the LM

play01:33

can do, to see if your ideas are possible, feasible.

play01:35

And then you might just start with a single user

play01:38

input just to see if you can get the output

play01:40

you're looking for. And then when you move

play01:42

want to building up the application beyond the proof of

play01:45

concept into the development phase, you'll start making sure that

play01:48

it works well across a wider range of inputs, that

play01:50

the quality is good, that it's ready for users to

play01:52

test it. And then as you move the application in

play01:55

production, that's when you put it out in the real

play01:57

world. And you want to use the inputs from the

play01:59

real world to understand how your application is performing and

play02:02

bring that data back into your development environment to keep

play02:05

improving that quality. So that's sort of like that life

play02:08

cycle. And we'll be walking through a little bit of

play02:10

these three phases today in our talk.

play02:12

And I love these high level slides, but I want

play02:14

to know what are the steps that I actually take

play02:16

when I'm doing this? So if we kind of dig

play02:18

into that, we can kind of take a look at

play02:20

the step by step process. So we identify a business

play02:23

use case or a task, Then we go and play

play02:25

around with different models, see which ones going to solve

play02:28

our problem best. We'll start testing it, iterating and evaluating

play02:31

until we think we have something that is good enough

play02:33

for production. Then we'll go ahead and deploy that and

play02:36

monitor that in production.

play02:40

So a great example of a customer that's creating amazing

play02:43

experiences with AI is ASOS. So they've been improving shoppers

play02:46

experiences with innovative AI. They've developed an AI powered experience

play02:50

that befriends customers to learn their preferences.

play02:54

So conceptually, this all makes sense, the high level, the

play02:56

phases that we go through. But as a developer, I

play02:59

really want to know like, what are the tools actually

play03:01

look like? What does the code look like? Could you

play03:03

could you help us dive in, Cassie?

play03:07

There we go. And this is where a lot of

play03:09

us are at in this journey is where do I

play03:12

start? How do I actually build these things? And that's

play03:15

where the tooling that we've been working on comes in.

play03:19

So in another interesting thing about the industry that's happening

play03:23

is that traditional development in AI development and, and AI

play03:27

personas are really kind of coming together into kind of

play03:30

a single thing. And really what happens is that AI

play03:33

just becomes another tool in your developer belt. It's not

play03:37

no longer in the research and in the data scientist

play03:40

hands. It's now in all of our hands to start

play03:43

building. And the tooling should really represent that change. And

play03:46

so we're going to be introducing new tooling and updated

play03:50

tooling to support this journey. One of the existing tools

play03:53

is the Azure Developer CLI, but we have added AI

play03:56

Studio integration into it. So you can start using AI

play03:59

Studio templates to do infrastructure as code and automation. We

play04:02

have Prompty, which is a brand new tool which you

play04:05

may have seen in the keynote. We're going to show

play04:08

you some more of it today

play04:10

and we'll talk about that. We have a VS Code extension

play04:13

for Prompty that allows us to iterate really nicely in

play04:16

VS Code to start working on our prompts. And then

play04:18

we all know and love GitHub Actions, and we're gonna

play04:21

show how we can start using those for evaluation and

play04:24

automation as well.

play04:27

So introducing Prompty, in my opinion, the cutest new product

play04:31

that we've been talking about it build right. Yeah. So

play04:35

Prompty is a language-agnostic prompt asset. So it's to

play04:39

help you start iterating and playing with prompts and kind

play04:43

of doing that prompt engineering piece right in VS Code

play04:47

and make it easier to get started.

play04:51

If we take a look at what Prompty looks like,

play04:53

if you look on the right, this is a prompty

play04:56

file. The top part is the YAML, which gives instructions

play04:59

to the orchestration framework. It's integrated with prompt flow, semantic

play05:03

kernel, and LangChain. And so they're going to take

play05:06

that,

play05:07

create the necessary parameters, which you can also override in

play05:10

the frameworks. And then the bottom part is you're actually

play05:13

prompt template, which is gonna grab the variables that you

play05:16

send in and send that to your LM.

play05:18

So how this works is it's actually makes it super

play05:20

easy to start because you can run this individually in

play05:22

VS Code, but then you also can start building on

play05:24

it with the different orchestration frameworks.

play05:27

So this I understand, it's a single file that I

play05:30

can just check into my code and I can start

play05:32

iterating on it and start that prototyping loop where I

play05:35

can just start running and executing the prompt template and

play05:38

get a sense for how the AI responds without needing

play05:40

to, you know, install a bunch of stuff and and

play05:43

figure out how to call all these different SDK's and

play05:46

things like.

play05:46

That exactly. So let's show how we can get started

play05:49

with Prompty.

play05:53

OK, so there's actually two different ways we can get

play05:55

started. We can start in the chat playground and AI

play05:58

studio, and we can play with different model deployments and

play06:01

we can start iterating here. And this is sometimes where

play06:03

a lot of people kind of fall off a Cliff

play06:05

of development. You they're like, OK, I kind of have

play06:07

this model working. I have this prompt that I've been

play06:10

working with in my system message, but how do I

play06:12

go to code? So now you can simply click export

play06:14

and go to a prompty file. That will open up

play06:16

the file and you can start iterating in VS Code.

play06:19

Another way that you can start and so I've already

play06:21

installed my VS Code extension. Here is I can right

play06:24

click and create new prompty. This is going to create

play06:27

a default prompty to help me get started. As you

play06:30

can see, it has the YAML. At the top I

play06:32

need to add my endpoint and deployment, and at the

play06:34

bottom we have the template.

play06:37

So I'm going to go ahead and paste in my

play06:39

endpoint and deployment and then I can just go ahead

play06:42

and hit play. So now I'm running a prompty file.

play06:45

It's going to use my deployed endpoint and it's going

play06:48

to give me a result from my LM.

play06:51

So there you go.

play06:52

Now let's say I want to start updating this. Here

play06:55

is the sample inputs that I'm using.

play06:57

I'm going to change the name to Cassie, and I'm

play07:00

going to tell it that I'm going to change the

play07:02

question to

play07:03

give me more information about the features.

play07:08

Now I can go ahead and hit play again, and

play07:10

now I'm doing iterative prompt engineering in a local playground.

play07:14

Right in VS Code.

play07:17

There you go. We were able to see that the

play07:19

updates happened.

play07:21

Now how do I go to the next step? How

play07:23

do I start creating code? I can right click on

play07:25

the prompty and you can see we have a prompt

play07:28

flow, semantic kernel and lane chain. So I can right

play07:31

click create prompt flow code and you can see it's

play07:34

a very small amount of code to actually start using

play07:36

this prompty file.

play07:38

I can right click and use C# as I love

play07:41

C# and look at how simple this is in semantic

play07:44

kernel. This is all I need for my prompt to

play07:46

start executing and semantic kernel.

play07:51

And then of course we have LangChain. So this

play07:54

is a really quick way to start getting creating your

play07:58

prompts, adding them, building out your application, and really get

play08:02

started with building LLMs.

play08:13

So a tool that I mentioned earlier too is the

play08:15

Azure Developer CLI. And if you're not familiar with that,

play08:19

it's a really amazing tool that really streamlines deploying applications

play08:23

to Azure. So we have new AI Studio support, as

play08:25

I mentioned, but we also have been working on samples.

play08:28

So if you go to this link, you'll find a

play08:30

list of samples that are using Prompty and azd that

play08:33

you can deploy with that single command

play08:35

of azd up.

play08:37

It has, um, popular AI solutions. So we have things

play08:39

like summarization, RAG, and then we also have the one

play08:42

that we're going to be looking at today, which is

play08:45

the creative agent. So you may have heard a little

play08:47

bit about agentic programming or multi agent programming. We're going

play08:51

to be looking at a solution that we're going to

play08:54

build that we're creating with Prompty and prompt flow.

play08:57

So the way that the researcher works is it's

play09:00

using function calling to use the Bing API to go

play09:02

get the results from the web. And then it's going

play09:04

to use AI search in a vector database in order

play09:06

to get the appropriate information. And then it's going to

play09:09

give it to the writer agent. The writer is going

play09:11

to take all that information. It's going to create an

play09:14

article that we're going to return to the user so

play09:16

that they can start marketing their their products. And then

play09:19

lastly, the editor takes a look and says yes, this

play09:21

is good or no, it's not. Try again.

play09:24

So this is pretty cool, Cassie, but I notice there's

play09:27

a lot of infrastructure here I need to set up.

play09:29

You know, an AKS cluster, managed identity, front end,

play09:33

back end, AI Search, Azure OpenAI Service. This would

play09:36

take me like a week to do if I was

play09:38

clicking around in the Azure portal and wiring everything up

play09:42

for myself.

play09:42

Right. So we're actually going to spend the rest of this session

play09:45

just creating those resources. So you gonna go check out

play09:47

the

play09:48

Just kidding.

play09:50

We're here for a week, everyone. Yeah.

play09:53

OK, let's see how we can make this better. Let's

play09:55

go back to how we can build this.

play10:00

OK, so as I was talking about, we had the

play10:02

collections, and here you can see this is the one

play10:05

that we're looking at. You can go use these right

play10:07

now. This assistance API one was in a couple sessions

play10:10

already. You may have seen the Contoso chat one, which

play10:13

is our example. We have some examples with link chain

play10:16

and then also some process automation. And of course we

play10:19

have C# and Python

play10:22

so the next thing we want to do is take

play10:25

a look at the templates so we can do

play10:29

azd template list. And I just want to know what

play10:32

are all the templates in the azd gallery that include

play10:35

AI?

play10:36

So now I can take a look at the different

play10:38

ones that are available because there's obviously more than just

play10:41

the ones I just showed you. And here's the Creative

play10:43

Agent 1. So that's the one that we want to

play10:45

deploy. I've already downloaded the source here. And I'm just

play10:48

going to go ahead and hit azd up,

play10:55

select my region

play10:58

and now it's going to go ahead and package and

play11:00

deploy all those application are all those resources for me.

play11:03

But this takes a little while so.

play11:05

So I'm gonna, I've actually already deployed this application ran

play11:08

azd up earlier so I could have my own development

play11:11

environment to work in. Cassie's got her own copy of

play11:14

the development environment. And let's go ahead and dive into

play11:17

the application and see what it looks like.

play11:22

So one really cool thing about these templates is they

play11:25

also come with a dev container definition, which allows you

play11:28

to run them in GitHub Code Spaces. So if you

play11:30

don't have a development environment set up, you can just

play11:32

open it and GitHub Code Spaces, all your development dependencies

play11:36

are installed and ready to go. And you can just

play11:38

start running azd commands right away. So I've run azd

play11:41

up here and it's deployed that application infrastructure. And before

play11:44

I take a look at the application, I just want

play11:46

to call out a really cool thing that I love

play11:48

about code spaces. Sometimes it comes up with

play11:51

really cool names. So this code space is my crispy

play11:53

space yodel. And so whenever I find it cool code

play11:56

space name, I just sort of hang on to it

play11:58

and keep using that one because I come very attached

play12:01

to them. So

play12:04

let's take a look at what got deployed to Azure

play12:06

by running an azd up command. You can see that

play12:09

we've got this resource group with Application Insights, AKS, our

play12:13

Azure OpenAI Service, our Azure AI Search. We've got

play12:16

a Bing search result and all this stuff is all

play12:18

wired up together and it works out-of-the-box. So I can

play12:21

go to this deployed application. We can see this is

play12:24

our creative writing team here where we can give it

play12:27

some context and some instructions for the article that we

play12:30

wanted to write. And when you run the application, you

play12:33

can see how the different agents

play12:35

pass the information between them. We can, we've got the

play12:38

researcher that runs and gets research data that passes to

play12:42

the writer. Then the writer outputs to the editor and

play12:45

the editor decides whether to accept the article or not.

play12:48

And then finally we can see the output article. So

play12:50

we've got this article here written about camping because everyone

play12:54

knows that we love camping here at Microsoft in Seattle.

play12:57

And the article overall looks pretty cool. It's selling some

play13:00

of our our products a little bit, but one thing

play13:03

we notice is that the article is actually cut off

play13:06

at the end.

play13:06

Yeah. And this is happening frequently. Yeah, it's been happening

play13:09

there. It's constantly just stopping mid sentence, which is not

play13:12

ideal.

play13:12

Yeah. So maybe we can take a look at how

play13:14

do we fix that. And so I'm gonna start to

play13:16

dive into the application code so you can see how

play13:18

this application actually works.

play13:22

Alright, so back to my crispy space yodel.

play13:26

So I'm gonna run the orchestrator that that's behind the

play13:30

web application that you saw. So I'll just call the

play13:33

API agents orchestrator module and that will run the orchestrator

play13:38

with some sample inputs and it will print out to

play13:41

the console as I'm running it. So while it's running,

play13:44

I'll take a look at just some of the code

play13:47

for you. So this is the write article function. And

play13:51

you can see that it calls the different agents

play13:54

here. So it calls the research agent, gets the

play13:57

product agent and it and it sort of gets the

play14:00

intermediate results and passes those between the agents. One cool

play14:04

thing about this is each agent actually uses a prompty

play14:07

file for its implementation. So we've taken that prompt flow

play14:11

code and we've structured our code base around it and

play14:14

built this application up.

play14:17

And another thing is that, you know, as a developer,

play14:20

I'm used to just printing stuff out to the console

play14:22

and using the console outputs, but it's kind of hard

play14:25

to read. So we actually got prompt flow tracing integrated

play14:28

into this. So how does that work? Well, one thing

play14:30

I'm doing here is I've decorated different methods in my

play14:33

code with this at trace method. And So what that

play14:35

does is we'll capture the inputs and outputs of specific

play14:38

functions that I'm calling that I want to see in

play14:40

my tracing tool, which I'll show you in a minute.

play14:44

And how do we turn on that tracing? Well, we

play14:46

simply import the prompt flow tracing SDK and we can

play14:49

just call start Trace and our local development environment to

play14:52

get a local tracing tool, which I'll show you in

play14:55

a moment. The other really cool thing is that prompt

play14:58

flow tracing is built on top of open telemetry. And

play15:01

so that means that you can import the Azure Monitor

play15:04

trace exporter and send the all the tracing information off

play15:07

to application Insights. And another thing that we've set up

play15:10

in this application is you can configure your sampling rate

play15:13

for

play15:14

about how much of the trace data you want to

play15:16

send. Do you want to send 100% of the request

play15:19

to Application Insights? Or you can lower that down depending

play15:22

on you know what makes sense for your production application,

play15:25

how much data you want to store and send off.

play15:28

So that's really not a lot of code at all

play15:31

to start adding really robust tracing into our solution.

play15:34

Yeah, yeah, this is 35 lines. And I even like,

play15:36

you know, check some environment variables along the way. So

play15:39

another really cool thing I wanted to mention, that's part

play15:41

of prompt flow. We've got this OpenAI injector, so

play15:43

this will trace the OpenAI,

play15:45

the calls so we can get the prompts and responses.

play15:48

Awesome. So let's go take a look at what that

play15:51

tracing looks like. It looks like our orchestrator has finished

play15:55

running. And actually if you look at the very top

play15:58

of the output, it put a little URL that we

play16:01

can click and run locally to view the tracing tool.

play16:06

And interestingly,

play16:08

the VS Code output decided not to let me click

play16:10

that, but that's OK. We'll switch over to the tracing

play16:13

view that we have open here in the browser. So

play16:15

this this tracing view captures all that data that's running

play16:18

locally. It makes it really easy to understand what's happening

play16:21

in the application. You can see the different parts of

play16:24

the agents that have instrumented to see the inputs and

play16:27

the outputs. And so I want to walk through kind

play16:29

of each stage, each step of this application, so you

play16:32

can get a little bit more of an understanding of

play16:34

how does this application actually work.

play16:37

So the first step here is that we're going to

play16:40

get the research. So the research takes those inputs from

play16:43

the user from that web page. So in this case,

play16:46

it asks can you find the latest camping trends and

play16:48

what folks are doing in the winter? And it's got

play16:51

some instructions for

play16:54

you know what to what to look up. And we

play16:56

noticed that it returns some Bing web search results, right?

play17:00

So how do we go from these inputs to Bing

play17:02

web search results? Well, if we go through the prompty

play17:06

execution down to the actual LLM call that was made,

play17:09

the tracing tool shows us the actual prompt and response

play17:12

that was sent to the LM.

play17:14

So we can see here that it says you're an

play17:17

expert researcher that helps put information together for a writer

play17:21

who's putting together an article. And you can see that,

play17:24

you know, it provides some instructions to the researcher. But

play17:28

then what happens at the end is that the researcher

play17:31

just outputs, hey, call this find information function with this

play17:34

query of good places to visit. So this is actually

play17:37

a tool that we've given the researcher. One of many

play17:40

tools that we've given the researcher that it can choose

play17:44

to execute in the LM is figuring out which tool

play17:46

to call

play17:47

and what arguments to pass it. A really cool thing

play17:50

here is that the prompt flow tracing shows us all

play17:53

the tools that were given to the LLM call. So

play17:55

that find information call is the one that was that

play17:58

it chose to use. And you can see here we

play18:01

give it a description of what the function is and

play18:04

what the different parameters are that it can call. We

play18:07

can see that it could have also chosen to use

play18:09

the find entities call, which uses the Bing Entities API

play18:13

to look up information about different entities. And it could

play18:16

also find news, so it could look up news

play18:19

about a particular topic. So this is all in those

play18:21

instructions. It depends on what the user asks for. If

play18:24

I said could you show me the latest news about

play18:26

camping, it would have decided to call this find news

play18:28

article.

play18:31

So that's using tools to retrieve information by calling functions

play18:36

in our code. The next thing that we do is

play18:39

we actually do a vector search over an Azure AI

play18:42

search index. So we can see this is the get

play18:45

products function that's called in our code after the get

play18:49

research function is called. And we can see here that

play18:53

it takes the users input and

play18:57

it generates, it uses an Ada two model to generate

play19:00

a vector representation of that user's question. So here's that

play19:05

vector representation down here. And then we can see we

play19:09

run the search against an AI search index and we

play19:12

get these articles back, which are all articles that are

play19:16

relevant to camping, which is product information that's relevant to

play19:21

camping from our product database.

play19:24

So it's important to point out like we are using

play19:26

Bing and we're using AI search as our information retrieval.

play19:29

But you can do information retrieval for any type of

play19:32

data source. It doesn't have to be a vector database.

play19:35

Like maybe you're just going to pull information from SQL

play19:37

or maybe you have a different endpoint that you have

play19:40

information that you want to pull from. So ours is

play19:42

specific to our use case and you can use this

play19:45

this style application a lot of different ways, but this

play19:47

is not the only way you can get information into

play19:50

your LM.

play19:50

Exactly. And so you may need to just call

play19:54

like an API in your application to get information about

play19:57

the customer or all sorts of different things. There's lots

play20:01

of different information that you can feed to these LLMs

play20:05

in order to generate a prompt. So that brings us

play20:08

to the writer. So if we go to the writer,

play20:11

let's take a look at the prompt that we actually

play20:14

pass to the writer. And we can see the way

play20:16

it works is that we tell that you're an expert

play20:19

copywriter whose job it is to take research and generate

play20:23

an article.

play20:24

You can see all the product information that we retrieved

play20:27

is included here in the prompts that we send to

play20:30

the writer. You can see that the web search results

play20:33

are also included in the information that we send to

play20:36

the writer, as well as a few examples that show

play20:38

the writer. Here's what you do with, you know, sample

play20:41

product information. You know, here's an example that you might

play20:45

get, and here's how you might translate that into the

play20:48

article and so on. And then finally it says, hey,

play20:51

write a fun and engaging article between 800 and 1000

play20:54

words

play20:54

about the topic. And that's the, this is the initial

play20:57

request that the user passed in. And so then we

play21:00

can finally see here's the actual article that was generated

play21:03

as a result. And yeah, look, it's cut off again.

play21:06

You know, it's very durable as it.

play21:10

So what are the LLM parameters that we're sending in?

play21:13

You know, that's a good question. So we can see

play21:15

not only the prompts but what were the parameters. And

play21:18

I'm looking here and it says that the Max tokens

play21:21

are 512 tokens, but we asked it for

play21:23

800.

play21:24

Words that's less.

play21:25

That's that seems a little.

play21:26

Bit less, that's probably that could be a problem.

play21:28

Yeah. So that's that's something that maybe we'll try and

play21:31

tweak that later and fix that. I'll get to that

play21:34

in a moment, but let's just kind of finish going

play21:36

through the trace here.

play21:39

Just, you know, finally we see that the editor gets

play21:41

to decide, you know, what to do and and it

play21:43

decides not to, you know, give some feedback, but decides

play21:45

the article is fine and we'll let that go back

play21:48

to the user.

play21:48

The other thing that's really cool is because we're using

play21:51

tracing, we were able to really easily see what we're

play21:54

sending to our prompt. So we actually able to see

play21:57

the prompt and see what the words are, but we're

play21:59

also able to see the parameters and we're easy to

play22:02

really dig in and figure out what's wrong with it.

play22:04

So we had this amazing observability, and before this

play22:07

it was really hard to debug LLMs. So this is

play22:09

a great way to really start understanding what they're sending,

play22:12

how they're generating it, and figure out what's happening when

play22:15

you're having issues.

play22:16

Yeah, and it and it really speeds up that that

play22:19

iteration loop as you're experimenting with different things because you

play22:22

can just figure out what exactly is happening and what

play22:25

you might want to try next. So another thing that

play22:28

we have integrated into this application is that we're actually

play22:31

running prompt flow evaluators inline the application on the result

play22:35

of the output article. And so you can see that

play22:37

the way this works at a high level

play22:39

is that we collect all that information that was generated

play22:43

as the orchestrator was running. And we put that into

play22:47

a set of fields, query, context and response, where query

play22:50

is the request from the user. The context is information

play22:54

that was generated by the system that we're going to

play22:57

use to ground the the LM's response. And then finally,

play23:01

the response is the is the output from the LM.

play23:04

So we passed these two through our Prompt Flow Evaluators

play23:07

SDK and that gives us some evaluation scores.

play23:10

Here you can see we're not doing super well on

play23:13

on some of these scores.

play23:14

Yeah. And a cut-off article might be that. So

play23:17

if you're not familiar with what the evaluation metrics are,

play23:21

relevance is, is the answer relevant to the question that

play23:24

the user asked. Fluency is: does it flow,

play23:27

does it make sense. Coherence is also: does it kind of

play23:30

make sense and it's working and making sense in answering

play23:34

the questions. And then groundedness is, is it truthful? Is

play23:38

it actually giving information based on the products

play23:41

that are coming in and grounded in that truth and

play23:44

not giving it out hallucinations or making up information that

play23:47

is not true?

play23:48

Yeah. So you can see, you know, we're definitely using

play23:51

the information about products in the article. So it seems

play23:54

pretty grounded, but the article is cut off. It's a

play23:57

little bit salesy, you know, it's a little bit salesy.

play24:00

So I don't know how relevant it is to what

play24:02

I was actually asking it. So, but these metrics can

play24:05

help you get a sense of the quality of your

play24:07

application without having to manually go in and inspect every

play24:10

single input and response

play24:12

and then configure help you figure out where you might

play24:15

want to focus your debugging efforts. So I'd love to

play24:18

actually take a moment to show you how you actually

play24:20

can integrate the Prompt Flow Evaluators SDK into your application

play24:24

to generate some metrics like these.

play24:28

So if we continue on through the end of the

play24:30

orchestrator, you can see there's an if statement here that

play24:34

says if evaluate, go ahead and evaluate the article in

play24:37

the background. And we pass all the contexts that was

play24:41

generated as part of running the application to this evaluate

play24:44

article in the background function that I have to find

play24:48

here. And we can just go to definition in code

play24:50

and see what that looks like.

play24:59

So the cool thing about the Evaluation SDK too is

play25:01

we have all these built-in evaluators where you can just

play25:04

grab them and use them. But if you sometimes you

play25:06

need to create custom ones as well, which is also

play25:08

supported, which you can create a prompted custom evaluator and

play25:11

just import that and do that. So maybe you want

play25:14

to edit it and make it more specific to your

play25:16

use case rather than using the out-of-the-box ones. Or you

play25:19

want to add additional types of of evaluators so you

play25:21

have the flexibility to use the built in ones or

play25:23

create whichever ones you might need.

play25:27

Yeah. So it seems like VS Code is decided not

play25:30

to make Go to definition work in this particular moment,

play25:33

but that's fine. But so the evaluate article and background

play25:37

function here, just that it generates that collects the data

play25:41

and organizes it into those 3 buckets I talked about

play25:44

earlier. And then ultimately here, what we do is we

play25:48

import the Prompt Flow Evals SDK and we import those

play25:51

built-in evaluators so we can instantiate those in code using

play25:55

our model configuration,

play25:56

which is just a pointer to the Azure Open AI

play25:59

endpoint and model we want to use. And then we

play26:01

just call each evaluator with the data that we wanted

play26:04

to evaluate, and it returns back a score from 1

play26:07

to 5. So this is really easy for you to

play26:09

integrate into your application. You can use evaluators in many

play26:13

different ways and it really depends on how you want

play26:16

to use that data to iterate and improve on your

play26:18

application.

play26:21

Let's see if Go to definition works. It'll take me

play26:25

to the. OK, So

play26:27

that what's really cool about Python is that this is

play26:29

all just Python scripts that are running and you can

play26:32

actually go to definition on these evaluators and you can

play26:35

see

play26:36

what the prompts are used. So these are implemented using

play26:40

prompty files that we were showing earlier. And so you

play26:43

can actually take those prompty files, copy them into your

play26:46

code, and you can make your own custom evaluator starting

play26:49

with the ones that we've defined as Microsoft. And actually

play26:53

the tracing view will show us one of those prompts.

play26:56

So here, for example, this is the prompt that we

play26:58

use for relevance. And you can see that, you know,

play27:01

relevance that it gives the definition of the metric, but

play27:04

you can see some of the examples that it uses

play27:07

are kind of generic. So one of the things I

play27:09

might want to do if I'm, you know, trying to

play27:11

tailor evaluation to my, my company is I might want

play27:14

to give it some examples that are more relevant to

play27:17

the type of information that my company might be evaluating.

play27:20

And I can make my own customized scores that I

play27:22

can use to help me gauge quality.

play27:25

Which is a really hard problem to solve and this

play27:27

is making it so much easier.

play27:28

Yeah, totally. So let's go ahead and look at maybe

play27:32

fixing that bug.

play27:33

That we saw earlier.

play27:38

OK, So what I did here was I just made

play27:41

a couple of changes to my code. And the

play27:43

first thing I did was suggest that it make the article

play27:47

between 300 and 500 words. And then I also increased

play27:50

the Max token setting to 1200, right? So these are

play27:53

parameters that we can experiment with, right? So we can

play27:57

run a lot of evaluation to see, you know, which

play28:00

parameters to really dial these in. And that's where that

play28:03

kind of experimentation comes in. You have to play

play28:06

with the parameters you're passing in and see how the

play28:09

application performs. So it's a little bit of a different

play28:11

style of development when you're building these generative AI applications.

play28:16

So I'm gonna go ahead and check in these changes

play28:19

and push it off to my CICD environment where I've

play28:23

got. The template has GitHub Actions that you can configure

play28:26

to run on your Git repo. So when you push

play28:29

changes, it will go ahead and run some actions. And

play28:34

with the GitHub Actions extension in VS Code, I can

play28:37

go ahead and see what actions are running and and

play28:39

look at my history of actions. So there's two actions

play28:42

that are kicked off. One, it actually runs the azd

play28:44

provision and deploy command that both Cassie and I ran.

play28:47

So now we've got Cassie's environment, we've got my environment

play28:50

and we've got an integration environment that all is

play28:53

the same setup. And so we can eliminate sort of

play28:56

all those environmental differences when we're trying to troubleshoot and

play28:59

debug things, when we're chasing down bugs.

play29:02

So I can go ahead and look at what's happening

play29:05

right now when I'm running this GitHub action.

play29:08

And you can see here, it's just pulling down the

play29:11

AZD CLI and it's going to provision infrastructure. And this

play29:14

is using Terraform, by the way. It's really advanced stuff.

play29:18

I love it. And then it's going to go ahead

play29:20

and deploy the application so that if we make any

play29:23

changes to our infrastructure, any changes to our application code,

play29:26

the GitHub action will make sure we're testing against the

play29:30

latest version of everything.

play29:32

Umm, and then the other action that we have is

play29:35

we have our evaluate action. So this will actually, instead

play29:38

of, you know, evaluating inline in the application, the evaluate

play29:42

action will run a set of test articles through our

play29:44

code and give us the scores, the evaluation scores that

play29:47

we can use to check and see, you know, how

play29:50

well is this code doing? This is going to take

play29:52

just like another minute to run here.

play29:56

This is cool because now we're able to batch information

play29:58

in and batch tests. So we can have as many

play30:00

different tests or items as we want as we start

play30:03

batching and evaluating. And then we'll get those averages to

play30:06

understand how the model is performing. Yeah, and the cool

play30:09

thing is because we're using LM to evaluate, we don't

play30:11

have to add ground truth. We just need the inputs

play30:14

and then this will figure out the outputs. And then

play30:16

we use the LM to figure out if it's performing

play30:18

well or not.

play30:19

Yeah. And so in another minute, when that completes, you

play30:22

can see that here. It just shows here the different

play30:25

articles that we ran through our CI CD system. So

play30:28

I've got a few different articles requests that I'm using

play30:31

for testing and the updated scores that I get after

play30:33

I make that change. It looks like we're getting closer

play30:36

to all fives across the board.

play30:38

Awesome. Yeah, it looks like that helped.

play30:42

Uhm, I think that's all I wanted to show, but

play30:45

maybe I'll try one thing and just reloading my window

play30:48

here in VS Code to see if I can show

play30:50

you the prompting for the evaluator. Now that I'm done

play30:54

with my demo, I'm willing to try this out.

play30:58

And this is a pro tip if you're a VS Code

play31:00

developer.

play31:03

Ohh. And one more fun thing, my CI CD environment

play31:06

is my super duper potato. So I think that's pretty

play31:09

chill.

play31:11

Alright,

play31:14

alright, so go to Definition works. That's really awesome. So

play31:18

you can see it uses the relevance prompt file. So

play31:20

this just uses prompt flow code just like we just

play31:23

showed you. This is really easy for you to copy

play31:25

and implement yourself. And if we dig into the site

play31:28

packages, we can actually see the prompty file that's used.

play31:31

And we can just copy and paste this, create our

play31:34

own prompty file, create our own evaluator. Really easy to

play31:37

build off of and customize.

play31:38

And that's a good way to do it is to

play31:40

grab the ones that are built in and then start

play31:42

iterating on those to customize them to your solution.

play31:44

Yeah. Alright. So we've checked this change into our CI

play31:48

CD environment and I think we're ready to ship it

play31:50

off to production. But Cassie, why don't you tell us

play31:54

what's going on in production?

play31:55

Cool.

play31:58

So here, just like we were using App Insights locally

play32:01

to start pushing all that information. So everything you saw

play32:05

locally, we now pushed up to our cloud, which is

play32:08

running on our production endpoint. And we've set up a

play32:11

dashboard to give us a nice view of what's happening

play32:14

with our LM to allow us to monitor the production

play32:17

deployment. So if you take a look first, we have

play32:20

the average evaluation scores over time and you can see

play32:24

that it started performing lower. So this is the one

play32:27

without the fix yet.

play32:28

By the way, his is still deploying. So these are

play32:31

information on what was out there, and this could be

play32:33

an indicator that maybe there's an issue and we need

play32:36

to go take a look. Another metric we have here

play32:38

is the average time per LLM call. I think

play32:41

this one is super interesting because this GPT 4 model

play32:44

is our writer and this is our evaluator. And if

play32:46

you think about the way that these work, the GPT

play32:49

4 model had a ton of tokens coming out. So

play32:51

it's taking longer because every time it creates a token

play32:54

is actually an inference with the model.

play32:56

That's writing the article.

play32:57

It's writing the article, right,

play32:58

where with the evaluators, it's only ever giving one token.

play33:01

It's a number between 1 and 5. And so that's

play33:04

another interesting way to start seeing where you're spending and

play33:07

how much time, I'm sorry, where how much time things

play33:09

are are taking. We also have tokens used over time,

play33:12

so we can see if that how that's trending and

play33:14

make decisions based on that. And then we have tokens

play33:17

used by model. So this one's really useful for cost,

play33:20

right? If I have a ton of tokens being used,

play33:22

maybe I need to rethink about what I'm doing to

play33:24

maybe mitigate cost. And so if you look at this

play33:27

one, the evaluator is using a ton of tokens.

play33:29

And if you remember, we're evaluating every single call

play33:33

that comes in and that's four LLM calls, LLM call

play33:36

for each of the evaluation.

play33:37

And the prompts for those are actually pretty.

play33:39

Sizable. That too, right?

play33:40

So it's a lot of tokens being used for evaluation.

play33:43

So maybe we want to think about doing a sampling

play33:45

of the inputs coming in for the evaluators and reduce

play33:48

cost there?

play33:49

Yeah. So I just have that sampling parameter that I

play33:51

can set my code, I can dial that down. I

play33:53

can really decrease the cost used for evaluation, but still

play33:56

get the same information, still understand how my evaluation is

play33:59

doing

play33:59

over time, and this visibility really helps us make some of

play34:02

those small tweaks to really bring down the

play34:04

cost of our application.

play34:06

Exactly. So another cool thing with this is we can

play34:09

see each individual call in the operations view here from

play34:12

the query. So we can see all these different parameters

play34:16

are the different scores. And then we can grab this

play34:19

operation ID and we can do a transaction search. And

play34:22

now we have the exact same tracing that we had

play34:25

locally in our production endpoint. So if we see an

play34:28

issue, we can go in and actually dig into exactly

play34:31

what was happening

play34:32

in production and get detailed information at every step. And

play34:35

you can see the token information in here as well.

play34:39

So we can see this was before

play34:41

we made that update.

play34:43

Yeah, so what's really cool about this, by the way?

play34:46

I know exactly what's happening. I just walked you through

play34:49

that tracing view locally. I get that same view. So,

play34:51

you know, if if you have something with a low

play34:53

evaluation score in production, you can query against that in

play34:56

App Insights. And then you can drill into exactly what's

play34:59

happening. And you know, in each of these things, we're

play35:02

capturing the inputs and the outputs. And so I can

play35:04

figure out which component of my application I want to

play35:07

focus on improving. And then I can take the data

play35:09

from the application Insights, copy that into my development environment,

play35:12

maybe write a little unit test

play35:14

and start iterating until I get a better result and

play35:16

then check that change in and back to production again.

play35:19

Also, you may have remembered that I kicked off an

play35:22

azd up when we started this talk. If you take

play35:25

a look, it took about 14 minutes to deploy this

play35:28

entire application and now I can go browse and I

play35:30

have a whole new instance set up.

play35:33

Wow, that's a lot of AKS clusters.

play35:36

Being told to get to work and we can see

play35:38

that our agent is now deployed and working

play35:44

alright.

play35:50

This could have been a four week talk.

play35:54

Alright, so just like a little bit of a recap

play35:56

because we went over a lot of really cool new

play35:59

things. We have the new template scenarios. So these, the

play36:02

one that you saw today is available at that link.

play36:04

You can go use it right now as well as

play36:07

other scenarios. They are all prompty-enabled and have really

play36:10

cool cutting edge things that you can start using. So

play36:13

definitely go check that out.

play36:16

We also saw how we can start doing GitHub actions

play36:18

to do batch evaluations on every check in. So we

play36:21

can see how our performance is changing as we're iterating

play36:23

on our prompt. And since we have a prompty, we

play36:26

also have that in our source. So we're actually being

play36:28

able to look at what what happened with the change

play36:31

in our prompt and then see that information in the

play36:33

evaluations.

play36:36

Then we saw the amazing tracing and debugging that is

play36:39

just so useful when you're trying to figure out what's

play36:41

happening. What are you sending to your LM, what's coming

play36:44

back? Why are things performing a particular way? The tracing

play36:47

and debugging makes a huge difference and starting to make

play36:50

these real both when you're creating and then also in

play36:52

production for monitoring.

play36:57

So you saw the dashboard where we were able to

play37:00

see the monitoring and production. And then we have these

play37:03

new announcements, which we've talked about. So we have prompty

play37:07

we had the tracing and debugging. We have the monitoring

play37:10

for GenAI and then we have the azd integration

play37:13

for AI Studio. Another thing is so we showed you

play37:16

monitoring and evaluation in App Insights and using AKS.

play37:19

You can also do all of this in AI Studio

play37:22

and it has evaluations and tracing built in there. So

play37:25

if you're deploying with AI Studio,

play37:27

you have all the same tooling there so you

play37:29

can choose the way that you want to deploy for

play37:31

the type of solution that you're creating?

play37:33

And you can log actually all of the evaluation results

play37:36

and tracing results to AI Studio. So you can have

play37:39

it there. And it's got some really good tools that

play37:41

you can use to drill in and compare different runs.

play37:44

So I think this tooling is all really game

play37:47

changing, is changing how I'm approaching my development of generative

play37:51

AI applications. And I'm really excited for all of you

play37:54

to try it as well. Yeah. So thanks everyone and

play37:57

yeah, happy LLMOps-ing.

play38:05

Be sure to check out the Prompty session tomorrow. Yes,

play38:08

yeah, we're going

play38:09

to go. The session is going to be awesome. Yeah, so

play38:10

we just showed you a little bit of Prompty. There's

play38:12

more prompty.

play38:13

And For more information on more practical tips for building

play38:17

copilots, be sure to check out Jeff and Sandra's

play38:19

session tomorrow morning on building copilots: key lessons and

play38:23

best practices. And then there's more about Advanced RAG with

play38:27

Azure AI Search as well. So enjoy the rest of

play38:29

the conference, everyone.

play38:31

All types of good AI.


Related tags
Azure AI, development tools, AI applications, prototyping, debugging, nondeterminism, model performance, prompt engineering, code-first, AI Studio, GitHub Actions