New HYBRID AI Model Just SHOCKED The Open-Source World - JAMBA 1.5

AI Revolution
24 Aug 2024 · 08:15

Summary

TLDR: AI21 Labs has released two new open-source large language models, Jamba 1.5 Mini and Jamba 1.5 Large. They use a unique hybrid SSM-Transformer architecture that combines the classic Transformer with a structured state space model (SSM) to process long sequences more efficiently. On complex tasks such as long-document summarization or multi-turn conversation, the models deliver more accurate, meaningful responses at lower cost. They perform strongly on the RULER benchmark, are fast, have a low memory footprint, support multiple languages, and offer developer-friendly features such as structured JSON output and citation generation, making them well suited to enterprise applications.

Takeaways

  • 🌟 AI21 Labs has released two new open-source large language models, Jamba 1.5 Mini and Jamba 1.5 Large, built on a unique hybrid architecture.
  • 🔧 The models use an SSM-Transformer architecture that combines the classic Transformer with a structured state space model (SSM) to improve long-sequence handling.
  • 🚀 Jamba models can work with much longer context windows, a major advantage for complex tasks that require lots of context.
  • 🏆 They perform strongly on the RULER benchmark, outscoring well-known models such as Llama 3.1 70B and Llama 3.1 405B.
  • 🔑 The key Mamba component has a lower memory footprint and a more efficient attention mechanism, letting the models handle long context windows with ease.
  • 📈 AI21 Labs developed a new quantization technique, ExpertsInt8, which saves memory and compute by reducing the numerical precision used in the model's calculations.
  • 🌐 The models support multiple languages, including Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, making them suitable for global applications.
  • 🛠️ They are developer-friendly, with support for structured JSON output, function calling, and citation generation, well suited to sophisticated enterprise AI applications.
  • 📊 Jamba 1.5 Large can run on a single 8-GPU node while using its full 256k context length, showing strong resource efficiency.
  • 📈 The Jamba 1.5 models are up to 2.5x faster than competitors on long contexts, ideal for latency-sensitive enterprise applications.
  • 📘 The models are available on AI21 Studio, Google Cloud, Microsoft Azure, NVIDIA NIM, and other platforms, making them easy to deploy and experiment with.

Q & A

  • What is AI21 Labs?

    -AI21 Labs is a company focused on developing advanced AI language models. It recently released two new open-source large language models, Jamba 1.5 Mini and Jamba 1.5 Large.

  • What are the main features of the Jamba 1.5 Mini and Jamba 1.5 Large models?

    -Both models use a unique hybrid architecture that combines recent techniques to boost AI performance; in particular, they process long text data far more efficiently.

  • What is the SSM-Transformer architecture?

    -The SSM-Transformer is a new hybrid architecture that combines the classic Transformer model with a structured state space model (SSM) to process long sequences more efficiently.

  • Why does handling a long context window matter for AI models?

    -The ability to process long context windows is crucial for real-world applications, especially enterprise use cases that need large amounts of contextual information to produce accurate, meaningful responses.

  • How do the Jamba models improve long-context handling?

    -The Jamba models use the Mamba component, which has a lower memory footprint and a more efficient attention mechanism, letting them handle longer context windows with ease.

  • What is the RULER benchmark?

    -RULER is a benchmark that evaluates models on tasks such as multi-hop tracing, retrieval, aggregation, and question answering; AI21 Labs used it to test the Jamba models.

  • How did Jamba 1.5 Mini and Jamba 1.5 Large perform on the RULER benchmark?

    -On the RULER benchmark, Jamba 1.5 Mini and Jamba 1.5 Large consistently outperformed other models such as Llama 3.1 70B, Llama 3.1 405B, and Mistral Large 2.

  • What is the speed advantage of the Jamba 1.5 models?

    -The Jamba 1.5 models are up to 2.5x faster than competitors on long contexts, which makes them very practical for enterprise applications, whether running customer support chatbots or AI-powered virtual assistants.

  • What is the ExpertsInt8 quantization technique developed by AI21 Labs?

    -ExpertsInt8 is a new quantization technique that quantizes model weights to an 8-bit precision format and dequantizes them directly on the GPU at runtime, shrinking the model and speeding up processing.

  • Which languages do the Jamba 1.5 models support?

    -Besides English, the Jamba 1.5 models support Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, making them well suited to global applications.

  • What developer-friendly features do the Jamba 1.5 models offer?

    -Both Jamba 1.5 Mini and Large have built-in support for structured JSON output, function calling, and even citation generation, which lets developers build more sophisticated AI applications.

  • What is AI21 Labs' commitment to keeping the Jamba 1.5 models open?

    -AI21 Labs has committed to keeping the Jamba 1.5 models open: they are released under the Jamba Open Model License, which lets developers, researchers, and businesses experiment with them freely.

Outlines

00:00

🤖 AI21 Labs releases new open-source language models

AI21 Labs has launched two new open-source language models, Jamba 1.5 Mini and Jamba 1.5 Large. The models use a unique hybrid architecture that combines cutting-edge techniques to boost AI performance. You can try them yourself on platforms like Hugging Face, or run them on cloud services such as Google Cloud Vertex AI, Microsoft Azure, and NVIDIA NIM. Through the SSM-Transformer architecture, the Jamba models combine the classic Transformer with a structured state space model (SSM) built on older, more efficient neural-network techniques, which lets them process much longer sequences of data. That suits tasks requiring lots of context, such as complex reasoning or long-document summarization. This long-context capability matters for enterprise applications because it yields more accurate, meaningful responses, reduces repetitive data processing, improves quality, and lowers cost.

05:02

🚀 The Jamba models' hybrid architecture and its efficiency

Within the Jamba models' hybrid architecture, the Mamba component, built on insights from researchers at Carnegie Mellon and Princeton, has a lower memory footprint and a more efficient attention mechanism, letting it handle longer context windows with ease. Unlike a traditional Transformer, Mamba maintains a small state that is updated as data is processed, making it faster and less resource-hungry. AI21 Labs also developed a new quantization technique, ExpertsInt8, which quantizes the weights of the model's mixture-of-experts layers to an 8-bit precision format and dequantizes them directly on the GPU at runtime, shrinking the model and speeding up processing. Jamba 1.5 Large can use its full 256k context length on a single 8-GPU node, making it one of the most resource-efficient models available. The models also support multiple languages, including Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, making them a versatile choice for global applications. Jamba 1.5 additionally provides built-in support for structured JSON output, function calling, and citation generation, well suited to sophisticated AI applications in enterprise settings. AI21 Labs has committed to keeping the models open source, allowing developers, researchers, and businesses to experiment freely, with broad deployment options across multiple platforms and cloud partners.

Keywords

💡AI21 Labs

AI21 Labs is the company behind the Jamba 1.5 Mini and Jamba 1.5 Large language models discussed in the video. The models use a unique hybrid architecture designed to improve AI performance. AI21 Labs is committed to open source, so developers and researchers can use and test the models on multiple platforms.

💡Jamba 1.5 Mini and Jamba 1.5 Large

These are the two differently sized open-source language models released by AI21 Labs. They are designed to handle much longer sequences of data without hitting the performance bottlenecks that traditional Transformer models face with large context windows.

💡Hybrid architecture

The hybrid architecture is a core feature of the Jamba models: it combines the classic Transformer with a structured state space model (SSM). This lets the models process long sequences more effectively and improves performance on complex tasks.

💡Transformer architecture

The Transformer is the deep-learning architecture behind most widely used language models and excels at natural language processing tasks. However, it struggles with very large context windows, which is the problem the Jamba models set out to solve.

💡Structured state space model (SSM)

An SSM draws on older, more efficient techniques, such as recurrent and convolutional neural networks, to handle computation more cheaply. In the Jamba models, the SSM is combined with the Transformer to improve long-sequence processing.
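
To make the idea concrete, here is a minimal toy state-space recurrence in Python. The matrices, sizes, and decay value are illustrative assumptions, not Jamba's actual parameters; the point is that the per-token state stays a fixed size no matter how long the sequence gets.

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Run h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t over a sequence."""
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:
        h = A @ h + B @ x      # constant-size state update per token
        ys.append(C @ h)       # readout from the state
    return np.array(ys)

rng = np.random.default_rng(0)
state_dim, in_dim = 8, 4
A = np.eye(state_dim) * 0.9                   # stable decaying dynamics
B = rng.normal(size=(state_dim, in_dim)) * 0.1
C = rng.normal(size=(1, state_dim))

xs = rng.normal(size=(1000, in_dim))          # a 1000-token "sequence"
ys = ssm_scan(A, B, C, xs)
print(ys.shape)  # per-step memory is O(state_dim), not O(sequence length)
```

A full-attention layer would instead touch all previous tokens at every step, which is exactly the cost the hybrid design avoids.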

💡Long context window

A long context window refers to the large amount of surrounding text a model must consider when processing input. In the video, long context windows are critical for complex tasks like document summarization or meeting-transcript analysis, areas where the Jamba models' architecture gives them an edge.

💡Mamba component

Mamba is a key component of the Jamba models. By lowering the memory footprint and optimizing the attention mechanism, it lets the model handle long context windows more efficiently. Its design builds on work by researchers at Carnegie Mellon and Princeton.
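
A quick back-of-the-envelope calculation shows why this matters. The sketch below contrasts the KV-cache memory a full-attention Transformer needs at 256k tokens with a fixed-size recurrent state; the layer count, head count, and dimensions are illustrative assumptions, not Jamba's published configuration.

```python
# Why a full-attention KV cache gets expensive at long context,
# while an SSM-style state stays constant.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_val=2):
    # Keys + values, per layer, per token (fp16 = 2 bytes per value).
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_val

full_attention = kv_cache_bytes(tokens=256_000, layers=32, kv_heads=8, head_dim=128)
ssm_state = kv_cache_bytes(tokens=1, layers=32, kv_heads=8, head_dim=128)  # fixed size

print(f"256k-token KV cache: {full_attention / 2**30:.1f} GiB")
print(f"fixed-size state:    {ssm_state / 2**20:.3f} MiB")
```

With these toy numbers the cache grows linearly to tens of GiB while the recurrent state stays in the kilobyte range, which is the intuition behind "lower memory footprint".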

💡RULER benchmark

RULER is a benchmark that evaluates models on tasks such as multi-hop tracing, retrieval, aggregation, and question answering. The Jamba models outperformed other well-known models on it.

💡Quantization (ExpertsInt8)

Quantization reduces the numerical precision used in a model's computations to save memory and compute. ExpertsInt8 is a quantization technique developed by AI21 Labs that quantizes the weights of the model's mixture-of-experts layers to 8-bit precision, shrinking the model and speeding up processing.
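
As a rough illustration of the idea (a generic symmetric round trip, not AI21's actual ExpertsInt8 kernel), the sketch below quantizes a weight matrix to INT8 and dequantizes it again, showing the 4x storage saving over float32 and the bounded reconstruction error.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map the max magnitude to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)  # stand-in weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("stored size ratio:", q.nbytes / w.nbytes)   # int8 vs float32 storage
print("max abs error:", np.abs(w - w_hat).max())   # bounded by scale / 2
```

The real technique applies this only to the MoE expert weights (the bulk of the parameters) and performs the dequantization inside the GPU kernel at runtime, so the memory saving comes with little extra latency.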

💡Multilingual support

Beyond English, the Jamba models support Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, making them highly adaptable for global applications.

💡Developer-friendly

The Jamba models are designed with developers in mind, offering built-in support for structured JSON output, function calling, and citation generation. This lets developers build more sophisticated AI applications that call external tools, digest structured documents, and provide reliable references.
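
On the consuming side, structured JSON output means application code can parse and validate a reply instead of scraping free text. The sketch below uses a hard-coded stand-in string for a model reply; the field names (`answer`, `citations`) are made up for illustration, not a documented Jamba schema.

```python
import json

def parse_reply(reply, required=("answer", "citations")):
    """Parse a structured-JSON model reply and check required fields."""
    data = json.loads(reply)  # raises ValueError if the reply isn't valid JSON
    missing = [k for k in required if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data

# Stand-in for what a JSON-mode model call would return.
reply = '{"answer": "Jamba supports a 256k context window.", "citations": [1]}'
data = parse_reply(reply)
print(data["answer"])
```

Validating replies this way is what makes features like tool calling and citation generation dependable in an enterprise pipeline: malformed output fails loudly instead of silently corrupting downstream steps.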

Highlights

AI21 Labs has released two new open-source large language models, Jamba 1.5 Mini and Jamba 1.5 Large.

The models use a unique hybrid architecture that incorporates cutting-edge techniques to improve AI performance.

Jamba models can be run on Hugging Face or on cloud services such as Google Cloud Vertex AI, Microsoft Azure, and NVIDIA NIM.

They use an SSM-Transformer architecture that combines the classic Transformer with a structured state space model (SSM).

The SSM builds on more efficient techniques, such as recurrent and convolutional neural networks, to improve processing efficiency.

Jamba models can handle much longer sequences of data without slowing down.

The ability to handle long context windows is critical for enterprise generative AI applications.

By keeping more relevant information in memory, the Jamba models reduce the need for repetitive data processing.

The Jamba models performed strongly on the RULER benchmark, outscoring other large AI models.

Jamba 1.5 Mini and Large are up to 2.5x faster than competitors on long-context tasks.

The Mamba component has a lower memory footprint and a more efficient attention mechanism.

AI21 Labs developed a new quantization technique, ExpertsInt8, to shrink the models and speed up processing.

Jamba 1.5 Large can run on a single 8-GPU node while using its full 256k context length.

The models support multiple languages, including Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew.

They include built-in support for structured JSON output, function calling, and citation generation.

They are available on multiple platforms and through several cloud partners, making them easy for developers and researchers to use.

AI21 Labs is committed to keeping the Jamba models open source, allowing free experimentation and deployment.

The hybrid architecture makes the Jamba 1.5 models more efficient, faster, and more versatile for complex, data-heavy tasks.

Transcripts

00:02

So AI21 Labs, the brains behind the Jurassic language models, has just dropped two brand-new open-source LLMs called Jamba 1.5 Mini and Jamba 1.5 Large, and these models are designed with a unique hybrid architecture that incorporates cutting-edge techniques to enhance AI performance. And since they're open source, you can try them out yourself on platforms like Hugging Face, or run them on cloud services like Google Cloud Vertex AI, Microsoft Azure, and NVIDIA NIM. Definitely worth checking out.

00:32

All right, so what's this hybrid architecture all about? Okay, let's break it down in simple terms. Most of the language models you know, like the ones used in ChatGPT, are based on the Transformer architecture. These models are awesome for a lot of tasks, but they've got this one big limitation: they struggle when it comes to handling really large context windows. Think about when you're trying to process a super long document or a full transcript from a long meeting. Regular Transformers get kind of bogged down because they have to deal with all that data at once, and that's where these new Jamba models from AI21 Labs come into play with a totally new, game-changing approach. So AI21 has cooked up this new hybrid architecture they're calling the SSM-Transformer. Now, what's cool about this is it combines the classic Transformer model with something called a structured state space model, or SSM. The SSM is built on some older, more efficient techniques, like recurrent and convolutional neural networks; basically, these are better at handling computations efficiently. So by using this mix, the Jamba models can handle much longer sequences of data without slowing down. That's a massive win for tasks that need a lot of context, like if you're doing some complex generative AI reasoning or trying to summarize a super long document.

01:52

Now, why is handling a long context window such a big deal? Well, think about it: when you're using AI for real-world applications, especially in businesses, you're often dealing with complex tasks. Maybe you're analyzing long meeting transcripts, summarizing a giant policy document, or even running a chatbot that needs to remember a lot of past conversations. The ability to process large amounts of context efficiently means these models can give you more accurate and meaningful responses. Or Dagan, the VP of Product at AI21 Labs, actually nailed it when he said an AI model that can effectively handle long context is crucial for many enterprise generative AI applications. And he's right: without this ability, AI models often tend to hallucinate, or just make stuff up, because they're missing out on important information. But with the Jamba models and their unique architecture, they can keep more relevant info in memory, leading to way better outputs and less need for repetitive data processing. And you know what that means: better quality and lower cost.

02:55

All right, let's get into the nuts and bolts of what makes this hybrid architecture so efficient. So there's one part of the model called Mamba, which is actually very important. It's developed with insights from researchers at Carnegie Mellon and Princeton, and it has a much lower memory footprint and a more efficient attention mechanism than your typical Transformer. This means it can handle longer context windows with ease. Unlike Transformers, which have to look at the entire context every single time, slowing things down, Mamba keeps a smaller state that gets updated as it processes the data. This makes it way faster and less resource-intensive.

03:30

Now, you might be wondering, how do these models actually perform? Well, AI21 Labs didn't just hype them up; they put them to the test, evaluating the models on the RULER benchmark, which covers tasks like multi-hop tracing, retrieval, aggregation, and question answering. And guess what? The Jamba models came out on top, consistently outperforming other models like Llama 3.1 70B, Llama 3.1 405B, and Mistral Large 2. On the Arena Hard benchmark, which is all about testing models on really tough tasks, Jamba 1.5 Mini and Large outperformed some of the biggest names in AI: Jamba 1.5 Mini scored an impressive 46.1, beating models like Mixtral 8x22B and Command R+, while Jamba 1.5 Large scored a whopping 65.4, outshining even the big guns like Llama 3.1 70B and 405B.

04:26

One of the standout features of these models is their speed. In enterprise applications, speed is everything. Whether you're running a customer support chatbot or an AI-powered virtual assistant, the model needs to respond quickly and efficiently. The Jamba 1.5 models are reportedly up to 2.5 times faster on long context than their competitors, so not only are they powerful, but they're also super practical for high-scale operations. And it's not just about speed: the Mamba component in these models allows them to operate with a lower memory footprint, meaning they're not as demanding on hardware. For example, Jamba 1.5 Mini can handle context lengths up to 140,000 tokens on a single GPU. That's huge for developers looking to deploy these models without needing a massive infrastructure.

05:12

All right, here's where it gets even cooler. To make these massive models more efficient, AI21 Labs developed a new quantization technique called ExpertsInt8. Now, I know that might sound a bit technical, but here's the gist of it: quantization is basically a way to reduce the precision of the numbers used in the model's computations, which can save on memory and computational costs without really sacrificing quality. ExpertsInt8 is special because it specifically targets the weights in the mixture-of-experts, or MoE, layers of the model; these layers account for about 85% of the model's weights in many cases. By quantizing these weights to an 8-bit precision format and then dequantizing them directly inside the GPU during runtime, AI21 Labs managed to cut down the model size and speed up its processing. The result: Jamba 1.5 Large can fit on a single 8-GPU node while still using its full context length of 256k. This makes Jamba one of the most resource-efficient models out there, especially if you're working with limited hardware.

06:16

Now, besides English, these models also support multiple languages, including Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew, which makes them super versatile for global applications. And here's the cherry on top: AI21 Labs made these models developer-friendly. Both Jamba 1.5 Mini and Large come with built-in support for structured JSON output, function calling, and even citation generation. This means you can use them to create more sophisticated AI applications that can perform tasks like calling external tools, digesting structured documents, and providing reliable references, all of which are super useful in enterprise settings.

06:55

One of the coolest things about Jamba 1.5 is AI21 Labs' commitment to keeping these models open. They're released under the Jamba Open Model License, which means developers, researchers, and businesses can experiment with them freely. And with availability on multiple platforms and cloud partners like AI21 Studio, Google Cloud, Microsoft Azure, NVIDIA NIM, and soon Amazon Bedrock, Databricks Marketplace, and more, you've got tons of options for how you want to deploy and experiment with these models.

07:28

Looking ahead, it's pretty clear that AI models that can handle extensive context windows are going to be a big deal in the future of AI. As Or Dagan from AI21 Labs pointed out, these models are just better suited for the complex, data-heavy tasks that are becoming more common in enterprise settings. They're efficient, fast, and versatile, making them a fantastic choice for developers and businesses looking to push the boundaries in AI. So if you haven't checked out Jamba 1.5 Mini or Large yet, now's the perfect time to dive in and see what these models can do for you.

07:59

All right, if you found this video helpful, smash that like button, hit subscribe, and stay tuned for more updates on the latest in AI tech. Thanks for watching, and I'll catch you in the next one.
