[ML News] Jamba, CMD-R+, and other new models (yes, I know this is like a week behind 🙃)

Yannic Kilcher
13 Apr 2024 · 27:31

Summary

TLDR: In this video we look at some exciting new AI models released over the past two weeks. AI21 Labs introduced Jamba, a hybrid model that combines the Mamba architecture with attention layers to achieve long-context understanding without a significant increase in memory requirements. Databricks released a large language model with over 100 billion parameters that performs strongly on natural language, programming, and math. Cohere's Command R+ is a more performant model optimized for command following and retrieval-augmented generation. Google Research showed VideoPoet, a zero-shot text-to-video generation model, and Google DeepMind presented MagicLens, a model that retrieves images using natural language. There is also research on synthetic data generation, and Salesforce AI Research released the Moirai time-series foundation model, aimed at universal forecasting. Finally, H2O AI's H2O-Danube2, a 1.8-billion-parameter foundation language model, performs well across several benchmarks. These releases not only show the field's progress but also hint at the breadth of applications to come.

Takeaways

  • 🚀 AI21 Labs released the Jamba model, a hybrid that combines the Mamba architecture with attention layers, enabling long-context inference without increasing memory requirements.
  • 📚 Databricks released the new DBRX model, a large language model with over 100 billion parameters that excels not only at natural-language understanding but also at programming and math.
  • 🔍 Cohere released Command R+, a model optimized for command following and retrieval-augmented generation that supports 10 different languages; commercial use requires payment.
  • 📈 Mistral released a new 7B model, a base model used to train their new instruct model, with a 32,000-token context window, meant to be fine-tuned with instructions.
  • 🎥 Google Research's VideoPoet is a zero-shot text-to-video generation model that can handle all sorts of mixed text-and-video tasks.
  • 🔎 Google DeepMind's MagicLens is an image-retrieval model driven by natural language, trained largely on synthetic data.
  • 🧪 The Cosmopedia project is an open reproduction of the Phi models, trained on synthetic data and intended to demonstrate the importance of high-quality training data.
  • 📊 Another Google DeepMind paper studies long-form factuality in large language models, developing a verification method and model to improve factuality evaluation.
  • 🤖 MyVLM, a research paper by Snap and Tel Aviv University, explores how to personalize vision-language models so they can understand and reason about user-specific concepts.
  • ⏱️ Nvidia's Latte3D is a text-to-3D model that can quickly generate high-quality textured meshes, pushing text-to-3D research forward.
  • 🌐 Meta's SceneScript model aims to understand the physical space around you to support augmented reality, and is trained on synthetic data.

Q & A

  • How does the Jamba model combine the Mamba architecture with attention layers?

    -Jamba is a hybrid model that mixes the Mamba architecture with a few attention layers. Mamba is an evolution of state-space models, similar to a recurrent neural network but able to do everything at once. Through this combination, Jamba achieves very long-context inference performance without a significant increase in memory requirements.

  • In which areas does the DBRX model excel?

    -DBRX performs well not only on natural-language understanding but also on programming and math. It is a large model with over 100 billion parameters and uses a mixture-of-experts architecture: 132 billion parameters in total, of which only 36 billion are active on any given input.

  • What are the main features of the Command R+ model?

    -Command R+ is a more performant, state-of-the-art model optimized for command following and retrieval-augmented generation. It supports 10 different languages and is trained to produce citations and to use tools. The model is not open source; it is provided as open weights, allowing personal use, but commercial use requires paying Cohere.

  • How does the VideoPoet model work?

    -VideoPoet is a zero-shot text-to-video generation model that handles not only video generation but any mixture of text and video. It is trained on a multitude of different objectives, enabling tasks such as text-to-video, image-to-video, and stylization.

  • What does the MagicLens model do?

    -MagicLens is an image-retrieval model that takes open-ended instructions: users search with natural language, and the model finds the matching content. It was developed largely with synthetic data generation, involving steps such as web scraping, grouping, cleaning, and metadata expansion.

  • What is the goal of the Cosmopedia project?

    -Cosmopedia attempts to recreate the training process of something like Microsoft's Phi models using synthetic data. The Phi models were trained largely on textbook-style data; Cosmopedia's goal is to use high-quality synthetic data to reduce the amount of training data needed, so that a model does not have to be huge to perform well.

  • Why does LLM-based fact verification matter?

    -It may mark the beginning of the era of LLM-based fact verification. Using LLM agents as evaluators, the authors developed SAFE, a search-augmented factuality evaluation method; in the cases where it disagrees with crowdsourced annotators, the automated system turns out to be correct 76% of the time.

  • What is the purpose of the MyVLM model?

    -MyVLM personalizes vision-language models. It tries to teach a vision-language model concepts such as "me", "my dog", "my house", and "my friend", so that the model can answer questions and receive instructions from a first-person perspective.

  • What is the main innovation of the Latte3D model?

    -Latte3D can generate high-quality textured meshes from text very quickly. By combining 3D priors, amortized optimization, and a second stage of surface rendering, it robustly generates these meshes in just 400 milliseconds.

  • What characterizes the Dolphin 2.8 model?

    -Dolphin 2.8 is a fine-tuned Mistral 7B model. It is uncensored, meaning the dataset has been filtered to remove bias and alignment samples. This makes the model more compliant, and you are advised to implement your own alignment layer before exposing the model as a service.

  • How does the Qwen1.5-MoE model perform?

    -Qwen1.5-MoE matches the performance of a 7-billion-parameter model with only one third of the activated parameters, showing that mixture-of-experts models can maintain performance while reducing the number of active parameters.

Outlines

00:00

🚀 AI21 Labs' Jamba model

This section introduces AI21 Labs' Jamba model, a hybrid that combines the Mamba architecture with attention layers. Billed as a groundbreaking SSM (state-space model) Transformer, Jamba achieves long-context inference performance without increasing memory requirements. The model is publicly available under the Apache 2 license and performs well on key benchmarks. The section also covers Databricks' DBRX model, a large model with over 100 billion parameters that excels not only at natural-language understanding but also at programming and math.

05:01

📈 New model and dataset releases

This part covers several new releases. First, Cohere's Command R+ model, a more performant command- and retrieval-augmented-generation-optimized model trained in 10 different languages. Then Mistral's 7B model, a base model used to train a new instruct model. Next, Google Research's VideoPoet, a zero-shot text-to-video generation model, and MagicLens, a model for retrieving images using natural language. Finally, the Cosmopedia project, which trains models on synthetic data, and Google DeepMind's research on long-form factuality in large language models.

10:03

🧩 New progress in model personalization and 3D synthesis

This section discusses the latest progress in model personalization and 3D synthesis. First, the MyVLM paper from Snap and Tel Aviv University, which aims to teach vision-language models to understand personal concepts such as "me", "my dog", "my house", and "my friend". Then Nvidia's Latte3D paper, a new text-to-3D synthesis method that quickly generates high-quality textured meshes. Also Meta's SceneScript model, which aims to understand physical spaces in support of augmented reality, and Sakana AI's research on evolving foundation models, covering progress in model merging and neural architecture search.

15:05

🔍 Salesforce AI Research's Moirai model and other new models

This section introduces Moirai, a time-series foundation model for universal forecasting released by Salesforce AI Research. It then mentions the leaked information around Q*, and H2O AI's release of H2O-Danube2, a 1.8-billion-parameter foundation language model trained on many tokens that performs well on leaderboards. Also covered: Materiality Labs' Garment3DGen model, which generates garments to improve the shopping experience in virtual reality, and Nexa AI's Octopus V2, a 2-billion-parameter model optimized for Android API tool use and function calling.

20:07

📚 Uncensored models and OCR dataset releases

This section discusses the Dolphin model, an uncensored fine-tune of Mistral 7B with bias and alignment samples removed from its dataset. It then covers Nous Research's reproduction of the 1-bit (or 1.58-bit) LLM, and JetMoE, an 8-billion-parameter model trained for under $100,000 with performance comparable to Llama 2. Also mentioned: Qwen1.5-MoE, which reaches 7B-model performance with fewer activated parameters; the LMSYS Chatbot Arena leaderboard and the Berkeley function-calling leaderboard; Lamini's photographic memory evaluation suite; and OCR datasets on Hugging Face.

25:09

🏆 Model evaluation and leaderboard updates

This part focuses on model evaluation and leaderboard updates. First, Starling-7B's showing on the leaderboard: despite its small parameter count, it holds up well against larger models. Then Claude 3's performance on the Berkeley function-calling leaderboard. It also mentions TREC (Text REtrieval Conference) track evaluations, and Lamini's photographic memory evaluation suite, a collection of benchmarks focused on exact-match tasks. Finally, Clément Delangue announced the release of two of the largest open OCR datasets, aimed at improving OCR technology.


Keywords

💡Jamba

Jamba is a hybrid model developed by AI21 Labs that combines the Mamba architecture with a few attention layers. It achieves very long-context performance and inference without a significant increase in memory requirements. Jamba performs well on key benchmarks and is openly available under the Apache 2.0 license.

💡Mamba architecture

Mamba is an evolution of state-space models: similar to a recurrent neural network, but able to do all its computation at once, with traits of both convolutions and long-range attention without being exactly either. Mamba is the core building block of Jamba, letting it handle long-range dependencies efficiently.

💡DBRX

DBRX is a large language model released by Databricks, with over 100 billion parameters, that performs strongly across natural-language understanding, programming, and math. It uses a mixture-of-experts architecture with 132 billion total parameters, of which only 36 billion are activated per input. Its fine-grained expert selection offers many more possible expert combinations, which improves model quality.

💡Synthetic data generation

Synthetic data generation uses algorithms (often existing models) to produce simulated training data, and it is becoming increasingly important for training large language and image models. For example, Google DeepMind's MagicLens model and the Cosmopedia project both use synthetic data for training, which can improve model performance while requiring less collected data.

💡Long-form factuality

Long-form factuality refers to evaluating the factual accuracy of long-form content produced by large language models. In a Google DeepMind paper, researchers developed a method that uses a large language model as an evaluation agent to verify long-form content. When it disagrees with human raters, the automated system is correct 76% of the time, suggesting it can be more accurate than humans in some cases.

💡Personalized vision-language models

Personalized vision-language models can understand and reason about user-specific concepts (the user, their dog, their friends, and so on). By teaching a pre-trained vision-language model these personalized concepts, it can answer questions and receive instructions from a first-person perspective. This is useful for applications such as social media platforms that need personalized user experiences.

💡3D synthesis

3D synthesis here refers to turning text into three-dimensional models. An Nvidia paper introduces a method called Latte3D that can quickly generate high-quality textured meshes in just 400 milliseconds. By combining 3D priors, amortized optimization, and a second stage of surface rendering, it greatly speeds up 3D synthesis.

💡Time-series foundation model

A time-series foundation model handles and forecasts all kinds of time-series data. Salesforce AI Research's Moirai is a cutting-edge time-series foundation model intended to be a universal forecaster for any kind of time series. Such a model would imply there is something general that can be learned about understanding and predicting time series as a domain.

💡Model merging

Model merging is a technique for combining different fine-tunes of the same model. It can produce new models that may perform better. A Sakana AI blog post explores evolutionary model merging: using neural-architecture-search techniques to automatically find the best merging strategies.

💡OCR datasets

OCR (optical character recognition) datasets are large open datasets released to improve OCR technology. They contain images of documents in various layouts and font styles for training and evaluating OCR models. Releasing these datasets helps improve OCR accuracy, especially on complex document layouts and diverse fonts.

💡Dialogue systems

A dialogue system is an AI system that uses natural-language processing to simulate human conversation. The Q* mentioned in the video is a dialogue system supposedly conceptualized by OpenAI that uses an energy-based model to evaluate potential responses holistically, with optimization and abstract representations; the goal of such a system would be a more natural, human-like conversational experience. (The video treats this leak as dubious.)

Highlights

AI21 Labs released the Jamba model, a hybrid that combines the Mamba architecture with attention layers, achieving long-context inference without a significant increase in memory requirements.

Jamba performs well on key benchmarks and is publicly available under the Apache 2 license.

Databricks released DBRX, a new large language model that excels at natural-language understanding, programming, and math.

DBRX uses a mixture-of-experts architecture with 132 billion parameters, of which only 36 billion are active on any given input.

Cohere released Command R+, a more performant model optimized specifically for command following and retrieval-augmented generation.

Command R+ is not open source; it is offered as premium open weights, free for personal use, with payment required for commercial use.

Mistral released a new 7B model, a base model used to train their new instruct model, with a 32,000-token context window.

Google Research released VideoPoet, a zero-shot text-to-video generation model.

Google DeepMind's MagicLens is an image-retrieval model that can be queried with natural language.

The Cosmopedia project uses synthetic data to train models, demonstrating the importance of high-quality training data.

Another Google DeepMind paper tackles long-form factuality in large language models, proposing a search-augmented factuality evaluation method (SAFE).

MyVLM, a research paper by Snap and Tel Aviv University, explores personalization of vision-language models.

Nvidia's Latte3D is a text-to-3D synthesis model that can quickly generate high-quality textured meshes.

Meta's SceneScript model aims to understand physical spaces in support of augmented reality.

Sakana AI published a new blog post on evolving foundation models, exploring model-merging strategies.

Google Research's RadSplat is a radiance-field-informed Gaussian splatting method for robust real-time rendering.

Salesforce AI Research released Moirai, a time-series foundation model for universal forecasting.

H2O AI released H2O-Danube2, a 1.8-billion-parameter foundation language model that performs well on leaderboards.

Materiality Labs' Garment3DGen model can generate realistic garments, enabling better shopping experiences in virtual worlds and augmented reality.

Nexa AI's Octopus V2 is a 2-billion-parameter model optimized for Android API tool use and function calling.

Dolphin 2.8 Mistral 7B v0.2 is a fine-tuned model with bias and alignment samples removed from its dataset.

Nous Research released a reproduction of the 1-bit (or 1.58-bit) LLM, Microsoft's BitNet paper.

Qwen1.5-MoE shows how to match the performance of a 7-billion-parameter model while activating fewer parameters.

The LMSYS Chatbot Arena leaderboard shows that the smaller Starling-7B model holds up well in comparisons against larger models.

Claude 3 tops the Berkeley function-calling leaderboard, which is based on the Gorilla line of models and datasets.

Lamini released the photographic memory evaluation suite, a set of benchmarks focused on tasks that require exactly matching specific things.

Clément Delangue announced the release of two of the largest open OCR datasets, aimed at improving OCR technology.

Transcripts

[00:00] Hello everyone, I hope you're having a wonderful Monday. Today we're going to dive into some new models that came out in the last two weeks; it's an exciting time. The first one is AI21 Labs' Jamba model. This is a hybrid model built on the Mamba architecture, and if you don't know what Mamba is, it's essentially an evolution of state-space models, which is sort of like a recurrent neural network, but not really, because you can do everything at once; and it's like a convolution, but not really; but it's also like really long attention, but not really. I've made a video on Mamba if you want to check that out. In any case, Jamba is, well, they call it a groundbreaking SSM-Transformer model. It mixes the Mamba architecture with a few attention layers and therefore achieves really, really long-context inference performance without blowing up memory requirements. This could be really interesting. The model is openly available under Apache 2, very cool, and performs quite well on key benchmarks. This diagram is their advertising: they get the quality benefits of Transformers by using attention layers, but they also get very high throughput and a low memory footprint. Excellent, big fan, so check out Jamba. There's also a paper that lays out exactly how they mix the different layers together; part of that is also covered in the blog post. As you can see right here, a Jamba block essentially contains Mamba layers, mixture-of-experts layers, Transformer blocks, and so on.
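The block structure described above can be sketched as a simple layer schedule. The specific numbers below (eight layers per block, one attention layer per block, MoE replacing every second MLP) are my reading of the figures in the Jamba report and should be treated as illustrative rather than authoritative:

```python
def jamba_block_layout(n_layers=8, attn_every=8, moe_every=2):
    """Sketch the layer pattern of a Jamba-style block.

    Each layer is a (mixer, mlp) pair: the mixer is Mamba except for
    every `attn_every`-th layer, which uses attention; the MLP is a
    mixture-of-experts every `moe_every`-th layer, dense otherwise.
    """
    layout = []
    for i in range(1, n_layers + 1):
        mixer = "attention" if i % attn_every == 0 else "mamba"
        mlp = "moe" if i % moe_every == 0 else "dense"
        layout.append((mixer, mlp))
    return layout

layout = jamba_block_layout()
# One block: 7 Mamba mixers, 1 attention mixer, 4 MoE MLPs.
print(sum(m == "mamba" for m, _ in layout), "mamba,",
      sum(m == "attention" for m, _ in layout), "attention,",
      sum(p == "moe" for _, p in layout), "MoE MLPs")
```

The point of the interleaving is that most layers avoid the quadratic attention cache, which is where the long-context memory savings come from.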

[01:46] Another openly available language model is DBRX, a new state-of-the-art open LLM by Databricks. This model is large, over 100 billion parameters, but it performs really, really well across not just natural-language understanding but also programming and math. Excellent model, openly available. It also uses a mixture-of-experts architecture, so it has 132 billion parameters in total, but only 36 billion parameters are active on any given input. It's been trained on a lot of tokens, code, and data, and they say that compared to other open mixture-of-experts models it is fine-grained, meaning it uses a larger number of smaller experts: it has 16 experts and chooses four. This provides 65 times more possible combinations of experts, and they found this improves model quality.
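The "65 times more combinations" figure follows from basic combinatorics, assuming the implicit baseline is a coarse-grained design with 8 experts choosing 2 (Mixtral-style; the announcement does not name the baseline, so treat that as my assumption):

```python
from math import comb

# Fine-grained MoE (DBRX): 16 experts, 4 active per token.
fine = comb(16, 4)    # number of possible expert subsets
# Coarser MoE baseline (assumed Mixtral-style): 8 experts, 2 active.
coarse = comb(8, 2)

print(fine, coarse, fine // coarse)  # 1820 28 65
```

More possible routing combinations give the router a finer-grained choice per token, which is the claimed source of the quality gain.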

[02:42] They release numbers on a whole set of benchmarks and outperform the competing models on many of them, and even when compared to closed models, the big API models, their performance holds up very well. Now, it didn't take long before people discovered what its system prompt is, and the interesting bit is that it has the usual guardrail prompting inside of it, but then it also has a section along the lines of: you were not trained on copyrighted books, song lyrics, poems, video transcripts, or news articles; you do not divulge details of your training data. Like, sure, sure. I have no clue whether it was trained on these things or not, but sometimes malicious journalists will type into a model, "Were you trained on copyrighted material?", and the model will say "I don't know". These journalists do not understand that that type of question is absolutely nonsensical to ask such a model, and any output is just random noise, and then they will write a big story about it or something like this, out of their complete lack of understanding. So I get the fact that you want to prevent that by putting it into the system prompt, but it's still kind of funny.

[04:01] Cohere introduces Command R+, after Command R. Command R+ is a more performant model that's state-of-the-art, command-optimized, and retrieval-augmented-generation-optimized. It's available in 10 different languages and it's trained to do citations, to use tools, and so on. This model is not available under an open-source license; it's available as open weights, so it's essentially premium: you get to play with it personally as much as you want, but as soon as you want to use it in a commercial context, you have to pay Cohere some money. Also, nice cookie banner here, Cohere: there's an "accept all" button, there's no decline, there's a "manage settings", and, oh yeah, okay, everything's turned off, that's nice. Okay, how about... oh no, it's again "accept all", but that's on top. Ah, shenanigans. In any case, for the research community this is definitely a cool model. Essentially, what you can do with this is prepare for upcoming truly open-source models that have these capabilities. We can expect that in a month or two fully open-source models will have caught up with this development, and you can essentially use this in order to prepare for that. Now, obviously Cohere's game plan is that you'll get so used to this model, and you'll make your prompts work well with it, such that when you start making money with it, you'll have to come to them. We'll see; it's a new world, and new business models emerge. This premium-type open-weight model is an approach we've seen from other places as well.

[05:36] Back to true open source: Mistral releases a new 7B model. It's a base model; they say it's a raw pre-trained model used to train their new instruct model, with a 32,000-token context window, to be fine-tuned with instructions. They released this for a hackathon, but it does seem to be quite performant.

[05:54] All right, back to the world of places that do not release their models: VideoPoet by Google Research. Still a very cool model, this is a text-to-video, zero-shot video generation model. It's not only video generation but any sort of text-and-video intermingling. The VideoPoet model is trained on a multitude of different objectives, which then allow it to do text-to-video, image-to-video, stylization, and so on. There is a paper available if you want to read that, and quite a number of demonstrations. Yeah, that's it.

[06:31] Another paper by Google DeepMind is MagicLens. This is image retrieval with open-ended instructions, meaning you can retrieve with natural language and it will go and find that stuff, kind of like a search engine, I guess. But the clue here is that this was developed to a large part with synthetic data generation, and that's something we see more and more frequently in recent times. There is an entire pipeline here that goes into how they did it: web scraping, grouping, cleaning, metadata expansion (not just extraction but expansion, so this already includes some trained models), then scoring and filtering, including other text and image models such as CLIP, and then instruction generation, meaning image descriptions and so on. Essentially, reversing the process: starting from the images, generating the synthetic data around them, and then that gives you a dataset to do this kind of open-instruction image-retrieval training. So the model itself, yeah, it's cool, but I think the trend of synthetic data generation is a cool trend and certainly something to look out for.

[07:45] Another investigation into synthetic training-data generation is the Cosmopedia project; there is a blog post on Hugging Face. This is a reproduction of Phi. Microsoft's Phi models have been trained largely on textbooks; essentially their premise is: if you use really high-quality training data, you don't need so much training data, your models don't have to be so large, and you can still perform really, really well. Cosmopedia is an attempt at recreating that, except using synthetic data, starting out from all kinds of seed prompts and so on. Oh no, I got it wrong: according to the Phi-1.5 technical report, the authors curated 20,000 topics to produce billions of tokens of synthetic textbooks while using samples from web datasets for diversity. See, I got it wrong; Phi already made synthetic data. I wasn't aware of that; I thought they really just had high-quality data. Apparently Phi already used synthetic data, and this is an open reproduction of that. Very, very cool.

[08:52] Another Google DeepMind paper: long-form factuality in large language models. This is a paper that goes into, well, as they say, long-form factuality in open domains. They first generate a dataset, LongFact, a prompt set comprising thousands of questions. Then they develop a method to use LLM agents as evaluators, which they call SAFE, the Search-Augmented Factuality Evaluator. And then they develop a model that tackles this. They develop a dataset, they develop an evaluation method, and they develop a model: this could be three different papers, but it's all in one right here. What I found really interesting is that they obviously have to research and validate each of those steps, especially this SAFE method, this verification method using agents. They say it agrees with crowdsourced human annotators 72% of the time, so not a huge disagreement and not a huge agreement; I would have expected more. But on a random subset of 100 disagreement cases, SAFE wins 76% of the time: whenever this system and humans disagree, more often than not the automated system is correct. And this is pretty interesting; the era of LLM-based fact verification might have begun. The code for this is available, thankfully, so you can go and check that out.
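To make the SAFE idea concrete, here is a heavily simplified sketch of the pipeline: split a long response into atomic claims, look for evidence for each one, and score the supported fraction. `split_into_facts` and `is_supported` are hypothetical stand-ins for the paper's LLM decomposition and Google Search steps, not its actual implementation:

```python
def split_into_facts(response: str) -> list:
    # Stand-in for the LLM step that decomposes a long answer
    # into individually checkable claims (here: naive sentence split).
    return [s.strip() for s in response.split(".") if s.strip()]

def is_supported(fact: str, search) -> bool:
    # Stand-in for issuing search queries and asking an LLM whether
    # the retrieved documents support the fact.
    return any(fact.lower() in doc.lower() for doc in search(fact))

def safe_score(response: str, search) -> float:
    """Fraction of claims in `response` that the evidence supports."""
    facts = split_into_facts(response)
    supported = sum(is_supported(f, search) for f in facts)
    return supported / len(facts) if facts else 0.0

# Toy "search engine" backed by a fixed corpus.
corpus = ["Jamba is a hybrid SSM-Transformer model.",
          "DBRX has 132 billion total parameters."]
search = lambda q: [d for d in corpus if q.lower() in d.lower()]

print(safe_score("Jamba is a hybrid SSM-Transformer model. The moon is cheese",
                 search))  # 0.5
```

The real system's interesting property is exactly the one discussed above: when this kind of automated judge and human raters disagree, the judge is more often right.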

[10:20] MyVLM is a paper by Snap and Tel Aviv University, and it is about personalization of vision-language models. This paper attempts to teach a vision-language model about concepts such as "me", "my dog", "my house", and "my friend": personalizing these models to then be able to answer questions and receive instructions from an ego perspective, from a perspective of me and, you know, what's important to me. "Given a set of images depicting user-specific concepts, such as you, your dog, or your friend, we teach a pre-trained vision-language model to understand and reason over these concepts." The result is that the vision-language model can, for example, do personalized visual question answering: "What are you doing?", and it will understand who you are; "What is your friend wearing?", and it'll understand who that is, and then do the correct thing. Obviously very interesting for social media platforms such as Snap, and probably interesting in many other cases.

[11:25] Latte3D, large-scale amortized text-to-enhanced-3D synthesis, is a paper by Nvidia and takes care of text-to-3D. Text-to-3D is another new domain propping up in the whole text-to-XYZ space. The clue here is that they managed to do this really, really quickly: they say Latte3D generates high-quality textured meshes from text robustly in just 400 milliseconds, by combining 3D priors, amortized optimization, and a second stage of surface rendering. Generating these things has so far always taken a long time, and by combining several advances in the field, this paper pushes things ahead and can do really, really fast generation. (That wasn't generation, that's just the website, but still: really fast.)

[12:16] SceneScript by Meta is a model that is aimed at understanding the physical spaces around you, with the goal of supporting augmented reality. As you can see, it will try to reconstruct or annotate things in the environment and do scene understanding of what's there. Interestingly enough, this is also trained on synthetic data from simulators; again, synthetic data is the basis for many of these newer things. Very neat, very cool.

[12:48] Champ is a human image animation method: taking an image of a person, you can make it go boing-boing and do a dance. I have not followed this branch of research too much, I have to say; however, you can see how the field moves forward. It used to be, one year ago, that you could maybe turn the head a little bit, and then you could maybe make the pictures say something, and now from a single image you can have them do a full-blown dance with giant movements while staying consistent. Now, this is powered by a parametric model of humans, so something that's actually aware that humans have arms and a head and so on, but it's still extremely impressive, just the progress made by these models in such a short time.

[13:40] Sakana AI has released a new blog post about evolving foundation models. They are investigating evolutionary model merging. Model merging is when you combine different fine-tunes of the same model: you can take a layer from here and take a layer from there and kind of stitch them together. The problem, of course, is that if you have a bunch of models, there are almost infinitely many ways of merging them, so how are you going to do that? I think this is where we go back to a domain that has been, I'd say, neglected a little bit in the last few years, which is architecture search: neural architecture search and automated neural architecture search, except now in the context of model merging. Can we evolve merging strategies between different models? That's what this blog post goes into, so it's very, very cool if you want to explore that.
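A minimal sketch of what a layer-wise merge optimizes over: each layer gets a mixing weight, and an evolutionary search (as in Sakana's post) would tune those weights instead of fixing them by hand. Single floats stand in for whole weight tensors here, so this is an illustration of the search space, not Sakana's actual recipe:

```python
def merge(model_a, model_b, alphas):
    """Layer-wise linear interpolation between two fine-tunes of the
    same base model. `alphas[name]` is the mixing weight for that layer."""
    return {name: alphas[name] * model_a[name]
                  + (1 - alphas[name]) * model_b[name]
            for name in model_a}

# Two "fine-tunes" with identical layer structure.
model_a = {"layer0": 1.0, "layer1": 0.0}
model_b = {"layer0": 0.0, "layer1": 1.0}
# Per-layer mixing weights; in evolutionary merging these are the genome
# that the search mutates and selects on.
alphas = {"layer0": 0.75, "layer1": 0.25}

merged = merge(model_a, model_b, alphas)
print(merged)  # {'layer0': 0.75, 'layer1': 0.75}
```

Even this tiny setup shows why search helps: with one continuous weight per layer, the space of possible merges explodes with depth, far beyond what hand-tuning covers.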

[14:37] Another advance by Google Research: the RadSplat paper, radiance-field-informed Gaussian splatting for robust real-time rendering at over 900 frames per second. I remember just a very few years ago, when NeRFs came on the scene, it was like: oh wow, if you just invest one hour of GPU time, you can sort of tilt the camera a little bit around a scene that had to be really defined, really in one place. Now you can move through rooms at 900 FPS and whatnot. Absolutely crazy, absolutely crazy.

[15:17] Salesforce AI Research releases Moirai, a time-series foundation model for universal forecasting: a cutting-edge time-series foundation model offering universal forecasting capabilities. This is supposed to be one model for all kinds of time series, which is a bit special, because time series are just bound together by the fact that they somehow do something over time. There's nothing inherently connecting, you know, web-search patterns with health metrics with e-commerce data and so on. Yet still, this model is a combined model that attempts to unify all these things, kind of like how we shove all kinds of language into language models, including programming and different natural languages. This model aims to be a foundation model and a universal forecaster for any kind of time series, meaning that if this works out, it would technically mean there's something kind of fundamental to the domain of being a time series, and maybe that is some sort of deeper insight into the universe. I have no idea, but in any case, very cool that there is an investigation into that; looking forward to seeing how this field develops.

[16:30] Several people have published leaked info around Q* in a Pastebin that's now deleted: "can confirm the authenticity". Oh, it's a dialogue system conceptualized by OpenAI, yada yada; there's an energy-based model for dialogue generation, evaluating potential responses holistically, optimization and abstract representation... This is complete crap. I mean, it could be and it could not be; it's just funny that people have wild imaginations, let's say.

[17:03] H2O AI releases H2O-Danube2, which is a 1.8-billion-parameter foundation language model, trained on many, many tokens and performing very well on leaderboards compared to models of its own size. It's available if you want to check that out.

[17:19] Garment3DGen by Materiality Labs is a model that, well, as you can see, generates garments. The idea is that, I guess, in the metaverse you can look at your clothes and they look really realistic there. By being able to render clothes very well in augmented reality and so on, making sure the fabric flows well around any sort of animated human, it gives you a better shopping experience, so you can see how something is really going to look in real life. Or, ultimately, we won't ever have to leave VR again, and then it's also cool that we have clothes that behave realistically. In a hundred years, kids will be like: oh, this is like a history book of what the real world used to be like; now we're all just in capsules in some goo, plugged into the Matrix. I guess you can consider this an important moment for future historians.

[18:19] Octopus V2 by Nexa AI is a 2-billion-parameter model that is specifically optimized for tool use and function calling, specifically in the domain, or in the context, of the Android API. The idea is that you can interact with your Android phone or Android device using natural language, and this model is very good at using the Android API to get different things done for you. Very cool.

[18:49] Dolphin 2.8 Mistral 7B v0.2. Dolphin 🐬 is a model that is a fine-tune of Mistral 7B; notably, it is a fine-tune of Mistral 7B v0.2, the model we saw before. But notably, as with all Dolphin models, it is uncensored, which essentially means that the dataset has been filtered to remove any kind of bias and alignment samples. They say this makes the model more compliant: you are advised to implement your own alignment layer before exposing the model as a service; it will be highly compliant with any request, even unethical ones. I, for one, am very grateful that people are going in this direction. This doesn't mean I endorse using these models for any sort of evil or dumb purposes. What it does mean is that you pass the responsibility of making sure your deployments are safe to the people actually deploying the model. This way, they have a choice over how to implement their own guardrails and are not subject to the guardrails that are already in place. Especially for specialized fields, this makes a lot of sense. For example, the APIs, like OpenAI's, are super-duper careful about anything medical right now. Imagine you actually want to purpose-build this thing in the medical domain, where you know all the users are educated professionals who are aware of the limitations of these language models: it makes a lot of sense that you can put your own guardrails in place there.

[20:15] Nous Research is releasing a reproduction of the famous 1-bit LLM, or 1.58-bit LLM, the BitNet paper by Microsoft. They published their findings on Weights & Biases and the Hugging Face Hub. Excellent.

play20:32

reaching llama 2 performance with 0.1

play20:36

million this is a 8 billion parameter

play20:40

model trained with less than $0.1

play20:42

million cost whereas llama 2 of the same

play20:46

size cost probably who has a

play20:48

multi-billion dollar training resources

play20:51

I like the fact that they still express

play20:53

it in millions to make the number seem

play20:55

small like this is still 80,000 doll

play20:59

right this is not like oh it's so cheap

play21:02

it is certainly getting cheaper and

play21:05

investigations into how to be more

play21:08

efficient MoneyWise in training these

play21:10

things is very welcome very very cool

play21:13

that people are doing it but it's still

play21:15

80 grand it's still quite a hefty thing

play21:18

it's an h100 cluster 96 of them for 2

play21:22

weeks that's going to run you quite a

play21:24

bill one notable thing of this paper is

play21:27

that it has two phases of training. The first phase uses a more general data mixture, while the second phase uses a more specialized, higher-quality dataset mixture. Notably, during the first phase the learning rate ramps up and then stays constant, and during the second phase the learning rate actually decays. The learning rate, in combination with the sequencing of data, seems to be quite an important piece in training these modern language models.
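As a sketch of that schedule, with made-up step counts and linear warmup and decay (the paper's exact shapes and numbers may well differ), the two-phase learning-rate function could look like:

```python
def two_phase_lr(step, peak_lr=3e-4, warmup_steps=1_000,
                 phase1_steps=10_000, phase2_steps=5_000):
    # Phase 1 (general data mixture): linear warmup, then hold at the peak.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    if step < phase1_steps:
        return peak_lr
    # Phase 2 (specialized, higher-quality mixture): linear decay to zero.
    progress = (step - phase1_steps) / phase2_steps
    return peak_lr * max(0.0, 1.0 - progress)
```

The point is that the decay is deliberately saved for the phase where the high-quality data arrives, coupling the schedule to the data sequencing.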

play21:54

And lastly, Qwen 1.5 MoE says they're matching 7-billion-parameter model performance with one third of the activated parameters. It's a new world in terms of how you present your models and which numbers you quote. What they're looking at is: okay, take Mistral 7B or Qwen 1.5 7B, which are 7 billion parameter models but densely activated models, so all 7 billion parameters are active, and compare that to a mixture-of-experts model that only has 2.7 billion activated parameters per forward pass. Now, this isn't saying how many parameters there are in total; you can play around with the numbers to make your model look as good as possible.
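To make the activated-versus-total distinction concrete, here is a toy count of only the FFN parameters in a mixture-of-experts Transformer. The dimensions are illustrative round numbers, not Qwen's actual configuration:

```python
def moe_param_counts(d_model, d_ff, n_layers, n_experts, top_k):
    # Each expert's FFN has an up-projection (d_model x d_ff)
    # and a down-projection (d_ff x d_model).
    ffn_per_expert = 2 * d_model * d_ff
    total = n_layers * n_experts * ffn_per_expert     # weights you have to store
    activated = n_layers * top_k * ffn_per_expert     # weights one token passes through
    return total, activated

total, activated = moe_param_counts(d_model=1024, d_ff=4096,
                                    n_layers=16, n_experts=8, top_k=2)
```

With 8 experts and top-2 routing, a token only ever touches a quarter of the stored FFN weights, which is how a model can quote a small "activated parameters" number while its total parameter count is much larger.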

play22:43

Still, these are very, very cool developments, and I do think one recognition here concerns the training; I'm not sure where it was, but essentially the initialization of these mixture-of-experts models can make quite a difference. What they do is they don't train the mixture-of-experts model from scratch; they take an already pre-trained small model, their 1.8 billion parameter model, and use it as an initialization for the different experts. So already having pre-trained models in place and then initializing your experts with those, they say, makes a big difference to their training.
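A minimal sketch of that "upcycling" idea, with weights as plain nested lists. The small noise term is my own assumption to break symmetry between experts; the transcript only says the experts start from the pretrained 1.8B model's weights:

```python
import copy
import random

def upcycle_experts(dense_ffn_weights, n_experts, noise=1e-2, seed=0):
    # Initialize every expert as a copy of the pretrained dense FFN,
    # perturbed slightly so the experts can specialize during training.
    rng = random.Random(seed)
    experts = []
    for _ in range(n_experts):
        expert = copy.deepcopy(dense_ffn_weights)
        for row in expert:
            for j in range(len(row)):
                row[j] += rng.uniform(-noise, noise)
        experts.append(expert)
    return experts
```

Compared to random initialization, every expert starts out as an already-competent FFN, which is presumably where the training benefit comes from.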

play23:24

Every now and then we like to keep an eye on the LMSYS Chatbot Arena leaderboard. This is where models are pitted against each other with the same prompts, and then humans decide which one is better. Notably, you can see the usual contenders here, with Claude 3 Opus actually rising to the top, which is very cool to see. But you can also see that among all of these really big models there is one model that's quite small: Starling 7B beta, under Apache 2.0, 7 billion parameters, and it holds itself quite well compared to these other models. The first comparable model is another Starling model that's way down here. Yeah, it seems to hold up fairly well against these other models. Note that this is one way of evaluating models; it's obviously not going to be good at everything. I would trust that these larger models have much more of a spread of abilities, so you can use them in a versatile manner, whereas this one might just be good at this setting of giving one prompt, doing a single-shot head-to-head comparison between the two, and scoring whatever humans care about, which might not be the full spectrum of abilities. But still, very cool.
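The Arena ranking comes from aggregating exactly these human votes. The leaderboard itself fits a Bradley-Terry model over all battles, but the classic online Elo update over a single vote, sketched here, captures the idea:

```python
def elo_update(r_a, r_b, a_wins, k=32.0):
    # Expected score for A under the logistic Elo model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b
```

An upset (a small model beating a highly rated one) moves ratings more than an expected win, which is how a 7B model can climb high if it keeps winning head-to-heads.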

play24:37

Another leaderboard: Claude 3 also tops the Berkeley Function Calling Leaderboard. The Function Calling Leaderboard is based on the Gorilla series of papers, models and datasets (I'm not sure what to call it), and it measures how well language models can call functions, which is important for any sort of agent framework or anything like that. Apparently Claude 3 is now topping that leaderboard.
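A toy version of what such an evaluation checks, assuming the model emits its tool call as a JSON object with `name` and `arguments` fields (an assumption for illustration; the real benchmark's AST matching is considerably more lenient about formats, argument order, and optional parameters):

```python
import json

def call_matches(model_output, expected):
    # Parse the model's emitted tool call and compare the function
    # name and arguments against a reference call.
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    return (call.get("name") == expected["name"]
            and call.get("arguments") == expected["arguments"])
```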

play25:05

TREC now has a track on RAG, an evaluation track. TREC is an information retrieval conference; I guess they organize evaluations of information retrieval systems, so that organization obviously has a lot of experience evaluating such systems. It's very cool to see a retrieval-augmented generation track now inside of those evaluations: there's going to be one in TREC 2024 for RAG. Lamini releases a

play25:33

photographic memory evaluation suite. This is a conglomeration of benchmarks, essentially a collection of benchmarks that all have to do with what they call photographic memory. What is photographic memory? It's not what you think at first. Essentially, what they say is that there are tasks where it's really important to remember the thing exactly correctly. For example, if you want to classify diseases into their disease codes (there are these ICD-10 codes for diseases), you'd really better make sure it's the actual correct one and not just a similar one. A collection of these benchmarks that really focus on exact matching of a particular thing comprises this evaluation suite. These are existing benchmarks, but there is one script that you can run to evaluate your model on all of them. Very cool.
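The metric behind such a suite is essentially strict exact matching; a minimal sketch:

```python
def exact_match_accuracy(predictions, references):
    # A prediction only counts if it matches the reference string exactly
    # (after trimming whitespace): a near-miss disease code is simply wrong.
    assert len(predictions) == len(references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```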

play26:25

And lastly, Clément Delangue just tweeted out, or X'd out (someone is telling me he just posted), that they have released the two biggest open datasets for OCR. Excellent. You can check those out on Hugging Face. OCR is still a mystery after all these years, because layouts of documents are just dumb, and the PDF standard is just dumb, and fonts are just dumb. After all these years OCR is still a big problem, and it's very cool to see datasets being published in order to make OCR better. It will help us all. Thank you.

All right, this was it for models this week, and evals and other fun stuff. We will return with a bit more. See you around.
