Unboxing the Tenstorrent Grayskull AI Accelerator!

TechTechPotato
31 Jan 2024 Β· 25:49

Summary

TL;DR: In this engaging conversation, Yasmina, a fellow at Tenstorrent, discusses the company's approach to hardware-software co-design, emphasizing the importance of accessible programming for new architectures. She introduces Tenstorrent's gen-one hardware, the Grayskull dev kit, designed for developers with a focus on ease of use and community feedback. The discussion covers the balance between high-level and low-level programming entry points, the company's commitment to open-sourcing its bare-metal software stack, and future integration with PyTorch 2.0. Yasmina shares the company's goals of empowering developers and growing the AI hardware ecosystem collaboratively.

Takeaways

  • πŸŽ‰ Yasmina is a guest on the channel, discussing the exciting developments at Tenstorrent, a company specializing in novel hardware and software.
  • 🌟 Tenstorrent focuses on hardware-software co-design, emphasizing the importance of making new hardware accessible through programming.
  • πŸ”§ Yasmina's background in FPGAs and high-level synthesis has been instrumental in her work at Tenstorrent, contributing to the development of their novel architectures.
  • πŸ’‘ The company is releasing its first generation hardware, called 'Grull', in the form of a developer kit, with two versions: E75 and E150, differing in wattage and size.
  • πŸ“¦ The Grull developer kit includes everything needed to set up and start working with the hardware, targeting developers and small to medium-sized businesses.
  • πŸ’» The developer kit is designed to be used with Linux, with Windows support planned for the future, and it comes with tools and drivers for developers to get started.
  • πŸ› οΈ Tenstorrent offers two software stacks: a high-level entry point called 'Buddha' for easy model deployment, and a low-level 'bare metal' programming model for fine-grained control.
  • πŸ’Έ The E75 version of the Grull developer kit is priced at $599, while the E150 version is priced at $799, positioning them as accessible to developers and businesses.
  • πŸ”„ Tenstorrent is committed to open-sourcing their bare metal software stack, allowing developers to see and interact with the underlying APIs and system components.
  • πŸ”„ The company aims for forward compatibility, ensuring that anything developers create for the current generation will work with future Tenstorrent hardware.
  • 🀝 Tenstorrent values community engagement and feedback, encouraging developers to test, experiment, and provide input to improve their products.

Q & A

  • What is the significance of Yasmina's background in FPGAs to her current role at Tenstorrent?

    -Yasmina's background in FPGAs, including her doctorate and experience in high-level synthesis, is significant because it provides her with a strong foundation in hardware-software co-design. This expertise is crucial for developing novel architectures that are easy and enjoyable to program, which is a key focus at Tenstorrent.

  • What is the main goal of Tenstorrent's hardware-software co-design approach?

    -The main goal of Tenstorrent's hardware-software co-design approach is to create hardware and software environments that are accessible and engaging for developers. They aim to provide easy entry points for programming, empowering developers to create innovative applications and feel excited about the potential of new hardware and software integrations.

  • What are the two different versions of the Tenstorrent's Grail dev kit mentioned in the transcript?

    -The two versions mentioned are the e75, a smaller 75-watt card, and the e150, a slightly larger 150-watt card.

  • How much does the Tenstorrent's Grail dev kit cost, and who is the intended audience for this product?

    -Tenstorrent's Grayskull dev kit costs $599 for the 75-watt version and $799 for the 150-watt version. The intended audience is developers and small to medium-sized businesses interested in exploring the utility of the hardware for their models.

  • What is the purpose of the 'Buddha' software stack offered by Tenstorrent?

    -The 'Buda' software stack is a compiler developed by Tenstorrent. It serves as a high-level entry point for developers, allowing them to download models from sources like Hugging Face and run them on Tenstorrent's hardware without changing their environment or rewriting their models.
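
    As a rough illustration of that "download a model and run it" flow, here is a minimal Python sketch. The Hugging Face transformers calls are standard, but the accelerator_compile_and_run helper is hypothetical, a stand-in for whatever compile-and-run entry point the Buda stack actually exposes, not its real API.

      # Hypothetical sketch of the high-level "push-button" flow described above.
      # The transformers calls are real; accelerator_compile_and_run is a placeholder.
      from transformers import AutoTokenizer, AutoModelForSequenceClassification

      model_name = "distilbert-base-uncased-finetuned-sst-2-english"
      tokenizer = AutoTokenizer.from_pretrained(model_name)                  # download from Hugging Face
      model = AutoModelForSequenceClassification.from_pretrained(model_name)

      inputs = tokenizer("The dev kit arrived today!", return_tensors="pt")

      def accelerator_compile_and_run(torch_model, example_inputs):
          # A real stack would compile the graph for the card and execute it there;
          # here we simply run the unmodified PyTorch model on the host.
          return torch_model(**example_inputs)

      outputs = accelerator_compile_and_run(model, inputs)
      print(outputs.logits)

    The point of the sketch is only that the developer keeps their existing model and environment; the compile step happens behind a single call.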

  • What is the 'bare metal programming model' that Yasmina refers to, and who is it intended for?

    -The 'bare metal programming model' is a low-level software stack that provides developers with fine-grained control over the workloads running on Tenstorrent's hardware. It is intended for developers who want to have a deep understanding of the underlying architecture and are interested in customizing their applications at a lower level, including writing custom operators and exploring novel hardware features.
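
    To make the contrast with the high-level path concrete, here is a purely illustrative Python sketch of the shape an OpenCL/CUDA-style host program for such a low-level stack tends to take. Every class and method below (Device, create_buffer, compile_kernel, launch) is invented for illustration and is not Tenstorrent's actual bare-metal API.

      # Illustrative pseudo host program in the OpenCL/CUDA mold described above.
      # All names are hypothetical; the "compute" is faked so the sketch runs.
      class Device:
          """Stand-in for a handle to one accelerator card."""

          def create_buffer(self, data):
              # A real stack would allocate device memory and copy the data in.
              return list(data)

          def compile_kernel(self, source):
              # A real stack would compile kernel source for the on-chip cores.
              return source

          def launch(self, kernel, grid, *buffers):
              # A real stack would dispatch the kernel across a grid of cores.
              return [x * 2.0 for x in buffers[0]]

      dev = Device()
      a = dev.create_buffer([1.0, 2.0, 3.0])       # host -> device copy
      k = dev.compile_kernel("double_elements")    # a custom operator / kernel
      out = dev.launch(k, (8, 8), a)               # fine-grained control of the dispatch
      print(out)                                   # [2.0, 4.0, 6.0]

    The shape of the program, explicit buffers, kernel compilation, and an explicit launch, is what gives developers the fine-grained control described above.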

  • What is Tenstorrent's policy on community feedback and open sourcing their software?

    -Tenstorrent encourages community feedback and plans to open source their full bare-metal software stack. They are interested in hearing both positive and negative feedback to improve their products. They believe in empowering developers and fostering open collaboration, which includes sharing APIs, kernel compilation methods, memory allocators, and other low-level details so developers can fully explore and utilize the hardware.

  • How does Tenstorrent ensure backward compatibility for their APIs?

    -Tenstorrent aims to keep their host APIs backward compatible; these are designed to be intuitive and familiar to developers, mimicking existing models like OpenCL and CUDA. For kernel APIs they also value backward compatibility, but they recognize that next-generation architectures can bring large gains in performance and functionality, which may justify breaking changes. Their stated goal is forward compatibility: everything written for the current generation should work on future generations.

  • What are the differences between high-level and low-level entry points for developers in Tenstorrent's software stacks?

    -High-level entry points, like the Buddha software stack, offer a quick path to desired outcomes without requiring developers to change their environment or rewrite their models. They are less susceptible to low-level changes and provide a consolidated, specialized API for certain purposes. Low-level entry points, on the other hand, give developers full access to the hardware and allow for fine-grained control, enabling them to write custom kernels and explore the hardware's capabilities in depth.

  • How does Tenstorrent support developers in accessing their hardware, both physically and through the cloud?

    -Tenstorrent provides two routes: developer kits that plug into a desktop workstation, and its own cloud, where developers can SSH in and start immediately. The cloud is the fastest entry point and also serves as Tenstorrent's first internal test bed, while the dev kits let developers own and run the hardware locally.

  • What is the significance of Tenstorrent's decision to open source their bare metal software stack?

    -By open sourcing their bare metal software stack, Tenstorrent aims to foster community engagement and collaboration. This transparency allows developers to see and understand the underlying mechanisms of the software, including how kernels are compiled, memory is allocated, and runtime arguments are managed. It also ensures that anything developers write for the current hardware will be forward-compatible with future Tenstorrent hardware generations.

Outlines

00:00

πŸŽ₯ Introduction and Meeting Yasmina

The video begins with an introduction to the setting, and a special guest, Yasmina, is welcomed. Yasmina is a fellow at Tenstorrent, and the host expresses excitement about upcoming developments there. Yasmina's background in FPGAs and high-level synthesis is highlighted, emphasizing her expertise in hardware-software co-design, which is crucial for making new hardware accessible through programming. The conversation touches on the challenges of programming FPGAs and the goal of making novel architectures user-friendly for developers.

05:01

πŸ“¦ Unboxing the Tenstorrent Hardware

The host and Yasmina proceed to unbox Tenstorrent's hardware, revealing the Grayskull chip on two different cards, the e75 and e150. The e75, a 75-watt card, is the focus of the unboxing. The host describes the card's features, including its use in machine learning applications and its developer orientation. Pricing and availability for developers are discussed, along with the company's intention to make the hardware accessible and gather feedback from the developer community.

10:01

πŸ› οΈ Developer Experience and Software Stacks

The conversation shifts to the developer experience with Tenstorrent's hardware. Yasmina explains the two software stacks available to developers: Buda, a high-level compiler for quickly running models, and a low-level, bare-metal programming model for fine-grained control. The importance of low-level APIs for IP customers and the company's commitment to open-sourcing its software stack are emphasized. The host and Yasmina also discuss the company's approach to API compatibility and the balance between maintaining backward compatibility and leveraging new architecture capabilities.

15:02

🌐 Cloud Access and Future Roadmaps

The discussion moves to Tenstorrent's cloud offering and how it serves as an easy entry point for developers. The cloud's role as an internal testing ground and its importance for customer feedback loops are highlighted. Yasmina shares insights into the company's roadmap, including plans for hardware in desktops and the desire to empower developers with both cloud and physical hardware access. The conversation also touches on the company's approach to community engagement and updates, as well as the potential for collaboration with other AI hardware startups.

20:02

🀝 Community Feedback and Next Steps

Yasmina and the host conclude the discussion by emphasizing the importance of community feedback. Yasmina outlines the process for developers to get started with the hardware, including a 'first five things' guide for both software stacks. The conversation highlights the company's commitment to open sourcing and its desire to grow collaboration with the open-source community. The host and Yasmina express optimism about the future of Tenstorrent and the AI hardware ecosystem.

Keywords

πŸ’‘Hardware-Software Co-Design

Hardware-Software Co-Design refers to the collaborative process of designing both the hardware and software components of a system to ensure they are optimized for each other. In the context of the video, Yasmina emphasizes the importance of this approach in creating accessible entry points for developers and ensuring that new hardware is easily programmable. The video highlights how this co-design strategy is a key building block at Tenstorrent, where the focus is on making novel architectures programmer-friendly and creating a smooth development experience.

πŸ’‘FPGAs

Field-Programmable Gate Arrays (FPGAs) are integrated circuits that can be programmed after manufacturing to perform specific tasks. Yasmina mentions her doctorate work in FPGAs, indicating their importance in the field of hardware development. FPGAs are noted for their flexibility and reprogrammability, which are crucial for developing novel hardware architectures and software ecosystems, as discussed in the video.

πŸ’‘High-Level Synthesis

High-Level Synthesis (HLS) is a technique used in electronic design automation that allows developers to design hardware using high-level programming languages, such as C or C++. Yasmina's experience in HLS is highlighted, showcasing her expertise in translating high-level code into hardware description languages, which is essential for creating accessible programming environments for new hardware technologies.

πŸ’‘Machine Learning Frameworks

Machine learning frameworks are software libraries that provide an environment for developing, training, and deploying machine learning models. The video mentions frameworks like PyTorch and TensorFlow, which are widely used in the industry. These frameworks are important for Tenstorrent as they aim to support and integrate with these tools to make their hardware more accessible and useful for machine learning developers.
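
For readers less familiar with these frameworks, here is a minimal, standard PyTorch example of the kind of model such integrations target; nothing in it is Tenstorrent-specific.

    # A tiny, standard PyTorch model: the kind of workload that framework
    # integrations aim to run unchanged on new hardware back ends.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(8, 16),
        nn.ReLU(),
        nn.Linear(16, 2),
    )

    x = torch.randn(4, 8)   # a batch of 4 example inputs
    logits = model(x)       # forward pass on the default (CPU) backend
    print(logits.shape)     # torch.Size([4, 2])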

πŸ’‘Grayskull

Grayskull is the name given to Tenstorrent's first-generation hardware architecture, offered as a developer kit aimed at machine learning and AI applications. The video discusses two versions of the Grayskull card, the e75 and e150, which differ in wattage and size. The Grayskull cards are designed to be developer-friendly, allowing easy access to the hardware for testing and development purposes.

πŸ’‘Bare Metal Programming

Bare metal programming refers to the process of writing software that interacts directly with the hardware, without an operating system or other abstraction layers. In the video, Yasmina talks about Tenstorrent's approach to providing developers with the ability to program at a low level, giving them fine-grained control over the hardware. This approach is contrasted with high-level entry points and is intended for developers who want to optimize their code for performance.

πŸ’‘Open Sourcing

Open sourcing refers to the practice of making the source code of a software project publicly available, allowing others to view, use, modify, and distribute the code. In the context of the video, Tenstorrent plans to open source their bare metal software stack, which includes APIs and other tools, to foster community engagement and collaboration. This is seen as a way to empower developers and to ensure that the hardware remains accessible and adaptable to future needs.

πŸ’‘Community Engagement

Community engagement in the context of the video refers to Tenstorrent's strategy of involving developers and users in the development process of their hardware and software. By providing access to the hardware, tools, and documentation, they aim to create a feedback loop that can help improve their products. Yasmina emphasizes the importance of community feedback and the company's commitment to being open and transparent with their development process.

πŸ’‘NDA (Non-Disclosure Agreement)

A Non-Disclosure Agreement (NDA) is a legal contract that establishes a confidential relationship between parties and restricts the disclosure of certain information. In the video, it is mentioned that Tenstorrent does not require NDAs for developers working with their hardware, encouraging open communication and feedback. This approach is intended to foster a collaborative environment and to enable developers to freely share their experiences and insights with the technology.

πŸ’‘Backwards Compatibility

Backwards compatibility means that a newer version of a technology or software is able to understand and work with data or code from older versions. In the video, the importance of maintaining backwards compatibility is discussed, particularly in relation to the APIs and software stacks that Tenstorrent is developing. The goal is to ensure that anything developers create for the current generation of hardware will also work with future generations, providing stability and continuity for developers.

πŸ’‘Cloud Computing

Cloud computing refers to the delivery of computing services, such as storage, processing power, and databases, over the internet (the 'cloud'). In the video, Tenstorrent mentions having their own cloud infrastructure, which allows clients and developers to access and test their hardware and software without needing physical access to the devices. This cloud-based approach provides a convenient and quick way for users to experiment with and develop for Tenstorrent's hardware platforms.

Highlights

Introduction of Yasmina, a key figure at Tenstorrent, a company focusing on novel hardware and software.

Discussion on the importance of hardware-software co-design for accessible programming.

Yasmina's background in FPGAs and her contribution to Tenstorrent's high-level synthesis efforts.

The unveiling of Tenstorrent's first-generation hardware, the Grayskull chip, aimed at developers and enthusiasts.

Explanation of the two different cards available: the 75-watt e75 and the 150-watt e150.

Unboxing of the Grayskull chip and its development kit, providing insights into the physical product.

Discussion on the necessity of active cooling even at 75 watts and future optimizations.

Pricing details for the Grayskull dev kits, with the e75 at $599 and the e150 at $799.

Emphasis on the Grayskull card as a developer kit, designed for hands-on experience and feedback.

Mention of the lack of Windows support, with plans to add it in the future.

Introduction to Tenstorrent's software stack, including the Buda compiler as the high-level entry point.

Explanation of the bare metal programming model for developers seeking fine-grained control over workloads.

Tenstorrent's commitment to open-sourcing its full bare-metal software stack for community engagement.

The importance of low-level APIs for IP customers and the company's dedication to providing direct access to hardware.

Tenstorrent's cloud-based hardware, offering an easy entry point for clients and developers.

The company's approach to community feedback and support, with plans for open collaboration.

The significance of having both cloud and desktop hardware options for developers, empowering them to choose their preferred environment.

Discussion on the trend of specialized software stacks for specific hardware and the benefits of such an approach.

The competitive landscape of AI hardware startups and Tenstorrent's view on collaboration over competition.

The importance of community values in software development and Tenstorrent's commitment to making the entry point convenient and enjoyable for developers.

Transcripts

play00:00

[Music]

play00:02

[Applause]

play00:08

so hi everyone once again I'm here at Tenstorrent

play00:11

um we're going to ignore these

play00:13

for a second cuz I've got a special

play00:14

guest I want you to meet this is yasmina

play00:17

one of the fellows here at Tenstorrent

play00:19

welcome to the channel yasmina thanks

play00:21

awesome to be here I've known yasmina

play00:23

since I've been doing my visits here to

play00:24

visit Jim and the family um and the first

play00:27

time we met you gave me a very secret

play00:30

deep dive into some of the architecture

play00:32

on the T I'm not sure it was it was very

play00:35

fun and it's stuff that they definitely

play00:37

want to talk about at some point in the

play00:38

future um but we're here because Tenstorrent

play00:42

has got some exciting stuff lined up

play00:45

we're it's actually just before all that

play00:47

stuff is being finalized so there's some

play00:48

things we can't talk about but I wanted

play00:50

to get Yasmina on because she's part

play00:53

of this industry that's hustling and

play00:56

bustling working with new novel Hardware

play00:59

Hardware software side um I mean you may

play01:02

deal on the software side um how's that

play01:05

been we do a lot of Hardware software

play01:07

Cod design yeah so the hardware software

play01:10

code design is one of the key aspects of

play01:11

the software right because if you just

play01:13

make new hardware but nobody can program

play01:15

it what's the point like the access the

play01:18

entry point is really really important

play01:20

right so the hardware software codesign

play01:22

is is one of the key building blocks of

play01:25

what we do and it's the key thought

play01:26

process that drives a lot of things here

play01:28

at Tenstorrent so

play01:30

what in your background brought you to

play01:31

Tenstorrent oh that's a good question I I did

play01:35

my doctorate in

play01:36

fpgas and uh sort of you know a little

play01:40

bit of the CAD tool place and route and

play01:43

then uh and then most of it on the high

play01:45

level synthesis side right and uh you

play01:48

know sometimes in the FPGA industry we

play01:50

joke that the FPGA is the hardest thing

play01:52

to program in the

play01:54

world which makes our our pain tolerance

play01:57

level really high uh so you could be

play02:00

needled for a long time and put up with it

play02:02

yeah right yes and and it gives us a lot

play02:05

of this like a strong drive to make

play02:07

novel architectures really fun to

play02:10

program really convenient to program to

play02:12

create these easy you know entry points

play02:14

to hello world programs and then to

play02:17

create hardware and software

play02:19

environments that are fun to Tinker with

play02:21

and that developers feel empowered and

play02:24

excited about new Futures coming in and

play02:26

sort of can just brainstorm and think

play02:28

about all the cool and fun apps they can

play02:31

build on top it's uh I I speak with a

play02:33

number of the fpga companies out there

play02:35

and and it's I always say you guys need

play02:38

to abstract higher and higher make it

play02:40

more and more and more accessible and I

play02:42

feel like we're kind of at that same

play02:43

point a little bit with machine learning

play02:45

right are we're dealing with lots of

play02:47

these frameworks PyTorch TensorFlow ONNX

play02:50

that's right and support for those is

play02:52

vital now that's right that's right it's

play02:55

it's really interesting to think about

play02:56

the different entry points what they're

play02:58

for and then to think think about the

play03:00

levels of abstraction and then you know

play03:02

is it a really high level entry point or

play03:03

is it a low level entry point and each

play03:06

one of those is important but for a

play03:07

different use case right so you always

play03:10

want to have a high level entry point if

play03:11

you don't have one like like you know

play03:13

like the entry points are a matter of an

play03:15

and not an or right you never want to go

play03:18

I want to you know grab the developers

play03:20

by the florens and force them down this

play03:22

path never force a developer you will

play03:24

never force a developer you will welcome

play03:25

them and you will guide them and have

play03:27

them enjoy all the different paths that

play03:29

lead to Rome i.e. Tenstorrent software and hardware

play03:33

and uh yeah it's been really fun to

play03:34

watch sort of the development of the

play03:36

different Frameworks so before we go a

play03:38

little bit down to the through I do want

play03:40

to get into one of the reasons why we're

play03:41

here so I kind of reached out and said

play03:44

hey do you have any hardware for me to

play03:46

unbox for us to unbox uh and they and

play03:49

they said kind of um so what we have

play03:52

here explain explain what we have so

play03:54

there's two products here that's right

play03:56

so in these boxes is a product that we

play04:00

call Grayskull or a chip we call Grayskull it is

play04:03

our gen one Hardware our gen one

play04:05

architecture so it's the first one that

play04:07

is going out to customers to developers

play04:09

it's a dev kit MH and uh there are two

play04:12

different cards uh e75 which is which is

play04:16

a smaller card a smaller 75 cuz it's 75

play04:19

Watts 75 watts and then this guy is e

play04:23

E150 again again with a reference to

play04:25

wattage it's it's slightly bigger and uh

play04:28

yeah I'd love to show yeah it's uh we've

play04:31

been told to focus on this one um all

play04:33

right so so so that's the 150 let's

play04:35

focus on the 75 watt um I can already

play04:37

feel it's a bit it's a bit lighter than

play04:39

this one it's a bit lighter than this

play04:41

one right yeah yeah yeah yeah that's

play04:42

right how about you do the honors oh

play04:45

it's it's it's it's Christmas well it's

play04:47

not Christmas for another few weeks but

play04:49

it's Christmas it's Christmas early so

play04:51

so so so uh when people get their hands

play04:54

on these this is what it's this is what

play04:56

they're going to end up with that's

play04:58

right this is what so

play05:00

for

play05:02

it so the minute we open uh obviously

play05:07

the standard stuff but because I know a

play05:09

lot of people in the audience are very

play05:10

familiar with the unboxing say graphics

play05:11

cards sure and everybody knows what a

play05:13

graphics card is for what's a machine

play05:15

learning card for well here's here's the

play05:17

handy everything you need to set up um

play05:20

your your your your card um and we kind

play05:24

of already know that this is the very

play05:26

basic very basic this is the PCI version

play05:30

that's right so we've got a half height

play05:32

full length card um let's shift this out

play05:35

of the way just for a second um I can

play05:38

tell you've been using this on demos

play05:40

already with clients that's right he's

play05:43

got a very few fingerprints on it um but

play05:47

this is a typical sort of machine

play05:48

learning PCIe card we would see you know

play05:50

for Mass scale inference perhaps in a

play05:53

data center um but this is the developer

play05:56

version that's right so there's a bit

play05:58

more branding that's right definitely

play06:00

got the logo on side um you've got a

play06:03

blower fan it's a bit hard not to notice

play06:06

um yep you got to plug this guy in it'll

play06:08

be fun and it'll dry out you know drown

play06:10

out a little bit of noise from your

play06:11

neighbors um but it's not it's not that

play06:13

loud um and it it does fit into your

play06:15

desktop yeah yeah but it's Cu uh even at

play06:19

75 Watts you still need some amount of

play06:21

active Cooling in this form factor I

play06:23

mean if you put this in a dual wide

play06:25

double proper height PCI card that's

play06:27

right so there's there's fun things that

play06:30

we are sort of exploring and optimizing

play06:33

with respect to the fans and the cooling

play06:35

so there'll be more fun announcements

play06:37

coming down the line for that uh this is

play06:39

what it looks like today and we were

play06:41

very eager to get these out to get them

play06:43

into the hands of developers we you know

play06:45

we're we're okay with the cooling um

play06:47

it's it's it's interesting like we just

play06:49

want people using them we want them

play06:50

going to the website downloading the

play06:52

tools plugging them in trying things out

play06:54

like just banging away on the hardware

play06:57

it's um so the way that is going to go

play07:00

for you guys who are interested in this

play07:01

stuff there's um at some point they're

play07:03

going to be available to signups um with

play07:06

that somebody will reach out and at

play07:09

least acknowledge that you're actually

play07:11

truly a developer and you're actually

play07:12

going to use this stuff and and then

play07:14

it's going to be a case of you'll be

play07:15

able to buy it can we mention the

play07:16

pricing y absolutely so so uh if I if I

play07:20

remember this correctly this is going to

play07:21

be the 599 version that's US Dollars um

play07:25

and then the 150 W version will be

play07:27

$799 yeah is that right that's right um

play07:31

which uh I know for a lot of um very you

play07:34

know entrylevel developers may seem like

play07:36

a lot of money that the the realistic

play07:39

expectation you have to have is this is

play07:41

essentially a dev kit that's right it's

play07:43

designed for developers to get a uh

play07:45

grasp with a system so there are going

play07:48

to be small medium businesses who are

play07:49

wanting to see if this is useful for

play07:51

their models and that's the sort of

play07:53

price point that it goes at the it's it

play07:55

low volume typically than a massive GPU

play07:58

launch exact so you have to do factor

play08:00

that in for reference um uh SCI did a

play08:03

board and that was

play08:05

$666 uh Qualcomm's dev kit for the Hexagon

play08:09

is is about 600 so this fits in right

play08:12

about that level exactly the ballpark is

play08:14

there we want to make the hardware

play08:15

accessible we don't want it you know to

play08:17

hit your pockets too hard like we want

play08:19

people to be excited to play around with

play08:21

it the the the the one thing I so you

play08:24

know I speak to many companies in this

play08:25

space and I I keep telling them where's

play08:27

the dev kit where's the dev kit

play08:29

accessible make it accessible these guys

play08:30

are actually doing that because it

play08:34

benefits a company like Tenstorrent to

play08:37

have you know several thousand

play08:39

developers with the even with if you

play08:41

even if it's high level still the

play08:42

framework and and and and and or the

play08:45

lower level um

play08:47

optimization um but yeah we actually

play08:50

have

play08:50

Hardware that's right you can smell

play08:54

it it's uh we're excited we're excited

play08:57

to get these into the hands developers

play08:59

to get feedback um it's going to be you

play09:02

know Tor is very very proud to ship to

play09:04

ship hardware and to have people you

play09:06

know give us feedback good or bad we

play09:08

want to hear it all so so let's go

play09:12

through a bit what that experience is

play09:13

going to be like and this won't

play09:14

definitely won't stand up I didn't plan

play09:16

for this it's okay there it'll be there

play09:19

um so so what's that going to look like

play09:22

for for for developers who get in touch

play09:25

end up with a card in hand is it is just

play09:28

a link to websites downloaded the yeah

play09:30

so the card sort of welcomes you and

play09:31

tells you where to go that's your first

play09:33

entry points from there you can sort of

play09:35

download the driver's tools and get the

play09:37

basic setup going um and that's kind of

play09:39

table Stakes so you can plug this into

play09:41

your desktop and then install our

play09:43

drivers and tools is it Linux only or

play09:46

are you supporting Windows as well we're

play09:47

not supporting Windows yet that's right

play09:49

yet she said yet yes I did say yet

play09:51

that's right that's right um it's it's

play09:53

on the road map it's not it's not there

play09:55

yet um and uh once you install the basic

play09:58

tools and drivers you can sort of check

play10:00

the health of the car you can see it

play10:02

come up uh there there tools that will

play10:04

you know give you confirmation that that

play10:06

the hardware works and that your

play10:08

computer recognizes what's has been

play10:10

plugged in uh from there you have a

play10:12

choice of going down one of two paths um

play10:16

or both preferably both preferably both

play10:18

it's an and not an or yeah yeah they are

play10:20

looking for testers so that's right

play10:21

that's right so one is a software stack

play10:23

that we call Buda and it's a compiler

play10:25

um Buda that sounds familiar it does

play10:29

yeah we talked about it a lot it's on

play10:30

our website um it's it's the compiler

play10:33

that we've been working on and uh and

play10:35

it's a it's a really fun way to get

play10:37

models working out of the box right it's

play10:40

You Know download model from hugging

play10:42

voice uh from from hugging face uh

play10:44

Buddha will compile it and it will run

play10:45

it on our so that's really really fun

play10:48

and uh we refer to this entry point as a

play10:50

you know a high level top down entry

play10:52

point because you don't have to change

play10:53

your environment you don't have to

play10:54

rewrite your model it's kind of push

play10:56

button it works it will run the hardware

play10:58

and show you what's going the other way

play11:01

is a bottom up stack that we call our

play11:02

bare metal programming model okay and

play11:05

that one is a lot lower level right so

play11:07

comes back to sort of the abstractions

play11:08

and the enter points its use case is a

play11:10

bit different right it requires you to

play11:14

you know rewrite things in Python apis

play11:18

right it's not a pie torch out of the

play11:20

box experience yeah right it's for

play11:23

developers that want to have fine grain

play11:25

control over the workloads that they're

play11:27

running on our hardware and an

play11:29

alternative path to being able to write

play11:30

kernels all the way down to kernels that

play11:32

run on our risk cores and and drive the

play11:34

heavy Math logic like custom operators

play11:37

custom operators custom data movement

play11:40

custom hacks custom you know custom

play11:43

explorations with novel ops they want to

play11:46

plug into their LLMs like control flow

play11:49

like any kind of you know fancy caching

play11:51

new embedding like like all of that is

play11:53

accessible to you as a developer to

play11:54

Tinker with and you are never sort of

play11:57

blocked by a high level abstraction layer

play11:59

you can go you can bypass it and you can

play12:01

go directly to kernels and control the low

play12:03

level Hardware in order for developers

play12:05

to do that they're going to need to have

play12:06

a deep understanding of the underlying

play12:08

architecture that's right so there's

play12:09

going to be some disclosures about the

play12:11

Tensix cores soon so the feedback that

play12:14

we got from customers that have looked

play12:16

at our bare metal software stack is you

play12:19

know they would come in and they would

play12:21

they would look at a sum and after a few

play12:22

weeks they would say we understand

play12:24

everything that's in your Hardware now

play12:27

right without the need for that

play12:28

discussion which is really really fun I

play12:29

mean we have documentations right like

play12:31

obviously you know it's a non-trivial

play12:33

entry point right and um you know we

play12:35

have documentation that explains the

play12:37

high level view of the architecture and

play12:39

the programming model you know the

play12:40

two-dimensional grid of course the NoC

play12:43

so we do set them up with with Basics

play12:46

and then what we see happen is that you

play12:47

know experts will go in and they will

play12:49

they will read our low-level programming

play12:51

model and and you know we we say like

play12:54

this layer is just a reflection a mere

play12:57

image of the hardware right so what you

play12:59

see there is what you get we don't try

play13:01

to package it for you we don't try to

play13:03

steer you this way or that way like you

play13:05

know what's in the hardware it's is

play13:07

what's there and then the bare metal

play13:08

programming model is just a reflection

play13:10

of how to directly Drive everything

play13:12

that's available in the engine which is

play13:14

really cool and I'm really excited about

play13:16

sharing that with the community well

play13:18

it's the developers you've worked with

play13:19

today and the the clients the companies

play13:21

the partners they've obviously been NDA'd

play13:24

up to the hilt until now that's right um

play13:27

but but any developer gets hands on this

play13:29

there's going to be no sort of NDA in

play13:30

place and the idea is go out there go

play13:34

talk play poke right tell us what's

play13:37

wrong tell us what's right yes we want

play13:39

that feedback we want the community

play13:42

engagement so with the hardware that

play13:44

we're going to be shipping we're also

play13:45

going to be open sourcing our full bare

play13:48

metal software stack so that means that

play13:50

you get to see the apis of course but

play13:52

you also get to see everything under the

play13:54

hood you get to see the way that the

play13:55

kernels get compiled the memory

play13:57

allocator the way that the runtime

play13:58

arguments get copied onto device the

play14:00

kernels get dispatched like you see all

play14:02

of the plumbing and functionality so

play14:05

it's kind of cool everything that

play14:07

somebody's going to write for one of

play14:08

these cards is going to be forward

play14:09

compatible with all future Tenstorrent

play14:12

Hardware right so that's an interesting

play14:15

point um maybe the right way to think

play14:19

about it is that there's two aspects of

play14:21

apis because that's kind of like API

play14:23

compatibility s process right so there

play14:25

are host apis and then there are kernel

play14:27

apis right on the host side of the apis

play14:31

um we looked at opencl we looked at Cuda

play14:34

like we're very familiar with these

play14:35

lowlevel programming models we didn't

play14:38

want to reinvent the wheel there and so

play14:40

we mimicked those apis to be intuitive

play14:43

and kind of you know behave very very

play14:46

similar and be very familiar to

play14:47

developers at that level so host apis

play14:50

are are you know relatively easy to ke

play14:52

to keep backwards compatible I'll

play14:54

probably regret that as as soon as I say

play14:56

it right um but you know they're kind of

play14:59

defined and they've matured to a certain

play15:01

degree that you know the design space is

play15:04

is not being like wildly explored in

play15:06

that area and then on the Kernel API

play15:09

side there is a strong desire to keep

play15:11

backwards compatibility that's important

play15:13

to us um however in reality you know if

play15:17

you are allowed to color outside the box

play15:20

with next gen architectures you can make

play15:22

leaps in performance and functionality

play15:25

and it's a conscious decision right to

play15:27

kind of go okay I'm going to you know

play15:29

this is a new architecture New Gen next

play15:31

gen we're going to maintain backwards

play15:33

compatibility or we're going to allow

play15:35

ourselves to color outside the box and

play15:37

make a leap so that's backwards

play15:39

compatibility but everything will be

play15:40

forwards compatible anything you write

play15:42

for this gen will work on next gen

play15:44

that's the goal that's the goal that's

play15:46

the goal I know I know you guys have

play15:47

been pretty vocal about upcoming road

play15:50

maps and as the company has taken on you

play15:53

know new clients and new investors some

play15:54

of those things are changing um from

play15:58

your pers perspective obviously it's one

play15:59

thing to support this but you've still

play16:01

got to think about what's coming down

play16:02

the line uh how much does that change

play16:05

over time what's coming down the line

play16:08

world now no I mean how how you look at

play16:10

it from from that sort of high and high

play16:12

and low level software layer um yeah

play16:15

that's a really really good question

play16:17

high level entry points are less

play16:19

susceptible to lowlevel changes yeah

play16:21

right and developers like them because

play16:24

of that right and because they give them

play16:27

a quick path to a desired outcome when

play16:30

they stay within that that sort of high

play16:33

level programming model and we want that

play16:35

and you know we've seen like to go back

play16:37

to what you were saying earlier right

play16:39

like like before there were a lot of

play16:41

Frameworks that developers used over

play16:43

time they kind of Consolidated on two

play16:45

pie torch right so there was a

play16:47

consolidation effort now we see a growth

play16:49

in number of Frameworks high level ones

play16:51

and it's interesting to knowe that you

play16:53

know like developers seem to enjoy

play16:56

things that are for a particular purpose

play16:58

mhm and we see a lot of specialization

play17:00

there like if you make a high L API that

play17:03

aims to do everything under the sun you

play17:05

usually end up with leaky abstractions

play17:07

and developers that kind of get and

play17:09

attack surface for exactly exactly so it

play17:12

seems that at the high level you know

play17:14

there are things that are special there

play17:16

are apis that are you know specializing

play17:18

towards a certain purpose Frameworks

play17:19

that are specializing towards a purpose

play17:21

and we are in that game as well um with

play17:24

the lowlevel API we want to make sure

play17:26

that developers always have access ACC

play17:28

to the hardware and that nothing is

play17:30

hidden from them and IP business is a

play17:32

big branch of ours that's fairly

play17:33

important and for a customer that's a

play17:36

potential IP customer or is an IP

play17:37

customer they want to know exactly

play17:39

what's in there yeah right they want to

play17:40

control it they want to drive it and

play17:42

then sometimes they will get ideas about

play17:46

oh I wish I had this feature that

play17:47

feature and then they can sort of

play17:49

visualize that and and and and sort of

play17:51

figure out what they could want through

play17:53

the low-level software yeah so that

play17:55

entry point is is super important for us

play17:58

um

play17:59

I mean these cards here that we got in

play18:00

front of us they're they're more for you

play18:02

know sort of the take-home put in your

play18:03

workstation but you guys have had uh

play18:07

Hardware in the cloud for a little bit

play18:09

as well that's right buil built its own

play18:12

cloud ground ground

play18:14

up no it's important for those first

play18:17

clients who are getting on board to

play18:18

start testing how's that been it's an

play18:20

easy entry point it's super convenient

play18:23

right it's you know it's SSH and off you

play18:25

go to Hello World so that's been really

play18:28

really fun it's the fastest way to get

play18:30

people to access the hardware and try

play18:32

out simple things and then also try out

play18:34

complicated things we have customers who

play18:36

are running on the cloud today and it's

play18:38

a great test bed for us right so we make

play18:41

drops to the cloud we deliver our

play18:42

software to the cloud they are our first

play18:44

internal customer and that feedback loop

play18:46

has been really really important

play18:47

customers enjoy the very quick

play18:49

turnaround that we can give them okay in

play18:51

terms of machine access and everything

play18:53

is set up and works as they just as in

play18:56

so and the Reas so the reason why why

play18:59

you're perhaps not opening so some

play19:00

companies open up a cloud to developers

play19:02

to get one instance or whatever you guys

play19:04

are going down the Hardware Route

play19:07

because we're about ANS not ores about

play19:10

or yeah we want both y we want both um

play19:13

we have a certain amount of cloud

play19:15

capacity today we utilize it almost 100%

play19:18

every single new server we put in

play19:21

there's there's a wait list of folks

play19:22

ready to try okay and uh we want to make

play19:24

sure that developers can get Hardware in

play19:27

their desktop you know they can hear it

play19:28

run and you know they can go on and

play19:30

install all the tools and and test that

play19:32

flow as well I think hardcore serious

play19:35

developers they love being able to touch

play19:37

the hardware and install and having

play19:39

things in their own hands right as

play19:41

opposed to some server distance

play19:42

somewhere that something happened to it

play19:44

and went down like it's you know it's

play19:46

about it's about empowering vs right so

play19:49

so with the cloud it's easy to do those

play19:52

very FastTrack updates especially if

play19:53

it's a client who's putting in you know

play19:56

money um and money over time for

play19:59

individual developers how are you going

play20:02

to uh discuss with that and that

play20:05

Community um about how updates are being

play20:07

rolled out and so we have releases for

play20:11

both software Stacks y um announcements

play20:14

that will go uh together with that

play20:16

release Cadence um the sort of

play20:19

underlying tools have their own releases

play20:21

as well yeah and so all of this will be

play20:22

announced on our on our website yeah and

play20:25

developers can uplift to latest versions

play20:27

as as kind of as see fit of course on

play20:29

the cloud uh this is a little bit more

play20:31

behind behind the scenes fluid yeah a

play20:34

bit more fluid right yeah so when this

play20:38

Hardware goes out you've already got a

play20:39

support staff ready to De that's right

play20:43

we have uh we're you know we're a pretty

play20:45

small team yeah I'll say um so on the

play20:48

bare metal programming model we're we're

play20:50

not a we're not a huge team we're we're

play20:52

a small team of of you know very very

play20:54

smart individuals super excited and

play20:56

super dedicated to shipping the software

play20:59

to open sourcing it and and kind of

play21:01

showing it to the community that said

play21:03

we're not yet at the stage of being able

play21:05

to service and and sort of keep up with

play21:07

large pull requests right and I think I

play21:10

think that's normal like I think that's

play21:12

kind of like a growing stage that that

play21:14

Community mostly mostly understands um

play21:17

we want to be able to develop in the

play21:19

open we want to be able to show what's

play21:21

there um you know and then and then we

play21:24

have strong ambition to grow to the

play21:25

point that that we can have strong

play21:27

collaboration with open source

play21:30

Community does it matter that there are

play21:33

several dozen other AI Hardware startups

play21:36

out there doing their own thing did you

play21:38

ever think about you know competition

play21:40

versus collaboration or anything El yeah

play21:43

I think it helps that there are other

play21:45

startups doing similar things as we do

play21:49

um you know the goal here like we're in

play21:51

the race against Nvidia and and we want

play21:54

as many players in our court as possible

play21:56

right of course we with them you know a

play21:59

lot of our colleagues work in the other

play22:00

startups we all kind of know each other

play22:01

it's right it makes for for interesting

play22:04

Thanksgiving dinners okay okay okay yeah

play22:07

but um one family M at one company one

play22:10

family member another it it's happened

play22:12

it's happened my husband and I worked at

play22:14

Xilinx and Altera at the same time so it's

play22:16

for okay yeah um yeah I think I think it

play22:21

helps I think things that you know

play22:24

activities and companies software stacks

play22:26

that that grow the community and and get

play22:28

it to be diverse like increase that

play22:30

design space it helps we learn from each

play22:32

other um and and it's a long it's a long

play22:35

path ahead I I think it's going to be

play22:37

really really fun and now we see

play22:39

software stacks that are moving away

play22:41

from just pytorch right so for us uh we

play22:45

have a road map item to integrate into

play22:46

pytorch 2.0 natively and to generate a

play22:49

pull request and to be in the open open

play22:51

source repo right so that's that's on

play22:53

our road map um but we also see that

play22:56

that there's a trend of software Stacks

play22:57

that are are being developed

play22:59

specifically for a piece of Hardware

play23:01

because you can then control the apis

play23:04

and allow users to do specific things

play23:06

that are native to that Hardware right

play23:08

without sort of forcing developers to go

play23:10

behind this General one you know to rule

play23:12

them all right API and ecosystem that

play23:15

that leads to Leaky abstractions if you

play23:17

want to do specific

play23:19

things um so I think it helps anything

play23:21

that grows the ecosystem is good and fun

play23:24

so for somebody who ends up getting this

play23:26

Hardware what's the first thing that

play23:29

they should do what's the first model

play23:31

that they should run just to make sure

play23:32

it all works you what's what's the

play23:34

Shakedown procedure what's the Shakedown

play23:36

that's cool we have for both software

play23:39

stacks um a landing page that takes you

play23:41

through first five things Y and that's

play23:44

really fun for Buddha there's a there's

play23:45

a few models you can just sort of you

play23:47

know like you're five clicks away and a

play23:49

script away from running end to end on

play23:51

the bare metal side uh there are a few

play23:53

models that are optimized for

play23:55

performance that you can run out of the

play23:56

box and then there are few few kernels

play23:58

that you can run and sort of see how

play24:00

things work end to end and the stack

play24:02

also comes with debug tools kernel print

play24:06

uh performance uh performance we

play24:08

integrated into the open source Tracy

play24:10

tool so you can see performance profile

play24:12

of what's Happening um so it's kind of

play24:14

you know what's the level of exposure

play24:16

and deep dive that you want to go in and

play24:18

there's the first five things that take

play24:19

you progressively down that path all the

play24:22

way to running running kernels and

play24:23

seeing what happens so how often do you

play24:27

have to report back to Jim on what the

play24:29

community is saying is that decided yet

play24:33

uh it's it's it's moving from hourly to

play24:36

daily yeah

play24:38

yeah it's it's it's frequent he he

play24:41

deeply cares about what the community is

play24:44

doing yeah and and he's driving Tenstorrent

play24:46

into strong awareness of of

play24:50

Community Values software development

play24:52

values and ensuring that the entry point

play24:55

is is really convenient and fun for for

play24:58

developers right so it's good you you

play25:01

you say Hardware software

play25:03

co-optimization you're very you're very

play25:04

much I I feel like so passionate about

play25:06

the software

play25:07

so it's it's it's it's good to hear it's

play25:11

great that you guys are being open and

play25:13

and essentially explain to everybody

play25:14

what you're doing right it's I have so

play25:17

many um conversations with other people

play25:19

and they just want to keep it closed up

play25:22

just for them and um and and and and you

play25:25

guys know I've been advocating for more

play25:27

Hardware the hands of people that's

play25:29

right um so so this is this is a first

play25:31

step long may continue thank you y for

play25:33

being on the channel thanks yeah it's

play25:36

really

play25:36

[Laughter]

play25:37

[Music]

play25:48

fun

