How to make an AI Startup worth over $30M | Twelve Labs' Jae Lee
Summary
TL;DR: Jae, the co-founder and CEO of Twelve Labs, shares the inspiring journey of building an AI research and product company focused on developing video foundation models. He narrates how the team overcame challenges, participated in a prestigious competition, and secured partnerships with industry giants like Nvidia. Twelve Labs aims to empower developers and enterprises with cutting-edge AI models that can deeply understand video content, enabling applications like semantic search, classification, and summarization. With a 'video-first' ethos, the company tackles the massive problem of making sense of the vast video data in the world, impacting sectors like law enforcement and media.
Takeaways
- 😄 Twelve Labs is an AI research and product company building video foundation models that can understand videos like humans, serving developers and enterprises via APIs.
- 🔍 Their models aim to map human language to video content, enabling capabilities like semantic search, classification, and summarization of videos.
- 🏆 Participating in and winning a major video understanding competition helped Twelve Labs gain exposure and attract customers and investors.
- 💻 The founders started the company while still in the military, working on laptops at a bagel shop during their free time.
- 🌟 The company's 'secret sauce' is going head-on with the video understanding problem instead of reframing it as language or image understanding.
- 📈 Twelve Labs has over 20,000 developers actively using their search API, with millions of monthly API calls and rapid growth in enterprise adoption.
- 🎯 Setting an incredibly ambitious goal and having the determination to solve hard problems is crucial for founders to achieve impactful outcomes.
- 🛡️ Building a moat by gathering unique data and fine-tuning smaller models, instead of relying solely on foundation models, is important for long-term success.
- 📢 Effective communication and the ability to explain technical products to non-technical audiences is vital for widespread adoption and impact.
- 🔭 The founders believe AI will present amazing opportunities for tech and humanity, and all products will be impacted by AI in the future.
Q & A
What is the main goal of Twelve Labs?
-Twelve Labs aims to build massive AI models that can understand videos like humans and provide video understanding capabilities to developers and enterprises through APIs for tasks like semantic search, classification, and summarization.
How did Twelve Labs start, and what was the founding story?
-Twelve Labs was founded by three co-founders who were serving in the Korean Cyber Command. They would meet during their weekends and work on their ideas before they were all discharged. The founding story was quite challenging as they had to coordinate their efforts while still in the military.
What was the significance of Twelve Labs participating in the ICCV competition?
-Participating in the ICCV (International Conference on Computer Vision) competition for video understanding helped Twelve Labs gain exposure and recognition from potential customers and investors interested in multimodal AI and video understanding.
How does the Twelve Labs AI model work, and what data is used for training?
-The Twelve Labs model aims to map human language to video content, enabling capabilities like search, classification, and summarization. It is trained on large amounts of video data, with the help of data partners who provide labeled data and licensed content in a copyright-friendly manner.
What is the current status of Twelve Labs' product adoption and usage?
-As of June 2023, Twelve Labs had soft-launched their search API, which is actively used by over 20,000 developers and has crossed a couple million monthly API calls. The company is also seeing adoption from enterprise customers, including large creators, media/entertainment organizations, sports organizations, and law enforcement agencies.
What advice does the CEO of Twelve Labs give to founders and technical product builders?
-The CEO advises founders to be patient and able to explain their technology and its impact to different audiences. He also emphasizes the importance of building a moat and not relying too heavily on foundation models, as well as setting ambitious goals and having the determination to solve incredibly hard problems.
What was the approach taken by Twelve Labs in building their video understanding technology?
-Twelve Labs took a "video-first" approach, building their machine learning pipeline and systems specifically for handling videos from the ground up, instead of reframing the problem into other domains like language or image understanding.
How did the partnership with NVIDIA come about for Twelve Labs?
-Jensen Huang, the CEO of NVIDIA, seemed to have a special interest in computer vision and video understanding, which was one of the first use cases for NVIDIA chips. NVIDIA's venture team reached out to Twelve Labs, seeing a perfect match between their vision and what Twelve Labs was doing in video understanding.
What are some of the mission-critical use cases mentioned for Twelve Labs' products?
-One use case mentioned is digital evidence management for police departments, where Twelve Labs' technology can help search for specific evidence in body cam footage quickly and generate police reports more efficiently, reducing time spent by up to 40%.
What is the CEO's perspective on the impact of AI and the importance of staying ahead of the technology curve?
-The CEO believes that AI will present amazing opportunities not only for tech but also for humanity. He emphasizes the importance of learning more about the technology and discerning trends to build a moat, as technology advancements like foundation models can potentially impact businesses relying too heavily on them.
Outlines
🤝 The Formation of Twelve Labs and Partnership with NVIDIA
This paragraph discusses the founding of Twelve Labs, a company focused on building video foundation models for developers and enterprises. It details the initial meeting between the founders and Jensen from NVIDIA, where they discussed Twelve Labs' vision and NVIDIA's interest in computer vision and video understanding. The paragraph also provides background on the founders, their experience, and the funding and partnerships they've secured, including with NVIDIA, Intel, and Samsung.
🎥 Twelve Labs' Video Foundation Model and Its Applications
This paragraph delves into the core technology behind Twelve Labs: their video foundation model. It explains how the model is designed to understand videos like humans, mapping human language to video content, enabling capabilities like semantic search, classification, and summarization. The paragraph also discusses Twelve Labs' data partnerships, their approach to licensing and using data responsibly, and the process of training the model on millions of video examples. It also mentions the launch of their search API, its adoption by developers and enterprises across various sectors like law enforcement and media entertainment, and their growth plans.
💡 Advice for Building Impactful Products and Maintaining a Competitive Edge
This paragraph offers advice for founders and companies aiming to build impactful products and maintain a competitive edge. It emphasizes the importance of being able to explain and communicate the value of one's technology to different audiences, not just experts in the field. The paragraph also cautions against relying too heavily on foundation models like GPT and advises companies to gather unique data and fine-tune smaller models to create a sustainable moat. Additionally, it stresses the importance of setting ambitious goals and maintaining determination to solve incredibly hard problems, as this will drive companies to achieve more impactful results.
Keywords
💡Computer Vision
💡Video Understanding
💡Foundation Model
💡APIs
💡Digital Evidence Management
💡Machine Learning Pipeline
💡Multimodal AI
💡Moat
💡Fine-tuning
💡Ambitious Goal
Highlights
Nvidia's venture team reached out to Twelve Labs due to their work in computer vision and video understanding, which was a perfect match with Nvidia's interests.
Twelve Labs is an AI research and product company building video foundation models for developers and enterprises to understand videos like humans.
The founding story of Twelve Labs involved the co-founders working on the company while still in the Korean Cyber Command, meeting at a bagel shop on weekends with their laptops.
Twelve Labs took a 'video-first' ethos, not reframing the problem into language or image understanding, but innovating technologies specifically for video understanding.
Participating in the International Conference on Computer Vision competition and winning helped Twelve Labs gain exposure and attract customers and investors.
Twelve Labs' foundation model aims to map human language to video content, enabling emergent capabilities like search, classification, and summarization.
Twelve Labs soft-launched their search API in June 2023, with over 20,000 developers actively using it and crossing millions of monthly API calls.
Explaining technical products and their impact to non-technical audiences is incredibly important for widespread adoption.
Building a moat by gathering unique data to fine-tune smaller models is crucial, as relying too much on foundation models can be dangerous for a business.
Having an ambitious goal and determination to solve incredibly hard problems is crucial for founders, as it fuels their journey and leads to greater impact.
Transcripts
Aiden and I had a chance to meet with Jensen.
We had, I think, 5 to 10 minutes to talk about Twelve Labs.
And it seems like Jensen has always had a special place in his heart
for computer vision and video understanding, since that was one of the first
use cases that Nvidia chips powered.
And then Nvidia's venture team reached out to us.
We were talking about the future that we're drawing, and I think the venture
team also had an idea of what Twelve Labs is doing and what Nvidia wants.
Vision and video understanding was just a perfect match.
Hi, my name is Jae.
I'm one of the co-founders and CEO of Twelve Labs. Twelve Labs is an AI
research and product company based here in San Francisco and Seoul.
We're building video foundation models for developers and enterprises
building video centric products.
We basically build humongous AI models that can understand videos like humans,
and we serve them via APIs to developers looking to build really powerful
semantic search, classification, or summarization into their products.
And we started the company about two and a half years ago with five people.
And right now we are a little over 40 people across Seoul and San Francisco,
with $30 million raised in seed funding from firms like Index Ventures and
Radical Ventures, and recent partnerships with Nvidia, Intel, and Samsung.
So, a foundation model is basically this really large AI model that can do
many things at once.
It's at the foundation of providing intelligence to software.
So the idea was, hey, the problem that we're solving is massive.
80% of the world's data is in video, and there's no adequate technology
out there for developers and enterprises to make sense of it all.
So that is the market that we're tackling.
We want to index all of that, that 80% of the world's data.
So US police departments own terabytes and petabytes' worth
of body cam footage; we call this space digital evidence management.
And police officers spend a lot of time looking for specific evidence
within the content they've captured, for auditing purposes or for writing reports.
So if you think about it, you know, their main job is to be out on the streets
helping the citizens, protecting the security of this nation.
And, you know, they're spending too much time searching for things.
So Twelve Labs' search and generate APIs can help them search for digital
evidence incredibly fast, and time spent on writing police reports
is also cut down by more than 40%.
These are some of the mission-critical use cases
where Twelve Labs' products are being used.
So the founding story is wild, man, because, you know, we didn't all join
the Korean Cyber Command at the same time.
So like, okay, we decided we're gonna start this company,
but then SJ is like leaving next year and then I'm leaving the year after,
and then Aiden's like six months after.
So how do we do it? Right.
So it was genuinely very scary. But we had our plan:
okay, SJ, when you get discharged, you bring all of our laptops
and you visit us every weekend.
You take us out, and then we'll go to a bagel shop and we'll work.
I clearly remember talking to Aiden after SJ had left.
I remember SJ was discharged on a Thursday, and he came back to the military base
that Saturday with our laptops. In front of our military base
there was a bagel shop called La Bagel, and that was like our office: we'd
bring all of our laptops and do our research there.
So we did that for a good year until everyone was out.
Some people say ignorance is bliss.
And I think we were just like really naive and just really excited
about building this company.
I guess not knowing what was ahead allowed us to do what we did.
You have to understand: how are we creating a stronger product by leveraging AI?
Are you just showing your investors that, oh, we've implemented GPT into our
product, or is it actually creating value?
And are you capturing value at a level that OpenAI or other companies
can't really capture?
So I think the secret sauce of Twelve Labs is just going head-on with the problem.
I think a lot of companies might have failed because they tried reframing
the video understanding problem into language understanding or image
understanding or speech understanding.
Right.
Makes sense because that's where we've seen most improvement with large language
models with amazing speech to text models.
So it's easy to think that, oh, video is incredibly hard.
How can I reframe that problem into something that's already solved?
And sometimes it works, but for some really important tasks,
that approach does not work.
For Twelve Labs, we've always had a video-first ethos, so when we created
everything from our machine learning pipeline and systems to our models,
we always had videos in mind: our systems should be able to handle
petabytes' worth of data, as well as really long videos.
Videos are usually, you know, not ten or 15 seconds long.
They're like two, three, four hours long.
So we had that video first ethos like from the get go, and we had to build
a bunch of new technologies to support it.
So not taking the shortcut and going head on with the problem and not reframing it,
and really innovating is probably the secret sauce that we have.
The important thing is, if you're building something really impactful and you
think that it's going to significantly change the industry that you're in,
there will always be someone that has a very similar thesis.
It's just a matter of how do you get yourself out there?
How do you let the people know that you exist?
And for us, that was the competition.
Figuring out what impactful thing we could do, given our current
resources, that would put us on the map, or at least, you know, let the world know
that what we're doing is relevant.
So our tactic here was: okay, we're going to talk to a bunch of customers.
And there were early believers in Twelve Labs who took our APIs and built awesome things
with us, but we needed more exposure.
Basically, we were already getting a lot of questions like:
how is Google better than you?
Are you actually better than Google? Are you actually better than Microsoft?
So as a team, we decided to participate in ICCV,
the International Conference on Computer Vision.
They were putting on this awesome competition for video understanding:
basically, a competition dedicated to evaluating AI models'
ability to understand videos.
Back then I talked to Aiden: hey, Aiden, I think we should participate
and see what we can do.
We have nothing to lose and only to gain.
The team was extremely supportive of Aiden spearheading
that effort with a limited team.
I think we only had like three team members back then.
All I could do to support was give some ideas and
directional feedback to Aiden, but we needed compute,
and we needed the determination to put some serious cash behind it.
Back then, for Twelve Labs, $200,000 in compute was a lot of money for us.
Right.
And just thinking that, okay, we're going to blow through $200,000
in compute in ten days was really scary.
But the team was able to use that capital, that precious capital,
and build something incredible that helped us win the competition.
And serious customers and investors that really care about video
and solving these video problems had their eyes on that competition, right?
That's how we were able to get a lot of inbounds from not only customers
but also investors with a thesis around multimodal AI. That really set us
off on fast iterations of building out our initial set of APIs, so that our
customers could test them and build trust in the technology,
the model that Twelve Labs is building.
So the task that it's optimizing for is basically trying
to map human language to whatever is happening in video content.
Right?
So if you can map precise human language to whatever is happening
within video content, that gives you these emergent capabilities, like being able to
search for things really well, or being able to classify things, or summarize.
Right. So that is what the model is doing.
And in terms of data, we work with amazing data partners who provide us
with labeled data or help us license the data that we need, in a very
copyright-friendly manner.
Right.
And we have the model watch hundreds of millions of video-text pairs
and try to learn the ability to map precise human language to video content.
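The mapping Jae describes can be pictured as retrieval in a shared embedding space: a text query and each video clip are embedded as vectors, and clips are ranked by cosine similarity to the query. The sketch below uses random toy vectors standing in for real model outputs; it is purely illustrative and is not Twelve Labs' actual model or API.

```python
import math
import random

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search_clips(query_emb, clip_embs, top_k=3):
    # Rank clips by embedding similarity to the query embedding:
    # the basic mechanism behind cross-modal semantic search.
    scored = sorted(clip_embs.items(),
                    key=lambda kv: cosine_similarity(query_emb, kv[1]),
                    reverse=True)
    return [clip_id for clip_id, _ in scored[:top_k]]

# Toy embeddings standing in for outputs of a video-language model.
random.seed(0)
clips = {f"clip_{i}": [random.gauss(0, 1) for _ in range(8)] for i in range(5)}
# Pretend the text query embeds very close to clip_2's embedding.
query = [x + random.gauss(0, 0.05) for x in clips["clip_2"]]

print(search_clips(query, clips)[0])  # clip_2 should rank first
```

A real system would replace the toy vectors with embeddings from a trained video-language model and the sorted list with an approximate nearest-neighbor index, but the ranking logic is the same.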
Twelve Labs soft-launched our search API in June of 2023.
We currently have a little over 20,000 developers
that are actively using our search API.
We've crossed a couple million monthly API calls.
The company is also really excited about enterprise customers
adopting our technology.
So we have some of the largest creators in the world adopting Twelve Labs,
as well as media, entertainment, large sports organizations, and law
enforcement organizations leveraging this technology, growing pretty rapidly.
We're adding a couple million monthly API calls
quarter after quarter.
So hopefully with the new model releases, we hit 100 million API calls monthly soon.
If you're highly technical and you're building a deeply technical product,
you're probably working on it to change the world.
And the world is not populated only by technical people or
experts who understand your technology.
There are technical folks who might be fans of your work, but then there is
the 99% of the world that needs to understand.
So being patient, and being able to explain what you do and why it's
impactful to different audiences, is incredibly important.
I'm in the AI space, so I find AI fascinating, and I think it's going to present
this amazing opportunity not only for tech but also for humanity.
So I think all products, whether you're in the creator economy space, or even
blockchain, or some very traditional brick-and-mortar retail, are
all going to be impacted by AI. What's really important here is being
able to learn more about the technology.
Once you learn more about the technology and can discern
where the trend is going, that allows you to build your own moat,
one that sits well clear of the technology advancement curve, I would say.
So, companies that are building foundation models: the idea is that you keep making
these models stronger and stronger so that they're able to solve,
you know, more complex problems.
So let's say you're in copywriting, and what you've done is maybe build
a UI on top of OpenAI's GPT model.
It's not that OpenAI wants to hurt your business; it's just part
of their technology improvement journey.
If the GPT models get better and better, they're probably going to get
much better at copywriting.
And if what you've built is just kind of a wrapper around them, it's going
to get affected, regardless of their intention of affecting you or not.
Thinking about your moat is incredibly important.
What unique data can you gather to fine-tune smaller models
and make your business better? That question is incredibly important,
and relying too much on foundation models can be dangerous.
So the thing is: given the proprietary data that you have, outsource
some of the really hard, intelligence-requiring tasks to foundation models,
but building your entire business on top of OpenAI's or other companies'
APIs could hurt you, right?
So I think that's probably the best advice I can give.
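One way to picture this advice is as a routing policy: a small model fine-tuned on your proprietary data handles the core tasks, and only low-confidence cases fall back to an external foundation-model API. The sketch below is a toy illustration with hypothetical stand-ins (`small_model_classify`, `call_foundation_api`); it is not a real library, provider API, or anything Twelve Labs describes.

```python
def small_model_classify(text):
    # Stand-in for a small model fine-tuned on proprietary data.
    # Returns a (label, confidence) pair; here, keyword matching
    # plays the role of the trained model.
    keywords = {"refund": "billing", "invoice": "billing", "crash": "bug"}
    for word, label in keywords.items():
        if word in text.lower():
            return label, 0.9
    return "unknown", 0.3

def call_foundation_api(text):
    # Placeholder for an expensive external foundation-model call;
    # a real system would hit a provider's API here.
    return "general"

def classify(text, threshold=0.5):
    # Route: trust the in-house model when it is confident,
    # and outsource only the hard, low-confidence cases.
    label, confidence = small_model_classify(text)
    if confidence >= threshold:
        return label
    return call_foundation_api(text)

print(classify("Please process my refund"))  # billing (in-house model)
print(classify("Hello there"))               # general (fell back to API)
```

The design point is that the business's differentiated value lives in `small_model_classify` (trained on data only you have), while the external API is a replaceable fallback rather than the foundation of the whole product.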
Having an ambitious goal, and the determination to solve incredibly
hard problems, is crucial for every founder, because there will come a time
when you're going to have to settle.
And if your goal was not big to begin with, then I think you will
end up in a place where what you're doing isn't very impactful.
So setting an incredibly ambitious goal, one you can probably never even reach,
gives you the fuel to go forward.
A North Star that you can never quite reach is crucial.
You will always feel like you're far away from that goal,
but in chasing it, you will have achieved a lot more.