Easy AI Voice Cloning with KITS AI - Online Platform and API Usage

Jarods Journey
1 Mar 2024, 32:14

TLDR: The video provides an in-depth tutorial on using KITS AI, an online platform for voice cloning and conversion. The host demonstrates how to create an account, navigate the website, and utilize the API for voice conversion projects without needing RVC or UVR on your computer. The process includes training an RVC voice, uploading and converting voices, and using YouTube videos for conversion. The host also compares the output of KITS AI with RVC-trained voices and discusses the platform's plans and pricing, highlighting the free plan's limitations and the benefits of the Creator plan for unlimited download minutes.


  • 🌐 KITS AI is an online platform offering services like voice conversion and voice cloning without the need for local installations of RVC or UVR.
  • 🔍 Users can create an account with various providers such as Google or Discord to access the platform.
  • 📚 The platform includes sections for conversion, training tools, and a library where users can manage their voice models.
  • 🎓 Training a voice model on KITS AI involves uploading files and applying effects like Harmony and De-Reverb; the training itself can take a long time (about 11 hours in the video).
  • 🔄 The conversion process allows for pitch adjustment and volume blending, and supports using YouTube links for conversion input.
  • 📈 KITS AI provides an API for developers to integrate voice conversion into their projects, with documentation available for guidance.
  • 💾 Downloading converted files uses up the user's allocated minutes, which varies based on the subscription plan.
  • 🔉 The platform also includes tools like a vocal remover and AI mastering, although the effectiveness of these tools may vary.
  • 📈 Users can blend different voices using the platform's blender tool, allowing for the creation of unique voice models.
  • 📝 The video provides a step-by-step guide on how to use the KITS AI API within a Python script for audio conversion.
  • 📉 The free plan on KITS AI allows for unlimited conversions but restricts the ability to download the results, requiring an upgrade for full access.

Q & A

  • What is the main purpose of Kits AI?

    -Kits AI is an online service that provides voice conversion and voice cloning capabilities. It allows users to train their own voice models and perform audio conversions without needing specific software like RVC or UVR on their local machine.

  • How does one get started with Kits AI?

    -To get started with Kits AI, one needs to log in to their website and create an account using Google, Discord, or another preferred method. From there, users can access the various tools and features offered by the platform.

  • What are the two main components of Kits AI that the video discusses?

    -The video discusses the website interface of Kits AI and its API. The website allows users to train and convert voices, while the API enables developers to integrate voice conversion into their own projects or applications.

  • How long did it take to train a voice model on Kits AI in the video?

    -It took approximately 11 hours to train a voice model on Kits AI in the video.

  • What is the process for converting a voice on Kits AI?

    -To convert a voice on Kits AI, users select a voice model, input the audio or a YouTube link they want to convert, adjust settings such as pitch and conversion strength, and then initiate the conversion process. The platform will then process the audio and provide the converted result.

  • How does Kits AI handle the storage of voice models and conversions?

    -All voice models and conversions are stored on Kits AI's servers. Users can access their trained models and conversions from their account on the Kits AI platform.

  • What are the different types of plans and pricing offered by Kits AI?

    -Kits AI offers a free plan with limited capabilities, a Converter plan at $9.99 per month with 30 download minutes, and a Creator plan with unlimited download minutes and additional slots for composers. The free plan allows for voice conversions but does not permit downloads.

  • How can users utilize Kits AI's API for voice conversion in their own projects?

    -Users can make API calls to Kits AI to perform voice conversions programmatically. This involves setting up an API key, constructing the appropriate HTTP requests with headers and parameters, and handling the responses to initiate and retrieve the results of voice conversions.

  • What is the role of the 'Blender' tool in Kits AI?

    -The 'Blender' tool in Kits AI allows users to merge two different voice models, creating a blended voice. This can be used to achieve unique vocal effects and to customize the sound further.

  • How does Kits AI handle the conversion of YouTube videos?

    -Kits AI enables users to input a YouTube link directly into the conversion tool. The platform then processes the video to extract the audio, which can then be used for voice conversion.

  • What are the limitations of the free plan on Kits AI?

    -On the free plan, users can perform voice conversions using Kits AI's pre-trained voices but are unable to download the converted files. To download files, users need to upgrade to a paid plan.



🌐 Introduction to Kits AI and Its Features

The video introduces Kits AI, an online service that consolidates tools like RVC (Retrieval-based Voice Conversion) and UVR (Ultimate Vocal Remover) into a single platform. The host demonstrates how to access the website, create an account, and navigate the interface. Two main topics are covered: the website's functionality and its API, which allows for voice conversion without the need for RVC or UVR on the user's computer. The service requires an account and has associated plans and pricing. The left-hand side of the website provides access to conversion, training, and library tools. The training tab is used for voice cloning, where one voice can be converted to another for various applications. The process involves uploading files, applying audio effects, and initiating training, which can be time-consuming. Once trained, users can convert voices and utilize pre-trained models. The video also covers uploading and converting voices, using YouTube videos for conversion, and adjusting conversion settings.


🎤 Exploring Download Minutes and Text-to-Speech

The host explains the concept of download minutes, which are consumed when downloading converted audio but not during conversions. The ability to switch between different voice models is demonstrated, and a text-to-speech feature is showcased using a YouTube prompt. The host also discusses the use of the Kits blender for merging voices and adjusting blend ratios. Additionally, tools such as the vocal remover (similar to UVR) and AI mastering are briefly mentioned, with a note that AI mastering is a newer feature that may receive improvements in the future.


📚 Accessing Voice Models and API Documentation

The video transitions to using the Kits API within a programming environment, specifically using Python. The host guides viewers through setting up a virtual environment, installing necessary packages, and accessing the API documentation for voice model retrieval. The process involves creating a Python script, importing required modules, and constructing a request to fetch voice models using the API key and specific parameters. The host also demonstrates how to handle the API response and parse the returned JSON data to extract voice model IDs.
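The model-listing step described above can be sketched as follows. This is a minimal illustration, not the code from the video: it uses only Python's standard library so it runs without extra packages, and the base URL, the `/voice-models` path, the `page`/`perPage` query parameters, the Bearer scheme, and the `{"data": [...]}` response shape are all assumptions to check against the Kits API documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder base URL; take the real one from the API documentation.
BASE_URL = "https://api.example.com/v1"


def build_voice_models_request(api_key, base_url=BASE_URL, page=1, per_page=10):
    """Build (but do not send) a GET request for the list of voice models."""
    # Assumption: pagination is controlled by 'page' and 'perPage' query parameters.
    query = urllib.parse.urlencode({"page": page, "perPage": per_page})
    return urllib.request.Request(
        f"{base_url}/voice-models?{query}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed Bearer scheme
        method="GET",
    )


def extract_model_ids(payload):
    """Map model ID to title from a parsed payload shaped like {"data": [...]}."""
    return {item["id"]: item.get("title", "") for item in payload.get("data", [])}


def fetch_voice_models(api_key):
    """Send the request and parse the JSON body (needs network and a valid key)."""
    with urllib.request.urlopen(build_voice_models_request(api_key)) as response:
        return json.load(response)


# Parsing a hand-written sample payload, so no network call is needed here.
sample = json.loads('{"data": [{"id": 101, "title": "Demo Voice"}]}')
print(extract_model_ids(sample))
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before spending any API calls.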


🔄 Performing Voice Conversion Using the API

The host continues the API demonstration by showing how to perform a voice conversion. This involves setting up a POST request with the necessary headers, parameters, and data, including the voice model ID and the sound file to be converted. The process of sending the conversion request and handling the response is detailed, including checking the status code and extracting the job ID from the conversion data. The host also outlines the steps to retrieve the completed conversion using a GET request and the corresponding job ID.
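The submit-then-retrieve flow above can be sketched as a small polling loop. The video only describes a POST followed by a GET with the job ID; repeating the GET until the job finishes is an addition here. The HTTP call is passed in as a plain callable (`fetch_job`, a hypothetical name) so the control flow is visible without network access, and the `id` and `status` keys and the status strings `"success"`, `"error"`, and `"cancelled"` are assumptions rather than confirmed API values.

```python
import time

# Assumptions: status strings are illustrative; confirm them in the API docs.
DONE_STATUS = "success"
FAILED_STATUSES = {"error", "cancelled"}


def extract_job_id(conversion_response):
    """Pull the job ID out of the parsed response to the conversion POST."""
    return conversion_response["id"]  # assumed key name


def wait_for_conversion(fetch_job, job_id, interval_s=2.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_job(job_id) repeatedly until the job succeeds, fails, or times out."""
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        status = job.get("status")
        if status == DONE_STATUS:
            return job
        if status in FAILED_STATUSES:
            raise RuntimeError(f"conversion {job_id} ended with status {status!r}")
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"conversion {job_id} not finished after {max_attempts} checks")
```

In a real script, `fetch_job` would wrap the GET request for a single conversion and return its parsed JSON.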


📚 Parsing Job Data and Downloading Conversions

The host demonstrates parsing the job data to extract the output file URL, which is a temporary link for downloading the conversion. The process of downloading the conversion into the current working directory is shown, including handling any necessary imports and running the script to save the file. The successful completion of the voice conversion job is confirmed by playing the converted audio for the audience.
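The download step can be sketched like this. Because the output link is temporary, it is assumed here to carry a signed query string, so the file name is taken from the URL path only. The function names and the `conversion.wav` fallback are hypothetical, and the sketch uses the standard library rather than whatever HTTP package the video relies on.

```python
import posixpath
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def filename_from_url(url, default="conversion.wav"):
    """Derive a local file name from the URL path, ignoring any query string."""
    path = urllib.parse.unquote(urllib.parse.urlparse(url).path)
    name = posixpath.basename(path)
    # Fall back to a default for empty or directory-like names.
    return name if name not in ("", ".", "..") else default


def download_to_cwd(url, directory=None):
    """Stream the file at url into directory (current working directory by default)."""
    target = Path(directory or Path.cwd()) / filename_from_url(url)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)  # copy in chunks, not all at once
    return target
```

Since the link expires, the download should run soon after the job reports completion.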


🎚️ Comparing RVC Trained Voice to Kits AI Voice

The host conducts a comparison between an RVC trained voice and a Kits AI trained voice. The same audio file is processed through both methods, and the host adjusts parameters to match settings between RVC and Kits for a fair comparison. The audience is encouraged to judge which conversion they prefer. The host also tests the conversion with different pitch settings to see how it handles variations outside the normal range.


💰 Plans, Pricing, and Subscription Considerations

The video concludes with an overview of the plans and pricing for Kits AI. The host outlines the differences between the free plan, which allows for unlimited conversions without downloads, and the paid plans: the Converter plan offers a set number of download minutes, while the Creator plan offers unlimited download minutes. The host also mentions the additional composer slots available with paid plans and encourages viewers to check the FAQ for more information. The video ends with a thank you to Kits for the collaboration and a note on the provided affiliate and non-affiliate links for signing up.




💡KITS AI

KITS AI is an online platform that provides voice conversion services, including voice cloning and text-to-speech functionalities. It is the central focus of the video, as the speaker discusses how to use the platform and its API for various voice-related tasks. The platform is used to train and convert voices, as well as to manipulate pitch and other vocal characteristics.

💡Voice Cloning

Voice cloning refers to the process of replicating a voice to make it sound like another person's voice. In the video, the speaker demonstrates how to use KITS AI for voice cloning, which involves training the system with voice samples to convert one voice to another. This is showcased as a key feature of the platform.


💡API

An API, or Application Programming Interface, is a set of rules and protocols that allows different software applications to communicate with each other. The video explains how developers can use KITS AI's API to integrate voice conversion capabilities into their own projects without needing to host the service locally.


💡UVR

UVR, or Ultimate Vocal Remover, is a tool mentioned in the video that is used for separating vocals from instrumentals in audio tracks. KITS AI offers a comparable vocal remover, allowing users to clean up vocals and remove instruments from songs.


💡RVC

RVC, or Retrieval-based Voice Conversion, is a technique for converting one voice into another, as mentioned in the video. It underlies the voice cloning process, in which the user trains a model on specific voice samples to achieve the desired voice conversion.

💡Pre-trained Model

A pre-trained model in the context of the video refers to a machine learning model that has already been trained on a large dataset. KITS AI uses a different pre-trained model for training their voices, which is said to result in better pitch accuracy and vocal conversion quality.

💡Conversion Strength

Conversion strength is a parameter within the KITS AI platform that determines the intensity of the voice conversion process. The speaker in the video adjusts this setting to control how much the input voice is altered during the conversion to match the target voice.
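As an illustration of how settings such as conversion strength and pitch might be passed to the API, the sketch below assembles the form fields for a conversion request. The field names (`voiceModelId`, `conversionStrength`, `pitchShift`, `modelVolumeMix`) and the value ranges (0 to 1 for strength and volume mix, -24 to 24 semitones for pitch) are assumptions for illustration; the API documentation is the authority on both.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def build_conversion_fields(voice_model_id, conversion_strength=0.5, pitch_shift=0, volume_mix=0.5):
    """Return form fields for a conversion request, with out-of-range values clamped."""
    return {
        "voiceModelId": str(voice_model_id),
        # Assumed range 0.0 to 1.0: how strongly the input is pushed toward the target voice.
        "conversionStrength": str(clamp(float(conversion_strength), 0.0, 1.0)),
        # Assumed range -24 to 24 semitones.
        "pitchShift": str(int(round(clamp(pitch_shift, -24, 24)))),
        # Assumed range 0.0 to 1.0: blend between input volume and model volume.
        "modelVolumeMix": str(clamp(float(volume_mix), 0.0, 1.0)),
    }


print(build_conversion_fields(101, conversion_strength=0.8, pitch_shift=-3))
```

Clamping on the client side is a design choice to fail softly; rejecting out-of-range values with an error would be equally reasonable.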

💡Download Minutes

Download minutes are a part of the KITS AI subscription plan that dictates how much audio a user can download after conversion. The video mentions different plans, including a free plan with limited download capabilities and a Creator plan with unlimited download minutes.
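The budgeting this implies can be sketched with simple arithmetic. The video does not say how usage is rounded, so per-second accounting is assumed here purely for illustration, and the function names are hypothetical; the 30-minute figure is the Converter plan allowance mentioned in the video.

```python
def remaining_minutes(plan_minutes, downloaded_seconds):
    """Minutes left after the listed downloads (assumes per-second accounting)."""
    used = sum(downloaded_seconds) / 60
    return max(0.0, plan_minutes - used)


def can_download(plan_minutes, downloaded_seconds, next_clip_seconds):
    """True if the next clip fits in the remaining allowance."""
    return next_clip_seconds / 60 <= remaining_minutes(plan_minutes, downloaded_seconds)


# 30 minutes per month, with a 10-minute and a 5-minute clip already downloaded.
print(remaining_minutes(30, [600, 300]))
print(can_download(30, [600, 300], 900))
```

An unlimited plan can be modeled by passing `float("inf")` as `plan_minutes`.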


💡Text-to-Speech

Text-to-speech (TTS) is a technology that converts written text into spoken words. The video demonstrates the TTS feature of KITS AI, where the user can input text and have it converted into a voice recording using the trained voice models.


💡Blender

In the context of the video, a blender refers to a feature within KITS AI that allows users to merge two different voice models to create a new, blended voice. This can be used for creative purposes to achieve unique vocal tones.

💡Python Scripting

The video also covers how to use KITS AI's API with Python scripting. The speaker shows the audience how to write Python code to interact with the KITS AI API for fetching voice models and performing audio conversions programmatically.
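One common way to start such a script is to keep the API key out of the source file and build the request headers from it. The sketch below assumes an environment variable named `KITS_API_KEY` (a hypothetical name) and a Bearer authorization scheme; neither is confirmed by the video summary.

```python
import os

# Hypothetical environment variable name, chosen for this example.
API_KEY_ENV_VAR = "KITS_API_KEY"


def load_api_key(environ=None):
    """Read the API key from the environment, failing loudly if it is missing."""
    env = os.environ if environ is None else environ
    key = env.get(API_KEY_ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"set {API_KEY_ENV_VAR} before running the script")
    return key


def auth_headers(api_key):
    """Build request headers; the Bearer scheme is an assumption."""
    return {"Authorization": f"Bearer {api_key}"}
```

Reading the key from the environment avoids committing it to version control alongside the script.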


KITS AI is an online service that integrates RVC and UVR functionalities into one platform.

Users can train an RVC voice for voice cloning to convert one voice to another for various applications.

The training process can be time-consuming, taking about 11 hours in the video's example.

KITS AI offers an API for developers to perform voice conversion without needing RVC or UVR on their computers.

All data is stored on KITS AI, and there are different plans and pricing for using the service.

The platform allows for text-to-speech conversion with a quality that is considered good, though not as refined as some other tools.

KITS AI provides a 'blender' tool to merge two voices and adjust the blend ratio, creating a unique voice model.

The vocal remover tool functions similarly to UVR, separating vocals from instrumentals in a track.

AI mastering is a newer feature on KITS AI; it changes the audio somewhat, but the results may not be as impressive as expected.

The platform allows for an unlimited number of conversions, with charges only applied for downloading the results.

Different subscription plans offer varying features, including download minutes and the ability to clone voices.

The video demonstrates how to use KITS AI's API for audio conversion within a Python script.

An API key is required to authenticate and use the KITS AI API for development purposes.

The process of fetching voice models and performing voice conversion via API is detailed in the video.

KITS AI's conversion process involves creating a POST request with specific parameters and headers.

The output of a conversion job includes a URL that can be used to download the converted audio file.

A comparison between RVC-trained voice and KITS AI-trained voice is presented to evaluate the quality of conversions.

The video concludes with a discussion on the different subscription plans available for KITS AI users.