Generating Songs With Neural Networks (Neural Composer)

CodeParade
24 Jul 2018 · 12:22

Summary

TL;DR: In this video, the creator explores the potential of AI in music composition, aiming to generate a catchy song that resonates with listeners. Utilizing a dataset of video game music, the project employs a unique neural network architecture to encode and decode structured melodies. As training progresses, the generated music evolves from simplistic notes to more complex compositions. The creator shares insights on the challenges of AI-generated music and reveals the first principal component that influences time signatures. The video concludes with a showcase of the AI's musical creations, inviting viewers to appreciate the intersection of technology and art.

Takeaways

  • 🎹 The project aims to generate catchy music, focusing on creating melodies that are memorable.
  • 🎮 The creator chose video game music for its catchy, repetitive structure, making it suitable for AI generation.
  • 📊 A dataset of about 4000 piano MIDI songs was compiled to train the neural network.
  • 🔍 The music was converted into a piano roll format, treating each note as a single strike without gaps.
  • 🧠 Traditional techniques like convolutions and LSTMs were found inadequate for generating structured songs.
  • 🔄 A dense network was developed to encode and decode musical measures, effectively using autoencoders.
  • 📈 The training process was monitored by observing how principal components evolved over time.
  • 🎶 Early generated music sounded simplistic, but complexity increased with training epochs.
  • 🎛️ Real-time controls were implemented for generating and mixing songs, similar to live DJing.
  • 🕒 The first principal component revealed a correlation with time signatures, indicating a foundational aspect of the music structure.

Q & A

  • What was the primary goal of the project described in the video?

    -The primary goal was to generate at least one song that would get stuck in the creator's head, which they considered the ultimate test for good music.

  • Why did the creator choose piano as the instrument for generating music?

    -Piano MIDI files are common and allow for a simpler dataset, making it easier to generate structured music.

  • What type of music did the creator decide to use for training the AI, and why?

    -The creator chose video game music because it is often catchy, repetitive, and has a strong structure, which is ideal for generating memorable melodies.

  • What format was used to convert the music for the AI's processing?

    -The music was converted into a piano roll format, reducing every note to a single strike at its onset, without tracking sustains or rests.
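The piano-roll conversion described above can be sketched as a binary grid of pitch rows by time steps, one slab per measure. This is a minimal illustration, not the project's actual code; the grid sizes (96 time steps per measure, 96 pitch rows) and the lowest-pitch offset are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical parameters, not taken from the video:
STEPS_PER_MEASURE = 96
NUM_PITCHES = 96
LOWEST_PITCH = 16  # MIDI pitch mapped to row 0 (illustrative choice)

def notes_to_piano_roll(notes, num_measures):
    """Convert (midi_pitch, start_step) note events into a binary
    piano roll of shape (num_measures, NUM_PITCHES, STEPS_PER_MEASURE).
    Every note is a single strike: one cell set to 1 at its onset,
    with no sustain or rest information kept."""
    roll = np.zeros((num_measures, NUM_PITCHES, STEPS_PER_MEASURE), dtype=np.uint8)
    for pitch, start in notes:
        row = pitch - LOWEST_PITCH
        measure, step = divmod(start, STEPS_PER_MEASURE)
        if 0 <= row < NUM_PITCHES and measure < num_measures:
            roll[measure, row, step] = 1
    return roll

notes = [(60, 0), (64, 24), (67, 48), (60, 96)]  # C, E, G, then C again
roll = notes_to_piano_roll(notes, num_measures=2)
print(roll.shape)        # (2, 96, 96)
print(int(roll.sum()))   # 4 strikes, one cell each
```

Because only onsets are stored, a held whole note and a staccato quarter note look identical here, which is the simplification the summary alludes to.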

  • What neural network techniques did the creator initially consider, and what was the issue with them?

    -The creator considered convolutional networks and LSTMs but found that convolutions did not capture the necessary long-range relationships in piano rolls, and LSTMs were not suited for the structured songs they aimed to produce.

  • How did the creator modify the neural network to better suit their needs?

    -They created a dense network to encode each measure into a feature vector, which was then processed through a dense autoencoder to output another feature vector that could be converted back into a measure.
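The two-stage design described above (a shared per-measure encoder/decoder wrapped around a song-level dense autoencoder) can be sketched as a forward pass. All layer sizes below are illustrative assumptions, since the summary does not state the real network's dimensions, and random untrained weights stand in for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not from the video:
MEASURE_DIM = 96 * 96    # one flattened piano-roll measure
FEATURE_DIM = 200        # per-measure feature vector
MEASURES_PER_SONG = 16
LATENT_DIM = 120         # song-level bottleneck

def dense(x, w):
    """One dense layer with a ReLU nonlinearity."""
    return np.maximum(x @ w, 0.0)

# Stage 1: a shared dense encoder maps each measure to a feature vector.
w_measure_enc = rng.normal(0, 0.01, (MEASURE_DIM, FEATURE_DIM))
# Stage 2: a dense autoencoder over the concatenated feature vectors.
w_song_enc = rng.normal(0, 0.01, (MEASURES_PER_SONG * FEATURE_DIM, LATENT_DIM))
w_song_dec = rng.normal(0, 0.01, (LATENT_DIM, MEASURES_PER_SONG * FEATURE_DIM))
# Stage 3: a shared dense decoder maps each feature vector back to a measure.
w_measure_dec = rng.normal(0, 0.01, (FEATURE_DIM, MEASURE_DIM))

song = rng.random((MEASURES_PER_SONG, MEASURE_DIM))
features = dense(song, w_measure_enc)                 # (16, 200)
latent = dense(features.reshape(1, -1), w_song_enc)   # (1, 120)
decoded = dense(latent, w_song_dec).reshape(MEASURES_PER_SONG, FEATURE_DIM)
recon = dense(decoded, w_measure_dec)                 # (16, 9216)
print(latent.shape, recon.shape)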

  • What did the creator observe about the principal components during training?

    -They noticed that the largest components were not very big compared to the smallest ones, indicating less feature correlation than expected.
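The comparison of large versus small principal components mentioned above can be reproduced on any set of latent vectors. This sketch uses random stand-in data (in the project the vectors would come from the encoder's bottleneck across the dataset) and computes the components via an SVD of the centered matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in latent vectors; dimensions are illustrative assumptions.
latents = rng.normal(size=(1000, 40))

# Principal components via SVD of the mean-centered data.
centered = latents - latents.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
variances = singular_values ** 2 / (len(latents) - 1)  # sorted descending

# The observation in the video: compare the largest component's variance
# to the smallest. A ratio near 1 means the latent features are weakly
# correlated, since no direction dominates.
ratio = float(variances[0] / variances[-1])
print(variances.shape, round(ratio, 2))
```

On uncorrelated data like this the ratio stays small; strongly correlated features would push the leading variances far above the trailing ones.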

  • What changes were observed in the generated music after various epochs of training?

    -Initially, the music was repetitive and simple, but over epochs, it developed rhythm, key changes, and eventually produced melodies that sounded more like real songs.

  • What controls were implemented for real-time manipulation of the generated music?

    -Controls were added for volume, speed, and the certainty of note playback, allowing users to adjust these parameters in real-time while the music played.
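One plausible way to implement the "certainty" control described above is a threshold on the decoder's per-note output probabilities; this is an assumption about the mechanism, not confirmed by the summary.

```python
import numpy as np

def play_notes(probabilities, certainty):
    """Decide which notes to strike this time step.
    `probabilities` are per-note network outputs in [0, 1];
    `certainty` in [0, 1] acts as a live threshold knob: raising it
    keeps only the notes the network is most confident about."""
    return probabilities >= certainty

probs = np.array([0.05, 0.30, 0.55, 0.90])
print(play_notes(probs, certainty=0.5))  # strict: only confident notes
print(play_notes(probs, certainty=0.2))  # loose: more notes play
```

Turning such a knob while the song loops gives the live-DJ effect the summary mentions: lowering certainty thickens the texture, raising it thins the music down to its strongest notes.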

  • What was revealed about the first principal component at the end of the video?

    -The first principal component controlled whether the time signature was a multiple of three or four, highlighting the relationship between these signatures.


Related Tags
AI Music, Piano Melodies, Music Generation, Neural Networks, Creative Process, Data Set, Video Game Music, Training Models, Song Structure, Original Composition