Introducing “Musica” — The Large Music Language Model by Wavv

Wavv
Nov 17, 2023

By Ivan Linn & Wavv AI Labs

Wavv has been developing Musica, a Large Music Language Model (lmLM), since 2023. Musica translates emotional labels into corresponding musical notes and tones, integrates them with matching beats and tempos, and finally creates complete musical compositions.

Musica's music creation ability comes from applying mathematical equations and sequences, which make the text-emotion-music alignment more predictable. We also train the music language with self-designed modules: music samples are deconstructed into fixed module components, which are modeled using fundamental musical elements such as chords, modes, tempos, intervals, and rhythm types, forming a rich material library.

In Musica, these materials are orchestrated and combined to produce music files that precisely cater to the user’s requirements. Through this algorithm, we can ultimately generate complete music files that are devoid of copyright concerns, possess commercial viability, can be licensed, and are tradable. This research paper aims to provide a systematic analysis of the operational mechanism of Musica, a music generation system.

With the powerful music production capabilities of Musica, Wavv aims to establish an open source ecosystem that welcomes people from all backgrounds to produce and enjoy music and other audio on the Wavv platform. The ecosystem features collaboration and sharing: different users may work together on the same audio project and upload or share their audio. This endeavor positions Wavv as a critical infrastructure within the music industry value chain, fostering collaboration and innovation across diverse communities.

The paper will first introduce Wavv’s vision of building on the open source ecosystem in the music industry. Then it will delve into how Musica helps to achieve this vision.

Build on Open Source Ecosystem, Powered by Musica

Wavv is building a distributed control system that allows multiple people to work on the same music production project simultaneously. It keeps track of changes in source code during music development, enabling developers to collaborate on projects efficiently.

Wavv also provides a platform for hosting and sharing repositories of music files and related programs. Users can create repositories to store their project's data and related files, and these repositories can be either public or private. Public repositories are visible and accessible to anyone, while private repositories require permission to access.

In Wavv, developers can create branches to work on specific features or fixes without affecting the main database. This allows for parallel development and experimentation.

Wavv aims to establish a music creation community that is open to a broad spectrum of music developers, including music producers, composers, and vocalists. Developers can use the platform to generate chords, compose passages, and create entire songs using this music model. They can then use their creations on the platform and authorize others to use them in exchange for copyright fees.

In other words, Wavv will create a central platform for open-source development and collaborative music development projects and will become a critical underlying infrastructure of the music industry.

MIDI Learning Algorithm

When we analyze the specific running mechanism of Musica’s lmLM, we must first understand what language Musica uses, in other words, what format of music symbols can be recognized, trained and learned by Musica.

At present, Musica uses musical subunits composed of notes in the MIDI format. Wavv believes that it is necessary to arrange and combine the smallest basic elements in order to produce creative music that is completely different from existing music.

In the language of computers, each note is expressed in MIDI form and is recognized as a separate Token. To train the model, we transform MIDI into different Tokens, and convert the Tokens into a list of sequenced notes/chords.
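The conversion from MIDI notes to tokens can be sketched roughly as follows. This is a minimal illustration, not Musica's actual tokenizer: the token format (note names joined with a dot for chords), the `tokenize` and `build_vocab` helpers, and the input representation (tuples of MIDI pitch numbers that sound together) are all assumptions made for the example.

```python
# Hypothetical token scheme (an assumption, not Musica's real format):
# a single note becomes "C4", a chord becomes "C4.E4.G4".
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_name(midi_pitch):
    """Name a MIDI pitch number, using the convention that 60 is C4."""
    octave = midi_pitch // 12 - 1
    return f"{NOTE_NAMES[midi_pitch % 12]}{octave}"


def tokenize(events):
    """Turn groups of simultaneous MIDI pitches into note/chord tokens."""
    tokens = []
    for pitches in events:
        names = [pitch_name(p) for p in sorted(pitches)]
        tokens.append(".".join(names))
    return tokens


def build_vocab(tokens):
    """Map each distinct token to an integer id the model can consume."""
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}


# Three single notes followed by a C major chord.
events = [(60,), (64,), (67,), (60, 64, 67)]
tokens = tokenize(events)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)
print(ids)
```

The integer ids, in order, are the "list of sequenced notes/chords" that a sequence model would be trained on.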

To predict the next note, Wavv employs recurrent neural networks (RNNs), which let us produce an unlimited amount of music by iteratively feeding each generated note back into the model. Because different musical instruments have different timbre characteristics, the MIDI format, as a machine learning language, is best suited to point sound sources such as percussion instruments.
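The feed-back loop described here can be illustrated without a neural network. The sketch below swaps the RNN for a toy bigram table (counts of which token follows which), so it is not Wavv's model; it only shows the autoregressive step of sampling a note, appending it, and using it as the next input. The function names and the training sequence are made up for the example.

```python
import random
from collections import Counter, defaultdict


def fit_bigram(tokens):
    """Count which token follows which; a toy stand-in for a trained RNN."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def predict_next(table, prev, rng):
    """Sample the next token in proportion to the observed counts."""
    followers = table.get(prev)
    if not followers:
        # Dead end: restart from any token that has a known follower.
        return rng.choice(sorted(table))
    choices = sorted(followers)
    weights = [followers[c] for c in choices]
    return rng.choices(choices, weights=weights, k=1)[0]


def generate(table, seed, length, rng):
    """Autoregressive loop: each generated token becomes the next input."""
    sequence = [seed]
    while len(sequence) < length:
        sequence.append(predict_next(table, sequence[-1], rng))
    return sequence


training = ["C4", "E4", "G4", "E4", "C4", "E4", "G4", "C5"]
table = fit_bigram(training)
melody = generate(table, "C4", 16, random.Random(0))
print(melody)
```

Because the loop only ever needs the previous output, `length` can be made as large as desired, which is the sense in which generation is unbounded.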

To be more specific, Musica is based on the 12 major scales and 12 minor scales of Western music theory. Like much of Western classical music, these scales have their roots in the church modes. In the church modes, music is usually based on specific melodies and harmonies. Church modes originated in the medieval era, and are classified by their use of the diatonic collection, their final, the relationships of other pitches to that final, and their range.¹

As Western music evolved, the use of church modes gradually gave way to the emergence of the major and minor scales, which became the foundation for tonal music. Through this process, it gradually evolved into the traditional Western musical mode theory. Creating music within the framework of Western music modes is significant. These modes, rooted in historical traditions, offer a structured system that has shaped Western musical compositions for centuries. Abiding by these modes provides a foundation for composers to convey specific emotions, establish tonal relationships, and craft melodies and harmonies.
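As a small illustration of the 12 major and 12 minor scales, the sketch below builds each one from its standard whole-step and half-step pattern. It assumes natural minor for the minor scales and spells every pitch with sharps for simplicity (so flat keys are not spelled conventionally); neither choice is stated in this article.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Step patterns in semitones (2 = whole step, 1 = half step).
MAJOR_STEPS = (2, 2, 1, 2, 2, 2, 1)
MINOR_STEPS = (2, 1, 2, 2, 1, 2, 2)  # natural minor (an assumption)


def build_scale(tonic, steps):
    """Return the seven note names of a scale starting on pitch class `tonic`."""
    degrees = [tonic]
    for step in steps[:-1]:  # the last step only returns to the tonic
        degrees.append((degrees[-1] + step) % 12)
    return [NOTE_NAMES[d] for d in degrees]


all_scales = {}
for tonic in range(12):
    all_scales[f"{NOTE_NAMES[tonic]} major"] = build_scale(tonic, MAJOR_STEPS)
    all_scales[f"{NOTE_NAMES[tonic]} minor"] = build_scale(tonic, MINOR_STEPS)

print(len(all_scales))
print(all_scales["G major"])
print(all_scales["E minor"])
```

G major and E minor contain the same seven notes starting from different tonics, which is why the choice of tonic and mode, not just the note set, matters for the character of the music.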

Once all the basic units of music, such as mode, beat, tempo, and dynamics, have been collected as tokens, each music language model has its own way of arranging and combining them through mathematical operations and formulas. After extensive experimental testing and verification, Musica has developed its own unique approach.

We believe that different musical keys have different background colors, and that each expresses its own emotions. For example, music in E minor usually gives a feeling of melancholy and sadness, which we associate with dark blue; black also often represents sadness. This creates a relationship among mood, music, and color. Similarly, G major often gives people a feeling of excitement and joy, and sounds like being in a rural area. Orange and bright yellow are closer to the feeling of G major (refer to the Appendix for the example of partial Musical Emotional Language formula).
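The key-emotion-color relationship can be pictured as a simple lookup table. The sketch below contains only the two examples given in this article; the dictionary layout and the helper names `keys_for_emotion` and `keys_for_color` are hypothetical and are not Musica's actual formula.

```python
# Only the two associations stated in the text; the structure is an assumption.
KEY_PROFILES = {
    "E minor": {
        "emotions": ("melancholy", "sadness"),
        "colors": ("dark blue", "black"),
    },
    "G major": {
        "emotions": ("excitement", "joy"),
        "colors": ("orange", "bright yellow"),
    },
}


def keys_for_emotion(emotion):
    """Return the keys whose profile lists the given emotion label."""
    wanted = emotion.lower()
    return sorted(k for k, p in KEY_PROFILES.items() if wanted in p["emotions"])


def keys_for_color(color):
    """Return the keys whose profile lists the given color."""
    wanted = color.lower()
    return sorted(k for k, p in KEY_PROFILES.items() if wanted in p["colors"])


print(keys_for_emotion("Joy"))
print(keys_for_color("dark blue"))
```

A listener who knows no music theory can then ask for "joy" or "dark blue" and be routed to a key, which is the kind of lookup the text describes.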

This distinction between modes means that even people who do not understand music theory can find the music they want to hear. Musica also classifies chords by the musical effect they produce. For example, the tonic triad gives the listener a more consonant and stable feeling, while the subdominant chord, the dominant chord, and others give the listener a tense and unstable feeling. The chords are then arranged and combined according to their functional positions in music theory.

Different arrangements and combinations can also bring different experiences to people. For example, an authentic progression from the tonic triad to the dominant chord often makes the music sound richer and more relaxed, whereas a plagal progression from the tonic triad to the subdominant chord sounds more constrained by comparison. Further chord movements are encoded as formulas for different chord progressions, so that the model can arrange and combine more chords to form more diversified musical changes and bring more possibilities to music.
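To make the chord movements concrete, the sketch below builds the tonic (main) triad, the subdominant chord, and the dominant chord of a major key and spells out the two movements mentioned here. The table of progressions and the names `triad` and `realize` are illustrative assumptions rather than Musica's formulas, and pitches are spelled with sharps only.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitones above the tonic for the seven degrees of a major scale.
MAJOR_OFFSETS = (0, 2, 4, 5, 7, 9, 11)

# Scale degree (1-based) on which each chord function is built.
FUNCTIONS = {"tonic": 1, "subdominant": 4, "dominant": 5}

# Hypothetical progression "formulas" for the two movements in the text.
PROGRESSIONS = {
    "tonic_to_dominant": ("tonic", "dominant"),
    "tonic_to_subdominant": ("tonic", "subdominant"),
}


def triad(tonic, degree):
    """Build a triad on a scale degree by stacking every other scale note."""
    idx = degree - 1
    return [
        NOTE_NAMES[(tonic + MAJOR_OFFSETS[(idx + skip) % 7]) % 12]
        for skip in (0, 2, 4)
    ]


def realize(tonic, progression):
    """Spell out each chord of a named progression in the given major key."""
    return [triad(tonic, FUNCTIONS[name]) for name in PROGRESSIONS[progression]]


G = NOTE_NAMES.index("G")
print(realize(G, "tonic_to_dominant"))
print(realize(G, "tonic_to_subdominant"))
```

Adding an entry to `PROGRESSIONS` is all it takes to describe a new chord movement, which mirrors the idea of storing chord directions as reusable formulas.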

Spectrograms and Sound Waves Learning Model

In contrast, spectrograms and sound waves are better suited to machine learning on linear sound sources, such as linear instruments and vocals. So although we currently use MIDI as the primary language for machine learning, we cannot dismiss the advantages of alternative forms. In fact, Wavv uses spectrograms and sound waves for other critical applications.
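For readers unfamiliar with the term, a spectrogram records how much energy a sound has at each frequency over successive slices of time. The sketch below is a deliberately simple illustration using a naive discrete Fourier transform over non-overlapping frames; the sample rate, frame size, and test tone are arbitrary assumptions, and it says nothing about how Wavv computes or uses its spectrograms.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of each non-negative frequency bin of one frame (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        mags.append(abs(total))
    return mags


def spectrogram(samples, frame_size):
    """Split samples into non-overlapping frames and transform each one."""
    starts = range(0, len(samples) - frame_size + 1, frame_size)
    return [dft_magnitudes(samples[s:s + frame_size]) for s in starts]


SAMPLE_RATE = 8000  # samples per second (arbitrary for this example)
FRAME = 64          # samples per frame (arbitrary for this example)

# A pure 1000 Hz sine tone lasting four frames.
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(FRAME * 4)]
spec = spectrogram(tone, FRAME)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME
print(len(spec), len(spec[0]), peak_hz)
```

Unlike a MIDI token sequence, this representation is a dense grid of numbers per time slice, which is why it can capture sustained, continuously varying sounds such as vocals.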

Wavv employs a spectrogram and sound wave language model not merely to augment music creation, but, more importantly, to systematically archive existing musical assets, improve the precision of music style identification, and monitor music that may be at risk of infringement. This technology builds on established machine learning methodologies to enable new software functionality, broaden the product portfolio, and extend the range of services offered.

We aspire, within two years, for Musica to learn a comprehensive database of 100 million songs, a quantity almost equivalent to the entirety of Spotify's music collection. This repository will include pre-existing copyrighted music as well as compositions generated by Musica itself.

Conclusion

Wavv emerges as a dynamic and innovative platform at the intersection of music creation and open-source development. By leveraging an open-source ecosystem and harnessing the power of Musica, Wavv facilitates collaborative music production by implementing a distributed control system.

Musica is like a music magician.

It reorganizes and classifies different musical elements according to the emotional language of music, arranges and combines them according to the algorithmic formulas based on the western music theory, and endows music pieces with a new musical composition.

Musica is a completely new form of musical language expression, and it is also a model based entirely on musical language. At the same time, it is an excellent protector of existing copyrighted music, a function with a very large potential market that should not be neglected. Using Musica will be a new music creation experience, and each piece of music will be the music that the creator wants to hear at that moment.

Appendix: Musical Emotional Language

Musical note emotion language and color symbols

[¹]: Merritt, Justin, and David Castro. 2020. “Pentatonic and Diatonic Modes.” In Comprehensive Aural Skills, 2nd ed., 159–70. New York: Routledge.
