My journey in AI begins with grammar. Raised in a mathematical home, I think I was discovering prime numbers when most kids learn the alphabet.
The Quick Brown, Mechanical Fox
Math seemed to make sense, but English (which may be called “Letters”, “Language”, “Writing”, etc. outside the USA or UK) was frustrating: it lacked properties such as the commutative law and order of operations, unless you count “I before E…”. Ask 1,000 children the answer to 2*3 + 4 and you should (hopefully) get the same answer. Ask the same children to describe their summer vacation and you get 1,000 unique versions.
Learning about grammar was much more comforting. Here were some rules to govern language and give it structure. Many years later, the development and subsequent popularity of Natural Language Processing (NLP) methods and tools have tightly bound mathematics and language—whether you prefer infinite series or split infinitives, you should be happy.
Language Models and LSTM
It turns out that you teach a machine to write differently than you’d teach human students in Freshman composition. If you consider language to be merely a sequence of words (with some punctuation), there is a very effective model known as a “Recurrent Neural Network” (RNN) that can guess which token (i.e., word) comes next by analyzing a very, very large number of sequences.
A further refinement, the “Long Short-Term Memory” (LSTM) architecture, lets the machine learn how much to “remember & forget” about the previous sequence when predicting the next token. An elegant outcome of this method is the ability to generate “original” output. This now-classic blog includes some background on LSTM and a repo to get you generating “brand-new” Shakespeare in minutes. (GitHub Link)
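To make the “remember & forget” idea concrete, here is a minimal sketch of a single LSTM step in NumPy. The sizes, random initialization, and variable names are illustrative assumptions, not the implementation used by any particular library; the point is just to show the gates that decide how much of the old cell state survives.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [x; h_prev] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h_prev]) + b
    H = h_prev.size
    f = sigmoid(z[0:H])        # forget gate: how much of the old cell state to keep
    i = sigmoid(z[H:2*H])      # input gate: how much new information to write
    o = sigmoid(z[2*H:3*H])    # output gate: how much of the cell state to expose
    g = np.tanh(z[3*H:4*H])    # candidate values for the cell state
    c = f * c_prev + i * g     # the "remember & forget" trade-off happens here
    h = o * np.tanh(c)         # new hidden state, used to predict the next token
    return h, c

rng = np.random.default_rng(0)
X, H = 10, 16                              # input and hidden sizes (arbitrary)
W = rng.normal(scale=0.1, size=(4 * H, X + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):                         # feed a short sequence of token vectors
    h, c = lstm_step(rng.normal(size=X), h, c, W, b)
```

In a real model, `h` would feed a softmax over the vocabulary to score each candidate next token, and `W` and `b` would be learned by backpropagation through time rather than drawn at random.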
Every Good Boy Does Fine*
* This is a common mnemonic for the ascending lines on the treble clef: E-G-B-D-F.
Since language is an ordered sequence of words, and LSTM can be used to teach a machine to generate sequences that seem reasonable, I wondered whether similar techniques could be applied to music. The sequence here is the melody and timing (plus chords, rests, lyrics, etc.). An open-source package that provides a good introduction and a few interactive tools is Google’s Magenta.
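The idea of melody-as-sequence can be sketched without any neural network at all: encode each note as a (pitch, duration) token, then predict the next token from observed successors. This toy frequency model stands in for the LSTM and is not how Magenta works; the melody (the opening of “Twinkle, Twinkle”) and the token format are illustrative choices.

```python
from collections import Counter, defaultdict

# A melody as a sequence of (pitch, duration-in-beats) tokens.
melody = [("C4", 1), ("C4", 1), ("G4", 1), ("G4", 1),
          ("A4", 1), ("A4", 1), ("G4", 2),
          ("F4", 1), ("F4", 1), ("E4", 1), ("E4", 1),
          ("D4", 1), ("D4", 1), ("C4", 2)]

# Count which token follows which.
follows = defaultdict(Counter)
for prev, nxt in zip(melody, melody[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the token most often observed right after `token`."""
    return follows[token].most_common(1)[0][0]

print(predict_next(("G4", 2)))  # -> ('F4', 1)
```

An LSTM plays the same role as the `follows` table here, except that it conditions on the whole history of the piece (as much as its gates choose to remember) rather than on just the previous note.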
Magenta has a demo called “Piano Genie” that turns an 8-note keyboard into a full 88-key piano experience. It uses a pair of LSTMs to convert the input (8 keys) into a sequence that can then be decoded into notes on the full keyboard (88 keys), providing a mapping that can be considered an improvisation.
The “Vanilla Ice” Turing Test
If you experimented with the links above, you may have noticed that the output is musically correct, but chances are you won’t be whistling the melodies in the shower tomorrow. Maybe it’s not fair to compare AI compositions to greats such as Mozart and Beethoven, whose works have been appreciated and played by piano students (myself included) for more than 200 years.
It isn’t just computers that have trouble writing great songs. Humans do, too. We have a somewhat-derogatory name for those who manage this amazing feat only once: “one-hit wonder.” Inclusion in this category is highly subjective, but Vanilla Ice seems to earn an entry on most compilations (one-hit wonders even have a dedicated day).
It seems that we’re still far off from a machine creating something artistically unique that isn’t just a dexterous rearrangement of the examples it has been presented with. The implication for today’s data scientist should be that regardless of the complexity of your machine-learning models, the bright ideas still need to come from you. When a machine can compose a piece of music regarded as better than Vanilla Ice, then it may be time to face the music…