Sound Classification and Detection with STFT and CNNs



Applications of audio-based machine learning include virtual assistants, automatic speech recognition, speech-to-text, firearm locators, early detection of vehicle accidents, wildlife monitoring, audio anomaly detection, denoising, and music classification.

After completing this workshop, you will be able to use a short-time Fourier transform to convert audio into features suitable for machine learning models, and to apply these features to sound classification and sound detection tasks. You will also be able to develop your own feature generation pipeline for audio data, and to implement and adapt published sound detection and classification models for your particular use case.

In part 1, we will discuss the characteristics of sound waveforms, import sample audio, and produce spectrograms using a short-time Fourier transform (STFT). We will also investigate how the choice of window function (rectangular, triangular, or Hann) affects the STFT output.
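As a rough illustration of why the window choice matters, the sketch below computes a magnitude spectrogram with NumPy alone and compares how much energy each window spreads away from the true frequency of a pure tone. It is a minimal sketch, not the workshop's code: the workshop itself uses Librosa, and the function name `stft_magnitude`, the 16 kHz sample rate, the 440 Hz test tone, and the frame and hop sizes are all assumptions chosen for the example.

```python
import numpy as np


def stft_magnitude(signal, frame_length=1024, hop_length=256, window="hann"):
    """Return a magnitude spectrogram of shape (frame_length // 2 + 1, n_frames)."""
    # The three window shapes named in the abstract (illustrative NumPy versions).
    windows = {
        "rectangular": np.ones(frame_length),
        "triangular": np.bartlett(frame_length),
        "hann": np.hanning(frame_length),
    }
    w = windows[window]
    # Number of complete frames that fit in the signal (no padding).
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * w
        for i in range(n_frames)
    ])
    # One real FFT per frame, then transpose to (frequency, time).
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Assumed test signal: one second of a 440 Hz sine sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

leakage = {}
for name in ("rectangular", "triangular", "hann"):
    # Average power per frequency bin across all frames.
    mean_power = (stft_magnitude(tone, window=name) ** 2).mean(axis=1)
    peak = int(np.argmax(mean_power))
    # Fraction of energy that lands more than 3 bins away from the peak.
    near = mean_power[max(peak - 3, 0): peak + 4].sum()
    leakage[name] = 1.0 - near / mean_power.sum()
    print(f"{name:12s} peak bin {peak}, energy outside peak +/- 3 bins: {leakage[name]:.2e}")
```

Because 440 Hz does not fall exactly on a frequency bin at these settings, the rectangular window leaks noticeably more energy into distant bins than the Hann window does, which shows up as smearing along the frequency axis of a spectrogram.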

In part 2, we will work through a wildlife monitoring use case. We will use STFT to transform audio recordings of rainforest sounds into spectrograms, cut these spectrograms into time slices, and classify the slices by species. We will then extend the classification task to a sound detection task, creating bounding boxes around the time periods during which species calls are present.
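The slicing and detection steps described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the workshop's pipeline: `slice_spectrogram` and `merge_detections` are hypothetical helper names, the spectrogram is synthetic, and the per-slice score is simply the mean energy of each slice, standing in for the probability a trained classifier would produce.

```python
import numpy as np


def slice_spectrogram(spec, slice_width, stride):
    """Cut a (n_freq, n_frames) spectrogram into fixed-width slices along time."""
    starts = np.arange(0, spec.shape[1] - slice_width + 1, stride)
    slices = np.stack([spec[:, s:s + slice_width] for s in starts])
    return slices, starts


def merge_detections(starts, scores, slice_width, threshold, seconds_per_frame):
    """Merge overlapping above-threshold slices into (start_s, end_s) intervals."""
    intervals = []
    for start, score in zip(starts, scores):
        if score < threshold:
            continue
        begin = start * seconds_per_frame
        end = (start + slice_width) * seconds_per_frame
        if intervals and begin <= intervals[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([begin, end])
    return [(b, e) for b, e in intervals]


# Synthetic spectrogram: 64 frequency bins, 100 frames, with a "call" in frames 40-59.
spec = np.zeros((64, 100))
spec[:, 40:60] = 1.0

slices, starts = slice_spectrogram(spec, slice_width=20, stride=10)

# Stand-in for classifier output: mean energy per slice (an assumption, not a model).
scores = slices.mean(axis=(1, 2))

# Assumed STFT settings: hop of 256 samples at 16 kHz gives 0.016 s per frame.
detections = merge_detections(starts, scores, slice_width=20,
                              threshold=0.4, seconds_per_frame=256 / 16000)
print(detections)
```

Each returned tuple marks the start and end time, in seconds, of one detected call. Extending the box along the frequency axis would additionally require a per-slice frequency range, which this sketch does not attempt.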

This workshop will use JupyterLab running inside a Docker container preloaded with the required packages, TensorFlow and Librosa. Some familiarity with TensorFlow and/or audio data will help, but is not required for this workshop.

After feature generation, the neural network training shares some similarity with image tasks, so this workshop may also be informative for those seeking to learn more about image classification and object detection.


Ryan Kasichainula is a data science instructor at Galvanize, Inc., an industry leader in technology education that runs immersive bootcamps in data science and software engineering. They are also an independent data consultant with experience in the technology, agriculture, energy, and pharmaceutical industries. Ryan enjoys applying data science techniques to a wide variety of domains, and they always have at least one side project in the works, usually in the realm of natural language generation.

Open Data Science




