M3309.002900: Machine Listening
- Lecturer
- Prof. Kyogu Lee ([email protected])
  Department of Intelligence and Information;
  jointly affiliated with Graduate School of Artificial Intelligence
- Teaching assistant
Lecture Info
- Lectures & Labs
- Where: GSCST 18-212
- When: Mondays 09:00 ~ 11:50 AM
- Office hours
- After class (other times by appointment only)
Course Details
- Credit-Lecture-Lab: 3-3-0
- Course Completion Classification: Combined Masters/Doctorate
- Teaching materials
- Lecture slides
- Textbooks (references) you will find useful:
- Spectral Audio Signal Processing by Julius O. Smith
- Pattern Recognition and Machine Learning by Christopher Bishop
- Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Papers from relevant conferences and/or journals will be used as references.
Course Outline
Machine Listening (or Computer Audition) is, alongside Computer Vision, one of the most widely applied fields of artificial intelligence. Machine listening services, such as speech recognition in assistants like Siri and automatic music search through audio fingerprinting, are already deeply embedded in everyday life. In this course, a series of lectures covers the fundamentals of the state-of-the-art machine learning algorithms used to build artificial hearing, or machine listening, systems, including:
- LTI systems, convolution theorem, sampling theorem
- Mel spectrogram, MFCC, Spectral envelope, Linear prediction, Phase vocoder
- Onset detection, F0 detection, Chroma, CQT, Key & Chord Estimation
- Deep learning, Back-propagation, CNNs, RNNs, Transformers
- Acoustic scene classification
- Audio-text multimodal learning
- Neural harmonic-plus-noise model
Students will have a chance to implement these algorithms in the lab sessions. In the final project, we aim to build a real-world system that can be applied to audio, music, or auditory perception.