publications:
- title: "Automatic Drum Transcription Using the Student-Teacher Learning Paradigm with Unlabeled Music Data"
  author:
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2017"
  doi: "https://ismir2017.smcnus.org/wp-content/uploads/2017/10/148_Paper.pdf"
  abstract: "Automatic drum transcription is a sub-task of automatic music transcription that converts drum-related audio events into musical notation. While noticeable progress has been made in the past by combining pattern recognition methods with audio signal processing techniques, the major limitation of many state-of-the-art systems still originates from the difficulty of obtaining a meaningful amount of annotated data to support the data-driven algorithms. In this work, we address the challenge of insufficiently labeled data by exploring the possibility of utilizing unlabeled music data from online resources. Specifically, a student neural network is trained using the labels generated from multiple teacher systems. The performance of the model is evaluated on a publicly available dataset. The results show the general viability of using unlabeled music data to improve the performance of drum transcription systems."
  links:
    doi: "https://ismir2017.smcnus.org/wp-content/uploads/2017/10/148_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/WuL17"
  researchr: "https://researchr.org/publication/WuL17-21"
  cites: 0
  citedby: 0
  pages: "613-620"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "WuL17-21"
- title: "Learning to Traverse Latent Spaces for Musical Score Inpainting"
  author:
  - name: "Ashis Pati"
    link: "https://researchr.org/alias/ashis-pati"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Gaëtan Hadjeres"
    link: "https://researchr.org/alias/ga%C3%ABtan-hadjeres"
  year: "2019"
  doi: "http://archives.ismir.net/ismir2019/paper/000040.pdf"
  links:
    doi: "http://archives.ismir.net/ismir2019/paper/000040.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/PatiLH19"
  researchr: "https://researchr.org/publication/PatiLH19"
  cites: 0
  citedby: 0
  pages: "343-351"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "PatiLH19"
- title: "Automatic Sample Detection in Polyphonic Music"
  author:
  - name: "Siddharth Gururani"
    link: "https://researchr.org/alias/siddharth-gururani"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2017"
  doi: "https://ismir2017.smcnus.org/wp-content/uploads/2017/10/118_Paper.pdf"
  abstract: "The term `sampling' refers to the usage of snippets or loops from existing songs or sample libraries in new songs, mashups, or other music productions. The ability to automatically detect sampling in music is, for instance, beneficial for studies tracking artist influences geographically and temporally. We present a method based on Non-negative Matrix Factorization (NMF) and Dynamic Time Warping (DTW) for the automatic detection of a sample in a pool of songs. The method comprises of two processing steps: first, the DTW alignment path between NMF activations of a song and query sample is computed. Second, features are extracted from this path and used to train a Random Forest classifier to detect the presence of the sample. The method is able to identify samples that are pitch shifted and/or time stretched with approximately 63% F-measure. We evaluate this method against a new publicly available dataset of real-world sample and song pairs."
  links:
    doi: "https://ismir2017.smcnus.org/wp-content/uploads/2017/10/118_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/GururaniL17"
  researchr: "https://researchr.org/publication/GururaniL17"
  cites: 0
  citedby: 0
  pages: "264-271"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "GururaniL17"
- title: "Assessment of Percussive Music Performances with Feature Learning"
  author:
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  doi: "https://doi.org/10.1142/S1793351X18400147"
  links:
    doi: "https://doi.org/10.1142/S1793351X18400147"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/journals/ijsc/WuL18"
  researchr: "https://researchr.org/publication/WuL18-45"
  cites: 0
  citedby: 0
  journal: "ijsc"
  volume: "12"
  number: "3"
  pages: "315-333"
  kind: "article"
  key: "WuL18-45"
- title: "Learned Features for the Assessment of Percussive Music Performances"
  author:
  - name: "Wu, Chih-Wei"
    link: "https://researchr.org/alias/wu%2C-chih-wei"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  abstract: "The automatic assessment of (student) music performance involves the characterization of the audio recordings and the modeling of human judgments. To build a computational model that provides a reliable assessment, the system must take into account various aspects of a performance including technical correctness and aesthetic standards. While some progress has been made in recent years, the search for an effective feature representation remains open-ended. In this study, we explore the possibility of using learned features from sparse coding. Specifically, we investigate three sets of features, namely a baseline set, a set of designed features, and a feature set learned with sparse coding. In addition, we compare the impact of two different input representations on the effectiveness of the learned features. The evaluation is performed on a dataset of annotated recordings of students playing snare exercises. The results imply the general viability of feature learning in the context of automatic assessment of music performances."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/01/Wu_Lerch_2018_Learned-Features-for-the-Assessment-of-Percussive-Music-Performances.pdf"
  researchr: "https://researchr.org/publication/wulearned2018-0"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Conference on Semantic Computing ({ICSC})"
  kind: "inproceedings"
  key: "wulearned2018-0"
- title: "Learning to code through MIR"
  author:
  - name: "Xambo, Anna"
    link: "https://researchr.org/alias/xambo%2C-anna"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Freeman, Jason"
    link: "https://researchr.org/alias/freeman%2C-jason"
  year: "2016"
  abstract: "An approach to teaching computer science (CS) in high- schools is using EarSketch, a free online tool for teaching CS concepts while making music. In this demonstration we present the potential of teaching music information retrieval (MIR) concepts using EarSketch. The aim is twofold: to discuss the benefits of introducing MIR concepts in the classroom and to shed light on how MIR concepts can be gently introduced in a CS curriculum. We conclude by identifying the advantagesofteachingMIR inthe classroom and pointing to future directions for research."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2016/08/Xambo-et-al.-2016-Learning-to-code-through-MIR.pdf"
  researchr: "https://researchr.org/publication/xambolearning2016"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR}), Late Breaking Demo ({Extended} {Abstract})"
  kind: "inproceedings"
  key: "xambolearning2016"
- title: "Analysis of Speech Rhythm for Language Identification Based on Beat Histograms"
  author:
  - name: "Lykartsis, Athanasios"
    link: "https://researchr.org/alias/lykartsis%2C-athanasios"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Weinzierl, Stefan"
    link: "https://researchr.org/alias/weinzierl%2C-stefan"
  year: "2015"
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/06/Lykartsis%20et%20al_2015_Analysis%20of%20Speech%20Rhythm%20for%20Language%20Identification%20Based%20on%20Beat%20Histograms.pdf"
  researchr: "https://researchr.org/publication/lykartsisanalysis2015"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the DAGA ({Jahrestagung} fur {Akustik})"
  kind: "inproceedings"
  key: "lykartsisanalysis2015"
- title: "Live Repurposing of Sounds: MIR Explorations with Personal and Crowdsourced Databases"
  author:
  - name: "Anna Xambó"
    link: "https://researchr.org/alias/anna-xamb%C3%B3"
  - name: "Gerard Roma"
    link: "https://researchr.org/alias/gerard-roma"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Mathieu Barthet"
    link: "https://researchr.org/alias/mathieu-barthet"
  - name: "György Fazekas"
    link: "https://researchr.org/alias/gy%C3%B6rgy-fazekas"
  year: "2018"
  doi: "http://www.nime.org/proceedings/2018/nime2018_paper0081.pdf"
  links:
    doi: "http://www.nime.org/proceedings/2018/nime2018_paper0081.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/nime/XamboRLBF18"
  researchr: "https://researchr.org/publication/XamboRLBF18"
  cites: 0
  citedby: 0
  pages: "364-369"
  booktitle: "nime"
  kind: "inproceedings"
  key: "XamboRLBF18"
- title: "Multi-{Track} Crosstalk Reduction"
  author:
  - name: "Seipel, Fabian"
    link: "https://researchr.org/alias/seipel%2C-fabian"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  abstract: "While many music-related blind source separation methods focus on mono or stereo material, the detection and reduction of crosstalk in multi-track recordings is less researched. Crosstalk or ?bleed? of one recorded channel in another is a very common phenomenon in specific genres such as jazz and classical, where all instrumentalists are recorded simultaneously. We present an efficient algorithm that estimates the crosstalk amount in the spectral domain and applies spectral subtraction to remove it. Randomly generated artificial mixtures from various anechoic orchestral source material were employed to develop and evaluate the algorithm, which scores an average SIR-Gain result of 15.14dB on various datasets with different amounts of simulated crosstalk."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Seipel-and-Lerch-2018-Multi-Track-Crosstalk-Reduction.pdf"
  researchr: "https://researchr.org/publication/seipelmulti-track2018"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the Audio Engineering Society Convention"
  kind: "inproceedings"
  key: "seipelmulti-track2018"
- title: "Evaluation of Feature Learning Methods for Voice Disorder Detection"
  author:
  - name: "Hongzhao Guan"
    link: "https://researchr.org/alias/hongzhao-guan"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2019"
  doi: "https://doi.org/10.1142/S1793351X19400191"
  links:
    doi: "https://doi.org/10.1142/S1793351X19400191"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/journals/ijsc/GuanL19"
  researchr: "https://researchr.org/publication/GuanL19-2"
  cites: 0
  citedby: 0
  journal: "ijsc"
  volume: "13"
  number: "4"
  pages: "453-470"
  kind: "article"
  key: "GuanL19-2"
- title: "Improving Singing Voice Separation using {Attribute}-{Aware} Deep Network"
  author:
  - name: "Swaminathan, Rupak Vignesh"
    link: "https://researchr.org/alias/swaminathan%2C-rupak-vignesh"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2019"
  abstract: "Singing Voice Separation (SVS) attempts to separate the predominant singing voice from a polyphonic musical mixture. In this paper, we investigate the effect of introducing attribute-specific information, namely, the frame level vocal activity information as an augmented feature input to a Deep Neural Network performing the separation. Our study considers two types of inputs, i.e, a ground-truth based ?oracle? input and labels extracted by a state-of-the-art model for singing voice activity detection in polyphonic music. We show that the separation network informed of vocal activity learns to differentiate between vocal and non-vocal regions. Such a network thus reduces interference and artifacts better compared to the network agnostic to this side information. Results on the MIR1K dataset show that informing the separation network of vocal activity improves the separation results consistently across all the measures used to evaluate the separation quality."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/Swaminathan-and-Lerch-2019-Improving-Singing-Voice-Separation-using-Attribute.pdf"
  researchr: "https://researchr.org/publication/swaminathanimproving2019"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Workshop on Multilayer Music Representation and Processing ({MMRP})"
  kind: "inproceedings"
  key: "swaminathanimproving2019"
- title: "An Unsupervised Approach to Anomaly Detection in Music Datasets"
  author:
  - name: "Yen-Cheng Lu"
    link: "https://researchr.org/alias/yen-cheng-lu"
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Chang-Tien Lu"
    link: "https://researchr.org/alias/chang-tien-lu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2016"
  doi: "http://doi.acm.org/10.1145/2911451.2914700"
  links:
    doi: "http://doi.acm.org/10.1145/2911451.2914700"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/sigir/LuWLL16"
  researchr: "https://researchr.org/publication/LuWLL16"
  cites: 0
  citedby: 0
  pages: "749-752"
  booktitle: "sigir"
  kind: "inproceedings"
  key: "LuWLL16"
- title: "Music Information Retrieval"
  author:
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2014"
  researchr: "https://researchr.org/publication/lerchmusic2014"
  cites: 0
  citedby: 0
  pages: "79-102"
  booktitle: "Akustische Grundlagen der Musik"
  number: "5"
  series: "Handbuch der Systematischen Musikwissenschaft"
  publisher: "Laaber"
  isbn: "978-3-89007-699-7"
  kind: "incollection"
  key: "lerchmusic2014"
- title: "Drum Transcription Using Partially Fixed Non-Negative Matrix Factorization with Template Adaptation"
  author:
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2015"
  doi: "http://ismir2015.uma.es/articles/199_Paper.pdf"
  links:
    doi: "http://ismir2015.uma.es/articles/199_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/WuL15"
  researchr: "https://researchr.org/publication/WuL15-20"
  cites: 0
  citedby: 0
  pages: "257-263"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "WuL15-20"
- title: "Beat Histogram Features for {Rythm}-based Musical Genre Classification Using Multiple Novelty Functions"
  author:
  - name: "Lykartsis, Athanasios"
    link: "https://researchr.org/alias/lykartsis%2C-athanasios"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2015"
  abstract: "In this paper we present beat histogram features for multiple level rhythmdescriptionandevaluatetheminamusicalgenreclassifica- tion task. Audio features pertaining to various musical content cat- egories and their related novelty functions are extracted as a basis for the creation of beat histograms. The proposed features capture not only amplitude, but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identi- fied through feature selection methods and are then tested using Support Vector Machines on five genre datasets concerning classi- fication accuracy against a baseline feature set. Results show that the presented features provide comparable classification accuracy with respect to other genre classification approaches using period- icity histograms and display a performance close to that of much more elaborate up-to-date approaches for rhythm description. The use of bar boundary annotations for the texture frames has pro- vided an improvement for the dance-oriented Ballroom dataset. The comparably small number of descriptors and the possibility of evaluating the influence of specific signal components to the gen- eral rhythmic content encourage the further use of the method in rhythm description tasks."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2015/12/DAFx-15_submission_42-1.pdf"
  researchr: "https://researchr.org/publication/lykartsisbeat2015"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Conference on Digital Audio Effects (DAFX)"
  kind: "inproceedings"
  key: "lykartsisbeat2015"
- title: "From Labeled to Unlabeled Data - On the Data Challenge in Automatic Drum Transcription"
  author:
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  doi: "http://ismir2018.ircam.fr/doc/pdfs/185_Paper.pdf"
  links:
    doi: "http://ismir2018.ircam.fr/doc/pdfs/185_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/WuL18"
  researchr: "https://researchr.org/publication/WuL18-21"
  cites: 0
  citedby: 0
  pages: "445-452"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "WuL18-21"
- title: "Music Performance Analysis: A Survey"
  author:
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Claire Arthur"
    link: "https://researchr.org/alias/claire-arthur"
  - name: "Ashis Pati"
    link: "https://researchr.org/alias/ashis-pati"
  - name: "Siddharth Gururani"
    link: "https://researchr.org/alias/siddharth-gururani"
  year: "2019"
  doi: "http://archives.ismir.net/ismir2019/paper/000002.pdf"
  links:
    doi: "http://archives.ismir.net/ismir2019/paper/000002.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/LerchAPG19"
  researchr: "https://researchr.org/publication/LerchAPG19"
  cites: 0
  citedby: 0
  pages: "33-43"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "LerchAPG19"
- title: "Tuning Frequency Dependency in Music Classification"
  author:
  - name: "Qin, Yi"
    link: "https://researchr.org/alias/qin%2C-yi"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2019"
  doi: "10.1109/ICASSP.2019.8683340"
  abstract: "Deep architectures have become ubiquitous in Music Information Retrieval (MIR) tasks, however, concurrent studies still lack a deep understanding of the input properties being evaluated by the networks. In this study, we show by the example of a Music Genre Classification system the potential dependency on the tuning frequency, an irrelevant and confounding variable. We generate adversarial samples through pitch-shifting the audio data and investigate the classification accuracy of the output depending on the pitch shift. We find the accuracy to be periodic with a period of one semitone, indicating that the system is utilizing tuning information. We show that proper data augmentation including pitch-shifts smaller than one semitone helps minimizing this problem and point out the need for carefully designed augmentation procedures in related MIR tasks."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/04/Qin-and-Lerch-2019-Tuning-Frequency-Dependency-in-Music-Classificatio.pdf"
  researchr: "https://researchr.org/publication/qintuning2019"
  cites: 0
  citedby: 0
  pages: "401-405"
  booktitle: "Proceedings of the International Conference on Acoustics Speech and Signal Processing ({ICASSP})"
  kind: "inproceedings"
  key: "qintuning2019"
- title: "Music Information Retrieval in Live Coding: A Theoretical Framework"
  author:
  - name: "Anna Xambó"
    link: "https://researchr.org/alias/anna-xamb%C3%B3"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Jason Freeman"
    link: "https://researchr.org/alias/jason-freeman"
  year: "2018"
  doi: "https://doi.org/10.1162/comj_a_00484"
  links:
    doi: "https://doi.org/10.1162/comj_a_00484"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/journals/comj/XamboLF18"
  researchr: "https://researchr.org/publication/XamboLF18"
  cites: 0
  citedby: 0
  journal: "comj"
  volume: "42"
  number: "4"
  pages: "9-25"
  kind: "article"
  key: "XamboLF18"
- title: "Concert Stitch: Organization and Synchronization of Crowd Sourced Recordings"
  author:
  - name: "Vinod Subramanian"
    link: "https://researchr.org/alias/vinod-subramanian"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  doi: "http://ismir2018.ircam.fr/doc/pdfs/182_Paper.pdf"
  links:
    doi: "http://ismir2018.ircam.fr/doc/pdfs/182_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/SubramanianL18"
  researchr: "https://researchr.org/publication/SubramanianL18"
  cites: 0
  citedby: 0
  pages: "608-614"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "SubramanianL18"
- title: "Learning Strategies for Voice Disorder Detection"
  author:
  - name: "Guan, Hongzhao"
    link: "https://researchr.org/alias/guan%2C-hongzhao"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2019"
  abstract: "Voice disorder is a health issue that is frequently encountered, however, many patients either cannot afford to visit a professional doctor or neglect to take good care of their voice. In order to give a patient a preliminary diagnosis without using professional medical devices, previous research has shown that the detection of voice disorders can be carried out by utilizing machine learning and acoustic features extracted from voice recordings. Considering the increasing popularity of deep learning and feature learning, this study explores the possibilities of using these methods to assign voice recordings into one of the two classes?Normal and Pathological. While the results show the general viability of deep learning and feature learning for the automatic recognition of voice disorder, they also demonstrate the shortcomings of the existing datasets for this task such as insufficient dataset size and lack of generality."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2019/01/Guan-and-Lerch-2019-Learning-Strategies-for-Voice-Disorder-Detection.pdf"
  researchr: "https://researchr.org/publication/guanlearning2019"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Conference on Semantic Computing ({ICSC})"
  kind: "inproceedings"
  key: "guanlearning2019"
- title: "A Review of Automatic Drum Transcription"
  author:
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Christian Dittmar"
    link: "https://researchr.org/alias/christian-dittmar"
  - name: "Carl Southall"
    link: "https://researchr.org/alias/carl-southall"
  - name: "Richard Vogl"
    link: "https://researchr.org/alias/richard-vogl"
  - name: "Gerhard Widmer"
    link: "https://researchr.org/alias/gerhard-widmer"
  - name: "Jason Hockman"
    link: "https://researchr.org/alias/jason-hockman"
  - name: "Meinard Müller"
    link: "https://researchr.org/alias/meinard-m%C3%BCller"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  doi: "https://doi.org/10.1109/TASLP.2018.2830113"
  links:
    doi: "https://doi.org/10.1109/TASLP.2018.2830113"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/journals/taslp/WuDSVWHML18"
  researchr: "https://researchr.org/publication/WuDSVWHML18"
  cites: 0
  citedby: 0
  journal: "taslp"
  volume: "26"
  number: "9"
  pages: "1457-1483"
  kind: "article"
  key: "WuDSVWHML18"
- title: "Instrument Activity Detection in Polyphonic Music using Deep Neural Networks"
  author:
  - name: "Siddharth Gururani"
    link: "https://researchr.org/alias/siddharth-gururani"
  - name: "Cameron Summers"
    link: "https://researchr.org/alias/cameron-summers"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  doi: "http://ismir2018.ircam.fr/doc/pdfs/275_Paper.pdf"
  links:
    doi: "http://ismir2018.ircam.fr/doc/pdfs/275_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/GururaniSL18"
  researchr: "https://researchr.org/publication/GururaniSL18"
  cites: 0
  citedby: 0
  pages: "569-576"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "GururaniSL18"
- title: "Automatic Assessment of Sight-reading Exercises"
  author:
  - name: "Jiawen Huang"
    link: "https://researchr.org/alias/jiawen-huang"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2019"
  doi: "http://archives.ismir.net/ismir2019/paper/000070.pdf"
  links:
    doi: "http://archives.ismir.net/ismir2019/paper/000070.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/HuangL19"
  researchr: "https://researchr.org/publication/HuangL19-34"
  cites: 0
  citedby: 0
  pages: "581-587"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "HuangL19-34"
- title: "Automatic Practice Logging: Introduction, Dataset & Preliminary Study"
  author:
  - name: "R. Michael Winters"
    link: "https://researchr.org/alias/r.-michael-winters"
  - name: "Siddharth Gururani"
    link: "https://researchr.org/alias/siddharth-gururani"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2016"
  doi: "https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/181_Paper.pdf"
  links:
    doi: "https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/181_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/WintersGL16"
  researchr: "https://researchr.org/publication/WintersGL16"
  cites: 0
  citedby: 0
  pages: "598-604"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "WintersGL16"
- title: "Chord Detection Using Deep Learning"
  author:
  - name: "Xinquan Zhou"
    link: "https://researchr.org/alias/xinquan-zhou"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2015"
  doi: "http://ismir2015.uma.es/articles/96_Paper.pdf"
  links:
    doi: "http://ismir2015.uma.es/articles/96_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/ZhouL15"
  researchr: "https://researchr.org/publication/ZhouL15-13"
  cites: 0
  citedby: 0
  pages: "52-58"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "ZhouL15-13"
- title: "On Drum Playing Technique Detection in Polyphonic Mixtures"
  author:
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2016"
  doi: "https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/268_Paper.pdf"
  links:
    doi: "https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/268_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/WuL16"
  researchr: "https://researchr.org/publication/WuL16-14"
  cites: 0
  citedby: 0
  pages: "218-224"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "WuL16-14"
- title: "Mixing Secrets: A multitrack dataset for instrument detection in polyphonic music"
  author:
  - name: "Gururani, Siddharth"
    link: "https://researchr.org/alias/gururani%2C-siddharth"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2017"
  abstract: "Instrument recognition as a task in MIR is largely data drive. This drives a need for large datasets that cater to the need of these algorithms. Several datasets exist for the task of instrument recognition in monophonic signals. For polyphonic music, creating a finely labeled dataset for instrument recognition is a hard task and using multi-track data eases that process. We present 250+ multi-tracks that have been labeled for instrument recognition and release the annotations to be used in the community. The process of data acquisition, cleaning and labeling has been detailed in this late-breaking demo."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Gururani_Lerch_2017_Mixing-Secrets.pdf"
  researchr: "https://researchr.org/publication/gururanimixing2017"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Late Breaking Demo (Extended Abstract)"
  kind: "inproceedings"
  key: "gururanimixing2017"
- title: "Towards the Objective Assessment of Music Performances"
  author:
  - name: "Wu, Chih-Wei"
    link: "https://researchr.org/alias/wu%2C-chih-wei"
  - name: "Gururani, Siddharth"
    link: "https://researchr.org/alias/gururani%2C-siddharth"
  - name: "Laguna, Christopher"
    link: "https://researchr.org/alias/laguna%2C-christopher"
  - name: "Pati, Ashis"
    link: "https://researchr.org/alias/pati%2C-ashis"
  - name: "Vidwans, Amruta"
    link: "https://researchr.org/alias/vidwans%2C-amruta"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2016"
  abstract: "The qualitative assessment of music performances is a task that is influenced by technical correctness, deviations from established performance standards, and aesthetic judgment. Despite its inherently subjective nature, a quantitative overall assessment is often desired, as exemplified by US all-state auditions or other competitions. A model that automatically generates assessments from the audio data would allow for objective assessments and enable musically intelligent computer-assisted practice sessions for students learning an instrument. While existing systems are already able to provide similar basic functionality, they rely on the musical score as prior knowledge. In this paper, we present a score-independent system for assessing student instrument performances based on audio recordings. This system aims to characterize the performance with both well-established and custom-designed audio features, model expert assessments of student performances, and predict the assessment of unknown audio recordings. The results imply the viability of modeling human assessment with score-independent audio features. Results could lead towards more general software music tutoring systems that do not require score information for the assessment of student music performances."
  links:
    "url": "http://www.icmpc.org/icmpc14/proceedings.html"
  researchr: "https://researchr.org/publication/wutowards2016"
  cites: 0
  citedby: 0
  pages: "99-103"
  booktitle: "Proceedings of the International Conference on Music Perception and Cognition (ICMPC)"
  kind: "inproceedings"
  key: "wutowards2016"
- title: "CMMSD: A Data Set for Note-Level Segmentation of Monophonic Music"
  author:
  - name: "Henrik von Coler"
    link: "https://researchr.org/alias/henrik-von-coler"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2014"
  doi: "http://www.aes.org/e-lib/browse.cfm?elib=17099"
  links:
    doi: "http://www.aes.org/e-lib/browse.cfm?elib=17099"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/semanticaudio/ColerL14"
  researchr: "https://researchr.org/publication/ColerL14"
  cites: 0
  citedby: 0
  booktitle: "semanticaudio"
  kind: "inproceedings"
  key: "ColerL14"
- title: "On the Evaluation of Generative Models in Music"
  author:
  - name: "Yang, Li-Chia"
    link: "https://researchr.org/alias/yang%2C-li-chia"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  month: "nov"
  doi: "10.1007/s00521-018-3849-7"
  abstract: "The modeling of artificial, human-level creativity is becoming more and more achievable. In recent years, neural networks have been successfully applied to different tasks such as image and music generation, demonstrating their great potential in realizing computational creativity. The fuzzy definition of creativity combined with varying goals of the evaluated generative systems, however, makes subjective evaluation seem to be the only viable methodology of choice. We review the evaluation of generative music systems and discuss the inherent challenges of their evaluation. Although subjective evaluation should always be the ultimate choice for the evaluation of creative results, researchers unfamiliar with rigorous subjective experiment design and without the necessary resources for the execution of a large-scale experiment face challenges in terms of reliability, validity, and replicability of the results. In numerous studies, this leads to the report of insignificant and possibly irrelevant results and the lack of comparability with similar and previous generative systems. Therefore, we propose a set of simple musically informed objective metrics enabling an objective and reproducible way of evaluating and comparing the output of music generative systems. We demonstrate the usefulness of the proposed metrics with several experiments on real-world data."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/11/postprint.pdf"
  researchr: "https://researchr.org/publication/yangevaluation2018"
  cites: 0
  citedby: 0
  journal: "Neural Computing and Applications"
  kind: "article"
  key: "yangevaluation2018"
- title: "Drum transcription using partially fixed non-negative matrix factorization"
  author:
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2015"
  doi: "http://dx.doi.org/10.1109/EUSIPCO.2015.7362590"
  links:
    doi: "http://dx.doi.org/10.1109/EUSIPCO.2015.7362590"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/eusipco/WuL15"
  researchr: "https://researchr.org/publication/WuL15-25"
  cites: 0
  citedby: 0
  pages: "1281-1285"
  booktitle: "eusipco"
  kind: "inproceedings"
  key: "WuL15-25"
- title: "An Attention Mechanism for Musical Instrument Recognition"
  author:
  - name: "Siddharth Gururani"
    link: "https://researchr.org/alias/siddharth-gururani"
  - name: "Mohit Sharma"
    link: "https://researchr.org/alias/mohit-sharma"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2019"
  doi: "http://archives.ismir.net/ismir2019/paper/000007.pdf"
  links:
    doi: "http://archives.ismir.net/ismir2019/paper/000007.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/GururaniSL19"
  researchr: "https://researchr.org/publication/GururaniSL19"
  cites: 0
  citedby: 0
  pages: "83-90"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "GururaniSL19"
- title: "Learning Strategies for Voice Disorder Detection"
  author:
  - name: "Hongzhao Guan"
    link: "https://researchr.org/alias/hongzhao-guan"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2019"
  doi: "https://doi.org/10.1109/ICOSC.2019.8665504"
  links:
    doi: "https://doi.org/10.1109/ICOSC.2019.8665504"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/semco/GuanL19"
  researchr: "https://researchr.org/publication/GuanL19"
  cites: 0
  citedby: 0
  pages: "295-301"
  booktitle: "semco"
  kind: "inproceedings"
  key: "GuanL19"
- title: "Learned Features for the Assessment of Percussive Music Performances"
  author:
  - name: "Wu, Chih-Wei"
    link: "https://researchr.org/alias/wu%2C-chih-wei"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  abstract: "The automatic assessment of (student) music performance involves the characterization of the audio recordings and the modeling of human judgments. To build a computational model that provides a reliable assessment, the system must take into account various aspects of a performance including technical correctness and aesthetic standards. While some progress has been made in recent years, the search for an effective feature representation remains open-ended. In this study, we explore the possibility of using learned features from sparse coding. Specifically, we investigate three sets of features, namely a baseline set, a set of designed features, and a feature set learned with sparse coding. In addition, we compare the impact of two different input representations on the effectiveness of the learned features. The evaluation is performed on a dataset of annotated recordings of students playing snare exercises. The results imply the general viability of feature learning in the context of automatic assessment of music performances."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/01/Wu_Lerch_2018_Learned-Features-for-the-Assessment-of-Percussive-Music-Performances.pdf"
  researchr: "https://researchr.org/publication/wulearned2018"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Conference on Semantic Computing (ICSC)"
  kind: "inproceedings"
  key: "wulearned2018"
- title: "Learning to Fuse Music Genres with Generative Adversarial Dual Learning"
  author:
  - name: "Zhiqian Chen"
    link: "https://researchr.org/alias/zhiqian-chen"
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Yen-Cheng Lu"
    link: "https://researchr.org/alias/yen-cheng-lu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Chang-Tien Lu"
    link: "https://researchr.org/alias/chang-tien-lu"
  year: "2017"
  abstract: "FusionGAN is a novel genre fusion framework for music generation that integrates the strengths of generative adversarial networks and dual learning. In particular, the proposed method offers a dual learning extension that can effectively integrate the styles of the given domains. To efficiently quantify the difference among diverse domains and avoid the vanishing gradient issue, FusionGAN provides a Wasserstein based metric to approximate the distance between the target domain and the existing domains. Adopting the Wasserstein distance, a new domain is created by combining the patterns of the existing domains using adversarial learning. Experimental results on public music datasets demonstrated that our approach could effectively merge two genres."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/11/Zhiqian-Chen-et-al_2017_Learning-to-Fuse-Music-Genres-with-Generative-Adversarial-Dual-Learning.pdf"
  researchr: "https://researchr.org/publication/zhiqianchenlearning2017"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Conference on Data Mining (ICDM)"
  kind: "inproceedings"
  key: "zhiqianchenlearning2017"
- title: "Beat Histogram Features from NMF-Based Novelty Functions for Music Classification"
  author:
  - name: "Athanasios Lykartsis"
    link: "https://researchr.org/alias/athanasios-lykartsis"
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2015"
  doi: "http://ismir2015.uma.es/articles/300_Paper.pdf"
  links:
    doi: "http://ismir2015.uma.es/articles/300_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/LykartsisWL15"
  researchr: "https://researchr.org/publication/LykartsisWL15"
  cites: 0
  citedby: 0
  pages: "434-440"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "LykartsisWL15"
- title: "Automatic Outlier Detection in Music Genre Datasets"
  author:
  - name: "Yen-Cheng Lu"
    link: "https://researchr.org/alias/yen-cheng-lu"
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Chang-Tien Lu"
    link: "https://researchr.org/alias/chang-tien-lu"
  year: "2016"
  doi: "https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/215_Paper.pdf"
  links:
    doi: "https://wp.nyu.edu/ismir2016/wp-content/uploads/sites/2294/2016/07/215_Paper.pdf"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/ismir/LuWLL16"
  researchr: "https://researchr.org/publication/LuWLL16-0"
  cites: 0
  citedby: 0
  pages: "101-107"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "LuWLL16-0"
- title: "Software based extraction of objective parameters from music performances"
  author:
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2008"
  doi: "http://opus.kobv.de/tuberlin/volltexte/2008/2067/"
  links:
    doi: "http://opus.kobv.de/tuberlin/volltexte/2008/2067/"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/phd/de/Lerch2008"
  researchr: "https://researchr.org/publication/de-5223"
  cites: 0
  citedby: 0
  school: "Berlin Institute of Technology"
  kind: "phdthesis"
  key: "de-5223"
- title: "Live Repurposing of Sounds: MIR Explorations with Personal and Crowd-sourced Databases"
  author:
  - name: "Xambo, Anna"
    link: "https://researchr.org/alias/xambo%2C-anna"
  - name: "Roma, Gerard"
    link: "https://researchr.org/alias/roma%2C-gerard"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Barthet, Matthieu"
    link: "https://researchr.org/alias/barthet%2C-matthieu"
  - name: "Fazekas, Gyorgy"
    link: "https://researchr.org/alias/fazekas%2C-gyorgy"
  year: "2018"
  abstract: "The recent increase in the accessibility and size of personal and crowd-sourced digital sound collections brought about a valuable resource for music creation. Finding and retrieving relevant sounds in performance leads to challenges that can be approached using music information retrieval (MIR). In this paper, we explore the use of MIR to retrieve and repurpose sounds in musical live coding. We present a live coding system built on SuperCollider enabling the use of audio content from Creative Commons (CC) sound databases such as Freesound or personal sound databases. The novelty of our approach lies in exploiting high-level MIR methods (e.g. query by pitch or rhythmic cues) using live coding techniques applied to sounds. We demonstrate its potential through the reflection of an illustrative case study and the feedback from four expert users. The users tried the system with either a personal database or a crowd-source database and reported its potential in facilitating tailorability of the tool to their own creative workflows. This approach to live repurposing of sounds can be applied to real-time interactive systems for performance and composition beyond live coding, as well as inform live coding and MIR research."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/04/Xambo-et-al.-2018-Live-Repurposing-of-Sounds-MIR-Explorations-with-.pdf"
  researchr: "https://researchr.org/publication/xambolive2018"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the Conference on New Interfaces for Musical Expression ({NIME})"
  kind: "inproceedings"
  key: "xambolive2018"
- title: "An Efficient Algorithm For Clipping Detection And Declipping Audio"
  author:
  - name: "Laguna, Christopher"
    link: "https://researchr.org/alias/laguna%2C-christopher"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2016"
  abstract: "We present an algorithm for end to end declipping, which includes clipping detection and the replacement of clipped samples. To detect regions of clipping, we analyze the signal?s amplitude histogram and the shape of the signal in the time-domain. The sample replacement algorithm uses a two-pass approach: short regions of clipping are replaced in the time-domain and long regions of clipping are replaced in the frequency-domain. The algorithm is robust against different types of clipping and is efficient compared to existing approaches. The algorithm has been implemented in an open source JavaScript client-side web application. Clipping detection is shown to give an f-measure of 0.92 and is robust to the clipping level."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2016/09/Laguna_Lerch_2016_An-Efficient-Algorithm-For-Clipping-Detection-And-Declipping-Audio.pdf"
  researchr: "https://researchr.org/publication/lagunaefficient2016"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the 141st Audio Engineering Society Convention"
  kind: "inproceedings"
  key: "lagunaefficient2016"
- title: "Assessment of Student Music Performances Using Deep Neural Networks"
  author:
  - name: "Pati, Kumar Ashis"
    link: "https://researchr.org/alias/pati%2C-kumar-ashis"
  - name: "Gururani, Siddharth"
    link: "https://researchr.org/alias/gururani%2C-siddharth"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  month: "mar"
  doi: "10.3390/app8040507"
  abstract: "Music performance assessment is a highly subjective task often relying on experts to gauge both the technical and aesthetic aspects of the performance from the audio signal. This article explores the task of building computational models for music performance assessment, i.e., analyzing an audio recording of a performance and rating it along several criteria such as musicality, note accuracy, etc. Much of the earlier work in this area has been centered around using hand-crafted features intended to capture relevant aspects of a performance. However, such features are based on our limited understanding of music perception and may not be optimal. In this article, we propose using Deep Neural Networks (DNNs) for the task and compare their performance against a baseline model using standard and hand-crafted features. We show that, using input representations at different levels of abstraction, DNNs can outperform the baseline models across all assessment criteria. In addition, we use model analysis techniques to further explain the model predictions in an attempt to gain useful insights into the assessment process. The results demonstrate the potential of using supervised feature learning techniques to better characterize music performances."
  links:
    "url": "http://www.mdpi.com/2076-3417/8/4/507"
  researchr: "https://researchr.org/publication/patiassessment2018"
  cites: 0
  citedby: 0
  journal: "Applied Sciences"
  volume: "8"
  number: "4"
  pages: "507"
  kind: "article"
  key: "patiassessment2018"
- title: "On the Requirement of Automatic Tuning Frequency Estimation"
  author:
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2006"
  researchr: "https://researchr.org/publication/Lerch06"
  cites: 0
  citedby: 0
  pages: "212-215"
  booktitle: "ismir"
  kind: "inproceedings"
  key: "Lerch06"
- title: "A Review of Automatic Drum Transcription"
  author:
  - name: "Wu, Chih-Wei"
    link: "https://researchr.org/alias/wu%2C-chih-wei"
  - name: "Dittmar, Christian"
    link: "https://researchr.org/alias/dittmar%2C-christian"
  - name: "Southall, Carl"
    link: "https://researchr.org/alias/southall%2C-carl"
  - name: "Vogl, Richard"
    link: "https://researchr.org/alias/vogl%2C-richard"
  - name: "Widmer, Gerhard"
    link: "https://researchr.org/alias/widmer%2C-gerhard"
  - name: "Hockman, Jason A"
    link: "https://researchr.org/alias/hockman%2C-jason-a"
  - name: "Muller, Meinard"
    link: "https://researchr.org/alias/muller%2C-meinard"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  doi: "10.1109/TASLP.2018.2830113"
  abstract: "In Western popular music, drums and percussion are an important means to emphasize and shape the rhythm, often defining the musical style. If computers were able to analyze the drum part in recorded music, it would enable a variety of rhythm-related music processing tasks. Especially the detection and classification of drum sound events by computational methods is considered to be an important and challenging research problem in the broader field of Music Information Retrieval. Over the last two decades, several authors have attempted to tackle this problem under the umbrella term Automatic Drum Transcription (ADT). This paper presents a comprehensive review of ADT research, including a thorough discussion of the task-specific challenges, categorization of existing techniques, and evaluation of several state-of-the-art systems. To provide more insights on the practice of ADT systems, we focus on two families of ADT techniques, namely methods based on Non-negative Matrix Factorization and Recurrent Neural Networks. We explain the methods' technical details and drum-specific variations and evaluate these approaches on publicly available datasets with a consistent experimental setup. Finally, the open issues and under-explored areas in ADT research are identified and discussed, providing future directions in this field."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/05/Wu-et-al.-2018-A-review-of-automatic-drum-transcription.pdf"
  researchr: "https://researchr.org/publication/wureview2018"
  cites: 0
  citedby: 0
  journal: "IEEE/ACM Transactions on Audio, Speech, and Language Processing"
  volume: "26"
  number: "9"
  pages: "1457-1483"
  kind: "article"
  key: "wureview2018"
- title: "Analysis of Objective Descriptors for Music Performance Assessment"
  author:
  - name: "Gururani, Siddharth"
    link: "https://researchr.org/alias/gururani%2C-siddharth"
  - name: "Pati, Kumar Ashis"
    link: "https://researchr.org/alias/pati%2C-kumar-ashis"
  - name: "Wu, Chih-Wei"
    link: "https://researchr.org/alias/wu%2C-chih-wei"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  abstract: "The assessment of musical performances in, e.g., student competitions or auditions, is a largely subjective evaluation of a performer's technical skills and expressivity. Objective descriptors extracted from the audio signal have been proposed for automatic performance assessment in such a context. Such descriptors represent different aspects of pitch, dynamics and timing of a performance and have been shown to be reasonably successful in modeling human assessments of student performances through regression. This study aims to identify the influence of individual descriptors on models of human assessment in 4 categories: musicality, note accuracy, rhythmic accuracy, and tone quality. To evaluate the influence of the individual descriptors, the descriptors highly correlated with the human assessments are identified. Subsequently, various subsets are chosen using different selection criteria and the adjusted R-squared metric is computed to evaluate the degree to which these subsets explain the variance in the assessments. In addition, sequential forward selection is performed to identify the most meaningful descriptors. The goal of this study is to gain insights into which objective descriptors contribute most to the human assessments as well as to identify a subset of well-performing descriptors. The results indicate that a small subset of the designed descriptors can perform at a similar accuracy as the full set of descriptors. Sequential forward selection shows how around 33\\% of the descriptors do not add new information to the linear regression models, pointing towards redundancy in the descriptors."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2018/06/Gururani-et-al.-2018-Analysis-of-Objective-Descriptors-for-Music-Perfor.pdf"
  researchr: "https://researchr.org/publication/gururanianalysis2018"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Conference on Music Perception and Cognition ({ICMPC})"
  kind: "inproceedings"
  key: "gururanianalysis2018"
- title: "MDB Drums --- An Annotated Subset of MedleyDB for Automatic Drum Transcription"
  author:
  - name: "Southall, Carl"
    link: "https://researchr.org/alias/southall%2C-carl"
  - name: "Wu, Chih-Wei"
    link: "https://researchr.org/alias/wu%2C-chih-wei"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Hockman, Jason A"
    link: "https://researchr.org/alias/hockman%2C-jason-a"
  year: "2017"
  abstract: "In this paper we present MDB Drums, a new dataset for automatic drum transcription (ADT) tasks. This dataset is built on top of the MusicDelta subset of the MedleyDB dataset, taking advantage of real-world recordings in multi-track format. The dataset is comprised of a variety of genres, providing a balanced pool for developing and evaluating ADT models with respect to various musical styles. To reduce the cost of the labor-intensive process of manual annotation, a semi-automatic process was utilised in both the annotation and quality control processes. The pre sented dataset consists of 23 tracks with a total of 7994 onsets. These onsets are divided into 6 classes based on drum instruments or 21 subclasses based on playing techniques. Every track consists of a drum-only track as well as multiple accompanied tracks, enabling audio files containing different combinations of instruments to be used in the ADT evaluation process."
  links:
    "url": "http://www.musicinformatics.gatech.edu/wp-content_nondefault/uploads/2017/10/Wu-et-al_2017_MDB-Drums-An-Annotated-Subset-of-MedleyDB-for-Automatic-Drum-Transcription.pdf"
  researchr: "https://researchr.org/publication/southallmdb2017"
  cites: 0
  citedby: 0
  booktitle: "Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), Late Breaking Demo (Extended Abstract)"
  kind: "inproceedings"
  key: "southallmdb2017"
- title: "Objective Descriptors for the Assessment of Student Music Performances"
  author:
  - name: "Amruta Vidwans"
    link: "https://researchr.org/alias/amruta-vidwans"
  - name: "Siddharth Gururani"
    link: "https://researchr.org/alias/siddharth-gururani"
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Vinod Subramanian"
    link: "https://researchr.org/alias/vinod-subramanian"
  - name: "Rupak Swaminathan"
    link: "https://researchr.org/alias/rupak-swaminathan"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2017"
  doi: "http://www.aes.org/e-lib/browse.cfm?elib=18758"
  abstract: "Assessment of students’ music performances is a subjective task that requires the judgment of technical correctness as well as aesthetic properties. A computational model automatically evaluating music performance based on objective measurements could ensure consistent and reproducible assessments for, e.g., automatic music tutoring systems. In this study, we investigate the effectiveness of various audio descriptors for assessing performances. Specifically, three different sets of features, including a baseline set, score-independent features, and score-based features, are compared with respect to their efficiency in regression tasks. The results show that human assessments can be modeled to a certain degree, however, the generality of the model still needs further investigation."
  links:
    doi: "http://www.aes.org/e-lib/browse.cfm?elib=18758"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/semanticaudio/VidwansGWSSL17"
  researchr: "https://researchr.org/publication/VidwansGWSSL17"
  cites: 0
  citedby: 0
  booktitle: "semanticaudio"
  kind: "inproceedings"
  key: "VidwansGWSSL17"
- title: "Tuning Frequency Dependency in Music Classification"
  author:
  - name: "Yi Qin"
    link: "https://researchr.org/alias/yi-qin"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2019"
  doi: "https://doi.org/10.1109/ICASSP.2019.8683340"
  links:
    doi: "https://doi.org/10.1109/ICASSP.2019.8683340"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/icassp/QinL19"
  researchr: "https://researchr.org/publication/QinL19-1"
  cites: 0
  citedby: 0
  pages: "401-405"
  booktitle: "icassp"
  kind: "inproceedings"
  key: "QinL19-1"
- title: "Learning to Fuse Music Genres with Generative Adversarial Dual Learning"
  author:
  - name: "Zhiqian Chen"
    link: "https://researchr.org/alias/zhiqian-chen"
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Yen-Cheng Lu"
    link: "https://researchr.org/alias/yen-cheng-lu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Chang-Tien Lu"
    link: "https://researchr.org/alias/chang-tien-lu"
  year: "2017"
  doi: "http://doi.ieeecomputersociety.org/10.1109/ICDM.2017.98"
  links:
    doi: "http://doi.ieeecomputersociety.org/10.1109/ICDM.2017.98"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/icdm/ChenWLLL17"
  researchr: "https://researchr.org/publication/ChenWLLL17"
  cites: 0
  citedby: 0
  pages: "817-822"
  booktitle: "icdm"
  kind: "inproceedings"
  key: "ChenWLLL17"
- title: "A Dataset and Method for Guitar Solo Detection in Rock Music"
  author:
  - name: "Kumar Ashis Pati"
    link: "https://researchr.org/alias/kumar-ashis-pati"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2017"
  doi: "http://www.aes.org/e-lib/browse.cfm?elib=18773"
  abstract: "This paper explores the problem of automatically detecting electric guitar solos in rock music. A baseline study using standard spectral and temporal audio features in conjunction with an SVM classifier is carried out. To improve detection rates, custom features based on predominant pitch and structural segmentation of songs are designed and investigated. The evaluation of different feature combinations suggests that the combination of all features followed by a post-processing step results in the best accuracy. A macro-accuracy of 78.6% with a solo detection precision of 63.3% is observed for the best feature combination. This publication is accompanied by release of an annotated dataset of electric guitar solos to encourage future research in this area."
  links:
    doi: "http://www.aes.org/e-lib/browse.cfm?elib=18773"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/semanticaudio/PatiL17"
  researchr: "https://researchr.org/publication/PatiL17-1"
  cites: 0
  citedby: 0
  booktitle: "semanticaudio"
  kind: "inproceedings"
  key: "PatiL17-1"
- title: "Learned Features for the Assessment of Percussive Music Performances"
  author:
  - name: "Chih-Wei Wu"
    link: "https://researchr.org/alias/chih-wei-wu"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  doi: "http://doi.ieeecomputersociety.org/10.1109/ICSC.2018.00022"
  links:
    doi: "http://doi.ieeecomputersociety.org/10.1109/ICSC.2018.00022"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/semco/WuL18"
  researchr: "https://researchr.org/publication/WuL18-9"
  cites: 0
  citedby: 0
  pages: "93-99"
  booktitle: "semco"
  kind: "inproceedings"
  key: "WuL18-9"
- title: "On the perceptual relevance of objective source separation measures for singing voice separation"
  author:
  - name: "Udit Gupta"
    link: "https://researchr.org/alias/udit-gupta"
  - name: "Elliot Moore II"
    link: "https://researchr.org/alias/elliot-moore-ii"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2015"
  doi: "http://dx.doi.org/10.1109/WASPAA.2015.7336923"
  links:
    doi: "http://dx.doi.org/10.1109/WASPAA.2015.7336923"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/waspaa/GuptaML15"
  researchr: "https://researchr.org/publication/GuptaML15"
  cites: 0
  citedby: 0
  pages: "1-5"
  booktitle: "waspaa"
  kind: "inproceedings"
  key: "GuptaML15"
- title: "Genre-specific Key Profiles"
  author:
  - name: "Cian O'Brien"
    link: "https://researchr.org/alias/cian-o%27brien"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2015"
  doi: "http://hdl.handle.net/2027/spo.bbp2372.2015.012"
  links:
    doi: "http://hdl.handle.net/2027/spo.bbp2372.2015.012"
    dblp: "http://dblp.uni-trier.de/rec/bibtex/conf/icmc/OBrienL15"
  researchr: "https://researchr.org/publication/OBrienL15"
  cites: 0
  citedby: 0
  booktitle: "icmc"
  kind: "inproceedings"
  key: "OBrienL15"
- title: "The Relation Between Music Technology and Music Industry"
  author:
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  year: "2018"
  doi: "10.1007/978-3-662-55004-5_44"
  abstract: "The music industry has changed drastically over the last century and most of its changes and transformations have been technology-driven. Music technology ? encompassing musical instruments, sound generators, studio equipment and software, perceptual audio coding algorithms, and reproduction software and devices ? has shaped the way music is produced, performed, distributed, and consumed. The evolution of music technology enabled studios and hobbyist producers to produce music at a technical quality unthinkable decades ago and have affordable access to new effects as well as production techniques. Artists explore nontraditional ways of sound generation and sound modification to create previously unheard effects, soundscapes, or even to conceive new musical styles. The consumer has immediate access to a vast diversity of songs and styles and is able to listen to individualized playlists virtually everywhere and at any time. The most disruptive technological innovations during the past 130 years have probably been:1. The possibility to record and distribute recordings on a large scale through the gramophone. 2. The introduction of vinyl disks enabling high-quality sound reproduction. 3. The compact cassette enabling individualized playlists, music sharing with friends and mobile listening. 4. Digital audio technology enabling high quality professional-grade studio equipment at low prices. 5. Perceptual audio coding in combination with online distribution, streaming, and file sharing. This text will describe these technological innovations and their impact on artists, engineers, and listeners."
  links:
    "url": "https://link.springer.com/chapter/10.1007/978-3-662-55004-5_44"
  researchr: "https://researchr.org/publication/lerchrelation2018"
  cites: 0
  citedby: 0
  pages: "899-909"
  booktitle: "Springer Handbook of Systematic Musicology"
  series: "Springer Handbooks"
  publisher: "Springer, Berlin, Heidelberg"
  isbn: "978-3-662-55002-1 978-3-662-55004-5"
  kind: "incollection"
  key: "lerchrelation2018"
- title: "Proceedings of the 2nd Web Audio Conference ({WAC}-2016)"
  year: "2016"
  links:
    "url": "https://smartech.gatech.edu/handle/1853/54577"
  researchr: "https://researchr.org/publication/freemanproceedings2016"
  cites: 0
  citedby: 0
  editor:
  - name: "Freeman, Jason"
    link: "https://researchr.org/alias/freeman%2C-jason"
  - name: "Alexander Lerch"
    link: "https://www.AudioContentAnalysis.org"
  - name: "Paradis, Matthew"
    link: "https://researchr.org/alias/paradis%2C-matthew"
  address: "Atlanta"
  publisher: "Georgia Institute of Technology"
  isbn: "978-0-692-61973-5"
  kind: "book"
  key: "freemanproceedings2016"