Dates: from May 28 to May 31, 2019
Place: Málaga, Spain
Proceedings info: Proceedings of the 16th Sound & Music Computing Conference, ISBN 978-84-09-08518-7
Abstract
In this paper we describe ongoing research on the development of a body movement sonification system. High-precision, high-resolution wireless sensors are used to track body movement and record muscle excitation. We are currently using six sensors; in the final version of the system, full-body tracking can be achieved. The recording system provides a web server with a simple REST API, which streams the recorded data in JSON format. An intermediate proxy server pre-processes the data and transmits it to the final sonification system. The sonification system is implemented using the Web Audio API. We are experimenting with a set of different sonification strategies and algorithms. Currently we are testing the system as part of an interactive, guided therapy, establishing additional acoustic feedback channels for the patient. In a second stage of the research we are going to use the system in a more musical and artistic way. More specifically, we plan to use the system in cooperation with a violist, where the acoustic feedback channel will be integrated into the performance.
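The abstract above describes a pipeline of wireless sensors, a REST/JSON recording server, an intermediate proxy, and a Web Audio sonification client. The minimal sketch below illustrates only the proxy stage, under stated assumptions: the endpoint URL, the JSON field name, and the UDP forwarding target are hypothetical placeholders, not taken from the paper.

```python
# Minimal sketch of an intermediate proxy: poll a REST endpoint for sensor
# frames, smooth them, and forward them to a sonification client over UDP.
# The URL, field names and addresses below are hypothetical placeholders.
import json
import socket
import time

import requests

SENSOR_URL = "http://192.168.0.10:8080/api/frames/latest"   # hypothetical REST API
SONIFIER_ADDR = ("127.0.0.1", 9000)                          # hypothetical sonification client

def smooth(prev, new, alpha=0.2):
    """Exponential moving average used as a simple pre-processing step."""
    return new if prev is None else [alpha * n + (1 - alpha) * p for p, n in zip(prev, new)]

def run_proxy(poll_hz=50):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    state = None
    while True:
        frame = requests.get(SENSOR_URL, timeout=1.0).json()
        state = smooth(state, frame["acceleration"])         # hypothetical field name
        sock.sendto(json.dumps({"acc": state}).encode(), SONIFIER_ADDR)
        time.sleep(1.0 / poll_hz)

if __name__ == "__main__":
    run_proxy()
```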
Keywords
body movement, music and therapy, real time interactive machine learning, sonification, wekinator
Paper topics
Auditory display and data sonification, Automatic music generation/accompaniment systems, Interaction in music performance, Interfaces for sound and music, Sonic interaction design, Sound and music for accessibility and special needs
Easychair keyphrases
body movement [17], arm movement [12], sonification system [10], body movement data [9], bio feedback [6], body movement sonification [6], musical performance [6], sonification strategy [5], central chord [4], web audio api [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249449
Zenodo URL: https://zenodo.org/record/3249449
Abstract
The need for loudness compensation is a well-known fact arising from the nonlinear behavior of human sound perception. Music and sound are mixed and mastered at a certain loudness level, usually louder than the level at which they are commonly played. This implies a change in the perceived spectral balance of the sound, which is largest in the low-frequency range. As the volume setting in music playing is decreased, a loudness compensation filter can be used to boost the bass appropriately, so that the low frequencies are still heard well and the perceived spectral balance is preserved. The present paper proposes a loudness compensation function derived from the standard equal-loudness-level contours and its implementation via a digital first-order shelving filter. Results of a formal listening test validate the accuracy of the proposed method.
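As a rough illustration of the kind of filter the abstract describes, the sketch below builds a first-order digital low-shelving filter by discretizing an analog low-shelf prototype with the bilinear transform. The gain, crossover frequency and sample rate are illustrative values, not the compensation function derived in the paper.

```python
# A minimal sketch (not the authors' exact design) of a first-order digital
# low-shelving filter that boosts the bass by `gain_db` below a crossover
# frequency, as one could use for loudness compensation at low volume settings.
import numpy as np
from scipy.signal import bilinear, lfilter

def low_shelf(gain_db, fc, fs):
    """First-order low shelf: gain_db at DC, 0 dB at Nyquist, crossover near fc (Hz)."""
    g = 10.0 ** (gain_db / 20.0)
    wc = 2.0 * fs * np.tan(np.pi * fc / fs)      # pre-warped analog crossover frequency
    # Analog prototype H(s) = (s + g*wc) / (s + wc), discretized by the bilinear transform
    b, a = bilinear([1.0, g * wc], [1.0, wc], fs)
    return b, a

if __name__ == "__main__":
    fs = 48000
    b, a = low_shelf(gain_db=10.0, fc=200.0, fs=fs)   # illustrative values only
    x = np.random.randn(fs)                           # one second of test noise
    y = lfilter(b, a, x)                              # bass-boosted output
```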
Keywords
Audio, Digital filters, DSP, Equalization, Listening test
Paper topics
Digital audio effects, Perception and cognition of sound and music, Sound/music signal processing algorithms
Easychair keyphrases
trace guide [15], shelving filter [14], first order shelving filter [12], listening level [12], first order [11], first order low shelving filter [11], loudness compensation [10], digital filter [9], perceived spectral balance [9], crossover frequency [8], listening test [8], spectral balance [7], audio eng [6], equal loudness level contour [6], magnitude response [6], aalto acoustics lab [4], box plot [4], compensation method [4], first order filter [4], fractional order filter [4], iir filter [4], loudness level [4], order filter [4], pure tone [4], sensitivity function [4], sound pressure level [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249289
Zenodo URL: https://zenodo.org/record/3249289
Abstract
In actual piano practice, people of different skill levels exhibit different behaviors, for instance leaping forward or to an upper staff, mis-keying, repeating, and so on. However, many conventional score-following systems can hardly adapt to such accidental behaviors, which depend on individual skill level, because conventional systems usually learn only frequent or general behaviors. We develop a score-following system that can adapt to a user's individuality by combining keying information with gaze, because gaze is well known to be a highly reliable means of expressing a performer's thinking. Since it is difficult to collect a large amount of piano performance data reflecting individuality, we employ the framework of Bayesian inference to adapt to individuality. That is, to estimate the user's current position in the piano performance, keying and gaze information are integrated into a single Bayesian inference via a Gaussian mixture model (GMM). Here, we assume that both the keying and gaze information conform to normal distributions. Experimental results show that, by taking the gaze information into account, our score-following system can properly cope with repetition and, in particular, leaping to an upper row of a staff.
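The following toy sketch illustrates the fusion idea in the abstract: a discrete belief over score positions is updated with Gaussian likelihoods for the keyed position and the (less precise) gaze position. The variances, weights and observations are invented for illustration and are not the authors' model.

```python
# Toy sketch: fuse keying and gaze observations into a belief over score positions,
# assuming both observations are normally distributed around the true position.
import numpy as np

def gaussian(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def update_position_belief(prior, positions, key_obs, gaze_obs,
                           sigma_key=1.0, sigma_gaze=4.0, w_gaze=0.5):
    """Posterior over score positions from keying and (downweighted) gaze."""
    lik = gaussian(positions, key_obs, sigma_key) * \
          gaussian(positions, gaze_obs, sigma_gaze) ** w_gaze
    post = prior * lik
    return post / post.sum()

positions = np.arange(100)                    # note indices in the score
belief = np.full(100, 1.0 / 100)              # uniform initial belief
belief = update_position_belief(belief, positions, key_obs=12, gaze_obs=18)
print(positions[np.argmax(belief)])           # most likely current position
```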
Keywords
Bayesian inference, Gaussian mixture model, gaze information, score-following
Paper topics
Multimodality in sound and music computing, Music performance analysis and rendering
Easychair keyphrases
gaze information [19], keying information [17], score following system [15], score following [12], bayesian inference [11], user current position [9], gaze point [8], mixture rate [8], normal distribution [8], user individuality [8], mixture ratio [7], note number [7], accuracy rate [6], piano performance [6], gaze distribution [5], keying distribution [5], random variable [5], set piece [5], eye hand span [4], eye movement [4], future university hakodate [4], keying position [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249400
Zenodo URL: https://zenodo.org/record/3249400
Abstract
One of the biggest challenges in learning how to play a musical instrument is learning how to move one's body in a nuanced physicality. Technology can expand available forms of physical interactions to help cue specific movements and postures to reinforce new sensorimotor couplings and enhance motor learning and performance. Using audiovisual first person perspective taking with a piano teacher in Mixed Reality, we present a system that allows students to place their hands into the virtual gloves of a teacher. Motor learning and audio-motor associations are reinforced through motion feedback and spatialized audio. The Augmented Design to Embody a Piano Teacher (ADEPT) application is an early design prototype of this piano training system.
Keywords
Augmented Reality, Mixed Reality, Pedagogy, Piano, Technology-Enhanced Learning
Paper topics
Interactive performance systems, Interfaces for sound and music, Multimodality in sound and music computing, Sound and music for Augmented/Virtual Reality and games
Easychair keyphrases
piano teacher [19], adept system [18], first person perspective [14], piano playing [9], piano training application [7], virtual reality [7], augmented reality [6], embodied music cognition [6], expert pianist [6], motion feedback [6], perspective taking [6], piano student [6], real time feedback [6], teacher hand [6], virtual embodiment [6], augmented embodiment [5], embodied perspective [5], piano instruction [5], piano training [5], real time [5], visual cue [5], visual environment [5], audio spatialization [4], magic leap headset [4], mixed reality [4], motion capture [4], piano performance [4], real time motion feedback [4], user physical piano [4], virtual hand [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249333
Zenodo URL: https://zenodo.org/record/3249333
Abstract
Fundamental frequency (f0) modeling is an important but relatively unexplored aspect of choir singing. Performance evaluation as well as auditory analysis of singing, whether individually or in a choir, often depend on extracting f0 contours for the singing voice. However, due to the large number of singers singing in a similar range of frequencies, extracting the exact pitch contours is a challenging task. In this paper, we present a methodology for modeling the pitch contours of an SATB choir. A typical SATB choir consists of four parts, each covering a distinct range of fundamental frequencies and often sung by multiple singers. We first evaluate some state-of-the-art multi-f0 estimation systems for the particular case of choirs with one singer per part, and observe that the pitches of the individual singers can be estimated to a relatively high degree of accuracy. We observe, however, that the scenario of multiple singers for each choir part is far more challenging. In this work we combine a deep-learning-based multi-f0 estimation methodology with a set of traditional DSP techniques to model the f0 dispersion for each choir part instead of a single f0 trajectory. We present and discuss our observations and test our framework with different configurations of singers.
Keywords
choral singing, multi-pitch, pitch modeling, singing voice, unison
Paper topics
Models for sound analysis and synthesis, Music information retrieval, Music performance analysis and rendering
Easychair keyphrases
multi f0 estimation [42], choir section [16], f0 estimation system [9], vocal quartet [8], f0 estimation algorithm [7], ground truth [7], multiple singer [7], traditional dsp technique [6], unison ensemble singing [6], choir recording [5], dispersion value [5], spectral peak [5], choral singing [4], deep learning [4], fundamental frequency [4], individual track [4], interpolated peak [4], locus iste [4], multi f0 extraction [4], multiple fundamental frequency estimation [4], music information retrieval [4], non peak region [4], pitch salience [4], satb quartet [4], singing voice [4], standard deviation [4], unison performance [4], unison singing [4], universitat pompeu fabra [4], vocal ensemble [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249421
Zenodo URL: https://zenodo.org/record/3249421
Abstract
This paper presents a framework that supports the development and evaluation of graphical interpolated parameter mapping for the purpose of sound design. These systems present the user with a graphical pane, usually two-dimensional, where synthesizer presets can be located. Moving an interpolation point cursor within the pane will then create new sounds by calculating new parameter values, based on the cursor position and the interpolation model used. The exploratory nature of these systems lends itself to sound design applications, which also have a highly exploratory character. However, populating the interpolation space with “known” preset sounds allows the parameter space to be constrained, reducing the design complexity otherwise associated with synthesizer-based sound design. An analysis of previous graphical interpolators is presented, and from this a framework is formalized and tested to show its suitability for the evaluation of such systems. The framework has then been used to compare the functionality of a number of systems that have been previously implemented. This has led to a better understanding of the different sonic outputs that each can produce and highlighted areas for further investigation.
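As an illustration of the class of systems discussed above, the sketch below implements one common interpolation model (inverse-distance weighting over preset locations in a 2-D pane); the preset positions and parameter values are invented examples, not taken from the paper.

```python
# Toy sketch: blend synthesizer presets according to the cursor position in a
# 2-D interpolation pane using inverse-distance weighting.
import numpy as np

def interpolate_presets(cursor, preset_positions, preset_params, power=2.0, eps=1e-9):
    """Return a new parameter vector interpolated from the stored presets."""
    d = np.linalg.norm(preset_positions - cursor, axis=1)
    if np.any(d < eps):                      # cursor exactly on a preset
        return preset_params[np.argmin(d)]
    w = 1.0 / d ** power
    w /= w.sum()
    return w @ preset_params                 # weighted sum of parameter vectors

positions = np.array([[0.1, 0.1], [0.9, 0.2], [0.5, 0.9]])      # preset locations in the pane
params = np.array([[440.0, 0.2], [220.0, 0.8], [330.0, 0.5]])   # e.g. [osc_freq, filter_amount]
print(interpolate_presets(np.array([0.4, 0.4]), positions, params))
```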
Keywords
Interface, Interpolation, Sound Design, Synthesis, Visual
Paper topics
Digital audio effects, Interfaces for sound and music, New interfaces for interactive music creation
Easychair keyphrases
interpolation space [35], interpolation point [29], sonic output [18], synthesis engine [18], graphical interpolation system [17], sound design [17], interpolation system [14], interpolation function [12], parameter mapping [12], graphical interpolator [11], interpolation model [11], visual representation [11], synthesis parameter [10], visual model [9], graphical interpolation [8], preset location [8], real time [7], preset point [6], sound design application [6], gaussian kernel [5], parameter preset [5], parameter space [5], audio processing [4], cursor position [4], dimensional graphical [4], gravitational model [4], interpolation method [4], node based interpolator [4], synthesizer parameter [4], visual interpolation model [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249366
Zenodo URL: https://zenodo.org/record/3249366
Abstract
The objective of this project is to create a digital “workbench” for quantitative analysis of popular music. The workbench is a collection of tools and data that allow for efficient and effective analysis of popular music. This project integrates software from pre-existing analytical tools including music21 but adds methods for collecting data about popular music. The workbench includes tools that allow analysts to compare data from multiple sources. Our working prototype of the workbench contains several novel analytical tools which have the potential to generate new musicological insights through the combination of various datasets. This paper demonstrates some of the currently available tools as well as several sample analyses and features computed from this data that support trend analysis. A future release of the workbench will include a user-friendly UI for non-programmers.
Keywords
Music data mining, Music metadata, popular music analysis
Paper topics
Computational musicology and ethnomusicology, Models for sound analysis and synthesis, Music information retrieval
Easychair keyphrases
popular music [27], music information retrieval [12], chord transcription [11], existing tool [10], th international society [9], billboard hot [8], chord detection [8], multiple source [8], pitch vector [8], popular song [7], symbolic data [7], musical analysis [6], ultimate guitar [6], chord estimation [5], chord recognition [5], ground truth [5], harmonic analysis [5], musical data [5], audio feature [4], data collection [4], mcgill billboard dataset [4], midi transcription [4], music scholar [4], pitch class profile [4], popular music scholar [4], programming skill [4], spotify data [4], spotify database [4], symbolic metadata [4], timbre vector [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249402
Zenodo URL: https://zenodo.org/record/3249402
Abstract
This paper focuses on predictive processing of chords in Ludwig van Beethoven's string quartets. A dataset consisting of harmonic analyses of all Beethoven string quartets was used to evaluate an n-gram language model as well as a recurrent neural network (RNN) architecture based on long short-term memory (LSTM). Through assessing model performance results, this paper studies the evolution and variability of Beethoven's harmonic choices in different periods of his activity, as well as the flexibility of predictive models in acquiring basic patterns and rules of tonal harmony.
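For readers unfamiliar with the n-gram baseline mentioned above, the following toy sketch trains a bigram model over chord symbols with add-one smoothing and scores a sequence by its average negative log-likelihood; the chord sequences are illustrative and not drawn from the Beethoven corpus.

```python
# Toy sketch of an n-gram (here bigram) model over chord symbols with
# add-one smoothing, evaluated by average negative log-likelihood.
import math
from collections import Counter, defaultdict

def train_bigram(sequences):
    unigrams, bigrams = Counter(), defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq[:-1], seq[1:]):
            unigrams[prev] += 1
            bigrams[prev][cur] += 1
    return unigrams, bigrams

def neg_log_likelihood(seq, unigrams, bigrams, vocab_size):
    nll = 0.0
    for prev, cur in zip(seq[:-1], seq[1:]):
        p = (bigrams[prev][cur] + 1) / (unigrams[prev] + vocab_size)   # add-one smoothing
        nll -= math.log(p)
    return nll / (len(seq) - 1)

train = [["I", "IV", "V", "I"], ["I", "vi", "IV", "V", "I"]]   # illustrative progressions
uni, bi = train_bigram(train)
print(neg_log_likelihood(["I", "IV", "V", "I"], uni, bi, vocab_size=7))
```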
Keywords
chord prediction, harmony, lstm, musical expectancy, neural networks, ngram models, predictive processing
Paper topics
Computational musicology and ethnomusicology, Perception and cognition of sound and music
Easychair keyphrases
n gram model [14], beethoven string quartet [7], recurrent neural network [7], chord symbol [6], long term dependency [6], scale degree [5], annotated beethoven corpus [4], best performing n gram [4], cognitive science [4], cross validation [4], full roman numeral representation [4], gram language model [4], neural network [4], n gram language [4], optimal n gram length [4], predictive processing [4], short term memory [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249335
Zenodo URL: https://zenodo.org/record/3249335
Abstract
In this paper we study tahrir, a melismatic vocal ornamentation which is an essential characteristic of Persian classical music and can be compared to yodeling. It is considered the most important technique through which the vocalist can display his/her prowess. In Persian, the nightingale's song is used as a metaphor for tahrir, and sometimes for a specific type of tahrir. Here we examine tahrir through a case study. We have chosen two prominent singers of Persian classical music, one contemporary and one from the twentieth century. In our analysis we have drawn on both audio recordings and notation. This paper is the first step towards computational modeling and recognition of different types of tahrirs. Here we have studied two types of tahrirs, namely nashib and farāz, and their combination, through three different performance samples by two prominent vocalists. More than twenty types of tahrirs have been identified by Iranian musicians and music theorists. We hope to develop a method to computationally identify these models in our future work.
Keywords
Iranian classical music, Iranian traditional music, radif, tahrir, vocal ornamentation
Paper topics
Computational musicology and ethnomusicology, Music information retrieval
Easychair keyphrases
persian classical music [14], main note [7], twentieth century [7], vocal radif [7], iranian music [6], traditional music [5], dar amad [4], persian music [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249414
Zenodo URL: https://zenodo.org/record/3249414
Abstract
Rhythm-based auditory cues have been shown to significantly improve walking performance in patients with numerous neurological conditions. This paper presents the design, implementation and evaluation of a gait training device capable of real-time synthesis and automated manipulation of rhythmic musical stimuli, as well as auditory feedback based on measured walking parameters. The proof-of-concept was evaluated with six healthy participants, as well as through critical review by one neurorehabilitation specialist. Stylistically, the synthesized music was found by participants to be conducive to movement, but not uniformly enjoyable. The gait capture/feedback mechanisms functioned as intended, although discrepancies between measured and reference gait parameter values may necessitate a more robust measurement system. The specialist acknowledged the potential of the gait measurement and auditory feedback as novel rehabilitation aids, but stressed the need for additional gait measurements, superior feedback responsiveness and greater functional versatility in order to cater to individual patient needs. Further research must address these findings, and tests must be conducted on real patients to ascertain the utility of such a device in the field of neurorehabilitation.
Keywords
Gait Rehabilitation, real-time synthesis, Rhythmic Auditory Stimulation, Sonification
Paper topics
Auditory display and data sonification, Automatic music generation/accompaniment systems, Sound/music and the neurosciences
Easychair keyphrases
gait parameter [17], gait measurement [11], real time [8], short term [8], auditory stimulus [7], im gait mate [7], swing time [7], auditory feedback [6], gait performance [6], long term [6], master clock [6], neurological condition [6], rhythmic auditory stimulation [6], stimulus generation subsystem [6], white noise white noise [6], subtractive subtractive [5], time variability [5], analytical subsystem [4], gait training [4], interactive music [4], measured gait parameter [4], neurorehabilitation specialist [4], real time auditory feedback [4], reference value [4], stance time [4], synthesis method [4], target group [4], temporal gait parameter [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249297
Zenodo URL: https://zenodo.org/record/3249297
Abstract
Imagine that, when reading sheet music on computing devices, users could listen to audio synchronized with the sheet. To this end, the sheet music must be acquired, analyzed and transformed into digitized information about melody, rhythm, duration, chords, expressiveness and the physical location of scores. Optical music recognition (OMR) is an appropriate technology for this purpose. However, to the best of our knowledge, no commercial OMR system for numbered music notation is available. In this paper, we demonstrate our proprietary OMR system and show three human-interactive applications: a sheet music browser, multimodal accompaniment, and games for sight-reading of sheet music. With this demonstration, we hope to foster usage of the OMR system and the applications and to obtain valuable opinions about them.
Keywords
musical game, numbered music notation, optical music recognition, sight-reading for sheet music, singing accompaniment
Paper topics
not available
Easychair keyphrases
sheet music [17], omr system [7], sheet music browser [7], numbered music notation [6], multimedia application [4], optical music recognition [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249272
Zenodo URL: https://zenodo.org/record/3249272
Abstract
Sequencers almost exclusively share the trait of a single master clock. Each track is laid out on an isochronously spaced sequence of beat positions. Vertically aligned positions are expected to be in synchrony as all tracks refer to the same clock. In this work we present an experimental implementation of a decoupled sequencer with different underlying clocks. Each track is sequenced by the peaks of a designated oscillator. These oscillators are connected in a network and influence each other's periodicities. A familiar grid-type graphical user interface is used to place notes on beat positions of each of the interdependent but asynchronous tracks. Each track clock can be looped and node points specify the synchronisation of multiple tracks by tying together specific beat positions. This setup enables simple global control of microtiming and polyrhythmic patterns.
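One way to realize the "interdependent but asynchronous" track clocks described above is with mutually coupled phase oscillators. The sketch below assumes a simple Kuramoto-style coupling; the coupling constant, tempi and integration step are illustrative and not taken from the implementation described in the abstract.

```python
# Toy sketch: per-track clocks as coupled phase oscillators. A track "ticks"
# (reaches a beat position) whenever its phase wraps around 2*pi.
import numpy as np

def run_coupled_clocks(freqs_hz, coupling=0.5, dt=0.005, steps=2000):
    """Simulate mutually coupled track clocks and return their tick times."""
    phases = np.zeros(len(freqs_hz))
    ticks = [[] for _ in freqs_hz]
    for n in range(steps):
        # Kuramoto-style update: each phase is attracted toward the others
        dphi = 2 * np.pi * np.asarray(freqs_hz) + coupling * np.sum(
            np.sin(phases[None, :] - phases[:, None]), axis=1)
        phases += dphi * dt
        for i, p in enumerate(phases):
            if p >= 2 * np.pi:               # beat position reached on track i
                ticks[i].append(n * dt)
                phases[i] -= 2 * np.pi
    return ticks

ticks = run_coupled_clocks([2.0, 3.0])        # two tracks, nominally 120 and 180 BPM
print([len(t) for t in ticks])
```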
Keywords
Microtiming, Oscillator, Polyrhythm, Sequencer
Paper topics
not available
Easychair keyphrases
beat position [6], node point [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249268
Zenodo URL: https://zenodo.org/record/3249268
Abstract
Perceived arousal, valence, and effort were measured continuously from auditory, visual, and audiovisual cues in contemporary cello performance. Effort (perceived exertion of the performer) was added for two motivations: to investigate its potential as a measure and its association with arousal in audiovisual perception. Fifty-two subjects participated in the experiment. Results were analyzed using Activity Analysis and functional data analysis. Arousal and effort were perceived with significant coordination between participants from auditory, visual, as well as audiovisual cues. Significant differences were detected between auditory and visual channels but not between arousal and effort. Valence, in contrast, showed no significant coordination between participants. Relative importance of the visual channel is discussed.
Keywords
Audiovisual, Contemporary music, Multimodality, Music perception, Real-time perception
Paper topics
Multimodality in sound and music computing, Perception and cognition of sound and music
Easychair keyphrases
audiovisual rating [13], visual channel [11], visual cue [11], activity analysis [10], activity level [9], audiovisual perception [9], auditory rating [9], factor combination [9], visual rating [9], differenced rating [6], music performance [6], significant coordination [6], valence rating [6], auditory channel [5], auditory cue [5], functional analysis [5], perceived arousal [5], rating increase [5], audiovisual cue [4], auditory modality [4], functional data analysis [4], intensity change [4], mean valence level [4], musical performance [4], screen size [4], significant bi coordination [4], visual condition [4], visual perception [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249453
Zenodo URL: https://zenodo.org/record/3249453
Abstract
This study investigates the use of non-linear unsupervised dimensionality reduction techniques to compress a music dataset into a low-dimensional representation, which can be used in turn for the synthesis of new sounds. We systematically compare (shallow) autoencoders (AEs), deep autoencoders (DAEs), recurrent autoencoders (with Long Short-Term Memory cells -- LSTM-AEs) and variational autoencoders (VAEs) with principal component analysis (PCA) for representing the high-resolution short-term magnitude spectrum of a large and dense dataset of music notes into a lower-dimensional vector (and then convert it back to a magnitude spectrum used for sound resynthesis). Our experiments were conducted on the publicly available multi-instrument and multi-pitch database NSynth. Interestingly, and contrary to the recent literature on image processing, they showed that PCA systematically outperforms shallow AEs. Only deep and recurrent architectures (DAEs and LSTM-AEs) lead to a lower reconstruction error. Since the optimization criterion in VAEs is the sum of the reconstruction error and a regularization term, it naturally leads to a lower reconstruction accuracy than DAEs, but we show that VAEs are still able to outperform PCA while providing a low-dimensional latent space with nice "usability" properties. We also provide corresponding objective measures of perceptual audio quality (PEMO-Q scores), which generally correlate well with the reconstruction error.
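The sketch below illustrates only the PCA baseline that the abstract compares against: magnitude-spectrum frames are projected onto the first k principal components and the reconstruction error is measured. Random data stands in for the NSynth spectra.

```python
# Minimal sketch of the PCA baseline: encode magnitude spectra to k components,
# decode, and measure the reconstruction error.
import numpy as np

def pca_reconstruction_error(X, k):
    """X: (n_frames, n_bins) magnitude spectra; returns MSE for k components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Principal directions from the SVD of the centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k]                        # (k, n_bins) projection matrix
    X_hat = Xc @ W.T @ W + mean       # encode to k dims, then decode
    return np.mean((X - X_hat) ** 2)

X = np.abs(np.random.randn(2000, 513))        # placeholder for |STFT| frames
for k in (8, 16, 32):
    print(k, pca_reconstruction_error(X, k))
```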
Keywords
autoencoder, music sound modeling, unsupervised dimension reduction
Paper topics
Models for sound analysis and synthesis, Sound/music signal processing algorithms
Easychair keyphrases
latent space [26], latent dimension [12], latent coefficient [9], reconstruction accuracy [9], variational autoencoder [9], reconstruction error [8], latent vector [7], control parameter [6], latent space dimension [6], layer wise training [6], machine learning [6], magnitude spectra [6], music sound [6], pemo q score [6], principal component analysis [6], neural network [5], phase spectra [5], preprint arxiv [5], signal reconstruction [5], audio synthesis [4], decoded magnitude spectra [4], dimensionality reduction technique [4], encoding dimension [4], good reconstruction accuracy [4], latent representation [4], latent variable [4], low dimensional latent space [4], neural information process [4], pemo q measure [4], sound synthesis [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249404
Zenodo URL: https://zenodo.org/record/3249404
Abstract
In this work, we demonstrate the market-readiness of a recently published state-of-the-art chord recognition method, where automatic chord recognition is extended beyond major and minor chords to the extraction of seventh chords. To do so, the proposed chord recognition method was integrated in the Songs2See Editor, which already includes the automatic extraction of the main melody, bass line, beat grid, key, and chords for any musical recording.
Keywords
chord recognition, gamification, music education, music information retrieval
Paper topics
not available
Easychair keyphrases
songs2see editor [5], automatic chord recognition [4], score sheet [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249362
Zenodo URL: https://zenodo.org/record/3249362
Abstract
In this work, we study and evaluate different computational methods to carry out a "modal harmonic analysis" of Jazz improvisation performances by modeling the concept of chord-scales. The Chord-Scale Theory is a theoretical concept that explains the relationship between the harmonic context of a musical piece and possible scale types to be used for improvisation. This work proposes different computational approaches for the recognition of the chord-scale type in an improvised phrase given the harmonic context. We have curated a dataset to evaluate the different chord-scale recognition approaches proposed in this study, consisting of around 40 minutes of improvised monophonic Jazz solo performances. The dataset is made publicly available and shared on freesound.org. To achieve the task of chord-scale type recognition, we propose one rule-based, one probabilistic and one supervised learning method. All proposed methods use Harmonic Pitch Class Profile (HPCP) features for classification. We observed an increase in the classification score when learned chord-scale models are filtered with predefined scale templates, indicating that incorporating prior domain knowledge into learned models is beneficial. The novelty of this study lies in demonstrating one of the first computational analyses of chord-scales in the context of Jazz improvisation.
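The rule-based (template-matching) approach mentioned above can be illustrated with a toy sketch in which an averaged 12-bin pitch-class profile is scored against binary chord-scale templates; the three templates shown are standard examples and not the paper's full template set.

```python
# Toy sketch: classify the chord-scale of a phrase by matching its averaged
# pitch-class profile against binary scale templates rotated to the chord root.
import numpy as np

SCALE_TEMPLATES = {
    # pitch classes relative to the chord root (1 = scale tone)
    "ionian":     [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
    "dorian":     [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    "mixolydian": [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
}

def classify_chord_scale(hpcp, root_pc):
    """hpcp: 12-bin profile of the phrase; root_pc: chord root as pitch class 0-11."""
    rotated = np.roll(hpcp, -root_pc)                  # express profile relative to the root
    rotated = rotated / (rotated.sum() + 1e-12)
    scores = {name: float(rotated @ np.array(t)) for name, t in SCALE_TEMPLATES.items()}
    return max(scores, key=scores.get), scores

hpcp = np.random.rand(12)                              # placeholder for an averaged HPCP
print(classify_chord_scale(hpcp, root_pc=7))
```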
Keywords
audio signal processing, computational musicology, machine learning, music information retrieval
Paper topics
Automatic separation, classification and recognition of sound and music, Computational musicology and ethnomusicology, Music information retrieval, Music performance analysis and rendering, Sound/music signal processing algorithms
Easychair keyphrases
chord scale [65], pitch class [33], chord scale type [25], music information retrieval [17], chord scale recognition [14], harmonic pitch class profile [12], scale type [12], jazz improvisation [10], chord scale model [9], chord recognition [8], audio signal [7], binary template [7], chord scale theory [7], gaussian mixture model [7], automatic chord scale recognition [6], binary chord scale template [6], chroma feature [6], frame level hpcp [6], international society [6], pitch class profile [6], predefined binary chord scale [6], scale pitch class [6], feature vector [5], scale recognition [5], standard deviation [5], binary template matching [4], level hpcp vector [4], pitch class distribution [4], predefined binary template [4], scale recognition method [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249258
Zenodo URL: https://zenodo.org/record/3249258
Abstract
This work aims at bridging the gap between two completely distinct research fields: digital telecommunications and Music Information Retrieval. While works in the MIR community have long used algorithms borrowed from voice signal processing, text recognition or image processing, to our knowledge no work based on digital telecommunications algorithms has been produced. This paper specifically targets the use of the Belief Propagation algorithm for the task of Automatic Chord Estimation (ACE). This algorithm is in widespread use in iterative decoders for error-correcting codes, and we show that it offers improved performance in ACE. It certainly represents a promising alternative to the Hidden Markov Model approach.
Keywords
Automatic Chord Detection, Belief Propagation Algorithm, General Belief Propagation, Hidden Markov Model, Music Information Retrieval
Paper topics
Music information retrieval
Easychair keyphrases
ground truth [14], transition matrix [11], automatic chord estimation [9], belief propagation [9], bayesian graph [7], long term [7], audio signal [6], belief propagation algorithm [6], chord estimation [6], self transition [6], transition probability [6], hidden state [5], chord progression [4], computation time [4], deep learning [4], fifth transition matrix [4], graphical model [4], ground truth beat [4], inference process [4], long term correlation [4], minor chord [4], observation probability [4], pattern matching [4], short cycle [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249467
Zenodo URL: https://zenodo.org/record/3249467
Abstract
Reaction times (RTs) are an important source of information in experimental psychology and EEG data analysis. While simple auditory RT has been widely studied, the response time when discriminating between two different auditory stimuli has not yet been determined. The purpose of this experiment is to measure the RT for the discrimination between two different auditory stimuli: speech and instrumental music.
Keywords
Auditory stimuli, Distinguish voice-music, Reaction time
Paper topics
not available
Easychair keyphrases
reaction time [7], speech excerpt [5]
Paper type
Demo
DOI: 10.5281/zenodo.3249274
Zenodo URL: https://zenodo.org/record/3249274
Abstract
This paper describes a novel framework for real-time sonification of surface textures in virtual reality (VR), aimed at realistically representing the experience of driving over a virtual surface. A combination of capturing techniques for real-world surfaces is used to map 3D geometry, texture maps and auditory attributes into aural and vibrotactile feedback. For the sonification rendering, we propose using information from primarily graphical texture features to define target units in concatenative sound synthesis. To foster models that go beyond the current generation of simple sound textures (e.g., wind, rain, fire), towards highly “synchronized” and expressive scenarios, our contribution outlines a framework for higher-level modeling of a bicycle's kinematic rolling on ground contact, with enhanced perceptual symbiosis between auditory, visual and vibrotactile stimuli. We scanned two surfaces represented as texture maps, consisting of different features, morphology and matching navigation. We define target trajectories in a 2-dimensional audio feature space, according to a temporal model and morphological attributes of the surfaces. This synthesis method serves two purposes: real-time auditory feedback, and vibrotactile feedback induced by playing back the concatenated sound samples through a vibrotactile inducer speaker. For this purpose, a virtual environment was created, including four surface variations and consisting of a bicycle ride, allowing us to test the proposed architecture for real-time adaptation and adequate haptic feedback.
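A toy sketch of the unit-selection step implied above: given a target trajectory in a 2-D audio feature space, the nearest analysed sound unit is chosen for each target point. The descriptors and corpus are placeholders, not the paper's data.

```python
# Toy sketch of concatenative unit selection: for each 2-D target descriptor,
# pick the nearest grain from a corpus of analysed sound units.
import numpy as np

def select_units(target_trajectory, corpus_features):
    """Return the index of the closest corpus unit for each target point."""
    d = np.linalg.norm(target_trajectory[:, None, :] - corpus_features[None, :, :], axis=2)
    return d.argmin(axis=1)

corpus = np.random.rand(500, 2)               # e.g. [loudness, spectral centroid] per grain
trajectory = np.column_stack([np.linspace(0.2, 0.8, 50), np.full(50, 0.5)])
unit_ids = select_units(trajectory, corpus)   # indices of grains to concatenate
print(unit_ids[:10])
```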
Keywords
Sonic Interaction Design, Sonification, Sound Synthesis, Virtual Reality
Paper topics
Auditory display and data sonification, Interactive performance systems, Models for sound analysis and synthesis, Multimodality in sound and music computing, Sonic interaction design, Sound and music for Augmented/Virtual Reality and games
Easychair keyphrases
vibrotactile feedback [12], concatenative sound synthesis [11], displacement map [10], descriptor space [9], dirt road [9], capture technique [7], real time [7], haptic feedback [6], sound texture [6], surface texture [6], texture map [6], aural feedback [5], feature vector [5], virtual environment [5], virtual reality [5], aalborg university [4], audio capture technique [4], audio stream [4], concatenative sound synthesis engine [4], first order autocorrelation coefficient [4], rubber hand [4], sensory feedback [4], sound synthesis [4], target definition [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249378
Zenodo URL: https://zenodo.org/record/3249378
Abstract
Augmented mobile instruments combine digitally-fabricated elements, sensors, and smartphones to create novel musical instruments. Communication between the sensors and the smartphone can be challenging as there doesn’t exist a universal lightweight way to connect external elements to this type of device. In this paper, we investigate the use of two techniques to transmit sensor data through the built-in audio jack input of a smartphone: digital data transmission using the Bell 202 signaling technique, and analog signal transmission using digital amplitude modulation and demodulation with Goertzel filters. We also introduce tools to implement such systems using the Faust programming language and the Teensy development board.
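For reference, the Goertzel recursion mentioned above can be sketched as follows: it measures the power of one known carrier frequency in a block of samples, which is the building block of the amplitude demodulation scheme described. The tone frequency and sample rate below are illustrative only.

```python
# Minimal sketch of a Goertzel filter: estimate the power of a known carrier
# frequency in a block of audio-jack samples (block length and frequency are
# illustrative values, not those of the system described above).
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Return the power of `target_freq` in `samples` using the Goertzel recursion."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# Example: measure a 1 kHz carrier in a synthetic block of samples
fs, f0 = 44100, 1000.0
block = [0.5 * math.sin(2 * math.pi * f0 * i / fs) for i in range(441)]
print(goertzel_power(block, fs, f0))
```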
Keywords
Data Transmission Standards, Faust, Microcontrollers, Mobile Music, Sensors
Paper topics
Hardware systems for sound and music computing, Interfaces for sound and music, Languages, protocols and software environments for sound and music computing
Easychair keyphrases
uart tx encoder [15], audio jack [10], sensor data [8], signaling technique [8], goertzel filter [7], digital amplitude modulation [6], channel number [5], data transmission [5], faust program [5], musical instrument [5], parallel stream [5], analog audio sensor data [4], audio jack output speaker [4], audio sensor data transmission [4], audio signal [4], augmented mobile instrument [4], digital signal processing [4], faust generated block diagram [4], faust programming language [4], output audio signal [4], pin audio jack [4], sound synthesis [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249311
Zenodo URL: https://zenodo.org/record/3249311
Abstract
This paper describes a tool for gesture-based control of sound spatialization in Augmented and Virtual Reality (AR and VR). While the increased precision and availability of sensors of all kinds have made possible, in the last twenty years, the development of a considerable number of interfaces for gestural control of sound spatialization, their combination with VR and AR has not been fully explored yet. Such technologies provide an unprecedented level of interaction, immersivity and ease of use, by letting the user visualize and modify the position, trajectory and behaviour of sound sources in 3D space. Like VR/AR painting programs, the application allows the user to draw lines that act as 3D automations for spatial motion. The system also stores information about the movement speed and directionality of the sound source. Additionally, other parameters can be controlled from a virtual menu. The possibility to alternate between AR and VR makes it possible to switch between different environments (the actual space where the system is located or a virtual one). Virtual places can also be connected to different room parameters inside the spatialization algorithm.
Keywords
Augmented Reality, Spatial Audio, Spatialisation Instrument, Virtual Reality
Paper topics
New interfaces for interactive music creation, Sound and music for Augmented/Virtual Reality and games, Spatial sound, reverberation and virtual acoustics
Easychair keyphrases
sound source [32], sound spatialization [12], virtual object [7], spatialization algorithm [6], real time [5], cast shadow [4], digital musical instrument [4], sound source position [4], virtual object position [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249329
Zenodo URL: https://zenodo.org/record/3249329
Abstract
This paper presents and discusses the Compose With Sounds (CwS) Digital Audio Workstation (DAW) and its approach to sequencing musical materials. The system is designed to facilitate composition within the realm of sound-based music, wherein sound objects (real or synthesised) are the main musical unit of construction rather than traditional musical notes. Unlike traditional DAWs or graphical audio programming environments (such as Pure Data, Max/MSP etc.) that are based around interactions with sonic materials within tracks or audio graphs, the implementation presented here is based solely around sound objects. To achieve this, a bespoke cross-platform audio engine known as FSOM (Free Sound Object Mixer) was created in C++. To enhance the learning experience, imagery, dynamic 3D animations and models are used to allow for efficient exploration and learning. To further support the educational focus of the system, the metaphor of a sound card is used instead of the term sound object. Collections of cards can subsequently be imported into and exported from the software package. When applying audio transformations to cards, interactive 3D graphics are used to illustrate the transformation in real time based on its current settings. Audio transformations and tools within the system all hook into a flexible permissions system that allows users or workshop leaders to create template sessions with features enabled or disabled based on the theme or objective of the usage. The system is part of a suite of pedagogical tools for the creation of experimental electronic music. A version for live performance is currently in development, as is the ability to utilise video within the system.
Keywords
digital audio workstation design, new interfaces for music creation, object-oriented composition, pedagogy
Paper topics
Algorithms and Systems for music composition, Interfaces for sound and music, Music creation and performance, New interfaces for interactive music creation, Sound/music signal processing algorithms
Easychair keyphrases
free sound object mixer [6], software package [5], audio engine [4], electroacoustic resource site [4], granular synthesis [4], interface project [4], sonic postcard [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249368
Zenodo URL: https://zenodo.org/record/3249368
Abstract
An interactive application has been developed that sonifies the human voice and visualizes a graphic interface in relation to the sounds produced. The program has been developed in Max/MSP; it takes the spoken voice signal and, from its processing, generates an automatic, tonal musical composition.
Keywords
acoustics, automatic composition, music, sonification, sonifying voice, tonal music, voice
Paper topics
not available
Easychair keyphrases
real time [6], tonal musical sequence [4], voice signal [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249354
Zenodo URL: https://zenodo.org/record/3249354
Abstract
A Recurrent Neural Network (RNN) is trained to predict sound samples based on audio input augmented by control parameter information for pitch, volume, and instrument identification. During the generative phase following training, audio input is taken from the output of the previous time step, and the parameters are externally controlled, allowing the network to be played as a musical instrument. Building on an architecture developed in previous work, we focus on the learning and synthesis of transients – the temporal response of the network during the short time (tens of milliseconds) following the onset and offset of a control signal. We find that the network learns the particular transient characteristics of two different synthetic instruments, and furthermore shows some ability to interpolate between the characteristics of the instruments used in training in response to novel parameter settings. We also study the behavior of the units in hidden layers of the RNN using various visualization techniques and find a variety of volume-specific response characteristics.
Keywords
analysis/synthesis, audio synthesis, deep learning, musical instrument modeling
Paper topics
Content processing of music audio signals, Interaction in music performance, Models for sound analysis and synthesis, Sonic interaction design
Easychair keyphrases
steady state [10], output signal [6], recurrent neural network [6], decay transient [5], hidden layer [5], hidden unit [4], hidden unit response [4], musical instrument [4], sudden change [4], synthetic instrument [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249457
Zenodo URL: https://zenodo.org/record/3249457
Abstract
A blindfolded instructor (evaluator) plays a clave pattern. A computer captures and repeats the pattern; after one minute the experiment stops. The process is then repeated with a human who also tries to copy the clave. After another minute they stop, and the evaluator assesses both performances.
Keywords
clave, Interaction, Machine listening
Paper topics
not available
Easychair keyphrases
not available
Paper type
Demo
DOI: 10.5281/zenodo.3249441
Zenodo URL: https://zenodo.org/record/3249441
Abstract
The link between musicians and dancers is generally described as strong in many traditional musics, and this also holds for Scandinavian folk music - spelmansmusik. Understanding the interaction of music and dance has potential for developing theories of performance strategies in artistic practice and for developing interactive systems. In this paper we investigate this link by having Swedish folk musicians perform to animations generated from motion capture recordings of dancers. The different stimuli focus on the motions of selected body parts, shown as moving white dots on a computer screen, with the aim of understanding how different movements can provide reliable cues for musicians. Sound recordings of fiddlers playing to the "dancing dot" were analyzed using automatic alignment to the original music performance related to the dance recordings. Interviews were conducted with musicians and comments were collected in order to shed light on strategies when playing for dancing. Results illustrate reliable alignment to renderings showing the full skeletons of dancers, and an advantage of focused displays of movements in the upper back of the dancer.
Keywords
dance, folk dance, folk music, interaction, Motion Capture, music, music performance, performance strategies, playing for dancing, polska
Paper topics
Computational musicology and ethnomusicology, Interaction in music performance, Interactive performance systems, Music performance analysis and rendering
Easychair keyphrases
alignment curve [9], automatic alignment [7], body movement [6], reduced rendering [6], secondary recording [6], music performance [5], body part [4], dance movement [4], drift phase [4], folk dance [4], folk music [4], scandinavian folk music [4], stimulus type [4], swedish folk dance [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249455
Zenodo URL: https://zenodo.org/record/3249455
Abstract
Rhythm analysis is a well researched area in music information retrieval that has many useful applications in music production. In particular, it can be used to synchronize the tempo of audio recordings with a digital audio workstation (DAW). Conventionally this is done by stretching recordings over time, however, this can introduce artifacts and alter the rhythmic characteristics of the audio. Instead, this research explores how rhythm analysis can be used to do the reverse by synchronizing a DAW's tempo to a source recording. Drawing on research by Percival and Tzanetakis, a simple beat extraction algorithm was developed and integrated with the Renoise DAW. The results of this experiment show that, using user input from a DAW, even a simple algorithm can perform on par with popular packages for rhythm analysis such as BeatRoot, IBT, and aubio.
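As a rough illustration of the kind of pipeline described above (not the integrated Renoise tool itself), the sketch below computes a spectral-flux onset-strength envelope and estimates a tempo from its autocorrelation; the window sizes and tempo range are assumptions.

```python
# Minimal sketch: spectral-flux onset strength followed by autocorrelation-based
# tempo estimation, in the spirit of simple beat-extraction pipelines.
import numpy as np

def onset_strength(x, sr, n_fft=1024, hop=512):
    """Half-wave-rectified spectral flux per frame, plus its frame rate."""
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    mags = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
    return flux, sr / hop

def estimate_tempo(oss, frame_rate, bpm_range=(60, 180)):
    oss = oss - oss.mean()
    ac = np.correlate(oss, oss, mode="full")[len(oss) - 1:]
    lags = np.arange(1, len(ac))
    bpms = 60.0 * frame_rate / lags
    valid = (bpms >= bpm_range[0]) & (bpms <= bpm_range[1])
    return bpms[valid][np.argmax(ac[1:][valid])]

sr = 44100
t = np.arange(sr * 10) / sr
x = np.sin(2 * np.pi * 440 * t) * (np.sin(2 * np.pi * 2.0 * t) > 0.99)  # clicks at ~120 BPM
oss, fr = onset_strength(x, sr)
print(estimate_tempo(oss, fr))
```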
Keywords
Beat Extraction, Beat Induction, Beat Tracking, Digital Audio Workstation, Music Production, Renoise, Rhythm Analysis
Paper topics
Algorithms and Systems for music composition, Interfaces for sound and music, Music information retrieval
Easychair keyphrases
beat tracking [28], tempo curve [8], beat extraction [7], beat tracking system [6], mir eval [6], music research [6], real time beat tracking [6], beat delta [5], music production [5], oss calculation [5], audio recording [4], beat extraction algorithm [4], beat time [4], beat tracking algorithm [4], daw integrated beat tracking [4], digital audio workstation [4], digital music [4], expected beat delta [4], language processing [4], music information retrieval [4], peak picking [4], spectral flux [4], streamlined tempo estimation algorithm [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249237
Zenodo URL: https://zenodo.org/record/3249237
Abstract
Anticipating a human musician's tempo for a given piece of music using a predictive model is important for interactive music applications, but existing studies base such anticipation on hand-crafted features. Based on recent trends in using deep learning for music performance rendering, we present an online method for multi-step prediction of the tempo curve, given the past history of tempo curves and the music score that the user is playing. We present a linear autoregressive model whose parameters are determined by a deep convolutional neural network whose input is the music score and the history of the tempo curve; such an architecture allows the machine to acquire music performance idioms based on musical contexts, while being able to predict the timing based on the user's playing. Evaluations show that our model is capable of improving the tempo estimate over a commonly used baseline for tempo prediction by 18%.
Keywords
Deep Neural Networks, Music Interaction, Tempo Prediction
Paper topics
Automatic music generation/accompaniment systems, Interaction in music performance, Music performance analysis and rendering
Easychair keyphrases
music score [51], music score feature [31], tempo curve [18], score feature [14], hand crafted feature [12], linear ar model [11], score feature extraction [11], timing prediction [10], fully connected layer [9], prediction coefficient function [9], deep linear ar model [8], music performance [8], feature extraction [7], music performance rendering [7], prediction error [7], segment duration [7], tempo prediction [7], deep non linear ar model [6], duet interaction [6], expressive timing [6], leaky relu batch norm [6], music score sn [6], performance feature [6], performance history [6], beat duration [5], deep learning [5], eighth note [5], human musician [5], piano playing [5], real time [5]
Paper type
Full paper
DOI: 10.5281/zenodo.3249387
Zenodo URL: https://zenodo.org/record/3249387
Abstract
In the design of new musical instruments, from acoustic to digital, merging conventional methods with new technologies has been one of the common approaches. Incorporation of prior design expertise with experimental or sometimes industrial methods suggests new directions in both musical expression design and the development of new manufacturing tools. This paper describes key concepts of digital manufacturing processes in musical instrument design. It provides a review of current manufacturing techniques which are commonly used to create new musical interfaces, and discusses future directions of digital fabrication which are applicable to numerous areas in music research, such as digital musical instrument (DMI) design, interaction design, acoustics, performance studies, and education. Additionally, the increasing availability of digital manufacturing tools and fabrication labs all around the world makes these processes an integral part of design and music classes. Examples of digital fabrication labs and manufacturing techniques used in education for student groups whose ages range from elementary to university level are presented. In the context of this paper, it is important to consider how the growing fabrication technology will influence the design and fabrication of musical instruments, as well as what forms of new interaction methods and aesthetics might emerge.
Keywords
acoustics of musical instruments, design and manufacturing of musical instrument, interaction design, iterative design
Paper topics
Hardware systems for sound and music computing, Interfaces for sound and music, Music creation and performance, New interfaces for interactive music creation, Sonic interaction design, Sound and music for accessibility and special needs
Easychair keyphrases
musical instrument [27], musical instrument design [19], musical expression [15], instrument design [14], additive manufacturing [12], rapid prototyping [9], digital fabrication [8], digital manufacturing [8], hybrid manufacturing [8], manufacturing technique [7], manufacturing tool [7], digital musical instrument [6], fabrication lab [6], injection molding [6], instrument body [6], fabrication method [5], acoustic instrument [4], brass pan flute [4], digital manufacturing tool [4], electronic circuit [4], incremental robotic sheet forming [4], industrial manufacturing [4], manufacturing process [4], music research [4], personal manufacturing [4], portable digital manufacturing tool [4], printing technology [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249280
Zenodo URL: https://zenodo.org/record/3249280
Abstract
In this contribution, a system is presented that represents drawings of geometric figures along with their descriptions transcribed in Braille, controlled by means of commands acquired through a speech recognition scheme. The designed system recognizes the spoken descriptions needed to draw simple geometric objects: the shape, colour, size and position of the figures in the drawing. The selected speech recognition method is based on a distance measure defined with Mel Frequency Cepstral Coefficients (MFCCs). The complete system can be used both by people with visual impairments and by people with hearing impairments, thanks to its interface which, in addition to showing the drawing and the corresponding transcription in Braille, also allows the user to hear the description of the commands and the final drawing.
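The sketch below illustrates one plausible reading of the MFCC-based distance measure mentioned above (not the authors' implementation): each spoken command is compared against reference recordings by dynamic time warping over MFCC sequences using librosa; the file names are hypothetical placeholders.

```python
# Toy sketch: recognise a spoken command by comparing its MFCC sequence against
# reference recordings with DTW. File paths are hypothetical placeholders.
import librosa
import numpy as np

def mfcc_sequence(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def dtw_distance(mfcc_a, mfcc_b):
    """Accumulated DTW cost between two MFCC sequences (lower = more similar)."""
    D, _ = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric="euclidean")
    return D[-1, -1] / (mfcc_a.shape[1] + mfcc_b.shape[1])   # length-normalised cost

def recognise(utterance_path, references):
    """references: dict mapping command label -> reference audio path."""
    query = mfcc_sequence(utterance_path)
    costs = {label: dtw_distance(query, mfcc_sequence(p)) for label, p in references.items()}
    return min(costs, key=costs.get)

commands = {"circle": "refs/circle.wav", "square": "refs/square.wav", "red": "refs/red.wav"}
print(recognise("input/utterance.wav", commands))
```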
Keywords
Braille, Drawing, MFCCs, Speech recognition
Paper topics
not available
Easychair keyphrases
speech recognition [12], speech recognition subsystem [6], geometric figure [4], speech recognition scheme [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249437
Zenodo URL: https://zenodo.org/record/3249437
Abstract
This paper reports on the procedure and results of an experiment to evaluate a continuous sonic interaction with an everyday wind-like sound created by both acoustic and digital means. The interaction is facilitated by a mechanical theatre sound effect, an acoustic wind machine, which is performed by participants. This work is part of wider research into the potential of theatre sound effect designs as a means to study multisensory feedback and continuous sonic interactions. An acoustic wind machine is a mechanical device that affords a simple rotational gesture to a performer; turning its crank handle at varying speeds produces a wind-like sound. A prototype digital model of a working acoustic wind machine is programmed, and the acoustic interface drives the digital model in performance, preserving the same tactile and kinaesthetic feedback across the continuous sonic interactions. Participants’ performances are elicited with sound stimuli produced from simple gestural performances of the wind-like sounds. The results of this study show that the acoustic wind machine is rated as significantly easier to play than its digital counterpart. Acoustical analysis of the corpus of participants’ performances suggests that the mechanism of the wind machine interface may play a role in guiding their rotational gestures.
Keywords
Evaluation, Multimodality, Perception of Sound, Sonic Interaction, Sound Performance
Paper topics
Interactive performance systems, Interfaces for sound and music, Models for sound analysis and synthesis, Multimodality in sound and music computing, Perception and cognition of sound and music, Sonic interaction design, Technologies for the preservation, access and modelling of musical heritage
Easychair keyphrases
acoustic wind machine [55], wind machine [39], digital wind [33], continuous sonic interaction [19], digital model [18], theatre sound effect [16], crank handle [13], statistically significant difference [12], participant performance [10], digital counterpart [9], wind sound [9], rotational gesture [7], sound effect [7], digital musical instrument [6], everyday sound [6], historical theatre sound effect [6], sonic feedback [6], sound stimulus [6], statistical testing [6], wilcoxon signed rank test [6], easiness rating [5], order effect [5], performance gesture [5], similarity rating [5], steady rotation [5], theatre sound [5], digital wind machine [4], early twentieth century [4], free description [4], theatre wind machine [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249286
Zenodo URL: https://zenodo.org/record/3249286
Abstract
Experimental research into the fundamental acoustic aspects of musical instruments and other sound generating devices is an important part of the history of musical acoustics and of physics in general. This paper presents experimental proof of dispersive wave propagation on metal guitar strings. The high-resolution experimental data of string displacement are gathered using video-kymographic high-speed imaging of the vibrating string. The experimental data are indirectly compared against a dispersive Euler-Bernoulli type model described by a PDE. In order to detect the minor wave features associated with the dispersion and distinguish them from other effects present, such as frequency-dependent dissipation, a second model lacking the dispersive (stiffness) term is used. Unsurprisingly, the dispersive effects are shown to be minor but definitively present. The results and methods presented here should in general find application in string instrument acoustics.
Keywords
dispersion analysis, dispersive wave propagation, experimental acoustics, guitar string, kymography, line-scan camera, nylon string, stiff string, String vibration
Paper topics
Digital audio effects, Models for sound analysis and synthesis
Easychair keyphrases
string displacement [14], traveling wave [12], string vibration [11], guitar string [9], dispersive wave propagation [7], boundary condition [6], frequency dependent [6], high frequency [6], high frequency wave component [6], high speed line scan [6], time series [6], digital waveguide [5], dispersion analysis [5], full model [5], digital audio effect [4], dispersive euler bernoulli type [4], dispersive high frequency oscillating tail [4], electric field sensing [4], frequency dependent loss [4], general solution [4], group velocity [4], high resolution experimental data [4], initial value problem [4], line scan camera [4], piano string [4], signal processing [4], triangular shaped initial condition [4], video kymographic [4], wave equation [4], window size [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249372
Zenodo URL: https://zenodo.org/record/3249372
Abstract
The user experience of a virtual reality (VR) application intrinsically depends upon how the underlying system relays information to the user. Auditory and visual cues that make up the user interface of a VR help users decide how to proceed in a virtual scenario. These interfaces can be diegetic (i.e. presented as part of the VR) or non-diegetic (i.e. presented as an external layer superimposed onto the VR). In this paper, we explore how auditory and visual cues of diegetic and non-diegetic origins affect a user's decision-making process in VR. We present the results of a pilot study in which users are placed into virtual situations and expected to make choices based on conflicting suggestions as to how to complete a given task. We analyze the quantitative data pertaining to user preferences for modality and diegetic quality. We also discuss the narrative effects of the cue types based on a follow-up survey conducted with the users.
Keywords
Auditory and visual interfaces, Diegetic and non-diegetic cues, Virtual Reality
Paper topics
Sound and music for Augmented/Virtual Reality and games
Easychair keyphrases
non diegetic [33], diegetic quality [16], virtual reality [14], diegetic audio [11], diegetic visual [11], virtual environment [11], visual cue [11], diegetic audio cue [9], second attempt [9], cue type [8], virtual room [8], decision making [7], user experience [7], first attempt [6], non diegetic cue [6], non diegetic visual object [6], cinematic virtual reality [4], diegetic audio object [4], diegetic cue [4], diegetic quality pairing [4], diegetic sound [4], implied universe [4], make decision [4], non diegetic audio cue [4], non diegetic sound [4], user interface [4], virtual space [4], visual element [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249315
Zenodo URL: https://zenodo.org/record/3249315
Abstract
We previously introduced JamSketch, a system which enabled users to improvise music by drawing a melodic outline. However, users could not control the rhythm and intensity of the generated melody. Here, we present extensions to JamSketch to enable rhythm and intensity control.
Keywords
Automatic music composition, Genetic algorithm, Melodic outline, Musical improvisation, Pen pressure
Paper topics
not available
Easychair keyphrases
melodic outline [23], pen pressure [8], note density [6], piano roll display [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249349
Zenodo URL: https://zenodo.org/record/3249349
Abstract
In this contribution, we present a facial activity detection system using image processing and machine learning techniques. Facial activity detection allows monitoring people's emotional states, attention, fatigue, and reactions to different situations in a non-intrusive way. The designed system can be used in many fields, such as education and musical perception. Monitoring a person's facial activity can help us determine whether it is necessary to take a break, change the type of music being listened to, or modify the way a class is taught.
Keywords
Education, Facial activity detection, Monitor Attention, Musical perception, SVM
Paper topics
not available
Easychair keyphrases
facial activity detection system [10], facial activity detection [6], finite state machine [6], temporal analysis [6], mouth state detection [4], mouth status [4], non intrusive way [4], person emotional state [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249358
Zenodo URL: https://zenodo.org/record/3249358
Abstract
Our demo is a web app that suggests new practice material to music learners based on automatic chord analysis. It is aimed at music practitioners of any skill set, playing any instrument, as long as they know how to play along with a chord sheet. Users need to select a number of chords in the app, and are then presented with a list of music pieces containing those chords. Each of those pieces can be played back while its chord transcription is displayed in sync to the music. This enables a variety of practice scenarios, ranging from following the chords in a piece to using the suggested music as a backing track to practice soloing over.
Keywords
automatic chord recognition, music education, music recommendation, web application
Paper topics
not available
Easychair keyphrases
chord transcription [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249445
Zenodo URL: https://zenodo.org/record/3249445
Abstract
The use of recurrent neural networks for modeling and generating music has seen much progress with textual transcriptions of traditional music from Ireland and the UK. We explore how well these models perform for textual transcriptions of traditional music from Scandinavia. This type of music can have characteristics that are similar to and different from those of Irish music, e.g. in structure, mode, and rhythm. We investigate the effects of different architectures and training regimens, and evaluate the resulting models using two methods: a comparison of statistics between real and generated transcription populations, and an appraisal of generated transcriptions via a semi-structured interview with an expert in Swedish folk music. As with the models trained on Irish transcriptions, we find that these recurrent models can generate new transcriptions that share characteristics with Swedish folk music. One of our models has been implemented online at http://www.folkrnn.org.
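As a rough illustration of the class of model involved, the sketch below is a generic token-level recurrent language model over textual (ABC-style) transcriptions; the layer sizes, vocabulary handling, and the choice of LSTM rather than GRU are illustrative assumptions, not the folk-rnn configuration:

    import torch
    import torch.nn as nn

    class TranscriptionLM(nn.Module):
        """Generic recurrent language model over transcription tokens.
        Sizes are illustrative, not those used by the paper's models."""
        def __init__(self, vocab_size, emb=128, hidden=256, layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb)
            self.rnn = nn.LSTM(emb, hidden, num_layers=layers, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, tokens, state=None):
            h, state = self.rnn(self.embed(tokens), state)
            return self.out(h), state   # logits over the next token at each step

    # training objective: cross-entropy between logits[:, :-1] and tokens[:, 1:]

Sampling from such a model token by token yields new transcriptions whose statistics can then be compared with the training corpus, as described in the abstract.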
Keywords
Deep Learning, Folk Music, GRU, LSTM, Neural Network, Polka, RNN
Paper topics
Algorithms and Systems for music composition, Automatic music generation/accompaniment systems, New interfaces for interactive music creation
Easychair keyphrases
scandinavian folk music [15], folk music [12], training data [10], recurrent neural network [7], swedish folk music [7], folkwiki dataset [6], real transcription [6], short term memory [6], gru layer [5], music transcription [5], traditional music [5], eric hallstr om [4], fake transcription [4], gated recurrent unit [4], gru model [4], irish traditional music [4], neural network [4], semi structured interview [4], transcription model [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249474
Zenodo URL: https://zenodo.org/record/3249474
Abstract
This paper explores how notation developed for the representation of sound-based musical structures could be used for the transcription of vocal sketches representing expressive robot movements. A mime actor initially produced expressive movements which were translated to a humanoid robot. The same actor was then asked to illustrate these movements using vocal sketching. The vocal sketches were transcribed by two composers using sound-based notation. The same composers later synthesised new sonic sketches from the annotated data. Different transcriptions and synthesised versions of these were compared in order to investigate how the audible outcome changes for different transcriptions and synthesis routines. This method provides a palette of sound models suitable for the sonification of expressive body movements.
Keywords
robot sound, sonic interaction design, sonification, sound representation, Sound transcription, voice sketching
Paper topics
Auditory display and data sonification, Models for sound analysis and synthesis, Multimodality in sound and music computing, Music performance analysis and rendering, Perception and cognition of sound and music, Social interaction in sound and music computing, Sonic interaction design
Easychair keyphrases
vocal sketch [21], sound synthesis [8], mime actor [7], notation system [7], sound structure [7], synthesized version [6], vocal sketching [6], humanoid robot [5], sonic sketch [5], expressive gesture [4], human robot interaction [4], kmh royal college [4], movement sonification [4], pitched sound [4], sonao project [4], sound based musical structure [4], sound model [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249299
Zenodo URL: https://zenodo.org/record/3249299
Abstract
We focus on physical models in which multiple strings are connected via junctions to form graphs. Starting with the case of the 1D wave equation, we show how to extend it to a string branching into two other strings, and from there how to build complex cyclic and acyclic graphs. We introduce the concept of dense models and show that a discretization of the 2D wave equation can be built using our methods, and that there are more efficient ways of modelling 2D wave propagation than a rectangular grid. We discuss how to apply Dirichlet and Neumann boundary conditions to a graph model, and show how to compute the frequency content of a graph using common methods. We then prove general lower and upper bounds on computational complexity. Lastly, we show how to extend our results to other kinds of acoustical objects, such as linear bars, and how to add damping to a graph model. A reference implementation in MATLAB and an interactive JUCE/C++ application are available online.
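The building block of such graph models is a finite-difference string segment; the sketch below shows a standard leapfrog scheme for the 1D wave equation with fixed ends, as a minimal reference. Junction updates between segments, which are the paper's contribution, are not reproduced here, and all parameter values are illustrative:

    import numpy as np

    def simulate_string(N=100, steps=2000, c=340.0, L=1.0, fs=44100):
        """Leapfrog finite-difference scheme for the 1D wave equation
        u_tt = c^2 u_xx with fixed (Dirichlet) ends."""
        h = L / N                          # grid spacing
        k = 1.0 / fs                       # time step
        lam = c * k / h                    # Courant number, must be <= 1
        assert lam <= 1.0, "Courant condition violated"

        u = np.zeros(N + 1)
        u[N // 2] = 1e-3                   # simple initial displacement ("pluck")
        u_prev = u.copy()                  # zero initial velocity
        out = np.zeros(steps)

        for n in range(steps):
            u_next = np.zeros(N + 1)       # ends stay at zero (Dirichlet)
            u_next[1:-1] = (2 * u[1:-1] - u_prev[1:-1]
                            + lam ** 2 * (u[2:] - 2 * u[1:-1] + u[:-2]))
            out[n] = u_next[N // 4]        # read displacement at an output point
            u_prev, u = u, u_next
        return out

A graph model connects several such segments by replacing the boundary update at shared end points with a junction rule that couples the neighbouring segments.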
Keywords
Digital signal processing, Physical modeling for sound synthesis, Sound and music computing
Paper topics
Models for sound analysis and synthesis, Sound/music signal processing algorithms, Spatial sound, reverberation and virtual acoustics
Easychair keyphrases
boundary condition [22], d wave equation [20], finite difference scheme [20], wave equation [14], pendant node [9], branching topology [8], rectangular grid [8], computational complexity [7], physical model [7], string segment [7], graph based physical model [6], mass spring network [6], digital waveguide [5], graph model [5], sound synthesis [5], aalborg university copenhagen [4], dense model [4], difference operator [4], d wave equation based model [4], edge node [4], hexagonal grid [4], linear bar model [4], mass spring system [4], n branch topology [4], physical modelling [4], stability condition [4], wave propagation [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249331
Zenodo URL: https://zenodo.org/record/3249331
Abstract
Playing techniques such as ornamentations and articulation effects constitute important aspects of music performance. However, their computational analysis is still at an early stage due to a lack of instrument diversity, established methodologies, and informative data. Focusing on the Chinese bamboo flute, we introduce a two-stage glissando detection system based on hidden Markov models (HMMs) with Gaussian mixtures. A rule-based segmentation process extracts glissando candidates, defined as consecutive note changes in the same direction. Glissandi are then identified by two HMMs. The study uses a newly created dataset of Chinese bamboo flute recordings, including both isolated glissandi and real-world pieces. The results, based on both frame- and segment-based evaluation for ascending and descending glissandi respectively, confirm the feasibility of the proposed method for glissando detection. Ascending glissandi are detected more reliably than descending ones, owing to their more regular patterns. Inaccurate pitch estimation remains the main obstacle to fully automated glissando detection. The dataset and method can be used for performance analysis.
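The first stage described above, rule-based extraction of glissando candidates, can be sketched as grouping consecutive note changes that move in the same direction. The rules and thresholds below are illustrative assumptions, not the exact criteria used in the paper:

    def glissando_candidates(notes, min_len=3):
        """Group consecutive note changes moving in the same direction into
        candidate segments.  `notes` is a list of (onset_time, midi_pitch)
        pairs; min_len is an illustrative threshold."""
        candidates = []
        run = [notes[0]]
        direction = 0
        for prev, cur in zip(notes, notes[1:]):
            step = cur[1] - prev[1]
            d = (step > 0) - (step < 0)      # +1 ascending, -1 descending, 0 repeat
            if d != 0 and (direction == 0 or d == direction):
                direction = d
                run.append(cur)
            else:
                if len(run) >= min_len:
                    candidates.append((run[0][0], run[-1][0], direction))
                run, direction = [cur], 0
        if len(run) >= min_len:
            candidates.append((run[0][0], run[-1][0], direction))
        return candidates

Each candidate (start time, end time, direction) would then be passed to the HMM stage for classification as glissando or not.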
Keywords
Ethnomusicology, Glissando, Hidden Markov models, Playing technique detection
Paper topics
Automatic separation, recognition, classification of sound and music, Computational musicology and ethnomusicology, Content processing of music audio signals, Music information retrieval, Music performance analysis and rendering, Sound/music signal processing algorithms
Easychair keyphrases
playing technique [22], music information retrieval [11], descending glissando [10], glissando detection [10], isolated glissando [10], ground truth [9], ascending glissando [8], detection system [8], note change [8], pitch estimation [8], ascending and descending [7], chinese bamboo flute [7], international society [7], whole piece recording [7], automated glissando detection system [6], computational analysis [6], fully automated glissando [6], guitar playing technique [6], performed glissando [6], rule based segmentation [6], glissando candidate [5], modeling of magnitude and phase derived [5], note number [5], signal processing [5], ascending performed glissando [4], cbf playing technique [4], glissando detection system [4], hidden markov model [4], pitch estimation accuracy [4], playing technique detection [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249470
Zenodo URL: https://zenodo.org/record/3249470
Abstract
This paper presents some of the outcomes of a one-year project, funded by the Higher Education Innovation Fund, examining the use of music technology to increase access to music for children within special educational needs (SEN) settings. Despite the widely acknowledged benefits of interacting with music for children with SEN, there are a number of well-documented barriers to access [1, 2, 3]. These barriers take a number of forms, including financial, knowledge-based, and attitudinal. The aims of this project were to assess the current music technology provision in SEN schools within a particular part of the Dorset region, UK, determine the barriers they were facing, and develop strategies to help the schools overcome these barriers. An overriding concern for this project was to leave the schools with lasting benefit and meaningful change. As such, an Action Research [4] methodology was followed, which has at its heart an understanding of the participants as co-researchers, helping ensure any solutions presented met the needs of the stakeholders. Although technological solutions to problems were presented to the schools, it was found that the main issues concerned the flexibility of equipment to be used in different locations, staff time, and staff attitudes to technology.
Keywords
access, inclusion, interaction, SEN
Paper topics
Sound and music for accessibility and special needs
Easychair keyphrases
music technology [21], music therapy [15], action research [12], resonance board [12], music therapist [6], music therapy perspective [6], vibro tactile resonance board [6], sen setting [5], action research methodology [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249345
Zenodo URL: https://zenodo.org/record/3249345
Abstract
Sound synthesis is an indispensable tool for modern composers and performers, but achieving desired sonic results often requires tedious manipulation of numerous numeric parameters. To facilitate this process, a number of approaches have been proposed, but without systematic user research that could help researchers articulate the problem and make informed design decisions. The purpose of this study is to fill that gap and to investigate the attitudes and habits of sound synthesizer users. The research was based on a questionnaire answered by 122 participants which, besides the main questions about habits and attitudes, covered their demographics, profession, educational background, and experience in using sound synthesizers. The results were quantitatively analyzed in order to explore relations between all those dimensions. The main results suggest that the participants more often modify or create programs than use existing presets or programs, and that such habits do not depend on the participants' education, profession, or experience.
Keywords
automatic parameter selection, quantitative studies, sound synthesis, user research
Paper topics
Interfaces for sound and music, Languages, protocols and software environments for sound and music computing, Models for sound analysis and synthesis
Easychair keyphrases
sound synthesizer [37], user interface [18], synthesis parameter [17], synthesizer programming [14], rank sum test [12], wilcoxon rank sum [12], computer music [11], music education [11], existing program [10], genetic algorithm [10], sound synthesis [9], usage habit [9], automatic selection [7], create program [7], creating and modifying [7], spearman correlation coefficient [7], automatic parameter selection [6], formal music education [6], international computer [6], modifying program [6], music student [6], professional musician [6], statistically significant difference [6], user research [6], desired sound [5], synthesis engine [5], audio engineering society [4], audio feature [4], computer science technique [4], music education level [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249370
Zenodo URL: https://zenodo.org/record/3249370
Abstract
This demo presents an acoustic interface that allows the user to directly excite digital resonators (digital waveguides, lumped models, modal synthesis, and sample convolution). Parameters are simultaneously controlled by the touch position on the same surface. The experience is an intimate and intuitive interaction with sound for percussive and melodic play.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
Demo
DOI: 10.5281/zenodo.3249260
Zenodo URL: https://zenodo.org/record/3249260
Abstract
tinySounds is a collaborative work for live performer and musebot ensemble. Musebots are autonomous musical agents that interact, via messaging, to create a musical performance with or without human interaction.
Keywords
generative music, interactive system, musebots, musical agents
Paper topics
not available
Easychair keyphrases
musebot ensemble [7]
Paper type
Demo
DOI: 10.5281/zenodo.3249347
Zenodo URL: https://zenodo.org/record/3249347
Abstract
This paper proposes a computational method for the analysis and visualization of structure in freely improvised musical pieces, based on source separation and interaction patterns. A minimal set of descriptive axes is used for eliciting interaction modes, regions, and transitions. To this end, a suitable unsupervised segmentation model is selected based on the author's ground truth, and is used to compute and compare event boundaries of the individual audio sources. While still at a prototype stage of development, this method offers useful insights for evaluating a musical expression that lacks formal rules and protocols, including musical functions (e.g., accompaniment, solo) and form (e.g., verse, chorus).
Keywords
Computational musicology, Interaction and improvisation, Interaction in music performance, Perception and cognition of sound and music
Paper topics
Computational musicology and ethnomusicology, Improvisation in music through interactivity, Interaction in music performance, Music information retrieval, Perception and cognition of sound and music
Easychair keyphrases
freely improvised music [23], musical expression [13], free jazz [11], audio source [10], free improvisation [10], real time [9], dynamic mode [8], source separation [7], audio source separation [6], clear cut [6], musical improvisation [6], musical surface [6], music information retrieval [6], ordinal linear discriminant analysis [6], audio mix [5], jazz improvisation [5], static mode [5], activation time [4], auditory stream segregation [4], convex non negative matrix factorization [4], ground truth [4], improvised music [4], individual audio source [4], inter region [4], multi track recording [4], music theory [4], segmentation boundary [4], signal processing [4], structural segmentation [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249239
Zenodo URL: https://zenodo.org/record/3249239
Abstract
In this contribution, we present an interactive system for playing while learning music. The game is based on several computer games controlled by the user with a remote control. The remote control has been implemented using IMU sensors for 3D tracking. The computer games are programmed in Python and allow the user to practice rhythm as well as pitch, i.e. the ascending or descending order of musical notes.
Keywords
IMU sensors, Interactive system, Music learning, Serious Games
Paper topics
not available
Easychair keyphrases
remote control [16], interactive music training system [10], computer game [8], practice rhythm [5], serious game [5], ascending or descending [4], lleva el cursor [4], mover el cursor [4], note order game [4], ve una partitura con [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249439
Zenodo URL: https://zenodo.org/record/3249439
Abstract
When designing interactive sound for non-utilitarian, ludic interaction, internal complexity can be a way of opening up a space for curiosity and exploration. Internal complexity should be understood as non-linear mappings between the input and the parameters they affect in the output (sound). This paper presents three different experiments which explore ways to create internal complexity with rather simple interfaces for curious exploration.
Keywords
8 Bit synth, Curiosity, Exploration, Interaction, Ludic play
Paper topics
not available
Easychair keyphrases
noise machine [5], internal complexity [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249447
Zenodo URL: https://zenodo.org/record/3249447
Abstract
“Jazz mapping" is a multi-layer analytical approach to jazz improvisation based on hierarchical segmentation and categorization of segments, or constituents, according to their function in the overall improvisation. In this way, higher-level semantics of transcribed and recorded jazz solos can be exposed. In this approach, the knowledge of the expert jazz performer is taken into account in all analytical decisions. We apply the method to two well-known solos, by Sonny Rollins and Charlie Parker, and we discuss how improvisations resemble storytelling, employing a broad range of structural, expressive, technical and emotional tools usually associated with the production and experience of language and of linguistic meaning. We make explicit the choices of the experienced jazz improviser who has developed a strong command over the language and unfolds a story in real time, much like prose on a given framework. He/she utilizes various mechanisms to communicate expressive intent, elicit emotional responses, and make his/her musical “story” memorable and enjoyable to fellow musicians and listeners. We also comment on potential application areas of this work related to music and artificial intelligence.
Keywords
Interaction with music, Jazz Analyses, Jazz performance and AI, Machine learning, Music information retrieval, Semantics
Paper topics
Interaction in music performance, Models for sound analysis and synthesis, Music creation and performance, Music performance analysis and rendering, Perception and cognition of sound and music
Easychair keyphrases
jazz improvisation [14], thematic development [10], sonny rollin [8], personal voice [6], structural element [6], machine learning [5], charlie parker [4], jazz mapping [4], jazz solo [4], kapodistrian university [4], music study [4], story telling [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249431
Zenodo URL: https://zenodo.org/record/3249431
Abstract
As deep learning advances, music composition algorithms improve in performance. However, most of the successful models are designed for specific musical structures. Here, we present BachProp, an algorithmic composer that can generate music scores in many styles given sufficient training data. To adapt BachProp to a broad range of musical styles, we propose a novel representation of music and train a deep network to predict the note transition probabilities of a given music corpus. In this paper, new music scores generated by BachProp are compared with the original corpora as well as with different network architectures and other related models. A set of comparative measures is used to demonstrate that BachProp captures important features of the original datasets better than other models, and we invite the reader to a qualitative comparison on a large collection of generated songs.
Keywords
Automated Music Composition, Deep Learning, Generative Model of Music, Music Representation, Recurrent Neural Networks
Paper topics
Algorithms and Systems for music composition
Easychair keyphrases
neural network [15], generative model [12], recurrent neural network [11], time shift [11], novelty profile [9], music score [8], music composition [7], hidden state [6], note sequence [6], reference corpus [6], auto novelty [5], bach chorale [5], john sankey [5], local statistic [5], machine learning [5], midi sequence [5], novelty score [5], string quartet [5], base unit [4], data set [4], hidden layer [4], musical structure [4], preprint arxiv [4], probability distribution [4], recurrent layer [4], recurrent neural network model [4], science ecole polytechnique [4], song length [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249394
Zenodo URL: https://zenodo.org/record/3249394
Abstract
Mass-interaction methods for sound synthesis, and more generally for digital artistic creation, have been studied and explored for over three decades, by a multitude of researchers and artists. However, for a number of reasons this research has remained rather confidential, subsequently overlooked and often considered as the "odd-one-out" of physically-based synthesis methods, of which many have grown exponentially in popularity over the last ten years. In the context of a renewed research effort led by the authors on this topic, this paper aims to reposition mass-interaction physical modelling in the contemporary fields of Sound and Music Computing and Digital Arts: what are the core concepts? The end goals? And more importantly, which relevant perspectives can be foreseen in this current day and age? Backed by recent developments and experimental results, including 3D mass-interaction modelling and emerging non-linear effects, this proposed reflection casts a first canvas for an active, and resolutely outreaching, research on mass-interaction physical modelling for the arts.
Keywords
3D Physical Modeling, Emerging Non-linear Behaviors, Mass Interaction, Multi-Sensory, Processing
Paper topics
Interactive performance systems, Models for sound analysis and synthesis, Multimodality in sound and music computing, Music creation and performance, New interfaces for interactive music creation
Easychair keyphrases
mass interaction [31], sound synthesis [17], mass interaction physical modelling [16], physical modelling [15], real time [11], non linear [9], discrete time [7], interaction physical [7], non linear behaviour [7], computer music [6], mass interaction model [6], mass interaction physical model [6], non linearity [5], tension modulation [5], chaotic oscillation [4], finite difference scheme [4], grenoble inp gipsa lab [4], haptic interaction [4], modular physical modelling [4], musical instrument [4], physical model [4], virtual object [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249313
Zenodo URL: https://zenodo.org/record/3249313
Abstract
Mechanical Entanglement is a musical composition for three performers. Three force-feedback devices, each containing two haptic faders, are mutually coupled using virtual linear springs and dampers. During the composition, the performers feel each other's gestures and collaboratively process the music material. The interaction's physical modelling parameters are modified during the different sections of the composition. An algorithm that processes three stereo channels stretches three copies of the same music clip in and out of sync. The performers “control” the stretching algorithm and an amplitude modulation effect, both applied to recognisable classical and contemporary music compositions. Each performer substantially modifies the length and dynamics of the same music clip while also, subtly or often abruptly, affecting the gestural behaviour of the other performers. At fixed points in the composition, the music gradually comes back into sync and the performers realign their gestures. This phasing “game” between gestures and sound creates tension and emphasises the physicality of the performance.
Keywords
collaborative performance, composition, force-feedback, haptics, interactive music performance, lumped element modelling, mass-interaction networks, physical modelling
Paper topics
Improvisation in music through interactivity, Interaction in music performance, Interactive performance systems, Interfaces for sound and music, Music creation and performance, New interfaces for interactive music creation, Social interaction in sound and music computing
Easychair keyphrases
force feedback [10], computer music [9], haptic device [9], physical model [7], force feedback device [6], audio file [5], haptic fader [5], musical expression [5], musical instrument [5], signal processing [5], haptic digital audio effect [4], haptic signal processing [4], haptic signal processing framework [4], led light [4], mechanical entanglement [4], musical composition [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249242
Zenodo URL: https://zenodo.org/record/3249242
Abstract
Melody identification is an important early step in music analysis. This paper presents a tool to identify the melody in each measure of a Standard MIDI File. We also share an open dataset of manually labeled music for researchers. We use a Bayesian maximum-likelihood approach and dynamic programming as the basis of our work. We trained parameters on data sampled from the Million Song Dataset and tested on a dataset including 1706 measures of music from different genres. Our algorithm achieves an overall accuracy of 90% on the test dataset. We compare our results to previous work.
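A minimal sketch of the kind of dynamic programme involved is given below: a Viterbi-style search that picks one MIDI channel per measure from per-measure melody likelihoods, with a penalty for switching channels between measures. The likelihood model, penalty value, and function names are illustrative assumptions, not the paper's exact formulation:

    import numpy as np

    def identify_melody_channels(log_lik, switch_penalty=2.0):
        """log_lik[m, c]: assumed per-measure log-likelihood that channel c
        carries the melody in measure m.  Returns the best channel per measure
        under a channel-switch penalty (Viterbi dynamic programming)."""
        n_measures, n_channels = log_lik.shape
        score = log_lik[0].copy()
        back = np.zeros((n_measures, n_channels), dtype=int)
        idx = np.arange(n_channels)
        for m in range(1, n_measures):
            # trans[p, c]: score of arriving at channel c from channel p
            trans = score[:, None] - switch_penalty * (idx[None, :] != idx[:, None])
            back[m] = trans.argmax(axis=0)
            score = trans.max(axis=0) + log_lik[m]
        path = [int(score.argmax())]
        for m in range(n_measures - 1, 0, -1):
            path.append(int(back[m, path[-1]]))
        return path[::-1]

The per-measure log-likelihoods would come from the Bayesian model over features such as note density and pitch statistics mentioned in the keyphrases.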
Keywords
Bayesian, Melody, Music analysis, Standard MIDI File, Viterbi
Paper topics
Automatic separation, recognition, classification of sound and music, Music information retrieval
Easychair keyphrases
training data [13], melody channel [12], midi file [10], window size [10], melody identification [8], note density [8], dynamic programming [7], standard deviation [7], switch penalty [7], channel containing [5], melody extraction [5], test data [5], bayesian probability model [4], channel switch [4], cross validation [4], feature set [4], fold cross [4], pitch mean [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249256
Zenodo URL: https://zenodo.org/record/3249256
Abstract
This paper describes our interactive music system, the “Melody Slot Machine,” which enables control of a holographic performer. Although many interactive music systems have been proposed, manipulating performances in real time is difficult for musical novices because melody manipulation requires expert knowledge. Therefore, we developed the Melody Slot Machine to provide an experience of manipulating melodies by enabling users to freely switch between two original melodies and morphing melodies.
Keywords
Generative Theory of Tonal Music, Interactive Music System, Melody Morphing
Paper topics
not available
Easychair keyphrases
melody slot machine [15], time span tree [12], melody morphing method [7], holographic display [6], cache size [5], melody segment [5], frame rate [4], virtual performer [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249262
Zenodo URL: https://zenodo.org/record/3249262
Abstract
In the context of a general research question about the effectiveness of computer-based technologies applied to early music-harmony learning, this paper proposes a web-based tool to foster and quantitatively measure harmonic awareness in children. To this end, we have developed a web interface where young learners can listen to the leading voice of well-known music pieces and associate chords with it. During the activity, their actions can be monitored, recorded, and analyzed. An early experiment involved 45 primary school teachers, whose performances were measured in order to get user-acceptance opinions from domain experts and to determine the most suitable metrics for automated performance analysis. This paper focuses on the latter aspect and proposes a set of candidate metrics for future experimentation with children.
Keywords
assessment, harmony, metrics, music education, web tools
Paper topics
Perception and cognition of sound and music
Easychair keyphrases
tonal harmony [18], music tune [11], harmonic touch [9], leading voice [9], final choice [6], music education [6], parallel chord [6], harmonic function [5], final chord [4], harmonic awareness [4], harmonic space [4], harmony awareness [4], implicit harmony [4], learning effect [4], melody harmonization [4], primary chord [4], primary school child [4], research question [4], tonal function [4], tonic chord [4], web interface [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249389
Zenodo URL: https://zenodo.org/record/3249389
Abstract
Physical modelling techniques are now an essential part of digital sound synthesis, allowing for the creation of complex timbres through the simulation of virtual matter and expressive interaction with virtual vibrating bodies. However, placing these tools in the hands of the composer or musician has historically posed challenges in terms of a) the computational expense of most real-time physically based synthesis methods, b) the difficulty of implementing these methods into modular tools that allow for the intuitive design of virtual instruments, without expert physics and/or computing knowledge, and c) the generally limited access to such tools within popular software environments for musical creation. To this end, a set of open-source tools for designing and computing mass-interaction networks for physically-based sound synthesis is presented. The audio synthesis is performed within Max/MSP using the gen~ environment, allowing for simple model design, efficient calculation of systems containing single-sample feedback loops, as well as extensive real-time control of physical parameters and model attributes. Through a series of benchmark examples, we exemplify various virtual instruments and interaction designs.
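To make the underlying computation concrete, the sketch below shows the canonical discrete-time update for a single mass connected by a linear spring-damper to a fixed anchor, which is the kind of element such mass-interaction networks are built from. This is a generic Python illustration under stated assumptions, not the mi-gen~/Max gen~ code or API of the toolbox:

    def step_mass_interaction(x, x_prev, k_stiff, z_damp, m, dt):
        """One step of a mass coupled to a fixed anchor at 0 by a linear
        spring-damper, using the standard central-difference (leapfrog) update."""
        v = (x - x_prev) / dt                   # finite-difference velocity
        force = -k_stiff * x - z_damp * v       # spring + damper interaction force
        x_next = 2.0 * x - x_prev + (dt * dt / m) * force
        return x_next, x

    # toy usage: a lightly damped oscillator "struck" with an initial displacement
    dt, m = 1.0 / 44100, 1.0
    x, x_prev = 1e-3, 1e-3
    samples = []
    for _ in range(1000):
        x, x_prev = step_mass_interaction(x, x_prev, k_stiff=1e5, z_damp=0.5, m=m, dt=dt)
        samples.append(x)

Larger models chain many such masses and interaction elements, accumulating the forces on each mass before applying its position update, which is what the single-sample feedback loops in gen~ make efficient.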
Keywords
Mass-interaction, Max/MSP, Physical modelling, Toolbox
Paper topics
Models for sound analysis and synthesis, Multimodality in sound and music computing, Sound/music signal processing algorithms
Easychair keyphrases
mass interaction [25], mass interaction model [12], physical model [12], sound synthesis [11], computer music [9], physical modelling [9], mass interaction physical modelling [8], discrete time [7], harmonic oscillator [7], mass interaction modelling [7], mass interaction network [7], motion buffer [7], physical modeling [6], real time [6], drunk triangle [5], physical parameter [5], stability condition [5], control rate parameter [4], digital sound synthesis [4], external position [4], force feedback [4], gen patch [4], grenoble inp gipsa lab [4], mass interaction physical modeling [4], mass type element [4], mi gen toolbox [4], model based digital piano [4], multisensory virtual musical instrument [4], non linear [4], physically based synthesis method [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249376
Zenodo URL: https://zenodo.org/record/3249376
Abstract
The MiningSuite is a free, open-source, and comprehensive Matlab framework for the analysis of signals, audio recordings, music recordings, music scores, and other signals such as motion capture data, under a common modular framework. It adds a syntactic layer on top of Matlab, so that advanced operations can be specified using a simple and adaptive syntax. This makes the Matlab environment very easy to use for beginners, and at the same time allows power users to design complex workflows in a modular and concise way through a simple assemblage of operators featuring a large set of options. The MiningSuite is an extension of MIRtoolbox, a Matlab toolbox that has become a reference tool in MIR.
Keywords
Matlab toolbox, MIR, open source
Paper topics
not available
Easychair keyphrases
not available
Paper type
Demo
DOI: 10.5281/zenodo.3249435
Zenodo URL: https://zenodo.org/record/3249435
Abstract
We present a model to express preferences on rhythmic structure, based on probabilistic context-free grammars, and a procedure that learns the grammar probabilities from a dataset of scores or quantized MIDI files. The model formally defines rules related to rhythmic subdivisions and durations that are in general given in an informal language. Rule preferences are then specified with probability values. One targeted application is the aggregation of rule probabilities to qualify an entire rhythm, for tasks like automatic music generation and music transcription. The paper also reports an application of this approach on two datasets.
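As a minimal illustration of the idea, the sketch below defines a toy probabilistic grammar over rhythm trees and aggregates rule log-probabilities over a tree; the rule set, probabilities, and tree encoding are illustrative assumptions, not the paper's grammar:

    import math

    # Toy PCFG over rhythm trees: each rule maps a node label to a tuple of
    # child labels with a probability; probabilities per left-hand side sum to 1.
    RULES = {
        ("note", ()):            0.50,   # terminal: the node is a single note
        ("note", ("note",) * 2): 0.35,   # binary subdivision (two halves)
        ("note", ("note",) * 3): 0.15,   # ternary subdivision (triplet)
    }

    def log_prob(tree):
        """Aggregate rule log-probabilities over a nested-tuple rhythm tree,
        e.g. ("note", (("note", ()), ("note", (("note", ()), ("note", ())))))."""
        label, children = tree
        child_labels = tuple(c[0] for c in children)
        lp = math.log(RULES[(label, child_labels)])
        return lp + sum(log_prob(c) for c in children)

In a learned grammar, the rule probabilities would be estimated from the corpus of scores or quantized MIDI files, and the aggregated score used to rank candidate notations of a rhythm.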
Keywords
Digital Music Scores, Grammatical Inference, Rhythmic notation, Weighted Context-Free-Grammars
Paper topics
Algorithms and Systems for music composition, Automatic music generation/accompaniment systems, Music information retrieval
Easychair keyphrases
parse tree [33], music notation [10], probabilistic context free grammar [8], weight value [8], context free grammar [7], rhythmic notation [7], time interval [7], time signature [7], training set [7], midi file [6], rhythm structure [6], rhythm tree [6], hierarchical structure [5], rhythm notation [5], enhanced wikifonia leadsheet dataset [4], k div rule [4], musical event [4], non terminal symbol [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249476
Zenodo URL: https://zenodo.org/record/3249476
Abstract
In this article we explore how the different semantics of a spectrogram's time and frequency axes can be exploited for musical tempo and key estimation using convolutional neural networks (CNNs). By addressing both tasks with the same network architectures, ranging from shallow, domain-specific approaches to VGG variants with directional filters, we show that axis-aligned architectures perform similarly well to common VGG-style networks, while being less vulnerable to confounding factors and requiring fewer model parameters.
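To illustrate what an axis-aligned (directional) architecture looks like in practice, the sketch below uses temporal (1 x k) filters for a tempo-like task or spectral (k x 1) filters for a key-like task on a spectrogram input. Layer counts, widths, and class counts are illustrative assumptions, not the paper's architectures:

    import torch
    import torch.nn as nn

    class DirectionalCNN(nn.Module):
        """Axis-aligned CNN sketch for a (batch, 1, freq_bins, time_frames) input."""
        def __init__(self, n_classes, direction="time", k=5):
            super().__init__()
            kernel = (1, k) if direction == "time" else (k, 1)
            pad = (0, k // 2) if direction == "time" else (k // 2, 0)
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel, padding=pad), nn.ReLU(),
                nn.Conv2d(16, 32, kernel, padding=pad), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),          # global average pooling
            )
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    # e.g. tempo classes from a log-spectrogram patch
    model = DirectionalCNN(n_classes=256, direction="time")
    logits = model(torch.randn(2, 1, 40, 256))

Restricting filters to one axis is what makes the network less sensitive to confounds on the other axis, at the cost of generality compared to square VGG-style filters.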
Keywords
CNN, Confounds, Key, MIR, Tempo
Paper topics
Automatic separation, recognition, classification of sound and music, Content processing of music audio signals, Models for sound analysis and synthesis, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
music information retrieval [22], tempo estimation [18], convolutional neural network [15], directional filter [13], key detection [12], key estimation [12], th international society [12], gtzan key [9], square filter [8], tempo task [8], convolutional layer [7], deepmod deepmod deepmod [6], electronic dance music [6], genre recognition [6], mir task [6], shallow architecture [6], standard deviation [6], deep architecture [5], giantstep key [5], network architecture [5], signal processing [5], validation accuracy [5], feature extraction module [4], giantstep tempo [4], key accuracy [4], key task [4], layer input conv [4], similar parameter count [4], tempo annotation [4], temporal filter [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249250
Zenodo URL: https://zenodo.org/record/3249250
Abstract
It is impossible for a single temperament to be optimal for both consonance and modulation. Dissonance has traditionally been calculated from the ratio of two pitch frequencies; however, in current homophonic music, the level should be measured on chords, especially triads. In this research, we propose to quantify it as a Dissonance Index of Triads (DIT). We select eight well-known temperaments, calculate the seven diatonic chords in 12 keys, and compare the weighted average and standard deviation to quantify the consonance. We then visualize our experimental results in a two-dimensional chart to compare the trade-offs between consonance and modulation.
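For orientation, a classic way to score the dissonance of a chord of pure tones is to sum a Plomp-Levelt style pairwise dissonance over all tone pairs; the sketch below uses a commonly cited Sethares-style parameterisation. This is a generic sensory-dissonance model given only as background, not the paper's DIT definition:

    import math
    from itertools import combinations

    def pair_dissonance(f1, f2, a1=1.0, a2=1.0):
        """Plomp-Levelt style dissonance of two pure tones (Sethares-type curve)."""
        f_lo, f_hi = min(f1, f2), max(f1, f2)
        s = 0.24 / (0.021 * f_lo + 19.0)        # critical-bandwidth scaling
        d = f_hi - f_lo
        return a1 * a2 * (math.exp(-3.5 * s * d) - math.exp(-5.75 * s * d))

    def triad_dissonance(freqs):
        """Sum pairwise dissonance over all pairs of chord tones."""
        return sum(pair_dissonance(f1, f2) for f1, f2 in combinations(freqs, 2))

    # C major triad in 12-tone equal temperament, A4 = 440 Hz
    c4 = 440.0 * 2 ** (-9 / 12)
    print(triad_dissonance([c4, c4 * 2 ** (4 / 12), c4 * 2 ** (7 / 12)]))

Evaluating such a score for the seven diatonic triads of each key, under each temperament, is the kind of computation the weighted averages and standard deviations in the abstract summarise.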
Keywords
equal temperament, mean tone, Pythagoras, Scale, visualization
Paper topics
Computational musicology and ethnomusicology, Perception and cognition of sound and music
Easychair keyphrases
equal temperament [14], sanfen sunyi fa [12], dissonance value [8], dit value [8], just intonation [8], dissonance curve [7], pythagorean tuning [6], critical bandwidth [5], music temperament [5], average consonant level [4], base tone [4], dissonance index [4], dissonance level [4], horizontal axis [4], mean tone [4], pure tone [4], quarter comma meantone [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249327
Zenodo URL: https://zenodo.org/record/3249327
Abstract
An Android application has been developed to encrypt messages using musical notes that can be automatically played from the smartphone and/or stored in a MIDI file to be transmitted over any available connection. The app has been designed to recover the original message on the fly by detecting the notes played by a different device. The main objective of this project is to make known the relationship between cryptography and music by showing old systems (17th century) implemented on modern devices.
Keywords
Android, Cryptography, Encryption, Fundamental Frequency, Guyot, Java, Music, Real Time Audio Capture
Paper topics
not available
Easychair keyphrases
musical note [6], android application [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249270
Zenodo URL: https://zenodo.org/record/3249270
Abstract
Sound design is an integral part of making a virtual environment come to life. Spatialization is important to the perceptual localization of sounds, while sound quality determines how well virtual objects come to life. Implementing pre-recorded audio for physical interactions in virtual environments often requires a vast library of audio files to distinguish each interaction from the others. This paper explains the implementation of a modal synthesis toolkit for the Unity game engine that automatically adds impact and rolling sounds to interacting objects. Position-dependent sounds are achieved using a custom shader that can contain textures with modal weighting parameters. The two types of contact sounds are synthesized using a mechanical oscillator describing a spring-mass system. Since the contact force applied to the system includes a non-linear component, its value is found using an approximating algorithm, in this case Newton-Raphson. The mechanical oscillator is discretized using the K method with the bilinear transform.
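For reference, the per-mode mechanical oscillator referred to above has the general form

    \[
      m\,\ddot{x}(t) + R\,\dot{x}(t) + k\,x(t) = f_{\mathrm{ext}}(t) + f_c(t)
    \]

where f_c(t) is the contact force. With a nonlinear contact law, for example a power-law model such as \(f_c = k_c\,[x_c]_+^{\alpha}\) (given here only as a generic example, not necessarily the law used in the toolkit), the discretized equation becomes implicit in the current sample, so a Newton-Raphson iteration is run each audio frame to solve it; the K-method with the bilinear transform provides the discretization in which that implicit relation is formulated.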
Keywords
Game Audio, Impact, K Method, Non-linear, Physical Modelling, Rolling, Sound Synthesis
Paper topics
Models for sound analysis and synthesis, Sonic interaction design, Sound and music for Augmented/Virtual Reality and games, Sound/music signal processing algorithms
Easychair keyphrases
mechanical oscillator [10], modal texture [10], modal synthesis [9], modal weight [9], virtual environment [8], glass table [6], impact sound [6], modal weighting [6], physical modelling [6], rolling sound [6], computer graphic [5], fundamental frequency [5], game engine [5], micro impact [5], sound synthesis [5], interaction type [4], normal mode [4], unity game engine [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249410
Zenodo URL: https://zenodo.org/record/3249410
Abstract
In this paper we propose a multisensory simulation of plucking guitar strings in virtual reality. The auditory feedback is generated by a physics-based simulation of guitar strings, and haptic feedback is provided by a combination of high-fidelity vibrotactile actuators and a Phantom Omni. Moreover, we present a user study (n=29) exploring the perceived realism of the simulation and the relative importance of force and vibrotactile feedback for creating a realistic experience of plucking virtual strings. The study compares four conditions: no haptic feedback, vibrotactile feedback, force feedback, and a combination of force and vibrotactile feedback. The results indicate that the combination of vibrotactile and force feedback elicits the most realistic experience, and during this condition the participants were less likely to inadvertently hit strings after the intended string had been plucked. Notably, no statistically significant differences were found between the conditions involving either vibrotactile or force feedback, which suggests that haptic feedback is important but does not need to be high fidelity in order to enhance the quality of the experience.
Keywords
guitar simulation, haptic feedback, virtual reality
Paper topics
Sonic interaction design, Sound and music for Augmented/Virtual Reality and games
Easychair keyphrases
vibrotactile feedback [23], haptic feedback [18], virtual string [17], physical string [13], statistically significant difference [12], force feedback [10], significant difference [10], perceived realism [9], pairwise comparison [8], real guitar string [7], aalborg university [6], guitar string [6], involving force feedback [6], phantom omni haptic device [6], plucking guitar string [6], realistic experience [6], virtual guitar [6], auditory feedback [5], median score [5], perceptual similarity [5], questionnaire item [5], audio engineering society [4], computer music [4], musical instrument [4], real string [4], relative importance [4], vibrotactile actuator [4], virtual reality [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249321
Zenodo URL: https://zenodo.org/record/3249321
Abstract
In a common music practice scenario a player works with a musical score, but may jump arbitrarily from one passage to another in order to drill difficult technical challenges or pursue some other agenda requiring non-linear movement through the score. In this work we treat the associated score alignment problem, in which we seek to align a known symbolic score to audio of the musician's practice session, identifying all "do-overs" and jumps. The result of this effort facilitates a quantitative view of a practice session, allowing feedback on coverage, tempo, tuning, rhythm, and other aspects of practice. If computationally feasible, we would prefer a globally optimal dynamic programming search strategy; however, we find such schemes only barely computationally feasible in the cases we investigate. We therefore develop a computationally efficient off-line algorithm suitable for practical application. We present examples analyzing unsupervised and unscripted practice sessions on clarinet, piano, and viola, providing numerical evaluation of our score-following results on hand-labeled ground-truth audio data, as well as more subjective and easy-to-interpret visualizations of the results.
Keywords
beam search, music practice, score following
Paper topics
Automatic music generation/accompaniment systems, Automatic separation, recognition, classification of sound and music, Content processing of music audio signals, Interaction in music performance, Interactive performance systems, Music creation and performance, Sound/music signal processing algorithms
Easychair keyphrases
score alignment [26], practice session [16], score position [13], data model [9], hidden markov model [9], beam search [7], score alignment problem [7], pitch tree [6], ground truth [5], musical score [5], real time [5], score note [5], dynamic programming [4], mozart clarinet concerto [4], non terminal node [4], quarter note [4], score location [4], skip problem [4], traditional score alignment [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249396
Zenodo URL: https://zenodo.org/record/3249396
Abstract
We present ongoing works exploring the use of artificial intelligence and machine learning in computer-assisted music composition. The om-ai library for OpenMusic implements well-known techniques for data classification and prediction, in order to integrate them in composition workflows. We give examples using simple musical structures, highlighting possible extensions and applications.
Keywords
Artificial Intelligence, Common Lisp, Computer-Assisted Composition, Descriptors, Machine Learning, OpenMusic, Vector-Space
Paper topics
not available
Easychair keyphrases
machine learning [8], vector space [7], feature vector [5], computer assisted composition system [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249264
Zenodo URL: https://zenodo.org/record/3249264
Abstract
Currently, developing immersive music environments for extended reality (XR) can be a tedious process requiring designers to build 3D audio controllers from scratch. OSC-XR is a toolkit for Unity intended to speed up this process through rapid prototyping, enabling research in this emerging field. Designed with multi-touch OSC controllers in mind, OSC-XR simplifies the process of designing immersive music environments by providing prebuilt OSC controllers and Unity scripts for designing custom ones. In this work, we describe the toolkit's infrastructure and perform an evaluation of the controllers to validate the generated control data. In addition to OSC-XR, we present UnityOscLib, a simplified OSC library for Unity utilized by OSC-XR. We implemented three use cases, using OSC-XR, to inform its design and demonstrate its capabilities. The Sonic Playground is an immersive environment for controlling audio patches. Hyperemin is an XR hyperinstrument environment in which we augment a physical theremin with OSC-XR controllers for real-time control of audio processing. Lastly, we add OSC-XR controllers to an immersive T-SNE visualization of music genre data for enhanced exploration and sonification of the data. Through these use cases, we explore and discuss the affordances of OSC-XR and immersive music interfaces.
Keywords
Extended Reality, Immersive Interaction, Immersive Interfaces for Musical Expression, Open Sound Control, Virtual Environments
Paper topics
Interactive performance systems, Interfaces for sound and music, New interfaces for interactive music creation, Sound and music for Augmented/Virtual Reality and games
Easychair keyphrases
immersive environment [21], osc message [14], osc xr controller [11], immersive music environment [9], osc controller [9], computer music [8], controller prefab [7], multi touch osc [7], transmitting osc message [7], musical expression [6], osc receiver [6], osc xr slider [6], pad controller [6], sound designer [6], touch osc controller [6], unity inspector [6], use case [6], virtual reality [6], audio processing [5], immersive interface [5], international computer [5], multi touch [5], musical interaction [5], performance environment [5], rapid prototyping [5], traditional instrument [5], immersive musical environment [4], immersive music interface [4], osc controller prefab [4], osc transmit manager [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249319
Zenodo URL: https://zenodo.org/record/3249319
Abstract
The use of real-time sound synthesis for sound effects can improve the sound design of interactive experiences such as video games. However, synthesized sound effects can often be perceived as synthetic, which hampers their adoption. This paper aims to determine whether sounds synthesized using filter-based modal synthesis are perceptually comparable to directly recorded sounds. Sounds from four different materials that showed clear modes were recorded and synthesized using filter-based modal synthesis. Modes are the individual sinusoidal frequencies at which objects vibrate when excited. A listening test was conducted where participants were asked to identify, in isolation, whether a sample was recorded or synthesized. Results show that recorded and synthesized samples are indistinguishable from each other. The study outcome shows that, for the analysed materials, filter-based modal synthesis is a suitable technique to synthesize hit sounds in real time without perceptual compromises.
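A minimal sketch of filter-based modal synthesis of a hit sound is given below: each mode is a two-pole resonator excited by an impulse, with frequency, decay, and amplitude taken from analysis of a recording. The parameter values and normalisation here are illustrative assumptions, not those measured in the paper:

    import numpy as np

    def modal_hit(freqs, decays, amps, fs=44100, dur=1.0):
        """Sum of two-pole resonators (exponentially decaying sinusoids)
        excited by a unit impulse."""
        n = int(fs * dur)
        out = np.zeros(n)
        x = np.zeros(n); x[0] = 1.0                      # impulse excitation
        for f, t60, a in zip(freqs, decays, amps):
            r = 10 ** (-3.0 / (t60 * fs))                # pole radius for a T60 decay
            b0 = a * (1 - r)                             # rough gain normalisation
            a1 = -2 * r * np.cos(2 * np.pi * f / fs)
            a2 = r * r
            y = np.zeros(n)
            for i in range(n):
                y[i] = (b0 * x[i]
                        - a1 * (y[i - 1] if i > 0 else 0.0)
                        - a2 * (y[i - 2] if i > 1 else 0.0))
            out += y
        return out

    hit = modal_hit(freqs=[320, 845, 1480], decays=[0.8, 0.5, 0.3], amps=[1.0, 0.6, 0.4])

In the listening test described above, such resynthesized hits were compared against the recordings from which the modal parameters were extracted.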
Keywords
Game Audio, Modal Synthesis, Procedural Audio, Sound Design
Paper topics
Digital audio effects, Models for sound analysis and synthesis, Perception and cognition of sound and music, Sound and music for Augmented/Virtual Reality and games
Easychair keyphrases
filter based modal synthesis [14], pre recorded sample [12], real time [10], sound effect [10], modal synthesis [9], procedural audio [9], sound design [9], audio file [8], impact based sound [7], perceptual evaluation [6], sound synthesis [5], synthesized version [5], video game [5], audio engineering society [4], deterministic component [4], discrimination factor [4], enveloped white noise signal [4], filterbased modal synthesis [4], f measure value [4], game engine [4], interactive application [4], modal synthesizer [4], musical instrument [4], real time sound synthesis [4], stochastic component [4], synthesized sound effect [4], synthetic sound [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249246
Zenodo URL: https://zenodo.org/record/3249246
Abstract
In this work, we apply recent research results in loopback frequency modulation (FM) to real-time parametric synthesis of percussion sounds. Loopback FM is a variant of FM synthesis whereby the carrier oscillator "loops back" to serve as a modulator of its own frequency. As in FM, additional spectral components emerge, but further, when the loopback coefficient is made time varying, frequency trajectories appear that resemble the nonlinearities heard in acoustic percussion instruments. Here, loopback FM is used to parametrically synthesize this effect in struck percussion instruments, known to exhibit frequency sweeps (among other nonlinear characteristics) due to modal coupling. While many percussion synthesis models incorporate such nonlinear effects while aiming for acoustic accuracy, computational efficiency is often sacrificed, prohibiting real-time use. This work seeks to develop a real-time percussion synthesis model that creates a variety of novel sounds and captures the sonic qualities of nonlinear percussion instruments. A linear modal synthesis percussion model is modified to use loopback FM oscillators, which allows the model to create rich and abstract percussive hits in real time. Musically intuitive parameters for the percussion model are emphasized, resulting in a usable percussion sound synthesizer.
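The sketch below illustrates the basic idea of a loopback FM oscillator as described above: the oscillator's own previous output sample modulates its instantaneous frequency, and letting the loopback coefficient decay over time produces pitch-glide-like trajectories. The exact formulation used in the paper (and its complex-exponential form) may differ; all parameter values here are illustrative:

    import numpy as np

    def loopback_fm(f_c=200.0, B=0.9, fs=44100, dur=1.0):
        """Real-valued loopback FM sketch with a decaying loopback coefficient."""
        n = int(fs * dur)
        y = np.zeros(n)
        phase = 0.0
        prev = 0.0
        b_env = B * np.exp(-3.0 * np.arange(n) / n)      # time-varying loopback coefficient
        for i in range(n):
            inst_f = f_c * (1.0 + b_env[i] * prev)       # self-modulated frequency
            phase += 2.0 * np.pi * inst_f / fs
            y[i] = np.cos(phase)
            prev = y[i]
        return y

In the percussion model, one such oscillator replaces each linear mode of a modal synthesizer, with amplitude envelopes and excitation applied on top.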
Keywords
feedback systems, frequency and phase modulation synthesis, modal synthesis, pitch glides, sound synthesis, time-varying allpass filters
Paper topics
Digital audio effects, Models for sound analysis and synthesis, Sound/music signal processing algorithms
Easychair keyphrases
pitch glide [27], modal frequency [25], loopback fm oscillator [20], sounding frequency [16], carrier frequency [13], time varying [12], modal synthesis [11], percussion instrument [11], time varying timbre [11], percussion synthesis [10], amplitude envelope [9], filtered noise burst [9], raised cosine envelope [9], tom tom [9], acoustic resonator [8], real time [8], circular plate [7], fm percussion synthesis [7], loopback fm percussion [7], raised cosine [7], raised cosine excitation [7], sound synthesis [7], acoustic resonator impulse response [6], feedback coefficient [6], high carrier frequency [6], impulse response [6], percussion sound [6], percussion synthesis method [6], frequency component [5], percussion model [5]
Paper type
Full paper
DOI: 10.5281/zenodo.3249382
Zenodo URL: https://zenodo.org/record/3249382
Abstract
Interacting with media: the TransTeamProject (T3P) develops interactive glove techniques, and other materials, for working with sound and/or visual samples. Piamenca continues the work developed in Transpiano with a specific emphasis on visual content, such as transforming sound into lights, in this case together with a strong vernacular inspiration (flamenco). The T3P creative project combines art music with techno-perspectives. After contextualizing the state of the art in the specific field of “body gesture technology”, this paper explains how Piamenca relates to computers in a technical sense, covering the methods and processes used to produce media transformations (audio and vision), and comments on their integration in terms of sound, music and audio-visual performance. It finally demonstrates some ideas, such as trans-music orientations, with regard to enhancement theories in relation to the transhumanism movement.
Keywords
flamenco, glove-technology, interaction, performance, piano, sampling, trans-music
Paper topics
Humanities in Sound and Music Computing, Interaction in music performance, Interactive music recommendation, Languages, protocols and software environments for sound and music computing, Multimodality in sound and music computing, Music creation and performance
Easychair keyphrases
musical instrument [4], sound spectrum [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249254
Zenodo URL: https://zenodo.org/record/3249254
Abstract
In this demonstration we present novel physical models controlled by the Sensel Morph interface.
Keywords
control, physical models, Sensel
Paper topics
not available
Easychair keyphrases
sympathetic string [8], bowed string [6], hammered dulcimer [6], sensel morph [6], hurdy gurdy [4], physical model [4], plucked string [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249276
Zenodo URL: https://zenodo.org/record/3249276
Abstract
Score following matches musical performance audio with its symbolic score in an on-line fashion. Its applications are meaningful in music practice, performance, education, and composition. This paper focuses on following piano music --- one of the most challenging cases. Motivated by the time-changing features of a piano note during its lifetime, we propose a new method that models the evolution of a note in spectral space, aiming to provide an adaptive, hence better, data model. This new method is based on a switching Kalman filter in which a hidden layer of continuous variables tracks the energy of the various note harmonics. The result of this method could potentially benefit applications in de-soloing, sound synthesis and virtual scores. This paper also proposes a straightforward evaluation method. We conducted a preliminary experiment on a small dataset of 13 minutes of music, consisting of 15 excerpts of real piano recordings from eight pieces. The results show the promise of this new method.
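As a rough illustration of the continuous hidden layer described above, the following is a minimal scalar Kalman filter tracking the energy of a single note harmonic across frames; the decay-based state model and the noise constants are assumptions for the sketch, not the paper's actual switching model.

import numpy as np

def track_harmonic_energy(observations, decay=0.98, q=1e-3, r=1e-2):
    """Scalar Kalman filter following one harmonic's energy over frames.
    State model: e[k] = decay * e[k-1] + process noise (illustrative)."""
    e_est, p = observations[0], 1.0
    out = []
    for z in observations:
        # predict
        e_pred = decay * e_est
        p_pred = decay * p * decay + q
        # update with the observed spectral energy z
        k_gain = p_pred / (p_pred + r)
        e_est = e_pred + k_gain * (z - e_pred)
        p = (1.0 - k_gain) * p_pred
        out.append(e_est)
    return np.array(out)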
Keywords
piano music, score following, switching Kalman filter
Paper topics
Automatic music generation/accompaniment systems, Automatic separation, recognition, classification of sound and music, Content processing of music audio signals, Interaction in music performance, Interactive performance systems, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
kalman filter [17], score following [16], switching kalman filter [9], filtered distribution [7], mvmt1 piano concerto [7], discriminating data model [6], frequency profile [6], independent kalman filter [6], data model [5], observed data [5], piano music [5], real time [5], score alignment [5], state graph [5], art system [4], continuous variable [4], evaluation method [4], frame wise accuracy [4], hidden markov model [4], kalman filter model [4], musical score [4], partial amplitude [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249398
Zenodo URL: https://zenodo.org/record/3249398
Abstract
Music is usually considered as a sequential process, where sounds, groups of sounds and motifs occur chronologically, following the natural unfolding of time. At the same time, repetitions and similarities which develop between elements create multi-scale patterns which contribute to the perceived structure of the passage and trigger expectation mechanisms and systems [Narmour 2000][Bimbot et al. 2016]. These can be represented as a Polytopic Graph of Latent Relations [Louboutin et al. 2017], where each node of the graph represents a low-scale musical segment and each edge corresponds to a relation between segments within the expectation systems. The content of a musical segment can be manipulated by applying various permutations to the nodes of the graph, thus generating a reconfiguration of its musical content, with the same elements in a different order. Specific permutations, called Primer Preserving Permutations (PPP), are of particular interest, as they preserve systems of analogical implications between metrically homologous elements within the segment. In this paper, we describe the implementation of the polytopic reconfiguration process and elaborate on the organizational properties of Primer Preserving Permutations as well as their potential impact on the inner structure of musical segments. Then, in order to assess the relevance of the reconfiguration scheme (and its underlying hypotheses), we report on a perceptual test in which subjects were asked to rate musical properties of MIDI segments: some of them had been reconfigured with PPPs while others had been transformed by Randomly Generated Permutations (RGP). Results show that PPP-transformed segments score distinctly better than RGP-transformed ones, indicating that the preservation of implication systems plays an important role in the subjective acceptability of the transformation. Additionally, we introduce an automatic method for decomposing segments into low-scale musical elements, taking into account possible phase shifts between the musical surface and the metrical information (for instance, anacruses). We conclude on the potential of the approach for applications in interactive music composition.
Keywords
multiscale representation, music cognition, music structure, music transformation, perceptual tests, polytopic graph
Paper topics
Algorithms and Systems for music composition, Perception and cognition of sound and music
Easychair keyphrases
implication system [9], musical segment [9], time scale [8], elementary object [6], musical surface [6], phase shift [6], polytopic representation [6], degradation score [5], melodic line [5], musical object [5], perceptual test [5], account possible phase shift [4], analogical implication [4], compressibility criterion [4], inner structure [4], low scale [4], low scale musical element [4], musical consistency [4], parallel face [4], polytopic graph [4], randomly generated permutation [4], time shift [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249408
Zenodo URL: https://zenodo.org/record/3249408
Abstract
This paper presents a convolutional neural network (CNN) able to predict the perceived dissonance of piano chords. Ratings of dissonance for short audio excerpts were combined from two different datasets and groups of listeners. The CNN uses two branches in a directed acyclic graph (DAG). The first branch receives input from a pitch estimation algorithm, restructured into a pitch chroma. The second branch analyses interactions between close partials, known to affect our perception of dissonance and roughness. The analysis is pitch invariant in both branches, facilitated by convolution across log-frequency and octave-wide max-pooling. Ensemble learning was used to improve the accuracy of the predictions. The coefficient of determination (R²) between ratings and predictions is close to 0.7 in a cross-validation test on the combined dataset. The system significantly outperforms recent computational models.
Keywords
CNN, Consonance, DAG network, Deep Layered Learning, Dissonance, Ensemble Learning, Music Information Retrieval, Pitch invariant, Roughness
Paper topics
Automatic separation, recognition, classification of sound and music, Content processing of music audio signals, Models for sound analysis and synthesis, Music information retrieval, Perception and cognition of sound and music, Sound/music signal processing algorithms
Easychair keyphrases
pitch chroma [26], test condition [12], better result [8], computational model [7], cross validation [7], audio file [6], music information retrieval [6], acoustical society [5], dense layer [5], ensemble learning [5], pitch class [5], test run [5], convolutional layer [4], cross fold validation [4], deep layered learning [4], ground truth test [4], just intonation ratio [4], max pooling filter [4], neural network [4], non stationary gabor frame [4], piano chord [4], truth test condition [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249465
Zenodo URL: https://zenodo.org/record/3249465
Abstract
RaveForce is a programming framework designed for a computational music generation method that involves audio-sample-level evaluation of generated symbolic music representations. It comprises a Python module and a SuperCollider quark. When connected to deep learning frameworks in Python, RaveForce sends the symbolic music representation generated by the neural network as Open Sound Control messages to SuperCollider for non-real-time synthesis. SuperCollider converts the symbolic representation into an audio file, which is sent back to Python as input to the neural network. Through this iterative loop, the neural network can be trained with deep reinforcement learning algorithms, taking a quantitative evaluation of the audio file as the reward. In this paper, we find that the proposed method can be used to search for new synthesis parameters for a specific timbre of an electronic music note or loop.
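The Python-to-SuperCollider link can be sketched with a standard OSC client; the address pattern and parameter list below are hypothetical placeholders, not RaveForce's actual protocol.

from pythonosc.udp_client import SimpleUDPClient

# 57120 is SuperCollider's default language port; the address and the
# argument list are illustrative placeholders only.
client = SimpleUDPClient("127.0.0.1", 57120)
action = [0.42, 880.0, 0.1]   # e.g. synthesis parameters chosen by the agent
client.send_message("/raveforce/step", action)
# SuperCollider would then render the audio in non-real time, and the
# resulting file is read back in Python to compute the reward.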
Keywords
Deep Reinforcement Learning, Music Generation, SuperCollider
Paper topics
Automatic music generation/accompaniment systems, Models for sound analysis and synthesis
Easychair keyphrases
neural network [23], music generation [20], deep reinforcement learning [17], reinforcement learning [16], audio file [13], symbolic representation [13], observation space [11], symbolic music representation [11], computational music generation [9], deep learning [8], non real time synthesis [8], non real time [7], preprint arxiv [7], deep learning framework [6], live coding session [6], music generation task [6], non real time audio synthesis [6], open sound control message [6], raw audio generation [6], drum loop [5], raw audio [5], real time [5], action space [4], audio waveform [4], deep learning music generation [4], deep reinforcement learning environment [4], electronic music [4], kick drum [4], musical context [4], running time [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249325
Zenodo URL: https://zenodo.org/record/3249325
Abstract
This paper introduces a series of tools to program the Teensy development board series with the Faust programming language. faust2teensy is a command-line application that can be used both to generate new objects for the Teensy Audio Library and to build standalone Teensy programs. We also demonstrate how faust2api can produce Digital Signal Processing engines (with optional polyphony support) for the Teensy. Details about the implementation and optimizations of these systems are provided, and the results of various tests (computational cost, latency, etc.) are presented. Finally, future directions for this work are outlined, focusing on bare-metal implementation of real-time audio signal processing applications.
Keywords
DSP, Faust, Microcontroller, Teensy
Paper topics
Hardware systems for sound and music computing, Languages, protocols and software environments for sound and music computing, New interfaces for interactive music creation
Easychair keyphrases
teensy audio library [22], code listing [14], block size [13], faust program [13], teensy audio [11], teensy program [11], floating point [9], real time audio signal processing [9], teensy audio shield [9], polyphonic dsp engine [6], signal processing [6], audio shield [5], audio signal processing [5], sound synthesis [5], void loop [5], audio shield teensy audio shield [4], audio signal processing application [4], bare metal implementation [4], command line [4], digital signal processing [4], faust2api teensy [4], faust compiler [4], faust program implementing [4], monophonic dsp engine [4], processing power [4], realtime audio signal processing [4], sampling rate [4], sawtooth oscillator [4], standalone faust teensy program [4], void setup [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249282
Zenodo URL: https://zenodo.org/record/3249282
Abstract
In this paper, implementation, instrument design and control issues surrounding a modular physical modelling synthesis environment are described. The environment is constructed as a network of stiff strings and a resonant plate, accompanied by user-defined connections and excitation models. The bow, in particular, is a novel feature in this setting. The system as a whole is simulated using finite difference (FD) methods. The mathematical formulation of these models is presented, alongside several new instrument designs, together with a real-time implementation in JUCE using FD methods. Control is through the Sensel Morph.
Keywords
high-fidelity control, physical modelling, real-time
Paper topics
Interactive performance systems, Models for sound analysis and synthesis, Sonic interaction design
Easychair keyphrases
grid point [14], bowed string [12], stiff string [11], sympathetic string [11], physical model [10], sensel morph [10], real time [8], grid spacing [7], sound synthesis [7], finite difference [6], hurdy gurdy [6], next time [6], plucked string [6], computer music [5], cpu usage [5], hammered dulcimer [5], non linear [5], connection term [4], discretised distribution function [4], excitation function [4], mass ratio [4], melody string [4], modular physical modelling synthesis environment [4], system architecture [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249295
Zenodo URL: https://zenodo.org/record/3249295
Abstract
Dancing to the beat of the music of one's favorite DJ often leads to a powerful and euphoric experience. In this study we investigate the effect of putting a dancer in control of the music playback tempo, based on real-time estimation of body rhythm and tempo manipulation of the audio. A prototype was developed and tested in collaboration with users, followed by a main study in which the final prototype was evaluated. A questionnaire was used to obtain ratings regarding subjective experience, and open-ended questions were posed in order to obtain further insights for future development. Our results suggest the potential for enhanced engagement and enjoyment of the music when being able to manipulate the tempo, and document important design aspects for real-time tempo control.
Keywords
beat tracking, electronic dance music, embodiment, real-time interaction, rhythm
Paper topics
Interactive performance systems, Interfaces for sound and music, New interfaces for interactive music creation, Sonic interaction design
Easychair keyphrases
tempo manipulation [17], real time [10], second session [9], body movement [8], first session [6], hand wrist [5], tempo change [5], dance experience [4], data stream [4], electronic dance music [4], mean value rating [4], playback tempo [4], quality factor [4], slide value [4], standard playback [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249343
Zenodo URL: https://zenodo.org/record/3249343
Abstract
This paper studies deep neural networks for the modeling of audio distortion circuits. The selected approach is black-box modeling, which estimates model parameters based on the measured input and output signals of the device. Three common audio distortion pedals, each with a different circuit configuration and its own distinctive sonic character, were chosen for this study: the Ibanez Tube Screamer, the Boss DS-1, and the Electro-Harmonix Big Muff Pi. A feedforward deep neural network, which is a variant of the WaveNet architecture, is proposed for modeling these devices. The size of the receptive field of the neural network is selected based on the measured impulse-response length of the circuits. A real-time implementation of the deep neural network is presented, and it is shown that the trained models can be run in real time on a modern desktop computer. Furthermore, it is shown that approximately three minutes of audio is a sufficient amount of data for training the models. The deep neural network studied in this work is useful for real-time virtual analog modeling of nonlinear audio circuits.
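Since the receptive field is matched to the measured impulse-response length, it helps to recall how it grows with dilation in a WaveNet-like stack; a small helper with an illustrative dilation pattern (not necessarily the paper's configuration).

def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of a stack of dilated causal
    convolutions, one layer per dilation factor in `dilations`."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# e.g. kernel size 3 with doubling dilations over ten layers
print(receptive_field(3, [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]))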
Keywords
Audio systems, Feedforward neural networks, Music, Nonlinear systems, Supervised learning
Paper topics
Digital audio effects, Sound/music signal processing algorithms
Easychair keyphrases
neural network [25], real time [25], convolutional layer [20], big muff [16], deep neural network [15], distortion effect [13], processing speed [13], ibanez tube screamer [12], training data [12], convolution channel [11], digital audio effect [11], receptive field [10], activation function [9], clipping amp [9], gated activation [9], harmonix big muff pi [8], layer model [8], non linear bp filter [8], signal ratio [8], tube screamer [8], audio interface [7], big muff pi [7], black box modeling [7], impulse response [7], audio distortion circuit [6], computational load [6], nonlinear activation function [6], selected model [6], tone stage [6], validation loss [6]
Paper type
Full paper
DOI: 10.5281/zenodo.3249374
Zenodo URL: https://zenodo.org/record/3249374
Abstract
In this work we examine a simple mass-spring system in which the natural frequency is modulated by its own oscillations, a self-coupling that creates a feedback system in which the output signal ``loops back'' with an applied coefficient to modulate the frequency. This system is first represented as a mass-spring system, then in the context of well-known frequency and phase modulation synthesis, and finally as a time-varying stretched allpass filter, where both the allpass coefficients and the filter order are made time varying, the latter to allow for changes to the sounding frequency over time (e.g. pitch glides). Expressions are provided that map parameters of one representation to another, allowing either representation to be used for real-time synthesis.
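A minimal time-domain sketch of such a self-coupled mass-spring, assuming the instantaneous natural frequency is scaled by the displacement as omega0 * (1 + B * x) and using semi-implicit Euler integration; this only illustrates the idea and does not reproduce the closed-form mappings derived in the paper.

import numpy as np

def self_coupled_spring(f0=200.0, B=0.3, fs=44100, dur=0.5):
    """Mass-spring oscillator whose natural frequency depends on its own
    displacement (illustrative form), integrated with semi-implicit Euler."""
    w0 = 2 * np.pi * f0
    dt = 1.0 / fs
    x, v = 1.0, 0.0                   # initial displacement and velocity
    out = np.zeros(int(fs * dur))
    for i in range(out.size):
        w = w0 * (1.0 + B * x)        # self-modulated natural frequency
        v += -(w * w) * x * dt        # velocity (acceleration) step
        x += v * dt                   # position step
        out[i] = x
    return out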
Keywords
feedback systems, frequency and phase modulation synthesis, nonlinear modal coupling, pitch glides, time-varying allpass filters
Paper topics
Digital audio effects, Models for sound analysis and synthesis, Sound/music signal processing algorithms
Easychair keyphrases
time varying [16], sounding frequency [13], made time varying [12], self coupled oscillator [11], instantaneous phase [9], instantaneous frequency [8], real part [8], closed form representation [7], loopback fm parameter [7], loopback fm oscillator [6], mass spring system [6], frequency modulation [5], numerical integration [5], discrete time [4], final expression [4], time varying frequency [4], transfer function [4], unit sample delay [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249423
Zenodo URL: https://zenodo.org/record/3249423
Abstract
This is a system prototype for joint vocal improvisation between two people that involves sharing embodied sensations of vocal production. This is accomplished by using actuators that excite the two participants' rib cages with each other's voices, turning a person's body into a loudspeaker. A microphone transmits the vocal signals, and the players are given a Max patch to modulate the sound and feel of their voice. The receiver hears the other person's speech and effects through their own body (as if it were their own voice), while also feeling the resonance of the sound signal as it would resonate in the chest cavity of the other. The two players try to re-enact and improvise a script prompt provided to them without knowing how their own voice sounds to the other person. The game may turn collaborative, adversarial, or artistic depending on the game play.
Keywords
actuator, sound exciter, system prototype, vocal improvisation
Paper topics
not available
Easychair keyphrases
social embodiment [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249443
Zenodo URL: https://zenodo.org/record/3249443
Abstract
This paper presents SonaGraph, a framework and application for a simplified but efficient harmonic spectrum analyzer suitable for assisted and algorithmic composition. The model is inspired by the analog Sonagraph and relies on a constant-Q bandpass filter bank. First, the historical Sonagraph is introduced; then, starting from it, a simplified (“cartoonified”) model is discussed. An implementation in SuperCollider is presented that includes various utilities (interactive GUIs, music notation generation, graphic export, data communication). A comparison of results in relation to other tools for assisted composition is presented. Finally, some musical examples are discussed that make use of spectral data from SonaGraph to generate, retrieve and display music information.
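The constant-Q layout can be sketched as geometrically spaced centre frequencies with bandwidth proportional to frequency; the minimum frequency, band count and per-octave resolution below are arbitrary assumptions, not SonaGraph's actual settings.

import numpy as np

def constant_q_bands(f_min=55.0, n_bands=48, bands_per_octave=12):
    """Centre frequencies and bandwidths of a constant-Q band-pass bank:
    geometric centre spacing, bandwidth proportional to centre frequency."""
    k = np.arange(n_bands)
    fc = f_min * 2.0 ** (k / bands_per_octave)
    Q = 1.0 / (2.0 ** (1.0 / bands_per_octave) - 1.0)
    bw = fc / Q
    return fc, bw

centres, bandwidths = constant_q_bands()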
Keywords
Assisted composition, Music notation, Spectral information
Paper topics
Algorithms and Systems for music composition, Interfaces for sound and music, Models for sound analysis and synthesis, Music information retrieval
Easychair keyphrases
sound object level [12], spectral data [12], music notation [11], filter bank [10], real time [10], music notation transcription [6], audio level [5], computer music [5], interactive gui [5], sample rate [5], spectral information [5], time resolution [5], amplitude threshold [4], assisted composition [4], constant q bandpass filter [4], gathered data [4], lilypond code [4], music information retrieval [4], spectral analysis [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249425
Zenodo URL: https://zenodo.org/record/3249425
Abstract
Today, robots are increasingly becoming an integral part of our everyday life. The expectations humans have about robots are influenced by how robots are represented in science fiction films. The process of designing sonic interaction for robots is similar to how a Foley artist designs the sound effects of a film. In this paper, we present an exploratory study focusing on the sonic characteristics of robot sounds in films. We believe that findings from the current study could be of relevance for future robotic applications involving the communication of internal states through sound, as well as for the sonification of expressive robot movements. Excerpts from five films were analyzed using the Long-Time Average Spectrum (LTAS). As an overall observation, we found that a robot's sonic presence is highly related to its physical appearance. Preliminary results show that most of the robots analysed in this study have a ``metallic'' quality in their voice, matching the material of their physical form. Characteristics of their voices show significant differences compared to those of human characters; the fundamental frequency of robot voices is shifted either to higher or to lower values compared to that of human characters, and their voices span a larger frequency band.
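An LTAS of an excerpt can be approximated by Welch averaging of the power spectrum; a small scipy-based sketch, where the segment length is an arbitrary choice rather than the study's analysis setting.

import numpy as np
from scipy.signal import welch

def ltas_db(x, fs, nperseg=4096):
    """Long-Time Average Spectrum: Welch-averaged power spectrum in dB."""
    f, pxx = welch(x, fs=fs, nperseg=nperseg)
    return f, 10.0 * np.log10(pxx + 1e-12)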
Keywords
film sound design, human-robot interaction, LTAS, non-verbal communication, robot sound, sonic interaction design
Paper topics
Multimodality in sound and music computing, Social interaction in sound and music computing, Sonic interaction design
Easychair keyphrases
sound design [11], andrew martin [7], human robot interaction [7], robot sound [7], sonao project [7], robot movement [6], bicentennial man [5], frequency band [5], non verbal [5], physical appearance [5], short circuit [5], bremen emotional sound toolkit [4], emotional expression [4], fictional robot [4], fundamental frequency [4], kth royal institute [4], main human character [4], mechanical sound [4], music computing kth [4], non verbal communication [4], non verbal sound [4], real world robot [4], robot andrew [4], robot sonic presence [4], video excerpt [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249337
Zenodo URL: https://zenodo.org/record/3249337
Abstract
Food and music are fundamental elements of most lives. Both eating and listening can modify our emotional and cognitive states, and, when paired, can result in surprising perceptual effects. This demo explores the link between the two phenomena of music and food, specifically the way in which what we taste can be influenced by what we listen to. We demonstrate how the same beverage can taste very different depending on the music that happens to be playing at the same time. To do this, we have created a system that turns the act of drinking into a form of embodied interaction with music. This highlights the multisensory character of flavour perception and underscores the way in which sound can be used to raise people’s awareness of their own eating behaviour.
Keywords
Interactive systems, Multisensory flavour perception, Music, Sonic seasoning
Paper topics
not available
Easychair keyphrases
crossmodal correspondence [6], aarhus university [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249364
Zenodo URL: https://zenodo.org/record/3249364
Abstract
In collaboration with Volvo Cars, we presented a novel design tool to a large public of approximately three million people at the three leading motor shows in 2017, in Geneva, Shanghai and New York. The purpose of the tool was to explore the relevance of interactive audio-visual strategies for supporting the development of sound environments in future silent cars, i.e., a customised sonic identity that would alter the sonic ambience for the driver and passers-by. The new tool should be able to efficiently collect non-experts' sonic preferences for different given contexts, and the design process should allow for high-level control of complex synthesised sounds. The audience interacted individually using single-touch selection of a colour from five palettes and applied it by pointing to areas in a colouring-book painting showing a road scene. Each palette corresponded to a sound, and the colour nuance within the palette corresponded to a certain tweaking of that sound. In effect, the user selected and altered each sound, added it to the composition, and finally heard a mix of layered sounds based on the colouring of the scene. The installation involved large touch screens with high-quality headphones. In the study presented here, we examine differences in sound preferences between two audiences and a control group, and evaluate the feasibility of the tool based on the sound designs that emerged.
Keywords
Car sounds, Interaction, Novel interfaces, Sound design, Sound installation
Paper topics
Interactive performance systems, Interfaces for sound and music, Multimodality in sound and music computing, New interfaces for interactive music creation, Sonic interaction design
Easychair keyphrases
sound design [15], control group [13], school bell sound [9], motor sound [8], colour nuance [6], rolling sound [6], shanghai audience [6], colour book [5], colour palette [5], data collection [5], school scene [5], sound effect [5], audio effect [4], bell harmonic rolling [4], city centre [4], geneva audience [4], harmonic sound [4], musical expression [4], school area [4], school bell [4], volvo car [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249284
Zenodo URL: https://zenodo.org/record/3249284
Abstract
Systems of coupled oscillators can be employed in a variety of algorithmic settings to explore the self-organizing dynamics of synchronization. In the realm of audio-visual generation, coupled-oscillator networks can be usefully applied to musical content related to rhythmic perception, sound synthesis, and interaction design. By formulating different models of these generative dynamical systems, I outline different methodologies for generating sound from collections of interacting oscillators and discuss how their rich, non-linear dynamics can be exploited in the context of sound-based art. A summary of these mathematical models is given, and a range of applications (audio synthesis, rhythmic generation, and music perception) is proposed in which they may be useful for producing and analyzing sound. I discuss these models in relation to two of my own kinetic sound sculptures in order to analyze to what extent they can be used to characterize synchrony as an analytical tool.
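One canonical example of such a system is the Kuramoto model of coupled phase oscillators, whose complex order parameter quantifies phase coherence; the sketch below is only representative, since the paper discusses a broader family of models.

import numpy as np

def kuramoto_step(theta, omega, K, dt):
    """One Euler step of the Kuramoto model: each oscillator's phase is
    pulled toward the others with coupling strength K."""
    n = theta.size
    coupling = (K / n) * np.sum(np.sin(theta[None, :] - theta[:, None]), axis=1)
    return theta + (omega + coupling) * dt

def order_parameter(theta):
    """Complex order parameter r*exp(i*psi); r measures phase coherence."""
    z = np.mean(np.exp(1j * theta))
    return np.abs(z), np.angle(z)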
Keywords
generative music, sonification, sound art, sound sculpture, synchrony
Paper topics
Algorithms and Systems for music composition, Auditory display and data sonification, Hardware systems for sound and music computing, Interaction in music performance, Interactive performance systems, Interfaces for sound and music, Languages, protocols and software environments for sound and music computing, Models for sound analysis and synthesis, Music creation and performance, New interfaces for interactive music creation, Perception and cognition of sound and music, Sonic interaction design, Sound/music signal processing algorithms
Easychair keyphrases
coupled oscillator [33], instantaneous phase [11], coupled oscillator model [9], coupled oscillator network [9], intrinsic frequency [8], center frequency [6], complex order parameter [6], dynamical system [6], external forcing [5], audio visual resonance [4], coupled oscillator dynamic [4], coupled oscillator system [4], oscillator phase [4], phase coherence [4], phase response function [4], phase vocoder model [4], pushing motion [4], rhythmic generation [4], signal processing [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249427
Zenodo URL: https://zenodo.org/record/3249427
Abstract
Artistic installations using brain-computer interfaces (BCI) to interact with media in general, and sound in particular, have become increasingly numerous in recent years. Brain or mental states are commonly used to drive musical score or sound generation as well as visuals. Closed-loop setups can emerge here which are comparable to the propositions of neurofeedback (NFB). The aim of our audiovisual installation State Dependency, driven by brain states and motor imagery, was to enable the participant to engage in unbound exploration of movement through sound and space, unmediated by one's corpo-reality. With the aid of an adaptive feedback loop, perception is taken to the edge. We deployed a BCI to collect motor imagery as well as visual and cognitive neural activity, from which we calculate approximate entropy (a second-order measure of neural signal activity), which is in turn used to interact with the surround Immersive Lab installation. The use of entropy measures on motor imagery and various sensory modalities generates a highly accessible, reactive and immediate experience, transcending common limitations of BCI technology. State Dependency goes beyond the common practice of abstractly routing mental or brain states to external audiovisual states. It opens new territory for unrestrained kinaesthetic and polymodal exploration in an immersive audiovisual environment.
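Approximate entropy itself is a standard regularity measure; below is a compact sketch of the textbook definition, where the embedding dimension and tolerance are conventional defaults rather than the installation's actual settings.

import numpy as np

def approximate_entropy(x, m=2, r=None):
    """Approximate entropy (ApEn) of a 1-D signal: lower values indicate
    more regular, self-similar activity."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)            # common tolerance choice

    def phi(m):
        n = len(x) - m + 1
        # embed the signal into overlapping m-length vectors
        emb = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between every pair of embedded vectors
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        c = np.sum(dist <= r, axis=1) / n
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)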
Keywords
audio visual interaction, biofeedback, brain computer interface, motor imagery
Paper topics
Auditory display and data sonification, Multimodality in sound and music computing, Perception and cognition of sound and music, Sound/music and the neurosciences
Easychair keyphrases
motor imagery [16], neural activity [11], approximate entropy [8], entropy measure [8], immersive lab [8], state dependency [8], movement control [7], real time [7], audio visual [6], audio visual medium [6], brain state [5], eeg signal [5], mental state [5], visual cortex [5], adaptive feedback loop [4], bci art [4], closed loop setup [4], computer music [4], feedback loop [4], left primary motor cortex [4], motor cortex [4], motor imagery data [4], movement perception [4], primary visual cortex [4], right primary motor cortex [4], signal quality [4], swiss national science foundation [4], wet electrode [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249244
Zenodo URL: https://zenodo.org/record/3249244
Abstract
This paper investigates how to design an embodied learning experience of a drumming teacher playing hand drums, to aid higher rhythm understanding and accuracy. By providing novices with the first-person perspective of a drumming teacher while learning to play a West-African djembe drum, participants' learning was measured objectively by their ability to follow the drumming teacher's rhythms. Participants' subjective learning was assessed through a self-assessment questionnaire measuring aspects of flow, user experience, oneness, and presence. Two test iterations were conducted. In both, no significant difference was found in participants' ability to follow the drumming teacher's tempo between the experimental group exposed to the first-person perspective of the teacher in a VR drum lesson and the control group exposed to a 2D version of the stereoscopic drum lesson. A significant difference was found in the experimental group's presence scores in the first test iteration, and in the experimental group's oneness scores in the second test iteration. Participants' subjective responses indicated enjoyment of, and motivation for, the presented learning technique in both groups.
Keywords
drumming, embodiment, pedagogy, virtual reality
Paper topics
Interaction in music performance, Sonic interaction design
Easychair keyphrases
drum lesson [17], first test iteration [17], control group [15], drumming teacher [15], test stimulus [13], test group [10], test iteration [10], first person perspective [9], hand drum [8], d drum lesson [7], virtual reality [7], vr drum lesson [7], independent t test [6], rhythm pattern [6], second test iteration [6], teaching material [6], trial phase [6], user experience [6], drumming lesson [5], drumming recording [5], significant difference [5], djembe drum [4], embodied first person perspective [4], fast tempo difference score [4], mean value [4], participant rhythm performance [4], playing teacher [4], rhythm accuracy [4], self assessment questionnaire [4], significance difference [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249341
Zenodo URL: https://zenodo.org/record/3249341
Abstract
We present a method for tempo estimation from audio recordings based on signal processing and peak tracking, which does not depend on training on ground-truth data. First, an accentuation curve, emphasising the temporal location and accentuation of notes, is computed from the detection of bursts of energy localised in time and frequency. This enables the detection of notes in dense polyphonic textures, while ignoring the spectral fluctuations produced by vibrato and tremolo. Periodicities in the accentuation curve are detected using an improved version of the autocorrelation function. Hierarchical metrical structures, composed of a large set of periodicities in pairwise harmonic relationships, are tracked over time. In this way, the metrical structure can be tracked even if the rhythmical emphasis switches from one metrical level to another. Compared to all other participants in the MIREX Audio Tempo Extraction task from 2006 to 2018, this approach is the third best among those that can track tempo variations. While the two best methods are based on machine learning, our method suggests a way to track tempo founded on signal processing and heuristics-based peak tracking. Besides, the approach offers, for the first time, a detailed representation of the dynamic evolution of the metrical structure. The method is integrated into MIRtoolbox, a freely available Matlab toolbox.
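The basic periodicity-detection step can be sketched as an autocorrelation of the accentuation curve restricted to a plausible beat-period range; the method's improved autocorrelation and hierarchical metrical tracking are not reproduced here, and the BPM range is an illustrative choice.

import numpy as np

def tempo_from_accentuation(acc, frame_rate, bpm_range=(40, 240)):
    """Pick the strongest periodicity of an accentuation curve via
    autocorrelation and convert the winning lag to a tempo in BPM."""
    acc = acc - np.mean(acc)
    ac = np.correlate(acc, acc, mode='full')[len(acc) - 1:]
    # restrict lags to the plausible beat-period range
    lo = max(1, int(frame_rate * 60.0 / bpm_range[1]))
    hi = int(frame_rate * 60.0 / bpm_range[0])
    lag = lo + np.argmax(ac[lo:hi])
    return 60.0 * frame_rate / lag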
Keywords
autocorrelation, metrical analysis, tempo
Paper topics
Computational musicology and ethnomusicology, Content processing of music audio signals, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
metrical level [50], metrical structure [33], metrical layer [28], metrical grid [24], metrical period [16], accentuation curve [10], autocorrelation function [10], music information retrieval [7], periodicity score [7], dvorak new world symphony [6], contextual background [5], global tempo [5], metrical centroid [5], peak lag [5], tempo estimation [5], allegro con fuoco [4], autocorrelation based periodogram [4], core metrical level [4], deep learning [4], dotted quarter note [4], dynamic evolution [4], dynamic metrical centroid [4], dynamic metrical centroid curve [4], large range [4], main metrical level [4], metrical analysis [4], mirex audio tempo extraction [4], strongest periodicity [4], successive frame [4], whole note [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249305
Zenodo URL: https://zenodo.org/record/3249305
Abstract
The Chordinator is an interactive and educational music device consisting of a physical board housing a “chord stacking” grid. The 8x4 grid steps through each of its eight columns from left to right at a specified tempo, playing the chord built in each column. To build a chord, the user places blocks on the board that represent major or minor thirds above blocks that designate a root (or bass) note, represented as a scale degree. In the bottom row, the user specifies a bass (root) note, and any third block placed above it adds that interval above the bass note. Any third block placed above another third block adds an additional interval above the prior one, creating a chord. There are three rows above each root, allowing either triads or seventh chords to be built. This interface, combined with the board design, is intended to create a simple representation of chord structure. Using the blocks, the user can physically “build” a chord using the most fundamental skill, in this case “stacking your thirds,” and also learns which chords work best in a sequence. It provides quick satisfaction and a fun, interactive way to learn about the structure of chords, and can even spark creativity as people build interesting progressions or try to recreate progressions they love from their favorite music.
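The "stacking thirds" rule maps directly to pitch arithmetic: each block adds three semitones (minor third) or four semitones (major third) above the previous note. A tiny illustrative sketch, not the Chordinator firmware.

def stack_thirds(root_midi, thirds):
    """root_midi: MIDI note of the bass; thirds: list like ['M', 'm'],
    where 'M' adds a major third (4 semitones) and 'm' a minor third (3)."""
    notes = [root_midi]
    for t in thirds:
        notes.append(notes[-1] + (4 if t == 'M' else 3))
    return notes

print(stack_thirds(60, ['M', 'm']))        # C major triad: [60, 64, 67]
print(stack_thirds(60, ['m', 'M', 'm']))   # C minor seventh: [60, 63, 67, 70]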
Keywords
Arduino, Chords, Chord Sequencer, Education, Interactive, Learning, Stacking Thirds
Paper topics
not available
Easychair keyphrases
third block [7], chord progression [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249360
Zenodo URL: https://zenodo.org/record/3249360
Abstract
This paper describes the Viking HRTF dataset, a collection of head-related transfer functions (HRTFs) measured at the University of Iceland. The dataset includes full-sphere HRTFs measured on a dense spatial grid (1513 positions) with a KEMAR mannequin with 20 different artificial left pinnae attached, one at a time. The artificial pinnae were previously obtained through a custom molding procedure from 20 different lifelike human heads. The analyses of results reported here suggest that the collected acoustical measurements are robust, reproducible, and faithful to reference KEMAR HRTFs, and that material hardness has a negligible impact on the measurements compared to pinna shape. The purpose of the present collection, which is available for free download, is to provide accurate input data for future investigations on the relation between HRTFs and anthropometric data through machine learning techniques or other state-of-the-art methodologies.
Keywords
binaural, HRTF, KEMAR, spatial sound
Paper topics
Spatial sound, reverberation and virtual acoustics
Easychair keyphrases
head related transfer function [14], related transfer function [10], negative mold [7], right channel [7], left channel [6], mean spectral distortion [6], pinna shape [6], standard large anthropometric pinna [6], audio eng [5], kemar mannequin [5], left pinna [5], measurement session [5], custom made pinna [4], dummy head [4], ear canal [4], impulse response [4], jesmonite r ear [4], kemar pinna replica [4], lifelike human head [4], pinna related transfer function [4], related transfer [4], shore oo hardness [4], signal process [4], starting point [4], viking hrtf dataset [4], virtual sound source [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249252
Zenodo URL: https://zenodo.org/record/3249252
Abstract
The tuning of a piano is a complicated and time-consuming process, which is usually left to a professional tuner. To make the process faster and independent of the skills of a professional tuner, a semi-automatic piano tuning system was developed. The aim of the system is to tune a grand piano semi-automatically with the help of a non-professional tuner. The system consists of an aluminum frame, a stepper motor, an Arduino, a microphone, and a laptop computer. The stepper motor changes the tuning of the piano strings by turning the pins connected to them, the aluminum frame holds the motor in place, and the Arduino controls the motor. The microphone and the computer are used as part of a closed-loop control system, which tunes the strings automatically. The control system tunes each string by minimising the difference between the current and the optimal fundamental frequency. The current fundamental frequency is obtained with an inharmonicity coefficient estimation algorithm, and the optimal fundamental frequency is calculated with the Connected Reference Interval (CRI) tuning process. With the CRI tuning process, a tuning close to that of a professional tuner is achieved, with an RMS deviation of 2.5 cents between the keys A0 and G5 and 8.1 cents between G#5 and C8, where the tuner's tuning appears less consistent.
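The reported deviations are expressed in cents; the following is a small sketch of the error metric the control loop effectively minimises, leaving out the motor control and inharmonicity estimation described in the paper.

import numpy as np

def cents(f, f_target):
    """Deviation of a measured frequency f from f_target, in cents."""
    return 1200.0 * np.log2(f / f_target)

def rms_cents(measured, targets):
    """RMS tuning deviation over a set of keys."""
    c = cents(np.asarray(measured), np.asarray(targets))
    return np.sqrt(np.mean(c ** 2))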
Keywords
acoustic signal processing, audio systems, automatic control, music, spectral analysis
Paper topics
Hardware systems for sound and music computing, Models for sound analysis and synthesis, Sound/music signal processing algorithms
Easychair keyphrases
fundamental frequency [37], partial frequency [14], professional tuner [14], closed loop control system [12], inharmonicity coefficient [12], stepper motor [12], beating rate [11], cri tuning process [11], inharmonicity coefficient estimation [9], piano string [8], coefficient estimation algorithm [7], target fundamental frequency [7], tuning process [7], aluminum frame [6], control system [6], piano tuner [6], piano tuning [6], piano tuning system [6], reference octave [6], cri process [5], lower tone [5], mat algorithm [5], measured output [5], mode frequency [5], target frequency [5], first matching partial [4], optimal fundamental frequency [4], piano tuning robot [4], tone equal temperament scale [4], yamaha grand piano [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249293
Zenodo URL: https://zenodo.org/record/3249293
Abstract
In this paper we introduce a hardware platform to prototype interfaces of demanding sonic interactive systems. We target applications featuring a large array of analog sensors requiring data acquisition and transmission to computers at fast rates, with low latency and high bandwidth. This work is part of an ongoing project which aims to provide designers with a cost-effective and accessible platform for fast prototyping of complex interfaces for sonic interactive systems or musical instruments. The high performance is guaranteed by a SoC FPGA. The functionality of the platform can be customized without requiring significant technical expertise. In this paper, we discuss the principles, the current design, and a preliminary evaluation against common microcontroller-based platforms. The proposed platform can sample up to 96 analog channels at rates up to 24 kHz and stream the data via UDP to computers with sub-millisecond latency.
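The stated maximum configuration gives a rough idea of the required throughput, and the stream can be consumed with a plain UDP socket; the sample width, port number and packet layout below are assumptions, not the platform's actual protocol.

import socket

# Rough payload estimate for 96 channels at 24 kHz, assuming 2 bytes/sample.
channels, rate, bytes_per_sample = 96, 24000, 2
print(channels * rate * bytes_per_sample * 8 / 1e6, "Mbit/s payload")

# Minimal UDP receiver for sensor frames streamed by the acquisition board.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9000))          # illustrative port
for _ in range(10):
    packet, addr = sock.recvfrom(2048)
    print(len(packet), "bytes from", addr)
    # ... parse 'packet' according to the board's frame format ...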
Keywords
Hardware Platform, Musical Interface, Sonic Interaction
Paper topics
Hardware systems for sound and music computing, Interfaces for sound and music, New interfaces for interactive music creation, Sonic interaction design
Easychair keyphrases
sampling rate [16], sonic interactive system [14], analog signal [13], data acquisition [11], acquisition board [10], microcontroller based platform [9], sound synthesis [9], simultaneous sampling [8], fpga pin [7], maximum rate [7], maximum sampling rate [7], arm cortex [6], board computer [6], data acquisition system [6], pure data [6], udp packet [6], buffer size [5], musical instrument [5], serial interface [5], sonic interactive [5], bit arm cortex [4], filter bank [4], fpga based platform [4], fpga fabric [4], maximum data acquisition rate [4], measured data transmission [4], microcontroller based board [4], pressure sensitive touchpad [4], sensor data [4], sonic interaction [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249278
Zenodo URL: https://zenodo.org/record/3249278
Abstract
In this paper, we build upon a recently proposed deep convolutional neural network architecture for automatic chord recognition (ACR). We focus on extending the commonly used major/minor vocabulary (24 classes) to an extended chord vocabulary of seven chord types, with a total of 84 classes. In our experiments, we compare joint and separate classification of the chord type and the chord root pitch class, using one or two separate models, respectively. We perform a large-scale evaluation using various combinations of training and test sets of different timbre complexity. Our results show that ACR with an extended chord vocabulary achieves high f-scores of 0.97 for isolated chord recordings and 0.66 for mixed contemporary popular music recordings. While joint ACR modeling leads to the best results for isolated instrument recordings, the separate modeling strategy performs best for complex music recordings. Alongside this paper, we publish a novel dataset for extended-vocabulary chord recognition, consisting of synthetically generated isolated recordings of various musical instruments.
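With seven chord types over twelve root pitch classes, the joint 84-class labelling reduces to simple index arithmetic; the particular list of chord types below is an assumption, since the abstract does not enumerate them.

CHORD_TYPES = ['maj', 'min', 'dim', 'aug', 'maj7', 'min7', '7']  # assumed set of 7

def encode(root_pc, chord_type):
    """Joint 84-class label from a root pitch class (0-11) and a chord type."""
    return root_pc * len(CHORD_TYPES) + CHORD_TYPES.index(chord_type)

def decode(label):
    """Recover (root pitch class, chord type) from a joint label."""
    return label // len(CHORD_TYPES), CHORD_TYPES[label % len(CHORD_TYPES)]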
Keywords
automatic chord recognition, deep convolutional neural network, harmony analysis
Paper topics
Automatic separation, recognition, classification of sound and music, Models for sound analysis and synthesis, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
chord type [28], chord recognition [20], isolated chord recording [15], music information retrieval [14], root pitch class [14], extended vocabulary acr [12], chord root pitch [11], chord voicing [10], seventh chord [10], acr model [8], extended vocabulary [8], music recording [8], neural network [8], automatic chord recognition [7], chord tone [7], th international society [7], acoustic modeling [6], chord label [6], chord vocabulary [6], isolated instrument recording [6], midi file [6], minor chord [6], minor chord vocabulary [6], modeling strategy [6], real life acr application [6], novel dataset [5], training set [5], chord recognition dataset [4], final dense layer [4], high f score [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249472
Zenodo URL: https://zenodo.org/record/3249472
Abstract
This paper gives a basic overview of the URALi (Unity Real-time Audio Library) project, which is currently under development. URALi is a library that aims to provide a collection of software tools for realizing real-time sound synthesis in applications and software developed with Unity.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
Demo
DOI: 10.5281/zenodo.3249266
Zenodo URL: https://zenodo.org/record/3249266
Abstract
This paper presents the interactive dance project VIBRA, based on two workshops that took place in 2018. It describes the technical solutions applied and discusses the artistic and expressive experiences. Central to the discussion is how the technical equipment, implementation and mappings to different media affected the expressive and experiential reactions of the dancers.
Keywords
computer visuals, Interactive dance, motion sensors, spatial sound
Paper topics
Improvisation in music through interactivity, Interaction in music performance, Interactive performance systems, Interfaces for sound and music, Sonic interaction design
Easychair keyphrases
interactive dance [17], computer visual [11], myo armband [10], sensor data [10], interactive instrument [7], ngimu sensor [7], third author [7], body part [6], causal relationship [6], technical setup [6], dancer movement [5], musical expression [5], myo mapper [5], data communication [4], first author [4], myo sensor [4], project participant [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249248
Zenodo URL: https://zenodo.org/record/3249248
Abstract
This project investigates the potential of Head-Mounted-Display (HMD) based Virtual Reality (VR) that incorporates musical elements as a tool for performing exposure therapy. It is designed to help adolescents diagnosed with Autism Spectrum Disorder (ASD) deal with their social anxiety. An application was built that allows singing in VR while a virtual audience provides feedback. The application was tested with four adolescents diagnosed with ASD from a school for children with special needs in Denmark. The results of the evaluation are presented in this paper.
Keywords
Autism Spectrum Disorder, Music, Performance Anxiety, Performing, Singing, Social Anxiety, Virtual Audience, Virtual Reality
Paper topics
Interaction in music performance, Sound and music for accessibility and special needs, Sound and music for Augmented/Virtual Reality and games
Easychair keyphrases
virtual audience [27], social anxiety [26], simplified version [21], autism spectrum disorder [17], exposure therapy [15], virtual reality [14], liebowitz social anxiety scale [10], none none none [9], virtual environment [9], vr music intervention [9], likert scale [6], smiley face likert [6], smiley likert scale [6], concert hall [5], described situation [5], future iteration [5], voice command [5], developmental disorder [4], face likert scale [4], feared outcome [4], head mounted display [4], immersive tendency questionnaire [4], multisensory experience lab aalborg [4], presence questionnaire [4], scale ranging [4], virtual concert hall [4], vr exposure therapy [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249339
Zenodo URL: https://zenodo.org/record/3249339
Abstract
Music genres serve as important metadata in the field of music information retrieval and have been widely used for music classification and analysis tasks. Visualizing these music genres can thus be helpful for music exploration, archival and recommendation. Probabilistic topic models have been very successful in modelling text documents. In this work, we visualize music genres using a probabilistic topic model. Unlike text documents, audio is continuous and needs to be sliced into smaller segments. We use simple MFCC features of these segments as musical words. We apply the topic model to the corpus and subsequently use the genre annotations of the data to interpret and visualize the latent space.
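A minimal sketch of the pipeline described above, quantising MFCC frames into "musical words" with k-means and fitting an LDA topic model with scikit-learn; the vocabulary size and topic count are arbitrary choices, not the authors' settings.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def genre_topics(mfccs, n_words=256, n_topics=10):
    """mfccs: list of (n_frames_i, n_mfcc) arrays, one per track (assumed).
    Quantise frames into word indices, build per-track word-count
    documents, and fit a probabilistic topic model."""
    km = KMeans(n_clusters=n_words, n_init=4).fit(np.vstack(mfccs))
    docs = np.zeros((len(mfccs), n_words))
    for i, m in enumerate(mfccs):
        words, counts = np.unique(km.predict(m), return_counts=True)
        docs[i, words] = counts
    lda = LatentDirichletAllocation(n_components=n_topics).fit(docs)
    return lda.transform(docs)   # per-track topic proportions to visualise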
Keywords
Music Genre Visualization, Probabilistic Music Genres, Probabilistic Topic Models
Paper topics
not available
Easychair keyphrases
topic model [19], music genre [13], probabilistic topic model [9], cluster mean [7], document topic proportion [7], text document [6], latent space [4], progressive genre visualization [4], term topic proportion [4], topic proportion [4]
Paper type
Demo
DOI: 10.5281/zenodo.3249352
Zenodo URL: https://zenodo.org/record/3249352
Abstract
In this work, we propose the novel task of automatically estimating pitch (fundamental frequency) from video frames of violin playing, using vision alone. In order to investigate this task, we curate a novel dataset of violin playing, which we plan to release publicly to the academic community. To solve the task, we propose a novel Convolutional Neural Network (CNN) architecture that is trained using a student-teacher strategy to transfer discriminative knowledge from the audio domain to the visual domain. At test time, our framework takes video frames as input and directly regresses the pitch. We train and test this architecture on different subsets of our new dataset. Impressively, we show that this task (i.e. pitch prediction from vision) is actually possible. Furthermore, we verify that the network has indeed learnt to focus on salient parts of the image, e.g. the left hand of the violin player is used as a visual cue to estimate pitch.
Keywords
Audio-visual, Multi-modality, Visual pitch estimation
Paper topics
Multimodality in sound and music computing
Easychair keyphrases
video frame [14], visual information [14], pitch network [10], student network [9], convolutional layer [8], pseudo ground truth pitch [8], teacher network [8], violin playing [6], midi number [5], rpa tol [5], silent video [5], test time [5], audio visual [4], ground truth pitch [4], modal audio visual generation [4], multiple input frame [4], pitch frame [4], predict pitch [4], raw pitch accuracy [4], regress pitch [4], test set [4], truth pitch information [4], urmp dataset [4], visual cue [4], visual music transcription [4], visual pitch estimation [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249433
Zenodo URL: https://zenodo.org/record/3249433
Abstract
We present VocalistMirror, an interactive user interface that enables a singer to avoid their undesirable facial expressions in singing video recordings. Since singers usually focus on singing expression and pay little attention to facial expression, they sometimes notice, when watching recorded singing videos, that some of their own facial expressions are undesirable. VocalistMirror allows a singer to first specify their undesirable facial expressions in a recorded video, and then sing again while seeing a real-time warning that is shown when the singer's facial expression becomes similar to one of the specified undesirable expressions. It also displays karaoke-style lyrics with a piano-roll melody and visualizes acoustic features of the singing voice. The iOS ARKit framework is used to quantify the facial expression as a 52-dimensional vector, which is then used to compute the distance from undesirable expressions. Our experimental results showed the potential of the proposed interface.
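The warning logic reduces to a nearest-neighbour distance in the 52-dimensional facial-expression space; a minimal sketch, where the threshold value is an assumption.

import numpy as np

def should_warn(current, undesirable, threshold=5.0):
    """Warn when the 52-D facial-expression vector of the current frame is
    close (L1 norm) to any expression the singer marked as undesirable."""
    d = min(np.sum(np.abs(current - u)) for u in undesirable)
    return d < threshold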
Keywords
facial expression, singer support interface, singing video
Paper topics
Interfaces for sound and music, Multimodality in sound and music computing
Easychair keyphrases
facial expression [68], undesirable facial expression [38], singing voice [18], serious music background [15], real time [14], short singing video clip [10], video clip [10], acoustic feature interface [9], singing video clip [9], acoustic feature [7], facial expression interface [7], fundamental frequency [7], dimensional facial vector [6], karaoke style lyric [6], l1 norm distance [6], selected undesirable facial expression [6], singing pitch [6], interface design [5], music computing [5], singing video [5], truedepth camera [5], video recording [5], expression overall impression [4], exterior design feature [4], piano roll melody [4], real time vocal part arrangement [4], rwc music database [4], similar facial expression [4], singer facial expression [4], singing app [4]
Paper type
Full paper
DOI: 10.5281/zenodo.3249451
Zenodo URL: https://zenodo.org/record/3249451
Abstract
This paper presents VUSAA, an augmented reality soundwalking application for Apple iOS devices. The application is based on the idea of Urban Sonic Acupuncture, providing site-aware generative audio content aligned with the present sonic environment. The sound-generating algorithm was implemented in Kronos, a declarative programming language for musical signal processing. We discuss the conceptual framework and implementation of the application, along with the practical considerations of deploying it via a commercial platform. We present results from a number of soundwalks organized so far and outline an approach for developing new models of urban dwelling.
Keywords
augmented reality, generative composition, mobile application
Paper topics
Automatic music generation/accompaniment systems, Sonic interaction design, Sound and music for Augmented/Virtual Reality and games
Easychair keyphrases
urban sonic acupuncture [14], aural weather [8], sonic acupuncture [7], app store [6], augmented reality [6], augmented reality soundwalking [6], ios app store [6], sonic content [6], user interface [5], app store review [4], conceptual framework [4], mobile device [4], public space [4], urban acupuncture [4], urban sonic acupuncture strategy [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.3249416
Zenodo URL: https://zenodo.org/record/3249416