Dates: from July 30 to August 1, 2015
Place: Maynooth, Ireland
Proceedings info: Proceedings of the 12th Int. Conference on Sound and Music Computing (SMC-15), Maynooth, Ireland, July 30, 31 & August 1, 2015, ISBN 978-0-9927466-2-9
Abstract
We present a computational model of tonality cognition derived from physical and cognitive principles concerning the frequency ratios of consonant intervals. The proposed model, which we call the Prime Factor-based Generalized Tonnetz (PFG Tonnetz), is based on the Prime Factor Representation of frequency ratios and can be regarded as a generalization of the Tonnetz. Our intended application of the PFG Tonnetz is a system that supports spontaneous, improvisational participation of inexpert citizens in music performances for regional promotion. For this application, the system needs to determine pitches that satisfy tonality constraints imposed by the surrounding polyphonic music, because inexpert users frequently lack the musical skills related to tonality. We also explore a working hypothesis on the robustness of the PFG Tonnetz against errors in recognizing harmonic overtones in polyphonic audio signals. On the basis of this hypothesis, the PFG Tonnetz has good potential as a representation of the tonality constraints of the surrounding polyphonic music.
Keywords
ice-breaker activity, PFG Tonnetz, pitch contour, prime factor representation, tonality
Paper topics
Computational musicology and Mathematical Music Theory, Models for sound analysis and synthesis, Perception and cognition of sound and music, Social interaction in sound and music computing
Easychair keyphrases
frequency ratio [21], integer grid point [15], consonant interval [13], prime factor representation [12], pitch contour [11], harmonic overtone [9], limit pfg tonnetz [9], polyphonic audio signal [9], body motion [8], factor based generalized tonnetz [8], prime number [8], tonality constraint [8], cognitive principle [7], computational model [7], polyphonic music [7], recognition error [7], regional promotion [7], tonality cognition [7], grid point chord [6], limit just intonation [6], music performance [6], pitch frequency [6], tonality model [6], grid point [5], improvisational participation [5], inexpert user [5], minor chord [5], integer frequency ratio [4], pfg tonnetz space [4], pitch satisfying constraint [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851165
Zenodo URL: https://zenodo.org/record/851165
Abstract
Redirected Walking (RDW) has received increasing attention during the last decade. By means of real walking, RDW techniques allow users to explore virtual environments (VEs) that are significantly larger than the available physical space. This is accomplished by introducing discrepancies between the physical and the virtual movements. This paper focuses on the development of an experiment to identify detection thresholds for an acoustic RDW system based on a wave field synthesis (WFS) system. The implementation of an automated test procedure is described.
Keywords
immersive virtual environments, real time tracking, redirected walking, virtual reality, wave field synthesis
Paper topics
Perception and cognition of sound and music, Sonic interaction design, Spatial audio
Easychair keyphrases
test subject [41], virtual sound source [30], rotation gain [20], curvature gain [19], virtual environment [18], sound source [15], redirected walking [13], tracking area [12], starting position [11], curvature gain test [9], detection threshold [9], wave field synthesis [9], rotation gain test [7], tracking system [7], virtual world [7], mowec source [6], optional component [6], physical wf area [6], real world [6], rotational distortion [6], time dependent gain [6], translation gain [6], virtual rotation [6], alarm clock [5], auditory cue [5], gain test [5], self motion [5], time dependent [5], tracking data [5], immersive virtual environment [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851053
Zenodo URL: https://zenodo.org/record/851053
Abstract
A frequently occurring problem of state-of-the-art tempo estimation algorithms is that the predicted tempo for a piece of music is a whole-number multiple or fraction of the tempo as perceived by humans (tempo octave errors). While this is often simply caused by shortcomings of the algorithms used, in certain cases the problem can be attributed to the fact that the actual number of beats per minute (BPM) within a piece is not a listener’s only criterion for considering it “fast” or “slow”. Indeed, it can be argued that the perceived style of music sets an expectation of tempo and therefore influences its perception. In this paper, we address the issue of tempo octave errors in the context of electronic music styles. We propose to incorporate stylistic information by means of probability density functions that represent tempo expectations for the individual music styles. In combination with a style classifier, these probability density functions are used to choose the most probable BPM estimate for a sample. Our evaluation shows a considerable improvement of tempo estimation accuracy on the test dataset.
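As a rough illustration of the combination step described in the abstract, the sketch below re-scores octave-related BPM candidates with style-conditioned tempo densities weighted by a classifier's posterior. The Gaussian priors, style names and weighting are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical per-style tempo priors, modelled here as single Gaussians
# (mean BPM, standard deviation); the paper uses probability density
# functions derived from annotated data, not these toy values.
STYLE_PRIORS = {
    "drum_and_bass": (174.0, 8.0),
    "deep_house":    (122.0, 6.0),
    "dubstep":       (140.0, 7.0),
}

def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def correct_octave(raw_bpm, style_probs, factors=(0.5, 1.0, 2.0, 3.0, 1 / 3)):
    """Pick the octave-related BPM candidate that maximises the
    style-weighted tempo probability."""
    best_bpm, best_score = raw_bpm, -1.0
    for f in factors:
        candidate = raw_bpm * f
        # Marginalise over the style classifier's posterior.
        score = sum(p * gaussian_pdf(candidate, *STYLE_PRIORS[s])
                    for s, p in style_probs.items())
        if score > best_score:
            best_bpm, best_score = candidate, score
    return best_bpm

# Usage: a raw estimate of 87 BPM with a confident drum & bass classification
# is corrected to the whole-number multiple 174 BPM.
print(correct_octave(87.0, {"drum_and_bass": 0.8, "deep_house": 0.2}))
```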
Keywords
information extraction, music information retrieval, octave errors, tempo estimation, wikipedia extraction
Paper topics
Multimodality in sound and music computing, Music information retrieval
Easychair keyphrases
tempo estimation [35], tempo octave error [14], probability density function [12], giantstep tempo dataset [11], music information retrieval [11], tempo estimation accuracy [11], tempo estimation algorithm [11], tempo range [11], block level feature [9], tempo annotation [9], tempo estimate [9], th international society [9], tempo relationship [8], music style [7], octave error [7], tempo information [7], wikipedia article [7], art tempo estimation [6], dance nu disco [6], electronic music [6], electronic music style [6], indie dance nu [6], tempo estimator [6], tempo induction algorithm [6], tempo ranker [6], feature vector [5], probability density [5], style estimation [5], house glitch hop [4], infobox music genre [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851139
Zenodo URL: https://zenodo.org/record/851139
Abstract
In this work, a new online DTW-based score alignment method is used in an online score-informed source separation system. The proposed alignment stage deals with the input signal and the score. It estimates the score position of each new audio frame in an online fashion, using only information from the beginning of the signal up to the present audio frame. Then, under the Non-negative Matrix Factorization (NMF) framework and with previously learned instrument models, the different instrument sources are separated. The instrument models are learned on training excerpts of the same kinds of instruments. Experiments are performed to evaluate the proposed system and its individual components. Results show that it outperforms a state-of-the-art comparison method.
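The online alignment stage can be illustrated with a minimal incremental-DTW sketch that updates one row of cumulative cost per incoming audio frame and reports the cheapest score position so far. The chroma-like features, cosine cost and band size are placeholder assumptions, not the authors' design.

```python
import numpy as np

def frame_cost(audio_frame, score_frame):
    # Cosine distance between feature vectors (assumed chroma-like features).
    a, s = np.asarray(audio_frame), np.asarray(score_frame)
    return 1.0 - a @ s / (np.linalg.norm(a) * np.linalg.norm(s) + 1e-9)

class OnlineAligner:
    """Incremental DTW: each incoming audio frame updates one row of the
    cumulative-cost matrix, using only past information, and the current
    score position is reported as the cheapest cell in that row."""
    def __init__(self, score_frames, band=50):
        self.score = score_frames            # precomputed score features
        self.band = band                     # search window (score frames)
        self.prev_row = None
        self.position = 0

    def step(self, audio_frame):
        n = len(self.score)
        lo = max(0, self.position - self.band)
        hi = min(n, self.position + self.band)
        row = np.full(n, np.inf)
        for j in range(lo, hi):
            local = frame_cost(audio_frame, self.score[j])
            if self.prev_row is None:
                best_prev = 0.0 if j == 0 else row[j - 1]
            else:
                candidates = [self.prev_row[j]]           # stay on score frame
                if j > 0:
                    candidates += [self.prev_row[j - 1],  # diagonal step
                                   row[j - 1]]            # advance in score only
                best_prev = min(candidates)
            row[j] = local + best_prev
        self.prev_row = row
        self.position = int(np.argmin(row[lo:hi])) + lo
        return self.position

# Usage with random placeholder features: feeding slightly noisy copies of the
# score frames back in as "audio" should make the estimate track the frame index.
rng = np.random.default_rng(0)
score = rng.random((200, 12))
aligner = OnlineAligner(score)
for frame in score[:50]:
    pos = aligner.step(frame + 0.01 * rng.random(12))
print("estimated score position:", pos)   # close to 49
```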
Keywords
alignment, audio, DTW, music, score, source-separation
Paper topics
Models for sound analysis and synthesis, Sound/music signal processing algorithms
Easychair keyphrases
source separation [16], alignment method [13], instrument model [13], spectral pattern [13], spectral basis function [9], score alignment [8], dynamic time warping [6], excitation basis vector [6], non negative matrix factorization [6], signal processing [6], alignment stage [5], carabias orti [5], cost function [5], cost matrix [5], midi time [5], musical instrument [5], time series [5], latent variable analysis [4], low complexity signal decomposition [4], multi excitation model [4], multiplicative update rule [4], neural information processing system [4], nonnegative matrix factorization [4], offline version [4], online scoreinformed source separation [4], polyphonic audio [4], real time [4], signal model [4], sound source separation [4], trained instrument model [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851127
Zenodo URL: https://zenodo.org/record/851127
Abstract
In this paper, we propose a new loop sequencer that automatically selects music loops according to a degree of excitement specified by the user. A loop sequencer is expected to be a good tool for non-musicians to compose music because it does not require expert musical knowledge. However, appropriately selecting music loops is not easy because a loop sequencer usually has a huge loop collection (e.g., more than 3000 loops). It is therefore desirable to select music loops automatically based on simple and easy input from the user. In this paper, we focus on the degree of excitement. In typical techno music, the temporal evolution of excitement is an important feature. Our system allows the user to input the temporal evolution of excitement by drawing a curve, and then selects music loops automatically according to the input excitement. Experimental results show that our system is easy to understand and generates musical pieces that satisfy non-experts of music.
Keywords
Automatic music composition, Computer-aided music composition, Degree of excitement, Hidden Markov model, Loop sequencer
Paper topics
Interfaces for sound and music
Easychair keyphrases
music loop [37], loop sequencer [10], baseline system [8], musical piece [7], music composition [5], techno music [5], computer aided music composition [4], music loop according [4], temporal evolution [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851065
Zenodo URL: https://zenodo.org/record/851065
Abstract
This paper presents a music performance assistance system that enables a user to sing, play a musical instrument producing harmonic sounds (e.g., guitar), or play drums while playing back a karaoke or minus-one version of an existing music audio signal from which the sounds of the user part (singing voices, harmonic instrument sounds, or drum sounds) have been removed. The beat times, chords, and vocal F0 contour of the original music signal are visualized and automatically scrolled from right to left in synchronization with the music playback. To help a user practice singing effectively, the F0 contour of the user's singing voice is estimated and visualized in real time. The core functions of the proposed system are vocal, harmonic, and percussive source separation and content visualization for music audio signals. To provide the first function, vocal-and-accompaniment source separation based on RPCA and harmonic-and-percussive source separation based on median filtering are performed in a cascading manner. To provide the second function, content annotations (estimated automatically and partially corrected by users) are collected from a Web service called Songle. Subjective experimental results showed the effectiveness of the proposed system.
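A minimal sketch of the median-filtering harmonic/percussive step is given below; the kernel size, mask exponent and names are illustrative assumptions, and in the described system this stage operates on the accompaniment obtained after RPCA-based vocal separation.

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(mag_spectrogram, kernel=17, power=2.0):
    """Median-filtering harmonic/percussive separation: horizontal
    (time-direction) medians enhance harmonic ridges, vertical
    (frequency-direction) medians enhance percussive transients."""
    S = np.asarray(mag_spectrogram, dtype=float)          # shape (freq, time)
    harm = median_filter(S, size=(1, kernel))             # smooth along time
    perc = median_filter(S, size=(kernel, 1))             # smooth along frequency
    # Soft (Wiener-like) masks; applied to the complex STFT before inversion.
    mask_h = harm ** power / (harm ** power + perc ** power + 1e-12)
    return mask_h, 1.0 - mask_h

# Usage on a toy magnitude spectrogram (a real system would use the STFT of
# the accompaniment signal).
S = np.abs(np.random.default_rng(1).normal(size=(513, 200)))
mask_h, mask_p = hpss_masks(S)
harmonic_part, percussive_part = S * mask_h, S * mask_p
```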
Keywords
Harmonic and percussive source separation, Music content visualization, Music performance assistance, Singing voice separation
Paper topics
Interfaces for sound and music, Sound/music signal processing algorithms
Easychair keyphrases
music audio signal [28], singing voice [19], source separation [19], percussive source separation [17], vocal f0 contour [14], active music listening [12], singing voice separation [12], beat time [11], real time [10], accompaniment sound [8], music content [8], robust principal component analysis [8], musical instrument [7], music performance assistance [7], service called songle [7], user singing voice [7], web service [7], instrument part [6], performance assistance system [6], chord progression [5], median filtering [5], percussive sound [5], accompaniment source separation [4], audio signal [4], automatic accompaniment [4], median filter [4], playback position [4], polyphonic music [4], vocal andaccompaniment source separation [4], vocal spectrogram [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851133
Zenodo URL: https://zenodo.org/record/851133
Abstract
Handpan is a term used to describe a group of struck metallic musical instruments that are similar in shape and sound to the Hang (developed by PANArt in January 2000). The handpan is a hand-played instrument consisting of two hemispherical steel shells fastened together along the circumference. The instrument usually contains a minimum of eight elliptical notes and is played by delivering rapid and gentle strikes to the note areas. This report details the design and implementation of an experimental procedure to record, analyse, and resynthesise the handpan sound. Four instruments from three different makers were used for the analysis, giving insight into common handpan sound features and the origin of the signature amplitude modulation characteristics of the handpan. Subjective listening tests were conducted to estimate the minimum number of signature partials required to sufficiently resynthesise the handpan sound.
Keywords
amplitude modulation, analysis, decay rates, handpan, hang, listening test, partials, resynthesis, signature, T60
Paper topics
Models for sound analysis and synthesis
Easychair keyphrases
note field [27], handpan sound [18], amplitude modulation [13], signature partial [12], resynthesised signal [7], signature amplitude modulation [7], amplitude modulation characteristic [6], decay rate [6], highest magnitude partial [6], listening test [6], magnetic absorbing pad [6], musical instrument [6], surrounding note field [6], audio signal [5], frequency value [5], note group [5], steel pan [5], amplitude modulated partial frequency [4], decay time [4], energy decay relief [4], estimated amplitude modulation rate [4], highest magnitude [4], mean pd60 decay time [4], median similarity rating [4], modulated partial frequency value [4], signature handpan sound [4], steady state [4], subjective listening test [4], undamped and damped configuration [4], undamped and damped measurement [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851153
Zenodo URL: https://zenodo.org/record/851153
Abstract
The paper presents a set of mid-level descriptors for the analysis of musical textures played on the guitar, divided into six categories: global, guitar-specific, rhythm, pitch, amplitude and spectrum descriptors. The employed system is based on an acoustic nylon guitar with hexaphonic pick-ups and was programmed in Max. An overview of the explored low-level audio descriptors is given in the first section. Mid-level descriptors, many of them based on a general affordance of the guitar, are the subject of the central section. Finally, some distinctive characteristics of six different textures (two-voice writing, block chords, arpeggios, fast gestures with legato, slow melody with accompaniment, strummed chords) are highlighted with the help of the implemented tools.
Keywords
descriptors of guitar performance, hexaphonic nylon guitar, interactive musical systems, mid-level descriptors
Paper topics
Content processing of music audio signals, Interactive performance systems, Music performance analysis and rendering
Easychair keyphrases
mid level descriptor [28], level descriptor [14], mid level [10], mid level descriptor value [8], string jump [8], block chord [7], fundamental frequency [7], superimposition index [7], mean value [6], real time [6], standard deviation [6], string index [6], left hand [5], open string [5], pitch class [5], spectrum descriptor [5], string centroid [5], acoustic guitar [4], implemented mid level descriptor [4], low level descriptor [4], non pitched event [4], prime form [4], prominent ioi [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851077
Zenodo URL: https://zenodo.org/record/851077
Abstract
Query-by-Humming (QBH) systems base their operation on aligning the melody sung/hummed by a user with a set of candidate melodies retrieved from music tunes. While MIDI-based QBH builds on the premise of existing annotated transcriptions for any candidate song, audio-based research makes use of melody extraction algorithms for the music tunes. In both cases, a melody abstraction process is required for solving issues commonly found in queries such as key transpositions or tempo deviations. Automatic music transcription is commonly used for this, but due to the reported limitations in state-of-the-art methods for real-world queries, other possibilities should be considered. In this work we explore three different melody representations, ranging from a general time-series one to more musical abstractions, which avoid the automatic transcription step, in the context of an audio-based QBH system. Results show that this abstraction process plays a key role in the overall accuracy of the system, obtaining the best scores when temporal segmentation is dynamically performed in terms of pitch change events in the melodic contour.
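One of the melody abstractions discussed (semitone quantisation with segmentation at pitch-change events) can be sketched as follows; the function and its parameters are an assumption-laden toy, not the authors' encoding.

```python
import math

def abstract_melody(f0_hz, ref_hz=440.0):
    """Quantise an f0 contour (Hz per frame) to semitones relative to A4 and
    segment it at pitch-change events, returning [pitch, duration-in-frames]
    pairs expressed relative to the first segment (key invariance).
    Unvoiced frames (f0 <= 0) are skipped."""
    segments = []
    for f in f0_hz:
        if f <= 0:
            continue
        q = int(round(12.0 * math.log2(f / ref_hz)))   # semitone quantisation
        if segments and q == segments[-1][0]:
            segments[-1][1] += 1                       # extend current segment
        else:
            segments.append([q, 1])                    # pitch-change event
    if segments:
        base = segments[0][0]
        segments = [[p - base, d] for p, d in segments]
    return segments

# Usage on a synthetic contour: 20 frames of A4, 15 of B4, 10 of G4.
contour = [440.0] * 20 + [493.88] * 15 + [392.0] * 10
print(abstract_melody(contour))   # -> [[0, 20], [2, 15], [-2, 10]]
```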
Keywords
Audio-based Query-by-Humming, Melody encoding, Singing voice alignment
Paper topics
Music information retrieval
Easychair keyphrases
time series [18], music information retrieval [12], temporal segmentation [10], alignment algorithm [8], full automatic music transcription [8], melodic contour [8], subsequence dynamic time warping [8], abstraction process [6], hit rate [6], main f0 contour [6], melody estimation algorithm [6], music collection [6], pitch change event [6], candidate song [5], edit distance [5], fundamental frequency [5], semitone quantization [5], smith waterman [5], candidate melody [4], estimation algorithm melodia [4], frequency value [4], general time series [4], mean reciprocal rank [4], melody abstraction [4], melody abstraction process [4], melody extraction [4], pitch contour [4], polyphonic music signal [4], symbolic aggregate approximation [4], symbolic representation [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851123
Zenodo URL: https://zenodo.org/record/851123
Abstract
The present article describes and discusses an acoustic guitar augmented with structure-borne sound drivers attached to its soundboard. The sound drivers make it possible to drive electronic sounds into the guitar, transforming the soundboard into a loudspeaker and building a second layer of sonic activity on the instrument. The article presents the system implementation and its associated design process, as well as a set of sonic augmentations. The sound aesthetics of augmented acoustic instruments are discussed and compared to instruments comprising separate loudspeakers.
Keywords
Active acoustics, Augmented Instrument, Guitar, Live electronics, Sound processing, Structure-borne sound
Paper topics
Interfaces for sound and music, Multimodality in sound and music computing
Easychair keyphrases
sound driver [11], acoustic guitar [9], augmented instrument [9], structure borne sound driver [8], acoustic instrument [6], active control [5], electronic sound [5], frequency response [5], signal processing [5], acoustic sound [4], active acoustic [4], active acoustic guitar [4], attack timbre modification [4], computer music [4], design process [4], electric guitar [4], hexaphonic pickup [4], international computer [4], playing technique [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851049
Zenodo URL: https://zenodo.org/record/851049
Abstract
As the music consumption paradigm moves towards streaming services, users have access to increasingly large catalogs of music. In this scenario, music classification plays an important role in music discovery. It enables, for example, search by genres or automatic playlist creation based on mood. In this work we study the classification of song mood, using features extracted from lyrics alone, based on a vector space model representation. Previous work in this area reached contradictory conclusions based on experiments carried out using different datasets and evaluation methodologies. In contrast, we use a large freely-available dataset to compare the performance of different term-weighting approaches from a classification perspective. The experiments we present show that lyrics can successfully be used to classify music mood, achieving accuracies of up to 70% in some cases. Moreover, contrary to other work, we show that the performance of the different term weighting approaches evaluated is not statistically different using the dataset considered. Finally, we discuss the limitations of the dataset used in this work, and the need for a new benchmark dataset to progress work in this area.
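Comparing term-weighting schemes from a classification perspective can be sketched with scikit-learn as below. The toy lyrics, mood labels and the linear SVM are assumptions; the delta tf-idf weighting evaluated in the paper would require a custom transformer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Toy corpus of lyric fragments labelled with mood quadrants; a real
# experiment would use the musiXmatch / Million Song Dataset lyrics with
# mood tags.
lyrics = [
    "dancing all night under shining lights",
    "tears falling down in the cold empty room",
    "calm waves and a slow warm breeze",
    "rage burning louder than the broken glass",
] * 25
moods = ["happy", "sad", "relaxed", "angry"] * 25

for name, vec in [("tf-idf", TfidfVectorizer()),
                  ("term frequency", CountVectorizer())]:
    model = make_pipeline(vec, LinearSVC())
    scores = cross_val_score(model, lyrics, moods, cv=5)   # accuracy per fold
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```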
Keywords
Million Songs Dataset, Mood classification, Music classification, Music information retrieval, Sentiment classification, text mining
Paper topics
Music information retrieval, Perception and cognition of sound and music
Easychair keyphrases
term weighting [17], mood classification [15], mood quadrant [15], term weighting scheme [15], music information retrieval [12], music classification [11], music mood classification [11], document frequency [10], song dataset [9], term frequency [9], th international society [9], vector space model [9], classification performance [8], delta tf idf [7], distinct term [7], mood group [7], classification accuracy [6], mood tag [6], social tag [6], classification result [5], feature analysis [5], lyrical feature [5], mood category [5], musixmatch dataset [5], term distribution [5], accuracy tf idf [4], idf term weighting [4], lyric based classification [4], mood granularity [4], statistically significant difference [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851021
Zenodo URL: https://zenodo.org/record/851021
Abstract
This paper presents the early developments of a recently started research project, aimed at studying from a multidisciplinary perspective an exceptionally well preserved ancient pan flute. A brief discussion of the history and iconography of pan flutes is provided, with a focus on Classical Greece. Then a set of non-invasive analyses are presented, which are based on 3D scanning and materials chemistry, and are the starting point to inspect the geometry, construction, age and geographical origin of the instrument. Based on the available measurements, a preliminary analysis of the instrument tuning is provided, which is also informed with elements of theory of ancient Greek music. Finally, the paper presents current work aimed at realizing an interactive museum installation that recreates a virtual flute and allows intuitive access to all these research facets.
Keywords
3D scanning, Archaeoacoustics, Interactive multimedia installations, Virtual instruments
Paper topics
Interfaces for sound and music, Multimodality in sound and music computing
Easychair keyphrases
pan flute [13], ancient greek music [11], musical instrument [10], active preservation [5], metric measurement [5], franc ois vase [4], internal pipe diameter dint [4], preserved ancient pan flute [4], sound synthesis [4], stopped pipe wind instrument [4], very high resolution [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851067
Zenodo URL: https://zenodo.org/record/851067
Abstract
This paper presents a novel piano tutoring system that encourages a user to practice playing a piano by simplifying difficult parts of a musical score according to the playing skill of the user. To identify the difficult parts to be simplified, the system is capable of accurately detecting mistakes of a user's performance by referring to the musical score. More specifically, the audio recording of the user's performance is transcribed by using supervised non-negative matrix factorization (NMF) whose basis spectra are trained from isolated sounds of the same piano in advance. Then the audio recording is synchronized with the musical score using dynamic time warping (DTW). The user's mistakes are then detected by comparing those two kinds of data. Finally, the detected parts are simplified according to three kinds of rules: removing some musical notes from a complicated chord, thinning out some musical notes from a fast passage, and removing octave jumps. The experimental results showed that the first rule can simplify musical scores naturally. The second rule, however, simplified the scores awkwardly, especially when the passage constituted a melody line.
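The transcription front end can be illustrated as supervised NMF in which the basis matrix W is fixed (trained beforehand on isolated piano sounds) and only the activations are updated. This is a generic multiplicative-update sketch under the Euclidean cost, not necessarily the authors' exact formulation.

```python
import numpy as np

def supervised_nmf_activations(V, W, n_iter=200, eps=1e-9):
    """Estimate activations H for a magnitude spectrogram V given a fixed,
    pre-trained basis matrix W (one basis spectrum per pitch), using
    multiplicative updates that minimise the Euclidean cost ||V - WH||^2."""
    n_bases = W.shape[1]
    H = np.abs(np.random.default_rng(0).random((n_bases, V.shape[1])))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Usage on synthetic data: 3 random "pitch" spectra, only two of them active.
rng = np.random.default_rng(2)
W = np.abs(rng.random((513, 3)))
H_true = np.zeros((3, 100)); H_true[0, :50] = 1.0; H_true[2, 40:] = 0.8
V = W @ H_true
H_est = supervised_nmf_activations(V, W)
print(np.round(H_est.mean(axis=1), 2))   # the middle basis stays near zero
```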
Keywords
NMF, Piano performance support, Score simplification
Paper topics
Interactive performance systems, Music performance analysis and rendering
Easychair keyphrases
multipitch estimation [18], musical score [15], piano roll [15], score simplification [11], octave error [10], activation matrix [8], actual performance [8], audio signal [8], mistake detection [8], dynamic time warping [7], audio recording [6], non negative matrix factorization [6], synchronized piano roll [6], musical note [5], simplified score [5], user performance [5], base spectrum matrix [4], difficult part [4], fast passage [4], harmonic structure [4], informed piano tutoring system [4], musical score according [4], novel piano tutoring system [4], player skill [4], practice playing [4], rwc music database [4], score informed piano tutoring [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851129
Zenodo URL: https://zenodo.org/record/851129
Abstract
Karaoke is a popular amusement, but people do not necessarily enjoy karaoke when they are not singing. It is better for non-singing people to engage in karaoke to enliven it, but this is not always easy, especially if they do not know the song. Here, we focus on the tambourine, which is provided in most karaoke spaces in Japan but is rarely used. We propose a system that instructs a non-singing person in how to play the tambourine. Once the singer chooses a song, the tambourine part for this song is automatically generated based on its standard MIDI file. During playback, the tambourine part is displayed in a common music-game style together with the usual karaoke-style lyrics, and the correctness of the tambourine beats is fed back on the display. The results showed that our system motivated non-singing people to play the tambourine with a game-like instruction, even for songs that they did not know.
Keywords
Karaoke, Tambourine part generation, Tambourine Support
Paper topics
Interactive performance systems
Easychair keyphrases
tambourine part [19], baseline system [16], tambourine player [14], easy song hard song [12], mean value [10], body motion [9], play karaoke [9], practice mode [9], tambourine part generation [9], unknown song [9], tambourine performance [8], usual karaoke style lyric [8], easy song [7], instrumental solo section [7], tambourine support system [7], temporal differential [7], common music game style [6], hard song [6], hard song easy song [6], non singing person [6], real time tambourine performance [6], rwc music database [6], wii tambourine [6], singing voice [5], singer favorite song [4], snare drum [4], strong note [4], system easy song [4], tambourine performance feedback [4], unknown known unknown song song [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851059
Zenodo URL: https://zenodo.org/record/851059
Abstract
This paper presents a system that takes as input the audio signal of any song sung by a singer and automatically generates a music video clip in which the singer appears to be actually singing the song. Although music video clips have gained popularity in video streaming services, not all existing songs have corresponding video clips. Given a song sung by a singer, our system generates a singing video clip by reusing existing singing video clips featuring the singer. More specifically, the system retrieves short fragments of singing video clips that include singing voices similar to that in the target song, and then concatenates these fragments using dynamic programming (DP). To achieve this, we propose a method to extract singing scenes from music video clips by combining vocal activity detection (VAD) with mouth aperture detection (MAD). Subjective experimental results demonstrate the effectiveness of our system.
Keywords
audio-visual processing, Music video generation, singing scene detection
Paper topics
Multimodality in sound and music computing
Easychair keyphrases
video clip [82], music video clip [68], singing scene [61], singing voice [61], music video [47], non singing scene [28], singing voice feature [22], singing scene detection [17], singing voice separation [15], video generation [13], mouth aperture degree [12], audio visual [11], audio visual synchronization [11], singing video [9], automatic music video generation [8], database clip [8], real music video clip [8], audio signal [7], edge free dp [7], existing music video [7], real video clip [7], similar singing voice [7], singing video clip [7], instrumental section [6], mouth aperture [6], music video generation [6], scene detection method [6], talking head [6], video fragment [6], arbitrary song [5]
Paper type
Full paper
DOI: 10.5281/zenodo.851033
Zenodo URL: https://zenodo.org/record/851033
Abstract
The use of interactive technology in music therapy is rapidly growing, and the flexibility afforded by these technologies in music therapy is substantial. We present steps in the development of Bean, a Digital Musical Instrument wrapped around a commercial game console controller and designed for use in a music therapy setting. Bean is controlled by gestures and has both physical and virtual segments. The physical user interaction is minimalistic, consisting of the spatial movement of the instrument along with two push buttons. Some visual aspects have also been integrated into Bean: direct visual feedback from the instrument itself is mirrored in accompanying software, where a 3D virtual representation of the instrument can be seen. Sound synthesis currently consists of amplitude and frequency modulation and effects, with a clear separation of melody and harmony. These aspects were developed with the aim of encouraging an immediate sense of agency. Bean is being co-developed with clients and therapists in order to assess the current state of development and provide clues for optimal improvement going forward. Both the strengths and the weaknesses of the design at the time of the evaluation were assessed. Using this information, the current design has been updated and is now closer to a formal evaluation.
Keywords
DMI, Music Therapy, Participatory Design, Tangible Interface for Musical Expression
Paper topics
Interfaces for sound and music, Multimodality in sound and music computing, Social interaction in sound and music computing, Sound/music and the neurosciences
Easychair keyphrases
music therapy [24], music therapist [8], digital musical instrument [7], sensor data [7], aural feedback [6], solo voice [6], therapeutic setting [6], visual feedback [6], free play [5], aalborg university copenhagen [4], art therapist [4], mapping strategy [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851069
Zenodo URL: https://zenodo.org/record/851069
Abstract
In many domains we wish to gain further insight into the subjective preferences of an individual. The problem with subjective preferences is that individuals are not necessarily coherent in their responses. Often, a simple linear ranking is either not possible or may not accurately reflect the true preferences or behaviour of the individual. The phenomenon of consonance is heavily subjective, and individuals often report perceiving different levels of consonance, or indeed dissonance. In this paper we present a thorough analysis of previous studies on the perception of consonance and dissonance of dyads. We outline a system that ranks musical intervals in terms of consonance based on pairwise comparison, and we compare results obtained using the proposed system with the results of previous studies. Finally, we propose future work to improve the implementation and design of the system. Our proposed approach is robust enough to handle incoherence in subjects' responses: it prevents the formation of circular rankings while maintaining the ability to express these rankings, an important factor for future work. We achieve this by representing the gathered data on a directed graph. Abstract objects are represented as nodes, and a subject's preference between any two objects is represented as a directed edge between the two corresponding nodes. We can then make use of the transitive nature of human preferences to build a ranking (or partial ranking) of objects with a minimum of pairwise comparisons.
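A minimal sketch of the cycle-preventing preference digraph is shown below; the interval labels and the dominance-count heuristic used to read off a partial ranking are assumptions for illustration.

```python
from collections import defaultdict

class PreferenceDigraph:
    """Collect pairwise 'a is more consonant than b' judgements as directed
    edges, rejecting any edge that would create a cycle, so that a partial
    ranking can always be read off the resulting acyclic graph."""
    def __init__(self):
        self.edges = defaultdict(set)

    def _reachable(self, src, dst):
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def prefer(self, winner, loser):
        if self._reachable(loser, winner):   # this edge would close a cycle
            return False                      # incoherent response: ignored
        self.edges[winner].add(loser)
        return True

    def partial_ranking(self):
        # Nodes ordered by how many others they (transitively) dominate.
        nodes = set(self.edges) | {n for s in self.edges.values() for n in s}
        score = {n: sum(self._reachable(n, m) for m in nodes if m != n)
                 for n in nodes}
        return sorted(nodes, key=score.get, reverse=True)

# Usage with interval names; the third judgement is inconsistent and skipped.
g = PreferenceDigraph()
g.prefer("P5", "M3"); g.prefer("M3", "m2"); print(g.prefer("m2", "P5"))  # False
print(g.partial_ranking())    # ['P5', 'M3', 'm2']
```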
Keywords
Consonance, Digraph, Dissonance, Partial Ranking, Ranking, Subjective Preferences
Paper topics
Computational musicology and Mathematical Music Theory, Models for sound analysis and synthesis, Music information retrieval, Sound/music and the neurosciences
Easychair keyphrases
weighted graph [8], directed graph [6], sample group [6], pairwise comparison [5], piano note [5], ranking method [5], subjective preference [5], ranking algorithm [4], subject response [4], test bed [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851045
Zenodo URL: https://zenodo.org/record/851045
Abstract
We describe the composition and performance process of the multimodal piece MinDSounDS, highlighting the design decisions regarding the application of diverse sensors, namely the Kinect (motion sensor), real-time audio analysis with Music Information Retrieval (MIR) techniques, the WiiMote (accelerometer) and the Epoc (Brain-Computer Interface, BCI). These decisions were taken as part of a collaborative creative process, in which the technical restrictions imposed by each sensor were combined with the artistic intentions of the group members. Our mapping schema takes into account the technical limitations of the sensors and, at the same time, respects the performers' previous repertoire. A deep analysis of the composition process, particularly of its collaborative aspect, highlights advantages and issues that can be used as guidelines for future work under similar conditions.
Keywords
BCI, Kinect, Multimedia, Musical Creation, Music Information Retrieval, Synthesis, WiiMote
Paper topics
Interactive performance systems, Interfaces for sound and music, Sonic interaction design
Easychair keyphrases
composition process [16], virtual environment [7], brain computer interface [6], musical expression [4], slap gesture [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851117
Zenodo URL: https://zenodo.org/record/851117
Abstract
There is considerable interest in music-based games, as the popularity of Rock Band and others can attest, as well as in puzzle games. However, these have rarely been combined. Most music-based games fall into the category of rhythm games, and in those games where music is incorporated into a puzzle-like challenge, music usually serves as either an accompaniment or a reward. We set out to design a puzzle game where musical knowledge and analysis would be essential to making deductions and solving the puzzle. The result is the CrossSong Puzzle, a novel type of music-based logic puzzle that truly integrates musical and logical reasoning. The game presents a player with a grid of tiles, each representing a mashup of measures from two different songs. The goal is to rearrange the tiles so that each row and column plays a continuous musical excerpt. Automatically identifying a set of song fragments to fill a grid such that each tile contains an acceptable mashup is our primary technical hurdle. We propose an algorithm that analyses a corpus of music, searches the space of possible fragments, and selects an arrangement that maximizes the “mashability” of the resulting grid. This algorithm and the interaction design of the system are the main contributions.
Keywords
games, interfaces, mashups, puzzles
Paper topics
Content processing of music audio signals, Interactive performance systems, Interfaces for sound and music, Social interaction in sound and music computing, Sonic interaction design, Sound and music for VR and games
Easychair keyphrases
crosssong puzzle [12], visual hint [7], music based game [4], puzzle game [4], real time [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851097
Zenodo URL: https://zenodo.org/record/851097
Abstract
Ilinx is a multidisciplinary art/science research project focusing on the development of a multisensory art installation involving sound, visuals and haptics. In this paper we describe design choices and technical challenges behind the development of six tactile augmented garments, each one embedded with thirty vibrating actuators. Starting from perceptual experiments, conducted to characterize the actuators used in the garments, we describe hardware and software design, and the development of several haptic effects. The garments have successfully been used by over 300 people during the premiere of the installation in the TodaysArt 2014 festival in The Hague.
Keywords
haptics, multisensory, whole-body suit
Paper topics
Multimodality in sound and music computing, Social interaction in sound and music computing, Sonic interaction design
Easychair keyphrases
duty cycle [18], haptic effect [12], driver board [8], duty cycle difference [6], pwm duty cycle [6], average peak amplitude [4], body segment [4], central processing unit [4], dual lock velcro strip [4], duty cycle value [4], multi sensory art installation [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851025
Zenodo URL: https://zenodo.org/record/851025
Abstract
Visual programming languages are commonly used in the domain of sound and music creation. Specific properties and paradigms of those visual languages make them convenient and appealing to artists in various applications such as computer composition, sound synthesis, multimedia artworks, and the development of interactive systems. This paper presents a systematic study of several well-known languages for sound and music creation. The study is based on the analysis of cognitive dimensions such as abstraction gradient, consistency, closeness of mapping, and error-proneness. We have also considered the context of each analyzed language, including its availability, community, and learning materials. Data for the study were collected from a survey conducted among users of the most notable and widespread visual programming languages. The data are presented both in raw, textual format and in a summarized table view. The results indicate desirable aspects along with possible improvements of visual programming approaches for different use cases. Finally, future research directions and goals in the field of visual programming for applications in music are suggested.
Keywords
cognitive dimensions, music creation, visual programming
Paper topics
Computer environments for sound/music processing, Interfaces for sound and music
Easychair keyphrases
pure data [36], few time [16], visual programming language [14], visual programming [11], few week [10], native instrument reaktor [9], algorithmic composition [8], programming language [8], formal music education [7], interactive system [7], music creation [7], audio effect [6], computer music [6], debugging tool limitation [6], few day [6], few month [6], music composition [6], symbolic sound kyma [6], answered question [5], inspiring aspect [5], musical composition [5], online tutorial [5], temporal dimension [5], user base [5], visual representation [5], automating composition technique [4], cognitive dimension framework [4], existing sound [4], program flow control [4], sound synthesis [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851083
Zenodo URL: https://zenodo.org/record/851083
Abstract
Low-cost MIDI mixer-style controllers may not lend themselves to the performance practise of electroacoustic music. This is due to the limited bit depth at which control values are transmitted, and potentially the size and layout of control elements, providing only coarse control of sound processes running on a computer. As professional controllers with higher resolution and higher-quality controls are more costly and possibly rely on proprietary protocols, the paper investigates the development process of custom DIY controllers based on the Arduino and Teensy 3.1 microcontrollers and Open Source software. In particular, the paper discusses the challenges of building higher-resolution controllers on a restricted budget with regard to component selection, printed circuit board and enclosure design. The solutions, compromises and outcomes are presented and analysed in fader-based and knob-based prototypes.
Keywords
electroacoustic performance practise, high-resolution, human computer interaction, midi-controllers, open source
Paper topics
Interfaces for sound and music
Easychair keyphrases
mixer style controller [6], fader box [5], open source [5], electroacoustic music [4], size comparison [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851087
Zenodo URL: https://zenodo.org/record/851087
Abstract
Music notation is facing new musical forms such as electronic and/or interactive music, live coding, and hybridizations with dance, design and multimedia. It is also facing the migration of musical instruments to gestural and mobile platforms, which raises the question of new score usages on devices that mostly lack the graphic space necessary to display music in a traditional setting and approach. Music scores distributed and shared on the Internet are also starting to support innovative musical practices, which raises other issues, notably regarding dynamic and collaborative music scores. This paper introduces some of the perspectives opened by the migration of music scores to mobile platforms and to the Internet. It also presents the approach adopted with INScore, an environment for the design of augmented, interactive music scores.
Keywords
collaboration, internet, music score
Paper topics
Interactive performance systems
Easychair keyphrases
music score [15], music notation [12], mobile platform [8], collaborative score design [6], interactive music score [6], use case [6], websocket server [6], forwarding mechanism [5], computer music [4], event based interaction mechanism [4], international computer [4], score set gmnf [4], web audio api [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851061
Zenodo URL: https://zenodo.org/record/851061
Abstract
The current paper takes a critical look at the current state of Auditory Display. It identifies naive realism and cognitivist thinking as limiting factors in the development of the field. An extension of Gibson's theory of affordances into the territory of Embodied Cognition is suggested. The proposed extension relies heavily on Conceptual Metaphor Theory and Embodied Schemata. This is hoped to provide a framework in which to address the problematic areas of theory, meaning and lack of cognitive research in Auditory Display. Finally, the current research's development of a set of embodied auditory models, intended to offer greater lucidity and reasonability in Auditory Display systems through the exploitation of embodied affordances, is discussed.
Keywords
Affordances, Auditory, Cognition, Data-driven, Display, Embodied, Furlong, Music, Roddy, Sonification, Stephen
Paper topics
Auditory displays and data sonification, Perception and cognition of sound and music, Sonic interaction design
Easychair keyphrases
auditory display [27], embodied schema [21], embodied affordance [15], meaning making [12], symbol grounding problem [12], embodied cognition [11], auditory domain [10], cognitive capacity [10], cognitive science [10], embodied interaction [10], naive realism [10], big small schema [9], second generation cognitive science [8], cognitively based research [6], conceptual metaphor theory [6], problem area [6], design framework [5], embodied experience [5], human experience [5], auditory perception [4], bass line [4], design pattern [4], ecological interface design [4], embodied auditory model [4], embodied mind [4], embodied music cognition [4], embodied schema theory [4], envelope attack speed [4], pitch level [4], stereo image [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851019
Zenodo URL: https://zenodo.org/record/851019
Abstract
This paper aims to analyze one style of Chinese traditional folk song, Shaanxi XinTianYou. Research on XinTianYou is beneficial to cultural exploration and music information retrieval. We build a MIDI database to explore the general characteristics of the melody. Our insight is that the combination of intervals reflects the characteristics of the music style. To find the most representative combinations of intervals, we propose to use the N-Apriori algorithm, which counts frequent patterns in the melody. Considering both the significance of and the similarity between music pieces, we also provide a multi-layer melody perception clustering algorithm that uses both melodic direction and melodic value. The significant patterns are selected as the general characteristics of XinTianYou. The musical structure of XinTianYou is analyzed based on both the experimental results and music theory. We also ask experts to evaluate our experimental results and confirm that they are consistent with the experts' intuition.
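A toy version of the pattern-counting idea (interval n-grams filtered by a support threshold, in the spirit of Apriori, though not the N-Apriori algorithm itself) might look as follows; the melodies are placeholders for the XinTianYou MIDI database.

```python
from collections import Counter

def interval_ngrams(pitches, n):
    """Melodic intervals (in semitones) of a pitch sequence, grouped into
    overlapping n-grams."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

def frequent_patterns(melodies, n=2, min_support=2):
    """Count interval n-grams across a corpus and keep those occurring in at
    least `min_support` melodies (an Apriori-style support threshold)."""
    support = Counter()
    for melody in melodies:
        support.update(set(interval_ngrams(melody, n)))   # per-melody support
    return {p: c for p, c in support.items() if c >= min_support}

# Usage on toy MIDI pitch sequences.
corpus = [[62, 67, 69, 67, 62], [69, 74, 76, 74, 69], [60, 62, 64, 62, 60]]
print(frequent_patterns(corpus, n=2, min_support=2))
```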
Keywords
Clustering, Folk songs, General characteristics, Pattern mining
Paper topics
Music information retrieval
Easychair keyphrases
general characteristic [11], folk song [10], melody segment [8], frequent pattern [7], average sc result [6], chinese folk song [6], edit distance [6], multi layer melody [6], clustering result [5], significant pattern [5], aware top k pattern [4], candidate k item [4], chinese folk music [4], chinese music [4], frequent item [4], music piece [4], redundancy aware [4], top k cosine similarity [4], wide interval [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851035
Zenodo URL: https://zenodo.org/record/851035
Abstract
In this paper we present a study evaluating the effectiveness of a tactile metronome for music performance and training. Four guitar players were asked to synchronize to a metronome click-track delivered either aurally or via a vibrotactile stimulus. We recorded their performance at different tempi (60 and 120 BPM) and compared the results across modalities. Our results indicate that a tactile metronome can reliably cue participants to follow the target tempo. Such a device could hence be used in musical practice and performances as a reliable alternative to traditional auditory click-tracks, generally considered annoying and distracting by performers.
Keywords
haptics, metronome, music performance, notification, tactile, vibrotactile
Paper topics
Interactive performance systems, Interfaces for sound and music, Multimodality in sound and music computing
Easychair keyphrases
tactile metronome [17], auditory metronome [10], music performance [8], auditory click track [7], metronome signal [7], tactile stimulus [7], target tempo [7], click track [6], computer music [5], target ioi [5], audio modality tactile modality figure [4], guitar player [4], raw data point [4], reaction time [4], tactile click track [4], tactile display [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851023
Zenodo URL: https://zenodo.org/record/851023
Abstract
Modes of limited transposition are musical modes originally conceived by the French composer Olivier Messiaen for a tempered system of 12 pitches per octave. They are defined on the basis of symmetry-related criteria used to split an octave into a number of recurrent interval groups. This paper describes an algorithm to automatically compute the modes of limited transposition in a generic n-tone equal temperament. After providing a pseudo-code description of the process, a Web implementation is proposed.
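Consistent with the definition above, such modes can be enumerated by tiling every composition of each proper divisor of n across the octave and normalising rotations. The sketch below follows that idea; it is not necessarily the paper's pseudo-code.

```python
def modes_of_limited_transposition(n=12):
    """Return, for an n-tone equal temperament, all interval patterns that
    split the octave into identical recurrent groups, i.e. pitch-class sets
    invariant under a transposition smaller than the octave."""
    def compositions(total):
        # All ways of writing `total` as an ordered sum of positive integers.
        if total == 0:
            yield ()
        for first in range(1, total + 1):
            for rest in compositions(total - first):
                yield (first,) + rest

    modes = set()
    for d in (d for d in range(1, n) if n % d == 0):   # proper divisors of n
        for group in compositions(d):
            pattern = group * (n // d)                  # tile the group
            # Store the rotation-normalised form so rotated duplicates of the
            # same mode are counted once.
            rotations = [pattern[i:] + pattern[:i] for i in range(len(pattern))]
            modes.add(min(rotations))
    return sorted(modes, key=lambda m: (len(m), m))

# Usage: in 12-TET the output includes Messiaen's modes, e.g. the whole-tone
# scale (2, 2, 2, 2, 2, 2) and his second mode (1, 2, 1, 2, 1, 2, 1, 2).
for mode in modes_of_limited_transposition(12):
    print(mode)
```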
Keywords
generalization, modes of limited transposition, Olivier Messiaen
Paper topics
Computational musicology and Mathematical Music Theory
Easychair keyphrases
limited transposition [11], ring diagram [10], pitch class [9], equal temperament [8], global interval [7], messiaen mode [7], olivier messiaen [6], tone equal temperament [6], frequency ratio [5], theoretical work [5], data structure [4], generalized mode [4], western music [4]
Paper type
Full paper
Abstract
We present a novel method of composing piano pieces with Grammatical Evolution. A grammar is designed to define a search space for melodies consisting of notes, chords, turns and arpeggios. This space is searched using a fitness function based on the calculation of Zipf's distribution of a number of pitch and duration attributes of the given melodies. In this way, we can create melodies without setting a given key or time signature. We can then create simple accompanying bass parts to repeat under the melody. This bass part is evolved using a grammar created from the evolved treble line with a fitness based on Zipf's distribution of the harmonic relationship between the treble and bass parts. From an analysis of the system we conclude that the designed grammar and the construction of the compositions from the final population of melodies is more influential on the musicality of the resultant compositions than the use of the Zipf's metrics.
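The Zipf-based fitness idea can be illustrated by fitting the rank-frequency distribution of one attribute in log-log space and rewarding slopes close to -1. The attribute choice, fitting method and scoring below are assumptions, not the system's exact metrics.

```python
import numpy as np
from collections import Counter

def zipf_fitness(events):
    """Score how closely the rank-frequency distribution of a sequence of
    musical events (e.g. pitches or durations) follows Zipf's law: fit a
    line in log-log space and penalise distance of the slope from -1."""
    counts = np.array(sorted(Counter(events).values(), reverse=True), float)
    if len(counts) < 2:
        return 0.0
    ranks = np.arange(1, len(counts) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(counts), 1)
    return 1.0 / (1.0 + abs(slope + 1.0))     # 1.0 when the slope is exactly -1

# Usage: a roughly Zipfian pitch sequence scores closer to 1.0 than a
# uniform one.
zipfian = [60] * 12 + [62] * 6 + [64] * 4 + [65] * 3 + [67] * 2
uniform = [60, 62, 64, 65, 67] * 6
print(zipf_fitness(zipfian), zipf_fitness(uniform))
```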
Keywords
Algorithmic Composition, Evolutionary Computation, Grammatical Evolution, Melodic Composition
Paper topics
Computational musicology and Mathematical Music Theory, Sound/music signal processing algorithms
Easychair keyphrases
fitness function [15], zipf distribution [15], grammatical evolution [12], final generation [10], bass part [7], zipf law [7], best fitness [6], final population [6], fit individual [6], pitch duration [6], short melody [6], duration attribute [5], musical composition [5], top individual [5], accompanying bass part [4], bass accompaniment [4], best individual [4], best median ideal [4], computer music [4], event piece event [4], evolutionary run [4], fitness measure [4], genetic algorithm [4], piano piece [4], treble melody [4]
Paper type
Full paper
Abstract
Sonification is the use of sonic materials to represent information. Using spatial sonification to represent spatial data, i.e., data that contains positional information, is a natural choice given the inherently spatial nature of sound. However, perceptual issues such as the Precedence Effect and the Minimum Audible Angle limit our ability to perceive directional stimuli. Furthermore, mapping multivariate datasets to synthesis engine parameters is non-trivial as a result of the vast information space. This paper presents a model for representing spatial datasets via spatial sonification through the use of granular synthesis.
Keywords
Auditory Displays, Data Sonification, Granular Synthesis, Multimodal Data Representation, Psychoacoustics, Spatial Audio
Paper topics
Auditory displays and data sonification, Models for sound analysis and synthesis, Multimodality in sound and music computing, Perception and cognition of sound and music, Spatial audio
Easychair keyphrases
data point [29], flash rate [11], spatial data [10], granular synthesis [9], granular stream [8], lightning occurrence [7], auditory display [6], spatial sonification [6], synthesis engine [6], data slice [5], grain density [5], temporal transformation [5], complex data space [4], flash rate value [4], minimum audible angle [4], multimodal data representation [4], perceptual issue [4], point cloud [4], sound particle [4], sound spatialization [4], spatial dataset [4], spatial sound [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851089
Zenodo URL: https://zenodo.org/record/851089
Abstract
This paper describes a reactive architecture handling the hybrid temporality of guided human-computer music improvisation. It aims at combining reactivity and anticipation in the music generation processes steered by a scenario. The machine improvisation takes advantage of the temporal structure of this scenario to generate short-term anticipations ahead of the performance time, and reacts to external controls by refining or rewriting these anticipations over time. To achieve this in the framework of an interactive software, guided improvisation is modeled as embedding a compositional process into a reactive architecture. This architecture is instantiated in the improvisation system ImproteK and implemented in OpenMusic.
Keywords
Guided improvisation, Music generation, Planning, Reactive architecture, Scenario, Scheduling
Paper topics
Interactive performance systems, Music performance analysis and rendering
Easychair keyphrases
improvisation handler [26], generation model [20], improvisation renderer [13], generation parameter [11], dynamic control [9], guided improvisation [9], performance time [9], improvisation fragment [8], music generation [7], time window [7], execution trace [6], generation process [6], improvisation handler agent [6], improvisation system [6], reactive architecture [6], real time [6], short term [6], sub sequence [6], computer music [5], generation phase [5], temporal structure [5], action container [4], handler action container [4], human computer improvisation [4], long term structure [4], memory generation model [4], performance time tp [4], short term plan extraction [4], th international computer [4], user control [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851143
Zenodo URL: https://zenodo.org/record/851143
Abstract
This paper introduces the concepts and principles behind Harmony of the Spheres, an Android app based on physical spaces and transformations. The app investigates how gestural multitouch and accelerometer control can be used to create and interact with objects in these physical spaces. The properties of these objects can be arbitrarily mapped to sound parameters, either of an internal synthesizer or of external systems, and they can be visualized in flexible ways. On a larger scale, users can create soundscapes by defining sequences of physical space conditions, each of which has an effect on the positions and properties of the physical objects.
Keywords
audiovisual mapping, gestural interaction, mobile apps, musical spaces, physical models
Paper topics
Computational musicology and Mathematical Music Theory, Interactive performance systems, Interfaces for sound and music, Sonic interaction design
Easychair keyphrases
physical condition [14], inherent motion [9], musical object [7], real time [7], directional gravity [5], physical model [5], audio parameter [4], central gravity [4], gravitational center [4], internal synthesizer [4], mathematical music theory [4], n dimensional space [4], physical space [4], physical transformation [4], transformational theory [4], visual dimension [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851163
Zenodo URL: https://zenodo.org/record/851163
Abstract
Music emotion recognition (MER) systems have been shown to perform well for musical genres such as film soundtracks and classical music. It seems difficult, however, to reach a satisfactory level of classification accuracy for popular music. Unlike genre, music emotion involves complex interactions between the listener, the music and the situation. Research on MER systems is hampered by the lack of empirical studies on emotional responses. In this paper, we present a study of music and emotion using two models of emotion. Participants' responses to 80 music stimuli are compared for the categorical and dimensional models. In addition, we collect 207 musical excerpts provided by participants for four basic emotion categories (happy, sad, relaxed, and angry). Given that these examples represent intense emotions, we use them to train classifiers on musical features, using support vector machines with different kernels and with random forests. The most accurate classifier, using random forests, is then applied to the 80 stimuli, and the results are compared with participants' responses. The analysis shows similar emotional responses for both models of emotion. Moreover, if the majority of participants agree on the same emotion category, the emotion of the song is also likely to be recognised by our MER system. This indicates that subjectivity in music experience limits the performance of MER systems, and that only strongly consistent emotional responses can be predicted.
Keywords
emotional responses, music emotion, music emotion recognition, music perception
Paper topics
Music information retrieval, Perception and cognition of sound and music
Easychair keyphrases
dimensional model [31], music information retrieval [19], participant response [18], music emotion recognition [17], recognition system [15], th international society [15], categorical model [13], emotional response [13], emotion recognition system [12], musical excerpt [12], random forest [12], second clip [11], support vector machine [9], emotion category [8], induced emotion [7], musical feature [7], basic emotion [6], emotion model [6], emotion recognition [6], music emotion [6], happy sad [5], machine learning [5], popular music [5], recognition accuracy [5], artist name [4], greatest number [4], music research [4], popular musical excerpt [4], recognition result [4], subjective music recommendation system [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851055
Zenodo URL: https://zenodo.org/record/851055
Abstract
There are two key challenges to the use of digital, wireless communication links for the short-range transmission of multiple, live music streams from independent sources: delay and synchronisation. Delay is a result of the necessary buffering in digital music streams, and digital signal processing. Lack of synchronisation between time-stamped streams is a result of independent analogue-to-digital conversion clocks. Both of these effects are barriers to the wireless, digital recording studio. In this paper we explore the issue of synchronization, presenting a model, some network performance figures, and the results of experiments to explore the perceived effects of losing synchronization between channels. We also explore how this can be resolved in software when the data is streamed over a Wi-Fi link for real-time audio monitoring using consumer-grade equipment. We show how both fixed and varying offsets between channels can be resolved in software, to below the level of perception, using an offset-merge algorithm. As future work, we identify some of the key solutions for automated calibration. The contribution of this paper is the presentation of perception experiments for mixing unsynchronized music channels, the development of a model representing how these streams can be synchronized after-the-fact, and the presentation of current work in progress in terms of realizing the model.
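The abstract does not spell out the offset-merge algorithm, so the following sketch only illustrates a generic way of estimating and removing a fixed inter-channel offset via cross-correlation; the function name and the noise test signal are ours.

```python
# Illustrative only: estimating a fixed inter-channel offset (in samples) by
# cross-correlation and trimming the lagging channel back into alignment.
# This is a generic technique, not necessarily the paper's offset-merge algorithm.
import numpy as np

def estimate_offset(ref, other):
    """Return the lag (in samples) by which `other` trails `ref`."""
    corr = np.correlate(other, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)

rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)                       # reference channel (noise test signal)
other = np.concatenate([np.zeros(120), ref])[:4000]   # same signal delayed by 120 samples

lag = estimate_offset(ref, other)                     # -> 120
aligned = other[lag:] if lag > 0 else other           # naive fixed-offset correction
```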
Keywords
Digital Audio, Interaural Time Difference, Latency, Synchronisation
Paper topics
Computer environments for sound/music processing, Content processing of music audio signals, Interactive performance systems, Sound/music and the neurosciences, Sound/music signal processing algorithms
Easychair keyphrases
real time [13], interaural time difference [11], offset merge [7], buffer size [6], inter channel [6], front end software [4], mixing desk [4], music performance [4], real time monitoring function [4], real time operating system [4], sound localization [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851121
Zenodo URL: https://zenodo.org/record/851121
Abstract
This paper presents the results of a study that explores the effects of including nonlinear dynamical processes in the design of digital musical interfaces. Participants of varying musical backgrounds engaged with a range of systems, and their behaviours, responses and attitudes were recorded and analysed. The study suggests links between the inclusion of such processes and scope for exploration and serendipitous discovery. Relationships between musical instruments and nonlinear dynamics are discussed more broadly, in the context of both acoustic and electronic musical tools. Links between the properties of nonlinear dynamical systems and the priorities of experimental musicians are highlighted and related to the findings of the study.
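The keyphrases mention a damped, forced Duffing oscillator as one example of such a nonlinear dynamical element. Below is a minimal sketch, with assumed parameter values, of integrating one with SciPy and normalising its trajectory into a control signal; it illustrates the general idea only, not the systems used in the study.

```python
# A damped, forced Duffing oscillator integrated with SciPy, whose output could
# be mapped to a synthesis parameter. All parameter values are illustrative
# assumptions, not those used in the study.
import numpy as np
from scipy.integrate import solve_ivp

delta, alpha, beta, gamma, omega = 0.3, -1.0, 1.0, 0.37, 1.2

def duffing(t, y):
    x, v = y
    return [v, -delta * v - alpha * x - beta * x**3 + gamma * np.cos(omega * t)]

t = np.linspace(0, 200, 20000)
sol = solve_ivp(duffing, (t[0], t[-1]), y0=[0.1, 0.0], t_eval=t)
x = sol.y[0]

# Normalise the trajectory to [0, 1] so it could drive, e.g., a filter cutoff.
control = (x - x.min()) / (x.max() - x.min())
```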
Keywords
digital musical instruments, mapping, nonlinear dynamical systems
Paper topics
Interactive performance systems, Interfaces for sound and music, Sonic interaction design
Easychair keyphrases
nonlinear dynamical [22], nonlinear dynamical system [20], nonlinear dynamical process [15], discontinuous mapping [13], continuous mapping [11], experimental music [11], musical practice [10], experimental music group [9], musical tool [8], static system [8], nonlinear dynamic [7], nonlinear dynamical element [6], computer music [5], free improvisation [5], continuum international publishing group [4], damped forced duffing oscillator [4], digital musical interface [4], experimental music group mapping [4], material oriented [4], midi control [4], musical background [4], non experimental group [4], non experimental music group [4], open university [4], overall score [4], sonic event [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851041
Zenodo URL: https://zenodo.org/record/851041
Abstract
In this paper we investigate the suitability of decision tree classifiers to assist massive computational ethnomusicology analysis. In our experiments we employed a dataset of 10,200 traditional Irish tunes. To extract features, we converted the tunes into MIDI files and then extracted high-level features from them. Our experiments with the traditional Irish tunes suggest that decision tree classifiers can indeed be used for this task.
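A minimal sketch of such a classification setup follows, using scikit-learn's DecisionTreeClassifier on a placeholder feature matrix; the features, labels and split are assumptions for illustration, not the paper's data.

```python
# Sketch only: a decision-tree classifier on high-level symbolic features
# extracted from MIDI (e.g. pitch range, note density, time-signature flags).
# The feature matrix X and genre labels y are hypothetical placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10200, 12))                       # 10,200 tunes x 12 features
y = rng.choice(["jig", "reel", "hornpipe", "slip jig"], size=10200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
tree = DecisionTreeClassifier(max_depth=8, random_state=0)
tree.fit(X_tr, y_tr)
print(tree.score(X_te, y_te))                          # held-out accuracy
```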
Keywords
computational ethnomusicology, decision trees, irish music
Paper topics
Computational musicology and Mathematical Music Theory
Easychair keyphrases
high level symbolic feature [24], decision tree classifier [19], decision tree [15], music information retrieval [14], short excerpt [12], abc notation [11], computational ethnomusicology [11], midi file [9], tune according [9], irish traditional [8], irish traditional music [7], irish music genre [6], machine learning [5], slip jig [5], traditional irish [5], abc format [4], association rule mining [4], data mining [4], folk music [4], irish traditional tune [4], midi format [4], naive listener [4], rule mining algorithm [4], time signature [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851095
Zenodo URL: https://zenodo.org/record/851095
Abstract
We present a new interactive sound installation to be explored by movement. Specifically, movement qualities extracted from motion tracking data excite a dynamical system (a synthetic flock of agents), which responds to the movement qualities and indirectly controls the visual and sonic feedback of the interface. In other words, the relationship between gesture and sound is mediated by synthetic swarms of light rays. The sonic interaction design of the system uses density as a design dimension and maps the swarm parameters to sound synthesis parameters. Three swarm behaviors and three sound models are implemented, and evaluation suggests that the general approach is promising and that the system has potential to engage the user.
Keywords
gesture sound mapping, Interactive sound installation, sonic interaction design
Paper topics
Interactive performance systems, Multimodality in sound and music computing, Sonic interaction design
Easychair keyphrases
light ray [11], sound synthesis [10], interactive sound installation [6], movement quality [6], sonic interaction design [6], computing system [4], high pitch sound texture [4], human factor [4], physical model [4], visual appearance [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851151
Zenodo URL: https://zenodo.org/record/851151
Abstract
A method for generating music in response to brain signals is proposed. The brain signals are recorded using consumer-level brain-computer interface equipment. Each time-step in the signal is passed through a directed acyclic graph whose nodes execute simple numerical manipulations. Certain nodes also output MIDI commands, leading to patterned MIDI output. Some interesting music is obtained, and desirable system properties are demonstrated: the music is responsive to changes in input, and a single input signal passed through different graphs leads to similarly-structured outputs.
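The following toy example mirrors the idea of pushing each time-step of a signal through a directed acyclic graph of simple numeric nodes with a MIDI-emitting output node; the node operations, the graph and the stand-in "EEG" signal are invented for illustration.

```python
# Toy version of the idea: each time-step of a synthetic signal is pushed
# through a small chain of simple numeric nodes, and an output stage emits
# MIDI note numbers. The operations and topology are invented.
import math

graph = [
    ("scale",  lambda x: 0.5 * x),
    ("sin",    lambda x: math.sin(x)),
    ("offset", lambda x: x + 1.0),
]

def to_midi_note(value, low=48, high=84):
    """Map a value in [0, 2] to an integer MIDI note number."""
    return int(round(low + (high - low) * min(max(value / 2.0, 0.0), 1.0)))

signal = [math.sin(0.1 * t) * 40 + 50 for t in range(64)]   # stand-in EEG meter values

notes = []
for sample in signal:
    x = sample
    for _name, op in graph:
        x = op(x)            # pass the value through each node in topological order
    notes.append(to_midi_note(x))

print(notes[:16])
```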
Keywords
adaptive composition, BCI, EEG, generative music, music
Paper topics
Auditory displays and data sonification, Multimodality in sound and music computing, Sound/music and the neurosciences, Sound/music signal processing algorithms
Easychair keyphrases
brain computer interface [9], output node [9], temporal structure [7], brain signal [6], bci data [5], brain computer [5], eeg signal [5], esense meter [5], midi note [5], time series [5], bci signal [4], executable graph [4], human computer interaction [4], inbound edge [4], multiple output [4], neural network [4], non static input signal [4], pmod unary pdiv sin [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851155
Zenodo URL: https://zenodo.org/record/851155
Abstract
This paper introduces Mephisto, a transpiler for converting sound patches designed in the graphical computer music environment Pure Data to the functional DSP programming language Faust. Faust itself compiles into highly optimized C++ code. The aim of the proposed transpiler is to enable sound designers, musicians and sound engineers who use Pure Data in their workflows to create highly optimized C++ code embeddable in games or other interactive media, and to reduce the prototype-to-product delay. Mephisto's internal structure, its conventions and limitations, and its performance are presented and discussed.
Keywords
audio in games, faust, high performance sound processing, procedural sound design, pure data, transpiler
Paper topics
Computer environments for sound/music processing, High Performance Computing for Audio, Sound and music for VR and games
Easychair keyphrases
pure data [27], faust code [12], parse tree [9], dac object [8], highly optimized c code [8], object figure [8], programming language [7], optimized c code [6], average cpu utilization [4], block diagram [4], control mechanism [4], data structure [4], mephisto generated faust code [4], pd object tree [4], pure data patch [4], sound synthesis [4], standard ccitt dialing tone [4], transpiler generated faust code [4], tree traversal [4], tree walker [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851147
Zenodo URL: https://zenodo.org/record/851147
Abstract
In this work we propose a modification of a standard text-to-speech alignment scheme for the alignment of lyrics and singing voice. To this end we model phoneme durations specific to the case of singing. We rely on a duration-explicit hidden Markov model (DHMM) phonetic recognizer based on mel frequency cepstral coefficients (MFCCs), which are extracted in a way that is robust to background instrumental sounds. The proposed approach is tested on polyphonic audio from the classical Turkish music tradition in two settings: with and without modeling phoneme durations. Phoneme durations are inferred from sheet music. In order to assess the impact of the polyphonic setting, alignment is also evaluated on an a cappella dataset compiled especially for this study. We show that the explicit modeling of phoneme durations improves alignment accuracy by an absolute 10 percent at the level of lyrics lines (phrases) and performs on par with state-of-the-art aligners for other languages.
Keywords
lyrics-to-audio alignment, phoneme durations, polyphonic audio, score-following, score-informed alignment, singing voice tracking, speech-to-text alignment, Turkish classical music
Paper topics
Computational musicology and Mathematical Music Theory, Content processing of music audio signals, Models for sound analysis and synthesis, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
musical score [9], alignment accuracy [7], singing voice [7], explicit hidden markov model [6], phoneme duration [6], audio alignment [5], duration explicit [5], automatic lyric [4], background instrument [4], classical turkish music [4], hidden markov model [4], hidden semi markov model [4], hmm singer adaptation [4], markov model [4], polyphonic audio [4], vocal activity detection [4]
Paper type
Full paper
Abstract
What are the effects of a musician's movement on the affective impact of experiencing a music performance? How can perceptual, sub-personal and cognitive aspects of music be investigated through experimental processes? This article describes the development of a mixed methods approach that tries to tackle such questions by blending quantitative and qualitative methods with observations and interpretations. Basing the core questions on terms and concepts obtained through a wide survey of the literature on musical gesture and movement analysis, we show the iterative, cyclical advance and extension of a series of experiments and draw preliminary conclusions from data and information collected in a pilot study. With the choice of particular canonical pieces from contemporary music, a multi-perspective field of questioning is opened up that provides ample materials and challenges for a process of converging, intertwining and cross-discipline methods development. The resulting interpretation points to a significant affective impact of movement in music, yet these insights remain subjective and demand that further and deeper investigations be carried out.
Keywords
affective impact, blended interpretation, mixed methods, movement perception, music performance
Paper topics
Multimodality in sound and music computing, Music performance analysis and rendering, Perception and cognition of sound and music
Easychair keyphrases
mixed method research [15], musical gesture [15], music performance [12], audio rating [10], mixed method [10], perceived effort [6], video rating [6], affective impact [5], video condition [5], continuous self report method [4], median time series [4], movement analysis [4], musical performance [4], music perception [4], quantitative track [4], research project [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851107
Zenodo URL: https://zenodo.org/record/851107
Abstract
An experiment was carried out to assess the use of non-verbal sensory scales for evaluating perceived music qualities, by comparing them with analogous verbal scales. Participants were divided into two groups; one group (SV) completed a set of non-verbal scale responses and then a set of verbal scale responses to short musical extracts. A second group (VS) completed the experiment in the reverse order. Our hypothesis was that the ratings of the SV group can provide information unmediated (or less mediated) by verbal association in a much stronger way than those of the VS group. Factor analysis performed separately on the SV group, the VS group and all participants shows a recurring patterning of the majority of sensory scales versus the verbal scales into different factors. Such results suggest that the sensory scale items are indicative of a different semantic structure than the verbal scales in describing music, and so they index different, perhaps ineffable, qualities, making them potentially special contributors to understanding musical experience.
Keywords
music expressiveness, music qualities, non verbal sensory scales, semantic differential
Paper topics
Multimodality in sound and music computing, Perception and cognition of sound and music
Easychair keyphrases
sensory scale [41], verbal scale [38], non verbal sensory scale [28], musical excerpt [15], bizet mozart chopin [12], brahm vivaldi bizet [12], har sof smo rou [12], mal tak blu ora [12], mozart chopin bach [12], sof smo rou bit [12], vivaldi bizet mozart [12], blu ora har [9], hea lig col [9], lig col war [9], non verbal [9], cold warm [8], heavy light [8], bitter sweet [7], factor analysis [7], scale blue orange [7], soft smooth [7], age range [6], blue orange [6], chopin bach factor [6], equivalent verbal scale [6], factor score [6], mean age [6], rel mal tak blu [6], soft smooth sweet light warm [6], very familiar very unfamiliar [6]
Paper type
Full paper
DOI: 10.5281/zenodo.851099
Zenodo URL: https://zenodo.org/record/851099
Abstract
This paper investigates the use of state space models and real-time sonification as a tool for electroacoustic composition. State space models provide mathematical representations of physical systems, making it possible to capture the behaviour of a real-life system in a matrix-vector equation. This representation provides a vector containing the so-called states of the system, describing how the system evolves over time. This paper shows different sonifications for state space models and ways of using them in multichannel electroacoustic composition. Even though conventional sound synthesis techniques are used for sonification, very peculiar timbres and effects can be generated when sonifying state space models. The paper presents an inverted pendulum, a mass-spring-damper system and a harmonic oscillator implemented in SuperCollider, together with different real-time multichannel sonification approaches and ways of using them in electroacoustic composition.
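As a sketch of the state-space formulation mentioned above, the code below steps a mass-spring-damper system of the form x' = Ax + Bu with forward Euler and maps the position state to a frequency. The parameter values and the mapping are assumptions, and the real-time SuperCollider side is omitted.

```python
# Mass-spring-damper in state-space form, stepped with forward Euler, with the
# position state mapped to a pitch in Hz. Parameter values are illustrative.
import numpy as np

m, c, k = 1.0, 0.4, 25.0                      # mass, damping, stiffness
A = np.array([[0.0, 1.0], [-k / m, -c / m]])  # state matrix for x' = A x + B u
B = np.array([0.0, 1.0 / m])

dt = 0.001
x = np.array([1.0, 0.0])                      # initial displacement and velocity
positions = []
for step in range(5000):
    u = 0.0                                   # no external force in this sketch
    x = x + dt * (A @ x + B * u)              # Euler integration of the state
    positions.append(x[0])

positions = np.array(positions)
freqs = 220.0 + 220.0 * (positions - positions.min()) / np.ptp(positions)
```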
Keywords
Interactive System, Inverted Pendulum, Multichannel, Sonification, Spring Mass Damper, State Space Models
Paper topics
Auditory displays and data sonification, Interactive performance systems, Interfaces for sound and music, Models for sound analysis and synthesis, Spatial audio
Easychair keyphrases
real time [21], state space [21], mass spring damper system [18], state space model [17], inverted pendulum [10], sound synthesis [9], harmonic oscillator [7], system behaviour [6], time paradox [6], electroacoustic composition [5], sound transformation [5], mathematical model [4], model input value [4], multichannel sonification [4], sampling period ts [4], sonified state space model [4], state space form [4], state vector [4], stereo sonification [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851135
Zenodo URL: https://zenodo.org/record/851135
Abstract
Spatialization, pitch assignment, and timbral variation are three methods that can improve the perception of complex data in both an artistic and an analytical context. This multi-modal approach to sonification has been applied to fish movement data with the dual goals of providing an aural representation for an artistic sound installation as well as a qualitative data analysis tool useful to scientists studying fish movement. Using field data collected from three wild Chinook Salmon (Oncorhynchus tshawytscha) living in the Snake River Watershed, this paper demonstrates how sonification offers new perspectives for interpreting migration patterns and the impact of environmental factors on the life-cycle of this species. Within this model, audio synthesis parameters guiding spatialization, microtonal pitch organization, and temporal structure are assigned to streams of data through software developed by Ben Luca Robertson. Guidance for the project has been provided by Dr. Jonathan Middleton of Eastern Washington University, while collection and interpretation of field data was performed by University of Idaho – Water Resources Program Ph.D. candidate, Jens Hegg.
Keywords
auditory display, microtones, salmon, sonification, spatialization
Paper topics
Auditory displays and data sonification
Easychair keyphrases
auditory display [12], strontium isotope [11], chemical signature [10], pacific ocean [10], water resource program [9], strontium isotope signature [7], marine environment [6], migration pattern [6], pitch assignment [6], strontium isotopic ratio [6], maturation period [5], idaho water resource [4], maternal signature [4], mean value [4], otolith sample [4], snake river [4], timbral variation [4], water system [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851027
Zenodo URL: https://zenodo.org/record/851027
Abstract
This paper reports the concept, design, and prototyping of MUSE, a real-time, turn-based, collaborative music-making game for users with little to no formal music education. MUSE is a proof-of-concept web application running exclusively in the Chrome web browser for four players using gamepad controllers. First, we outline the proposed methodology with respect to related research and discuss our approach to designing MUSE through a partial gamification of music using a player-centric design framework. Second, we explain the implementation and prototyping of MUSE. Third, we highlight recent observations of participants using our proof-of-concept application during a short art/installation gallery exhibition. In conclusion, we reflect on our design methodology based on the informal user feedback we received and look at several approaches to improving MUSE.
Keywords
collaborative music, interactive music, music gamification, music sandbox
Paper topics
Interactive performance systems, Interfaces for sound and music, Perception and cognition of sound and music, Social interaction in sound and music computing, Sonic interaction design
Easychair keyphrases
real time [23], long term engagement [14], musical toy [11], serious musical instrument [11], musical output [9], game system [8], musical result [8], emotional response [7], end turn [6], game mechanic [6], game rule [6], motivational affordance [6], overall pleasant musical output [6], real time collaborative music [6], collaborative music [5], creative freedom [5], designing muse [5], music making [5], music output [5], passive player [5], provide user [5], chrome web browser [4], game component [4], game design element [4], low level musical control [4], musical concept [4], musical instrument [4], player block [4], real time pleasant musical output [4], web audio api [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851159
Zenodo URL: https://zenodo.org/record/851159
Abstract
Chronic pain is pain that persists past the expected time of healing. Unlike acute pain, chronic pain is often no longer a sign of damage and may never disappear. Remaining physically active is very important for people with chronic pain, but in the presence of such persistent pain it can be hard to maintain a good level of physical activity due to factors such as fear of pain or re-injury. This paper introduces a sonification methodology which makes use of characteristics and structural elements of Western tonal music to highlight and mark aspects of movement and breathing that are important for building confidence in people's body capability, in a way that is easy to attend to and devoid of pain. The design framework and initial conceptual design, which use musical elements such as melody, harmony, texture and rhythm to improve the efficiency of sonification in supporting physical activity for people with chronic pain, are presented and discussed here. In particular, we discuss how such structured sonification can be used to facilitate movement and breathing during physical rehabilitation exercises that tend to cause anxiety in people with chronic pain. Experiments are currently being undertaken to investigate the use of these musical elements in sonification for chronic pain.
Keywords
Chronic pain, Implicit music understanding, Musically-informed, Physical rehabilitation, Sonification
Paper topics
Auditory displays and data sonification
Easychair keyphrases
chronic pain [15], physical activity [14], musically informed sonification [12], musical element [7], physical rehabilitation [7], exercise space [6], maximum target point [6], self efficacy [6], western tonal music [6], minimum amount [5], musical stability [4], musical training [4], provide information [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851111
Zenodo URL: https://zenodo.org/record/851111
Abstract
We propose a novel method for generating choreographies driven by music content analysis. Although a considerable amount of research has been conducted in this field, a way to leverage various music features or music content in automated choreography has not been proposed. Previous methods suffer from a limitation in which they often generate motions giving the impression of randomness and lacking context. In this research, we first discuss what types of music content information can be used in automated choreography and then argue that creating choreography that reflects this music content requires novel beat-wise motion connectivity constraints. Finally, we propose a probabilistic framework for generating choreography that satisfies both music content and motion connectivity constraints. The evaluation indicates that the choreographies generated by our proposed method were chosen as having more realistic dance motion than those generated without the constraints.
Keywords
automated choreography, computer graphics, data driven, music analysis, probabilistic modeling
Paper topics
Multimodality in sound and music computing, Music and robotics
Easychair keyphrases
motion connectivity constraint [28], dance motion [22], motion fragment [22], musical constraint [21], music content [13], automated choreography [12], motion connectivity [10], cross entropy [8], chord label [7], generate choreography [7], music constraint [7], generating choreography [6], music content analysis [6], probabilistic model [6], various music feature [6], acoustic feature [5], beat location [4], hierarchical structure [4], kernel function [4], measure boundary [4], motion connectivity constrained choreography [4], motion database [4], musical feature [4], probabilistic framework [4], structural segmentation [4], structure label [4], subjective evaluation [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851119
Zenodo URL: https://zenodo.org/record/851119
Abstract
In this paper, we propose MusicMean, a system that fuses existing songs to create an in-between song, such as an average song, by calculating the average acoustic frequency of musical notes and the occurrence frequency of drum elements from multiple MIDI songs. We generate an in-between song for generative music by defining rules based on simple music theory. The system realizes the interactive generation of in-between songs, representing a new form of interaction between humans and digital content. Using MusicMean, users can create personalized songs by fusing their favorite songs.
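The core averaging operation can be sketched as follows: two MIDI notes are converted to frequencies, blended with a user-specified rate, and converted back to the nearest MIDI note. The function names and the 50% default are ours, not the system's API.

```python
# Sketch of the basic note-averaging idea: blend two MIDI pitches in the
# frequency domain and round back to the nearest MIDI note.
import math

def midi_to_hz(m):
    return 440.0 * 2.0 ** ((m - 69) / 12.0)

def hz_to_midi(f):
    return 69 + 12.0 * math.log2(f / 440.0)

def blend_notes(m_a, m_b, rate=0.5):
    """rate = 0 returns note A, rate = 1 returns note B, 0.5 is the average."""
    f = (1.0 - rate) * midi_to_hz(m_a) + rate * midi_to_hz(m_b)
    return int(round(hz_to_midi(f)))

print(blend_notes(60, 67))   # C4 and G4 blended at 50 %
```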
Keywords
Average song, Interactive music generation, Song morphing
Paper topics
Computer environments for sound/music processing, Interfaces for sound and music
Easychair keyphrases
blend rate [15], averaging operation [13], musical note [11], average song [10], drum pattern [10], existing song [7], drum pattern histogram [6], musical bar [6], music generation [6], music theory [6], midi file [5], musical key [5], average note [4], mashup music video [4], musical note averaging operation [4], music video generation [4], statistical model [4], user specified blend rate [4], video generation system [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851071
Zenodo URL: https://zenodo.org/record/851071
Abstract
This paper presents a technique to synthesize music based on the impression and emotion of input narratives. The technique prepares a dictionary which records the sensibility polarity values of arbitrary words. It also supposes that users listen to sample chords and rhythms and input fitness values for pre-defined impression word pairs, so that the technique can learn the relations between chords/rhythms and these impressions. After these processes, the technique interactively synthesizes music for input narratives. It estimates the fitness values of the narrative for the impression word pairs by applying the dictionary, and then estimates the chord and rhythm progressions whose impressions and emotions are closest to the input narrative. Finally, the technique synthesizes the output tune by combining the chord and rhythm. We expect this technique to help users express the impressions and emotions of input narratives through generated music.
Keywords
Document analysis, Learning of user's sensibility, Music synthesis
Paper topics
Multimodality in sound and music computing, Music performance analysis and rendering, Perception and cognition of sound and music
Easychair keyphrases
impression word [33], musical feature [25], fitness value [24], rhythm progression [10], sample chord [8], brassy simple [7], light heavy [7], preliminary data construction [7], th impression word [7], bright dark [6], energetic calm musical feature [6], musical feature value [6], document analysis [5], enjoyable wistful [5], music synthesis [5], chord progression [4], dark enjoyable wistful tripping [4], energetic calm [4], fitness value vector [4], minor seventh chord impression [4], point likert scale [4], semantic orientation calculation technique [4], seventh chord impression word [4], user prepared musical pattern [4], user sensibility [4], wistful tripping quiet energetic [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851101
Zenodo URL: https://zenodo.org/record/851101
Abstract
The mixing of audio signals has been at the foundation of audio production since the advent of electrical recording in the 1920s, yet the mathematical and psychological bases for this activity are relatively under-studied. This paper investigates how the process of mixing music is conducted. We introduce a method of transformation from a “gain-space” to a “mix-space”, using a novel representation of the individual track gains. An experiment is conducted in order to obtain time-series data of mix engineers' exploration of this space as they balance levels within a multi-track session to create their desired mixture. It is observed that, while the exploration of the space is influenced by the initial configuration of track gains, there is agreement between individuals on the appropriate gain settings required to create a balanced mixture. Implications for the design of intelligent music production systems are discussed.
Keywords
Intelligent mixing systems, Mix-engineering, Music production, Subjective audio evaluation
Paper topics
Computational musicology and Mathematical Music Theory, Interfaces for sound and music, Perception and cognition of sound and music
Easychair keyphrases
mix space [28], mix engineer [9], audio engineering society convention [8], final mix [8], backing track [6], fader control [6], source position [6], relative loudness [5], track gain [5], audio engineering society [4], audio signal [4], dynamic range compression [4], final mix position [4], gain space [4], intelligent mixing system [4], intelligent music production system [4], level balancing [4], mix velocity [4], multitrack session [4], probability density function [4], rhythm section [4], source directivity [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851115
Zenodo URL: https://zenodo.org/record/851115
Abstract
Decomposition of the music signal into the signals of the individual instruments is a fundamental task for music signal processing. This paper proposes a decomposition algorithm for the music signal based on non-negative sparse estimation. We estimate the coefficients of a linear combination by assuming that the feature vector of the given music signal can be approximated as a linear combination of the elements in a pre-trained dictionary. Since the music sound is considered a mixture of tones from several instruments and only a few tones appear at the same time, the coefficients must be non-negative and sparse if the music signals are represented by non-negative vectors. In this paper we use a feature vector based on autocorrelation functions. The experimental results show that the proposed decomposition method can accurately estimate the tone sequence from music played by two instruments.
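A hedged sketch of non-negative sparse coding against a pre-trained dictionary is given below; the dictionary of (normalised autocorrelation) atoms is random placeholder data, and Lasso with a positivity constraint is used as one possible solver rather than the paper's own optimisation procedure.

```python
# Non-negative sparse coding against a dictionary of atoms. The dictionary here
# is random placeholder data; Lasso with positive=True is one way to obtain
# non-negative, sparse coefficients.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
D = np.abs(rng.normal(size=(128, 40)))      # 40 dictionary atoms, 128-dim features
D /= np.linalg.norm(D, axis=0)              # normalise each atom

truth = np.zeros(40)
truth[[3, 17]] = [0.8, 0.5]                 # a "mixture" of two tones
x = D @ truth                               # observed feature vector

coder = Lasso(alpha=0.001, positive=True, fit_intercept=False, max_iter=10000)
coder.fit(D, x)
print(np.nonzero(coder.coef_ > 1e-3)[0])    # indices of the active atoms
```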
Keywords
Auto-correlation functions, Decomposition of music signal, Dictionary learning, Non-negative sparse coding
Paper topics
Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
auto correlation function [30], auto correlation [23], musical signal [23], non negative sparse coding [18], alto saxophone [16], music sound [14], sampling rate [14], individual instrument [12], non negative normalized auto [12], linear combination [11], negative sparse [11], cross correlation [10], dictionary matrix [10], non negative matrix factorization [8], normalized auto correlation [8], normalized auto correlation function [8], feature vector [7], midi sound source [7], negative normalized auto correlation [7], pre trained dictionary [7], alto saxophone part [6], decomposition algorithm [6], negative normalized auto correlation vector [6], non negative matrix [6], nonnegative matrix factorization [6], non negative sparse coefficient [6], sound signal [6], contrabass part [5], estimated coefficient [5], non negative coefficient [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851039
Zenodo URL: https://zenodo.org/record/851039
Abstract
The separation of percussive sounds from harmonic sounds in audio recordings remains a challenging task, even though it has received much attention over the last decade. In a previous work, we described a method to separate harmonic and percussive sounds based on a constrained Non-negative Matrix Factorization (NMF) approach. The approach distinguishes between percussive and harmonic bases by integrating percussive and harmonic sound features, such as smoothness and sparseness, into the decomposition process. In this paper, we propose an online version of our previous work. Instead of decomposing the whole mixture, the online proposal decomposes a set of segments of the mixture selected by a sliding temporal window. Both the percussive and harmonic bases of the next segment are initialized using the bases obtained in the decomposition of the previous segment. Results show that the online proposal can provide satisfactory separation performance, but the sound quality of the separated signals depends inversely on the latency of the system.
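The warm-start idea across sliding-window segments can be sketched with plain (unconstrained) NMF from scikit-learn, as below; the smoothness and sparseness constraints of the actual method are not reproduced, and the spectrogram is a placeholder.

```python
# Sketch of the warm-start idea only: plain NMF applied to successive
# spectrogram segments, with each segment's bases initialised from the
# previous segment. The paper's constraints are not reproduced here.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
spectrogram = np.abs(rng.normal(size=(513, 400)))   # placeholder magnitude spectrogram
n_components, hop = 16, 100

W = np.abs(rng.normal(size=(513, n_components)))
for start in range(0, spectrogram.shape[1] - hop + 1, hop):
    segment = spectrogram[:, start:start + hop]
    H0 = np.abs(rng.normal(size=(n_components, segment.shape[1])))
    model = NMF(n_components=n_components, init="custom", max_iter=200)
    W = model.fit_transform(segment, W=W, H=H0)     # bases warm-started from last segment
    H = model.components_                           # activations for this segment
```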
Keywords
Constraints, Harmonic/Percussive separation, Latency, Non-negative Matrix Factorization, Online, Signal to Distortion Ratio (SDR), Signal to Interference Ratio (SIR), Smoothness, Sound source separation, Sparseness
Paper topics
Content processing of music audio signals, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
harmonic sound [26], online proposal [25], percussive sound [13], non negative matrix factorization [12], harmonic base [11], computation time [10], offline method [9], method online [8], percussive separation [7], separation performance [7], whole mixture [7], harmonic sound separation method [6], separated percussive signal [6], cost function [5], minimum local [5], next segment [5], proposal online [5], sir result [5], harmonic signal [4], language processing [4], magnitude spectrogram [4], matrix factorization [4], offline harmonic [4], percussive separation work [4], sliding window [4], source separation [4], whole mixture signal [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851017
Zenodo URL: https://zenodo.org/record/851017
Abstract
It is possible to position equal-tempered discrete notes on a flat hexagonal grid in such a way as to allow musical constructs (chords, intervals, melodies, etc.) to take on the same shape regardless of the tonic. This is known as a musical isomorphism, and it has been shown to have advantages in composition, performance, and learning. Considering the utility and interest of such layouts, an extension into 3D interactions was sought, focusing on cylindrical hexagonal lattices, which have been extensively studied in the context of carbon nanotubes. In this paper, we explore the notation of this class of cylindrical hexagonal lattices and develop a process for mapping a flat hexagonal isomorphism onto such a lattice. This mapping references and draws upon previous explorations of the helical and cyclical nature of western musical harmony.
Keywords
harmonic theory, hexagonal lattice, isomorphic layout, musical controller design, tonnetz, wicki-hayden
Paper topics
Computational musicology and Mathematical Music Theory, Interactive performance systems, Interfaces for sound and music
Easychair keyphrases
isomorphic layout [21], chiral vector [13], cylindrical hexagonal lattice [12], isotone axis [10], chiral angle [9], hexagonal lattice [9], carbon nanotube [7], cylindrical hexagonal [7], cylindrical hexagonal tube [7], hexagonal grid [6], pitch axis [5], boundary shape [4], chiral vector direction [4], dashed green line [4], harmonic table [4], musical isomorphism [4], tone equal [4], typical isomorphic layout [4], whole number [4], wicki hayden layout [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851085
Zenodo URL: https://zenodo.org/record/851085
Abstract
Automatically following rhythms by beat tracking is by no means a solved problem, especially when dealing with varying tempo and expressive timing. This paper presents a connectionist machine learning approach to expressive rhythm prediction, based on cognitive and neurological models. We detail a multi-layered recurrent neural network combining two complementary network models as hidden layers within one system. The first layer is a Gradient Frequency Neural Network (GFNN), a network of nonlinear oscillators which acts as an entraining and learning resonant filter to an audio signal. The GFNN resonances are used as inputs to a second layer, a Long Short-term Memory Recurrent Neural Network (LSTM). The LSTM learns the long-term temporal structures present in the GFNN's output, the metrical structure implicit within it. From these inferences, the LSTM predicts when the next rhythmic event is likely to occur. We train the system on a dataset selected for its expressive timing qualities and evaluate the system on its ability to predict rhythmic events. We show that our GFNN-LSTM model performs as well as state-of-the-art beat trackers and has the potential to be used in real-time interactive systems, following and generating expressive rhythmic structures.
Keywords
Audio Signal Processing, Expressive Timing, Gradient Frequency Neural Networks, Machine Learning, Metre Perception, Music Information Retreival, Recurrent Neural Networks, Rhythm Prediction
Paper topics
Computer environments for sound/music processing, Interactive performance systems, Music information retrieval, Music performance analysis and rendering, Perception and cognition of sound and music, Sound/music signal processing algorithms
Easychair keyphrases
neural network [19], beat tracking [15], gradient frequency neural network [12], rhythmic event [11], recurrent neural network [9], expressive timing [8], hebbian learning [8], audio data [7], city university london [6], connectivity matrix [6], mean field [6], music information retrieval [6], rhythm prediction [6], short term memory [6], audio signal [5], metrical structure [5], rhythmic structure [5], beat induction [4], hierarchical metrical structure [4], integer ratio [4], long term structure [4], mean field network [4], mid level representation [4], neural network model [4], online initonline initonline lstm [4], online online initonline initonline [4], onset detection function [4], real time [4], standard deviation [4], th international society [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851063
Zenodo URL: https://zenodo.org/record/851063
Abstract
Visualization is an extremely useful tool for understanding the similarity of impressions among a large number of tunes, or the relationships of individual characteristics among artists, effectively and in a short time. We expect that chord progressions are beneficial, in addition to acoustic features, for understanding the relationships among tunes; however, there have been few studies on the visualization of music collections with chord progression data. In this paper, we present a technique for the integrated visualization of chord progressions, meta information and acoustic features in large collections of tunes. The technique first calculates the acoustic feature values of the given set of tunes. At the same time, it collates typical chord progression patterns from the chord progressions of the tunes, given as sequences of characters, and records which patterns are used in which tunes. Our implementation visualizes the above information using dual scatterplots, where one scatterplot arranges tunes based on their acoustic features and the other shows co-occurrences among chord progressions and meta information. In this paper, we describe an experiment with tunes by 20 Japanese pop musicians using our visualization technique.
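The pattern-collation step can be sketched as simple symbol-sequence matching: typical chord-progression patterns are counted per tune, yielding the co-occurrence information used by the second scatterplot. The patterns and tunes below are invented examples.

```python
# Counting typical chord-progression patterns in chord sequences treated as
# symbol strings. Patterns and tunes are invented for illustration.
from collections import Counter

patterns = [("F", "G", "Em", "Am"), ("C", "G", "Am", "F")]
tunes = {
    "tune_01": ["C", "G", "Am", "F", "C", "G", "Am", "F", "F", "G", "Em", "Am"],
    "tune_02": ["Am", "F", "C", "G", "Am", "F", "C", "G"],
}

def count_pattern(chords, pattern):
    n = len(pattern)
    return sum(tuple(chords[i:i + n]) == pattern for i in range(len(chords) - n + 1))

usage = {name: Counter({p: count_pattern(seq, p) for p in patterns})
         for name, seq in tunes.items()}
print(usage["tune_01"])    # per-tune pattern counts, usable as co-occurrence data
```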
Keywords
acoustic feature, chord progression, information visualization, music recommendation
Paper topics
Interfaces for sound and music, Music information retrieval
Easychair keyphrases
chord progression pattern [44], meta information [39], chord progression [35], acoustic feature [32], progression pattern [17], meta information value [11], presented visualization technique [9], visualization technique [9], typical chord progression pattern [8], acoustic feature value [7], drag operation [7], music information retrieval [7], selected meta information [7], artist name [6], pop music [6], visualization result [6], selected dot [5], correlated meta information [4], japanese pop [4], music visualization [4], progression pattern matching [4], similar chord progression pattern [4], th meta information [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851029
Zenodo URL: https://zenodo.org/record/851029
Abstract
In this work we decompose analog musical resonant waveforms into their instantaneous frequency and amplitude envelope, and then smooth this information before resynthesis. The psychoacoustic impacts are evaluated from the point of view of dynamic brightness, tristimulus and spectrum irregularity. Signals with different amounts of resonance were analysed, and different types and lengths were tested for the smoothers. Experiments were carried out with amplitude smoothing only, frequency smoothing only, and simultaneous smoothing of the amplitude and frequency signals. We draw conclusions relating the parameters explored to the results, which match the sounds produced with the technique.
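A minimal AM-FM decomposition along these lines can be sketched with SciPy's Hilbert transform: the analytic signal yields an amplitude envelope and an instantaneous frequency, both smoothed with a moving average before resynthesis. The test signal and smoother length are arbitrary choices, not those of the experiments.

```python
# Analytic-signal AM-FM decomposition, smoothing, and naive resynthesis.
import numpy as np
from scipy.signal import hilbert

fs = 44100
t = np.arange(fs) / fs
x = (0.5 + 0.4 * np.sin(2 * np.pi * 3 * t)) * np.sin(
    2 * np.pi * (220 + 30 * np.sin(2 * np.pi * 2 * t)) * t)   # toy test signal

analytic = hilbert(x)
envelope = np.abs(analytic)
inst_phase = np.unwrap(np.angle(analytic))
inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)

def smooth(sig, n=512):
    return np.convolve(sig, np.ones(n) / n, mode="same")

env_s, freq_s = smooth(envelope), smooth(inst_freq)

# Resynthesis: integrate the smoothed frequency to a phase and reapply the envelope.
phase = 2 * np.pi * np.cumsum(freq_s) / fs
y = env_s[:-1] * np.sin(phase)
```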
Keywords
AM-FM analysis resynthesis, analysis smoothing resynthesis, psychoacoustic impact of dafx
Paper topics
Models for sound analysis and synthesis, Perception and cognition of sound and music
Easychair keyphrases
instantaneous frequency [8], brightness value [6], musical instrument [6], env mod [5], order smoother [5], psychoacoustic metric [5], signal processing [5], tristimulus triangle [5], audio engineering society convention [4], frequency modulation [4], harmonics finding process [4], higher order [4], irregularity value [4], resonant waveform [4], waveform c figure [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851145
Zenodo URL: https://zenodo.org/record/851145
Abstract
The perceived properties of a digital piano keyboard were studied in two experiments involving different types of vibrotactile cues in connection with sonic feedback. The first experiment implemented a free playing task in which subjects had to rate the perceived quality of the instrument according to five attributes: Dynamic control, Richness, Engagement, Naturalness, and General preference. The second experiment measured performance in timing and dynamic control in a scale playing task. While the vibrating condition was preferred over the standard non-vibrating setup in terms of perceived quality, no significant differences were observed in timing and dynamics accuracy. Overall, these results must be considered preliminary to an extension of the experiment involving repeated measurements with more subjects.
Keywords
digital piano, synthetic vibrations, tactile perception
Paper topics
Multimodality in sound and music computing, Perception and cognition of sound and music
Easychair keyphrases
vibrotactile feedback [10], dynamic control [9], general preference [9], vibration condition [9], digital keyboard [8], key vibration [8], digital piano [7], attribute scale [6], j arvel ainen [6], non vibrating standard [6], vibration sample [6], key velocity [5], negative group [5], perceived quality [5], piano synthesizer [5], control engagement richness [4], digital piano keyboard [4], individual consistency [4], positive group [4], significant difference [4], vibrotactile cue [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851015
Zenodo URL: https://zenodo.org/record/851015
Abstract
Several technologies to measure lip pressure during brass instrument playing have already been developed as prototypes. This paper presents a number of technological improvements over previous methods and optimizations that turn the technique into an easy-to-handle tool for the classroom. It also offers new options for performance science studies, capturing intra- and inter-individual variability in playing parameters. Improvements include a wireless sensor setup to measure lip pressure in trumpet and cornet playing and to capture the orientation and motion of the instrument. A lightweight design and simple fixation allow performers to play with a minimum of alteration of the playing conditions. Wireless connectivity to mobile devices is introduced for specific data logging. The app includes features such as data recording, visualization, real-time feedback and server connectivity or other data-sharing possibilities. Furthermore, a calibration method for the sensor setup was developed; the results showed a measurement accuracy within 5% deviation and a measurement range from 0.6 N up to a peak load of 70 N. A pilot study with 9 participants (beginners, advanced students and a professional player) confirmed practical usage. The integration of these real-time data visualizations into daily teaching and practicing could be just the next small step. Lip pressure forces are not only extremely critical in the upper register, they are crucial for brass playing in general. Small changes of the fitting permit the use of the sensor for all brass instruments.
Keywords
app, biofeedback, brass, cornet, lip pressure, real-time, trumpet
Paper topics
Music performance analysis and rendering
Easychair keyphrases
lip pressure [17], sensor module [11], real time feedback [9], real time [6], professional player [5], sensor setup [5], brass instrument playing [4], data logging [4], dof imu [4], lip pressure measurement [4], miniature load cell [4], piston valve trumpet [4], real time data visualization [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851105
Zenodo URL: https://zenodo.org/record/851105
Abstract
In this paper we present an application that can send events from any sensor available on an Android device using OSC, over unicast or multicast network communication. Sensors2OSC permits the user to activate and deactivate any sensor at runtime and has forward compatibility with any new sensor that becomes available, without requiring an application upgrade. The sensor rate can be changed from the slowest to the fastest, and the user can configure any IP and port to which the OSC messages are redirected. The application is described in detail, with a discussion of the limitations of Android devices and the advantages of this application compared with the many others on the market.
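Since the application itself is an Android app, the sketch below only illustrates the kind of OSC traffic involved, from the desktop side, using the python-osc package; the address pattern and value layout are assumptions for illustration.

```python
# Not the app itself: a desktop-side sketch of sending sensor-style OSC messages
# with python-osc. Address pattern and value layout are assumed.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("192.168.0.10", 9000)     # receiver IP and port

# e.g. a three-axis accelerometer reading forwarded as one message per axis
reading = {"x": 0.02, "y": -0.98, "z": 0.10}
for axis, value in reading.items():
    client.send_message(f"/accelerometer/{axis}", float(value))
```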
Keywords
android, interaction, ipv6, mobile, multicast, osc, unicast
Paper topics
Interactive performance systems, Sonic interaction design
Easychair keyphrases
mobile device [12], android device [11], sensor rate [10], android api [9], sensor available [9], osc message [8], multicast address [6], sensor value [6], available sensor [5], main screen [5], port number [5], forward compatibility [4], mobile application development [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851161
Zenodo URL: https://zenodo.org/record/851161
Abstract
Granular methods to synthesise environmental sound textures (e.g. rain, wind, fire, traffic, crowds) preserve the richness and nuances of actual recordings, but need a preselection of timbrally stable source excerpts to avoid unnatural-sounding jumps in sound character. To overcome this limitation, we add a description of the timbral content of each sound grain in order to choose successive grains from similar regions of the timbre space. We define two different timbre similarity measures, one based on perceptual sound descriptors and one based on MFCCs. A listening test compared these two distances to an unconstrained random grain choice as a baseline and showed that the descriptor-based distance was rated as most natural, the MFCC-based distance generally as less natural, and the random selection always as worst.
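The MFCC-based similarity measure can be sketched as follows: each grain is summarised by its mean MFCC vector and the next grain is drawn from the timbrally closest candidates instead of at random. The corpus, grain length and neighbourhood size below are placeholders, not the paper's settings.

```python
# Timbre-constrained grain selection: summarise each grain by its mean MFCC
# vector and pick the next grain from its nearest neighbours in that space.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))        # stand-in for a texture recording
grain_len = 2048
grains = [y[i:i + grain_len] for i in range(0, len(y) - grain_len, grain_len)]

feats = np.array([librosa.feature.mfcc(y=g, sr=sr, n_mfcc=13).mean(axis=1)
                  for g in grains])

def next_grain(current, k=5):
    """Pick one of the k timbrally closest grains to the current one."""
    d = np.linalg.norm(feats - feats[current], axis=1)
    candidates = np.argsort(d)[1:k + 1]            # skip the grain itself
    return int(np.random.choice(candidates))

sequence = [0]
for _ in range(200):
    sequence.append(next_grain(sequence[-1]))
```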
Keywords
concatenative synthesis, corpus-based synthesis, granular synthesis, sound descriptors, sound texture synthesis
Paper topics
Sound and music for VR and games, Sound/music signal processing algorithms
Easychair keyphrases
sound texture [21], sound texture synthesis [15], crowd water faucet [7], desert wind stadium [7], lapping wave desert [7], stadium crowd water [7], traffic jam baby [7], water faucet formula [7], wave desert wind [7], wind stadium crowd [7], diemo schwarz [6], environmental sound texture [6], naturalness rating [6], texture synthesis [6], sound designer [5], baby total orig descr [4], corpus based concatenative synthesis [4], descriptor based similarity measure [4], mfcc based distance [4], scaled naturalness rating [4], signal processing [4], timbral similarity [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851125
Zenodo URL: https://zenodo.org/record/851125
Abstract
This paper describes a music browsing assistance service, Songrium (http://songrium.jp), that enables visualization and exploration of massive user-generated music content with the aim of enhancing user experiences in enjoying music. Such massive user-generated content has yielded "web-native music", which we defined as musical pieces that are published, shared, and remixed (have derivative works created) entirely on the web. Songrium has two interfaces for browsing and listening to web-native music from the viewpoints of scale and time: Songrium3D for gaining community-scale awareness and Interactive History Player for gaining community-history awareness. Both of them were developed to stimulate community activities for web-native music by visualizing massive music content spatially or chronologically and by providing interactive enriched experiences. Songrium has analyzed over 680,000 music video clips on the most popular Japanese video-sharing service, Niconico, which includes original songs of web-native music and their derivative works such as covers and dance arrangements. Analyses of more than 120,000 original songs reveal that over 560,000 derivative works have been generated and contributed to enriching massive user-generated music content.
Keywords
interactive system, music visualization, user-generated content, web application, web-native music
Paper topics
Interfaces for sound and music, Music information retrieval
Easychair keyphrases
derivative work [54], music content [29], interactive history player [28], music star map [15], visual effect [15], content creation community [12], music content creation [12], user generated music content [12], web native music [12], video clip [11], hatsune miku [10], browsing assistance service [9], music browsing assistance [9], music structure [7], music video clip [7], native music [7], dimensional space [6], embedded video player [6], video sharing service [6], vocaloid character [6], web native music content [6], music recommendation [5], public event [5], repeated section [5], vocaloid song [5], webnative music [5], crypton future medium [4], music video [4], popular japanese video sharing [4], web service [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851091
Zenodo URL: https://zenodo.org/record/851091
Abstract
This paper presents Sound My Vision, an Android application for controlling music expression and multimedia projects. Unlike other similar applications, which collect data only from sensors and input devices, Sound My Vision also analyses input video in real time and extracts low-level video features. Such a versatile controller can be used in various scenarios, from entertainment and experimentation to live music performances, installations and multimedia projects. The application can replace the complex setups that are usually required for capturing and analyzing a video signal in live performances. Additionally, the mobility of smartphones allows changes of perspective, in the sense that the performer can become either an object or a subject involved in controlling the expression. The most important contributions of this paper are the selection of general, low-level video features and the technical solution for seamless real-time video feature extraction on the Android platform.
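As a desktop-side illustration of low-level video features (the application itself targets Android), the sketch below computes per-frame mean brightness and a crude frame-difference motion estimate with OpenCV; the chosen features are examples, not necessarily those selected in the paper.

```python
# Two simple low-level video features per frame: mean brightness and the mean
# absolute frame difference as a crude motion estimate.
import cv2

cap = cv2.VideoCapture(0)                 # any camera index or video file path
prev = None
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    brightness = float(gray.mean())
    motion = float(cv2.absdiff(gray, prev).mean()) if prev is not None else 0.0
    prev = gray
    # brightness and motion could now be forwarded as controller values (e.g. via OSC)
```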
Keywords
mobile application, OSC, sensors, video analysis, video features
Paper topics
Interactive performance systems, Interfaces for sound and music
Easychair keyphrases
moving object [15], video feature [15], mobile device [13], mobile application [10], musical expression [9], real time [9], real time video analysis [8], computer music [7], touch screen [6], international computer [5], multimedia project [5], use case [5], android application [4], android operating system [4], computer vision [4], low level video feature [4], multimedia system [4], real time video [4], seamless real time video [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851081
Zenodo URL: https://zenodo.org/record/851081
Abstract
Spatially distributed musical ensembles play together while being distributed in space, e.g., in a park or in a historic building. Despite the distance between the musicians they should be able to play together with high synchronicity and realize complex rhythms (as far as the speed of sound permits). In this paper we propose systematic support of such ensembles based on electronic music stands that are synchronized to each other without using a permanent computer network or any network at all. This makes it possible to perform music for spatially distributed musical ensembles in places where it is difficult to get access to a computer network, e.g., in parks, historic buildings or big concert venues.
Keywords
Click Track, Electronic Music Stand, Synchronization
Paper topics
Interactive performance systems
Easychair keyphrases
m sync player [28], music stand [21], digital music stand [20], click track [13], distributed button [9], distributed musical ensemble [9], electronic music stand [9], page turning [9], computer music [6], continuous synchronization [6], playback model [6], radio time signal [6], shot synchronization [6], delta time [5], visual cue [5], auditory cue [4], ensemble member [4], low latency audio transmission [4], msync player [4], musical expression [4], music performance [4], sheet music [4], synchronized m sync player [4], tempo change [4], web based click track editor [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851131
Zenodo URL: https://zenodo.org/record/851131
Abstract
In this paper we present SynPy, an open-source software toolkit for quantifying syncopation. It is flexible yet easy to use, providing the first comprehensive set of implementations for seven widely known syncopation models using a simple plugin architecture for extensibility. SynPy is able to process multiple bars of music containing arbitrary rhythm patterns and can accept time-signature and tempo changes within a piece. The toolkit can take input from various sources including text annotations and standard MIDI files. Results can also be output to XML and JSON file formats. This toolkit will be valuable to the computational music analysis community, meeting the needs of a broad range of studies where a quantitative measure of syncopation is required. It facilitates a new degree of comparison for existing syncopation models and also provides a convenient platform for the development and testing of new models.
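For orientation, the toy measure below scores syncopation from metrical weights in the spirit of the Longuet-Higgins and Lee family of models: an onset on a weak position followed by a rest on a stronger position contributes the weight difference. It is not a faithful reimplementation of any of SynPy's seven models.

```python
# Toy weight-based syncopation count (illustrative only, not a SynPy model).
def syncopation(onsets, weights):
    """onsets: 1/0 per metrical position; weights: metrical weight per position."""
    assert len(onsets) == len(weights)
    score = 0
    for i, is_onset in enumerate(onsets):
        nxt = (i + 1) % len(onsets)
        if is_onset and not onsets[nxt] and weights[nxt] > weights[i]:
            score += weights[nxt] - weights[i]
    return score

weights_4_4 = [4, 1, 2, 1, 3, 1, 2, 1]          # simple 4/4 grid at eighth notes
print(syncopation([1, 0, 1, 0, 1, 0, 1, 0], weights_4_4))   # on-beat pattern: 0
print(syncopation([0, 1, 0, 1, 0, 1, 0, 1], weights_4_4))   # off-beat pattern: syncopated
```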
Keywords
python, syncopation modelling, toolkit
Paper topics
Computational musicology and Mathematical Music Theory, Models for sound analysis and synthesis
Easychair keyphrases
rhythm pattern [14], syncopation model [13], note sequence [12], velocity sequence [11], standard midi file [9], time signature [7], time span [7], metrical weight [6], music perception [6], syncopation prediction [6], synpy toolkit [6], metrical hierarchy [5], metrical level [5], note duration [5], arbitrary rhythm pattern [4], computational music analysis community [4], json file [4], longuet higgin [4], metrical position [4], open source [4], plugin architecture [4], prediction value [4], quarter note [4], queen mary [4], son clave rhythm [4], syncopation value [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851079
Zenodo URL: https://zenodo.org/record/851079
Abstract
Composing drum patterns and musically developing them through repetition and variation is a typical task in electronic music production. We propose a system that, given an input pattern, automatically creates related patterns using a genetic algorithm. Two distance measures (the Hamming distance and directed-swap distance) that relate to rhythmic similarity are shown to derive usable fitness functions for the algorithm. A software instrument in the Max for Live environment presents how this can be used in real musical applications. Finally, a user survey was carried out to examine and compare the effectiveness of the fitness metrics in determining rhythmic similarity as well as the usefulness of the instrument for musical creation.
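A minimal version of the approach might look like the sketch below: binary drum patterns are evolved with mutation and selection, with fitness derived from the Hamming distance to the seed pattern so that offspring stay related to, but distinct from, the input. Population size, rates and the target distance are illustrative assumptions.

```python
# Minimal genetic-algorithm sketch with a Hamming-distance-based fitness.
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def fitness(candidate, seed, target_distance=4):
    # Best fitness when the candidate sits at the desired distance from the seed.
    return -abs(hamming(candidate, seed) - target_distance)

def mutate(pattern, rate=0.1):
    return [1 - s if random.random() < rate else s for s in pattern]

seed = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]   # 16-step seed pattern
population = [mutate(seed, 0.3) for _ in range(40)]

for generation in range(50):
    population.sort(key=lambda p: fitness(p, seed), reverse=True)
    parents = population[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(30)]

best = population[0]    # a pattern related to, but not identical with, the seed
```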
Keywords
Algorithmic Composition, Drums, Electronic Music, Genetic Algorithms, Rhythm Similarity
Paper topics
Computational musicology and Mathematical Music Theory, Computer environments for sound/music processing, Interfaces for sound and music, Models for sound analysis and synthesis, Perception and cognition of sound and music
Easychair keyphrases
genetic algorithm [32], target pattern [15], fitness function [12], hamming distance [11], rhythmic similarity [11], directed swap distance [9], drum pattern [9], distance measure [8], edit distance [7], swap distance [7], computer music [5], bit string [4], correlation matrix [4], next section [4], perceived similarity [4], rhythmic pattern [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851031
Zenodo URL: https://zenodo.org/record/851031
Abstract
We present computer-aided composition experiments related to the notions of polyrhythmic structures and variable tempo curves. We propose a formal context and tools that make it possible to generate complex polyrhythms with multiple varying tempi, integrated into compositional processes and performance and implemented as algorithms and prototype user interfaces.
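One building block implied by this abstract, scheduling events under a varying tempo curve, can be sketched as follows. This is not the authors' tool, only a minimal illustration that converts beat positions to clock time by numerically integrating 60/tempo(beat); the example tempo curve and rhythms are assumptions:

import numpy as np

def beat_to_seconds(beats, tempo_curve, resolution=1000):
    """beats: array of beat positions; tempo_curve: callable mapping beat -> BPM."""
    beats = np.asarray(beats, dtype=float)
    grid = np.linspace(0.0, float(np.max(beats)), resolution)
    seconds_per_beat = 60.0 / tempo_curve(grid)
    # cumulative clock time at each grid point (trapezoidal integration)
    cumtime = np.concatenate(([0.0], np.cumsum(
        0.5 * (seconds_per_beat[1:] + seconds_per_beat[:-1]) * np.diff(grid))))
    return np.interp(beats, grid, cumtime)

# Example: a linear accelerando from 60 to 120 BPM over 8 beats.
tempo = lambda b: 60.0 + 7.5 * b
onsets_voice_a = np.arange(0, 9, 1.0)        # quarter notes
onsets_voice_b = np.arange(0, 9, 4 / 3)      # a 3-against-4 polyrhythm
print(beat_to_seconds(onsets_voice_a, tempo))
print(beat_to_seconds(onsets_voice_b, tempo))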
Keywords
Computer-aided composition, Polytemporal music, Rhythm, Tempo
Paper topics
Computer environments for sound/music processing
Easychair keyphrases
tempo curve [16], computer aided composition [7], computer music [7], temporal pattern [7], simulated annealing algorithm [6], compositional process [4], musical material [4], recent work [4], target rhythm [4], varying tempo [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851109
Zenodo URL: https://zenodo.org/record/851109
Abstract
The Harmonic Walk is an interactive physical environment based on user motion detection and devoted to the study and practice of tonal harmony. When entering the rectangular floor surface within the application’s camera view, a user can actually walk inside the musical structure, triggering sound feedback that depends on the occupied zone. We arranged a two-mask projection setup to allow users to experience melodic segmentation and the tonality harmonic space, and we planned two-phase assessment sessions, submitting a group of 22 high school students to various test conditions. Our findings demonstrate the high learning effectiveness of the Harmonic Walk application. Its ability to convey abstract concepts in an enactive way produces substantial improvement rates both for subjects who received explicit information and for subjects who did not.
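A hypothetical sketch of the zone-based feedback mentioned above is given below; the zone layout and chord assignments are invented for illustration and are not the Harmonic Walk's actual configuration:

ZONES = [  # (x_min, x_max, y_min, y_max, chord as MIDI note numbers)
    (0.0, 1.0, 0.0, 1.0, [60, 64, 67]),   # I   (C major)
    (1.0, 2.0, 0.0, 1.0, [65, 69, 72]),   # IV  (F major)
    (2.0, 3.0, 0.0, 1.0, [67, 71, 74]),   # V   (G major)
    (0.0, 3.0, 1.0, 2.0, [57, 60, 64]),   # vi  (A minor)
]

def chord_for_position(x, y):
    """Return the chord of the zone containing the tracked floor position (x, y),
    or None if the user is outside every zone."""
    for x0, x1, y0, y1, chord in ZONES:
        if x0 <= x < x1 and y0 <= y < y1:
            return chord
    return None

print(chord_for_position(1.4, 0.3))   # -> [65, 69, 72]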
Keywords
Interactive physical environments, Music cognition, Music learning applications
Paper topics
Interactive performance systems, Interfaces for sound and music
Easychair keyphrases
harmonic change [20], harmonization task [16], harmonic walk [15], effect size [12], non musician [10], high school student [9], circular ring [8], subject category [8], melody harmonization [7], second test [7], employed chord [6], enactive experience [6], instructed subject [6], melodic segmentation [6], standard deviation [6], test conductor [6], tonality harmonic space [6], zone tracker application [6], assessment session [5], assessment test [5], audio file [5], explicit information [5], high school [5], instructed musician [5], tonal function [5], tonal melody [5], uninstructed musician [5], catholic institute barbarigo [4], harmonic walk application [4], music high school [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851043
Zenodo URL: https://zenodo.org/record/851043
Abstract
Many contemporary computer music systems can emulate aspects of composers’ behaviour, creating and arranging structural elements traditionally manipulated by composers. This raises the question as to how new computer music systems can act as effective tools that enable composers to express their personal musical vision: if a computer is acting as a composer’s tool, but is working directly with score structure, how can it preserve the composer’s artistic voice? David Wessel and Matthew Wright have argued that, in the case of musical instrument interfaces, a balance should be struck between ease of use and the potential for developing expressivity through virtuosity. In this paper, we adapt these views to the design of compositional interfaces. We introduce the idea of the virtuoso composer, and propose an understanding of computer music systems that may enhance the relationship between composers and their computer software tools. We conclude by arguing for a conceptualization of the composer/computer relationship that promotes the continued evolution of human musical expression.
Keywords
Critical Studies, Electronic Composition, Generative Music, Human-Computer Collaboration
Paper topics
Music and robotics, Music performance analysis and rendering, Social interaction in sound and music computing, Sound and music for VR and games
Easychair keyphrases
computer music system [19], musical work [10], computer music [7], musical instrument [7], computer based composition environment [6], musical idea [6], virtuoso composer [6], computer system [5], score structure [5], david cope experiment [4], musical intelligence [4], musical structure [4], real time [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851137
Zenodo URL: https://zenodo.org/record/851137
Abstract
A surface can be harsh and raspy, or smooth and silky, and everything in between. We are used to sensing these features with our fingertips as well as with our eyes and ears: the exploration of a surface is a multisensory experience. Tools, too, are often employed in the interaction with surfaces, since they augment our manipulation capabilities. “Sketch a Scratch” is a tool for the multisensory exploration and sketching of surface textures. The user’s actions drive a physical sound model of real materials’ response to interactions such as scraping, rubbing or rolling. Moreover, different input signals can be converted into 2D visual surface profiles, making it possible to experience them visually, aurally and haptically.
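The sketch below illustrates, under assumed parameters and without the paper's physical sound models, one step described above: treating an input signal as a 1D surface height profile and deriving a scraping excitation from the heights passing under a moving stylus:

import numpy as np

def scrape_excitation(profile, speed, duration, sr=44100):
    """profile: surface heights at unit spacing; speed: profile units per second."""
    t = np.arange(int(duration * sr)) / sr
    position = speed * t                                     # stylus position
    height = np.interp(position, np.arange(len(profile)), profile)
    # A force-like excitation proportional to how fast the surface height
    # changes under the stylus, i.e. the time derivative of the sampled height.
    return np.diff(height, prepend=height[0]) * sr

# Example: a rough "surface" made from integrated noise, scraped at two speeds.
rng = np.random.default_rng(0)
surface = np.cumsum(rng.standard_normal(2000)) * 0.01
slow = scrape_excitation(surface, speed=100, duration=2.0)
fast = scrape_excitation(surface, speed=400, duration=2.0)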
Keywords
Exploration, Interaction, Texture sketching
Paper topics
Computer environments for sound/music processing, Interactive performance systems, Interfaces for sound and music, Multimodality in sound and music computing, Sonic interaction design
Easychair keyphrases
haptic feedback [13], surface texture [10], surface profile [7], tool mediated exploration [6], virtual surface [6], impact model [5], audio signal [4], friction model [4], interactive surface [4], lateral force [4], multisensory exploration [4], real surface [4], rubbed object [4], self contained interactive installation [4], sound design toolkit [4], texture exploration [4], world voice day [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851051
Zenodo URL: https://zenodo.org/record/851051
Abstract
A descriptor of feature modulation, useful in classification tasks and real-time analysis, is proposed. The descriptor is computed in the time domain, ensuring fast computation and optimal temporal resolution. In this work we take the amplitude envelope as the inspected feature, so the outcome of this process can be used to gain information about the energy modulation of the input and can be exploited to detect the presence of transients in audio segments. The proposed algorithm relies on an adaptation of Continuous Brightness Estimation (CoBE).
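A generic time-domain sketch of this kind of processing is shown below: an attack/release envelope follower plus a per-window modulation amount of that envelope. It is not the paper's CoBE adaptation, and all constants are assumed:

import numpy as np

def envelope_follower(x, sr, attack_ms=5.0, release_ms=50.0):
    """One-pole attack/release follower of the rectified signal."""
    x = np.abs(np.asarray(x, dtype=float))
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = np.zeros(len(x))
    level = 0.0
    for i, v in enumerate(x):
        coeff = a_att if v > level else a_rel
        level = coeff * level + (1.0 - coeff) * v
        env[i] = level
    return env

def modulation_amount(env, sr, win_ms=100.0):
    """Mean absolute envelope derivative per window; high values flag transients."""
    win = int(sr * win_ms / 1000.0)
    d = np.abs(np.diff(env, prepend=env[0]))
    frames = len(d) // win
    return d[: frames * win].reshape(frames, win).mean(axis=1)

sr = 44100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) * (t > 0.5)     # a signal with one transient
print(modulation_amount(envelope_follower(x, sr), sr))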
Keywords
Brightness, CoBE, Events detection, Feature extraction, MIR
Paper topics
Digital audio effects, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
envelope follower [7], event density [7], amplitude envelope [6], energy modulation [6], energy modulation amount [6], cobe value [5], energy envelope [5], low energy [5], attack leap [4], cobe ebf [4], continuous brightness estimation [4], crest factor [4], envelope brightness [4], feature extraction [4], modulation amount [4], onset detection [4], spectral flux [4], standard deviation [4], trap median [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851103
Zenodo URL: https://zenodo.org/record/851103
Abstract
Here we present experimental results that investigate the application of vibrotactile stimuli of pure and complex waveforms. Our experiment measured a subject’s ability to discriminate between pure and complex waveforms based upon vibrotactile stimulation alone. Subjective same/different responses were captured for paired combinations of sine, saw, and square waveforms at a fixed fundamental frequency of 160 Hz (f0). Each arrangement was presented non-sequentially via a gloved vibrotactile device. Audio and bone-conduction stimuli were removed via headphone and tactile noise masking, respectively. The results from our experiments indicate that humans possess the ability to distinguish between different waveforms via vibrotactile stimulation when presented asynchronously at f0, and that this form of interaction may be developed further to advance digital musical instrument (DMI) extra-auditory interactions in computer music.
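For concreteness, the stimulus waveforms named above (sine, saw and square at f0 = 160 Hz) could be generated as in the sketch below; the sample rate and duration are assumptions, and presentation hardware, calibration and masking are of course outside its scope:

import numpy as np
from scipy import signal

SR, F0, DUR = 44100, 160.0, 1.0
t = np.arange(int(SR * DUR)) / SR
stimuli = {
    "sine": np.sin(2 * np.pi * F0 * t),
    "saw": signal.sawtooth(2 * np.pi * F0 * t),
    "square": signal.square(2 * np.pi * F0 * t),
}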
Keywords
Interfaces for sound and music, Multimodality in sound and music computing, Perception and cognition of sound and music, Sound/music and the neurosciences
Paper topics
Interfaces for sound and music, Multimodality in sound and music computing, Perception and cognition of sound and music, Sound/music and the neurosciences
Easychair keyphrases
vibrotactile stimulus [10], complex waveform [7], vibrotactile feedback [7], audio tactile glove [6], multisensory integration [5], musical instrument [5], college cork cork [4], non musician [4], sub threshold [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851057
Zenodo URL: https://zenodo.org/record/851057
Abstract
This paper deals with the acoustic analysis of timbral and rhythmic patterns in the sound activity of Cicada orni, collected at the Plato Academy archaeological site during the summer of 2014 and comprising the Tettix soundscape database. The main purpose here is to use sound analysis to understand the basic patterns of cicada calls and shrilling sounds, and subsequently to use the raw material provided by the Tettix database in a statistical modeling framework for creating virtual cicada sounds, allowing control of synthesis parameters spanning the micro, meso and macro temporal levels.
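The statistical-modelling idea mentioned above can be sketched with a second-order Markov chain over symbolic call states; the states and training sequence below are invented placeholders, not the Tettix data:

import random
from collections import defaultdict

def train_second_order(sequence):
    """Collect observed continuations for each pair of consecutive states."""
    table = defaultdict(list)
    for a, b, c in zip(sequence, sequence[1:], sequence[2:]):
        table[(a, b)].append(c)
    return table

def generate(table, seed, length):
    """Generate a new sequence by sampling continuations of the last two states."""
    out = list(seed)
    for _ in range(length - 2):
        choices = table.get((out[-2], out[-1]))
        if not choices:
            break
        out.append(random.choice(choices))
    return out

calls = ["shrill", "shrill", "pause", "click", "shrill", "pause",
         "shrill", "shrill", "click", "pause", "shrill", "shrill"]
model = train_second_order(calls)
print(generate(model, seed=calls[:2], length=20))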
Keywords
Cicada sound, Soundscape, Statistical models, Synthesis model, Timbre and rhythm analysis
Paper topics
Content processing of music audio signals, Models for sound analysis and synthesis, Sonic interaction design
Easychair keyphrases
virtual cicada [14], second order markov model [8], cicada orni [7], plato academy [7], macro temporal level [6], micro temporal [6], cicada call [5], cicada singing [5], low pass [5], macro temporal [5], synthesis model [5], temporal scale [5], tettix project [5], transition matrix [5], cicada chorus [4], high pass filter [4], lower right [4], low frequency [4], low pass filtered version [4], meso temporal [4], meso temporal scale [4], micro temporal scale [4], micro temporal synthesis engine [4], multi ethnic heterotopical soundscape [4], plato academy soundscape [4], precedence effect [4], statistical modeling [4], tettix database [4], upper left [4], upper right [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851141
Zenodo URL: https://zenodo.org/record/851141
Abstract
In this paper we present a flexible framework for high-quality parametric speech analysis and synthesis. It constitutes an extended source-filter model. The novelty of the proposed speech processing system lies in its extended means of using a Deterministic plus Stochastic Model (DSM) to estimate the unvoiced stochastic component from a speech recording. Further contributions are the efficient and robust extraction of the Vocal Tract Filter (VTF) and the modelling of energy variations. The system is evaluated in the context of two voice quality transformations on natural human speech. The voice quality of a speech phrase is altered by re-synthesizing the deterministic component with different pulse shapes of the glottal excitation source. A Gaussian Mixture Model (GMM) is used in one test to predict energies for the re-synthesis of the deterministic and the stochastic components. The subjective listening tests suggest that the speech processing system is able to successfully synthesize, and evoke in a listener, the perceptual sensation of different voice quality characteristics. Additionally, improvements in synthesis quality compared to a baseline method are demonstrated.
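As a heavily simplified illustration, not the paper's analysis/synthesis system, the sketch below builds a basic source-filter synthesizer in which the spectral slope of the excitation is varied as a crude stand-in for different glottal pulse shapes; all formant, bandwidth and tilt values are assumed:

import numpy as np
from scipy import signal

SR = 16000

def vocal_tract(formants, bandwidths):
    """All-pole filter coefficients from formant frequencies and bandwidths (Hz)."""
    a = np.array([1.0])
    for f, bw in zip(formants, bandwidths):
        r = np.exp(-np.pi * bw / SR)
        a = np.polymul(a, [1.0, -2 * r * np.cos(2 * np.pi * f / SR), r * r])
    return a

def excitation(f0, dur, tilt):
    """Impulse train low-passed by a one-pole filter; a larger 'tilt' gives a
    steeper spectral slope, loosely mimicking a more relaxed voice quality."""
    n = int(dur * SR)
    pulses = np.zeros(n)
    pulses[:: int(SR / f0)] = 1.0
    return signal.lfilter([1.0 - tilt], [1.0, -tilt], pulses)

a = vocal_tract([700, 1220, 2600], [130, 70, 160])     # roughly an /a/ vowel
tense = signal.lfilter([1.0], a, excitation(120, 1.0, tilt=0.90))
relaxed = signal.lfilter([1.0], a, excitation(120, 1.0, tilt=0.99))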
Keywords
Glottal Source, LF model, Source-Filter, Speech Analysis Transformation and Synthesis, Voice Quality
Paper topics
Content processing of music audio signals, Models for sound analysis and synthesis, Sound/music signal processing algorithms
Easychair keyphrases
voice quality [57], synthesis quality [17], mo synthesis quality [14], voice quality transformation [14], tense voice quality [12], signal processing [9], voice descriptor [9], voice quality rating [9], glottal pulse [8], quality rating [8], baseline method svln [7], rdgci contour [7], relaxed voice quality [7], sinusoidal content [7], time domain mixing [7], voice quality change [7], glottal excitation source [6], glottal flow [6], glottal source [6], spectral fading [6], spectral fading synthesis [6], spectral slope [6], speech communication association [6], synthesis quality rating [6], unvoiced signal [6], very tense voice [6], voice quality characteristic [6], energy measure [5], source filter [5], unvoiced component [5]
Paper type
Full paper
DOI: 10.5281/zenodo.851075
Zenodo URL: https://zenodo.org/record/851075
Abstract
We present research on sound synthesis techniques employing lookup tables of more than two dimensions. Higher-dimensional wavetables have not yet been explored to their full potential, owing to historical resource restrictions, particularly memory. This paper presents a technique for sound synthesis by means of three-variable functions, as an extension of existing multidimensional table-lookup synthesis techniques.
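The three-variable lookup idea can be sketched as follows: a 3D table is filled by sampling a function of three variables, and audio is read out by scanning the table along a periodic 3D orbit. The table size, function and orbit frequencies are illustrative choices, not the paper's:

import numpy as np

N = 64                                              # table resolution per axis
x, y, z = np.meshgrid(*[np.linspace(-1, 1, N)] * 3, indexing="ij")
table = np.sin(3 * x) * np.cos(2 * y) * z           # the three-variable function

def orbit(t, fx=220.0, fy=331.0, fz=0.5):
    """A parametric 3D trajectory; slow movement along z sweeps the timbre."""
    return (np.sin(2 * np.pi * fx * t),
            np.cos(2 * np.pi * fy * t),
            np.sin(2 * np.pi * fz * t))

def render(duration, sr=44100):
    t = np.arange(int(duration * sr)) / sr
    ox, oy, oz = orbit(t)
    # nearest-neighbour lookup (interpolation omitted for brevity)
    idx = lambda v: np.clip(((v + 1) * 0.5 * (N - 1)).astype(int), 0, N - 1)
    return table[idx(ox), idx(oy), idx(oz)]

audio = render(2.0)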
Keywords
Multidimensional wavetable, Sound Synthesis, Three variable functions
Paper topics
Digital audio effects, Multimodality in sound and music computing, Sound/music signal processing algorithms
Easychair keyphrases
voxel stack [17], wave terrain [16], wave terrain synthesis [14], harmonic content [10], sound synthesis [10], computer music [9], linear frequency [9], linear phase [9], frequency sweep [8], orbit trajectory [8], amplitude value [7], wave voxel [7], international computer [6], dimensional space [5], orbit length [5], sine wave [5], table size [5], dimensional lookup table [4], dimensional wavetable [4], dynamic voxel stack content [4], indexing operation [4], real time video image [4], sub wavetable [4], term wave voxel [4], variable function [4], wave voxel synthesis [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851037
Zenodo URL: https://zenodo.org/record/851037
Abstract
Perceptual evaluation tests, in which subjects assess certain qualities of different audio fragments, are an integral part of audio and music research. They require specialised software, usually custom-made, to collect large amounts of data using meticulously designed interfaces with carefully formulated questions, and to play back audio with rapid switching between different samples. New HTML5 functionality, such as the Web Audio API, allows for increasingly powerful media applications in a platform-independent environment. The advantage of a web application is easy deployment on any platform without requiring any other application, enabling multiple tests to be conducted easily across locations. In this paper we propose a tool supporting a wide variety of easily configurable, multi-stimulus perceptual audio evaluation tests over the web, with multiple test interfaces, pre- and post-test surveys, custom configuration, collection of test metrics and other features. Test design and setup do not require a programming background, and results are gathered automatically in web-friendly formats for easy storage on a server.
Keywords
Audio Evaluation, HTML5, Listening Tests, Web Audio
Paper topics
Interfaces for sound and music, Music information retrieval, Perception and cognition of sound and music
Easychair keyphrases
web audio api [17], audio engineering society [15], listening test [10], comment box [8], perceptual evaluation [7], sample rate [7], audio sample [6], audio file [5], metricenable metricenable [5], audio fragment [4], audio perceptual evaluation [4], audio quality [4], browser based listening test environment [4], browser based perceptual evaluation [4], metricresult metricresult id [4], multiple stimulus [4], perceptual evaluation tool [4], setup file [4]
Paper type
Position paper / Poster
DOI: 10.5281/zenodo.851157
Zenodo URL: https://zenodo.org/record/851157
Abstract
This paper introduces Web Audio Modules (WAMs), which are high-level audio processing/synthesis units that represent the equivalent of Digital Audio Workstation (DAW) plug-ins in the browser. Unlike traditional browser plug-ins, WAMs load from the open web with the rest of the page content, without manual installation. We propose the WAM API, which integrates into the existing Web Audio API, and provide implementations of it with JavaScript and C++ bindings. Two proof-of-concept WAM virtual instruments were implemented in Emscripten and evaluated in terms of latency and performance. We found that performance is sufficient for reasonable polyphony, depending on the complexity of the processing algorithms. Latency is higher than in native DAW environments, but we expect that the forthcoming W3C AudioWorkerNode standard, as well as browser developments, will reduce it.
Keywords
daw plugin, emscripten, sound synthesis, virtual instrument, web audio
Paper topics
Computer environments for sound/music processing, Digital audio effects, Interfaces for sound and music, Sound/music signal processing algorithms
Easychair keyphrases
web audio api [26], audio api [15], web browser [10], web audio api node [8], buffer size [7], parameter space [7], style virtual instrument [6], virtual instrument [6], virtual void [6], wam implementation [6], web page [6], web service [6], audio plug [5], daw style [5], manual installation [5], native plug [5], use case [5], wam sinsynth [5], web application [5], web audio [5], audio api node graph [4], audio module [4], open source [4], streamlined api [4], user interface [4], void data [4], wam api [4], web api [4], web component [4], web midi api [4]
Paper type
Full paper
DOI: 10.5281/zenodo.851149
Zenodo URL: https://zenodo.org/record/851149