Dates: from July 18 to July 23, 2010
Place: Barcelona, Spain
Proceedings info: not available
Abstract
In this study, we propose and compare two probabilistic models for online pitch tracking: a Hidden Markov Model and a Change Point Model. In our models, each note has a certain characteristic spectral shape, which we call a spectral template. The system's goal is therefore to find the note whose template is active given the audio data. The main focus of this work is the trade-off between the latency and the accuracy of the pitch tracking system. We present the probabilistic models and the inference schemes in detail. Encouraging results are obtained in experiments on low-pitched monophonic audio.
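The paper defines the actual observation and transition models; purely as an illustration of the spectral-template idea in the abstract (not the authors' code), the sketch below picks, frame by frame, the note whose gain-matched template best explains the observed magnitude spectrum, using a generalized Kullback-Leibler score as an assumed stand-in for the paper's observation model. All names and the divergence choice are illustrative.

```python
import numpy as np

def best_template_per_frame(spectrogram, templates, eps=1e-12):
    """Frame-wise note decision by spectral template matching.

    spectrogram : (n_bins, n_frames) non-negative magnitude spectra
    templates   : (n_bins, n_notes)  one characteristic spectrum per note
    Returns one note index per frame. The score is a generalized KL
    divergence between each frame and a gain-matched template (an
    illustrative choice, not the paper's exact observation model).
    """
    notes = []
    T = templates / (templates.sum(axis=0, keepdims=True) + eps)
    for x in spectrogram.T:
        gain = x.sum() + eps
        xk = x + eps
        div = []
        for k in range(T.shape[1]):
            y = gain * T[:, k] + eps
            div.append(np.sum(xk * np.log(xk / y) - xk + y))
        notes.append(int(np.argmin(div)))
    return np.array(notes)
```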
Keywords
Change Point Model, Hidden Markov Model, Pitch tracking, Real-time audio processing
Paper topics
Automatic music transcription, Musical pattern recognition/modeling, Sound/music signal processing algorithms
Easychair keyphrases
pitch tracking [17], gamma potential [16], probabilistic model [16], change point [10], hidden markov model [7], indicator variable [7], change point model [6], online pitch tracking [6], polyphonic pitch tracking [6], time frame [6], spectral template [5], certain characteristic spectral shape [4], compound poisson observation model [4], computational complexity [4], exact forward backward algorithm [4], inference scheme [4], low pitched instrument [4], monophonic pitch tracking [4], non negative matrix factorization [4], pitch tracking method [4], pitch tracking system [4], standard hidden markov model [4]
Paper type
unknown
DOI: 10.5281/zenodo.849699
Zenodo URL: https://zenodo.org/record/849699
Abstract
Nowadays some churches are no longer used for worship but instead host cultural or leisure performances. The acoustic conditions of the original buildings are not optimal for these new uses, so an acoustical rehabilitation is necessary. This paper describes the work done to improve the acoustics of a church in Vinaròs and presents the refurbishment of the room as a multiple-use space. A ray-tracing tool was used to design the improvement, which was then evaluated with virtual acoustics and according to Beranek’s parameters. The main aims of this study were to evaluate the current acoustic conditions and to present a proposal for a subsequent acoustic refurbishment.
Keywords
Acoustical rehabilitation, Simulation, Virtual acoustics
Paper topics
Computer environments for sound/music processing
Easychair keyphrases
acoustic fitting [16], reverberation time [16], saint agustine church [11], absorbent material [7], stained glass sheet [7], octave band [5], pressure level [5], stall seat [5], absorption coefficient [4], catt acoustic [4], clarity index c80 [4], definition index d50 [4], frequency band [4], index d50 distribution [4], meter wide [4], rapid speech transmission index [4], saint augustine church [4], sti index [4]
Paper type
unknown
DOI: 10.5281/zenodo.849707
Zenodo URL: https://zenodo.org/record/849707
Abstract
This paper presents recent work on controlling and editing sound spatialization over multiple speakers based on sound descriptors. It has been implemented as an extension of Holo-Edit, an OpenSoundControl-compliant multitrack spatial trajectory editor developed at GMEM. An SDIF interface has been implemented, allowing the import and visualization of sound descriptors generated by third-party software. A set of scripting tools is proposed to process and map these time-tagged data to sound trajectory generation.
Keywords
adaptive spatialization, OSC, sdif visualization, SpatDIF, trajectory scripting
Paper topics
3D sound/music, Computer environments for sound/music processing, Visualization of sound/music data
Easychair keyphrases
holo edit [7], sound spatialization [7], fundamental frequency [6], international computer music [6], adaptive spatialization [4], begin time [4], data object [4], global time selection [4], instrumental sound [4], sound description [4], sound event [4]
Paper type
unknown
DOI: 10.5281/zenodo.849709
Zenodo URL: https://zenodo.org/record/849709
Abstract
The Bag-of-Frames (BoF) approach has been widely used in music genre classification. In this approach, music genres are represented by statistical models of low-level features computed on short frames (on the order of tens of milliseconds) of the audio signal. In the design of such models, a common procedure in BoF approaches is to represent each music genre by sets of instances (i.e., frame-based feature vectors) inferred from training data. The common underlying assumption is that the majority of such instances somehow capture the (musical) specificities of each genre, and that obtaining good classification performance is a matter of training dataset size and of fine-tuning the feature extraction and learning algorithm parameters. We report on extensive tests on two music databases that contradict this assumption. We show that there is little or no benefit in seeking a thorough representation of the feature vectors for each class. In particular, we show that genre classification performances are similar when representing music pieces from a number of different genres with the same set of symbols derived from a single genre or from all the genres. We conclude that our experiments provide additional evidence for the hypothesis that common low-level features of isolated audio frames are not representative of music genres.
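As an illustration of the Bag-of-Frames pipeline discussed above (not code from the paper), the sketch below derives a symbol codebook from frame-level feature vectors with a tiny k-means loop and encodes a piece as a histogram of those symbols; function names and parameter values are assumptions. The paper's point is that it matters little whether the codebook frames come from one genre or from all of them.

```python
import numpy as np

def build_codebook(frames, n_symbols=64, n_iter=20, seed=0):
    """Tiny k-means deriving a symbol codebook from (n_frames, n_dims) features."""
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), n_symbols, replace=False)].astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_symbols):
            if np.any(labels == k):
                centroids[k] = frames[labels == k].mean(axis=0)
    return centroids

def bof_histogram(frames, centroids):
    """Encode a piece as a normalized histogram of codebook symbols."""
    d = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(centroids)).astype(float)
    return hist / hist.sum()
```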
Keywords
audio music genre classification, Bag of Frames, short time descriptors, vector quantization
Paper topics
Musical pattern recognition/modeling, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
music piece [24], feature vector [18], common low level feature [12], music genre [12], music genre classification [11], genre classification [9], second best accuracy [7], genre markov svm [6], markov model [6], music information retrieval [6], audio feature [5], classification accuracy [5], last column [5], percentage point [5], average accuracy [4], genre classification contest [4], latin music database [4], low level feature [4], markov model classifier [4], training data [4]
Paper type
unknown
DOI: 10.5281/zenodo.849711
Zenodo URL: https://zenodo.org/record/849711
Abstract
In this paper we analyze the proceedings of all six past editions of the Sound & Music Computing Conference. The proceedings are analyzed using knowledge-based “keywords to text” mapping to discover the overall trends in the conference's evolution. The analysis is based on the number of papers and distinct authors, the participation ratio for each relevant topic, the interdependence of topics in terms of shared keywords, and the overall popularity of keywords. The analysis was done for each conference year as well as for the overall collection of proceedings to date. The objective of this work is to provide insight into the progress made over the past six years in the SMC community, as envisioned in the roadmap.
Keywords
Knowledge mining, SMC Conference, trends
Paper topics
Multimodality in sound and music computing, Web 2.0 and music
Easychair keyphrases
middle level topic [20], high level topic [19], level topic [16], musical sound source separation [6], music computing [6], preliminary analysis [5], distinct author [4], gesture controlled audio system [4], middle level [4], musical performance modeling visualization [4], music information retrieval [4], relevant topic [4]
Paper type
unknown
DOI: 10.5281/zenodo.849701
Zenodo URL: https://zenodo.org/record/849701
Abstract
Query-by-Humming (QBH) is an increasingly popular technology that allows users to browse a song database by singing or humming a part of the song they wish to retrieve. Beyond retrieval, QBH can also be used in applications such as score alignment and real-time accompaniment. In this paper we present an online QBH algorithm for audio recordings of the singing voice, which uses a multi-similarity measurement approach to pinpoint the location of a query within a musical piece, taking into account the pitch trajectory, phonetic content and RMS energy envelope. Experiments show that our approach achieves 75% Top-1 accuracy in locating an exact melody within the whole song, and 58% Top-1 accuracy in locating the phrase that contains the exact lyrics – an improvement of 170% over the basic pitch-trajectory method. The average query duration is 6 seconds, while the average runtime is 1.1 times the duration of the query.
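As a minimal illustration of the alignment machinery behind such a system (not the authors' multi-similarity implementation, which pinpoints a query inside a whole song and would use a subsequence variant), the following sketch computes a plain dynamic time warping cost between two descriptor sequences; names and the Euclidean frame distance are assumptions.

```python
import numpy as np

def dtw_cost(query, reference):
    """Plain DTW between two descriptor sequences (e.g. pitch/RMS frames).

    query     : (n, d) array, reference : (m, d) array.
    Returns the accumulated alignment cost; a multi-similarity system
    would combine such costs computed on several descriptors.
    """
    n, m = len(query), len(reference)
    dist = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match
    return acc[n, m]
```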
Keywords
Dynamic Time Warping, Interactive Music Systems, Lyrics-based Similarity, Query by Humming, Score Alignment
Paper topics
Automatic music generation/accompaniment systems, Automatic music transcription, Interactive performance systems, Musical pattern recognition/modeling, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
pitch contour [14], random guess accuracy [12], dynamic time warping [6], melodic variation [6], lyric matching [5], musical piece [5], reference vocal [5], retrieved phrase [5], average accuracy [4], best candidate [4], contour based dtw [4], post processing [4], real time accompaniment system [4], reference recording [4]
Paper type
unknown
DOI: 10.5281/zenodo.849703
Zenodo URL: https://zenodo.org/record/849703
Abstract
Bodily movement of music performers is widely acknowledged to be a means of communication with the audience. For singers, for whom the necessity of movement for sound production is limited, postures, i.e. static positions of the body, may be relevant in addition to actual movements. In this study, we present the results of an analysis of a singer's postures, focusing on differences in posture between a dress rehearsal without an audience and a concert with an audience. We provide an analysis based on manual annotation of postures, and propose and evaluate methods for automatic annotation of postures based on motion-capture data, showing that automatic annotation is a viable alternative to manual annotation. Results furthermore suggest that the presence of an audience leads the singer to use more `open' postures and to differentiate more between different postures. Also, speed differences of transitions from one posture to another are more pronounced in concert than during rehearsal.
Keywords
Annotation, Body movement, Music Performance
Paper topics
Computational musicology, Multimodality in sound and music computing, Musical pattern recognition/modeling
Easychair keyphrases
data point [12], manual annotation [12], angular velocity [11], automatic annotation [10], left arm [10], arm next [8], concert performance [8], forearm angle [8], right arm [8], mean angular velocity [7], clustering algorithm [6], dress rehearsal [6], right forearm [6], right forearm angle [6], singer posture [6], video recording [6], rehearsal condition [5], angle left arm [4], average angular velocity [4], motion sensing data [4], musical gesture [4], sensor data [4], singing performance [4], systematic difference [4], viola da gamba [4], wilcoxon signed rank test [4]
Paper type
unknown
DOI: 10.5281/zenodo.849595
Zenodo URL: https://zenodo.org/record/849595
Abstract
In this paper we propose a divergence measure which is applied to the analysis of the relationships between gesture and sound. Technically, the divergence measure is defined based on a Hidden Markov Model (HMM) that is used to model the time profile of sound descriptors. We show that the divergence has the following properties: non-negativity, a global minimum, and non-symmetry. In particular, we used this divergence to analyze the results of experiments in which participants were asked to perform physical gestures while listening to specific sounds. We found that the proposed divergence is able to measure global and local differences in either time alignment or amplitude between gesture and sound descriptors.
Keywords
Divergence Measure, Gesture-Sound Similarity, Hidden Markov Modeling (HMM)
Paper topics
Gesture controlled audio systems, Sonic interaction design
Easychair keyphrases
divergence measure [41], hidden markov model [9], global minimum [7], model state [7], gesture parameter [6], gesture velocity [6], hidden markov modeling [6], hmm based divergence measure [6], temporal evolution [6], hidden markov [5], non negativity [5], sound descriptor [5], temporal alignment [5], audio descriptor sample [4], audio visual speech recognition [4], markov chain [4], model signal sample [4], physical gesture [4], real time [4], sample unit [4], standard deviation [4], time invariant observation distribution [4], time profile [4]
Paper type
unknown
DOI: 10.5281/zenodo.849715
Zenodo URL: https://zenodo.org/record/849715
Abstract
In this paper, we present our research on left-hand gesture acquisition and analysis in guitar performances. The main goal of our research is the study of expressiveness. Here, we focus on a detection model for left-hand fingering based on gesture information. We use capacitive sensors to capture fingering positions and look for a prototypical description of the most common fingering positions in guitar playing. We report on the experiments performed, study the results obtained, and propose the use of classification techniques to automatically determine finger positions.
Keywords
capacitive sensor, gesture acquisition, guitar, machine learning
Paper topics
Gesture controlled audio systems
Easychair keyphrases
finger position [13], capacitive sensor [11], hand position [9], left hand [8], finger combination [7], fingering position [6], index finger [6], measured relative capacitance [6], acquisition system [5], automatic classification [5], classical guitar [5], classification technique [5], default fingering [5], gesture information [5], guitar performance [5], played string [5], automatically classified category [4], automatic classifier [4], capacitance measured relative capacitance [4], collected data [4], finger activation [4], guitar playing [4], recorded position [4], weighted averaged f measure [4]
Paper type
unknown
DOI: 10.5281/zenodo.849719
Zenodo URL: https://zenodo.org/record/849719
Abstract
We present an experimental environment for working with physically based sound models. We situate physical models in an interactive multi-modal space. Users may interact with the models through touch, using tangible controllers, or by setting up procedurally animated physical machines. The system responds with both real-time sound and graphics. A built-in strongly-timed scripting language allows for a different kind of exploration. The scripting language may be used to play the models with precise timing, to change their relation, and to create new behaviors. This environment gives direct, concrete ways for users to learn about how physical models work and begin to explore new musical ideas.
Keywords
graphics, interactive, language, multi-modal, physical models
Paper topics
Computer environments for sound/music processing, Interactive performance systems, Interfaces for music creation and fruition, Multimodality in sound and music computing, Physical modeling for sound generation, Visualization of sound/music data
Easychair keyphrases
physical model [16], diffuse illumination multitouch table [8], procedural animation [7], simtime advance [7], direct manipulation [6], direct manipulation environment [6], tangible controller [6], real time [5], stringd pluckatfraction [5], finite difference [4], real world [4], textual language [4]
Paper type
unknown
DOI: 10.5281/zenodo.849713
Zenodo URL: https://zenodo.org/record/849713
Abstract
The concept of dynamical form is presented as a dimension of music perception. Dynamical form refers to the subjective perception of temporal events in music (explosive, fading out, rising, etc.). In a behavioral experiment, listeners were asked to categorize musical excerpts varying in musical period, tonality, instrumentation, and acoustic features while attending to their dynamical form. The data indicate that subjects are sensitive to dynamical forms, and were particularly sensitive to one specific form (suspense). We also discuss a method of categorizing dynamical forms in terms of force dynamics.
Keywords
Dynamical form, Musical semantics, Music perception
Paper topics
Musical pattern recognition/modeling, Music and emotions, Sound/music perception and cognition
Easychair keyphrases
dynamical form [37], force tendency [10], music perception [9], stable state [7], changing state pattern [6], force dynamical pattern [6], force dynamical structure [6], musical excerpt [6], man leaning [5], atonal orch [4], behavioral data [4], force dynamical [4], free clustering task [4], verbal label [4]
Paper type
unknown
DOI: 10.5281/zenodo.849705
Zenodo URL: https://zenodo.org/record/849705
Abstract
We propose an approach to audio-based, data-driven music visualization and an experimental design to study whether the visualization can aid listeners in identifying the structure of music. A three-stage system is presented, comprising feature extraction, the generation of a recurrence plot, and the creation of an arc diagram to visualize the repetitions within a piece. Subjects are then asked to categorize simple forms of classical music with and without audio and visual cues, and their accuracy and speed are measured. The results show that the visualization can reinforce the identification of musical forms.
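A minimal sketch of the recurrence-plot stage described above, assuming cosine distance on chroma-like frame features and an arbitrary threshold; it is illustrative only, and the arc-diagram construction that follows it in the paper is not shown.

```python
import numpy as np

def recurrence_plot(features, threshold=0.25):
    """Binary self-similarity (recurrence) matrix from frame features.

    features : (n_frames, d) array, e.g. one chroma vector per frame.
    Two frames are 'recurrent' when their cosine distance falls below
    the threshold; repeated sections appear as diagonal stripes, which
    can then be summarized as arcs linking repeated segments.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    distance = 1.0 - f @ f.T          # cosine distance in [0, 2]
    return (distance < threshold).astype(np.uint8)
```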
Keywords
arc diagram, music structure analysis, music visualization, recurrence plot
Paper topics
Music information retrieval, Sound/music perception and cognition, Visualization of sound/music data
Easychair keyphrases
arc diagram [17], recurrence plot [11], phase space [9], musical form [7], music structure analysis [6], time series [6], visual cue [5], chroma feature [4], classical music [4], main theme [4], music information retrieval [4], music visualization [4], segmentation boundary [4], strophic form [4]
Paper type
unknown
DOI: 10.5281/zenodo.849721
Zenodo URL: https://zenodo.org/record/849721
Abstract
In this paper, we propose a computational method of automatic music composition which generates pieces based on counterpoint and imitation. Counterpoint is a compositional technique for writing several independent melodies that sound harmonious when played simultaneously. Imitation is another compositional technique, which repeats a theme in each voice and thereby associates the voices. Our computational method consists of a stochastic model of counterpoint and one of imitation. Both stochastic models are simple Markov models whose unit of state is a beat. We formulate composition as the problem of finding the piece that maximizes the product of the probabilities corresponding to both stochastic models. Because the models are simple Markov models, dynamic programming can be used to find the solution. Experimental results show that our method can generate pieces which satisfy the requirements of counterpoint within two successive beats, and can realize imitations of the theme with flexible transformations.
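The sketch below (not the authors' code) illustrates the dynamic-programming step: combining the two models' log-probabilities per beat transition and running a Viterbi-style pass is equivalent to maximizing the product of their probabilities. The array shapes and names are illustrative assumptions.

```python
import numpy as np

def compose_voice(counterpoint_logp, imitation_logp):
    """Beat-by-beat dynamic programming over two combined Markov models.

    counterpoint_logp, imitation_logp : (n_beats, n_states, n_states)
        log transition scores of each model for every pair of
        successive beats (illustrative shapes).
    Returns the state index chosen for each of the n_beats + 1 beats.
    """
    logp = counterpoint_logp + imitation_logp      # log of the product
    n_beats, n_states, _ = logp.shape
    score = np.zeros(n_states)
    back = np.zeros((n_beats, n_states), dtype=int)
    for t in range(n_beats):
        cand = score[:, None] + logp[t]            # [from, to]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n_beats - 1, -1, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```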
Keywords
automatic composition, counterpoint, dynamic programming, imitation, stochastic model
Paper topics
Automatic music generation/accompaniment systems
Easychair keyphrases
stochastic model [21], successive beat [15], dynamic programming [13], markov model [13], transition probability [12], pitch transition [7], unigram probability [7], absolute value [6], melodic interval [5], probability distribution [5], rhythm pattern [5], automatic composition [4], automatic counterpoint [4], data sparseness problem [4], free counterpoint [4], generate piece [4], last slice [4], lower voice [4], musical knowledge [4], musical piece [4], similar way [4], successive accented perfect fifth [4]
Paper type
unknown
DOI: 10.5281/zenodo.849723
Zenodo URL: https://zenodo.org/record/849723
Abstract
In this paper we propose to use a set of block-level audio features for automatic tag prediction. As the proposed feature set is extremely high-dimensional, we investigate Principal Component Analysis (PCA) as a compression method to make the tag classification computationally tractable. We then compare this block-level feature set to a standard feature set used in a state-of-the-art tag prediction approach. To compare the two feature sets, we report the tag classification results obtained for two publicly available tag classification datasets, using the same classification approach for both feature sets. We show that the proposed feature set outperforms the standard feature set, thus contributing to the state of the art in automatic tag prediction.
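A minimal sketch of the PCA compression step, assuming a plain numpy SVD on the mean-centred song-by-feature matrix; the component count and function name are illustrative, not values from the paper.

```python
import numpy as np

def pca_compress(X, n_components=50):
    """Project high-dimensional block-level features onto principal axes.

    X : (n_songs, n_features) feature matrix (very wide in this setting).
    Returns the compressed representation plus the fitted mean and basis
    so that test data can be projected in the same way.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = principal axes
    basis = Vt[:n_components]
    return Xc @ basis.T, mean, basis
```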
Keywords
block-level features, tag classification, tag prediction
Paper topics
Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
block level feature [31], block level [27], tag classification [27], feature set [22], automatic tag prediction [14], level feature [13], summarization function [13], automatic tag classification [11], hop size [11], music information retrieval [11], standard feature set [11], tag prediction [11], performance measure [10], principal component [10], audio feature [9], block size [9], delta spectral pattern [9], automatic tag [8], frequency band [8], feature vector [7], music similarity estimation [7], second classification stage [7], stacked generalization [7], total variance [7], audio signal [6], classification stage [6], level feature set [6], probabilistic output [6], tag affinity vector [6], tag classification task [6]
Paper type
unknown
DOI: 10.5281/zenodo.849725
Zenodo URL: https://zenodo.org/record/849725
Abstract
Automatic composition techniques are important for enhancing musical applications for amateur musicians, such as composition support systems. In this paper, we present an algorithm that can automatically generate songs from Japanese lyrics. The algorithm is designed by treating composition as an optimal-solution search problem under constraints given by the prosody of the lyrics. To verify the algorithm, we launched "Orpheus", which composes songs from visitors' lyrics on a website, and 56,000 songs were produced within a year. Evaluation results on the generated songs are also reported, indicating that "Orpheus" can help users compose their own original Japanese songs.
Keywords
Automatic Composition, Lyrics, Probabilistic Models, Prosody, Song
Paper topics
Automatic music generation/accompaniment systems, Computational musicology, Musical pattern recognition/modeling
Easychair keyphrases
japanese lyric [6], evaluation result [5], pitch accent [5], amateur musician [4], automatic composition [4], composition algorithm [4], composition system [4], japanese song [4], pitch sequence [4]
Paper type
unknown
DOI: 10.5281/zenodo.849727
Zenodo URL: https://zenodo.org/record/849727
Abstract
In this paper, the project AV Clash is presented. AV Clash is a Web-based tool for integrated audiovisual expression, created by Video Jack (Nuno N. Correia and André Carrilho, with the assistance of Gokce Taskan). In AV Clash, users can manipulate seven “objects” that represent sounds, incorporating audio-reactive animations and graphical user interface elements to control animation and sound. The sounds are retrieved from the online sound database Freesound.org, while the animations are internal to the project. AV Clash addresses the following research question: how can a tool for integrated audiovisual expression with customizable content be created that is flexible, playful to use and engaging to observe? After an introduction to the project, a contextualization with similar works is presented, followed by a presentation of the motivations behind the project and of past work by Video Jack. The project and its functionalities are then described. Finally, conclusions are presented, assessing the achievement of the initial aims and addressing the limitations of the project, while outlining paths for future development.
Keywords
Audiovisual tool, Freesound.org, Net art, Performance, Sound visualization
Paper topics
Interactive performance systems, Interfaces for music creation and fruition, Visualization of sound/music data, Web 2.0 and music
Easychair keyphrases
selection button [12], video jack [11], visual effect [8], integrated audiovisual expression [7], loop selection button [7], back stage button [6], playing iavo [6], audio reactive [5], freesound radio [5], audio effect selection button [4], effect button [4], graphical user interface element [4], iavo user interface [4], online sound database freesound [4], pre loader [4]
Paper type
unknown
DOI: 10.5281/zenodo.849729
Zenodo URL: https://zenodo.org/record/849729
Abstract
Chord symbols and progressions are a common way to describe musical harmony. In this paper we present SEQ, a pattern representation using the Web Ontology Language OWL DL, and its application to modelling chord sequences. SEQ provides a logical representation of order information, which is not available directly in OWL DL, together with an intuitive notation. It therefore allows the use of OWL reasoners for tasks such as classifying sequences by patterns and determining subsumption relationships between the patterns. We present and discuss application examples using patterns obtained from data mining.
Keywords
chord sequences, discovery, OWL, pattern, subsumption
Paper topics
Computational musicology, Musical pattern recognition/modeling, Music information retrieval, Web 2.0 and music
Easychair keyphrases
meeus kp degree [17], basedegree triad rootmvt [12], chord sequence [10], distinctive pattern [8], subsumption relationship [8], feature set pattern [7], kp degree basedegree [6], seq pattern [6], owl reasoner [5], degree basedegree triad [4], description logic [4], distinctive chord pattern [4], first order logic [4], pattern discovery [4], pattern subsumption [4], semantic web [4]
Paper type
unknown
DOI: 10.5281/zenodo.849733
Zenodo URL: https://zenodo.org/record/849733
Abstract
In this paper, three state-of-the-art non-stationary sinusoidal analysis methods based on the Fourier transform (FT) are compared: the derivative method, reassignment and generalized reassignment. The derivative method and reassignment were designed to analyze linear log-AM/linear FM sinusoids. Generalized reassignment can analyze sinusoids containing arbitrary-order modulations; however, the discussion here is limited to linear log-AM/linear FM in order to compare it objectively to reassignment and the derivative method. The equivalence of reassignment and the derivative method is shown to hold for arbitrary-order modulation estimation, and a theoretical comparison with generalized reassignment is presented. The results of tests conducted on two different frequency ranges, full range (frequencies up to Nyquist) and reduced range (frequencies up to 3/4 Nyquist), are compared to the Cramer-Rao bounds (CRBs).
Keywords
derivative method, non-stationary analysis, reassignment method, signal processing, sinusoidal modeling
Paper topics
Sound/music signal processing algorithms
Easychair keyphrases
derivative method [35], generalized reassignment [18], linear log [14], time derivative [14], window function [12], frequency range [9], frequency estimate [8], estimation error [7], linear fm sinusoid [7], parameter estimation [7], signal processing [7], digital audio effect [6], reassignment method [6], bin frequency [5], higher order [5], noise ratio [5], analysis method [4], arbitrary order [4], derivative method stft expression [4], frequency derivative [4], general equation [4], non stationary [4], order modulation [4], reduced frequency range [4], see equation [4], signal derivative [4], spectrum peak [4]
Paper type
unknown
DOI: 10.5281/zenodo.849735
Zenodo URL: https://zenodo.org/record/849735
Abstract
Multimedia scenarios have multimedia content and interactive events associated with computer programs. Interactive Scores (IS) is a formalism to represent such scenarios by temporal objects, temporal relations (TRs) and interactive events. IS describes TRs, but it cannot represent TRs together with conditional branching. We propose a model for conditional-branching timed IS in the Non-deterministic Timed Concurrent Constraint (ntcc) calculus. We ran a prototype of our model in Ntccrt (a real-time capable interpreter for ntcc) and the response time was acceptable for real-time interaction. An advantage of ntcc over Max/MSP or Petri Nets is that conditions and global constraints are represented declaratively.
Keywords
concurrent constraint programming, interactive multimedia, interactive scores, ntccrt, real time
Paper topics
Computer environments for sound/music processing, Interactive performance systems
Easychair keyphrases
conditional branching [14], interactive score [14], time unit [14], real time interaction [9], real time [8], real time capable interpreter [8], concurrent constraint [7], desainte catherine [7], interactive event [7], rigid duration [7], temporal object [7], timed conditional relation [6], local variable [5], start point [5], controlt ransf [4], interactive multimedia [4], local constraint [4], nondeterministic timed concurrent constraint [4], petri net [4], pure data [4], temporal concurrent constraint programming [4], w aitf orallp [4]
Paper type
unknown
DOI: 10.5281/zenodo.849737
Zenodo URL: https://zenodo.org/record/849737
Abstract
In this paper we describe how graphical scores can be coupled with synthesis algorithms in the visual programming language PWGL. The present approach is based on an extensible music notation and a direct connection to a flexible sound synthesis engine. As an exercise, we implement a simple working model that makes it possible to create graphical scores out of user-defined graphical objects and to connect the graphical objects to specific synthesis methods.
Keywords
Graphical notation, musical performance, sound synthesis
Paper topics
Computer environments for sound/music processing, Interactive performance systems, Interfaces for music creation and fruition, Visualization of sound/music data
Easychair keyphrases
graphical object [16], graphical score [12], visual instrument definition [11], synthesis algorithm [10], pwgl playback device [9], computer music [8], playback device [7], sample player [7], playback event [5], sound sample [5], enp expression [4], enp expression designer [4], extensible music notation system [4], filename property [4], flexible sound synthesis engine [4], international computer music [4], prepare playback [4], sound synthesis [4], synthesis instrument [4], synth plug box [4], visual programming language pwgl [4]
Paper type
unknown
DOI: 10.5281/zenodo.849739
Zenodo URL: https://zenodo.org/record/849739
Abstract
This paper proposes a computationally efficient method for computing the constant-Q transform (CQT) of a time-domain signal. The CQT is a time-frequency representation in which the frequency bins are geometrically spaced and the Q-factors (ratios of the center frequencies to bandwidths) of all bins are equal. An inverse transform is proposed which enables a reasonable-quality (around 55 dB signal-to-noise ratio) reconstruction of the original signal from its CQT coefficients. Here CQTs with high Q-factors, equivalent to 12–96 bins per octave, are of particular interest. The proposed method is flexible with regard to the number of bins per octave, the applied window function, and the Q-factor, and is particularly suitable for the analysis of music signals. A reference implementation of the proposed methods is published as a Matlab toolbox. The toolbox includes user-interface tools that facilitate the visualization of spectral data and the indexing of, and working with, the data structure produced by the CQT.
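For orientation only, the following sketch computes constant-Q coefficients the slow, direct way (one geometrically spaced bin at a time, with a window length that keeps Q constant); the paper's contribution is precisely to avoid this cost via spectral kernels, and the window choice and parameter values here are assumptions.

```python
import numpy as np

def naive_cqt_frame(x, fs, fmin=55.0, bins_per_octave=24, n_bins=96):
    """Direct (slow) constant-Q coefficients for one analysis frame.

    x : 1-D signal segment long enough for the lowest-frequency bin.
    Bin k has centre frequency fmin * 2**(k / bins_per_octave); its
    window length Nk keeps Q = f_k / bandwidth constant across bins.
    """
    Q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)
    cq = np.zeros(n_bins, dtype=complex)
    for k in range(n_bins):
        fk = fmin * 2 ** (k / bins_per_octave)
        Nk = int(np.ceil(Q * fs / fk))
        if Nk > len(x):
            raise ValueError("input frame too short for the lowest bins")
        n = np.arange(Nk)
        kernel = np.hanning(Nk) * np.exp(-2j * np.pi * Q * n / Nk)
        cq[k] = np.dot(x[:Nk], kernel) / Nk
    return cq
```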
Keywords
Acoustic signal analysis, Constant-Q transform, Music, Wavelet transform
Paper topics
Automatic music transcription, Computer environments for sound/music processing, Musical pattern recognition/modeling, Musical sound source separation and recognition
Easychair keyphrases
spectral kernel [19], frequency bin [17], time domain signal [14], window function [13], constant q transform [9], inverse transform [9], blackman harris window [7], center frequency [7], frequency resolution [7], high q factor [7], discrete fourier transform [6], time frequency [6], time shifted atom [6], conjugate transpose [5], interface tool [5], lowpass filter [5], signal processing [5], time domain [5], transform domain [5], atom hop size [4], complex valued dft [4], computational efficiency [4], highest snr value [4], hop size [4], inverse cqt transform [4], lowest frequency bin [4], reference implementation [4], spectral kernel ak [4], temporal kernel [4], time resolution [4]
Paper type
unknown
DOI: 10.5281/zenodo.849741
Zenodo URL: https://zenodo.org/record/849741
Abstract
This study takes place in the framework of ongoing research dealing with the analysis, synthesis and gestural control of sonic textures. In this paper, we describe two recent contributions to this field. The first aimed at providing a sonic texture space based on human perception; for that purpose, we conducted a psychoacoustic experiment, relying on a tangible interface, in which subjects were asked to evaluate similarity between sonic textures by gathering them into several groups. The second part of this study aimed at experimenting with the control of sonic texture synthesis using a tangible interactive table. We also designed a musical tabletop application inspired by the metaphor of exploring a sonic space. This gave very promising insights into the possibilities offered by such interfaces for the real-time processing of sonic textures.
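A common way to turn such grouping data into a perceptual space is classical multidimensional scaling on a dissimilarity matrix; the sketch below shows that generic step under the assumption of a symmetric distance matrix, and is not the authors' analysis code.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical multidimensional scaling from a distance matrix.

    D : (n, n) symmetric dissimilarities between textures (e.g. derived
        from how often subjects grouped two textures together).
    Returns n_dims coordinates per texture, i.e. a low-dimensional
    'texture space' to navigate.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]   # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale
```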
Keywords
perceptual space, sonic interaction design, sonic textures, tangible interface
Paper topics
Computer environments for sound/music processing, Gesture controlled audio systems, Interactive performance systems, Interfaces for music creation and fruition, Sonic interaction design
Easychair keyphrases
sonic texture [61], sonic texture space [16], sonic space [15], distance matrix [13], psychoacoustic experiment [13], tangible interface [13], perceptual sonic texture space [10], tangible object [10], grouping experiment [7], musical tabletop application [7], tangible sound actuator [7], multidimensional scaling analysis [6], music information retrieval [6], sonic texture grouping experiment [6], tangible sound [6], texture space [6], group according [5], listening area [5], sound attractor [5], sound classification [5], actual tangible sound object [4], group classification [4], musical sonic texture [4], music computing [4], scalar product matrix [4], sonic texture distance matrix [4], sonic texture space resulting [4], sound actuator [4], sound grouping experiment [4], tangible interactive table [4]
Paper type
unknown
DOI: 10.5281/zenodo.849743
Zenodo URL: https://zenodo.org/record/849743
Abstract
Systems able to find a song based on a sung, hummed, or whistled melody are called Query-By-Humming (QBH) systems. Tunebot is an online QBH web service and iPhone app that connects users to the desired recording on Amazon.com or iTunes. Tunebot’s searchable database is composed of thousands of user-contributed melodies. Melodies are collected from user queries, sung contributions, and through online play of an associated iPhone karaoke game, Karaoke Callout. In this paper we describe the architecture and workings of the paired systems, as well as issues involved in building a real-world, working music search engine from user-contributed data.
Keywords
crowdsourcing, gwap, mobile, music, query-by-humming
Paper topics
Mobile music, Music information retrieval, Web 2.0 and music
Easychair keyphrases
karaoke callout [14], search key [13], music information retrieval [11], search engine [11], note interval [9], pitch weight [9], rhythm weight [8], music search engine [6], real world [6], user query [6], vantage point tree [6], correct target [5], back end [4], correct song [4], edit distance [4], flash based web interface [4], front end [4], humming system [4], matching algorithm [4], melodic similarity [4], northwestern university eec [4], online qbh web service [4], que sera sera [4], right rank [4], sheridan road evanston [4]
Paper type
unknown
DOI: 10.5281/zenodo.849745
Zenodo URL: https://zenodo.org/record/849745
Abstract
In this paper we describe a method to detect patterns in dance movements. Such patterns can be used in the context of interactive dance systems to allow dancers to influence computational systems with their body movements. For the detection of motion patterns, dynamic time warping is used to compute the distance between two given movements. A custom threshold clustering algorithm is used for subsequent unsupervised classification of movements. For the evaluation of the presented method, a wearable sensor system was built. To quantify the accuracy of the classification, a custom label space mapping was designed to allow comparison of sequences with disparate label sets.
Keywords
Dynamic Time Warping, Interactive Dance, Pattern Recognition, Wearable Sensor System
Paper topics
Interactive performance systems
Easychair keyphrases
dance movement [11], error rate [11], interactive dance [10], arm movement [9], interface board [8], dynamic time warping [7], sensor node [7], inertial sensor node [6], motion data [5], sensor data [5], axis accelerometer [4], class label [4], cost matrix [4], dance club [4], gumstix embedded linux system [4], international computer music [4], non temporal feature classification [4], threshold value [4], wearable sensor system [4]
Paper type
unknown
DOI: 10.5281/zenodo.849749
Zenodo URL: https://zenodo.org/record/849749
Abstract
Faust is a functional programming language dedicated to the specification of executable monorate synchronous musical applications. To extend Faust's capabilities to domains such as spectral processing, we introduce here a multirate extension of the core Faust language. The key idea is to link rate changes to data structure manipulation operations: creating a vector-valued output signal divides the rate of the input signals by the vector size, while serializing vectors multiplies rates accordingly. This interplay between vectors and rates is made possible in the language's static semantics by the introduction of dependent types. We present a typing semantics, a denotational semantics, and a correctness theorem showing that this extension preserves the language's synchronous characteristics. This new design is currently being implemented in the Faust compiler.
Keywords
dependent types, functional programming language, multirate, vector signal processing
Paper topics
Computer environments for sound/music processing, Digital Audio Effects, Sound/music signal processing algorithms
Easychair keyphrases
signal processor [20], static semantic [10], dependent type [9], multirate extension [8], output signal [8], type correct [8], frequency correctness theorem [7], denotational semantic [6], faust expression [6], output impedance [6], rated type [6], signal processing [6], type correctness [6], vector size [6], core faust [5], faust primitive [5], type environment [5], type scheme [5], composition operator [4], dynamic semantic [4], faust program [4], faust static semantic [4], international computer music [4], predefined identifier [4], programming language [4], static domain [4], static typing semantic [4], type type [4]
Paper type
unknown
DOI: 10.5281/zenodo.849751
Zenodo URL: https://zenodo.org/record/849751
Abstract
Existing methods for sound texture synthesis are often concerned with the extension of a given recording while keeping its overall properties and avoiding artefacts. However, they generally lack controllability of the resulting sound texture. After a review and classification of existing approaches, we propose two methods of statistical modeling of the audio descriptors of texture recordings, using histograms and Gaussian mixture models. The models can be interpolated to steer the evolution of the sound texture between different target recordings (e.g. from light to heavy rain). Target descriptor values are stochastically drawn from the statistical models by inverse transform sampling to control corpus-based concatenative synthesis for the final sound generation, which can also be controlled interactively by navigating the descriptor space. To better cover the target descriptor space, we expand the corpus by automatically generating variants of the source sounds with transformations applied, storing only the resulting descriptors and the transformation parameters in the corpus.
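A minimal sketch of the inverse transform sampling step for the histogram model, assuming a plain numpy histogram (bin edges and counts); names are illustrative and the GMM variant is not shown. Interpolating the counts of two recordings before sampling gives the interpolation between target textures described above.

```python
import numpy as np

def sample_from_histogram(bin_edges, counts, n_samples, rng=None):
    """Draw target descriptor values from a histogram by inverse transform sampling.

    bin_edges : (n_bins + 1,) edges of the descriptor histogram
    counts    : (n_bins,) observed counts for a texture recording
    Each uniform draw is mapped through the empirical CDF, then placed
    uniformly within the selected bin.
    """
    rng = rng or np.random.default_rng()
    cdf = np.cumsum(counts / counts.sum())
    u = rng.random(n_samples)
    idx = np.searchsorted(cdf, u)
    lo, hi = bin_edges[idx], bin_edges[idx + 1]
    return lo + rng.random(n_samples) * (hi - lo)
```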
Keywords
audio descriptors, corpus-based concatenative synthesis, Gaussian mixture models, sound textures, statistic modeling
Paper topics
Digital Audio Effects, Musical pattern recognition/modeling, Musical performance modeling, Music information retrieval
Easychair keyphrases
sound texture [31], corpus based concatenative synthesis [18], digital audio effect [9], sound texture synthesis [9], granular synthesis [8], audio descriptor [7], gaussian mixture model [7], synthesis method [7], texture synthesis [7], descriptor space [6], real time [5], statistical modeling [5], descriptor based sound texture [4], descriptor value [4], heavy rain [4], signal processing [4], spectral centroid [4], target descriptor [4], target descriptor value [4]
Paper type
unknown
DOI: 10.5281/zenodo.849753
Zenodo URL: https://zenodo.org/record/849753
Abstract
We present D-Jogger, a music interface that uses body movement to dynamically select music and adapt its tempo to the user’s pace. D-Jogger consists of several independent modules, such as a step detection algorithm and tempo-aware playlists, to achieve this goal. The research done with D-Jogger has focused on entrainment: the synchronization of two rhythmical processes, in this case music and walking. We present several ways of visualizing entrainment data, including synchronization plots and phase histograms. A pilot experiment was performed using D-Jogger with 33 participants. Preliminary data suggest that, when the music’s tempo and the user’s pace are close enough to each other, most users synchronize their walking to the music, taking a step with each beat. A user survey indicated that participants experience this effect as stimulating and motivating. Several other application domains for D-Jogger are possible: personal training devices for joggers, rehabilitation therapy for Parkinson’s patients, or simply a nice-to-have application for your mobile phone.
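As an illustration of how entrainment can be quantified (not D-Jogger's implementation), the sketch below computes the relative phase of each detected footstep with respect to the preceding beat; step and beat times are assumed to have been extracted already. Histograms of these values correspond to the phase histograms mentioned above.

```python
import numpy as np

def step_phases(step_times, beat_times):
    """Relative phase of each footstep with respect to the music's beats.

    step_times, beat_times : sorted arrays of event times in seconds.
    A phase near 0 (or 1) means the step coincides with a beat; values
    near 0.5 indicate anti-phase walking.
    """
    phases = []
    for t in step_times:
        i = np.searchsorted(beat_times, t) - 1
        if 0 <= i < len(beat_times) - 1:
            period = beat_times[i + 1] - beat_times[i]
            phases.append((t - beat_times[i]) / period)
    return np.array(phases)
```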
Keywords
Body movement, Entrainment, Step detection, Synchronization
Paper topics
Gesture controlled audio systems
Easychair keyphrases
synchronization plot [8], anti phase [7], phase plot [7], pilot experiment [7], step detection algorithm [6], phase histogram [5], alignment algorithm [4], mobile version [4], musicology ghent university [4]
Paper type
unknown
DOI: 10.5281/zenodo.849747
Zenodo URL: https://zenodo.org/record/849747
Abstract
This paper provides an overview of a cueing system, the Master Cue Generator (MCG), used to trigger performers (humans or computers) over a network. The performers are scattered across several locations and receive cues to help them interact musically over the network. The paper proposes a classification of cues that dynamically evolve and reshape as the performance takes place. This prompts the exploration of various issues, such as how to represent and port a hierarchy of control over a networked music performance, and also takes into account parameters inherent to a network such as latency and distance. The approach is based on several years of practice-led research in the field of network music performance (NMP), a discipline that is gaining ground within the music technology community both as a practice and through the development of tools and strategies for interacting across disparate locations.
Keywords
Cueing, Improvisation, Music Networks, Network Performance
Paper topics
Interactive performance systems, Web 2.0 and music
Easychair keyphrases
network topology [7], structured improvisation [5], behavioural cue [4], computer music [4], process centered musical network [4]
Paper type
unknown
DOI: 10.5281/zenodo.849755
Zenodo URL: https://zenodo.org/record/849755
Abstract
Finite Difference (FD) methods can be the basis for physics-based music instrument models that generate realistic audio output. However, such methods are compute-intensive; large simulations cannot run in real time on current CPUs. Many current systems now include powerful Graphics Processing Units (GPUs), which are a good fit for FD methods. We describe an implementation of an FD-based simulation of a two-dimensional membrane that runs efficiently on mid-range GPUs; this will form a framework for constructing a variety of realistic software percussion instruments. For selected problem sizes, real-time sound generation was demonstrated on a mid-range test system, with speedups of up to 2.9× over pure CPU execution.
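A minimal sketch of the kind of explicit finite-difference update involved, assuming a lossy 2-D wave equation on a square grid with clamped edges; the paper's GPU implementation evaluates an equivalent stencil in parallel, and all parameter names here are illustrative.

```python
import numpy as np

def membrane_step(u, u_prev, courant2=0.25, loss=0.001):
    """One explicit finite-difference update of a 2-D membrane.

    u, u_prev : (n, n) displacement at the current and previous time step
    courant2  : (c * dt / h)**2, which must stay <= 0.5 for 2-D stability
    loss      : sigma0 * dt, simple frequency-independent damping
    Edges are clamped to zero. A GPU version runs the same stencil with
    one thread per grid point.
    """
    lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
           np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
    u_next = (2.0 * u - (1.0 - loss) * u_prev + courant2 * lap) / (1.0 + loss)
    u_next[0, :] = u_next[-1, :] = u_next[:, 0] = u_next[:, -1] = 0.0
    return u_next
```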
Keywords
GPU, Physical Modeling, Realtime, Sound, Synthesis
Paper topics
Physical modeling for sound generation
Easychair keyphrases
grid size [23], buffer size [22], finite difference [19], output buffer [18], real time [14], output buffer size [12], audio output buffer [11], pinned memory [11], parallel implementation [8], memory transfer [7], sound synthesis [7], thread block [7], boundary gain [6], col u row [6], finite difference membrane simulation [6], finite difference method [6], serial implementation [6], synthesis algorithm [6], audio output [4], cuda core running [4], difference based sound synthesis [4], general purpose computing [4], powerful graphic processing unit [4], real time sound synthesis [4], simulation grid size [4], test system [4], total execution time [4], varying grid size [4]
Paper type
unknown
DOI: 10.5281/zenodo.849757
Zenodo URL: https://zenodo.org/record/849757
Abstract
The focus of this project is the manipulation of a robotic voice signal for the purpose of adding emotional expression. In particular, the main aim was to design the emotion expressed by a robotic voice by manipulating specific acoustic parameters such as the pitch, amplitude and tempo of the speech. The three basic emotions considered were anger, happiness and sadness. Knowledge based on the analysis of emotional sentences recorded by actors was used to develop a program in Max/MSP to ‘emotionally’ manipulate neutral sentences produced by a Text-To-Speech (TTS) synthesizer. A listening test was created to verify the program's success in simulating different emotions. We found that test subjects could separate the sad sentences from the others, while the discrimination between angry and happy sentences was not as clear.
Keywords
emotion, voice
Paper topics
Digital Audio Effects, Sound/music perception and cognition
Easychair keyphrases
speech rate [17], happy phrase [12], angry phrase [10], sad phrase [10], emotional state [9], basic emotion [8], pitch contour [8], robotic voice [8], fundamental frequency [7], human voice [7], neutral phrase [7], actor performance [6], downward directed pitch contour [6], happiness neutral sadness [6], human speech [5], listening test [5], actor voice [4], chi square [4], emotional expression [4], emotion anger happiness [4], function graph [4], highest intensity score [4], intended emotion [4], maximum amplitude peak [4], speech sound [4], vocal fold [4], voice signal [4]
Paper type
unknown
DOI: 10.5281/zenodo.849759
Zenodo URL: https://zenodo.org/record/849759
Abstract
In this paper, we present an experiment whose goal was to investigate the role of contextual information in the recognition of environmental sounds. 43 subjects participated in a between-subjects experiment in which they were asked to walk on a limited area in a laboratory while the illusion of walking on different surfaces was simulated, with and without an accompanying soundscape. Results show that, in some conditions, adding a soundscape significantly improves surface recognition.
Keywords
soundscapes, walking sounds
Paper topics
Interactive performance systems, Sonic interaction design
Easychair keyphrases
beach sand [18], footstep sound [13], forest underbrush [13], correct answer [9], incoherent soundscape [9], coherent soundscape [7], frozen snow high grass [6], ski slope [6], confusion matrix [5], aalborg university copenhagen [4], beach sand dry [4], creaking wood [4], dry leave [4], gravel concrete carpet ub [4], interactive system [4], know metal dirt [4], medialogy department aalborg [4], snow beach sand [4], snow forest underbrush [4], subject experiment [4], virtual environment [4], virtual reality [4], wood frozen snow [4]
Paper type
unknown
DOI: 10.5281/zenodo.849761
Zenodo URL: https://zenodo.org/record/849761
Abstract
Most automatic chord recognition systems follow a standard approach combining chroma feature extraction, filtering and pattern matching. However, despite much research, there is little understanding about the interaction between these different components, and the optimal parameterization of their variables. In this paper we perform a systematic evaluation including the most common variations in the literature. The goal is to gain insight into the potential and limitations of the standard approach, thus contributing to the identification of areas for future development in automatic chord recognition. In our study we find that filtering has a significant impact on performance, with self-transition penalties being the most important parameter; and that the benefits of using complex models are mostly, but not entirely, offset by an appropriate choice of filtering strategies.
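As an illustration of the post-filtering stage and the self-transition penalty highlighted above (not the paper's exact parameterization), the sketch below runs a Viterbi pass over frame-wise chord log-scores with a single shared change penalty; larger penalties yield smoother, less fragmented chord sequences. Names and the default penalty value are assumptions.

```python
import numpy as np

def decode_chords(log_obs, self_transition_penalty=5.0):
    """Viterbi post-filtering of frame-wise chord scores.

    log_obs : (n_frames, n_chords) log-probability of each chord per
              frame (e.g. from chroma/template or GMM matching).
    Staying on the same chord costs nothing; any chord change pays the
    shared self-transition penalty.
    """
    n_frames, n_chords = log_obs.shape
    change = -self_transition_penalty * (1 - np.eye(n_chords))
    score = log_obs[0].copy()
    back = np.zeros((n_frames, n_chords), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + change             # [previous, current]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]
    path = [int(score.argmax())]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```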
Keywords
Chord Estimation, Chord Recognition, Hidden Markov Model, HMM, MIR
Paper topics
Automatic music transcription, Musical pattern recognition/modeling, Music information retrieval
Easychair keyphrases
chord model [41], chord recognition [20], post filtering [17], pattern matching [15], automatic chord recognition [14], pre filtering [12], transition matrix [12], transition probability [10], full covariance matrix [9], self transition probability [9], binary chord template [7], chord recognition system [7], covariance matrix [7], gmm chord model [7], pitch class [7], transition probability matrix [7], chroma feature [6], chroma feature extraction [6], hidden markov model [6], mixture component [6], probabilistic chord model [6], training data [6], binary template [5], chord template [5], feature extraction [5], frame rate [5], gaussian model [5], major triad [5], minor triad [5], transition penalty [5]
Paper type
unknown
DOI: 10.5281/zenodo.849763
Zenodo URL: https://zenodo.org/record/849763
Abstract
This paper describes the development so far of a system that uses multiparametric controllers along with an interactive high-level search process to navigate timbre spaces. Either of two previously developed interfaces is used as an input device: a hand-tracking system or a malleable foam controller. Both interfaces share the property of streaming continuous, codependent multiparametric data. When these data streams are mapped to synthesis parameters, the controllers can be used to explore the parameter space in an embodied manner: with the hand tracker, moving or changing the shape of the hand changes the sound, and with the foam, deforming its shape changes the sound. The controllers become too sensitive with larger parameter spaces, so a navigation system was developed to enable high-level control over the subset of the parameter space in which the controllers are working. By moving and refining the working range, a timbre space can be progressively explored to find a desired sound. The search process was developed by focusing on three scenarios: the control of four-, ten- and forty-dimensional timbre spaces. Using the system is an interactive process: while one hand is used for detailed search with one of the input devices, the other hand controls high-level search parameters with MIDI and the computer keyboard. Initial reactions from two musicians indicate that the development so far has been successful; the next stage in this project is to carry out formal user studies.
Keywords
control, mapping, synthesis
Paper topics
Gesture controlled audio systems, Interfaces for music creation and fruition, Sonic interaction design
Easychair keyphrases
timbre space [24], timbre space navigation [12], parameter space [8], echo state network [7], computer vision based hand [6], multiparametric controller [6], vision based hand tracking [6], control stream [5], hand tracker [5], search process [5], synthesis parameter [5], dimensional timbre space [4], forty dimensional timbre space [4], hand geometry data [4], hand tracking system [4], high level search [4], malleable foam controller [4], musical control [4], working range [4]
Paper type
unknown
DOI: 10.5281/zenodo.849765
Zenodo URL: https://zenodo.org/record/849765
Abstract
This paper describes a real-time audio analysis/resynthesis system that we developed for a music piece for ensemble and electronics. The system combines real-time audio analysis and concatenative synthesis based on the segmentation of sound streams into their constituent segments and the description of segments by an efficient set of descriptors adapted to the given musical context. The system has been implemented in Max/MSP using the FTM & Co and MuBu libraries and successfully employed in the production and performance of the piece. As in more and more research in the domain of music information retrieval, we use the term typo-morphology to designate the description of sounds by morphological criteria, including the temporal evolution of sound features, which can also provide pertinent means for the classification of sounds. Although the article mainly focuses on the technical aspects of the work, it occasionally contextualizes the different technical choices with respect to particular musical aspects.
Keywords
audio descriptors, audio musaicing, music information retrieval, real-time analysis/resynthesis, typo-morphology
Paper topics
Automatic music generation/accompaniment systems, Interactive performance systems, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
real time [20], sound material [10], pitch content [9], pitch distribution table [9], standard deviation [9], data base [8], sound segment [8], analysis sub system [7], pitch distribution [7], loudness envelope [6], music information retrieval [6], real time analysis [6], real time audio analysis [6], description data [5], descriptor value [5], musical writing [5], audio processing [4], audio stream [4], effective duration [4], envelope skewness [4], harmonicity coefficient [4], onset detection function [4], onset time [4], real time audio processing [4], relaxed real time [4], resynthesis system [4], segment description [4], sound description [4], sound feature [4], sub system [4]
Paper type
unknown
DOI: 10.5281/zenodo.849767
Zenodo URL: https://zenodo.org/record/849767
Abstract
This paper discusses a system capable of detecting the position of the listener through a head-tracking system and rendering a 3D audio environment by binaural spatialization. Head tracking is performed through face recognition algorithms which use a standard webcam, and the result is presented over headphones, as in other typical binaural applications. With this system, users can choose an audio file to play and provide a virtual position for the source in a Euclidean space, and then listen to the sound as if it were coming from that position. If they move their head, the signal provided by the system changes accordingly in real time, thus providing a realistic effect.
Keywords
3d audio, binaural spatialization, headtracking, real-time
Paper topics
3D sound/music, Sonic interaction design, Sound/music signal processing algorithms
Easychair keyphrases
binaural spatialization [10], head tracking system [9], real time [9], impulse response [7], head tracker [6], sound source [6], virtual reality [6], cipic database [4], head tracking [4], immersive experience [4]
Paper type
unknown
DOI: 10.5281/zenodo.849769
Zenodo URL: https://zenodo.org/record/849769
Abstract
In audio-based music similarity, a well-known effect is the existence of hubs, i.e. songs which appear similar to many other songs without showing any meaningful perceptual similarity. We show that this effect depends on the homogeneity of the samples under consideration. We compare three small sound collections (consisting of polyphonic music, environmental sounds, and samples of individual musical instruments) with regard to their hubness. We find that the collection consisting of cleanly recorded musical instruments produces the smallest hubs, whereas hubness increases with the inhomogeneity of the audio signals. We also conjecture that hubness may have an impact on the performance of dimensionality reduction algorithms like Multidimensional Scaling.
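Hubness is commonly quantified by k-occurrence, i.e. how often a song appears among the k nearest neighbours of the other songs; the sketch below computes that count from a generic distance matrix and is illustrative rather than the authors' code.

```python
import numpy as np

def k_occurrence(distance_matrix, k=10):
    """Count how often each song appears in other songs' k-nearest lists.

    distance_matrix : (n, n) pairwise audio dissimilarities
                      (e.g. symmetrized Kullback-Leibler divergences).
    Songs with a very high count are the 'hubs' discussed above.
    """
    n = distance_matrix.shape[0]
    D = distance_matrix.astype(float).copy()
    np.fill_diagonal(D, np.inf)                   # a song is not its own neighbour
    neighbours = np.argsort(D, axis=1)[:, :k]
    return np.bincount(neighbours.ravel(), minlength=n)
```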
Keywords
Content-Based, Hubs, Interaction, Music Similarity, Visualization
Paper topics
Music information retrieval, Visualization of sound/music data
Easychair keyphrases
data base [9], kullback leibler divergence [9], music information retrieval [9], musical instrument [8], frequency cepstral coefficient [7], multidimensional scaling [7], music collection [7], sound sample [7], audio similarity [6], environmental sound [5], hub problem [5], mel frequency [5], austrian research institute [4], data set [4], sound texture [4]
Paper type
unknown
DOI: 10.5281/zenodo.849771
Zenodo URL: https://zenodo.org/record/849771
Abstract
Even if it lasts less than three minutes, Iannis Xenakis' Concret PH is one of the most influential works in the electroacoustic domain. It was originally created to be diffused in the Philips Pavilion, designed by Xenakis himself for the 1958 World Fair in Brussels. As the Pavilion was dismantled in 1959, the original spatialization design devised for the Pavilion has been lost. The paper presents new findings about the spatialization of Concret PH, discusses them in the light of Xenakis' aesthetics, and consequently proposes a plausible reconstruction of the spatialization design. Finally, it proposes a real-time, interactive implementation of the reconstructed spatialization, rendered on an 8-channel setup using the VBAP technique.
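A minimal sketch of standard 2-D VBAP gain computation for a ring of loudspeakers, as one way to realise the panning mentioned above; the azimuth-only formulation, the pair search and the constant-power normalisation are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def vbap_2d(source_az_deg, speaker_az_deg):
    """Return one gain per loudspeaker for a source on the horizontal plane."""
    s = np.radians(source_az_deg)
    p = np.array([np.cos(s), np.sin(s)])            # source unit vector
    azs = np.radians(np.asarray(speaker_az_deg, dtype=float))
    gains = np.zeros(len(azs))
    order = np.argsort(azs)
    for k in range(len(order)):                      # try each adjacent pair
        i, j = order[k], order[(k + 1) % len(order)]
        L = np.column_stack([[np.cos(azs[i]), np.sin(azs[i])],
                             [np.cos(azs[j]), np.sin(azs[j])]])
        g = np.linalg.solve(L, p)                    # p = L @ g
        if np.all(g >= -1e-9):                       # source lies between this pair
            g = np.clip(g, 0, None)
            gains[[i, j]] = g / np.linalg.norm(g)    # constant-power normalisation
            break
    return gains

# e.g. vbap_2d(20.0, [0, 45, 90, 135, 180, 225, 270, 315]) for an 8-channel ring
```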
Keywords
Concret PH, Electroacoustic music, Poème électronique, Spatialization, Xenakis
Paper topics
3D sound/music, access and modelling of musical heritage, Interactive performance systems, Technologies for the preservation
Easychair keyphrases
philips pavilion [14], computer music [9], control engine [8], audio engine [6], iannis xenaki [6], interlude sonore [6], sound route [6], sound source [6], virtual space [6], electronic music [5], real time [5], vep project [5], internal buss [4], philips technical review [4], sound material [4], third dimension [4], vbap algorithm [4], virtual reality [4], world fair [4]
Paper type
unknown
DOI: 10.5281/zenodo.849773
Zenodo URL: https://zenodo.org/record/849773
Abstract
An Augmented Music Score is a graphic space providing the representation, composition and manipulation of heterogeneous music objects (music scores but also images, text, signals...), both in the graphic and time domains. In addition, it supports the representation of the music performance, considered as a specific sound or gestural instance of the score. This paper presents the theoretical foundation of the augmented music score as well as an application - an augmented score viewer - that implements the proposed solutions.
Keywords
graphic signal, music score, segmentation, synchronization
Paper topics
Visualization of sound/music data
Easychair keyphrases
graphic signal [22], music score [20], graphic segment [12], augmented music score [11], constant color signal [9], fundamental frequency [9], performance representation [9], graphic space [7], time space [7], music notation [6], augmented score [5], color signal [5], computer music [5], graphic representation [5], message string [5], music representation [5], osc address [5], score component [5], time position [5], time segment [5], augmented music score viewer [4], augmented score viewer [4], constant color signal figure [4], constant thickness signal kc [4], first order music score [4], international computer music [4], music time [4], osc message [4], time graphic [4], time mapping [4]
Paper type
unknown
DOI: 10.5281/zenodo.849775
Zenodo URL: https://zenodo.org/record/849775
Abstract
Conceptual musical works that lead to a multitude of realizations are of special interest. One cannot talk about a performance without taking into account the rules that lead to the existence of that particular presentation. After dealing with similar works of open form by Iannis Xenakis and Karlheinz Stockhausen, the interest in John Cage’s music is evident. His works are “so free” that one can play any part of the material; even an empty set is welcomed. The freedom is maximal, and still there are decisions to consider in order to perform the piece. The present article focuses on the Concert for Piano and Orchestra of 1957–58 and is part of the Cagener project, which is intended to develop a set of conceptual and software tools that generate a representation of the pieces and assist the performers in their task. The computer serves as a partner in making choices among multiple possibilities, mixing together sounds from different sources and of various kinds, and following clearly stated compositional ideas.
Keywords
Computer Aided Performance, Musical Analysis, Musical Modelling
Paper topics
Automatic music generation/accompaniment systems, Computational musicology, Interfaces for music creation and fruition
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.849777
Zenodo URL: https://zenodo.org/record/849777
Abstract
A range of systems exist for collaborative music making on multi-touch surfaces. Some of them have been highly successful, but currently there is no systematic way of designing them to maximize collaboration for a particular user group. We are particularly interested in systems that will engage both novices and experts. We designed a simple application in an initial attempt to clearly analyze some of the issues. Our application allows groups of users to express themselves in collaborative music making using pre-composed materials. User studies were video recorded and analyzed using two techniques derived from Grounded Theory and Content Analysis. A questionnaire was also conducted and evaluated. Findings suggest that the application affords engaging interaction. Enhancements for collaborative music making on multi-touch surfaces are discussed. Finally, future work on the prototype is proposed to maximize engagement.
Keywords
collaborative music, engagement, evaluation, multi-touch surfaces, musical interface
Paper topics
Interactive performance systems, Interfaces for music creation and fruition
Easychair keyphrases
multi touch [20], multi touch system [15], collaborative music making [14], multi user instrument [14], see quote [11], open coding [8], tangible and embedded interaction [8], content analysis [7], exploratory multi case study [6], multi touch surface [6], music making [6], social interaction [6], collaborative interaction [5], creative engagement [5], structured task [5], visual feedback [5], continuous action [4], grounded theory [4], local control [4], musical expression [4], musical task [4], music collaboration [4], music making task [4], public space [4], structured coding [4], user study [4]
Paper type
unknown
DOI: 10.5281/zenodo.849779
Zenodo URL: https://zenodo.org/record/849779
Abstract
The orchestral timpani are a key component of western classical music, although their weight, size, and fragility make their transportation very difficult. Current commercial software synthesizers for the orchestral timpani are primarily sample-based and work with a MIDI keyboard, giving the user control over note amplitude and pitch. Our approach implements a virtual five-piece set of orchestral timpani, controlled using a pressure-sensitive graphics tablet. A brief analysis of the mechanics and playing techniques of the timpani is presented, followed by their approximation through the model’s control scheme and sound engine. The details of the model’s implementation are then explained, and finally the results of the model are presented along with conclusions on the subject.
Keywords
Graphics tablet, Real-time Interaction, Sample-based synthesis, Timpani
Paper topics
Gesture controlled audio systems, Interactive performance systems, Interfaces for music creation and fruition, Physical modeling for sound generation, Sonic interaction design, Sound/music signal processing algorithms
Easychair keyphrases
orchestral timpani [8], percussion instrument [8], physical modeling [7], real time [7], control scheme [6], dynamic value [5], sound synthesis [5], tonal range [5], crossfading algorithm [4], digital waveguide mesh [4], digitizer tablet [4], high note [4], hit coordinate [4], hit position [4], low note [4], mallet type [4], physical model [4], sample based synthesis [4]
Paper type
unknown
DOI: 10.5281/zenodo.849781
Zenodo URL: https://zenodo.org/record/849781
Abstract
Understanding the gap between a musical score and a real performance of that score is still a challenging problem. To tackle this broad problem, researchers focus on specific instruments and/or musical styles. Hence, our research is focused on the study of the classical guitar and aims at designing a system able to model the use of the expressive resources of that instrument. Thus, one of the first goals of our research is to provide a tool able to automatically identify expressive resources in the context of real recordings. In this paper we present some preliminary results on the identification of two classical guitar articulations from a collection of chromatic exercises recorded by a professional guitarist. Specifically, our system combines several state-of-the-art analysis algorithms to distinguish between two similar left-hand articulations, legato and glissando. We report some experiments and analyze the results achieved with our approach.
Keywords
expressivity, feature extraction, guitar
Paper topics
Musical performance modeling
Easychair keyphrases
expressive articulation [18], sax representation [13], envelope approximation [12], classical guitar [7], left hand [7], high frequency content [6], step size [6], expressive resource [5], fundamental frequency [5], note onset [5], peak location [5], peak position [4], plucking onset [4], region extraction module [4], symbolic aggregate approximation [4], system able [4]
Paper type
unknown
DOI: 10.5281/zenodo.849783
Zenodo URL: https://zenodo.org/record/849783
Abstract
Spatial movement has been used by composers as a musical parameter (intention), and this paper focuses on the reception of spatial patterns by the audience. We present the results of a series of perception experiments in which a total of N=118 listeners had to recognize simple rhythm patterns based on the left-right movements of 7 different sound types. The stimuli varied in harmonicity (HNR), temporal intensity variation, spectral distribution, movement continuity and tempo. Listening conditions included stereo loudspeaker open-field listening and headphone listening. Results show that recognition is globally low, considering the simplicity of the pattern recognition task. The factor that most perturbed recognition is the intensity variation, with completely unvarying sounds yielding better results, and this was more important than the listening condition. We conclude that spatial sound movement is not suitable as a compositional element for normally complex music, but it can be recognized by untrained listeners using stable sounds and simple patterns.
Keywords
electroacoustic music, sound movement, sound synthesis, spatial perception
Paper topics
3D sound/music, Data sonification, Sound/music perception and cognition
Easychair keyphrases
listening condition [21], rhythm pattern [15], intensity variation [11], rhythmic pattern [9], recognition score [7], sound stimulus [7], spatial movement [7], discontinuous evolution [6], basic sound [5], loudspeaker listening [5], movement continuity [5], recognition rate [5], regression analysis [5], left right movement [4], mean recognition score [4], most experiment [4], noise ratio [4], pattern complexity [4], pattern recognition score [4], right channel [4], sound spatial perception [4], spatial movement pattern [4], spatial sound [4], spatial trajectory [4], square signal sound [4], synthetic sound [4], temporal intensity variation [4]
Paper type
unknown
DOI: 10.5281/zenodo.849785
Zenodo URL: https://zenodo.org/record/849785
Abstract
We propose two novel lyrics-to-audio alignment methods which make use of additional chord information. In the first method we extend an existing hidden Markov model (HMM) for lyrics alignment by adding a chord model based on the chroma features often used in automatic audio chord detection. However, the textual transcriptions found on the Internet usually provide chords only for the first among all verses (or choruses, etc.). The second method we propose is therefore designed to work on these incomplete transcriptions by finding a phrase-level segmentation of the song using the partial chord information available. This segmentation is then used to constrain the lyrics alignment. Both methods are tested against hand-labelled ground truth annotations of word beginnings. We use our first method to show that chords and lyrics complement each other, boosting accuracy from 59.1% (only chroma feature) and 46.0% (only phoneme feature) to 88.0% (0.51 seconds mean absolute displacement). Alignment performance decreases with incomplete chord annotations, but we show that our second method compensates for this information loss and achieves an accuracy of 72.7%.
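A minimal sketch of the shared machinery such an approach relies on: a left-to-right forced alignment (Viterbi) over a weighted combination of two per-frame observation log-likelihoods, one from a phoneme model and one from a chroma/chord model. The equal weighting, the uniform self-transition probability and the array layout are assumptions for illustration, not the paper's actual HMM.

```python
import numpy as np

def forced_alignment(loglik_phoneme, loglik_chroma, weight=0.5, stay_logp=np.log(0.9)):
    """loglik_*: (n_states, n_frames) per-frame log-likelihoods for the
    ordered word/chord states. Assumes n_frames >= n_states.
    Returns one state index per frame (the Viterbi path)."""
    ll = weight * loglik_chroma + (1.0 - weight) * loglik_phoneme
    n_states, n_frames = ll.shape
    move_logp = np.log1p(-np.exp(stay_logp))          # log(1 - p_stay)
    delta = np.full((n_states, n_frames), -np.inf)
    psi = np.zeros((n_states, n_frames), dtype=int)
    delta[0, 0] = ll[0, 0]                            # must start in the first state
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = delta[s, t - 1] + stay_logp
            move = delta[s - 1, t - 1] + move_logp if s > 0 else -np.inf
            psi[s, t] = s if stay >= move else s - 1
            delta[s, t] = max(stay, move) + ll[s, t]
    path = [n_states - 1]                             # alignment ends in the last state
    for t in range(n_frames - 1, 0, -1):
        path.append(psi[path[-1], t])
    return path[::-1]
```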
Keywords
alignment, chords, hidden Markov models, lyrics, structural segmentation
Paper topics
Automatic music transcription, Multimodality in sound and music computing, Musical pattern recognition/modeling, Sound/music signal processing algorithms
Easychair keyphrases
chord information [36], song segment [17], complete chord information [11], phoneme feature [11], phrase level segmentation [11], vocal activity detection [11], audio alignment [10], baseline method [10], lyric alignment [10], lyric line [8], absolute displacement [7], chroma feature [7], complete chord [7], music information retrieval [7], phrase level [7], chord information recovery [6], chord model [6], hidden markov model [6], large scale alignment [6], missing chord [5], music computing [5], chord progression model [4], existing hidden markov model [4], ground truth [4], information recovery method [4], mean absolute displacement [4], partial chord information [4], real world [4], short pause state [4], song segment type [4]
Paper type
unknown
DOI: 10.5281/zenodo.849787
Zenodo URL: https://zenodo.org/record/849787
Abstract
This paper addresses one aspect of human music cognition, which is the recollection of melodic sequences stored in short-term memory, and the manipulation of such items in working memory, by measuring spans of successfully recalled melodic sequences. In order to avoid long-term memory collaboration in this task, short-term memory measurements are made using randomly-generated melodic sequences, which in turn may sound difficult and unfamiliar to many experimental subjects. We investigate the dependence of melodic span measures on such aspects as familiarity and difficulty, both in direct-order recalling (as it relates to short-term memory capacity) and in inverse-order recalling (as it relates to working memory capacity). We also discuss the relation of these measurements to cognitive models of short-term and working memory for verbal and melodic material.
Keywords
Automatic voice transcription, Cognitive memory models, Musical memory
Paper topics
Automatic music transcription, Computational musicology, Sound/music and Neuroscience, Sound/music perception and cognition
Easychair keyphrases
span measure [29], working memory [26], melodic span [20], melodic sequence [16], melodic span measure [14], short term memory [12], numerical sequence [10], term memory [9], working memory model [9], digit span [8], numerical span [8], direct order [7], forward melodic span [7], long term memory [7], numerical span measure [7], phonological loop [7], span test [7], backward melodic span [6], backward span measure [6], distinct tone [6], inverse order [6], short term [6], span indices [6], statistical analysis [6], chromatic scale [5], forward span [5], significance level [5], underlying mechanism [5], human music cognition [4], quarter tone scale [4]
Paper type
unknown
DOI: 10.5281/zenodo.849789
Zenodo URL: https://zenodo.org/record/849789
Abstract
In this paper we present the description and the first results of a pilot experiment in which participants were requested to mimic the production of sonic elements through different control modalities. Results show different degrees of dependence of the control temporal profiles on the dynamic level and temporal ordering of the stimuli. The protocol and methodology advanced here may prove useful for improving existing mapping strategies for gesture-based interactive media, with particular emphasis on adaptive control of physics-based models for sound synthesis.
Keywords
control, gesture, movement
Paper topics
Interfaces for music creation and fruition, Music and robotics, Sonic interaction design, Sound/music perception and cognition
Easychair keyphrases
control modality [13], arm movement [6], control profile [6], velocity profile [6], loudness profile [5], standard deviation [5], isolated note [4], motor control [4], musical domain [4], peak velocity [4], pilot experiment [4], position control [4], temporal profile [4]
Paper type
unknown
DOI: 10.5281/zenodo.849791
Zenodo URL: https://zenodo.org/record/849791
Abstract
Music performance is the process of embodying musical ideas in concrete sound, giving expression to tempo and dynamics and articulation to each note. Human competence in music performance rendering can be enhanced and fostered by using computers to supplement a lack of performance skill and musical knowledge. This paper introduces a performance design environment called Mixtract, which assists users in designing “phrasing,” and a performance design guideline called the Hoshina-Mixtract method, executable on Mixtract. Mixtract provides its users with a function for assisting in the analysis of phrase structure and a function to show the degree of importance of each note in a phrase group. We verified that the proposed system and method helped seven children to externalize their musical thought and to transform their subjective musical thoughts into objective ones.
Keywords
application to music education, design supporting system, musical phrase expression, performance rendering
Paper topics
Interfaces for music creation and fruition, Musical performance modeling, Sonic interaction design, Visualization of sound/music data
Easychair keyphrases
expression curve [19], phrase structure [17], performance design [12], phrase expression [12], primary phrase line [11], hierarchical phrase structure analysis [8], phrase boundary [8], apex note [7], junior high school student [6], expression design [5], musical performance [5], musical thought [5], performance rendering [5], real time [5], apex probability viewer [4], beat tapping or conducting [4], expression mark [4], hoshina theory [4], musical competence [4], musical expression [4], music performance [4], onset time [4], phrase structure analysis [4], piano roll [4], score time [4], tacit knowledge [4], tempo curve [4]
Paper type
unknown
DOI: 10.5281/zenodo.849793
Zenodo URL: https://zenodo.org/record/849793
Abstract
Sometimes users of a music retrieval system are not able to explicitly state what they are looking for. They rather want to browse a collection in order to get an overview and to discover interesting content. A common approach for browsing a collection relies on a similarity-preserving projection of objects (tracks, albums or artists) onto the (typically two-dimensional) display space. Inevitably, this implicates the use of dimension reduction techniques that cannot always preserve neighborhood and thus introduce distortions of the similarity space. This paper describes ongoing work on MusicGalaxy -- an interactive user-interface based on an adaptive non-linear multi-focus zoom lens that alleviates the impact of projection distortions. Furthermore, the interface allows manipulation of the neighborhoods as well as the projection by weighting different facets of music similarity. This way the visualization can be adapted to the user's way of exploring the collection. Apart from the current interface prototype, findings from early evaluations are presented.
Keywords
adaptivity, exploration, neighborhood-preserving projection, user-interface
Paper topics
Music information retrieval, Visualization of sound/music data
Easychair keyphrases
primary focus [11], secondary focus [11], fish eye lens [9], facet distance [8], user interface [7], aggregated distance metric [6], music collection [6], springlen mesh overlay [6], distance metric [5], feature space [5], multidimensional scaling [5], nearest neighbor [5], album cover [4], facet distance cuboid [4], high dimensional feature space [4], music information retrieval [4], neighborhood preserving projection [4], overview window [4], right mouse button [4]
Paper type
unknown
DOI: 10.5281/zenodo.849795
Zenodo URL: https://zenodo.org/record/849795
Abstract
New cloud computing approaches open a new paradigm for music composition. Our music composing system is now distributed on the Web, shaping what we call a Computer Music Cloud (CMC). This approach benefits from the technological advantages of distributed computing and from the possibility of implementing specialized and independent music services which may in turn be part of multiple CMCs. The music representation used in a CMC plays a key role in successful integration. This paper analyses the requirements for an efficient music representation for CMC composition: high music representativity, database storage, and a textual form. Finally, it focuses on the textual shape, presenting MusicJSON, a format for music information interchange among the different services composing a CMC. MusicJSON and the database-shaped representation, both based on a well-tested, sound and complete music representation, offer an innovative proposal for music cloud representation.
Keywords
cloud computing, music composition, music interchange format, music representation, music web services
Paper topics
Computational musicology, Computer environments for sound/music processing, Web 2.0 and music
Easychair keyphrases
music object [15], music representation [12], music composition [9], extended reference [8], storage service [8], web application [8], computer music cloud [7], music service [7], cloud computing [6], evmusic representation [6], tree structure [6], music element [5], textual form [5], composition environment [4], database content [4], external object [4], musicjson format [4], shared music [4], textual representation [4], traditional notation [4], user interface [4], user library [4], valuable feature [4]
Paper type
unknown
DOI: 10.5281/zenodo.849801
Zenodo URL: https://zenodo.org/record/849801
Abstract
The interaction between composers and performers has recently acquired new challenges with the advent of real-time scores. Such systems enable new approaches to composition and performance by introducing new possibilities and constraints. Õdaiko is a real-time graphical score generator that involves a composer playing live electronic music, an assistant to the composer generating the scores, and finally the performer(s). In this paper, I present Õdaiko, focusing on its implementations and the related composer-assistant-performer interactions as a basis for further development.
Keywords
Interaction, Real-Time Score Generation, Rhythm
Paper topics
Interactive performance systems, Musical performance modeling
Easychair keyphrases
real time [25], electronic music [12], live electronic music [6], real time score [6], musical event [5], solo panel [5], pre composed guidance score [4], real time graphical score [4], real time score generation [4], real time score generator [4], sectional area [4], sieve panel [4], vertical barline [4]
Paper type
unknown
DOI: 10.5281/zenodo.849807
Zenodo URL: https://zenodo.org/record/849807
Abstract
Modeling of musical style and stylistic re-injection strategies based on the recombination of learned material have already been elaborated in music machine improvisation systems. Case studies have shown that content-dependent regeneration strategies have great potential for a broad and innovative sound rendering. We are interested in the study of the principles under which stylistic reinjection could be sufficiently controlled, in other words, a framework that would permit the person behind the computer to guide the machine improvisation process under a certain logic. In this paper we analyze this three-party interaction scheme among the instrument player, the computer and the computer user. We propose a modular architecture for Computer Assisted Improvisation (CAO). We express stylistic reinjection and music sequence scheduling concepts in a formalism based on graph theory. With the help of these formalisms we then study a number of problems concerning temporal and qualitative control of pattern generation by stylistic re-injection. Finally we discuss the integration of these concepts into a real-time environment for computer improvisation, under the name GrAIPE.
Keywords
graph theory, improvisation, interaction, sequence scheduling, stylistic modeling
Paper topics
Automatic music generation/accompaniment systems, Interactive performance systems, Musical performance modeling
Easychair keyphrases
computer assisted improvisation [17], computer user [17], real time [17], music sequence [12], music sequence scheduling [12], interaction scheme [11], machine improvisation [10], stylistic reinjection [10], improvisation system [9], party interaction [9], computer music [8], graph theory [8], real time system [7], sequence scheduling [7], instrument player [6], musical event [6], party interaction scheme [6], shortest path problem [6], computer improvisation [5], high level [5], shortest path [5], stylistic learning [5], countably infinite set [4], define music sequence [4], instrument player computer [4], international computer music [4], music sequence matching [4], sequence scheduling problem [4], short term memory [4], short time memory [4]
Paper type
unknown
DOI: 10.5281/zenodo.849803
Zenodo URL: https://zenodo.org/record/849803
Abstract
Composition is viewed as a process that has its own temporal dimension. This process can sometimes be highly non-linear and sometimes is carried out in real time during a performance. A model is proposed that unifies creational and performance time and that traces the history of the creation of a piece. This model is based on a transformation that enhances data structures to become persistent. Confluent persistence allows navigation to any previous version of a piece, the creation of version branches at any point, and the combination of different versions with each other. This concept is tuned to integrate two important aspects, retroactivity and multiplicities. Three representative problems are posed: how to define dependencies on entities that change over time, how to introduce changes ex post that affect future versions, and how to continue working on parallel versions of a piece. Solutions based on our test implementation in the Scala language are presented. Our approach opens new possibilities in the area of music analysis and can conflate disparate notions of composition such as tape composition, interactive sound installation, and live improvisation. They can be represented by the same data structure, and both offline and real-time manipulations happen within the same transactional model.
Keywords
Data Structures, Musical Composition, Temporality, Versioning
Paper topics
Computer environments for sound/music processing
Easychair keyphrases
data structure [17], compressed path [5], confluent persistence [5], pre order [5], access path [4], compositional process [4], ephemeral data structure [4], fat node [4], neutral vertex v2 [4], pointer field [4], post order [4], retroactive data structure [4], version graph [4]
Paper type
unknown
DOI: 10.5281/zenodo.849805
Zenodo URL: https://zenodo.org/record/849805
Abstract
We present a method to generate human-like performance expression for polyphonic piano music. Probabilistic models and machine learning techniques have been successfully applied to the problem of generating human-like expressive performance given a music score. In the case of polyphonic music, however, it was difficult to make models tractable and a huge amount of training data was necessary, because performance contexts and the relationships of performance expressions are very complex. To overcome these problems, we propose a method that combines probabilistic models for melody and harmony. The experimental results show that the proposed method was able to generate fluctuations of performance expression parameters for polyphonic piano music much as human performers do. The results of the subjective evaluations are also reported, which indicate that the generated performances sounded human-like and had a certain degree of musicality.
Keywords
Machine learning, Music performance, Polyphonic music
Paper topics
Automatic music generation/accompaniment systems, Musical performance modeling
Easychair keyphrases
performance expression [93], polyphonic piano music [31], performance expression parameter [28], human performance expression [25], performed note duration [23], upper and lower outer voice [16], onset time [12], outer voice [12], harmony model [10], piano sonata [10], polyphonic piano [10], probabilistic model [10], score feature [10], music performance [9], training data [8], generate performance [7], human likeness [7], performance context [7], global expression [6], human music performance [6], instantaneous tempo [6], local expression [6], machine learning [6], melody model [6], mozarts piano sonata [6], performed duration [6], piano sonata kv331 [6], subjective evaluation [6], expression mark [5], music score [5]
Paper type
unknown
DOI: 10.5281/zenodo.849809
Zenodo URL: https://zenodo.org/record/849809
Abstract
This paper focuses on phaseshaping techniques and their relation to classical abstract synthesis methods. Elementary polynomial and geometric phaseshapers, such as those based on the modulo operation and linear transformations, are investigated. They are then applied to the generation of classic and novel oscillator effects by using nested phaseshaping compositions. New oscillator algorithms introduced in this paper include single-oscillator hard sync, triangle modulation, efficient supersaw simulation, and sinusoidal waveshape modulation effects. The digital waveforms produced with phaseshaping techniques are generally discontinuous, which leads to aliasing artifacts. Aliasing can be effectively reduced by modifying samples around each discontinuity using the previously proposed polynomial bandlimited step function (polyBLEP) method.
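As an illustration of the aliasing-reduction step mentioned above, here is a sketch of a polyBLEP-corrected sawtooth oscillator using the commonly cited two-sample polynomial residual; the paper's phaseshaper compositions themselves (hard sync, supersaw, etc.) are not reproduced, and all parameter values are illustrative.

```python
import numpy as np

def poly_blep(t, dt):
    """Two-sample polynomial band-limited step residual around the phase wrap."""
    if t < dt:                      # just after the discontinuity
        t /= dt
        return t + t - t * t - 1.0
    if t > 1.0 - dt:                # just before the discontinuity
        t = (t - 1.0) / dt
        return t * t + t + t + 1.0
    return 0.0

def saw_polyblep(freq, sr=44100, n=44100):
    dt = freq / sr                  # phase increment per sample
    phase, out = 0.0, np.zeros(n)
    for i in range(n):
        naive = 2.0 * phase - 1.0   # trivial (aliasing) sawtooth
        out[i] = naive - poly_blep(phase, dt)
        phase = (phase + dt) % 1.0
    return out
```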
Keywords
acoustic signal processing, audio oscillators, music, phaseshaping, signal synthesis
Paper topics
Digital Audio Effects, Sound/music signal processing algorithms
Easychair keyphrases
phase signal [28], audio engineering society [11], phaseshaper entity [8], triangle modulation [8], variable slope phase signal [8], modulo operation [7], variable slope [7], duty width [6], sinusoidal waveshaper [6], unipolar modulo counter signal [6], elementary phaseshaper [5], nested phaseshaping [5], phase cycle [5], computer music [4], fractional period phase [4], fractional period phase signal [4], fractional phase period [4], linear transformation [4], oscillator algorithm [4], rectangular signal [4], sinusoidal waveshape modulation effect [4], slave oscillator [4], supersaw simulation [4], thick line [4], thin line [4], triangular fractional period phase [4], trivial single oscillator softsync [4], variable slope triangular phase signal [4], variable width pulse signal [4], virtual analog [4]
Paper type
unknown
DOI: 10.5281/zenodo.849811
Zenodo URL: https://zenodo.org/record/849811
Abstract
This paper describes the development of a set of electronic music instruments (PHOXES), which are based on physical modeling sound synthesis. The instruments are modular, meaning that they can be combined with each other in various ways in order to create richer systems, challenging both the control and perception, and thereby also the sonic potential of the models. A method for evaluating the PHOXES has been explored in the form of a pre-test where a test subject borrowed the instrument for a period of 10 days. The longer test period makes way for a more nuanced qualitative evaluation of how such instruments might be integrated into workflows of real world users.
Keywords
Control Structures, Electronic Music Instruments, Exploration, Mapping, Physical Modeling
Paper topics
Computer environments for sound/music processing, Physical modeling for sound generation, Sonic interaction design
Easychair keyphrases
physical model [26], excitation controller [12], test subject [12], physical modeling [9], turbulence model [9], particle phox [8], physical modeling sound synthesis [8], friction phox [7], tube phox [7], computer music [6], drum phox [6], phox excitation control [6], test period [6], flute controller [5], phoxe system [5], pre test [5], amplified low pressure sensor [4], commercial electronic music instrument [4], electronic musician [4], excitation gesture [4], input device [4], international computer music association [4], musical expression [4], pragmatic quality [4], sonic potential [4]
Paper type
unknown
DOI: 10.5281/zenodo.849813
Zenodo URL: https://zenodo.org/record/849813
Abstract
It is known that one of the most important tasks in music post-production is equalization. Equalization can be applied in several ways, but one of the main purposes it serves is masking minimization. This is done so that the listener can appreciate the timbral qualities of all instruments within a musical mix. However, the study of masking between the different instruments of a multi-track mix has not received a lot of attention, and a quantitative measure based on perceptual studies has not yet been proposed. This paper presents such a measure, along with a study of masking between several common instruments. The measure proposed (cross-adaptive signal-to-masker ratio) is intended to serve as an analysis tool to be used by audio engineers when trying to combat masking using their preferred equalization techniques.
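A hedged sketch of what a per-band signal-to-masker ratio between a target track and the rest of the mix could look like. The paper's measure is built on auditory-filter excitation patterns; the plain FFT band energies, band edges and single-frame analysis below are simplifying assumptions used only to illustrate the cross-adaptive idea.

```python
import numpy as np

def band_smr_db(target, others, sr=44100, n_fft=4096, bands=None):
    """target: one track's frame (1-D array); others: list of the remaining
    tracks' frames (the maskers). Returns one SMR value in dB per band."""
    if bands is None:                               # assumed band edges in Hz
        bands = [(100, 300), (300, 1000), (1000, 3000), (3000, 8000)]
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    T = np.abs(np.fft.rfft(target, n_fft)) ** 2     # target power spectrum
    M = np.abs(np.fft.rfft(np.sum(others, axis=0), n_fft)) ** 2  # masker spectrum
    smr = []
    for lo, hi in bands:
        idx = (freqs >= lo) & (freqs < hi)
        smr.append(10 * np.log10((T[idx].sum() + 1e-12) / (M[idx].sum() + 1e-12)))
    return smr
```

Low or negative values in a band would flag frequency regions where the target is likely masked and equalization could help.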
Keywords
Auditory Masking, Content-based Processing, Music post-production
Paper topics
Digital Audio Effects, Sound/music perception and cognition, Sound/music signal processing algorithms
Easychair keyphrases
auditory filter [19], electric guitar [19], audio engineer [11], multi track recording [11], audio engineering society [9], excitation pattern [9], masker ratio [9], lower band [7], masking coefficient [7], sustained trumpet note [7], th convention paper [6], guitar strum [5], masking minimization [5], adaptive digital audio effect [4], analysis stage [4], analysis tool [4], auditory filter bandwidth [4], cross adaptive [4], cross adaptive smr [4], higher band [4], multi track [4], relative level [4], rhythm electric guitar [4], several common instrument [4]
Paper type
unknown
DOI: 10.5281/zenodo.849815
Zenodo URL: https://zenodo.org/record/849815
Abstract
Voiced vowel production in human speech depends both on oscillation of the vocal folds and on the vocal tract shape, the latter contributing to the appearance of formants in the spectrum of the speech signal. Many speech synthesis models use a feed-forward source-filter model, where the magnitude frequency response of the vocal tract is approximated with sufficient accuracy by the spectral envelope of the speech signal. In this research, a method is presented for real-time estimation of the vocal tract area function from the recorded voice by matching spectral formants to those in the output spectra of a piecewise cylindrical waveguide model having various configurations of cross-sectional area. When a match is found, the formants are placed into streams so their movement may be tracked over time and unintended action such as dropped formants or the wavering of an untrained voice may be accounted for. A parameter is made available to adjust the algorithm’s sensitivity to change in the produced sound: sensitivity can be reduced for novice users and later increased for estimation of more subtle nuances.
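For reference, a sketch of the standard Kelly-Lochbaum relation linking adjacent cross-sectional areas of a piecewise cylindrical tube to its junction reflection coefficients, the building block of such a waveguide vocal-tract model (sign conventions vary across formulations); the paper's formant-matching search itself is not shown.

```python
import numpy as np

def reflection_coefficients(areas):
    """areas: cross-sectional areas of the tube sections from glottis to lips.
    Returns one reflection coefficient per junction between adjacent sections."""
    a = np.asarray(areas, dtype=float)
    return (a[:-1] - a[1:]) / (a[:-1] + a[1:])

# e.g. reflection_coefficients([2.0, 1.5, 1.5, 3.0]) -> one value per junction
```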
Keywords
Musical Controller, Pure Data Object, Vocal Tract Extraction
Paper topics
Gesture controlled audio systems, Interactive performance systems, Sound/music signal processing algorithms
Easychair keyphrases
vocal tract [39], vocal tract shape [33], vocal tract area function [16], cross sectional area [15], formant peak [13], formant stream [10], tract shape [10], spectral envelope [8], real time [7], vocal tract model [7], vowel sound [7], vocal tract shape estimation [6], dropped formant [5], reflection coefficient [5], speech signal [5], amplitude envelope [4], area function [4], detected formant [4], feed forward source filter [4], formant frequency [4], forward source filter model [4], minimum action [4], musical control [4], piecewise cylindrical waveguide model [4], real time musical control [4], sample frame [4], second derivative [4], untrained voice [4], vocal tract transfer function [4]
Paper type
unknown
DOI: 10.5281/zenodo.849817
Zenodo URL: https://zenodo.org/record/849817
Abstract
In the field of audio restoration, the most popular method is Short Time Spectral Attenuation (STSA). Although this method reduces the noise and improves the SNR, it tends to introduce signal distortion and a residual noise called musical noise (a tonal, random, isolated, time-varying noise). This work presents a new audio restoration algorithm based on Non-negative Matrix Factorization (NMF) with a noise suppression rule that introduces the masking phenomenon of human hearing to calculate a noise masking threshold from the estimated target source. Extensive tests with the PESQ measure at low SNR (i.e. < 10 dB) show that the method does not introduce musical noise and permits control of the trade-off between undesired component suppression and source attenuation. In particular, we show that NMF is a suitable technique to extract the clean audio signal from undesired non-stationary noise in a monaural recording of ethnic music. Moreover, we carry out a listening test in order to compare NMF with a state-of-the-art audio restoration framework using the EBU MUSHRA test method. The encouraging results obtained with this methodology in the presented case study support its applicability in several fields of audio restoration.
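A hedged sketch of the NMF building block only: magnitude-spectrogram factorisation with multiplicative (Euclidean) updates, keeping pre-learned noise templates fixed, followed by a Wiener-like soft mask. The paper's perceptually motivated masking-threshold suppression rule is not reproduced, and the rank and iteration count are assumptions.

```python
import numpy as np

def nmf_denoise(V, W_noise, rank_target=8, n_iter=200, eps=1e-9):
    """V: magnitude spectrogram (freq x time);
    W_noise: noise spectral templates learned from a noise-only excerpt (freq x r_noise).
    Returns a masked estimate of the target magnitude spectrogram."""
    F, T = V.shape
    W_t = np.random.rand(F, rank_target) + eps       # target templates (learned here)
    W = np.hstack([W_t, W_noise])
    H = np.random.rand(W.shape[1], T) + eps
    for _ in range(n_iter):                          # Euclidean multiplicative updates
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_t *= (V @ H[:rank_target].T) / (W @ H @ H[:rank_target].T + eps)
        W = np.hstack([W_t, W_noise])                # noise templates stay fixed
    target = W_t @ H[:rank_target]
    return target / (W @ H + eps) * V                # Wiener-like soft mask on V
```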
Keywords
Audio restoration, Cultural Heritage, Music signal processing algorithms, Non-negative Matrix Factorization
Paper topics
Digital Audio Effects, Sound/music signal processing algorithms
Easychair keyphrases
masking threshold [14], target source [13], audio restoration [10], non negative matrix factorization [10], speech enhancement [10], undesired component [10], minimum mean square error [8], musical noise [8], sparse code [8], suppression rule [8], cost function [7], art audio restoration framework [6], audio restoration algorithm [6], audio signal [6], ebu mushra test [6], estimated target source [6], ethnic music [6], motivated bayesian suppression rule [6], mushra test method [6], noise masking threshold [6], noise suppression rule [6], undesired non stationary noise [6], hidden reference [5], listening test [5], monaural recording [5], spectral valley [5], clean audio signal [4], mmse stsa estimator [4], spectral amplitude estimator [4], target present frame [4]
Paper type
unknown
DOI: 10.5281/zenodo.849819
Zenodo URL: https://zenodo.org/record/849819
Abstract
This study investigates the use of short-term memory for pitch recognition in a Western (12-tone) vs. a 10-tone equal temperament context. 10 subjects with at least one year of formal music and theory training participated in an experiment that consisted of two identical music listening tests (one per tuning system) in which they were trained to recall a reference tone and count the number of times it recurred in various short monophonic melodies. In the parts of the experiment where subjects used their short-term memory to execute one-to-one comparisons between the given reference tone and the melody tones, the results were equivalent for both tuning modes. On the other hand, when subjects tried to recall the reference tone directly from long-term memory, the results were noticeably better for the Western tuning context.
Keywords
equal temperament tuning systems, pitch memory, short term memory
Paper topics
Sound/music perception and cognition
Easychair keyphrases
reference tone [18], tone equal temperament test [14], pitch memory [13], tone equal temperament [12], absolute pitch [11], cross session performance [9], equal temperament [9], testing phase [9], tone session [7], correct answer [6], long term memory [6], short term memory [6], target tone [6], perfect score [5], training part [5], tuning system [5], western tuning [5], correct response [4], equal temperament tuning system [4], short term pitch memory [4], term memory [4]
Paper type
unknown
DOI: 10.5281/zenodo.849821
Zenodo URL: https://zenodo.org/record/849821
Abstract
The paper describes a simple but effective method for incorporating automatically learned tempo models into real-time music tracking systems. In particular, instead of training our system with `rehearsal data' by a particular performer, we provide it with many different interpretations of a given piece, possibly by many different performers. During the tracking process the system continuously recombines this information to come up with an accurate tempo hypothesis. We present this approach in the context of a real-time tracking system that is robust to almost arbitrary deviations from the score (e.g. omissions, forward and backward jumps, unexpected repetitions or re-starts) by the live performer.
Keywords
audio alignment, score following, tempo model
Paper topics
Automatic music generation/accompaniment systems, Musical performance modeling
Easychair keyphrases
real time [23], real time tracking system [16], tempo model [14], live performance [12], score representation [12], note onset [11], tracking system [9], forward path [8], tempo change [8], tempo curve [8], backward path [6], off line alignment [6], real time music tracking [6], tempo information [6], relative tempo [5], tracking process [5], alignment path [4], artificial intelligence [4], computational perception johanne kepler [4], correct position [4], dynamic time warping [4], learned tempo model [4], many different [4], mozart mozart mozart [4], music tracking system [4], piano piano piano [4], real time audio tracking system [4], sonata kv279 mov [4], time real time [4]
Paper type
unknown
DOI: 10.5281/zenodo.849823
Zenodo URL: https://zenodo.org/record/849823
Abstract
In this paper we describe a new parametric model for synthesis of environmental sound textures, like running water, rain and fire. Sound texture analysis is cast in the framework of wavelet decomposition and hierarchical statistical, generative models, that have previously found application in image texture analysis and synthesis. By stochastic sampling from the model and reconstructing the sampled wavelet coefficients to a time-domain signal, we can synthesize distinct versions of a sound, that bear perceptually convincing similarity to the source sound. The resulting model is shown to perform favorably in comparison to previous approaches to sound texture synthesis while the resulting models provide a parametric description of sound textures.
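A deliberately crude sketch of the "sample wavelet coefficients, then invert the transform" idea, using PyWavelets: per-scale detail coefficients are simply resampled with replacement, which preserves each scale's marginal statistics but none of the cross-scale hidden-Markov-tree dependencies the paper actually models. Wavelet choice and depth are assumptions.

```python
import numpy as np
import pywt

def texture_resample(x, wavelet="db4", level=8, seed=0):
    """Return a new signal whose per-scale wavelet statistics resemble x's."""
    rng = np.random.default_rng(seed)
    coeffs = pywt.wavedec(x, wavelet, level=level)
    new_coeffs = [coeffs[0]]                         # keep the approximation band
    for d in coeffs[1:]:                             # detail bands, coarse to fine
        new_coeffs.append(rng.choice(d, size=len(d), replace=True))
    return pywt.waverec(new_coeffs, wavelet)
```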
Keywords
hidden markov models, hierarchical generative models, sound texture synthesis, wavelet analysis
Paper topics
Sound/music perception and cognition, Sound/music signal processing algorithms
Easychair keyphrases
wavelet coefficient [24], sound texture [23], hidden markov tree model [12], hidden markov model [11], sound texture synthesis [11], wavelet tree [10], hidden markov [9], hidden markov tree [9], signal processing [8], wavelet decomposition [8], wavelet transform [7], conditional state probability [6], discrete wavelet transform [6], hidden state [6], running water [6], wavelet decomposition tree [6], detail coefficient [5], model parameter [5], see section [5], state variable [5], texture modeling [5], texture synthesis [5], binary tree [4], hidden state variable [4], log likelihood [4], non gaussian [4], scale coefficient dependency [4], state gaussian mixture model [4], temporal fine structure [4], textural sound [4]
Paper type
unknown
DOI: 10.5281/zenodo.849825
Zenodo URL: https://zenodo.org/record/849825
Abstract
Time-frequency representations are commonly used tools for the representation of audio and in particular music signals. From a theoretical point of view, these representations are linked to Gabor frames. Frame theory yields a convenient reconstruction method making post-processing unnecessary. Furthermore, using dual or tight frames in the reconstruction, we may resynthesize localized components from so-called sparse representation coefficients. Sparsity of coefficients is directly reinforced by the application of a $\ell^1$-penalization term on the coefficients. We introduce an iterative algorithm leading to sparse coefficients and demonstrate the effect of using these coefficients in several examples. In particular, we are interested in the ability of a sparsity promoting approach to the task of separating components with overlapping analysis coefficients in the time-frequency domain. We also apply our approach to the problem of auditory scene description, i.e. source identification in a complex audio mixture.
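A hedged sketch of an ISTA-style iteration promoting sparse time-frequency coefficients. The STFT with a Hann window stands in for the Gabor frame, the unit step size assumes a well-normalised (near-tight) frame, and scipy's istft plays the role of the synthesis map; none of this is the paper's exact algorithm.

```python
import numpy as np
from scipy.signal import stft, istft

def soft(z, thr):
    """Soft threshold for real or complex coefficients."""
    mag = np.maximum(np.abs(z) - thr, 0.0)
    return np.sign(z) * mag if np.isrealobj(z) else z / (np.abs(z) + 1e-12) * mag

def sparse_tf(x, lam=0.05, n_iter=50, nperseg=1024):
    """Iteratively shrink STFT coefficients toward an l1-sparse representation of x."""
    _, _, C = stft(x, nperseg=nperseg)               # analysis coefficients
    for _ in range(n_iter):
        _, xk = istft(C, nperseg=nperseg)            # synthesis of current estimate
        xk = xk[:len(x)]
        if len(xk) < len(x):
            xk = np.pad(xk, (0, len(x) - len(xk)))
        _, _, G = stft(x - xk, nperseg=nperseg)      # gradient step on the residual
        C = soft(C + G, lam)                         # l1 proximal (shrinkage) step
    return C
```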
Keywords
auditory scene description, Frames, source identification, Sparsity, Time-frequency representation
Paper topics
Automatic music transcription, Musical pattern recognition/modeling, Musical sound source separation and recognition, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
sparse coefficient [17], time frequency [16], sparse representation [13], tight frame [12], frame operator [11], gabor coefficient [10], gabor frame [7], time frequency coefficient [7], time frequency representation [7], auditory scene description [6], canonical gabor coefficient [6], signal processing [6], tight window [6], time frequency domain [6], signal component [5], audio signal processing [4], dual frame [4], music signal [4], music transcription [4], number source number [4], second source [4], short time fourier transform [4], source number source [4], time frequency shift [4], time slice [4]
Paper type
unknown
DOI: 10.5281/zenodo.849827
Zenodo URL: https://zenodo.org/record/849827
Abstract
Analysis and description of musical expression is a large field within musicology. However, the manual annotation of large corpora of music, which is a prerequisite for describing and comparing different artists' styles, is very labor intensive. Therefore, computer systems are needed which can annotate recordings of different performances automatically, requiring only minimal corrections by the user. In this paper, we apply Dynamic Time Warping for audio-to-score alignment in order to extract the onset times of all individual notes within an audio recording, and we compare two strategies for improving the accuracy. The first strategy is based on increasing the temporal resolution of the features used. To cope with the arising computational costs, we apply a divide-and-conquer pattern. The second strategy is the introduction of a post-processing step in which the onset time of each individual note is revised. The advantage of this method is that, in contrast to default algorithms, arpeggios and asynchronies can be resolved as well.
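For orientation, a sketch of the generic DTW recursion on a per-frame feature distance matrix (e.g. chroma vectors of the audio and a synthesized score); the paper's divide-and-conquer high-resolution variant and the note-wise post-processing are not reproduced here.

```python
import numpy as np

def dtw_path(feat_audio, feat_score):
    """feat_*: (n_frames, n_dims) feature matrices; returns the warping path
    as a list of (audio_frame, score_frame) index pairs."""
    D = np.linalg.norm(feat_audio[:, None, :] - feat_score[None, :, :], axis=2)
    n, m = D.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = D[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                              acc[i, j - 1],      # deletion
                                              acc[i - 1, j - 1])  # match
    path, (i, j) = [], (n, m)
    while i > 0 and j > 0:                                        # backtrack
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return path[::-1]
```

Onset times then follow by reading off, for each score note, the audio frame paired with its score position.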
Keywords
audio alignment, dynamic time warping, nmf, tone model
Paper topics
Computational musicology, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
anchor note [21], dynamic time warping [15], onset time [13], post processing [13], audio signal [11], temporal resolution [10], anchor selection [9], time frame [8], chroma vector [7], high resolution dtw [7], music information retrieval [7], score alignment [7], search window [7], alignment path [6], audio recording [6], computational cost [6], individual note [6], non anchor note [6], pitch activation feature [6], onset detection [5], onset estimate [5], optimal alignment [5], overall accuracy [5], pitch class [5], gaussian window [4], onset candidate [4], pitch activation [4], post processing method [4], tone model [4], whole system [4]
Paper type
unknown
DOI: 10.5281/zenodo.849831
Zenodo URL: https://zenodo.org/record/849831
Abstract
This paper addresses the general problem of modeling pinna-related transfer functions (PRTFs) for 3-D sound rendering. Following a structural modus operandi, we exploit an algorithm for the decomposition of PRTFs into ear resonances and frequency notches due to reflections over pinna cavities, in order to deliver a method to extract the frequencies of the most important spectral notches. Ray-tracing analysis reveals a convincing correspondence between extracted frequencies and the pinna cavities of a group of subjects. We then propose a model for PRTF synthesis which allows separate control of the evolution of resonances and spectral notches through the design of two distinct filter blocks. The resulting model is suitable for future integration into a structural head-related transfer function model, and for parametrization over the anthropometric measurements of a wide range of subjects.
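As a minimal illustration of the "distinct filter blocks" idea, here is one second-order notch section in the familiar RBJ biquad form, cascaded once per extracted notch frequency. The paper's resonance block and its parametrisation from anthropometry are not shown, and the shared Q value below is an assumption.

```python
import numpy as np
from scipy.signal import lfilter

def notch_biquad(fc, q, sr):
    """Standard RBJ notch biquad coefficients for centre frequency fc (Hz)."""
    w0 = 2 * np.pi * fc / sr
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1.0, -2 * np.cos(w0), 1.0])
    a = np.array([1 + alpha, -2 * np.cos(w0), 1 - alpha])
    return b / a[0], a / a[0]

def apply_notches(x, notch_freqs, q=8.0, sr=44100):
    for fc in notch_freqs:                    # cascade one notch per extracted frequency
        b, a = notch_biquad(fc, q, sr)
        x = lfilter(b, a, x)
    return x
```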
Keywords
3d Audio, Pinna, PRTF, Spatial Sound
Paper topics
3D sound/music
Easychair keyphrases
spectral notch [10], head related impulse response [8], frequency range [6], head related transfer function [6], notch tracking algorithm [6], pinna related transfer function [6], reflection coefficient [6], signal processing [6], frequency notch [5], notch frequency [5], reflection point [5], second order [5], measured head related impulse [4], multi notch filter [4], pinna cavity [4], related transfer function [4], right prtf elevation [4], time delay [4]
Paper type
unknown
DOI: 10.5281/zenodo.849833
Zenodo URL: https://zenodo.org/record/849833
Abstract
In this paper we present an evolutionary algorithm for real-time generation of polyphonic rhythmic patterns in a certain style, implemented as a Pure Data patch. The population of rhythms is derived from the analysis of MIDI loops, which profile each style for subsequent automatic generation of rhythmic patterns that evolve over time through genetic algorithm operators and user input data.
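A hedged sketch of the evolutionary loop only: binary 16-step patterns are scored against a per-step onset-probability profile (assumed to come from the MIDI loop analysis) and evolved with one-point crossover and bit-flip mutation. The fitness function and all parameter values are illustrative, not the paper's.

```python
import random

def fitness(pattern, profile):
    # Reward onsets where the style profile is strong, silence where it is weak.
    return sum(p if hit else (1 - p) for hit, p in zip(pattern, profile))

def evolve(profile, pop_size=32, steps=16, generations=100, p_mut=0.05):
    pop = [[random.randint(0, 1) for _ in range(steps)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, profile), reverse=True)
        parents = pop[: pop_size // 2]                    # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, steps)
            child = a[:cut] + b[cut:]                     # one-point crossover
            child = [1 - g if random.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda ind: fitness(ind, profile))
```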
Keywords
genetic algorithm, metric indispensability, rhythm generation
Paper topics
Automatic music generation/accompaniment systems, Interactive performance systems
Easychair keyphrases
th note level [9], fitness function [6], metrical level [6], midi drum loop [6], non real time [6], real time [6], genetic algorithm [5], mutation operator [5], offspring population [5], pulse comprising [5], metrical coherence [4], rhythmic cell [4]
Paper type
unknown
DOI: 10.5281/zenodo.849835
Zenodo URL: https://zenodo.org/record/849835
Abstract
An essential part of the oboe technique is the reed-making process, where the raw material is carved and shaped. Different oboe schools define different types of shapes and argue about their adequacy for a better sound and performance. This paper focuses on the perceptual influence of 3 reed-making types. We chose 6 reeds representing 3 pairs, one of each style (French, German, American), and recorded 116 sound samples with two oboists in controlled conditions. N=63 sound stimuli were selected: 9 diminuendo long tones and 18 eight-note phrases, from which 18 low-pitched and 18 high-pitched tones were extracted. Tones were normalized in pitch and intensity to help listeners focus on timbre. 40 participants (20 non-oboist musicians and 20 professional oboists) completed a free-categorization task on each of the 4 stimuli sets, grouping sounds by global similarity. Results show that the most salient production parameters are the attack type and the oboist-oboe. The reed-making shows no significant influence on isolated tones and a marginal influence on complex phrases, and inter-reed differences are more important than inter-reed-making differences. Reed-making is important in performance technique but has no influence on the perceived timbre. Future research will deal with performer proprioception of the reed making.
Keywords
music performance, oboe, reed-making, sound perception, timbre
Paper topics
Sound/music perception and cognition
Easychair keyphrases
reed making [38], reed making style [19], oboist non oboist oboist [18], non oboist oboist [9], non oboist [8], isolated tone [7], attack type [6], highpitched long phrase non oboist [6], high pitched note [6], lowpitched highpitched long phrase [6], low pitched note [6], phrase non oboist oboist [6], sound perception [6], tongue attack [6], multidimensional scaling [5], note phrase [5], pitched note [5], significant difference [5], breath attack [4], diminuendo long tone [4], free grouping task [4], high pitched [4], low pitched [4], participant df diminuendi [4], pearson test [4], pearson test comparing [4], professional oboist [4], short tone high pitched note [4], sound production [4], sound sample [4]
Paper type
unknown
DOI: 10.5281/zenodo.849837
Zenodo URL: https://zenodo.org/record/849837
Abstract
The Stanza Logo-Motoria is a multimodal interactive system for learning and communication, developed by means of the EyesWeb XMI platform. It is permanently installed in a primary school, where it is used as an alternative and/or additional tool to traditional ways of teaching. The Stanza Logo-Motoria is used by all school children, from first to fifth class, including children with disabilities. This paper describes the system and a first assessment of the teaching activities carried out with it.
Keywords
Expressive gestures, Interactive system for learning, Multimodal interactive system
Paper topics
Gesture controlled audio systems, Interactive performance systems, Multimodality in sound and music computing, Music and emotions
Easychair keyphrases
stanza logo motoria [50], low level feature [9], primary school [6], grade class [5], system architecture [5], audiovisual content [4], body movement [4], contraction index [4], feature extraction component [4], level feature [4], multimodal interactive system [4], peripheral zone [4], real time [4], resonant memory [4], resonant memory application [4], sound space [4], special need education [4], video camera [4]
Paper type
unknown
DOI: 10.5281/zenodo.849839
Zenodo URL: https://zenodo.org/record/849839
Abstract
Computer languages for the description of patterns have been employed for both the analysis and composition of music. In this paper we investigate the latter, with particular interest in pattern languages for use in live coding performance. Towards this end we introduce Tidal, a pattern language designed for music improvisation and embedded in the Haskell programming language. Tidal represents polyphonic patterns as time-varying functions, providing an extensible range of pattern generators and combinators for composing patterns out of hierarchies of sub-patterns. Open Sound Control (OSC) messages are used to trigger sound events, where each OSC parameter may be expressed as a pattern. Tidal is designed to allow patterns to be created and modified during a live coded performance, aided by terse, expressive syntax and integration with an emerging time synchronisation standard.
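A loose Python analogue (not Tidal's actual Haskell types or combinators) of the central idea that a pattern is a function of time: querying a cycle yields timed events, and combinators such as the hypothetical `fast` below build new patterns from existing ones.

```python
from fractions import Fraction

def seq(*values):
    """Evenly divide each cycle among the given values."""
    def query(cycle):
        n = len(values)
        return [(cycle + Fraction(i, n), v) for i, v in enumerate(values)]
    return query

def fast(factor, pattern):
    """Squeeze `factor` repetitions of the pattern into one cycle."""
    def query(cycle):
        events = []
        for k in range(factor):
            for t, v in pattern(cycle):
                events.append((cycle + (t - cycle + k) / factor, v))
        return events
    return query

# e.g. fast(2, seq("bd", "sn"))(0) yields events at 0, 1/4, 1/2 and 3/4 of cycle 0
```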
Keywords
computer language, haskell, live coding, pattern
Paper topics
Interactive performance systems, Interfaces for music creation and fruition, Musical pattern recognition/modeling
Easychair keyphrases
pattern language [17], live coding [10], musical pattern [8], computer music [6], live coding performance [6], open sound control [6], pattern transformation [6], computer language [4], haskell programming language [4], music improvisation [4], pattern generator [4], rotr black grey red [4], sub pattern [4]
Paper type
unknown
DOI: 10.5281/zenodo.849841
Zenodo URL: https://zenodo.org/record/849841
Abstract
We consider the task of inferring associations between two differently-distributed and unlabelled sets of timbre data. This arises in applications such as concatenative synthesis/ audio mosaicing in which one audio recording is used to control sound synthesis through concatenating fragments of an unrelated source recording. Timbre is a multidimensional attribute with interactions between dimensions, so it is non-trivial to design a search process which makes best use of the timbral variety available in the source recording. We must be able to map from control signals whose timbre features have different distributions from the source material, yet labelling large collections of timbral sounds is often impractical, so we seek an unsupervised technique which can infer relationships between distributions. We present a regression tree technique which learns associations between two unlabelled multidimensional distributions, and apply the technique to a simple timbral concatenative synthesis system. We demonstrate numerically that the mapping makes better use of the source material than a nearest-neighbour search.
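A hedged sketch of one way to pair two unlabelled feature distributions by recursive median splits, loosely inspired by the cross-associative regression-tree idea; the paper's actual splitting criteria and tree construction are not reproduced, and `min_leaf` is an assumed parameter.

```python
import numpy as np

def pair_leaves(control, source, min_leaf=8):
    """control, source: (n, d) timbre-feature arrays from the two recordings.
    Returns a list of (control_indices, source_indices) associated leaf pairs."""
    def recurse(c_idx, s_idx):
        if len(c_idx) <= min_leaf or len(s_idx) <= min_leaf:
            return [(c_idx, s_idx)]
        dim = np.argmax(control[c_idx].var(axis=0))       # most-varying dimension
        c_lo = control[c_idx, dim] <= np.median(control[c_idx, dim])
        s_lo = source[s_idx, dim] <= np.median(source[s_idx, dim])
        if c_lo.all() or not c_lo.any():                  # degenerate split: stop
            return [(c_idx, s_idx)]
        # each set is split on its OWN median, so the halves correspond by rank
        return (recurse(c_idx[c_lo], s_idx[s_lo]) +
                recurse(c_idx[~c_lo], s_idx[~s_lo]))
    return recurse(np.arange(len(control)), np.arange(len(source)))
```

At synthesis time, a control frame falling in a given leaf would be answered with a source segment drawn from the paired leaf.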
Keywords
concatenative synthesis, regression trees, timbre, unsupervised learning
Paper topics
Musical pattern recognition/modeling, Music information retrieval, Sound/music signal processing algorithms
Easychair keyphrases
concatenative synthesis [13], regression tree [12], concatenative synthesiser [8], sound source [8], audio excerpt [7], timbre feature [7], principal component [6], response variable [6], computer music [5], data distribution [5], independent variable [5], associative multivariate regression tree [4], concatenative sound synthesis [4], data point [4], full concatenative synthesis framework [4], full concatenative synthesis system [4], multivariate split [4], nearest neighbour search [4], source material [4], splitting plane [4], timbral concatenative synthesis system [4], timbre remapping [4], timbre trajectory [4]
Paper type
unknown
DOI: 10.5281/zenodo.849843
Zenodo URL: https://zenodo.org/record/849843
Abstract
We present in this paper an original approach to the use of tonality for composition and improvisation, developed by the composer, improviser and musician [hidden for blind reviewing]. The main concept is to consider a minimal group of notes which acts as a signature of a given scale in the major-minor tonal system. We first define, within the context of the tonal system, the notion of tonal signature and expose its principle. Among the possible ways to solve this problem and find all the tonal signatures, we define some constraints and use a constraint solver implemented in the computer-aided composition environment Open Music. We provide some examples of compositions written by the composer, with improvisation based on the tonal signature concept. [hidden for blind reviewing]'s music already comprises a rich discography with players from the international jazz scene. We will provide excerpts of the recorded and published music.
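The signature idea can be illustrated with a brute-force Python search for minimal pitch-class subsets contained in exactly one reference scale. For brevity only the 12 major scales are considered here, whereas the paper works in the full major-minor system and solves the problem with constraints in Open Music.

    # Brute-force search for "tonal signatures" among the 12 major scales:
    # minimal sets of pitch classes contained in exactly one reference scale.
    from itertools import combinations

    MAJOR = [0, 2, 4, 5, 7, 9, 11]                      # pitch classes of C major
    SCALES = {root: frozenset((root + i) % 12 for i in MAJOR) for root in range(12)}

    def signatures(root, max_size=4):
        """Minimal subsets of `root`'s major scale found in no other major scale."""
        scale = SCALES[root]
        others = [s for r, s in SCALES.items() if r != root]
        found = []
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(scale), size):
                sub = frozenset(subset)
                # A signature must not contain a smaller signature already found...
                if any(sig <= sub for sig in found):
                    continue
                # ...and must be contained in no other scale.
                if not any(sub <= other for other in others):
                    found.append(sub)
        return found

    print(signatures(0, max_size=3))

For C major, for instance, the bare tritone {F, B} is not a signature in this restricted setting (it also lies in F sharp major), so the smallest signatures found have three notes, such as {E, F, B}.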
Keywords
composition, constraints, improvisation, tonal signature
Paper topics
Computational musicology, Computer environments for sound/music processing
Easychair keyphrases
tonal signature [69], tonal system [20], major minor tonal system [12], tonal signature concept [7], boe oe boe [6], c major mode [6], musical material [5], common note [4], constraint solver [4], limited transposition mode [4], open music [4], reference scale [4]
Paper type
unknown
DOI: 10.5281/zenodo.849845
Zenodo URL: https://zenodo.org/record/849845
Abstract
Although music is often defined as the “language of emotion”, the exact nature of the relationship between musical parameters and the emotional response of the listener remains an open question. Whereas traditional psychological research usually focuses on an analytical approach, involving the rating of static sounds or preexisting musical pieces, we propose a synthetic approach based on a novel adaptive interactive music system controlled by an autonomous reinforcement learning agent. Preliminary results suggest that an autonomous mapping from musical parameters (such as rhythmic density, articulation and sound level) to the perception of tension is possible. This paves the way for interesting applications in music therapy, interactive gaming, and physiologically-based musical instruments.
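The closed loop described above can be sketched as a simple bandit-style agent that adjusts one musical parameter towards a target tension level. The simulated listener below is a made-up stand-in for a human rating, not the authors' agent, reward design or data.

    # Toy epsilon-greedy bandit: choose a rhythmic density whose perceived
    # tension (as judged by a simulated listener) is closest to a target value.
    import random

    DENSITIES = [1, 2, 4, 8, 16]          # candidate notes-per-bar settings
    TARGET_TENSION = 0.7                  # desired perceived tension (0..1)

    def simulated_listener(density):
        """Hypothetical stand-in for a human rating: denser music feels tenser."""
        return min(1.0, density / 16) + random.gauss(0, 0.05)

    def run_bandit(steps=500, epsilon=0.1):
        """Epsilon-greedy bandit maximising the (negative) tension error."""
        value = {d: 0.0 for d in DENSITIES}
        count = {d: 0 for d in DENSITIES}
        for _ in range(steps):
            if random.random() < epsilon:
                density = random.choice(DENSITIES)                # explore
            else:
                density = max(DENSITIES, key=lambda d: value[d])  # exploit
            reward = -abs(simulated_listener(density) - TARGET_TENSION)
            count[density] += 1
            value[density] += (reward - value[density]) / count[density]
        return max(DENSITIES, key=lambda d: value[d])

    print("chosen rhythmic density:", run_bandit())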
Keywords
adaptive music, interactive music system, reinforcement learning
Paper topics
Automatic music generation/accompaniment systems, Interactive performance systems, Sonic interaction design
Easychair keyphrases
musical parameter [15], musical tension [15], reinforcement learning [12], rhythmic density [8], interactive music system [7], sound level [7], musical performance [6], music generation [6], real time [6], reinforcement learning agent [6], musical agent [5], first monophonic voice [4], international computer music [4], low tension [4], polyphonic voice [4], reward function [4]
Paper type
unknown
DOI: 10.5281/zenodo.849849
Zenodo URL: https://zenodo.org/record/849849
Abstract
This paper builds upon the existing Reactable musical platform and aims at extending and improving its approach to music theory. Sections 1 and 2.2 explain the motivations that led to the development of this proposal from a musical point of view, while also giving a music education perspective. In Section 2 we present a brief survey of tabletop and tangible multi-user systems for audiovisual performance, briefly introduce the process of implicit learning, formulate a hypothesis about music as a natural language, and describe how the work presented here can help music education. In Section 3 we describe the current state of the art of music theory on the Reactable, followed by an original proposal for extending and improving it. Finally, we report how people who had a chance to test the system found it interesting and playful, while also giving important feedback that can be used to improve many practical aspects of the implementation.
Keywords
harmony, music, reactable, theory, tonal
Paper topics
Automatic music generation/accompaniment systems, Gesture controlled audio systems, Interactive performance systems, Interfaces for music creation and fruition, Visualization of sound/music data
Easychair keyphrases
music theory [11], music education [8], natural language [7], musical instrument [6], implicit learning [5], tangible interface [5], chord preset [4], future development [4], harmony system [4], piano roll [4], western tonal music [4]
Paper type
unknown
DOI: 10.5281/zenodo.849847
Zenodo URL: https://zenodo.org/record/849847
Abstract
In this paper we present a system that learns rhythmic patterns from a drum audio recording and synthesizes musical variations from the learnt sequence. The procedure described is completely unsupervised and includes the transcription of a percussion sequence into a fuzzy multilevel representation. Moreover, a tempo estimation procedure that identifies the most regular subsequence is used to guarantee that the metrical structure is preserved in the generated sequence. The final synthesis is performed by recombining the audio material derived from the sample itself. Some examples of generated sequences, along with a descriptive evaluation, are provided.
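The generation step can be illustrated with a first-order Markov chain over a hand-made symbolic drum sequence. The paper itself uses a fuzzy multilevel representation and a variable-length Markov chain, so the sketch below only conveys the flavour of recombining learnt continuations.

    # Learn first-order transitions from a symbolic drum sequence and sample a
    # variation. Symbols and the training sequence are invented for illustration.
    import random
    from collections import defaultdict

    def learn_transitions(sequence):
        """Count first-order transitions between symbols."""
        transitions = defaultdict(list)
        for current, nxt in zip(sequence, sequence[1:]):
            transitions[current].append(nxt)
        return transitions

    def generate(transitions, start, length):
        """Random walk through the learned transitions."""
        out = [start]
        for _ in range(length - 1):
            candidates = transitions.get(out[-1])
            if not candidates:                      # dead end: restart from start
                candidates = [start]
            out.append(random.choice(candidates))
        return out

    # Hypothetical transcription: k = kick, s = snare, h = hihat.
    sequence = list("khshkhshkkshkhsh")
    model = learn_transitions(sequence)
    print("".join(generate(model, "k", 16)))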
Keywords
beat boxing, machine listening, music analysis, music generation
Paper topics
Automatic music generation/accompaniment systems, Automatic music transcription, Musical pattern recognition/modeling
Easychair keyphrases
metrical structure [8], audio signal [6], colored triangle [6], skeleton grid [6], skeleton subsequence [6], cluster configuration [5], musical sequence [5], appropriate symbol [4], cluster distance [4], cluster distance threshold [4], continuation indices [4], descriptive evaluation [4], grid position [4], tempo detection [4], variable length markov chain [4]
Paper type
unknown
DOI: 10.5281/zenodo.849851
Zenodo URL: https://zenodo.org/record/849851
Abstract
Voice conversion is an emerging problem in voice and speech processing with increasing commercial interest due to applications such as Speech-to-Speech Translation (SST) and personalized Text-To-Speech (TTS) systems. A voice conversion system should allow the mapping of acoustical features of sentences pronounced by a source speaker to values corresponding to the voice of a target speaker, in such a way that the processed output is perceived as a sentence uttered by the target. In the last two decades the number of scientific contributions to the voice conversion problem has grown considerably, and a solid overview of the historical process as well as of the proposed techniques is indispensable for those willing to contribute to the field. The goal of this text is to provide a critical survey that combines historical presentation with technical discussion while pointing out the advantages and drawbacks of each technique, and to discuss future directions, especially regarding the development of a perceptual benchmarking process for voice conversion systems.
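The mapping described above can be illustrated with a deliberately simple baseline: a frame-wise least-squares affine transform from time-aligned source features to target features. The surveyed systems use richer models (vector quantization codebooks, Gaussian mixture regression, neural networks) and an alignment step such as dynamic time warping; the data below are synthetic.

    # Baseline feature mapping: learn an affine transform from aligned source
    # frames to target frames, then apply it to new source frames.
    import numpy as np

    def fit_linear_map(source_frames, target_frames):
        """Least-squares affine map from source to target feature frames."""
        n = len(source_frames)
        X = np.hstack([source_frames, np.ones((n, 1))])     # append bias column
        W, *_ = np.linalg.lstsq(X, target_frames, rcond=None)
        return W

    def convert(frames, W):
        """Apply the learned affine map to a batch of source frames."""
        X = np.hstack([frames, np.ones((len(frames), 1))])
        return X @ W

    # Hypothetical aligned training data: 1000 frames of 13-dim spectral features.
    rng = np.random.default_rng(0)
    src = rng.standard_normal((1000, 13))
    tgt = src @ rng.standard_normal((13, 13)) * 0.5 + 0.1   # pretend target voice
    W = fit_linear_map(src, tgt)
    print(convert(src[:5], W).shape)                        # (5, 13)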
Keywords
acoustical features, speech-to-speech translation, voice conversion
Paper topics
Digital Audio Effects, Musical pattern recognition/modeling, Physical modeling for sound generation, Sound/music perception and cognition, Sound/music signal processing algorithms
Easychair keyphrases
voice conversion [80], voice conversion system [37], target speaker [14], source speaker [9], vocal tract [8], voice signal [8], acoustic feature [6], artificial neural network [6], speech recognition [6], transformation technique [6], typical voice conversion system [6], voice conversion technique [6], acoustical feature [5], representation model [5], training phase [5], vector quantization [5], bilingual subject [4], crosslingual voice conversion [4], crosslingual voice conversion system [4], gaussian mixture model [4], glottal pulse [4], line spectral frequency [4], several voice conversion system [4], speaker transformation algorithm [4], text independent [4], text independent training [4], text independent voice conversion [4], vocal tract length normalization [4], voice conversion problem [4], voice transformation [4]
Paper type
unknown
DOI: 10.5281/zenodo.849853
Zenodo URL: https://zenodo.org/record/849853