Dates: from July 04 to July 07, 2018
Place: Limassol, Cyprus
Proceedings info: Proceedings of the 15th Sound and Music Computing Conference (SMC 2018), Limassol Cyprus, 2018, ISBN 978-9963-697-30-4
Abstract
Predicting the acoustics of objects from computational models is of interest to instrument designers, who increasingly use Computer Assisted Design. We examine techniques to carry out these estimates using a database of impulse responses from 3D printed models and a custom algorithm for mode interpolation within a geometrical matrix. Test geometries are organized as a function of their physical characteristics and placed into a multidimensional space/matrix whose boundaries are defined by the objects at each corner. Finite Element Analysis is integrated into the CAD environment to provide estimates of material vibrations, which are also compared to measurements on the fabricated counterparts. Finally, predicted parameters inform physical models for aural comparisons between fabricated targets and computational estimates. These hybrid methods are reliable for predicting early modes as they covary with changes in scale and shape in our test matrix.
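The mode-interpolation step can be illustrated with a minimal sketch (not the authors' algorithm): assuming a two-dimensional design space whose four corners are measured geometries, the frequency of a given mode at an intermediate shape is estimated by bilinear interpolation of the corner values; the corner frequencies and query coordinates below are hypothetical.

```python
import numpy as np

def interpolate_mode(corner_freqs, x, y):
    """Bilinearly interpolate one modal frequency inside a 2D geometry matrix.

    corner_freqs: 2x2 array of the mode's frequency (Hz) at the four corner
    geometries; (x, y) are normalized coordinates in [0, 1] describing where
    the new design sits between those corners (e.g. scale vs. shape).
    """
    f = np.asarray(corner_freqs, dtype=float)
    return ((1 - x) * (1 - y) * f[0, 0] + x * (1 - y) * f[1, 0]
            + (1 - x) * y * f[0, 1] + x * y * f[1, 1])

# Hypothetical first-mode frequencies measured at the four corner objects.
corners = [[220.0, 180.0],
           [330.0, 260.0]]
print(interpolate_mode(corners, 0.5, 0.25))  # estimate for an intermediate shape
```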
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422679
Zenodo URL: https://zenodo.org/record/1422679
Abstract
This paper describes MoveSynth, a performance system for two players, who interact with it and collaborate with each other in various ways, including full-body movements, arm postures and continuous gestures, to compose music in real time. The system uses a Kinect sensor to track the performers' positions, as well as their arm and hand movements. In the system's current state, the musical parameters that the performers can influence include the pitch and the volume of the music, the timbre of the sound, and the time interval between successive notes. We experimented extensively with various classifiers to identify the one that gives optimal results for continuous gesture and arm posture recognition, achieving 92.11% for continuous gestures and 99.33% for arm postures using a 1-NN classifier with a condensed search space in both cases. Additionally, the qualitative results of the usability testing of the final system, performed by 9 users, are encouraging and identify possible avenues for further exploration and improvement.
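Reading the "1-NN classifier with a condensed search space" as Hart's condensed nearest-neighbour rule (an assumption, not a detail given in the abstract), a minimal sketch with synthetic placeholder data might look like this:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def condense(X, y, seed=0):
    """Hart's condensed nearest neighbour: keep a subset of prototypes that
    still classifies every training sample correctly with 1-NN."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    keep = [order[0]]
    changed = True
    while changed:
        changed = False
        for i in order:
            nn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
            if nn.predict(X[i:i + 1])[0] != y[i]:
                keep.append(i)
                changed = True
    return np.array(keep)

# Hypothetical posture feature vectors (e.g. joint angles from the Kinect) and labels.
X = np.random.rand(200, 12)
y = np.random.randint(0, 4, size=200)
idx = condense(X, y)
clf = KNeighborsClassifier(n_neighbors=1).fit(X[idx], y[idx])
print(len(idx), "prototypes kept;", clf.score(X, y))
```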
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422627
Zenodo URL: https://zenodo.org/record/1422627
Abstract
This paper presents a combination of signal processing and machine learning techniques for the classification of bird song recordings. Our pipeline consists of filters that enhance the bird song signal with respect to environmental noise, followed by machine learning algorithms that exploit various acoustic features. The filtering stage is based on the assumptions that bird songs are tonal and sporadic, and that the noise, present along the entire recording, has a large bandwidth. We present and discuss the results of an experiment on a dataset containing recordings of bird songs from species of the Southern Atlantic Coast of South America. This experiment compares the use of several acoustic features (RMD, ZCR, MFCC, spectral centroid/bandwidth/rolloff and syllable duration), extracted from recordings pre-filtered with three proposed filters, combined with traditional classification strategies (KNN, NB and SVM), in order to identify useful filter/feature/classifier combinations for this bird song classification task. This strategy produces improved classification results with respect to those reported in a previous study using the same dataset.
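As a rough illustration of the feature/classifier stage, not the authors' exact pipeline, the sketch below extracts MFCC, zero-crossing-rate and spectral-centroid statistics from pre-filtered recordings with librosa and trains a k-NN classifier; the file names and species labels are placeholders.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def features(path):
    # Summarize a pre-filtered recording with a few of the acoustic
    # descriptors mentioned in the abstract (MFCC, ZCR, spectral centroid).
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    zcr = librosa.feature.zero_crossing_rate(y)
    cent = librosa.feature.spectral_centroid(y=y, sr=sr)
    return np.concatenate([mfcc.mean(axis=1), zcr.mean(axis=1), cent.mean(axis=1)])

# Placeholder file list and species labels.
train_files = ["song_a.wav", "song_b.wav"]
train_labels = ["species_1", "species_2"]

X = np.vstack([features(f) for f in train_files])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, train_labels)
print(clf.predict([features("unknown.wav")]))
```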
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422609
Zenodo URL: https://zenodo.org/record/1422609
Abstract
We present a low-cost six-degrees-of-freedom violin and bow pose-tracking system. The infrared and depth streams of an infrared-based Microsoft Kinect for Xbox One depth camera are used to track the 3-dimensional positions of 6 infrared markers attached to the violin and the bow. After computing the bow pose in the violin coordinate system, a number of bowing features are extracted. In order to evaluate the system's performance, we recorded 4 bowing exercises simultaneously with our system and a commercial two-sensor 3D tracking system based on electromagnetic field (EMF) sensing. The mean Pearson correlation coefficients were 0.996 for the bow position, 0.889 for the bow velocity, 0.966 for the bow tilt, 0.692 for the bow-bridge distance and 0.998 for the bow inclination. Compared to existing bow-tracking solutions, the proposed solution might be appealing because of its low cost, easy setup and high performance.
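The reported agreement scores can be reproduced in principle with a Pearson correlation between time-aligned feature traces from the two systems; the bow-position traces below are synthetic stand-ins, not the recorded exercises.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical bow-position traces (same exercise) from the Kinect-based system
# and the EMF reference system, resampled to a common time base.
t = np.linspace(0, 10, 500)
kinect_pos = np.sin(0.8 * t) + 0.02 * np.random.randn(t.size)
emf_pos = np.sin(0.8 * t)

r, p = pearsonr(kinect_pos, emf_pos)
print(round(r, 3))  # mean Pearson coefficient for this bowing feature
```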
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422548
Zenodo URL: https://zenodo.org/record/1422548
Abstract
Mechanical vibrations have typically been used in the performance domain within feedback systems to inform musicians of system states or as communication channels between performers. In this paper, we propose the additional taxonomic category of vibrational excitation of musical instruments for sound generation. To explore the variety of possibilities associated with this extended taxonomy, we present the Oktopus, a multi-purpose wireless system capable of motorised vibrational excitation. The system can receive up to eight inputs and generates vibrations as outputs through eight motors that can be positioned accordingly to produce a wide range of sounds from an excited instrument. We demonstrate the usefulness of the proposed system and extended taxonomy through the development and performance of Live Mechanics, a composition for piano and interactive electronics.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422619
Zenodo URL: https://zenodo.org/record/1422619
Abstract
This article addresses one aspect of a larger research agenda to understand DJ practices, an important part of popular music culture: we present a heuristic algorithm that estimates cue points where tracks should cross-fade in a DJ mix. We deduced statistics and heuristics from a list of rules provided by human experts and from a database of example tracks with given cue regions. We then created an algorithm for cue-point estimation based on rich automatic annotations from state-of-the-art MIR methods, such as music structure segmentation and beat tracking. The results were evaluated quantitatively on the example database and qualitatively by experts.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422581
Zenodo URL: https://zenodo.org/record/1422581
Abstract
In this paper, we present an artificial-intelligence-based composition algorithm for generating harmonic and varied arpeggios from given chords in real time, which combines a recurrent neural network (RNN) with gated recurrent units (GRU) and a tetrahedral context-sensitive L-system (TCSL). The RNN plays a significant role in learning inherent harmony from arpeggio datasets and providing probabilistic predictions that suggest the selection of the next note and the lengths of the selected notes. We also establish the TCSL model, based on a tetrahedron model utilizing seven interval operators and production rules, to increase the variety of the generation. TCSL is responsible for generating one note at each iteration by requesting probabilistic predictions from the RNN, calculating optional notes and determining the target note. Our experiments, in which we trained two RNNs for the TCSL generation model, indicated that the proposed algorithm has advantages in overcoming the obstacles to achieving inherent global harmony as well as variety of generation in current computer-aided arpeggio composition algorithms. Our research attempts to extend deep learning models (DLM) towards the design space of interactive composition systems.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422613
Zenodo URL: https://zenodo.org/record/1422613
Abstract
This work studies the use of signal-driven synthesis algorithms applied to an augmented guitar. A robust sub-octave generator, partially modeled after a classic audio-driven monophonic guitar synthesizer design of the 1970s, is presented. The performance of the proposed system is evaluated within the context of an augmented active guitar with an actuated sound box. Results of the evaluation show that the design represents an exciting augmentation for the instrument, as it radically transforms the sound of the electric guitar while remaining responsive to the full range of guitar playing gestures.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422641
Zenodo URL: https://zenodo.org/record/1422641
Abstract
Phonation modes are considered expressive resources in the singing voice, and have been defined in four categories (breathy, pressed, neutral and flow) that correspond to different ratios of subglottal pressure and glottal airflow. This work focuses on the automatic classification of phonation modes by analyzing a number of audio descriptors and applying machine learning techniques. The proposed method extends the feature set used in previous works and uses a correlation-based feature selection algorithm to reduce the dimension of the feature set. Over 10 iterations, cross-validation is applied to tune the hyper-parameters of a Multi-Layer Perceptron (MLP) model, and automatic classification is performed on test sets to evaluate the performance. The analysis of the features we propose justifies the decision to extend the feature set. The experiments performed on two reference datasets, separately and combined, result in a mean F-measure of 0.89 for the soprano, 0.97 for the baritone, and 0.93 for the combined datasets. The achieved results outperform those of previous works.
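A minimal sketch of the classification stage (feature selection followed by a cross-validated MLP): it uses a simple univariate selector as a stand-in for the correlation-based feature selection described above, and synthetic placeholder descriptors.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Placeholder audio descriptors and phonation-mode labels
# (0=breathy, 1=pressed, 2=neutral, 3=flow).
X = np.random.rand(400, 60)
y = np.random.randint(0, 4, size=400)

clf = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=20),          # stand-in for correlation-based selection
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500),
)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")
print(scores.mean())
```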
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422611
Zenodo URL: https://zenodo.org/record/1422611
Abstract
We present a machine learning approach to model vibrato in classical violin audio performances. A set of descriptors has been extracted from the music scores of the performed pieces and used to train a model for classifying notes as vibrato or non-vibrato, as well as for predicting the performed vibrato amplitude and frequency. In addition to score features, we have included a feature describing the fingering used in the performance. The results show that the fingering feature consistently affects the prediction of the vibrato amplitude. Finally, an implementation of the resulting models is proposed as a didactic real-time feedback system to assist violin students in performing pieces using vibrato as an expressive resource.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422607
Zenodo URL: https://zenodo.org/record/1422607
Abstract
Guitar solos provide a way for guitarists to distinguish themselves. Many rock music enthusiasts would claim to be able to identify performers on the basis of guitar solos, but in the absence of veridical knowledge and/or acoustical (e.g., timbral) cues, the task of identifying transcribed solos is much harder. In this paper we develop methods for automatically classifying guitarists using (1) beat and MIDI note representations, and (2) beat, string, and fret information, enabling us to investigate whether there exist "fretboard choreographies" that are specific to certain artists. We analyze a curated collection of 80 transcribed guitar solos by Eric Clapton, David Gilmour, Jimi Hendrix, and Mark Knopfler. We model the solos as zero- and first-order Markov chains and predict the performer based on the two representations mentioned above, for a total of four classification models. Our systems produce above-chance classification accuracies, with the first-order fretboard model giving the best results. Misclassifications vary according to model but may implicate stylistic differences among the artists. The current results confirm that performers can be labeled to some extent from symbolic representations. Moreover, performance is improved by a model that takes into account fretboard choreographies.
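The first-order model can be sketched as follows: for each artist, estimate transition probabilities between symbols (here hypothetical string/fret pairs) and assign a new solo to the artist whose model gives it the highest log-likelihood; the toy sequences below are placeholders, not the transcribed corpus.

```python
import numpy as np
from collections import defaultdict

def train_markov(sequences, alpha=1.0):
    """First-order Markov model with add-alpha smoothing over observed symbols."""
    counts = defaultdict(lambda: defaultdict(float))
    symbols = {s for seq in sequences for s in seq}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    return {a: {b: (counts[a][b] + alpha) / (sum(counts[a].values()) + alpha * len(symbols))
                for b in symbols} for a in symbols}

def log_likelihood(model, seq, floor=1e-6):
    return sum(np.log(model.get(a, {}).get(b, floor)) for a, b in zip(seq, seq[1:]))

# Placeholder (string, fret) solos per artist.
solos = {
    "clapton": [[(1, 5), (2, 5), (1, 7), (1, 5)], [(2, 5), (1, 7), (1, 5)]],
    "gilmour": [[(3, 7), (2, 8), (1, 8), (2, 8)], [(1, 8), (2, 8), (3, 7)]],
}
models = {artist: train_markov(seqs) for artist, seqs in solos.items()}
query = [(2, 5), (1, 7), (1, 5), (1, 7)]
print(max(models, key=lambda a: log_likelihood(models[a], query)))
```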
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422569
Zenodo URL: https://zenodo.org/record/1422569
Abstract
Ircam spat~ is a real-time audio engine dedicated to sound spatialization, artificial reverberation, and sound diffusion. This paper introduces a new major revision of the software package (spat~ 5) and its integration in the Max environment. First, we present the newly adopted OSC interface that is used throughout the library for controlling the processors; we discuss the motivations for this choice, the syntax in use, and the potential benefits in terms of usability, performance, customization, etc. Then we give an overview of the new features introduced in this release, covering Higher Order Ambisonics processing, object-based audio production, enhanced interoperability with VR and graphics frameworks, etc.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422552
Zenodo URL: https://zenodo.org/record/1422552
Abstract
This paper describes the design, implementation and evaluation of a digital percussion instrument with multidimensional polyphonic control of a real-time physical modelling system. The system utilises modular parametric control of different physical models, excitations and couplings, alongside continuous morphing and unique interaction capabilities, to explore and enhance expressivity and gestural interaction for a percussion instrument. Details of the instrument and audio engine are provided together with an experiment that tested the real-time capabilities of the system and the expressive qualities of the instrument. Testing showed that advances in sensor technology have the potential to enhance creativity in percussive instruments and extend gestural manipulation, but will require well-designed and inherently complex mapping schemes.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422605
Zenodo URL: https://zenodo.org/record/1422605
Abstract
Audio tagging (AT) refers to identifying whether a particular sound event is contained in a given audio segment. Sound event detection (SED) requires a system to further determine when exactly an audio event occurs within the segment. Task 4 in the DCASE 2017 competition required solving both tasks automatically for a set of 17 sounds (horn, siren, car, bicycle, etc.) relevant for smart cars, a subset of the weakly-labeled AudioSet dataset. We propose the Xception Stacked Residual Recurrent Neural Network (XRRNN), based on modifications of the CVSSP system by Xu et al. (2017), which won the challenge for the AT task. The processing stages of the XRRNN consist of 1) an Xception module as front-end, 2) a 1×1 convolution, 3) a set of stacked residual recurrent neural networks, and 4) a feed-forward layer with attention. Using log-Mel spectra and MFCCs as input features and a fusion of the posteriors of the networks trained with those input features, we obtain the following results through a set of Bonferroni-corrected t-tests using 30 models for each configuration: for AT, XRRNN significantly outperforms the CVSSP system with a 1.3% improvement (p = 0.0323) in F-score (XRRNN-logMel vs CVSSP-fusion). For SED, for all three input feature combinations, XRRNN significantly reduces the error rate by 4.5% on average (average p = 1.06 · 10^-10).
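The input representations named above (log-Mel spectra and MFCCs) can be computed, for example, with librosa; the frame and filterbank parameters below are illustrative, not those of the paper.

```python
import librosa

y, sr = librosa.load("clip.wav", sr=16000)            # placeholder audio clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)                     # log-Mel spectrogram
mfcc = librosa.feature.mfcc(S=log_mel, n_mfcc=20)      # MFCCs from the log-Mel spectrum
print(log_mel.shape, mfcc.shape)
```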
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422563
Zenodo URL: https://zenodo.org/record/1422563
Abstract
This paper describes the implementation of a portable impulse response measurement system (PIRMS). As an extension to a typical field recording scenario, the design of a PIRMS enables artists and researchers to capture high quality impulse response measurements in remote locations and under physically restrictive conditions. We describe the design requirements for such a multipurpose system. The recording of environmental sound and impulse responses is considered from both a philosophical and technical standpoint in order to address aesthetic and practical concerns.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422544
Zenodo URL: https://zenodo.org/record/1422544
Abstract
In this article we present an interactive toolkit, which we will refer to as "Drama prosodic tools" from now on, for the extended vocal and gestural performance of Attic tragic poetry in modern drama, related to its prosodic aspects. The "Drama prosodic tools" are based on prosodic elements (melodic, rhythmic) of the ancient text and are used: a) to detect various parameters of the actor's voice, b) to track the movements and gestures of the performer, c) to combine the collected data of the above-mentioned processes, and d) to trigger interactive sound and speech processing during the performance. In the first part, we focus on the development of modules for the phonological articulation of the ancient text based on archeomusicological readings (related to music and language) in order to add aesthetic value to the modern performance of ancient Greek drama. In the second part of this paper we present an evaluation of the "Drama prosodic tools" in two different experimental performances. In the first case, the prosodic tools are used by an actor experienced in ancient drama who interprets the ancient text in a conventional way; in the second, the tools are used by a musician expert in the interpretation of ancient Greek prosody. In this way we are able to test the tools in two different situations and to control programming parameters and algorithms.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422621
Zenodo URL: https://zenodo.org/record/1422621
Abstract
In the past, Acoustic Scene Classification systems have been based on hand-crafted audio features that are input to a classifier. Nowadays, the common trend is to adopt data-driven techniques, e.g., deep learning, where audio representations are learned from data. In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach, where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine. We first show that both methods provide complementary information to some extent. Then, we use a simple late fusion strategy to combine both methods. We report the classification accuracy of each method individually and of the combined system on the TUT Acoustic Scenes 2017 dataset. The proposed fused system outperforms each of the individual methods and attains a classification accuracy of 72.8% on the evaluation set, improving the baseline system by 11.8%.
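A minimal sketch of the late-fusion step: average the class posteriors of the two subsystems (the CNN on log-scaled mel-spectrograms and the gradient boosting machine on hand-crafted features) and take the argmax; the posterior arrays below are random placeholders.

```python
import numpy as np

# Hypothetical class posteriors for 3 clips over 15 acoustic scenes.
p_cnn = np.random.dirichlet(np.ones(15), size=3)   # from the CNN on log-mel inputs
p_gbm = np.random.dirichlet(np.ones(15), size=3)   # from the gradient boosting machine

p_fused = 0.5 * p_cnn + 0.5 * p_gbm                 # simple unweighted late fusion
predicted_scene = p_fused.argmax(axis=1)
print(predicted_scene)
```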
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422583
Zenodo URL: https://zenodo.org/record/1422583
Abstract
This paper describes the ongoing development of a system for the creation of a distributed musical space: the MusicBox. The MusicBox has been realized as an open access point for mobile devices. It provides a musical web application enabling the musician to distribute audio events onto the connected mobile devices and to control synchronous playback of these events. In order to locate the mobile devices, a microphone array has been developed, allowing the sound direction of the connected mobile devices to be identified automatically. This makes it possible to control the position of audio events in the musical space. The system has been implemented on a Raspberry Pi, making it very cheap and robust. No network access is needed to run the MusicBox, turning it into a versatile tool to set up interactive distributed music installations.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422514
Zenodo URL: https://zenodo.org/record/1422514
Abstract
Faust is a functional programming language for audio applications, designed for real-time signal processing and synthesis. Part of the Faust source code distribution is the command line tool mesh2faust [1]. mesh2faust can process a 3D modelled mesh and generate the corresponding audio physical model, as well as code to play its sound. Here we describe an interface for controlling mesh2faust, implemented as a plugin for the free and open-source 3D modelling software Blender.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422625
Zenodo URL: https://zenodo.org/record/1422625
Abstract
Interaction with generative processes often concerns manipulation of their input and output, or variation of predefined parameters that are part of a given process. One can think of algorithmic procedures as black boxes: it does not matter how they work as long as they serve a useful purpose. Based on a black-box model, generative processes can be instantiated and their results then accepted or rejected upon reflection. This often involves an idea of completion: an algorithm produces a result that has to be evaluated and treated accordingly. Creative activity (such as musical composition) is arguably not such a clearly defined process. Instead of progressing towards known goals, a compositional process might constantly develop and change shape. In such situations, generative algorithms are needed that interact with the ongoing creative activity: algorithms that match, and take place within, the context of evolving and dynamic compositional processes. This paper presents a software framework that addresses the relationship between interaction and generative algorithms based on scheduling and computer process management: algorithms that are partial and scheduled according to adaptive heuristics, with interrupt-based process management and context switching as a creative force.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422629
Zenodo URL: https://zenodo.org/record/1422629
Abstract
This presentation first summarizes the genesis of, and the concepts that underlie, the paradigm of "multisensory and interactive simulation of physical objects" introduced and developed by ACROE, as well as their implementation in a technology that is fully mature today, especially the Hélicanthe platform. In a second part, explanations are given of an artwork by the author, Helios, realized entirely with these technologies.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422493
Zenodo URL: https://zenodo.org/record/1422493
Abstract
Music performance databases that can be consulted as numerical values play important roles in research on music interpretation, analysis of expressive performances, automatic transcription, and performance rendering technology. We are creating and will publicly release a new version of the CrestMuse Performance Expression Database (PEDB), a performance expression database of more than two hundred virtuoso classical piano performances of scores from the Baroque period through the early 20th century, including music by Bach, Mozart, Beethoven, and Chopin. The CrestMusePEDB has been used by more than 50 research institutions around the world and has especially contributed, as training data, to research on performance rendering systems. Responding to the demand to expand the database, we started a three-year project in 2016 to develop a second edition of the CrestMusePEDB. This second edition also includes 443 performances that contain quantitative data and phrase information about what the pianists had in mind while playing. We further report on the final stage of the project, which will end next winter.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422503
Zenodo URL: https://zenodo.org/record/1422503
Abstract
This paper presents a new model for segmenting symbolic music data into phrases. It is based on the idea that melodic phrases tend to consist of notes that increase rather than decrease in length towards the phrase end. Previous research implies that the timing of note events might be a stronger predictor of both theoretical and perceived segmentation than pitch information. Our approach therefore relies only on temporal information about note onsets. Phrase boundaries are predicted at those points in a melody where the difference between subsequent note-to-note intervals reaches minimal values. On its own, the proposed model is parameter-free, does not require adjustments to fit a particular dataset, and is not biased towards metrical music. We have tested the model on a set of 6226 songs and compared it with existing rule-based segmentation algorithms previously identified as good performers: LBDM and Grouper. Next, we investigated two additional predictors: meter and the presence of pauses. Finally, we integrated all approaches into a meta-classifier, which yielded a significantly better performance than each of the individual models.
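The core rule, a boundary where the difference between successive note-to-note intervals reaches a local minimum, can be sketched directly from onset times; the onset list below is a toy example, not part of the evaluated song set.

```python
import numpy as np

def phrase_boundaries(onsets):
    """Predict boundaries where the change between successive inter-onset
    intervals reaches a local minimum (a long note followed by a short one).
    Returns the indices of the first note of each new phrase."""
    onsets = np.asarray(onsets, dtype=float)
    ioi = np.diff(onsets)               # note-to-note intervals
    d = np.diff(ioi)                    # change between successive intervals
    idx = [i for i in range(1, len(d) - 1) if d[i] < d[i - 1] and d[i] < d[i + 1]]
    return [i + 1 for i in idx]         # offset back to note indices

# Toy melody: notes slow down towards the phrase end, then restart quickly.
onsets = [0.0, 0.5, 1.0, 1.6, 2.4, 3.4, 3.6, 3.9, 4.4, 5.2]
print(phrase_boundaries(onsets))        # -> [5]
```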
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422505
Zenodo URL: https://zenodo.org/record/1422505
Abstract
This paper proposes a music composition interface that visualizes intra-composer consistency and musical typicality in order to support composers' decisions about the style of their new pieces, while keeping them aware of the self-expression balance between keeping their own composition style and breaking out of it. While many composition support systems focus on generating or suggesting specific musical elements (e.g., melody or chord), our proposed interface feeds back the user's composition consistency and typicality to help users understand their own balance and how other composers tended to maintain their self-expression balance. To estimate the consistency and the typicality, we focus on monophonic melody (i.e., note sequences) as a musical element and model it by using a Bayesian topic n-gram model called the hierarchical Pitman-Yor topic model (HPYTM). By using the proposed interface, named CTcomposer, the user can get comprehensive views of previous pieces by checking scatter plots of their consistency and typicality values. The interface also continuously updates and visualizes the consistency and typicality as the user inputs musical notes, so that the piece the user is composing can be compared with previous pieces. The user can also raise or lower the consistency and typicality values as desired.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422655
Zenodo URL: https://zenodo.org/record/1422655
Abstract
This paper explores how patterns found within the field of data transmission could be conceived of as musical phrases that can be performed by voice and translated by computers back into information. It is based on practice-based research and on the presentation of hardware and software to audiences within a performance. I provide an overview of the relevance of this work within the fields of sonification and musical human-computer interaction, and go on to describe the software and hardware I have developed to convert sound to data. I describe how learning traditional musical systems that employ rhythmic improvisation, such as flamenco, Cuban rumba, bata and Indian Konnakol, could be useful in thinking about how computer signals can be explored musically. I conclude with reflections on performing with this system and thoughts on extending these ideas further towards multiple performers.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422647
Zenodo URL: https://zenodo.org/record/1422647
Abstract
The aim of this paper is to explore deep learning architectures for the development of a real-time gesture recognizer for the Leap Motion sensor that can continuously classify sliding windows into targeted gesture classes related to responsive interactions used to control performance with a virtual 3D musical instrument. In terms of responsiveness, it is assumed that the gestures can be recognized within a small time interval, while the employed gestures are inspired by interaction with real-world percussive and string instruments. The proposed method uses a Long Short-Term Memory (LSTM) network on top of a feature embedding layer of the raw data sequences as input, to map the input sequence to a vector of fixed dimensionality which is subsequently passed to a dense layer for classification among the targeted gesture classes. The performance evaluation of the proposed system has been carried out on a dataset of hand gestures of 8 classes with 11 participants. We report a recognition rate of 92.62% for a 10-fold cross-validation setup and 85.50% for a cross-participant setup. We also demonstrate that the latter recognition rate can be further improved by adapting the trained model with the addition of a few user gesture samples to the training set.
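A minimal PyTorch sketch of the described architecture (a feature-embedding layer, an LSTM mapping each sliding window to a fixed-size vector, and a dense classification layer); the dimensions and window length are illustrative, not those used in the paper.

```python
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    def __init__(self, n_features=30, embed_dim=64, hidden_dim=128, n_classes=8):
        super().__init__()
        self.embed = nn.Linear(n_features, embed_dim)   # feature embedding of raw frames
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, n_classes)

    def forward(self, x):                    # x: (batch, window_len, n_features)
        z = torch.relu(self.embed(x))
        _, (h, _) = self.lstm(z)              # last hidden state summarizes the window
        return self.classify(h[-1])

model = GestureLSTM()
window = torch.randn(4, 20, 30)               # 4 sliding windows of 20 Leap Motion frames
print(model(window).shape)                    # logits over the 8 gesture classes
```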
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422601
Zenodo URL: https://zenodo.org/record/1422601
Abstract
Dervish Sound Dress is a wearable piece of technology: a garment inspired by the sacred experience of the Whirling Dervishes of the Mevlevi Sufi order in Turkey. The garment functions as an instrument when it is worn, and its sound changes depending on how the wearer moves. The cultural traditions of the Mevlevi Sufis and their metaphysical experience during the turning ritual of the 'sema' performance are the inspiration behind the creation of a garment that produces sounds through body movement. Dervish Sound Dress is outfitted with sensors that emit musical sounds with every movement the wearer makes. The movement triggers sensations through haptic feedback, much as a musician feels when playing an instrument. The project seeks to explore how technology can be integrated into a garment as an expressive body instrument to amplify contemporary sonic performance, and how, through performance, sound and sound vibrations used in a garment can generate an emotive response in the wearer by creating sonic expression. The dress is accessible to anyone wishing to embark on a unique musical journey.
Keywords
Dervish, sound design, haptics, wearable technology
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422599
Zenodo URL: https://zenodo.org/record/1422599
Abstract
In this paper we present our current work on the development of a web-based system that allows users to design and interact with virtual music instruments in a virtual 3D environment, providing three natural and intuitive means of interaction (physical, gestural and mixed). By employing the Leap Motion sensor, we benefit from its accurate finger-tracking data. The proposed system is integrated as a creative tool of a novel STEAM education platform that promotes science learning through musical activities. Our approach models two families of music instruments (stringed and percussion), with realistic sonic feedback provided by a physical-model-based sound synthesis engine. Consequently, the proposed interface meets the performance requirements of real-time interaction systems and is implemented strictly with web technologies that allow platform-independent functionality.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422593
Zenodo URL: https://zenodo.org/record/1422593
Abstract
This paper presents the introduction of electroacoustic (e/a) music composition into secondary education through a case study at the Music School of Nicosia, Cyprus. The teaching method focused on e/a music composition through the processes of guided listening, analysis of compositions, recording, manipulation and editing of sounds, and culminated in a concert with compositions and live sound-diffusion by the students. The objective was to guide students with no previous knowledge of e/a music to compose original works with sounds. It is shown that it is possible for students of a young age to produce high-quality compositions after only 4 months of instruction and tutoring. We find that it is important to maintain a high tutor-to-student ratio, and that students with longer teaching periods (two weekly sessions versus one in our case study) produced higher-quality compositions. At the end of the project, 90% of the students commented that they really enjoyed working on this project and were very satisfied with their results. In particular, the students who reached a high level of quality of sound material and manipulation expressed the desire to continue to listen to, and compose, e/a music in the future.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422532
Zenodo URL: https://zenodo.org/record/1422532
Abstract
This paper discusses a new method for encoding Byzantine Music Neumatic Notation (especially the one developed during the 'transitory' period 1670-1814). The notation of this period is characterized by difficulties and peculiarities. The difficult access to Byzantine manuscripts and their deteriorated condition complicate reading. In addition, our incomplete knowledge of the interpretation of signs impedes comprehension of the musical text, leading to results that are often in dispute. The fact that sign unions are complex and appear in various places in a composition makes electronic transcription the ultimate challenge. Moreover, no framework exists for data encoding and analysis. This work presents a proposal for the development of such a model for the old Byzantine Neumatic Notation in Python. The implementation of this project is still at an initial stage and focuses mostly on the efficient digitization of old manuscripts. The system, even though fully functional, has certain limitations: some signs are missing, and the musical text is created using microphotographs. Future developments of the program will focus on resolving these deficiencies and adding more features to the system.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422575
Zenodo URL: https://zenodo.org/record/1422575
Abstract
In a previous experiment we measured the subjective perception of auditory lateralization in listeners who were exposed to binaural piano tone reproductions under different conditions (normal and reversed-channel listening, manual or automatic tone production by a Disklavier, and disclosure or hiding of the keys when they moved autonomously during the automatic production of a tone). In this way, participants were engaged in a localization task under conditions also involving visual as well as proprioceptive (that is, relative to the position and muscular effort of their body parts) identification of the audio source with the moving key, even when the binaural feedback was reversed. Their answers, however, were clustered on a limited region of the keyboard when the channels were not reversed, and the same region became especially narrow when the channels were reversed. In this paper we report on an acoustic analysis of the localization cues conducted on the stimuli used in the aforementioned experiment. This new analysis employs a computational auditory model of sound localization cues in the horizontal plane. Results suggest that listeners used interaural level difference cues to localize the sound source, and that the contribution of visual and proprioceptive cues to the localization task was limited, especially when the channels were reversed.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422510
Zenodo URL: https://zenodo.org/record/1422510
Abstract
Multimodal diegetic narrative tools, as applied in multimedia arts practices, possess the ability to cross the spaces that exist between the physical world and the imaginary. In this paper we present the findings of a multidiscipline practice-based research project that explored the potential of an audio-visual art performance to purposefully interact with an audience's perception of narrative place. To achieve this goal, research was undertaken to investigate the function of multimodal diegetic practices as applied in the context of a sonic-art narrative. This direction was chosen to facilitate the transformation of previous experiences of place through the creative amalgamation and presentation of audio and visual footage collected from real-world spaces. Through the presentation of multimedia relating to familiar geographical spatial features, the audience was prompted to evoke memories of place and to construct and manipulate their own narrative.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422518
Zenodo URL: https://zenodo.org/record/1422518
Abstract
Introducing music training to children can often present challenges, so alternative methods and tools that differ from traditional training are desirable. In this paper we describe the design and evaluation of an application for musical performance and playful expression targeted at children aged 7-11. The application is tailored to fit the context of a musical workshop named Small Composers, which is run by the FIGURA Ensemble. The application was evaluated in an actual workshop setting to assess how well it could be integrated into the conventional session, and it was found to have potential for being part of future workshops.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422540
Zenodo URL: https://zenodo.org/record/1422540
Abstract
The formalization of musical composition rules is a topic that has been studied for a long time. It can lead to a better understanding of the underlying processes and provide a useful tool for musicologists to aid and speed up the analysis process. In our approach, we introduce Schoenberg's rules from Fundamentals of Musical Composition using a specialized version of Petri nets, called Music Petri nets. Petri nets are a formal tool for studying systems that are concurrent, asynchronous, distributed, parallel, nondeterministic, and/or stochastic. We present some examples highlighting how multiple approaches to the analysis task can find counterparts in specific instances of PNs.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422579
Zenodo URL: https://zenodo.org/record/1422579
Abstract
In sound synthesis, nonlinear oscillators are used to produce sounds with rich spectra or, as LFOs, to produce envelopes and trajectories. The frequencies of nonlinear oscillators depend on various parameters and, in most cases, cannot be calculated analytically. This paper presents some well-known nonlinear oscillators which have been implemented in several sound synthesis languages. It is shown how to measure the frequencies of these oscillators precisely in a fast and straightforward way. The fundamentals of feedback control are introduced and applied to controlling the frequencies of these oscillators by adapting their parameters or the time step of their simulation. For demonstration purposes, the oscillators as well as the measurement and control systems have been implemented as mxj externals for Max and are provided for download from http://www.icst.net/downloads.
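As an illustrative sketch (not the paper's mxj implementation), the following simulates a Van der Pol oscillator with a fixed-step Euler scheme, measures its frequency from zero crossings, and applies a simple feedback law that rescales the simulation time step towards a target frequency; all constants are placeholders.

```python
import numpy as np

SR = 44100.0                       # audio sample rate: one Euler step per sample
target = 110.0                     # desired oscillation frequency in Hz

def measure_frequency(mu, dt, n=200000):
    """Euler-simulate a Van der Pol oscillator and estimate its frequency in Hz
    from upward zero crossings, assuming each step is one audio sample."""
    x, v = 1.0, 0.0
    crossings = []
    for i in range(n):
        x_new = x + dt * v
        v_new = v + dt * (mu * (1.0 - x * x) * v - x)
        if x <= 0.0 < x_new:
            crossings.append(i / SR)          # crossing time in seconds
        x, v = x_new, v_new
    periods = np.diff(crossings)
    return 1.0 / periods[len(periods) // 2:].mean()   # discard the transient half

dt = 0.01
for _ in range(5):                  # simple feedback: rescale dt towards the target
    f = measure_frequency(mu=1.0, dt=dt)
    dt *= target / f
    print(round(f, 2))
```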
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422639
Zenodo URL: https://zenodo.org/record/1422639
Abstract
A radio play is a form of drama which exists in the acoustic domain and is usually consumed over broadcast radio. In this paper a method is proposed that, given a story in the form of unstructured text, produces a radio play that tells this story. First, information about characters, acting lines, and environments is retrieved from the text. The extracted information serves to generate a production script which can be used either by producers of radio drama or to automatically generate the radio play as an audio file. The system is evaluated in two parts: precision, recall, and F1 scores are computed for the information retrieval part, while multi-stimulus listening tests are used for subjective evaluation of the generated audio.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422536
Zenodo URL: https://zenodo.org/record/1422536
Abstract
This paper describes the publication of a musical structure analysis database and a tool for manually generating time-span trees on the basis of the generative theory of tonal music (GTTM). We previously analyzed 300 pieces of music with the analysis database and the analysis editor based on the GTTM. However, the length of each piece was only about eight bars, and the conventional analysis editor did not work well for pieces with a large number of bars and notes, which therefore took a lot of time to edit manually. We therefore developed a tool for manually generating time-span tree analyses for the GTTM that can manipulate pieces with a large number of bars and notes. Four musicologists developed an analysis database of 50 musical pieces of 32 bars in length by using the manual generation tool. The experimental results show that the average editing time with the new time-span tree generation tool is shorter than with the previous tool.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422651
Zenodo URL: https://zenodo.org/record/1422651
Abstract
Creating harmony in karaoke between a lead vocalist and a backing vocalist is enjoyable, but backing vocals are not easy for non-advanced karaoke users. First, it is difficult to find musically appropriate submelodies (melodies for backing vocals). Second, the backing vocalist has to practice in advance in order to sing the backing vocals accurately, because singing submelodies is often influenced by the singing of the main melody. In this paper, we propose a backing vocals practice system called HamoKara. The system automatically generates a submelody with a rule-based or HMM-based method and provides users with an environment for practicing backing vocals. Users can check whether their pitch is correct through audio and visual feedback. Experimental results show that the generated submelodies are musically appropriate to some degree, and that the system helped users learn to sing submelodies to some extent.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422675
Zenodo URL: https://zenodo.org/record/1422675
Abstract
A multimodal simulation of instrumental virtual strings is proposed. The system presents two different scenes in the Unity3D software, representing guitar and bass strings respectively. Physical interaction is enabled by a Sensable Technologies Phantom Omni, a portable haptic device with six degrees of freedom. Thanks to this device, credible physically-modeled haptic cues are returned by the virtual strings. The system also handles audio and visual feedback. Participants in a pilot user test appreciated the simulation, especially its haptic component.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422635
Zenodo URL: https://zenodo.org/record/1422635
Abstract
This position paper introduces the concept of complexity management in instrument design as a means to optimize the learning rewards cycle in an effort to maintain player motivation. Fluency and expertise on an instrument require sustained practice. In the quest to enable exceptional levels of expression, instruments designed for virtuosic performance often have a high level of complexity, which can be overwhelming for a beginner and decrease practice motivation. Here we explain complexity management, the idea of intentionally limiting instrument complexity on a temporary basis so that instrument difficulty is optimally matched to user skill and users always remain capable of focused learning and enjoy sufficient musical success to motivate continued practice. We discuss the relevance of Csikszentmihalyi's ideas about flow, concepts from traditional music learning such as chunking and internalization, and the importance of practice and enjoyment. We then propose our own concept of learning efficiency and the importance of controlling challenge. Finally, we introduce our own experiments in complexity management using the violin, an existing example of an instrument with high input complexity. We discuss the effects of simplifying intonation in order to make early musical success easier, along with plans for further investigations.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422538
Zenodo URL: https://zenodo.org/record/1422538
Abstract
This paper presents the notion of the aural microspace, an area whose aural architecture is not accessible unless mediated by recording technology, and discusses the exploration of this concept in compositional practice. The author analyses the characteristics of acoustic space from a spectromorphological, cultural and technical perspective, with a focus on auditory intimacy, and proposes novel ways of working in this domain with reference to two multichannel acousmatic works, Topophilia and Karst Grotto.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422524
Zenodo URL: https://zenodo.org/record/1422524
Abstract
This paper presents a user interface for the exploration of music libraries based on parametric t-SNE. Each song in the music library is represented as a stack of 34-dimensional vectors containing features related to genres, emotions and other musical characteristics. Parametric t-SNE is used to construct a model that extracts a pair of coordinates from these features for each song, preserving the similarity relations between songs in the high-dimensional feature space in their projection onto a two-dimensional space. The two-dimensional output of the model is used to project and render a song catalogue onto a plane. We have investigated different models, obtained by varying the structure of the hidden layers, the pre-training technique, the feature selection, and the data pre-processing. These results extend a previous project published by the Moodagent company and show that the clustering of genres and emotions obtained with parametric t-SNE is far more accurate than previous methods based on PCA. Finally, our study proposes a visual representation of the resulting model. The model has been used to build a music space of 20000 songs, visually rendered for browser interaction. This provides the user with a certain degree of freedom to explore the music space by changing the highlighted features, and it offers an immersive experience for music exploration and playlist generation.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422557
Zenodo URL: https://zenodo.org/record/1422557
Abstract
KO2 is a platform for distributed musical applications, consisting of the messaging protocol O2 and the signal processing language Kronos. This study is an effort to use O2 as a comprehensive communications framework for inter-process signal routing, including clock synchronization and audio. The Kronos compiler is exposed as an O2 service, allowing remotely specified programs to be compiled and run in near real-time on various devices in the network.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422645
Zenodo URL: https://zenodo.org/record/1422645
Abstract
This work focuses on the automatic makam recognition task for Turkish makam music (TMM) using pitch distributions, which are widely used in mode recognition tasks for various music traditions. Here, we aim to improve on the performance of previous works by extending the distribution features and performing parameter optimization for the classifier. Most music theory resources highlight two aspects of the TMM makam concept: the use of microtonal intervals and an overall melodic direction that refers to the design of the melodic contour at the level of the song or musical piece. Previous studies on the makam recognition task already utilize the microtonal aspect by making use of high-resolution histograms (with a much finer bin width than one twelfth of an octave). This work extends the distribution feature by including distributions of different portions of a performance to reflect the long-term characteristics referred to in theory for the melodic contour, more specifically for the introduction and the finalis. Our design involves a Multi-Layer Perceptron classifier using an input feature vector composed of the pitch distributions of the first and last sections together with the overall distribution; the mean accuracy over 10 iterations is 0.756. The resources used in this work are shared to facilitate further research in this direction.
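A rough sketch of the feature construction described above: concatenate fine-resolution, octave-folded pitch distributions of the first section, the last section and the whole pitch track, and feed the vector to an MLP classifier. The pitch tracks, makam labels and bin count below are placeholders, not the paper's data or settings.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def pitch_histogram(cents, bins=159):
    """Fold a pitch track (in cents relative to the tonic) into one octave and
    build a fine-resolution, normalized distribution."""
    hist, _ = np.histogram(np.mod(cents, 1200), bins=bins, range=(0, 1200))
    return hist / max(hist.sum(), 1)

def feature_vector(cents, frac=0.25):
    n = len(cents)
    first, last = cents[: int(n * frac)], cents[-int(n * frac):]
    return np.concatenate([pitch_histogram(first),
                           pitch_histogram(last),
                           pitch_histogram(cents)])

# Synthetic placeholder pitch tracks (cents) and makam labels.
tracks = [np.random.normal(loc, 80, size=3000) for loc in (0, 500, 700, 0, 500, 700)]
labels = ["rast", "ussak", "hicaz", "rast", "ussak", "hicaz"]

X = np.vstack([feature_vector(t) for t in tracks])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, labels)
print(clf.predict(X[:1]))
```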
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422577
Zenodo URL: https://zenodo.org/record/1422577
Abstract
We present an interactive generative method for bridging between sound-object composition rooted in Pierre Schaeffer's TARTYP taxonomy and transformational pitch-class composition ingrained in Klumpenhouwer Networks. We create a quantitative representation of sound objects within an ordered sound space. We use this representation to define a probability-based mapping of pitch classes to sound objects. We demonstrate the implementation of the method in a real-time compositional process that also utilizes our previous work on a TARTYP generative grammar tool and an interactive K-Network tool.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422659
Zenodo URL: https://zenodo.org/record/1422659
Abstract
The present work explores the potential of networked interaction through dance. The paper reports the conception and initial implementation steps of an ongoing project, with public presentations planned for June and October 2018. The project's main objective is to extend the interaction paradigm of live coding through intimate coupling to human body movement using wearable devices. Our target is a performance involving dancers in separate locations, whose movements are tracked with magnetic and acceleration-based sensors on wireless wearable devices. In this way, two or more dancers performing concurrently in distant locations can jointly create a performance by sharing the data measured by the sensors. Inspired by traditional African music practices, where several musicians play on one instrument creating interlocking rhythmic patterns, and by research on the rhythmical interweaving (epiplokē) of ancient Greek metrical theory, we use the data to modulate the metric patterns in the performance in order to weave rhythmic patterns. We discuss the design choices and implementation challenges for such a performance. The second main objective is to develop a prototype that demonstrates the use of literate programming and reproducible research practices with open source tools, and to evaluate the advantages of such techniques for development as well as for dissemination and, ultimately, educational purposes. We develop new tools and workflows using Emacs and org-mode as a platform for both documentation and development on an embedded wearable device made with CHIP-PRO. We show the benefits of using this environment both for documentation and for streamlining and speeding up the development process.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422665
Zenodo URL: https://zenodo.org/record/1422665
Abstract
An electronic wind instrument is an analog or digital electronic instrument actuated by blowing onto an electronic sensor. Throughout the history of electronic wind instruments, the refinement of the physical interfaces has not been followed by major innovations in breath and embouchure detection: the industry still relies largely on pressure sensors for measuring air flow. We argue that many sound production techniques for acoustic wind instruments depend on breath quality in addition to breath quantity, and that most commercially available electronic options do not account for this. A series of breath signal measurements indicated that an electret microphone flush-mounted in a plastic tube is a suitable sensor for feature extraction of the player's breath. We therefore propose the implementation of an electronic wind instrument which captures the envelope and frequency content of the breath and detects whether the signal is voiced or unvoiced. These features are mapped to the parameters of an FM synthesizer, with an external MIDI keyboard providing pitch control. A simple evaluation shows that the proposed implementation is able to capture the intended signal features, and that these translate well into the character of the output signal. A short performance was recorded to demonstrate that the instrument is potentially suitable for live applications.
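A minimal sketch of the kind of breath analysis and mapping described above (not the authors' implementation): per-frame envelope, spectral centroid and a crude zero-crossing-rate voicing decision, mapped to hypothetical FM parameters.

```python
import numpy as np

def breath_features(frame, sr=16000):
    """Per-frame breath descriptors: envelope (RMS), spectral centroid, and a
    crude voiced/unvoiced decision from the zero-crossing rate."""
    rms = np.sqrt(np.mean(frame ** 2))
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    centroid = (freqs * spectrum).sum() / max(spectrum.sum(), 1e-9)
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    return rms, centroid, zcr < 0.1          # low ZCR -> voiced

def fm_params(rms, centroid, voiced):
    """Map breath features to FM synthesis parameters (illustrative mapping)."""
    amplitude = min(1.0, 4.0 * rms)
    mod_index = 2.0 + 8.0 * (centroid / 8000.0)     # brighter breath -> richer spectrum
    ratio = 1.0 if voiced else 1.414                 # inharmonic ratio for unvoiced breath
    return amplitude, mod_index, ratio

frame = np.random.randn(512) * 0.05                  # placeholder breath-signal frame
print(fm_params(*breath_features(frame)))
```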
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422534
Zenodo URL: https://zenodo.org/record/1422534
Abstract
In this paper, we present a data-driven approach for automatically generating South Indian rhythmic patterns. The method uses a corpus of Carnatic percussion compositions and grooves performed in adi tala – a uniform eight-beat cycle. To model the rhythmic structure and the generation process of phrasings that fit within the tala cycles, we use a set of arithmetic partitions that model the strategies used by professional Carnatic percussionists in their performance. Each partition consists of combinations of stroke sequences. This modeling approach has been validated, in terms of the groupings used in this music idiom, through direct discussions with Carnatic music experts. Two approaches were used for grouping the sequences of strokes into meaningful rhythmic patterns: the first is based on a well-formed dictionary of prerecorded phrase variations of stroke groupings, and the second on a segmentation algorithm that compares the distance between adjacent strokes. The sequences of strokes from both approaches were then analyzed and clustered by similarity. The results of these analyses are discussed and used to improve existing generative approaches for modelling this particular genre by emulating Carnatic-style percussive sequences and creating rhythmic grooves. These tools can be used by musicians and artists for creative purposes in their performance, and also in music education as a means of actively enculturing lay people into this musical style.
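The arithmetic-partition idea can be illustrated by enumerating the ordered groupings of one eight-beat adi tala cycle into stroke-group lengths drawn from a small allowed set; the allowed group lengths here are placeholders, not the vocabulary used by the authors.

```python
def compositions(total, parts):
    """All ordered ways to fill `total` beats with group lengths from `parts`."""
    if total == 0:
        return [[]]
    result = []
    for p in parts:
        if p <= total:
            result += [[p] + rest for rest in compositions(total - p, parts)]
    return result

# Ordered groupings of one adi tala cycle (8 beats) into 2-, 3- and 4-beat groups.
cycles = compositions(8, parts=(2, 3, 4))
print(len(cycles))       # number of candidate groupings
print(cycles[:5])
```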
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422615
Zenodo URL: https://zenodo.org/record/1422615
Abstract
Several studies provide evidence that blind people orient themselves using echolocation, emitting signals as mouth clicks. Our previous study on embodiment in Virtual Reality (VR) showed that it is possible to enhance a Virtual Body Ownership (VBO) illusion over a body morphologically different from the human body when agency is present. In this paper, we explore real-time audio navigation with echolocation in a Virtual Environment (VE) in order to create the feeling of being a virtual bat. This includes imitation of the sonar system, which might help to achieve a stronger VBO illusion in the future, as well as building an echolocation training simulator. Two pilot tests were conducted using a within-subject design, measuring time and traveled distance during spatial orientation in the VE. Both studies involved four conditions: early reflections, reverb, and early reflections plus reverb (all with deprived visual cues), and finally vision. In a walking-based task in the VE, test subjects with a musical background preferred the conditions with reflection pulses, while non-musicians favored the reverberation-only condition.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422555
Zenodo URL: https://zenodo.org/record/1422555
Abstract
Arab-Andalusian music is performed through nawabāt (plural of nawba), suites of instrumental and vocal pieces ordered according to their metrical pattern in a sequence of increasing tempo. This study presents, for the first time in the literature, a system for automatic recognition of the nawba in audio recordings of the Moroccan tradition of Arab-Andalusian music. The proposed approach relies on template matching applied to pitch distributions computed from audio recordings. The templates were created using a data-driven approach, utilizing a score collection categorized into nawabāt. The methodology was tested on a dataset of 58 hours of music: a set of 77 recordings in eleven nawabāt from the Arab-Andalusian corpus collected within the CompMusic project and stored in the Dunya platform. An accuracy of 75\% on the nawba recognition task is reported using the Euclidean (L2) distance as the metric in the template matching.
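A minimal sketch of the template-matching step described above, assuming pitch distributions have already been computed and normalized; the nawba names, distribution size and toy data are placeholders, not the paper's actual templates.

import numpy as np

def nearest_nawba(recording_pcd, templates):
    """Assign a recording to the nawba whose template pitch distribution is closest (L2).

    recording_pcd: 1-D pitch distribution for the recording, summing to 1.
    templates: dict mapping nawba name -> template distribution of the same length.
    """
    recording_pcd = np.asarray(recording_pcd, dtype=float)
    dists = {name: np.linalg.norm(recording_pcd - np.asarray(t, dtype=float))
             for name, t in templates.items()}
    return min(dists, key=dists.get), dists

# Toy example with 12-bin distributions (a real system would use much finer pitch resolution).
rng = np.random.default_rng(0)
templates = {f"nawba_{i}": rng.dirichlet(np.ones(12)) for i in range(3)}
query = templates["nawba_1"] + 0.02 * rng.standard_normal(12)
query = np.clip(query, 0, None)
query /= query.sum()
print(nearest_nawba(query, templates)[0])   # expected: "nawba_1"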
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422623
Zenodo URL: https://zenodo.org/record/1422623
Abstract
Sound and music computing (SMC) is still an emerging field in many institutions, and the challenge is often to gain the critical mass needed to develop study programs and undertake more ambitious research projects. We report on how a long-term collaboration between small and medium-sized SMC groups has led to an ambitious undertaking in the form of the Nordic Sound and Music Computing Network (NordicSMC), funded by the Nordic Research Council and institutions from all five Nordic countries (Denmark, Finland, Iceland, Norway, and Sweden). The constellation is unique in that it covers the field of sound and music from the "soft" to the "hard," including the arts and humanities, the social and natural sciences, and engineering. This paper describes the goals, activities, and expected results of the network, with the aim of inspiring the creation of other joint efforts within the SMC community.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422528
Zenodo URL: https://zenodo.org/record/1422528
Abstract
Music recommender systems can offer users personalized and contextualized recommendations and are therefore important for music information retrieval. An increasing number of datasets have been compiled to facilitate research on different topics, such as content-based, context-based or next-song recommendation. However, these topics are usually addressed separately using different datasets, due to the lack of a unified dataset containing a large variety of feature types such as item features, user contexts, and timestamps. To address this issue, we propose a large-scale benchmark dataset called #nowplaying-RS, which contains 11.6 million music listening events (LEs) of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, along with the timestamps of the LEs. Moreover, some of the user context features imply the cultural origin of the users, and others, such as hashtags, give clues to the emotional state of a user underlying an LE. In this paper, we provide statistics that give insight into the dataset, and outline directions in which the dataset can be used for music recommendation. We also provide standardized training and test sets for experimentation, and baseline results obtained using factorization machines.
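For reference, the factorization-machine baseline mentioned above corresponds to the standard second-order model introduced by Rendle; the exact feature encoding used for #nowplaying-RS is not given in the abstract, but the feature vector x would typically concatenate one-hot user and track indicators with the context features:

\hat{y}(\mathbf{x}) \;=\; w_0 \;+\; \sum_{i=1}^{n} w_i x_i \;+\; \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j

where w_0 and w_i are linear weights and the v_i are learned latent factor vectors whose inner products model pairwise feature interactions, allowing the model to generalize to user/track/context combinations not seen during training.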
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422565
Zenodo URL: https://zenodo.org/record/1422565
Abstract
We present the first implementation of a new framework for sound and music computing, which allows humans to explore musical environments by communicating feedback to an artificial agent. It is based on an interactive reinforcement learning workflow, which enables agents to incrementally learn how to act on an environment by balancing exploitation of human feedback knowledge and exploration of new musical content. In a controlled experiment, participants successfully interacted with these agents to reach a sonic goal in two cases of different complexities. Subjective evaluations suggest that the exploration path taken by agents, rather than the fact of reaching a goal, may be critical to how agents are perceived as collaborative. We discuss such quantitative and qualitative results and identify future research directions toward deploying our “co-exploration” approach in real-world contexts.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422507
Zenodo URL: https://zenodo.org/record/1422507
Abstract
In this paper we present a pilot study carried out within the project SONAO. The SONAO project aims to compensate for limitations in robot communicative channels with an increased clarity of Non-Verbal Communication (NVC) through expressive gestures and non-verbal sounds. More specifically, the purpose of the project is to use movement sonification of expressive robot gestures to improve Human-Robot Interaction (HRI). The pilot study described in this paper focuses on mechanical robot sounds, i.e. sounds that have not been specifically designed for HRI but are inherent to robot movement. Results indicated a low correspondence between perceptual ratings of mechanical robot sounds and emotions communicated through gestures. In general, the mechanical sounds themselves appeared not to carry much emotional information compared to video stimuli of expressive gestures. However, some mechanical sounds did communicate certain emotions, e.g. frustration. In general, the sounds appeared to communicate arousal more effectively than valence. We discuss potential issues and possibilities for the sonification of expressive robot gestures and the role of mechanical sounds in such a context. Emphasis is put on the need to mask or alter sounds inherent to robot movement, using for example blended sonification.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422499
Zenodo URL: https://zenodo.org/record/1422499
Abstract
Previous studies have shown that people agree with each other on the perceived valence and arousal of soundscape recordings. This study investigates whether the perceived emotion of mixed soundscape recordings can be computed from the perceived emotion of the source recordings. We discovered quantifiable trends in the effect of mixing on the perceived emotion of soundscape recordings. Regression analysis based on the observed trajectories yielded coefficients with high R² values. We found that changing the loudness of a source recording influences its weight in the perceived emotion of the mixed recording. A visual analysis of the center-of-mass data plots revealed specific patterns relating the perceived emotion of source recordings from different soundscape categories to the perceived emotion of the mix. We also found that when the difference in valence/arousal between two source recordings is larger than a given threshold, the valence/arousal of the mix is highly likely to lie between the valence/arousal values of the two sources.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422591
Zenodo URL: https://zenodo.org/record/1422591
Abstract
Although the physics of the bowed violin string are well understood, most audio feature extraction algorithms for violin still rely on general-purpose signal processing methods with latencies and accuracy rates that are unsuitable for real-time professional-calibre performance. Starting from a pickup which cleanly captures the motion of the bowed string with minimal colouration from the bridge and body, we present a lightweight time-domain method for modelling string motion using segmented linear regression. The algorithm leverages knowledge of the patterns of Helmholtz motion to produce a set of features which can be used for control of real-time synthesis processes. The goal of the paper is not a back-extraction of physical ground truth, but a responsive, low-latency feature space suitable for performance applications.
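The abstract does not detail the regression procedure, but the following toy sketch shows the general flavour of a segmented (two-piece) linear fit over one period of an idealized Helmholtz-like displacement signal; the function name, the exhaustive breakpoint search and the test signal are illustrative assumptions, not the paper's algorithm.

import numpy as np

def two_segment_fit(y):
    """Fit two least-squares lines to y (one period of string motion).

    Returns the breakpoint index and the two slopes; under ideal Helmholtz motion
    the breakpoint ratio relates to the relative bowing position on the string.
    """
    x = np.arange(len(y))
    best = None
    for k in range(2, len(y) - 2):
        a1, b1 = np.polyfit(x[:k], y[:k], 1)
        a2, b2 = np.polyfit(x[k:], y[k:], 1)
        err = np.sum((np.polyval((a1, b1), x[:k]) - y[:k]) ** 2) + \
              np.sum((np.polyval((a2, b2), x[k:]) - y[k:]) ** 2)
        if best is None or err < best[0]:
            best = (err, k, a1, a2)
    _, k, slope_up, slope_down = best
    return k, slope_up, slope_down

# Idealized period: slow rise over 80% of the cycle, fast fall over the rest.
period = np.concatenate([np.linspace(-1, 1, 80), np.linspace(1, -1, 20)])
print(two_segment_fit(period))    # breakpoint near index 80

A real-time version would have to avoid the exhaustive search per period, which is part of what makes the problem interesting for low-latency use.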
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422597
Zenodo URL: https://zenodo.org/record/1422597
Abstract
This paper describes the process of creating a new interface for musical expression inspired by the glass harmonica. In order to design and implement this interface, a number of technologies were incorporated in the development process. By combining fabrication techniques such as laser cutting and 3D printing, a replica interface covering a one-octave chromatic scale was built. Interaction is facilitated by an MPR121 capacitive sensing chip paired with an Arduino Uno. The interface controls a physics-based model of a glass harmonica. We present a preliminary evaluation of the interface as a solo instrument. This project continues a series of experiments whose goal is to recreate musical instruments from the Danish Musical Instruments museum using fabrication techniques and sound synthesis. The ultimate goal is to encourage visitors of the museum to play with replicas of the instruments, in order to experience their gestural interaction, playability and sound quality, without touching the precious originals.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422571
Zenodo URL: https://zenodo.org/record/1422571
Abstract
This paper presents a novel classification technique for datasets of similar audio fragments with different durations, which allows testing the pertinence of fragments without the need to embed the data in a common representation space. This geometrically motivated technique uses direct DTW measurements between alternative different-sized representations, such as MFCC-grams or chromagrams, and defines the classification problem over relative embeddings based on point-to-set distances. The proposed technique is applied to the classification of voice recordings containing both normal and disturbed speech utterances, showing that it significantly improves performance metrics with respect to the usual alternatives for this type of classification, such as bag-of-words histograms and Hidden Markov Models. An experiment was conducted using the Universal Access Speech database (UA-Speech) from the University of Illinois, which contains over 700 different words recorded by 19 dysarthric speakers and 13 speakers without any speech disorder. The method proposed here achieved a global F-measure (with 10-fold cross-validation) above 95\%, against 81\% for bag-of-words classification and 83\% for Hidden Markov Models.
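The following is a minimal sketch of the point-to-set idea, assuming a plain DTW over frame-wise feature sequences (for example MFCC-grams); it omits the relative-embedding construction described in the paper, and all names are hypothetical.

import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two feature sequences of shape (frames, dims)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])          # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def point_to_set(query, reference_set):
    """Distance from a query fragment to a class: minimum DTW to any reference fragment."""
    return min(dtw(query, ref) for ref in reference_set)

def classify(query, classes):
    """classes: dict mapping label -> list of reference fragments (arrays of shape (frames, dims))."""
    return min(classes, key=lambda label: point_to_set(query, classes[label]))

A query fragment is thus assigned to the class whose reference set contains its nearest fragment under DTW, with no fixed-length embedding ever being computed.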
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422585
Zenodo URL: https://zenodo.org/record/1422585
Abstract
Historic performance spaces throughout the world have stood the test of time, in many cases reportedly due to their superior acoustics. Numerous studies focus on measurements made within these spaces to verify or disprove these claims. Regardless of the reason for their demise, the reputations of a number of historic theaters have been maintained due to the significance of their performances and premieres. Not all these spaces remain in the present day for modern study and analysis; however, current computational technologies allow extinct theaters to be digitally reconstructed and simulated. This work focuses on the construction of models and acoustic simulations for two notable Viennese theaters from the 18th century. Analysis elucidates how room acoustics may have been perceived within the spaces during a period when opera and theater were thriving in the region. Moreover, the acoustic characteristics of these theaters are compared to the standards of modern metrics. In doing so, a determination of the quality of each venue is made.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422550
Zenodo URL: https://zenodo.org/record/1422550
Abstract
In this paper we explore the idea of transforming a portable speaker into an interactive music sketch pad using low and high fidelity prototyping. We present research into the current state of the art of musical sketch pads and specify the requirements for a new concept based on pressure-sensitive conductive fabric. We developed a virtual prototype and subjected it to user testing. Results from the user test led to the design and implementation of a high fidelity prototype based on a single-board computer with an additional audio interface communicating with a custom embedded MIDI controller. A real-time, loop-based musical software platform was developed as part of the high fidelity prototype. Finally, user test results are presented and discussed.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422633
Zenodo URL: https://zenodo.org/record/1422633
Abstract
This paper describes how a participatory performance named Embodied iSound brought together a number of technologies, sound and music in order to explore the experience of sonic crossings. The main components of the experience are introduced, from the initial design, inspired by contemporary social and political events, to the implementation of the technical solutions. Among these, participants' gestures and movements are tracked in order to control sound generation and spatialization. The physicality of the performance space is used to enable an all-inclusive listening experience that activates all the senses as well as personal and collective memories. The performance was evaluated by the audience, who provided encouraging feedback, in particular regarding embodiment, interaction, and immersiveness.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422673
Zenodo URL: https://zenodo.org/record/1422673
Abstract
Artificial Intelligence (AI) algorithms have been extensively employed for generating art in many forms. From image generation to sound design, the self-emergence of structures in such algorithms makes them suitable for exploring forms of computational creativity in the automated generation of art. This work explores sound synthesis by combining swarm intelligence, user interaction and a novel sonic communication protocol between socially capable artificial agents. These agents, termed "Sonoids," behave socially according to the well-known boids algorithm but perceive their environment (the positions, velocities and identities of other agents) through sound. For the purposes of this work, the overall sound-synthesis environment is demonstrated as an iOS application that handles the Sonoids' movement and sonifies their social interaction. User interaction is additionally allowed, which can modify specific parameters of the sonic communication protocol, leading to rich sonifications that exhibit self-emergent structure.
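As a reminder of the underlying flocking model, here is a minimal sketch of a standard boids update (cohesion, separation, alignment); the sonic perception and communication layer that distinguishes the Sonoids is deliberately left out, and all weights and names are illustrative.

import numpy as np

def boids_step(pos, vel, dt=0.05, r=1.5, w_coh=0.8, w_sep=1.2, w_ali=0.5):
    """One update of the classic boids rules; each agent reacts to neighbours within radius r."""
    new_vel = vel.copy()
    for i in range(len(pos)):
        d = np.linalg.norm(pos - pos[i], axis=1)
        nbr = (d > 0) & (d < r)
        if not nbr.any():
            continue
        cohesion = pos[nbr].mean(axis=0) - pos[i]              # steer toward the local centre
        separation = (pos[i] - pos[nbr]).sum(axis=0)           # steer away from crowding
        alignment = vel[nbr].mean(axis=0) - vel[i]             # match the neighbours' heading
        new_vel[i] += dt * (w_coh * cohesion + w_sep * separation + w_ali * alignment)
    return pos + dt * new_vel, new_vel

rng = np.random.default_rng(1)
pos, vel = rng.uniform(-2, 2, (20, 2)), rng.uniform(-1, 1, (20, 2))
for _ in range(100):
    pos, vel = boids_step(pos, vel)   # agent state that a sonification layer could then render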
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422681
Zenodo URL: https://zenodo.org/record/1422681
Abstract
Deep learning has proven very effective in image and audio classification tasks. Is it possible to improve the performance of emotion recognition tasks using deep learning approaches? We introduce the strengths of deep learning in the context of soundscape emotion recognition (SER). To the best of our knowledge, this is the first study to use deep learning for SER. The main aim is to evaluate the performance of a Convolutional Neural Network (CNN) trained from scratch, a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) trained from scratch, a CNN trained through supervised fine-tuning, Support Vector Machines for Regression (SVR), and a combination of CNN and SVR (transfer learning) for predicting the perceived emotion of soundscape recordings. The results show that deep learning is a promising approach for improving SER performance. Moreover, the fine-tuned VGG-like audio classification model outperforms the other deep-learning frameworks in predicting valence, while the best performance in predicting arousal is obtained by the CNN trained from scratch. Finally, we analyze the performance of predicting perceived emotion for soundscape recordings in each of Schafer's soundscape categories.
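For readers unfamiliar with this setup, the sketch below shows the general shape of a CNN regressor that maps spectrogram input to valence and arousal values; it is a toy stand-in, not the VGG-like or LSTM architectures evaluated in the paper, and the class name and input dimensions are placeholders.

import torch
import torch.nn as nn

class SmallSERNet(nn.Module):
    """Toy CNN regressor: log-mel spectrogram excerpt (1 x 64 x 128) -> (valence, arousal)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # collapse time/frequency to one vector per excerpt
        )
        self.head = nn.Linear(32, 2)          # two regression targets: valence and arousal

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = SmallSERNet()
dummy = torch.randn(4, 1, 64, 128)            # batch of 4 spectrogram excerpts
print(model(dummy).shape)                      # torch.Size([4, 2])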
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422589
Zenodo URL: https://zenodo.org/record/1422589
Abstract
Despite the proliferation of new digital musical instruments (DMIs) coming from a diverse community of designers, researchers and creative practitioners, many of these instruments experience short life cycles and see little actual use in performance. There are a variety of reasons for this, including a lack of established technique and repertoire for new instruments, and the prospect that some designs may be intended for other purposes besides performance. In addition, we propose that many designs may not meet basic functional standards necessary for an instrument to withstand the rigors of real-world performance situations. For active and professional musicians, a DMI might not be viable unless these issues have been specifically addressed in the design process, as much as possible, to ensure trouble-free use during performance. Here we discuss findings from user surveys around the design and use of DMIs in performance, from which we identify primary factors relating to stability, reliability and compatibility that are necessary for their dependable use. We then review the state of the art in new instrument design through 40 years of proceedings from three conferences - ICMC, NIME, and SMC - to see where and how these have been discussed previously. Our review highlights key factors for the design of new instruments to meet the practical demands of real-world use by active musicians.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422595
Zenodo URL: https://zenodo.org/record/1422595
Abstract
The sonification of line charts, from which auditory line charts are produced, is a common sonification strategy in use today. This paper examines timbre as a potentially useful sonic dimension for conveying information in sonified line charts. A user study is presented in which 43 participants were tasked with identifying particular trends among multiple distractor trends using sonified data. These sonified data comprised frequency-mapped trends distinguished by the gradual enrichment of harmonic content, using a sawtooth wave as a guideline for the overall harmonic structure. Correlations between harmonic content and identification success rates were examined. Results from the study indicate that the majority of participants consistently chose the sample with the most harmonics when deciding which sonified trend best represented the visual equivalent. However, this confidence decreased with each added harmonic, to the point of complete uncertainty when choosing between a sample with 3 harmonics and a sample with 4 harmonics.
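A minimal sketch of how such stimuli could be generated, assuming each trend point is mapped to a fundamental frequency and harmonics are added with a sawtooth-like 1/k amplitude roll-off; durations, normalisation and the frequency mapping are illustrative, not the study's exact stimulus parameters.

import numpy as np

def trend_tone(freqs, n_harmonics, fs=44100, dur_per_point=0.25):
    """Render a frequency-mapped trend where each data point becomes a short tone whose
    first n_harmonics follow a sawtooth-like 1/k amplitude roll-off."""
    t = np.arange(int(fs * dur_per_point)) / fs
    out = []
    for f0 in freqs:
        tone = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harmonics + 1))
        out.append(tone / n_harmonics)          # crude normalisation across harmonic counts
    return np.concatenate(out)

rising_trend = np.linspace(220, 440, 8)          # the visual trend mapped to pitch
signal_1h = trend_tone(rising_trend, 1)          # pure-tone version of the trend
signal_4h = trend_tone(rising_trend, 4)          # richer version with four harmonics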
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422501
Zenodo URL: https://zenodo.org/record/1422501
Abstract
The paper presents an exploratory study of the guitar strumming technique, based on three different accompaniment patterns. The experiments were held with an acoustic nylon-string guitar equipped with hexaphonic pickups, and focused on different features of this technique. We observed a diversity of strategies for beat control and rhythmic expression among the musicians, which may be described by technical, perceptual and expressive factors. The results also provide preliminary quantitative boundaries for distinguishing between block, strummed and arpeggiated chords.
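The abstract does not specify which feature separates the three chord types, but one plausible measure from hexaphonic data is the temporal spread of per-string onsets; the sketch below uses purely hypothetical thresholds for illustration and is not the paper's boundary values.

import numpy as np

def classify_chord(onset_times_per_string, block_ms=10.0, strum_ms=80.0):
    """Classify a chord attack from per-string onset times in seconds (one per pickup channel).

    Thresholds are illustrative only: onsets packed into a few milliseconds read as a
    'block' chord, a short sweep as 'strummed', and a long spread as 'arpeggio'."""
    onsets = np.sort(np.asarray(onset_times_per_string, dtype=float))
    spread_ms = 1000.0 * (onsets[-1] - onsets[0])
    if spread_ms < block_ms:
        return "block", spread_ms
    if spread_ms < strum_ms:
        return "strummed", spread_ms
    return "arpeggio", spread_ms

print(classify_chord([0.000, 0.001, 0.002, 0.003, 0.004, 0.005]))   # ('block', 5.0)
print(classify_chord([0.000, 0.030, 0.060, 0.090, 0.120, 0.150]))   # ('arpeggio', 150.0)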
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422685
Zenodo URL: https://zenodo.org/record/1422685
Abstract
Teaching programming in secondary education while following the steps of the prescribed curriculum is considered difficult. However, when teaching is supported by suitable methodologies, learning can be improved. Under this premise, this paper discusses a different approach to teaching programming in secondary education and examines the potential benefit of sound alerts as a complementary teaching tool. Such alerts were created by pairing sound stimuli with specific programming actions and operations. Both the selection of sound stimuli and the potential impact of sound alerts on programming were evaluated through perceptual studies. Results showed that participants preferred synthesized to natural (pre-recorded) stimuli for all types of alerts. It was also revealed that users prefer sound alerts associated with pending actions, errors, and successful code execution over alerts highlighting a step-by-step execution of the code.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422526
Zenodo URL: https://zenodo.org/record/1422526
Abstract
Mutualizing the body and the instrument offers a different way of thinking about designing embodied musical interfaces. This research presents the design of the BodyHarp, a wearable instrument that combines large body-based gestures with the fine control of hand-based instruments. This reflects a desire to create a single interface that captures both expressive, dance-like body movement and nuanced gestural interactions. The BodyHarp was not designed as a separate artifact; rather, it was crafted as an augmentation of the human body. This fusion seeks to enhance the sense of intimacy between the player and the instrument and carries a different type of aesthetic: like playing a traditional instrument (the harp), but as part of the body. In other words, the BodyHarp aims to capture creative body movement and place it in an instrumental context. In doing so, we aim to blur the transition between two gestural mediums (dance and playing an instrument) by mutualizing them, or, in a sense, by designing the interface as a part of the body.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422667
Zenodo URL: https://zenodo.org/record/1422667
Abstract
Sonification is a constantly evolving field with many applications. There is a scientific need to adopt alternative methods of analysis, especially nowadays when the amount and complexity of data are growing. Moreover, contemporary music often relies on algorithms, and there is an open discussion about the nature of algorithmic music. Since Xenakis' works, algorithmic music has gained a strong reputation, and in the contemporary avant-garde scene more and more composers use algorithmic structures, taking advantage of modern powerful computers. In this project we aim to create music that accompanies time-lapse videos. Our purpose is to transform the visual informational content into musical structures that enhance the experience and create a more complete audio-visual result. For our application we use digital video processing techniques. Our concern is to capture the motion in the video, and we focus on the arrangement of the dominant colours. We relate the background of the video to a background harmony, and the moving items that stand out against the background to a melody. The parameters of the musical rhythm and the video pace are taken into consideration as well. Finally, we demonstrate a representative implementation as a case study.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422516
Zenodo URL: https://zenodo.org/record/1422516
Abstract
Recent developments of web standards, such as WebAudio, WebSockets or WebGL, have opened new potentialities and developments in the field of interactive music systems. Until now, research and development efforts have principally focused on exploring and validating the concepts and on building prototypes. Nevertheless, it remains important to provide stable and powerful development environments for artists and researchers. The present paper aims to propose foundations for the development of such an experimental system by analysing salient properties of existing computer music systems and showing how these properties could be transposed to web-based distributed systems. In particular, we argue that shifting our perspective from a Mobile Web to a Web of Things approach could allow us to tackle recurrent problems of web-based setups. We finally describe a first implementation of the proposed platform and two prototype applications.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422567
Zenodo URL: https://zenodo.org/record/1422567
Abstract
In recent years, handpans have gained more and more popularity and have become something of a public icon. They make it possible to produce tones and beautiful melodies rather than only the transient sounds of most other easy-to-use percussive instruments. Given the assumption that instruments with very simple interfaces offer a particularly suitable introduction to making music, it follows that handpans could also be used in early musical education. However, their interface itself is still abstract and not very informative about the kinds of sounds it produces. For this reason, in this paper we present the concept and first prototype of an augmented digital handpan. In our concept we use a Leap Motion controller to capture strokes on a plexiglass dome and provide additional visual information, such as advice and learning instructions, through augmentation in the form of projections onto the surface.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422520
Zenodo URL: https://zenodo.org/record/1422520
Abstract
In this paper we introduce a flexibilization mechanism for audio processing that allows dynamic control of the trade-off between computational cost and output quality. This mechanism can be used to adapt signal processing algorithms to varying CPU load conditions in critical real-time applications that require uninterrupted delivery of audio blocks at a fixed rate in the presence of processor overload. Flexibilization takes place through the control of parameters in elementary signal processing modules that are combined to form more complex processing chains. We discuss several audio processing tasks that can be flexibilized, together with their flexibilization parameters and the corresponding impact on cost and quality, and propose an implementation framework for plugin development that provides the necessary mechanisms for controlling the cost-quality trade-off.
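The framework's API is not described in the abstract; the sketch below only illustrates the general idea of a flexibilization parameter, using impulse-response truncation in a toy convolver whose quality setting is adapted from the time each block took to compute. The class name, the control law and the thresholds are all assumptions.

import time
import numpy as np

class FlexibleConvolver:
    """Illustrative flexibilized module: the reverb tail length (and thus cost) is scaled
    by a quality parameter in [0, 1], chosen per block from the measured time headroom."""
    def __init__(self, impulse_response, block_len=512, fs=44100):
        self.ir = impulse_response
        self.block_len = block_len
        self.budget = block_len / fs            # seconds available per block in real time

    def process(self, block, quality):
        n = max(1, int(len(self.ir) * quality)) # truncate the IR: cheaper, but a duller tail
        return np.convolve(block, self.ir[:n])[:len(block)]

    def process_adaptive(self, block, quality=1.0):
        start = time.perf_counter()
        out = self.process(block, quality)
        used = (time.perf_counter() - start) / self.budget
        # Simple controller: aim to spend about half of the block's time budget.
        next_quality = max(0.1, min(1.0, quality * (0.5 / max(used, 1e-6))))
        return out, next_quality

ir = np.exp(-np.linspace(0, 8, 44100)) * np.random.randn(44100)   # toy 1 s decaying "reverb" IR
conv = FlexibleConvolver(ir)
out, q = conv.process_adaptive(np.random.randn(512), quality=1.0)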
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422683
Zenodo URL: https://zenodo.org/record/1422683
Abstract
Deep learning has yielded promising results in music information retrieval (MIR) and other domains compared to machine learning algorithms trained on hand-crafted feature representations, but it is often limited by the availability of data and a vast hyper-parameter space. It is difficult to obtain large amounts of annotated recordings due to prohibitive labelling costs and copyright restrictions. This is especially true when the MIR task is low-level in nature, such as instrument recognition, and is applied to a wide range of world instruments, causing most MIR techniques to focus on recovering easily verifiable metadata such as genre. We tackle this data availability problem using two techniques: generating synthetic recordings from MIDI files and synthesizers, and adding noise and filters to the generated samples for data augmentation. We investigate the application of synthetically trained deep models to two related low-level MIR tasks, frame-level polyphony detection and instrument classification in polyphonic recordings, and empirically show that deep models trained on synthetic recordings augmented with noise can outperform a majority-class baseline on a dataset of polyphonic recordings labeled with predominant instruments.
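A minimal sketch of the noise-augmentation step, assuming synthetic excerpts have already been rendered from MIDI; the SNR values and the use of white noise are illustrative choices, not necessarily those of the paper.

import numpy as np

def add_noise_at_snr(clean, snr_db, rng=None):
    """Augment a (synthesized) recording with white noise at a target SNR in dB."""
    if rng is None:
        rng = np.random.default_rng()
    signal_power = np.mean(clean ** 2)
    noise = rng.standard_normal(len(clean))
    noise *= np.sqrt(signal_power / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + noise

# Each synthetic excerpt can be rendered at several SNRs to enlarge the training set.
fs = 22050
excerpt = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
augmented = [add_noise_at_snr(excerpt, snr) for snr in (30, 20, 10)]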
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422587
Zenodo URL: https://zenodo.org/record/1422587
Abstract
Computer music systems normally implement a unit-generator graph as a network of unit generators through which audio vectors are streamed and processed. This paper proposes an alternative implementation technique for unit-generator-based sound synthesis, which views a unit-generator graph as a generator of audio-vector trees to be lazily evaluated. The simplest implementation of this technique allows sound-synthesis and sound-control tasks to be processed in different threads even in a synchronous computer music system, making real-time sound synthesis more stable by amortizing the time cost of sound-control tasks over DSP cycles, while maintaining low round-trip latency between the audio input and the processed output. We also discuss possible extensions of our technique for parallelization, distribution, and speculation in real-time sound synthesis. The investigation of such a novel implementation technique should benefit further research on high-performance real-time sound synthesis.
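To illustrate the "graph as a lazily evaluated tree of audio vectors" view, here is a toy sketch in which each unit generator produces its output block only when pulled by a consumer; thread separation, memoization of shared subtrees and the scheduling machinery that make the technique interesting in practice are omitted, and all names are hypothetical.

import numpy as np

BLOCK = 64
SR = 48000

class Node:
    """A unit generator as a producer of audio vectors, pulled lazily by its consumers."""
    def __init__(self, fn, *inputs):
        self.fn, self.inputs = fn, inputs

    def render(self, frame):
        # Inputs are only evaluated when (and if) this node is asked for a block, so the
        # graph behaves like a lazily evaluated tree of audio vectors rooted at the sink.
        return self.fn(frame, *[inp.render(frame) for inp in self.inputs])

def sine(freq):
    def fn(frame):
        t = (frame * BLOCK + np.arange(BLOCK)) / SR
        return np.sin(2 * np.pi * freq * t)
    return Node(fn)

def gain(node, amount):
    return Node(lambda frame, x: amount * x, node)

def mix(*nodes):
    return Node(lambda frame, *xs: sum(xs), *nodes)

graph = mix(gain(sine(440), 0.5), gain(sine(660), 0.3))
out = np.concatenate([graph.render(f) for f in range(10)])   # the DSP loop pulls block by block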
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422661
Zenodo URL: https://zenodo.org/record/1422661
Abstract
UPISketch is a program for composing sounds by drawing. Its use is simple: we draw sound gestures by defining their melodic contour, as in a classical score. The difference, however, is that pitches are drawn directly, without requiring knowledge of traditional music notation. Monophonic sounds are used as sound sources.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422637
Zenodo URL: https://zenodo.org/record/1422637
Abstract
One of the main challenges of spatial audio rendering over headphones is the crucial work behind the personalization of the so-called head-related transfer functions (HRTFs). HRTFs capture the listener-specific acoustic effects that allow a personal perception of immersion in virtual reality contexts. This paper investigates the possible benefits of personalized HRTFs that were individually selected based on anthropometric data (pinna shapes). Personalized audio rendering was compared to a generic HRTF and a stereo sound condition. Two studies were performed: the first was a screening test aiming to evaluate the participants' localization performance with HRTFs for a non-visible spatialized audio source; in the second, participants freely explored a VR scene with five audiovisual sources for two minutes each, in both the HRTF and stereo conditions. A questionnaire with items on spatial audio quality, presence and attention was used for the evaluation. Results indicate that the audio rendering method made no difference in questionnaire responses during the two minutes of free exploration.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422512
Zenodo URL: https://zenodo.org/record/1422512
Abstract
'You Are Here' is a sound installation that took place on 24 March 2017 inside the buffer zone, as an attempt to address the temporal indeterminacy that characterizes the buffer zone during daylight saving time in Cyprus. After a political decision to adopt different time zones, the island underwent a second division, this time in a temporal sense, which in turn raised the questions: in what time does the buffer zone itself exist, and how does time change while crossing it? This article briefly outlines some of the technical, aesthetic and theoretical considerations that informed the artist's preparation of this site-specific and site-responsive sound installation.
Keywords
not available
Paper topics
not available
Easychair keyphrases
not available
Paper type
unknown
DOI: 10.5281/zenodo.1422559
Zenodo URL: https://zenodo.org/record/1422559