Invited Experts and Speakers

Roland Bammer
Stanford University
Giovanna Varni
University of Genoa
Antonio Camurri
University of Genoa
Current research on emotions and on multimodal interfaces at Casa Paganini - InfoMus
as part of EU projects HUMAINE, ENACTIVE, and TAI-CHI

8 August 2007, Wednesday, 14:00
Metin Tevfik Sezgin
University of Cambridge
Dilek Hakkani-Tür
International Computer Science Institute
UC Berkeley, USA
Spoken Language Understanding in Conversational Systems

8 August 2007, Wednesday, 16:00
Leonello Tarabella
Research Area of the
National Research Council (CNR)
Gesture touchless live computer music
computerART project of ISTI-C.N.R. music and visual art by computer

1 August 2007, Wednesday, 14:00
Gaël Richard
ENST (Télécom Paris)

1 - An overview of audio indexing

19 July Thursday, 14:00

2 - Transcription and separation of drum signals from polyphonic music

20 July Friday, 14:00


Roland Bammer, PhD
Stanford University, LUCAS MRS/I Center, Dept. of Radiology, School of Medicine, Stanford, CA, USA

Clinical Applications and User Interfaces for DT-MRI Data: Tensor Field Visualization and Interaction

Diffusion tensor imaging (DTI) and its variants provide important diagnostic information about tissue microstructure that is occult to conventional imaging. DTI exploits the highly anisotropic proton self-diffusion in white matter fibers; the eigenvector orientation of the second-order tensor thus provides an excellent, non-invasive surrogate for the orientation of these fibers.
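The eigen-decomposition underlying this can be sketched in a few lines. The following NumPy snippet is illustrative only (not from the talk): it extracts the principal eigenvector of a synthetic 3x3 diffusion tensor and computes fractional anisotropy (FA), one of the standard scalar metrics of the tensor.

```python
import numpy as np

def principal_direction(D):
    """Eigen-decomposition of a symmetric 3x3 diffusion tensor.

    Returns eigenvalues in descending order and the eigenvector of the
    largest eigenvalue, which approximates the local fiber orientation."""
    evals, evecs = np.linalg.eigh(D)        # eigh: ascending eigenvalues
    return evals[::-1], evecs[:, 2]

def fractional_anisotropy(evals):
    """FA in [0, 1]: 0 = isotropic diffusion, 1 = fully anisotropic."""
    l1, l2, l3 = evals
    md = (l1 + l2 + l3) / 3.0               # mean diffusivity
    num = np.sqrt((l1 - md)**2 + (l2 - md)**2 + (l3 - md)**2)
    den = np.sqrt(l1**2 + l2**2 + l3**2)
    return np.sqrt(1.5) * num / den

# Synthetic tensor with diffusion dominant along the x axis.
D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
evals, v1 = principal_direction(D)
print(v1)                                   # principal direction, +/- x axis
print(fractional_anisotropy(evals))         # high FA: strongly anisotropic
```

In real DTI data this decomposition is applied voxel by voxel; FA maps and principal-direction color maps are among the scalar visualizations discussed in the talk.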

Clinically, DTI promises to be of great utility for better understanding the pathophysiology of diffuse white matter abnormalities in a great variety of diseases, such as multiple sclerosis, schizophrenia, dyslexia, autism, and traumatic brain injury, but also for focal abnormalities, such as tumors. For the latter, the directional information obtained with DTI can be used to study the involvement of important fiber tracts and can therefore facilitate surgical planning. Visualization of fiber tracts, or tractography, has also gained attention from the neuroscience community, where it is utilized to associate functional connectivity (via fMRI) with anatomical connectivity (via DTI tracking).

However, the multidimensionality of the diffusion tensor itself and of the tracking results poses challenges when it comes to presenting these data to clinicians or neuroscientists for interpretation or quantification. Graphical user interfaces and software tools for medical image analysis have so far focused on 2D or 3D data sets; perception of and interaction with these multi-dimensional data are therefore difficult and still in their infancy.

The objective of this presentation is to provide a general overview of user interfaces for the acquisition and presentation of MRI data, tools for visualizing scalar metrics of the diffusion tensor, and state-of-the-art methods for DTI tractography. This will be followed by a discussion of current concepts to present and interact with tracking data, their respective strengths and weaknesses, and a discussion on how these tools could be tailored to be more efficient for a clinical setting.

Giovanna Varni
InfoMus Lab. Casa Paganini Intl Centre of Excellence, DIST, University of Genoa, Italy

The EyesWeb XMI open platform for multimodal interaction

This seminar presents an overview of the architecture and main technical features of EyesWeb XMI (for eXtended Multimodal Interaction), a hardware and software platform for real-time multimodal processing of multiple data streams. The platform originates from the previous EyesWeb platform and is the result of three years of work on a new conceptual model, design, and implementation. The main focus of EyesWeb XMI is on multimodality and cross-modality, in order to enable a deeper, more natural, and experience-centric approach to human-computer interaction. In this framework, a crucial target was to improve the synchronization and processing of several different data streams. Concrete scenarios and interactive systems based on EyesWeb XMI applications will be shown during the seminar.

Antonio Camurri, PhD
InfoMus Lab. Casa Paganini Intl Centre of Excellence, DIST, University of Genoa, Italy

Research projects on multimodal interfaces and emotion at Casa Paganini – InfoMus

The seminar introduces research at InfoMus Lab on multimodal interfaces for non-verbal expressive communication: experience-centric multimedia systems able to interpret the high-level information conveyed by users through their non-verbal expressive gesture, and to establish an effective dialog with users that takes emotional and affective content into account. The seminar addresses research issues in the design of multimodal interactive systems, including: multimodal analysis, i.e., approaches and techniques for extracting high-level non-verbal information from expressive gesture performed by users, and the interaction strategies that such systems should apply in the dialog with users; the emergence of novel interface paradigms, e.g., tangible acoustic interfaces; and research on emotional interfaces and the measurement of emotion in subjects exposed to music stimuli. The seminar will refer to research projects at the InfoMus Lab (www.infomus.org, www.casapaganini.org) based on the EyesWeb XMI open software platform (www.eyesweb.org).


Metin Tevfik Sezgin, PhD
University of Cambridge, Computer Laboratory, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK

Temporal Sketch Recognition and Sketch Based Interfaces

Sketching is a natural mode of interaction used in a variety of settings. For example, people sketch during early design and brainstorming sessions to guide the thought process; when communicating certain ideas, we use sketching as an additional modality to convey what cannot be put into words. The emergence of hardware such as PDAs and Tablet PCs has made it possible to capture freehand sketches, enabling the routine use of sketching as an additional human-computer interaction modality.

But despite the availability of pen-based information capture hardware, relatively little effort has been put into developing software capable of understanding and reasoning about sketches. To date, most approaches to sketch recognition have treated sketches as images (i.e., static finished products) and have applied vision algorithms for recognition. However, unlike images, sketches are produced incrementally and interactively, one stroke at a time, and their processing should take advantage of this.

In this talk, I will describe ways of doing sketch recognition by extracting as much information as possible from temporal patterns that appear during sketching. I will present a sketch recognition framework based on hierarchical statistical models of temporal patterns. I will show that in certain domains, stroke orderings used in the course of drawing individual objects contain temporal patterns that can aid recognition. I will also briefly summarize some of the current work on sketch-based interfaces at the University of Cambridge Computer Laboratory.
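As a toy illustration of why stroke order carries information (the data, labels, and bigram model below are my own hypothetical example, not the talk's actual framework), even a simple bigram count over stroke labels captures the kind of temporal regularity that a hierarchical statistical model can exploit:

```python
from collections import Counter, defaultdict

def train_bigrams(sequences):
    """Count stroke-label bigrams observed across training drawings."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def next_stroke_scores(counts, prev):
    """Relative likelihood of each stroke label following `prev`."""
    total = sum(counts[prev].values())
    return {label: c / total for label, c in counts[prev].items()}

# Hypothetical training data: stroke-label sequences observed while
# people draw a stick figure (head first, then body, then limbs).
drawings = [
    ["head", "body", "arm", "arm", "leg", "leg"],
    ["head", "body", "leg", "leg", "arm", "arm"],
    ["head", "body", "arm", "arm", "leg", "leg"],
]
model = train_bigrams(drawings)
print(next_stroke_scores(model, "head"))  # {'body': 1.0}
```

A recognizer that knows the head is always followed by the body can disambiguate an otherwise ambiguous stroke from its position in the drawing order, which is exactly the information a static image discards.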


Dilek Hakkani-Tür, PhD
International Computer Science Institute, UC Berkeley, Berkeley, CA, USA

Spoken Language Understanding in Conversational Systems

Understanding language is about extracting the "meaning" from natural language input. One of the biggest challenges of spoken language understanding is the nature of naturally spoken language, which varies greatly orthographically and incorporates prosody and syntax: the same meaning can be expressed in many different surface forms, and the same surface form can express many different meanings. Another challenge for spoken language understanding is robustness to noise in the input, resulting from errors in the speech recognizer output and from disfluencies in spontaneously spoken language. Furthermore, one has to deal with the lack of typographic cues, such as paragraphs and punctuation, in the speech recognizer output.

In this talk, I will mainly summarize previous work addressing these challenges with data-driven approaches. I will briefly present related work on domain-dependent and domain-independent meaning representations and then describe the state of the art for some of the most popular language understanding tasks.

Leonello Tarabella, PhD
Research Area of the National Research Council (CNR), Via Moruzzi 1, 56124 Pisa, Italy

Gesture touchless live computer music
computerART project of ISTI-C.N.R. music and visual art by computer

Here I present the practical results of my research in interactive/improvised electro-acoustic music, for which I have developed both hardware and software tools. The research as a whole is rooted in my active experience in jazz music. My proposal emphasizes the importance of expressiveness and feeling in live computer music performance. Two original gesture recognition devices and systems, or hyper-instruments, are described (PalmDriver and Handel), together with the pCM real-time music language, based on the C language, for sound synthesis and event management.

1) The PalmDriver hyper-instrument is an electronic device based on IR technology; it consists of two sets of eight infrared sensors that measure the distance of different zones of the palms of the hands. The PalmDriver is stable and responsive. As a consequence, the sounds generated by the computer give the performer the sensation of "touching and modelling the sound".

2) Image processing technology has been used to realize the Handel System hyper-instrument: a CCD camera is connected to a video grabber card, and the digital image is analyzed to form a reconstructed image consisting of those pixels whose luminance is greater than a predefined threshold for a given color. On the basis of Handel, the Imaginary Piano has been realized. The extracted information is used for controlling algorithmic compositions rather than for playing scored music.

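A minimal sketch of this kind of luminance thresholding follows. It is illustrative only: the function names, the Rec. 601 luminance weights, and the centroid-based position estimate are my assumptions, not details of the Handel implementation.

```python
import numpy as np

def threshold_mask(frame, lum_threshold):
    """Binary mask of pixels brighter than the threshold.

    `frame` is an (H, W, 3) RGB array; luminance is approximated
    with the Rec. 601 weights."""
    lum = frame @ np.array([0.299, 0.587, 0.114])
    return lum > lum_threshold

def hand_position(mask):
    """Centroid (row, col) of the bright region, or None if empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())

# Synthetic 8x8 frame: one bright 2x2 patch on a dark background.
frame = np.zeros((8, 8, 3))
frame[2:4, 5:7] = 255
mask = threshold_mask(frame, 128)
print(hand_position(mask))  # (2.5, 5.5)
```

Tracking such a centroid frame by frame yields the kind of continuous control signal that can drive sound parameters, as in the Imaginary Piano.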
3) For composing and performing interactive computer music with the hyper-instruments, I realized a framework based on pure C programming: pure-C-Music, or pCM. This programming framework makes it possible to write a piece of music in terms of synthesis algorithms, a score, and the management of data streaming from external interfaces. The composition itself is a C program, which mainly consists of Score and Orchestra parts. The object-oriented paradigm is mainly used for defining instruments as class declarations, which are then instantiated as many times as desired. Everything is compiled into machine code that runs at CPU speed.
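pCM itself is pure C compiled to machine code; purely to illustrate the idea of an instrument declared once as a class and instantiated many times, here is a hypothetical Python analogue (names and structure are mine, not pCM's API):

```python
import math

class Sine:
    """Toy 'instrument' in the spirit of pCM's class-based instruments:
    declared once, instantiated as many times as the score needs."""
    def __init__(self, freq, amp, sr=44100):
        self.freq, self.amp, self.sr = freq, amp, sr
        self.phase = 0.0

    def render(self, n):
        """Generate the next n samples, keeping phase across blocks."""
        step = 2 * math.pi * self.freq / self.sr
        out = [self.amp * math.sin(self.phase + step * i) for i in range(n)]
        self.phase += step * n
        return out

def mix(blocks):
    """Orchestra mixdown: sum the instruments' sample blocks."""
    return [sum(samples) for samples in zip(*blocks)]

# 'Score': two instances of the same instrument class.
voices = [Sine(440, 0.4), Sine(660, 0.2)]
block = mix(v.render(64) for v in voices)
print(len(block))  # 64
```

In pCM the equivalent instrument classes are compiled C, so each render call runs at CPU speed with no interpreter overhead.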

- I propose a presentation of the above-mentioned hyper-instruments and the pCM language.
- I also propose a live performance using my tools and systems.


Gaël Richard, PhD
ENST (Télécom Paris), 75014 Paris, FRANCE

Talk 1 Title: An overview of audio indexing

Summary: The enormous amount of unstructured digital audio (and, more generally, multimedia) data available nowadays, and the spread of its use as a data source in many applications, are introducing new challenges to researchers in information and signal processing. The need for content-based audio indexing and retrieval techniques to make audio information more readily available to the user is becoming ever more critical. The purpose of this talk is to provide an overview of approaches to audio indexing, with a focus on music signal processing.

Talk 2 Title: Transcription and separation of drum signals from polyphonic music.

Summary: The purpose of this talk is to present current research directions in audio indexing conducted at GET-ENST. After a brief introduction of our subspace-based signal analysis framework, several aspects of audio indexing, such as feature selection and harmonic/noise decomposition, will be illustrated in the context of drum signal processing (drum signal separation and transcription). The talk will conclude with a demonstration of audio post-remixing (with an enhanced or reduced drum track) and, time permitting, a demonstration of a drum-loop retrieval system driven by vocal queries. The content of this talk is largely based on a paper recently accepted for publication [1].
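The subspace-based framework itself does not fit in a short snippet, but the flavor of harmonic/percussive decomposition can be conveyed with a simpler, well-known alternative: median filtering of a magnitude spectrogram. To be clear, this is not the GET-ENST method described in the talk, only an illustration of the underlying idea that harmonic energy is smooth in time while percussive energy is smooth in frequency.

```python
import numpy as np

def median_filter_1d(x, k):
    """Running median of odd length k along the last axis (edge-padded)."""
    pad = k // 2
    xp = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(pad, pad)], mode="edge")
    windows = np.stack([xp[..., i:i + x.shape[-1]] for i in range(k)], axis=0)
    return np.median(windows, axis=0)

def harmonic_percussive_masks(S, k=9):
    """Soft masks from a magnitude spectrogram S (freq x time).

    Median filtering along time suppresses percussive transients
    (harmonic estimate); filtering along frequency suppresses
    harmonic partials (percussive estimate)."""
    H = median_filter_1d(S, k)        # filter along the time axis
    P = median_filter_1d(S.T, k).T    # filter along the frequency axis
    eps = 1e-10
    return H / (H + P + eps), P / (H + P + eps)

# Synthetic spectrogram: a sustained tone (horizontal line) plus
# a single drum hit (vertical line).
S = np.zeros((32, 32))
S[10, :] = 1.0   # harmonic component
S[:, 20] = 1.0   # percussive component
mask_h, mask_p = harmonic_percussive_masks(S)
print(round(float(mask_h[10, 5]), 3), round(float(mask_p[25, 20]), 3))  # 1.0 1.0
```

Multiplying the complex spectrogram by each mask and inverting the transform yields separated harmonic and drum tracks, the same kind of output the talk demonstrates with its subspace approach.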

[1] O. Gillet and G. Richard, "Transcription and separation of drum signals from polyphonic music," IEEE Transactions on Audio, Speech, and Language Processing, Special Issue on Music Information Retrieval, accepted for publication, 2007.

