Projects & Teams

eNTERFACE'07 Projects
1
Multi-Approach DT-MRI Data Analysis & Visualization Platform

Project Description :

Objective

This project aims at developing a freeware (not open-source) DT-MRI analysis and visualization platform that will serve as a common platform for future collaboration among the participating research groups and for dissemination of their research outputs to a wider medical audience. The development environment will be VAVframe, a software framework under development at Bogazici University, Turkey. VAVframe is an ITK/VTK-based C++ framework for Linux. It will also support distributed computing. The framework allows users to implement their own research algorithms (analysis/visualization/interaction) as well as to use ITK/VTK functions.

Background

Human brain mapping refers to understanding the functional and physiological structure of the human brain. This project concentrates on the physiological structure as revealed by the DT-MRI imaging technique. DT-MRI is a relatively new technique of increasing importance, especially in understanding neurodegenerative diseases. The engineering challenge is to reconstruct the connection network.

There are two basic approaches to utilizing the information DT-MRI provides: fiber tractography and connectivity mapping. The former relies on the principal diffusion direction (the major eigenvector of the diffusion tensor) and attempts to reconstruct the fiber that passes through a given point. Its basic tool is numerical integration along the principal diffusion direction, for which the most popular method is 4th-order Runge-Kutta [1]. Fiber tractography is prone to cumulative errors, cannot overcome the partial volume effect, and disregards part of the information embedded in the diffusion tensor (which is itself an approximation based on a Gaussianity assumption). The latter approach attempts to utilize the true nature of the DT-MRI data, i.e. the Gaussian diffusion process, by estimating a connectivity map. Such methods consider every possible connection, with weights set by the dataset. Several of them are based on some form of Monte Carlo simulation of the random walk model [2,3,4]. Lenglet et al., on the other hand, recast the connectivity problem in a Riemannian differential geometry framework, where they defined the local metric tensor from the DTI data and solved for geodesics [5]. Probably the most important point that differentiates the two approaches is their behaviour in problematic regions such as crossing and kissing fibers. Tractography methods either pretend to follow a single fiber by choosing a direction in which to proceed or stop tracking, whereas connectivity mapping methods allow for branching. Although branching is not anatomically correct, presenting the DT-MRI data in this way is more faithful to the nature of the acquired data (localized Gaussian maps of diffusing particles) and allows users to interpret it. Thus, connectivity mapping is a more direct way of communicating the information embedded in DT-MRI data.
However, it is not trivial to interpret connectivity maps.
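
To make the tractography idea concrete, the following is a minimal Python sketch of 4th-order Runge-Kutta streamline integration. It is not VAVframe code: `principal_direction` is a hypothetical stand-in that returns a synthetic circular direction field, whereas a real platform would interpolate the tensor field and take its major eigenvector. The step size and step count are arbitrary, and the sign ambiguity of eigenvectors (v versus -v) and stopping criteria are deliberately left out.

```python
import math

def principal_direction(p):
    """Toy stand-in for the major eigenvector of the diffusion tensor at p.

    Assumption: a circular fiber bundle around the origin in the xy-plane.
    """
    x, y, _z = p
    r = math.hypot(x, y)
    if r == 0.0:
        return (0.0, 0.0, 0.0)
    return (-y / r, x / r, 0.0)

def _advance(p, v, h):
    """Return p + h * v for 3D tuples."""
    return tuple(pi + h * vi for pi, vi in zip(p, v))

def rk4_step(p, h, field=principal_direction):
    """One classical Runge-Kutta step of length h along the direction field."""
    k1 = field(p)
    k2 = field(_advance(p, k1, h / 2.0))
    k3 = field(_advance(p, k2, h / 2.0))
    k4 = field(_advance(p, k3, h))
    return tuple(
        pi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for pi, a, b, c, d in zip(p, k1, k2, k3, k4)
    )

def trace_fiber(seed, h=0.05, n_steps=200, field=principal_direction):
    """Follow the direction field from a seed point and return the path."""
    path = [seed]
    for _ in range(n_steps):
        path.append(rk4_step(path[-1], h, field))
    return path

path = trace_fiber((1.0, 0.0, 0.0))
```

On this synthetic field the traced path should stay close to the unit circle, which gives a quick sanity check of the integrator before it is applied to real tensor data.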

Technical Description

A basic DTI analysis and visualization application developed under VAVframe will be provided by VAVlab (www.vavlab.ee.boun.edu.tr), Bogazici University. Consequently, the principal C++ classes will already have been implemented. The participants will initially be given a tutorial/introduction on this package, the coding conventions and the documentation procedures that must be followed. The infrastructure is based on C++ classes implemented under Linux (Fedora Core 6) with the ITK and VTK libraries. The participants will either be asked to study and implement an algorithm assigned to them, or they may prefer to implement an algorithm of their choice (such as their own research results). The details will be set once the participating labs are known. The deliverable will be a DTI application usable in a clinical environment.

Workpackages

WP1: (now – June 2007) Implementation of the basic DTI application at VAVlab.
WP2: (Weeks 1-2) Implementation of SIMILAR Tensor Standard based I/O functions
WP3: (Weeks 1-2) Implementation of tensor visualization routines
WP4: (Weeks 1-2) Implementation of analysis routines (Tractography / Connectivity / Tensor Registration). Details will be set after the participating labs are known.
WP5: (Week 3) Integration
WP6: (Week 4) Documentation and Demo
Preferred skills: C++ programming under Linux, familiarity with VTK and ITK, Signal Processing / Computer Graphics background

References

[1] C.R. Tench, P.S. Morgan, M. Wilson, and L.D. Blumhardt, “White matter mapping using diffusion tensor MRI,” Magnetic Resonance in Medicine, vol. 47, pp. 967–972, 2002.
[2] M.A. Koch, D.G. Norris, and M. Hund-Georgiadis, “An investigation of functional and anatomical connectivity using magnetic resonance imaging,” Neuroimage, vol. 16, pp. 241–250, 2002.
[3] P. Hagmann, J.P. Thiran, P. Vandergheynst, S. Clarke, and R. Meuli, “Statistical fiber tracking on DT-MRI data as a potential tool for morphological brain studies,” ISMRM Workshop on Diffusion MRI: Biophysical Issues, 2000.
[4] M.K. Chung, M. Lazar, A.L. Alexander, Y. Lu, and R. Davidson, “Probabilistic connectivity measure in diffusion tensor imaging via anisotropic kernel smoothing,” Tech. Rep. 1081, University of Wisconsin, 2003.
[5] C. Lenglet, R. Deriche, and O. Faugeras, “Diffusion tensor magnetic resonance imaging : Brain connectivity mapping,” Tech. Rep. 4983, INRIA, France, 2003.

Burak Acar
acarbuboun.edu.tr
LEADER/SENIOR
Professor
BUMM / VAVlab, Bogaziçi University
Roland Bammer
rbammerstanford.edu
LEADER/SENIOR
Professor
Stanford University
Marcos Martin Fernandez
marcmatel.uva.es
LEADER/SENIOR
Professor
University of Valladolid, Spain
Suzan Uskudarli
suzan.uskudarliboun.edu.tr
LEADER/SENIOR
Professor
BUMM / VAVlab, Bogaziçi University
Ali Vahit Sahiner
alivahit.sahinerboun.edu.tr
LEADER/SENIOR
Professor
BUMM / VAVlab, Bogaziçi University
Deniz Diktas BUMM, Bogaziçi University
Sila Girgin
silagirgingmail.com
MS Student BUMM / VAVlab, Bogaziçi University
Murat Aksoy
maksoystanford.edu
PhD Student Stanford University
DIA Ousmane Amadou
ousamdiagmail.com
MS Student Ecole Superieure Polytechnique de Dakar
Ioannis Marras
imarrasaiia.csd.auth.gr
PhD Student Artificial Intelligence & Information Analysis lab, Department of Informatics, Aristotle University
Luis Miguel San Jose
lsanjosetel.uva.es
Professor ETSI Telecomunicación, Valladolid, SPAIN
Emma Munoz-Moreno
emunmorlpi.tel.uva.es
PhD Student ETSI Telecomunicación, Valladolid, SPAIN
Susana Merino Caviedes
smercavlpi.tel.uva.es
PhD Student ETSI Telecomunicación, Valladolid, SPAIN
Miguel Angel Martin Fernandez
migmartel.uva.es
PhD Student ETSI Telecomunicación, Valladolid, SPAIN
Guldem Kucuk
guldemkistanbul.edu.tr
MS Student Istanbul University
Neslihan Avcu
avcuneslehanyahoo.com
MS Student Dokuz Eylül University
Erkin Tekeli
erkin.tekeliboun.edu.tr
PhD Student BUMM, Vavlab, Bogaziçi University

E-mail list address: enterfacedti@googlegroups.com

(You may use enterface07alllisteci.cmpe.boun.edu.tr to send e-mail to all participants. If you have any problems, please contact arman.savranboun.edu.tr)

2
Advanced Multimodal Interfaces for Flexible Communications

Project Description :

Utilizing any of the most common IP multi-communication clients available, we would like to implement the interface of an asymmetric communication channel. The students will design and integrate the blocks that will permit automatic media translation. This media translation will allow users to communicate using the best media option for them even if it is not the best option for their interlocutor (the translator will adapt the content accordingly).

Goal

This project aims at developing a prototype that will exemplify how multimodal services will enhance the way we communicate in the future. Using one of the most common IP communication clients (Skype) as a platform, students will develop an application whose interface will be plastic in as many modalities as possible. The designed plasticity will enable the application to adapt not only to a PC and a PDA at the same time, but also to the user’s current environment. Students will develop and/or integrate media translators or adapters that are available during communication. These media adapters will allow communications to be asymmetric: any user can choose how to communicate depending on his or her own status (e.g. can/can’t speak, video available/unavailable), regardless of the other interlocutor’s choice. Following the Next Generation Network standards and architecture, students will implement the media adapters as services. They will develop a client application that combines those services and offers the best communication option to all users.
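
As an illustration of asymmetric communication, the sketch below (Python) picks the shortest chain of media adapters that turns what the sender produces into something the receiver accepts. The adapter table and all names are hypothetical and are only loosely modelled on the adaptations listed in the workpackages; this is not the project's design, and real adapters would be network services rather than table entries.

```python
from collections import deque

# Hypothetical adapter services: (input modality, output modality) -> name.
ADAPTERS = {
    ("speech", "text"): "speech-to-text",
    ("text", "speech"): "text-to-speech",
    ("speech", "video"): "speech-to-avatar",
    ("video", "speech"): "video-to-speech",
}

def adapter_chain(sent, accepted):
    """Return the shortest list of adapters that converts the modality
    `sent` into any modality in the set `accepted`.

    Returns [] if no adaptation is needed and None if none is possible.
    """
    if sent in accepted:
        return []
    queue = deque([(sent, [])])
    seen = {sent}
    while queue:  # breadth-first search over modalities
        modality, chain = queue.popleft()
        for (src, dst), name in ADAPTERS.items():
            if src != modality or dst in seen:
                continue
            if dst in accepted:
                return chain + [name]
            seen.add(dst)
            queue.append((dst, chain + [name]))
    return None
```

For example, a user who can only type, talking to a user who only watches an avatar, would be served by `adapter_chain("text", {"video"})`, which chains the text-to-speech and speech-to-avatar adapters in this toy table.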

Workpackages
  1. Study and design of the interface: choice of graphical adaptation, modality translations, etc.
  2. Development and integration of the text-to-speech adaptation and the speech-to-text adaptation.
  3. Integration of a language translator to be introduced in the text and speech adaptation service
  4. Development and integration of a video-speech adaptation and speech-video adaptation (avatar).
  5. Programming of the prototype for PC and PDA to simulate how communication would work.
  6. User tests
  7. Report
Deliverables
  1. Prototype
  2. Report
  3. Document with the design of the multimodal adaptation and how the services should be integrated in a communications network.
Background
  • Skype developer zone web site.
  • TISPAN (ETSI), in charge of developing the standards for NGN:
    NGN Release 1. ETSI TR 180 001
    NGN generic capabilities and their use to develop services. ETSI TR 181 004
  • 3GPP, in charge of standardizing the IMS architecture:
    IMS release 6
  • IETF, which defines most of the protocols currently in use for communications over IP.
Ana C. Andrés
ana.c.andresdelvalleaccenture.com
LEADER/SENIOR
Researcher
Accenture Technology Labs
Allasia Jérôme
jerome.allasiairisa.fr
Researcher IRISA
Ionut Petre
ipetreici.ro
Researcher Research Institute for Informatics ICI Bucharest
Saeed Usman
saeedeurecom.fr
PhD Student Institute Eurecom
Nicolau Dragos
dragosici.ro
Researcher National Research and Development Institute for Informatics-Bucharest, Romania
Dragos Catalin Barbu
dbarbuici.ro
MS Student Research Institute for Informatics ICI Bucharest
Radut Valentin
vradutici.ro
Researcher Research Institute for Informatics, Bucharest, Romania
Jerome Urbain
jerome.urbainfpms.ac.be
PhD Student Belgium

E-mail list address: enterface07p2listeci.cmpe.boun.edu.tr


3
A Multimodal Framework for the Communication of Disabled

Project Description :

Objective

This project aims to build a multimodal framework that combines visual, aural and haptic interaction with gesture-speech-text recognition, speech synthesis and sign language recognition and synthesis, in order to enable the communication of people exhibiting different kinds of disabilities.
This project will use tools from Project 2 and Project 3 of the eNTERFACE 2006 workshop. Additionally, the project aims at constructing a cross-modal transformation framework that will be able to combine all the modalities from an individual, recognize the transmitted message, and translate it into another form that is perceivable by the receiver. The project will focus on exploiting the correlation between modalities in order to enhance the information perceivable by an impaired individual who cannot perceive all incoming modalities. A collaborative VR game and a multimodal video of news for the hearing impaired will be used as two application environments.

Background

This project is based on Project 2 and Project 3 of eNTERFACE’06. In Project 2, “Multimodal tools and interfaces for the intercommunication between visually impaired and ‘deaf and mute’ people”, the system provided alternative tools and interfaces to blind and deaf-and-mute persons so as to enable their intercommunication as well as their interaction with the computer. That application integrates haptics, audio and visual output as well as computer vision, sign language analysis and synthesis, and speech recognition and synthesis, in order to provide an interactive environment where blind and deaf-and-mute users can collaborate. In Project 3, “Sign Language Tutoring Tool”, a tutoring tool was developed for interactive sign language education. Users can watch pre-recorded sign videos, practice the signs, and receive automatic feedback about the quality of their performance. The application integrates manual and non-manual sign language recognition and sign synthesis in an interactive and educational environment for deaf and mute users.

The proposed project will develop/improve tools for multimodal communication: continuous sign/speech segmentation and recognition, sliding text recognition, cued speech recognition, and speech and sign synthesis. These tools will then be used to develop a modality replacement framework. The basic idea is that a modality, which would not be perceived due to a specific disability, can be employed to improve the information that is conveyed in the perceivable modalities and increase the accuracy rates of recognition. The correlations between modalities will be explored and the framework will be integrated in two environments:

  1. A treasure hunting game played jointly by a blind user and a deaf-and-mute user. This involves developing a modality replacement framework for unconstrained communication between blind and deaf-mute people, and modeling the virtual environment using “smart” objects that also include information about their possible interaction mechanisms with the users.
  2. Speech- and text-aided sign segmentation on broadcast news videos for the hearing impaired. In news for the hearing impaired, the speaker also signs with her hands as she talks, and corresponding text is superimposed on the video. The aim is to use the less noisy modalities (speech and/or text) to segment and detect the noisier one (sign): the signs in the videos are segmented and annotated with the help of either the speech alone or both the speech and the text, generating segmented and annotated sign videos for use in the Sign Language Tutor application. The annotated sign data collected in this project will be integrated into the Sign Language Tutoring Tool and will provide a large amount of training signs for its users.

Technical Description

The goals of the project are the following:

  1. To study the modalities and their characteristics better perceived by the disabled users.
  2. To build and tune an information-theoretic framework on cross modal transformations especially for the intercommunication between blind and “deaf and mute” people.
  3. To develop an efficient mechanism for multimodal replacement through the communication channels of the terminals and the collaborative virtual environment.

Speech: For processing speech modality, automatic speech recognition techniques will be used for speech to text conversion. For the news videos, since the noise is high and the vocabulary is large, techniques that increase the utterance retrieval rate [1] must be used. This step will also provide the start and end frames of each spoken word.

Sliding text: In addition to the speech modality, the sliding text will be processed with OCR techniques. The information extracted from this modality is expected to be the same as that from the speech modality. Thus, each modality can be used to correct the other, providing accurate information for segmentation and annotation.

Sign: Using the information extracted from the speech and sliding text modalities, the signs will be segmented and annotated. The boundaries of each spoken word can guide the segmentation of the corresponding sign, which can then be annotated with that word. The annotated signs will form a new sign database after consistency and clustering analysis.
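
The way speech and sliding text could jointly propose sign segments can be sketched as follows (Python). Everything here is illustrative: the function name, the `margin` parameter, and the assumption that a sign overlaps the frames of its spoken word (plus a small tolerance) are not taken from the project description, and a real system would still need to refine these candidates on the video itself.

```python
def candidate_sign_segments(words, sliding_text, margin=5):
    """Propose annotated sign segments from recognized speech.

    words:        list of (word, start_frame, end_frame) from the recognizer
    sliding_text: iterable of words read from the on-screen text by OCR
    margin:       hypothetical tolerance (in frames) added on both sides,
                  since a sign need not align exactly with its spoken word
    """
    confirmed = {w.lower() for w in sliding_text}
    segments = []
    for word, start, end in words:
        label = word.lower()
        if label not in confirmed:
            continue  # keep only words that both modalities agree on
        segments.append({
            "label": label,
            "start": max(0, start - margin),
            "end": end + margin,
        })
    return segments

example = candidate_sign_segments(
    [("economy", 10, 30), ("uh", 31, 35), ("Growth", 40, 62)],
    ["Economy", "growth", "today"],
)
```

Requiring agreement between the two cleaner modalities trades recall for precision, which seems a reasonable starting point when the output is meant to become training data.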

Cued Speech: Cued speech [2] is a specific gestural language (different from sign language) used for communication between hearing impaired people and other people, and consists of a combination of lip shapes and gestures. Thus, the transmitted message is carried by three modalities: audio, lip shapes, and hand shapes. The fact that hand shapes are made near the face, and that the exact number and orientation of fingers has to be determined in order to deduce the correct gesture, differentiate Cued Speech from sign language.

Coupled Hidden Markov Models (CHMM) [3] will be employed to model the inter-dependencies and the asynchronous nature of different modalities.

Given the strict demands for real-time processing, the project will be developed entirely in C++. However, for the feature extraction process, other programming environments (e.g. Matlab) may also be considered as a reference point.

The target deliverables of the project are

  • A modality replacement framework that will be integrated in the treasure hunting game and hearing impaired news videos so as to allow the unconstrained (up to a certain degree) communication of blind and deaf-mute people.
  • A new sign database formed with the segmented and annotated signs from the news recordings.

Workpackages

WP1: Pre-workshop preparations: Collection of news videos, preliminary discussion on the architecture, software tools, etc.
WP2: Design of the architecture of the collaborative environment, the terminals and the modalities used in each terminal: speech, sign, text, cued speech
WP3: Definition and tuning of the information-theoretic framework on cross modal transformations
WP4: Integration and synchronization of the interfaces into the collaborative virtual environment.
WP5: Integration of the framework with the treasure hunting game:

  • Development of the gesture-based interface for the terminal that the “deaf and mute” persons will use and the speech and haptics-based interface for the terminal that the visually impaired persons will use.
  • Extension of the game-like application so as to employ all novel technologies.

WP6: Integration of the framework with the Hearing Impaired news videos

  • Sign segmentation and alignment to provide annotated sign data
  • Unsupervised consistency checking and clustering of sign data
  • Extension of Sign Language Tutor with the new data

Participant requirements

  • Programming experience (preferably C/C++)
  • Multimodal signal processing

References

[1] Murat Saraclar and Brian Roark. Utterance classification with discriminative language modeling. Speech Communication, 48(3-4):276-287, March-April 2006.
[2] P. Duchnowski, D. Lum, J. Krause, M. Sexton, M. Bratakos, and L. Braida, “Development of Speechreading Supplements Based on Automatic Speech Recognition,” IEEE Trans. on Biomedical Engineering, vol. 47, no. 4, pp. 487–496, 2000.
[3] T. Kristjansson, B. Frey, and T. Huang, “Event-coupled hidden Markov models,” IEEE International Conference on Multimedia and Expo, ICME2000, vol. 1, 2000.

Dimitrios Tzovaras
Dimitrios.Tzovarasiti.gr
LEADER/SENIOR
Professor
Telematics Institute, Centre for Research and Technology Thessaloniki, Greece
Lale Akarun
akarunboun.edu.tr
LEADER/SENIOR
Professor
BUMM, Bogaziçi University
Murat Saraçlar
murat.saraclarboun.edu.tr
LEADER/SENIOR
Professor
BUMM, Bogaziçi University
Giovanna Varni
giovannainfomus.dist.unige.it
PhD Student Universite di Genova-DIST-InfoMus Lab
Siddika Parlak
siddika.parlakgmail.com
MS Student Department of Electrical-Electronic Engineering, Bogazici University
Konstantinos Moustakas
moustakiti.gr
PhD Student Informatics and Telematics Institute / Centre for Research and Technology Hellas
Byungjun Kwon
byungjungmail.com
MS Student Koninklijke Conservatorium
Alexey Karpov
karpoviias.spb.su
Researcher St. Petersburg Institute for Informatics and Automation
Deniz Kahramaner
dennizkgmail.com
Programmer Robert College
Marek Hruz
mhruzkky.zcu.cz
PhD Student Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
Pavel Campr
camprkky.zcu.cz
PhD Student Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
Savvas Argyropoulos
savvasiti.gr
PhD Student Informatics and Telematics Institute
Erinç Dikici
erincdikiciyahoo.com
MS Student Bogazici University, Department of Electrical-Electronic Engineering
Ismail Ari
ismailarboun.edu.tr
MS Student Bogazici University, Computer Engineering Department
Oya Aran
aranoyaboun.edu.tr
PhD Student Bogazici University
Harun Karabalkan
karabalkansu.sabanciuniv.edu
MS Student Vision and Pattern Analysis Laboratory, Sabanci University

E-mail list address: enterface07p3listeci.cmpe.boun.edu.tr


4
Multimodal Speaker Identity Conversion

Project Description :

Objective

The goal of this project is to perform high-quality identity transformation on audiovisual recordings from a source speaker A to a target speaker B. The transformation is achieved using both voice conversion and video morphing (or avatar control). The result will be a set of audiovisual files of the target speaker B speaking with his/her own voice and acting like A does. This project follows Project 4 of eNTERFACE’06, Multimodal Character Morphing [1]. The conclusions of last year’s project highlighted various ideas for improving the quality of the multimodal conversion.

Background

Voice conversion

The audio part of this project is the well-known problem of voice conversion. From a complete corpus of a voice B and a limited corpus of a voice A, we want to drive B’s speech using A’s voice. Much work has been published on this subject in the last few years. Most of it is based on the Gaussian Mixture Model (GMM) [4 - 8], as introduced by Stylianou et al. in [2]. Although these algorithms produce voices very similar to the target, audio quality is still lacking. One important technological constraint is the availability of parallel data for the source and target speakers. A recent trend is to use a unit selection framework for voice conversion, in which case no parallel data is required [14].

The voice conversion system used in the previous eNTERFACE workshop is built on a GMM-based mapping function between source and target spectral envelopes followed by a frame selection algorithm to produce final spectral envelopes and LP Analysis/Synthesis [1,15].
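
For readers unfamiliar with GMM-based mapping, the following Python sketch shows the general shape of such a conversion function for a single scalar feature: each mixture component contributes a linear regression from source to target, weighted by the posterior probability of that component given the source value. This is the commonly used joint-density form, reduced to one dimension as an assumption for readability; it is not the eNTERFACE'06 system, and the parameter values in `COMPONENTS` are made up.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def convert(x, components):
    """Map a source feature x to the target space.

    components: list of dicts with keys
        w      mixture weight
        mu_x   source mean,   var_x  source variance
        mu_y   target mean,   cov_yx target/source covariance
    """
    likelihoods = [c["w"] * gaussian(x, c["mu_x"], c["var_x"]) for c in components]
    total = sum(likelihoods)
    y = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # P(component | x)
        y += posterior * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
    return y

# Illustrative parameters only; real ones are estimated from aligned
# source/target frames.
COMPONENTS = [
    {"w": 0.5, "mu_x": -2.0, "mu_y": -1.0, "var_x": 1.0, "cov_yx": 0.5},
    {"w": 0.5, "mu_x": 2.0, "mu_y": 3.0, "var_x": 1.0, "cov_yx": 0.5},
]
```

Because the output is a posterior-weighted average, it tends to be smooth, which is one plausible reason a frame selection stage on real target frames is added afterwards.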

Video morphing

The video part of this project is the same problem, but with a 2D/3D dataset instead of the 1D speech content. Starting from video samples (face and shoulders) of A (minutes? hours?) and some pictures of B (or a few video samples?), we want to obtain the same video but with B acting instead of A. A 3D model of speaker B will be created [11] and animated following the face movements of speaker A [10,12,13]. In Project 4 of eNTERFACE’06, we only considered the animation of an already built avatar, without building an avatar corresponding to the target speaker’s face.

Technical Description

During the eNTERFACE’06 workshop, a dedicated audiovisual database was built (eNTERFACE06_arctic database [16]). Additional recordings will be necessary if real video morphing is performed (at least pictures of the target speaker are needed to build his/her 3D model).

One part of the team will develop the voice conversion software, starting from the system of the last eNTERFACE workshop and/or from other established techniques. Many suggestions for improving the system built last year are given in [1] and [15] (F0 mapping, weighted Euclidean distance or another kind of distance to compute target and concatenation costs in the frame selection part, the OLA method with its problem of phase discontinuities, pitch-synchronous methods). Among other things, the lack of audio quality in voice conversion is due to the difficulty of correctly separating the source and the vocal tract in speech. Some attempts to process the vocal tract and the glottal flow separately have been reported in the literature (e.g. [3]). Another approach to source/filter separation for speech has been presented in [9]. Such ideas could be incorporated in this project.

The second part of the team will work on the animation of 3D faces. We will develop models of the lip movements and facial animations that will be coherent with the speech pronounced and the emotion expressed. These models will be trained using the database and should allow us to animate any 3D model of the selected participants.
Both teams will have to work together, particularly on the synchronization of speech, lips and expressiveness, to ensure that the results can be merged correctly into the converted audiovisual files. More collaboration may be necessary if multimodal features are used.

The project should result in a set of audiovisual files with a target speaker (face and voice) saying what the source speaker says and moving like him/her.

Equipment and software needed: whiteboard, large room with network, Matlab, C/C++ compilers, eNTERFACE’06 multimodal character morphing software and database

Workpackages

  1. Pre-workshop preparation (*)
  2. Voice analysis/resynthesis (using one voice database, extracting 10 sentences and re-synthesizing these 10 sentences using the remaining database) (**)
  3. Face analysis/resynthesis (using one person model, using 10 video sequences and re-synthesizing these 10 video sequences) (***)
  4. Voice conversion (requires database from another speaker) (**)
  5. Face movements conversion (requires database from another speaker) (***)
  6. Multimodal conversion (*)

(*) the whole team; (**) 3 persons in speech synthesis, voice conversion, speech analysis; (***) 2 persons in 3D face modelling, animation and gesture analysis

References

[1] Dutoit, T., Holzapfel, A., Jottrand, M., Marqués, F., Moinet, A., Ofli, F., Stylianou, Y., “Multimodal Speaker Conversion — his master’s voice. . . and face —“, eNTERFACE workshop 2006
[2] Stylianou, Y., Cappe, O. and Moulines, E., "Continuous probabilistic transform for voice conversion", IEEE Trans. Speech & Audio processing, vol. 6,pp 131-142, 1998
[3] Sündermann, D., Bonafonte, A., Ney, H., Höge, H., "A Study on Residual Prediction Techniques for Voice Conversion", ICASSP 2005
[4] Ye, H., Young, S., "Perceptually Weighted Linear Transformation for Voice Conversion", Eurospeech 2003
[5] Ye, H., Young, S., "High Quality Voice Morphing", ICASSP 2004
[6] Ye, H., Young, S., "Voice conversion for unknown speakers", ICSLP 2004
[7] Chen, Y., Chu, M., Chang, E., Liu, J., Liu, R., "Voice Conversion with Smoothed GMM and MAP Adaptation", Eurospeech 2003
[8] Qin, L., Chen, G., Ling, Z., Dai, L., "An Improved Spectral and Prosodic Transformation Method in STRAIGHT-based Voice Conversion", ICASSP 2005
[9] Bozkurt, B., Doval, B., d`Alessandro, C., Dutoit, T., "Zeros of Z-Transform (ZZT) decomposition of speech for source-tract separation", ICSLP 2004
[10] Ezzat, T., Geiger, G., Poggio, T., "Trainable Videorealistic Speech Animation", Proc. of SIGGRAPH 2002
[11] Blanz, V., Vetter, T., "A Morphable Model for the Synthesis of 3D Faces", SIGGRAPH 99
[12] Noh, J.Y., Neumann, U., "Expression cloning", SIGGRAPH 2001
[13] Pyun, H., Kim, Y., Chae, W., Woo Kang, H., Yong Shin, S., "An example-based approach for facial expression cloning", SIGGRAPH 2003
[14] Sündermann, D., Bonafonte, A., Ney, H., Höge, H., "A first step towards text-independent voice conversion", ICSLP 2004
[15] Dutoit, T., Holzapfel, A., Jottrand, M., Moinet, A., Pérez, J., Stylianou, Y., "Towards a voice conversion system based on frame selection", to be published in Proceedings of ICASSP 2007

Thierry Dutoit
thierry.dutoitfpms.ac.be
LEADER/SENIOR
Professor
Faculte Polytechnique de Mons (FPMs), BELGIUM
Yannis Stylianou
stylianoics.forth.gr
LEADER/SENIOR
Professor
University of Crete, Heraklion
Ferran Marqués
ferrangps.tsc.upc.edu
LEADER/SENIOR
Professor
Universitat Politècnica de Catalunya, Spain
Igor Pandzic
igor.pandzicfer.hr
LEADER/SENIOR
Professor
Department of Telecommunications, University of Zagreb, Croatia
Murat Saraçlar
murat.saraclarboun.edu.tr
LEADER/SENIOR
Professor
BUMM, Bogaziçi University
Maria Markaki
mmarkakicsd.uoc.gr
PhD Student University of Crete
Kristina Stankovic
kristina.stankovicfer.hr
BS Student Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
Jottrand Matthieu
matthieu.jottrandfpms.ac.be
PhD Student Faculté Polytechnique de Mons (FPMs)
Thanasis Krontiris
TDK.krontirgmail.com
BS Student Computer Science Department, University of Crete
Maria Astrinaki
Astrinaki.Mariagmail.com
BS Student Computer Science Department, University of Crete
Zara Aurélie
aurelie.zaraorange-ftgroup.com
PhD Student France Télécom R&D/ LIMSI-CNRS
Elias Apostolopoulos
ilapostcsd.uoc.gr
BS Student University of Crete, Computer Science Department
Zeynep Inanoglu
zeynepgatesscholar.org
PhD Student University of Cambridge

E-mail list address: enterface07p4listeci.cmpe.boun.edu.tr


5
Audio-Driven Human Body Motion Analysis and Synthesis

Project Description :

Objective

This project is on multicamera audio-driven human body motion analysis towards automatic and realistic audio-driven avatar synthesis. We plan to address this problem in the context of a dance performance, where the gestures or the movements of a human actor are mainly driven by a musical piece. We will analyze the relations between the audio (music) and the body movements on a training video sequence acquired during the performance of a dancer. The joint analysis will provide us with a correlation model that can be used to animate a dancing avatar when driven with any musical piece of the same genre.

Background

There is almost no prior research reported on the problem of audio-driven human body motion analysis and synthesis. The most relevant literature is on speech-driven lip animation [1]. Since lip movement is physiologically tightly coupled with acoustic speech, it is a relatively easy task to find a mapping between the phonemes of speech and the visemes of lip movement. Many schemes exist to find such audio-to-visual mappings, among which HMM (Hidden Markov Model)-based techniques are the most common, as they yield smooth animations by exploiting the temporal dynamics of speech. Some of these works also incorporate synthesis of facial expressions along with the lip movements to make animated faces look more natural [2-4]. There are several challenges involved in audio-driven human body motion analysis and synthesis. First, there is no well-established set of elementary audio and motion patterns, unlike phonemes and visemes in speech articulation. Second, body motion patterns (e.g. dance figures) are person dependent and open to interpretation, and may vary over time even for the same person. Third, audio and body motion are not physiologically coupled, and the synchronicity between them may vary. Moreover, motion patterns may span time intervals of different lengths than their audio counterparts. A very recent work [5] addresses challenges similar to those mentioned above in the context of prosody-driven head gesture synthesis, using a multi-stream parallel HMM structure to find the jointly recurring gesture-prosody patterns and the corresponding audio-to-visual mapping. We think that the framework proposed in that work can also be applied to our problem.

Technical Description

The whole analysis-synthesis system will consist of four main tasks as explained briefly in the sequel:
Body motion capture and feature extraction: This involves automated capture of body motion from multiview video recorded by a multicamera system (available at the MVGL Lab at Koç University). The motion capture process will be based on 3D tracking of the markers attached to the person’s body in the scene. We will fit a generic 3D skeleton model to detect and track markers. We will make use of the multistereo correspondence information from multiple cameras to obtain the 3D positions of the markers. This task will provide us with a set of features, 3D point locations over time, that express the alignment of the markers in the 3D world. All the executables related to body motion capture will be developed in C/C++.
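As a rough illustration of how a marker's 3D position could be recovered from two calibrated views, the sketch below uses midpoint triangulation: each camera contributes a viewing ray through the marker, and the estimate is the midpoint of the shortest segment joining the two rays. This is only one possible technique, written in Python for brevity rather than in the C/C++ planned for the project; the function name and the assumption that camera centres and ray directions are already known from calibration are hypothetical and do not come from the project description.

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec, d1: Vec, c2: Vec, d2: Vec) -> Vec:
    """Estimate a 3D point from two viewing rays.

    Each ray is given by a camera centre c and a direction d (assumed to
    come from camera calibration and the 2D marker detection). The result
    is the midpoint of the shortest segment between the two rays.
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are (nearly) parallel")
    t1 = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = _add(c1, _scale(d1, t1))
    p2 = _add(c2, _scale(d2, t2))
    return _scale(_add(p1, p2), 0.5)


if __name__ == "__main__":
    # Two hypothetical cameras looking at the same marker.
    print(triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5)))
```

With more than two cameras, one option would be to average the pairwise estimates or to solve a joint least-squares problem; the source does not say which approach the project uses.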

Audio feature extraction: An appropriate set of features will be extracted from the audio signal that is synchronized with the body motion parameters. The mel frequency cepstral coefficients (MFCC), along with additional prosodic features, can be considered as audio features. Audio feature extraction will be performed using the well-known HTK Toolkit.
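To make the "mel frequency" part of MFCC concrete, the following sketch shows a commonly used Hz-to-mel mapping and how filter centre frequencies can be spaced evenly on the mel scale. It is not HTK code: the function names, the number of filters and the frequency range are illustrative assumptions, and the actual HTK configuration is not specified in the project description.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Common mel-scale formula: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_low: float, f_high: float) -> List[float]:
    """Centre frequencies (Hz) of n_filters filters equally spaced in mel."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical setup: 10 filters between 0 Hz and 8 kHz.
    for f in mel_center_frequencies(10, 0.0, 8000.0):
        print(round(f, 1))
```

The centres come out densely packed at low frequencies and increasingly spread out at high frequencies, which is the property the MFCC front end relies on.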

Multimodal analysis: The feature sets resulting from body motion and audio will jointly be analyzed to model the correlation between audio patterns and body motion patterns. For this purpose, we plan to use a two-step HMM-based unsupervised analysis framework as proposed in [5]. At the first step, the audio and motion features will separately be analyzed by a parallel HMM structure to learn and model the elementary patterns for a particular performer. A multi-stream parallel HMM structure will then be employed to find the jointly recurring audio-motion patterns and the corresponding audio-to-visual mapping. All the simulations at this second step will be implemented by using the HTK Toolkit.
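As a minimal sketch of the kind of HMM machinery involved, the code below runs Viterbi decoding on a toy discrete HMM to segment an observation sequence into recurring patterns. It is not the multi-stream parallel HMM of [5] and does not use HTK; the state names ("slow", "fast"), the quantized observations ("low", "high") and all probabilities are made-up values chosen only for illustration.

```python
import math
from typing import Dict, List


def viterbi(obs: List[str], states: List[str],
            log_start: Dict[str, float],
            log_trans: Dict[str, Dict[str, float]],
            log_emit: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most likely hidden state sequence for obs (log domain)."""
    # best[t][s]: best log-probability of any path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        prev = best[-1]
        cur, ptr = {}, {}
        for s in states:
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            cur[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            ptr[s] = p
        best.append(cur)
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def logs(table: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Toy model with made-up numbers (not estimated from any data).
STATES = ["slow", "fast"]
LOG_START = {"slow": math.log(0.5), "fast": math.log(0.5)}
LOG_TRANS = logs({"slow": {"slow": 0.9, "fast": 0.1},
                  "fast": {"slow": 0.1, "fast": 0.9}})
LOG_EMIT = logs({"slow": {"low": 0.8, "high": 0.2},
                 "fast": {"low": 0.2, "high": 0.8}})

if __name__ == "__main__":
    observations = ["low", "low", "high", "high", "high"]
    print(viterbi(observations, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

In the planned system the observations would be continuous feature vectors rather than two symbols, and the model parameters would be learned without supervision, so this sketch only conveys the decoding step.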

Synthesis and animation: The body motion synthesis system will take an audio signal as an input and produce a sequence of body motion features, which are correlated with the input audio. The synthesis will be based on the HMM-based audio-body motion correlation model derived from the multimodal analysis. The synthesized body motion will then be animated on an avatar.

Workpackages

  • Acquisition of the calibrated multicamera audiovisual data (prior to workshop)
  • Tracking and capturing skeleton-based body motion parameters (week 1)
  • Extraction of body motion features (week 1)
  • Extraction of audio features (week 1)
  • Individual analysis and temporal clustering of recurrent body motion and audio patterns (week 2)
  • Joint audio-body motion analysis and correlation modeling of concurrent audio-body motion patterns (week 3)
  • Audio-driven body motion synthesis and animation (weeks 3 and 4)

Benefits of the research

In this research work, we will first develop an automated human body motion capture system based solely on image processing and computer vision tools using standard digital video cameras. Second, we will provide a framework for joint analysis of loosely correlated modalities such as motion and audio, and demonstrate how this framework can be used for audio-driven motion synthesis.

Deliverables

  • A multicamera motion capture system software
  • Executables/Scripts for multimodal analysis
  • Report and demonstration

References

[1] T. Chen, “Audiovisual speech processing,” IEEE Signal Processing Mag., Vol. 18, pp. 9–21, 2001.
[2] C. Bregler, M. Covell, and M. Slaney, “Video rewrite: Driving visual speech with audio,” Proc. ACM SIGGRAPH ’97, pp. 353–360, 1997.
[3] M. Brand, “Voice puppetry,” Proc. of the 26th annual conference on Computer graphics and interactive techniques, pp. 21–28, 1999.
[4] Y. Li and H.-Y. Shum, “Learning dynamic audio-visual mapping with input output hidden markov models,” IEEE Trans. on Multimedia, vol. 8, no. 3, pp. 542–549, 2006.
[5] M.E. Sargin, E. Erzin, Y. Yemez, A.M. Tekalp, A.T. Erdem, C. Erdem, and M. Ozkan, “Prosody-driven head-gesture animation,” accepted for publication in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing: ICASSP 2007.
[6] J. K. Aggarwal and Q. Cai, “Human motion analysis: A review,” Computer Vision and Image Understanding: CVIU, vol. 73, no. 3, pp. 428–440, 1999.
[7] S. Yonemoto, A. Matsumoto, D. Arita, and R.-I. Taniguchi, “A real-time motion capture system with multiple camera fusion,” Proc. IEEE Int. Conf. on Image Analysis and Processing: ICIAP, 1999, pp. 600–605.

Ferda Ofli
fofliku.edu.tr
LEADER/SENIOR
MS Student
Koc University
Lale Akarun
akarunboun.edu.tr
LEADER/SENIOR
Professor
BUMM, Bogaziçi University
Tanju Erdem
terdemmomentum-dmt.com
LEADER/SENIOR
Researcher
Momentum Technologies
Murat Tekalp
mtekalpku.edu.tr
LEADER/SENIOR
Professor
Koc University
Engin Erzin
eerzinalm.ku.edu.tr
LEADER/SENIOR
Professor
Koc University
Yücel Yemez
yyemezku.edu.tr
LEADER/SENIOR
Professor
Koc University
Yasemin Demir
ydemirku.edu.tr
MS Student Koc University
Elif Bozkurt
ebozkurtmomentum-dmt.com
Researcher Momentum Digital Media Technologies
Cristian Canton-Ferrer
ccantongps.tsc.upc.edu
PhD Student Technical University of Catalonia
Tilmanne Joelle
joelle.tilmannefpms.ac.be
PhD Student Faculté Polytechnique de Mons
Idil Kizoglu
idilkizogluyahoo.com
BS Student Bogazici University
Koray Balci
koraybalcigmail.com
PhD Student Bogazici Universitesi

E-mail list address: enterface07p5listeci.cmpe.boun.edu.tr

(You may use enterface07alllisteci.cmpe.boun.edu.tr to send e-mail to all participants. If you have any problems, please contact arman.savranboun.edu.tr)

6
Event Recognition for Meaningful Human-Computer Interaction in a Smart Environment

Project Description :

Objective

The localization and recognition of spatio-temporal events are problems of great theoretical and practical interest. Two specific scenarios are nowadays of special interest: home environment and smart rooms. In these scenarios, context awareness is based on technologies like gesture and motion segmentation, unsupervised learning of human actions, determination of the focus of attention or intelligent allocation of computational resources to different modalities. All these technologies pose interesting and difficult research questions. In home environments it is interesting to use low-cost sensor equipment connected to a home computer for activity monitoring and helping the user in various settings. In smart rooms, more sophisticated equipment is usually used, allowing for more robust applications. We would like to explore interconnected issues in several simple but real scenarios, where different sensors monitor an environment and their input is used for intention sensing, authentication and analysis of interactions of individuals.

Background

The project aims at two goals: a) to implement a small-scale, hierarchical biometrics system using audio, vision, and RFID modalities, and b) to analyze the different interactions and focus of attention of individuals doing a task. For the proposed scenario, the computer controls a door. The first aim of the system is to recognize persons authorized to open the door. For this purpose, the visual (face) and audio (voice) inputs are matched against a small database. The project involves the fusion of both modalities. In a typical scenario, the users will move about in front of the door. The system will recognize the movement behaviours that lead to attempts to open the door, before activating the authentication sequence. These behaviours can be hardcoded, or learned in an unsupervised manner.
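One simple way to fuse the face and voice matching described above is score-level fusion: normalize each modality's match scores over the enrolled persons, combine them with a weighted sum, and accept the best candidate only if the fused score passes a threshold. The sketch below assumes this scheme purely for illustration; the project description does not specify the fusion method, and the function names, weight and threshold are hypothetical.

```python
from typing import Dict, Optional, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.5 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse_and_decide(face_scores: Dict[str, float],
                    voice_scores: Dict[str, float],
                    face_weight: float = 0.5,
                    threshold: float = 0.7
                    ) -> Tuple[Optional[str], Dict[str, float]]:
    """Weighted-sum fusion of two modalities (higher score = better match).

    Returns the accepted identity (or None if the best fused score is
    below the threshold) together with all fused scores.
    """
    face = min_max_normalize(face_scores)
    voice = min_max_normalize(voice_scores)
    fused = {p: face_weight * face[p] + (1.0 - face_weight) * voice[p]
             for p in face}
    best = max(fused, key=fused.get)
    return (best if fused[best] >= threshold else None), fused


if __name__ == "__main__":
    # Hypothetical similarity scores for three enrolled persons.
    face = {"alice": 0.9, "bob": 0.4, "carol": 0.1}
    voice = {"alice": 12.0, "bob": 15.0, "carol": 3.0}
    print(fuse_and_decide(face, voice))
```

Min-max normalization over a small gallery always gives the top candidate of each modality a score of 1, so a deployed door-control system would more likely calibrate scores against impostor statistics; the sketch only shows the combination step.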

Different sensor settings can be used. In the first one, two low-cost cameras and a microphone are primarily employed to collect input. RFID tags will be attached to objects for additional sensing capabilities. The sensors feed their input to a computer that monitors the scene and controls various environmental parameters. In the second setting, a similar scenario is analyzed using more sophisticated sensor equipment. Recordings from a smart room, using multiple cameras and microphones, will be provided. Algorithms can be tested in both scenarios to perform comparative analysis. In a multi-sensory environment, supported with embedded computer technology, the system can capture and interpret what the users are doing and assist or collaborate with the users in real-time. Such an environment should be aware of users’ intentions, tasks and feelings, and allow people to interact with the environment in a natural way: by moving, pointing and gesturing (Tangelder et al., 2005). The proposed team has experience in modeling such an environment (Tangelder et al., 2005), in gesture recognition (Aran and Akarun, 2006; Canton-Ferrer et al., 2005), in resource-conscious face recognition (Salah et al., 2002), and in multimodal fusion (Gökberk et al., 2005; Luque et al., 2006).

There are many possible applications. Tracking babies, kids, or elderly people for particular events, intrusion detection, gesture or speech based controlling of environmental parameters (e.g. lights, audio volume of the TV set, etc.) can be implemented. The aim of the project is to implement the tools as black-box modules that would allow straightforward application to flexible scenarios.

Technical Description

  • The database: The database will partly be collected prior to the Workshop to facilitate the implementation of the separate modules.
  • Attention deployment: Monitoring of several modalities for “events”, defined or learned during the training phase. Coarse-to-fine processing for events is proposed.
  • Gesture and pose analysis: A previously implemented HMM-based gesture recognition module will be adapted.
  • Face detection: The Viola-Jones face detector will be employed.
  • Multimodal person ID: A face recognition system is currently being developed at BU. UPC also has face and speech recognition systems, and a multimodal fusion scheme. Partners with experience in these areas may contribute their own systems.

Workpackages

  • WP-1: Data collection and proposal of technological modules and software platform (pre-workshop). Additional data collection during the workshop
  • WP-2: Person Identification. Face and speaker ID. Multimodal fusion.
  • WP-3: Gesture & pose analysis.

References

  1. Aran, O., L. Akarun, "Recognizing Two Handed Gestures with Generative, Discriminative and Ensemble Methods via Fisher Kernels", Int. Workshop on Multimedia Content Representation, Classification and Security, 2006.
  2. Gökberk, B., A.A. Salah, L. Akarun, "Rank-based Decision Fusion for 3D Shape-based Face Recognition," Int. Conf. Audio- and Video-Based Biometric Person Authentication, LNCS 3546 pp.1019-1028, Springer Verlag, 2005.
  3. Salah, A.A., E. Alpaydın, L. Akarun, "A Selective Attention Based Method for Visual Pattern Recognition with Application to Handwritten Digit Recognition and Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.24, No.3, pp. 420-425, 2002.
  4. Tangelder, J.W.H., Ben A.M. Schouten, Stefan Bonchev, "A Multi-Sensor Architecture for Human-Centered Smart Environments," Proceedings CAID&CD 2005 Conference.
  5. Tangelder, J.W.H., Ben A.M. Schouten, "Sparse face representations for face recognition in smart environments" International Conference on Pattern Recognition (ICPR 2006), Hong Kong, August 20-24, 2006.
  6. C. Canton-Ferrer, J. R. Casas, M. Pardàs. “Human Model and Motion Based 3D Action Recognition in Multiple View Scenarios”. European Signal Processing Conference (EUSIPCO) 2006.
  7. J. Luque, R. Morros, A. Garde, J. Anguita, M. Farrus, D. Macho, F. Marqués, C. Martínez, V. Vilaplana, J. Hernando. Audio, Video and Multimodal Person Identification in a Smart Room. CLEAR 2006, Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, 2007.
  8. A. Abad, C. Canton-Ferrer, C. Segura, J.L. Landabaso, D. Macho, J.R. Casas, J. Hernando, M. Pardàs, C. Nadeu. UPC Audio, Video and Multimodal Person Tracking Systems in the CLEAR Evaluation Campaign. CLEAR 2006, Lecture Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, 2007.
  9. C. Segura, C. Canton-Ferrer, A. Abad, J.R. Casas, J. Hernando. Multimodal Head Orientation Towards Attention Tracking in SmartRooms. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu (USA), April 16-20, 2007.
Lale Akarun
akarunboun.edu.tr
LEADER/SENIOR
Professor
BUMM, Bogaziçi University
Ben Schouten
benscwi.nl
LEADER/SENIOR
Professor
CWI, Amsterdam
Ramon Morros
morrosgps.tsc.upc.edu
LEADER/SENIOR
Professor
UPC
Albert Ali Salah
salahboun.edu.tr
LEADER/SENIOR
Researcher
CWI, Amsterdam
Cem Keskin
keskinccmpe.boun.edu.tr
PhD Student Bogazici University
Onkar Ambekar
onkar.ambekarcwi.nl
PhD Student Centrum voor Wiskunde en Informatica(CWI)
Jordi Luque Serrano
luquetsc.upc.edu
PhD Student Technical University of Catalonia
Carlos Segura Perales
cseguragps.tsc.upc.edu
PhD Student Technical University of Catalonia
Ceren Kayalar
ckayalarsu.sabanciuniv.edu
PhD Student Computer Graphics Laboratory (CGLAB) / Sabanci University

E-mail list address: enterface07p6listeci.cmpe.boun.edu.tr

7
3D Face Recognition Performance under Adversarial Conditions

Project Description :

The purpose of this project is to develop a 3D face biometric interface that is robust to unintentional and malicious subject behaviors that compromise the identification reliability.

A cooperative subject exposes his face in a still position in front of the scanner, has a frontal pose, and avoids extreme expressions and any occluding material. However, a subject aware of 3D person identification cameras may try to avoid being recognized by posing awkwardly and, worse still, by resorting to occlusions via dangling hair, eyeglasses, facial hair, etc. In this project, we will model attempts to invalidate 3D face recognition, and any other effort to mislead the system or to induce a fake character.
To this effect, we will capture 3D face data imitating difficult surveillance conditions and non-cooperating subjects, trying various realistic but effective occlusions and poses. We expect to collect a database of about 100 people in various poses, expressions and occlusion conditions using Inspeck Mega Capturor II. Using this database, we will test the performance of 3D face identification algorithms. As a byproduct of the project, we will also develop a recognition algorithm by parts, that is, person recognition based on partial 3D evidence.

Workpackages:

  1. Data collection by 3D scanner from approximately 100 people, from among eNTERFACE participants and BÜ students. If possible, data must be collected twice with a lapse of time. The protocol will be defined before the start of the project. The following strategies are envisioned:
    • Various poses: Frontal view, various pan/tilt/rotate angles (-90 degrees to +90 degrees with 30 degrees of interval)
    • Various exaggerated expressions: Grin, anger, puffing cheeks, sulking, cheek wrinkling, eyes closed …, random expressions
    • Various occlusions: Mouth hidden behind a scarf, eyes or other parts of the face hidden behind hair, eyes hidden by eyeglasses, moustache and beard. (We plan to provide a false beard and moustache.)

      Outputs: 3D data for various conditions listed above from 100 subjects. (Nearly 20 scans for each subject)
      Duration: 2 weeks (Week 1 and Week 2) (at least half an hour for each subject)
  2. Preprocessing of scanner output:
    • Some available tools will be used for noise removal and hole filling. The tools and algorithms should be defined before the project starts.

      Inputs: Output data from Stage 1, and available algorithms.
      Outputs: Cleaned data.
      Duration: 2 weeks concurrently with the data collection effort (Week 1 and Week 2)
  3. Facial feature localization, 3D face segmentation and integration of multiple views from 3D scanner outputs. We assume that the face has been correctly localized:
    • 3D fiducial point localization algorithms will be developed possibly based on the existing BÜ software. These algorithms are already being tested on well-known databases (FRGC). However, 3D faces collected during the workshop will constitute a more challenging set.
    • Face recognition by parts (say, nose patch, eye patches, cheek patches etc.) has proven to be more flexible and robust. Hence face segmentation, possibly based on fiducial points, will be completed.
    • Face recognition under unknown look directions

      Inputs: project database (obtained after stages 1 and 2), 3D landmarking algorithms
      Outputs: Performance of algorithms on project database.
      Duration: 2 weeks (Week 3 and Week 4)
Ilkay Ulusoy
ilkaymetu.edu.tr
LEADER/SENIOR
Professor
Middle East Technical University
Lale Akarun
akarunboun.edu.tr
LEADER/SENIOR
Professor
BUMM, Bogaziçi University
Tevfik Metin Sezgin
metin.sezgincl.cam.ac.uk
LEADER/SENIOR
Professor
University of Cambridge, Computer Laboratory
Bülent Sankur
bulent.sankurboun.edu.tr
LEADER/SENIOR
Professor
BUMM, Bogaziçi University
Jana Trojanova
jeskynka.janaseznam.cz
PhD Student Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
Semih Esenlik
semiheseyahoo.com
BS Student Bogazici University
Nesli Bozkurt
e124410metu.edu.tr
MS Student Middle East Technical University
Aydin Akyol
akyolsu.sabanciuniv.edu
PhD Student Istanbul Technical University
Oya Çeliktutan
oyaxceliktutanyahoo.com
MS Student BUMM, Bogaziçi University
Kerem Caliskan
kcaliskaninfodif.com
PhD Student Informatics Institute - Medical Informatics
Arman Savran
arman.savranboun.edu.tr
PhD Student BUMM, Bogaziçi University
Hamdi Dibeklioglu
hamdi.dibeklioglucmpe.boun.edu.tr
MS Student BUMM, Bogaziçi University
Erdem Akagündüz
erdemametu.edu.tr
PhD Student Middle East Technical University
Cem Demirkir
cemdboun.edu.tr
PhD Student BUMM, Bogaziçi University

E-mail list address: enterface07p7listeci.cmpe.boun.edu.tr

8
Audiovisual Content Generation Controlled by Physiological Signals for Clinical and Artistic Applications

Project Description :

Objective

This project proposes to continue the research done during the first two editions of the eNTERFACE workshops on the use of physiological signals (EEG, EMG, ECG…) to control digital sound and image synthesis processes. Building on our previous experience, we would like to make a new attempt at this very exciting project of “biologically-driven musical instruments”. Furthermore, we aim to broaden the scope of our research by investigating how this kind of brain-computer interface could be helpful in clinical applications.

Background

On the one hand, advances in science and computing now enable musicians to perform music using a computer with gestural controllers [1] or sensors [6]. On the other hand, advances in Brain-Computer Interface (BCI) research show that basic control by means of brain activity is possible [7]. Some recent works in BCI have tried to use sound as a way to better understand brain activity, i.e. to provide an auditory display of it [4]. In the framework of the first two eNTERFACE workshops in 2005 and 2006, we followed an approach closer to [5] and tried to build digital musical instruments controlled by signals produced by the human body. These experiments were successful: at the end of the workshops we were able to demonstrate live instruments, with biomusicians interacting with their digital musical instruments through their EEG and EMG signals [2][3]. This year, we would like to continue this work on biologically-driven musical instruments, in particular by investigating a more medically oriented scenario of biological signal sonification. For a detailed description of our previous projects, refer to the eNTERFACE’05 and ’06 proceedings available at http://www.enterface.net.

Technical Description

During the first two projects, an important part of our work consisted of developing a conceptual framework, i.e. the software architecture of the system, allowing modules (data acquisition, data processing and analysis, sound synthesis, visualization…) to communicate with each other. This year, we will take advantage of this adaptive framework and focus on higher-level aspects of biologically-driven musical interfaces.

  • Data acquisition, analysis, fusion and interpretation: Four types of data will be considered, with associated sensors: electroencephalogram (EEG), electromyogram (EMG), electro-oculogram (EOG) and electrocardiogram (ECG) data.
  • Sound synthesis and interaction: two strategies of linking biological signals to multimedia synthesis (sound and visual) will be followed:
    • Paradigm of “physiological data sonification”: here the synthesized multimedia should be used to highlight some features of physiological signals, i.e. to transcribe these features into sound or image. This aims to improve the analysis of physiological signals and might be used as a tool for computer-aided diagnosis.
    • Paradigm of “digital musical instrument”: this approach relies on more aesthetic considerations and aims to exploit the physiological activity of a performer to drive digital music and painting generation, in order to create a biologically-driven artistic experience.
  • Software: Matlab, EEGLab, MedicalStudio for physiological signals analysis + realtime sound synthesis software (Max-MSP, Pure Data, CSound etc…) + image synthesis tools (Jitter, Processing etc…). One of our objectives this year will be to integrate OpenInterface in the existing system. OpenInterface is an open-source platform dedicated to the development of multimodal interactive systems.
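The "physiological data sonification" paradigm above can be illustrated with a very small parameter-mapping sketch: each sample of a signal feature is clipped to an expected range, mapped linearly to a MIDI note number, and converted to a frequency for a synthesizer. This is only a language-neutral illustration in Python, not code for Max-MSP, Pure Data or the existing framework; the function names, the note range and the choice of pitch as the target parameter are assumptions.

```python
from typing import List


def value_to_midi(value: float, v_min: float, v_max: float,
                  note_low: int = 48, note_high: int = 84) -> int:
    """Map a signal value linearly onto a MIDI note range, with clipping."""
    if v_max <= v_min:
        raise ValueError("v_max must be greater than v_min")
    clipped = min(max(value, v_min), v_max)
    frac = (clipped - v_min) / (v_max - v_min)
    return round(note_low + frac * (note_high - note_low))


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def sonify(samples: List[float], v_min: float, v_max: float) -> List[float]:
    """Turn a sequence of feature values into a sequence of pitches in Hz."""
    return [midi_to_hz(value_to_midi(s, v_min, v_max)) for s in samples]


if __name__ == "__main__":
    # Hypothetical normalized feature, e.g. band power scaled to [0, 1].
    print([round(f, 1) for f in sonify([0.0, 0.25, 0.5, 1.0], 0.0, 1.0)])
```

The same mapping idea applies to the "digital musical instrument" paradigm, where the target parameters would be chosen for expressive rather than diagnostic reasons.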

Workpackages

  • WP0 (Pre-workshop preparation): setup testing and collection of every type of physiological data (database)
  • WP1: Bio-Muse platform
  • WP2: Physiological signal analysis
  • WP3: “digital musical instrument-oriented” sound synthesis (with visual feedback)
  • WP4: “data sonification-oriented” sound synthesis (with visual feedback)
  • WP5: report and demos (live demos and videos)

References

[1] Arfib D., Couturier J.M., Kessous L., Verfaille V., “Mapping strategies between gesture control parameters and synthesis models parameters using perceptual spaces”, Organised Sound 7(2), Cambridge University Press, pp. 135-152
[2] Arslan, B., Brouse, A., Castet, J., Filatriau, J.J., Lehembre, R., Noirhomme, Q., Simon, C., “A biologically-driven musical instrument”, In Proceedings of the 1st summer workshop on multimodal interfaces (eNTERFACE05), Mons, Belgium, 2005, pp.35-45.
[3] Brouse A., Filatriau J-J., Gaitanis K., Lehembre R., Macq B., Miranda E., Zenon A., « An instrument of sound and visual creation driven by biological signals », In Proc. of the 2nd workshop on multimodal interfaces (eNTERFACE’06), Dubrovnik, Croatia, 2006.
[4] Hermann T., Meinicke P., Bekel H., Ritter H., “Sonification for EEG data analysis”, in Proceedings of the 2002 International Conference on Auditory Display (ICAD02), Kyoto, Japan, 2002.
[5] Miranda E. and Brouse A., “Toward Direct Brain Computer Musical Interfaces”, Conference on New Interfaces for Musical Expression (NIME05), Vancouver, Canada, 2005.
[6] Tanaka, A., “Musical performance practice on sensor-based instruments”, In Trends in Gestural Control of Music, M. M. Wanderley and M. Battier, eds. IRCAM, pp. 389-406, 2000.
[7] Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M., “Brain computer interfaces for communication and control”, Clinical Neurophysiology 113, pp. 767-791, 2002.

Benoit Macq
Benoit.MacqUCLouvain.be
LEADER/SENIOR
Professor
TELE Lab, UCL Louvain La Neuve
Ben Knapp
b.knappqub.ac.uk
LEADER/SENIOR
Professor
Queens University, Belfast
Lehembre Rémy
lehembretele.ucl.ac.be
LEADER/SENIOR
PhD Student
UCL - Université catholique de Louvain
Filatriau Jean-Julien
filatriautele.ucl.ac.be
LEADER/SENIOR
PhD Student
Université Catholique de Louvain (UCL-TELE), Belgium
Brouse Andrew
brousetele.ucl.ac.be
PhD Student TELE Lab, Université Catholique de Louvain, Belgium
Koray Tahiroglu
ktahiroguiah.fi
PhD Student University of Art and Design Helsinki
Mohammad Soleymani
mohammad.soleymanicui.unige.ch
PhD Student Computer vision and Multimedia Lab., University of Geneva
Alaattin Sayin
sayinaistanbul.edu.tr
MS Student Istanbul University
Miguel Angel Ortiz Perez
mortizperez01qub.ac.uk
PhD Student Sonic Arts Research Centre, Queen's University Belfast
Christian Muehl
cmuehlgmail.com
MS Student University of Osnabrück
Benovoy Mitchel
benovoymcim.mcgill.ca
MS Student Centre for Intelligent Machines, McGill University, Montreal, Canada
Christian Frisson
frissontele.ucl.ac.be
PhD Student UCL-TELE
Cumhur Erkut
Cumhur.Erkuttkk.fi
Researcher University of Art and Design Helsinki
Hannah Drayson
hannah.draysonplymouth.ac.uk
PhD Student University of Plymouth
Thomas Greg Corcoran
thomascorcragmail.com
Researcher
Umut Gundogdu
gunumutistanbul.edu.tr
MS Student Istanbul University Dept. of Electrical Electronical Eng.

E-mail list address: enterface07p8listeci.cmpe.boun.edu.tr

9
USIMAG Tool: A Software for Real-time Elastography and Tensorial Elastography

Project Description :

Objective

This project aims at developing a freeware Elastography Analysis and Visualization application that would serve as a common platform for future collaboration of the participating research groups and dissemination of their research outputs to a wider medical audience. The development environment will be C++.

Background

Changes in tissue stiffness correlate with pathological phenomena, which can aid the diagnosis of several diseases such as breast and prostate cancer (GARR97, HILT01) or cardiovascular dysfunctions (SCHU05, MAUR05). Many different approaches try to estimate and image the elastic properties of tissues, but this is not possible with conventional ultrasound, MRI, CT or nuclear imaging. There are mechanical ways to estimate the biomechanical properties of tissue, such as indentation, which is mostly used for thin layers of tissue ex vivo (KROU98, SRIN04). Researchers have introduced new techniques using imaging modalities such as MRI and ultrasound, and there are also some investigations in the optical field using microscopes (DUNC01), always imaging the tissue response to some stimulus. These techniques may be referred to as Elasticity Imaging. A review is found in (PARK05). Elastography (OPHI91) is one of the ultrasound quasi-static techniques for imaging the elastic properties of soft tissues, and it is well established in the literature. There are studies comparing results (DOYL01) which show that freehand elastography, although it has a lower SNR, is capable of detecting lesions such as breast carcinomas (OTAK03). The displacement field, from which researchers normally obtain the strain, is estimated with different techniques. We will refer to papers such as (SRIN02), which uses time-domain cross-correlation techniques, or (PESA99), which uses iterative phase zero estimation, among others. Some researchers visualize the estimated displacement and strain fields, following the path in (OPHI91); they focus on the Forward Problem. Others calculate mechanical properties of the tissue, such as Young's modulus, from the displacement and strain fields by using the constitutive elasticity equations, solving the so-called Inverse Problem.
In the former, either axial strain or lateral strain (OPHI91), Poisson's ratio (RIGH04), or shear strain (KONO00) elastograms are visualized. The Inverse Problem approach deals with visualization of Young's modulus, the shear modulus (DOYL05) or other related parameters. A comparative study of these two approaches can be found in (DOYL05). Nowadays, elastographic software is starting to appear in commercial systems, with very basic functionality.
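The time-domain cross-correlation idea mentioned above can be sketched on a single 1D signal line: for each window of the pre-compression signal, find the shift that best matches the post-compression signal, then take the gradient of the resulting displacement profile as the axial strain. The code below is a simplified Python illustration, not the USIMAG Tool implementation; the function names and parameters are hypothetical, and it uses integer-sample lags only, without the sub-sample interpolation a practical estimator would need.

```python
import math
from typing import List, Sequence


def best_lag(pre: Sequence[float], post: Sequence[float],
             start: int, window: int, max_lag: int) -> int:
    """Lag (in samples) maximizing the normalized cross-correlation between
    pre[start:start+window] and the correspondingly shifted window of post."""
    ref = pre[start:start + window]
    ref_energy = sum(a * a for a in ref)
    best, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        lo = start + lag
        if lo < 0 or lo + window > len(post):
            continue  # shifted window would fall outside the signal
        seg = post[lo:lo + window]
        den = math.sqrt(ref_energy * sum(b * b for b in seg))
        score = sum(a * b for a, b in zip(ref, seg)) / den if den > 0 else -math.inf
        if score > best_score:
            best, best_score = lag, score
    return best


def displacement_profile(pre: Sequence[float], post: Sequence[float],
                         window: int, step: int, max_lag: int) -> List[int]:
    """Displacement estimates (samples) for windows spaced `step` apart."""
    last_start = len(pre) - window - max_lag
    return [best_lag(pre, post, s, window, max_lag)
            for s in range(max_lag, last_start + 1, step)]


def axial_strain(displacements: Sequence[float], step: int) -> List[float]:
    """Finite-difference gradient of displacement along depth."""
    return [(displacements[i + 1] - displacements[i]) / step
            for i in range(len(displacements) - 1)]
```

A uniform shift of the whole line yields a constant displacement profile and therefore zero strain; a stiff inclusion would show up as a region where the displacement changes more slowly with depth than in the surrounding tissue.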

Technical Description

The target application will have the following functionality:

  1. Input data: B-mode images or RF signals (pre- and post-compression)
  2. Computation with different algorithms (Optical flow, cross-correlation,…)
  3. Filtering tools
  4. Scalar visualization tools
  5. Tensor visualization tools

USIMAG Tool will provide software that lets the physician change filtering and visualization parameters in real-time elastography, and will be ready for deployment on different echographic systems. The participating labs are encouraged to contribute their original methods to the remaining sets of functions. The code development will be monitored by LPI-UVA and CTM-ULPGC members for compatibility. USIMAG Tool is based on C++ and on VTK/ITK functions accessed through a hidden layer, which means that participants may import their own functions and/or use the VTK/ITK functions. Consequently, experience in C++ programming and VTK/ITK, together with familiarity with Linux, is important.

Workpackages

  • April: after the team selection, data acquisition, in order to have a database of elastographic signals ready for the start of the workshop.
  • Computation, Filtering, and Visualization teams will be formed.
  • Each group will be monitored by the project supervisor for compatibility issues, identifying integration problems.
  • Last three days of the workshop will be used for preparing the demo of the USIMAG-Tool application. (Live demos and videos).
  • Final presentation

References

[1] N. Belaid, I. Céspedes, J. Thijssen, and J. Ophir. Lesion detection in simulated elastographic and echographic images: A psycho-physical study. Ultrasound in Medicine and Biology, 20:877–891, 1994.
[2] M. M. Doyley, J. C. Bamber, F. Fuechsel, and N. L. Bush. A freehand elastographic imaging approach for clinical breast imaging: System development and performance evaluation. Ultrasound in Medicine and Biology, 27:1347–1357, 2001.
[3] M. M. Doyley, S. Srinivasan, S. A. Pendergrass, Z. Wu, and J. Ophir. Comparative evaluation of strain-based and model-based modulus elastography. Ultrasound in Medicine and Biology, 31(6):787–802, 2005.
[4] D. D. Duncan and S. J. Kirkpatrick. Processing algorithms for tracking speckle shifts in optical elastography of biological tissues. Journal of Biomedical Optics, 6(4):418–426, July 2001.
[5] B. S. Garra, I. Céspedes, J. Ophir, S. Spratt, R. A. Zuurbier, C. M. Magnant, and M. F. Pennanen. Elastography of breast lesions: initial clinical results. Radiology, 202:79–86, 1997.
[6] K. M. Hiltawsky, M. Kruger, C. Starke, L. Heuser, H. Ermert, and A. Jensen. Freehand ultrasound elastography of breast lesions: Clinical results. Ultrasound in Medicine and Biology, 27:1461–1469, 2001.
[7] E. E. Konofagou and J. Ophir. A new elastographic method for estimation and imaging of lateral displacements, lateral strains, corrected axial strains and Poisson’s ratios in tissues. Ultrasound in Medicine and Biology, 24(8):1183–1199, 1998.

Ruben I. Cardenes Almeida
rubenlpi.tel.uva.es
LEADER/SENIOR
Professor
ETSI Telecomunicación, Valladolid, SPAIN
Dario Sosa Cabrera
darioctm.ulpgc.es
LEADER/SENIOR
Researcher
Universidad de Las Palmas de Gran Canaria
Javier Gonzalez Fernandez
jgonzalezctm.ulpgc.es
LEADER/SENIOR
Researcher
Universidad de Las Palmas de Gran Canaria
Karl Krissian
krissiandis.ulpgc.es
Researcher Universidad de Las Palmas de Gran Canaria
Santiago Aja Fernandez
sanajatel.uva.es
Professor Universidad de Valladolid
Veronica Garcia Pérez
veronicalpi.tel.uva.es
PhD Student University of Valladolid
Gonzalo Vegas Sánchez-Ferrero
gvegsanlpi.tel.uva.es
PhD Student LPI, University of Valladolid (Spain)
Rodrigo de Luis Garcia
rodluiyllera.tel.uva.es
Researcher ETSI Telecomunicación, Valladolid, SPAIN

E-mail list address: enterface07p9listeci.cmpe.boun.edu.tr

(You may use enterface07alllisteci.cmpe.boun.edu.tr to send e-mail to all participants. If you have any problems, please contact arman.savranboun.edu.tr)

10
Realtime and Accurate Musical Control of Expression in Singing Synthesis (RAMCESS)

Project Description :

Objectives

The first purpose of this project is to continue developing the key components involved in expressive voice production: glottal signal synthesis, physical modelling of the vocal tract, interpolation/extrapolation/navigation mapping schemes, and noise and turbulence modelling. At a second level, the "dimensionality" of voice quality (meaning the mapping of low-level synthesis parameters onto perceptual features, e.g. tenseness, breathiness, etc.) will be discussed and refined, targeting a coherent and unified descriptive framework for voice timbre modification. Since in 2007 we can build on the work and experience of both preceding eNTERFACE workshops, we will target the realization of a full computer-based musical instrument, also considering gestural control issues (especially bi-manual manipulation) and artistic possibilities (e.g. possibly a concert at the end of the workshop).

Background

Expressivity is nowadays one of the most challenging topics studied by researchers in speech synthesis. Recent synthesizers provide acceptable speech in terms of intelligibility and naturalness, but the need to improve human/computer interaction drives researchers to develop more “human”, more expressive systems. Some recent realizations [1] have shown that an interesting option is to record multiple databases corresponding to a certain number of “labelled” expressions (e.g. happy, sad, angry, etc.). At synthesis time, the expression of the virtual speaker is set by choosing the units in the corresponding database.

Two years ago, during eNTERFACE’05, the group decided to investigate the opposite option. We postulated that “emotion” in speech is not the result of switches between labelled expressions but a continuous evolution of voice features that is strongly correlated with context. This approach led back to a more acoustic/psychoacoustic description of voice production mechanisms, for which a large number of theories (e.g. [2,3]) have been developed in recent years but remain underexploited in voice synthesis, particularly in the realtime context. We therefore developed a set of flexible voice synthesizers “conducted” in realtime by an operator. At this level, the synthesizers achieved really interesting, though quite rough, expressive results (expressive accents: effort, lax/pressed, hoarseness and noise) [4].

After a lot of inter-workshop work, and a particular focus on gestural control issues [5], it became clear that such a framework was particularly efficient for singing synthesis. This was confirmed last year, at eNTERFACE'06, where we focused on a singing synthesis scheme with particular constraints related to expressivity dimensions and gestural control abilities. By the end of the workshop, we had produced a large number of voice quality control modules, such as a glottal signal generator, a geometrical model of the vocal tract, parameter conversion functions, interpolators, and implementations of mapping strategies [6]. This new "library" can now serve as a basis for developing concrete monophonic singing prototypes and for discussions around these topics, with easy access to realtime testing and validation.

Technical Description

As we now come to the eNTERFACE'07 workshop with the above-mentioned background, the goals evolve to the following:

1. Review, extend and discuss the existing voice quality control library
The preceding eNTERFACE workshops have resulted in a set of realtime voice quality control modules, including glottal pulse generators (CALM: Causal/Anticausal Linear Model [7]), a physical model of the vocal tract (LPC lattice filter), a collection of vector computation objects implementing the coefficient conversion framework (conversions between filter coefficients, formant features, reflection coefficients, tube sections, etc.), noise and turbulence modules, vocal tract shape plotting, and software interfaces for the usual controllers (tablet, dataglove, joystick, etc.). These modules are now fully developed as Max/MSP objects, and a Pure Data port is in the pipeline.
2. Review, extend and discuss low-level two-handed mappings and dimensionality of the voice source
On top of the production modules stand two really important and currently underexploited mapping strategies: the low-level mapping of two-handed movements (meaning the first-level modifications of gestural controller information to make it more appropriate for the current application, e.g. how to implement an expressive vibrato with an ordinary force sensing resistor) and the dimensionality of the voice source (meaning the first-level modifications of synthesis parameters to better represent the perceptual axes of timbre variation, e.g. implementing tenseness on top of the open quotient and asymmetry coefficient). These behaviors will be implemented as Max/MSP and/or Pure Data patches.
3. Develop a full musical instrument based on a natural singing behavior
Once two-handed movements and voice source perceptual features are correctly interpreted, it is time to consider dedicated singing mapping strategies. On the one hand, this concerns the mechanisms to be implemented in order to produce natural singing voice timbre variations (e.g. singing formant, harmonic/formant adaptation, changes in the size of the vocal tract, etc.). These results will be achieved by discussing the singing voice literature and analysing expressive singing databases. On the other hand, the performing abilities of the system have to be optimized in order to make those natural singing sounds interesting from an artistic point of view. Different instrumental behaviors (keyboard-based, "fretless" control, conducting movements, etc.) thus have to be considered and adapted to singing synthesis [8]. This work will also be implemented as Max/MSP and/or Pure Data patches.
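
To illustrate the kind of coefficient conversion mentioned in item 1, here is a minimal Python sketch of two textbook conversions: reflection coefficients to prediction filter coefficients (the step-up recursion) and reflection coefficients to relative tube section areas. It is not the project's Max/MSP code; the function names and the sign convention used for the area ratio are assumptions made for this example, and other conventions invert the ratio.

```python
def reflection_to_lpc(reflection):
    """Step-up recursion: reflection coefficients k_1..k_p -> [1, a_1, ..., a_p]."""
    a = [1.0]
    for m, k_m in enumerate(reflection, start=1):
        prev = a + [0.0]  # order m-1 polynomial, zero-padded to length m+1
        # a_m[i] = a_{m-1}[i] + k_m * a_{m-1}[m - i]
        a = [prev[i] + k_m * prev[m - i] for i in range(m + 1)]
    return a


def reflection_to_areas(reflection, first_area=1.0):
    """Relative tube section areas, assuming A[m+1] / A[m] = (1 - k_m) / (1 + k_m)."""
    areas = [first_area]
    for k_m in reflection:
        if not -1.0 < k_m < 1.0:
            raise ValueError("reflection coefficients must lie strictly in (-1, 1)")
        areas.append(areas[-1] * (1.0 - k_m) / (1.0 + k_m))
    return areas


if __name__ == "__main__":
    k = [0.5, -0.2]
    print(reflection_to_lpc(k))
    print(reflection_to_areas(k))
```

Keeping every reflection coefficient strictly inside (-1, 1) keeps the all-pole filter stable and the areas positive, which is one reason lattice parameters are convenient to interpolate in realtime.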

We stress that, at the end of the workshop, the group should provide a complete and usable monophonic musical tool (from interface to sound production). This synthesizer will be made of publicly available modules. We will also produce a report, which should ideally include some discussion of performance practice, possibly resulting from real musical sessions.

Workpackages

1. WP1 - Workshop Preparation
These tasks concern work that will be done by the whole team before the workshop: gathering a significant database of singing sounds (meaning that the singing styles to be considered have to be decided at this step), discussing the software architecture, and disseminating the existing voice quality control library.

2. WP2 - Expressive Voice Analysis
This work concerns the processing of the gathered singing sounds in order to extract usable features or tendencies for expressive voice implementation. It requires general voice (e.g. speech) analysis expertise, such as implementing offline algorithms for pitch tracking, formant detection, phase processing, etc. (work in Matlab).

3. WP3 - Expressive Voice Synthesis
These tasks are related to the development of expressive voice production modules. They mainly require a good understanding of voice (e.g. speech) synthesis issues, such as source/filter implementations, harmonic/noise modelling, and dynamics control. A particular focus is also placed on realtime synthesis issues: latency, interpolability, computational load, and continuity (e.g. phase, pitch, etc.) problems.

4. WP4 - Gestural Controllers & Low-Level Bi-Manual Mappings
This work concerns the evaluation of different gestural control strategies, based on the choice of devices but also considering the first-level interpretation of gestural data, in order to imitate (keyboard-like, string-like, trumpet-like, etc.) or innovate (based on general ergonomic issues) in the context of efficient control of expressive sound synthesis.

5. WP5 - Dimensionality & Singing Synthesis Behavior
These tasks concern the gathering of various theories related to perceptual aspects of voice timbre, with a particular focus here on singing voice timbre description, in order to produce a unified framework for voice quality control and the implementation of singing mechanisms. It is based on an iterative process in which various mapping strategies will be implemented and tested.

6. WP6 - Digital Luthery & Performing Issues
This last workpackage acts as a constant review of the performing abilities (from an artistic perspective) of the different systems that will be assembled. It requires good knowledge of the musical performance context (live situation, performer constraints, etc.). This work will serve as the final "summarizing" process in which engineering prototypes evolve into musical instruments.

References

[1] http://www.loquendo.com
[2] G. Carlsson and J. Sundberg, "Formant Frequency Tuning in Singing", Journal of Voice, vol. 6, no. 3, pp. 256–60, 1992.
[3] N. Henrich, C. d’Alessandro, M. Castellengo, and B. Doval, "Glottal Open Quotient in Singing: Measurements and Correlation with Laryngeal Mechanisms, Vocal Intensity, and Fundamental Frequency", Journal of the Acoustical Society of America, vol. 117, pp. 1417–1430, March 2005.
[4] C. d’Alessandro, N. D’Alessandro, S. Le Beux, J. Simko, F. Cetin, and H. Pirker, "The Speech Conductor: Gestural Control of Speech Synthesis", in Proceedings of eNTERFACE’05 Summer Workshop on Multimodal Interfaces, 2005.
[5] N. D’Alessandro, C. d’Alessandro, S. Le Beux, and B. Doval, "Realtime CALM Synthesizer, New Approaches in Hands-Controlled Voice Synthesis", in Proceedings of the 6th International Conference on New Interfaces for Musical Expression, pp. 266–271, 2006.
[6] N. D'Alessandro, B. Doval, S. Le Beux, P. Woodruff, Y. Fabre, C. d'Alessandro and T. Dutoit, "Realtime and Accurate Musical Control of Expression in Singing Synthesis," Journal on Multimodal User Interfaces, vol. 1, no. 1, pp. 31-39, March 2007.
[7] B. Doval and C. d’Alessandro, "The Voice Source as a Causal/Anticausal Linear Filter", in Proceedings of VOQUAL’03, Voice Quality: Functions, Analysis and Synthesis, ISCA Workshop, August 2003.
[8] N. D’Alessandro, T. Dutoit, "HandSketch Bi-Manual Controller: Investigation on Expressive Control Issues of an Augmented Tablet," [to be published in] Proceedings of the 7th International Conference on New Interfaces for Musical Expression, June 2007.

Nicolas D`Alessandro
nicolas.dalessandrofpms.ac.be
LEADER/SENIOR
PhD Student
Faculté Polytechnique de Mons
Moinet Alexis
alexis.moinetfpms.ac.be
PhD Student FPMs
Holzapfel Andre
hannovercsd.uoc.gr
PhD Student University of Crete
Baris Bozkurt
barisbozkurtiyte.edu.tr
Professor Izmir Inst. of Tech., Urla/Izmir
Onur Babacan
onurbabacangmail.com
BS Student Izmir Inst. of Tech., Urla/Izmir
Dubuisson Thomas
thomas.dubuissonfpms.ac.be
PhD Student Faculté Polytechnique de Mons (FPMs)
Kessous Loic
kessouspost.tau.ac.il
Senior Researcher Tel Aviv University
Vlieghe Maxime
maxime.vlieghegmail.com
MS Student Faculté Polytechnique de Mons

E-mail list address: enterface07p10listeci.cmpe.boun.edu.tr

11
Mobile-phone Based Gesture Recognition

Project Description :

Objective

The simple keypad present on many mobile phones is adequate for placing voice calls but falls short when interacting with other mobile applications, such as web browsing, image and video browsing, and location services. Because mobile phones are handheld devices, the user's gestures can easily be used as additional input; for example, moving the phone to the right can pan a map on the screen to the right. The goal of the project is to develop fast gesture recognition by analyzing the motion of the camera phone from its video input. The recognition algorithm can be made arbitrarily complex and could even support handwriting input with the handheld device.
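
As a rough illustration of recovering phone motion from video input, the following Python sketch estimates a single global shift between two small grayscale frames by exhaustive search over candidate displacements, then maps it to a pan direction. It is only a sketch under stated assumptions: the project does not specify its motion estimation method, and the function names, the mean-absolute-difference cost, and the direction mapping are all hypothetical choices made here.

```python
def estimate_shift(prev, curr, max_shift=2):
    """Return (dx, dy) such that prev[y][x] best matches curr[y + dy][x + dx].

    Frames are equally sized lists of rows of grayscale values. The cost is
    the mean absolute difference over the overlapping region.
    """
    h, w = len(prev), len(prev[0])
    if max_shift >= min(h, w):
        raise ValueError("max_shift must be smaller than the frame size")
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    total += abs(curr[y + dy][x + dx] - prev[y][x])
                    count += 1
            cost = total / count
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def pan_direction(dx, dy, threshold=1):
    """Map an image shift to a phone movement (assumed opposite to the image motion)."""
    if abs(dx) < threshold and abs(dy) < threshold:
        return "none"
    if abs(dx) >= abs(dy):
        return "left" if dx > 0 else "right"
    return "up" if dy > 0 else "down"  # image y axis points downwards


if __name__ == "__main__":
    prev = [[x * x + 10 * y * y for x in range(6)] for y in range(6)]
    curr = [[0] + row[:-1] for row in prev]  # scene content moved one pixel right
    dx, dy = estimate_shift(prev, curr)
    print(dx, dy, pan_direction(dx, dy))
```

Exhaustive search costs grow with the square of the search range, so a version fast enough for a phone would typically work on downsampled frames or on a few selected blocks.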

Technical Description

Software:

Hardware:

Project coordinators will provide:

  • A tutorial on the development environment and its setup, with an introduction to mobile programming
  • Example source code that can perform simple image processing on video input
  • Image, video data – i.e. various gestures captured by cell phone camera

Schedule:

  • Week 1. Getting familiar with the development environment and implementing basic i/o functions.
  • Week 2. Implementing fast motion estimation for gesture recognition from video stream.
  • Week 3. Implementing the image browser user interface and integrating with gesture recognition.
  • Week 4. Improving speed and performance; documentation.

Workforce

  • 1-2 students who are proficient in C++ and knowledgeable in basic image processing algorithms. Familiarity with C# and Microsoft Visual C++ or Studio IDE is a plus.
  • A project co-leader is also needed to oversee the progress.

References

Berna Erol
berna_erolrii.ricoh.com
LEADER/SENIOR
Researcher
Ricoh California Research Center, Menlo Park, CA, USA
Murat Saraçlar
murat.saraclarboun.edu.tr
LEADER/SENIOR
Professor
BUMM, Bogaziçi University
Tevfik Metin Sezgin
metin.sezgincl.cam.ac.uk
LEADER/SENIOR
Professor
University of Cambridge, Computer Laboratory
Burcu Barla
burcubarlagmail.com
BS Student Bogaziçi University Electrical-Electronics Department
Caglayan Dicle
cdiclegmail.com
MS Student Bogazici University
Ögem Boymul
ogem.boymulboun.edu.tr
BS Student Bogaziçi University
Baris Bahar
barisbahar86yahoo.com
BS Student Bogazici University Computer Engineering
Milos Zelezny
zeleznykky.zcu.cz
Professor Department of Cybernetics, University of West Bohemia in Pilsen
Candan Herdem
kcandanherdemyahoo.com
MS Student Gazi University, Ankara, Turkey
Deniz Türdü
denizturdusu.sabanciuniv.edu
MS Student Vision and Pattern Analysis Laboratory, Sabanci University

E-mail list address: enterface07p11listeci.cmpe.boun.edu.tr

12
Benchmark for Multimodal Biometric Authentication

Project Description :

Objective

This project aims at creating a small benchmark for testing the integration of existing monomodal robust hashing and feature extraction techniques into multimodal methods for biometric authentication.

Abstract

It is possible to authenticate individuals by means of robust hashing or by means of feature extraction functions. For instance, one may take a photo of the face of a person and robustly hash it in order to obtain a low-dimensional descriptor (hash). The same can be achieved by means of feature extraction functions. The identifiers thus obtained can be compared to preexisting ones in a database for a match. Systems based on multimodal (i.e., joint) robust hashing and feature extraction strategies combine two or more monomodal functions (for instance, one method to hash a face image and another method to hash a fingerprint image) in order to increase security.

Technical Description

The main goal of the project is to create a small GUI-driven benchmark in order to test multimodal identification strategies. This benchmarking system will be just a prototype (the target deliverable of the project). Despite the necessarily limited scope of this demo, it should be usable and, if possible, easily extensible. To keep the implementation simple, extensibility (in order to include new items and features) may not be completely automatic. That is, it will not be possible to perform all operations through the GUI, and manual adjustments to the source code will probably be necessary. A basic scheme showing the relationships between the different parts of the benchmarking system is shown in Figure 1.
The benchmark will have a database of authenticated individuals with corresponding input signals and a database of verified identifiers (i.e., hash values, extracted features) obtained from the authenticated individuals in the database. The proposed initial database will include 2D face images and short audio clips (around 3 seconds) of the same individuals reading a password. Ten images of every individual and ten utterances of the password will be included in order to minimize the effects of intraindividual variability.

An individual entry in the database will resemble the following pattern:

  • Name
  • Authenticated (Boolean)
  • Fake/Attacked (Boolean)
    • If affirmative, attack from the library of attacks used (see below)
  • List of image files and audio clip files associated
    • For each image, audio: list of identifiers generated by each relevant function in the library of monomodal functions (see below)
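
The entry pattern above could be represented in code roughly as follows. This is a Python sketch for illustration only (the project itself plans Matlab and a MySQL database); the class names, field names, and file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SignalRecord:
    """One image or audio clip together with the identifiers derived from it."""
    path: str
    modality: str  # "image" or "audio"
    # Maps the name of a monomodal function to the identifier it produced.
    identifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Individual:
    """One entry of the individuals database."""
    name: str
    authenticated: bool
    attacked: bool = False
    attack: Optional[str] = None  # name of the attack used, if attacked
    signals: List[SignalRecord] = field(default_factory=list)

    def __post_init__(self):
        # The attack name is required exactly when the entry is fake/attacked.
        if self.attacked != (self.attack is not None):
            raise ValueError("'attack' must be given if and only if 'attacked' is True")


if __name__ == "__main__":
    face = SignalRecord(path="alice_01.png", modality="image")
    face.identifiers["image_hash"] = "10110010"
    alice = Individual(name="Alice", authenticated=True, signals=[face])
    print(alice)
```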

The benchmark should be able to admit new modules in the following libraries of functions (see Figure 1):

Figure 1: Relationships between the elements of the benchmark

  1. Library of monomodal hashing and feature extraction methods. It will include standard monomodal functions. For each function, the following items will be defined:
    • Function-dependent input parameters, including type of signal handled by the function, thresholds, etc.
    • The function must return an identifier string (binary or real-valued, depending on the function).
    • In acquisition mode, the identifier will be stored in the database associated to the individual whose signal has been employed to derive the identifier.
    • In comparison mode, the function must be able to compare the identifier obtained to those previously stored in the database during acquisition, giving a Boolean decision of similarity.
      Proposed initial functions:
    • Robust image hashing: algorithm A of [1].
    • Robust audio hashing: Philips method [2].
      Both methods yield binary identifiers.
  2. Library of attacks. It will list attack functions on the signals stored in the individuals database (chimeric characters, noise addition, filtering, etc.). Attacked signals will be used to assess how robust multimodal methods are to these attacks:
    • Is it possible to fool the system into verifying non-authentic signals?
    • Are multimodal methods more robust to noisy versions of the signals?
      The results of the attacks may be stored in the database, properly labelled (noting that these are not authenticated signals), or they may be used on the fly.
      Proposed initial attacks:
    • Chimeric characters: random combinations of face images and audio clips from different individuals in the database. Face images can also be morphed from randomly chosen images in the database. This can be seen as an intentional, malicious attack.
    • Pseudorandom Gaussian noise addition to images and audio from the database. This can be seen as an unintentional attack.
  3. Library of multimodal methods. It will list functions which, using the library of monomodal techniques, specify ways to combine two (or more) monomodal techniques in order to create multimodal identifiers. For instance, the system could allow combining a method to robustly hash face images with a method to extract features from a fingerprint; the newly created method should be stored in the library as a multimodal function. It is important that each multimodal function implements an overall comparison function, able to break ties between the monomodal decisions when trying to match unauthenticated signals with authenticated signals in the database.
    Proposed initial multimodal method: a function combining [1] and [2] into a multimodal fingerprint. If the monomodal methods give opposite Boolean decisions in the comparison stage, the overall comparison function will output the decision with the higher percentage of binary matches.
  4. Library of benchmarking scripts. It will list scripts which may be run in batch mode (i.e., autonomously), using suitable signals from the database, one multimodal method, and one attack. Quality measures such as the rates of detection and false alarm (obtained by comparison with the authentic identifiers) will be computed during the execution of the script. In the scripts there may be loops where some attack parameters are generated pseudorandomly. A resettable Boolean variable will indicate if the script has been run by the benchmark already.
    Proposed initial benchmarking scripts:
    1. Script 1:
      • Pseudorandomly generate a sufficiently large new database of false characters, using the individuals database and the chimeric attacks.
      • Run the proposed multimodal method on the new database.
      • Compute and store multimodal and monomodal probabilities of detection and false alarm, comparing with the authenticated individuals database, using the methods proposed.
    2. Script 2:
      • Pseudorandomly generate a sufficiently large new database of noisy characters, using the individuals database and the Gaussian noise attack.
      • Run the proposed multimodal method on the new database.
      • Compute and store multimodal and monomodal probabilities of detection and false alarm, comparing with the authenticated individuals database, using the methods proposed.
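
The comparison mode of item 1 and the tie-breaking rule of item 3 can be sketched as follows in Python. This is one possible reading, stated as an assumption: binary identifiers are represented as strings of '0' and '1', each monomodal decision thresholds the fraction of matching bits, and on disagreement the decision of the modality with the higher match fraction is returned. The function names and thresholds are hypothetical, and the hashing algorithms of [1] and [2] are not reproduced.

```python
def match_fraction(candidate, stored):
    """Fraction of positions at which two equal-length bit strings agree."""
    if not candidate or len(candidate) != len(stored):
        raise ValueError("identifiers must be non-empty and of equal length")
    return sum(a == b for a, b in zip(candidate, stored)) / len(candidate)


def monomodal_compare(candidate, stored, threshold):
    """Return (decision, match fraction) for one modality."""
    fraction = match_fraction(candidate, stored)
    return fraction >= threshold, fraction


def multimodal_compare(image_result, audio_result):
    """Combine two monomodal results; break ties by the higher match fraction."""
    image_ok, image_fraction = image_result
    audio_ok, audio_fraction = audio_result
    if image_ok == audio_ok:
        return image_ok
    return image_ok if image_fraction >= audio_fraction else audio_ok


if __name__ == "__main__":
    image = monomodal_compare("1111", "1110", threshold=0.9)  # rejects, 75 % match
    audio = monomodal_compare("1010", "1010", threshold=0.8)  # accepts, 100 % match
    print(image, audio, multimodal_compare(image, audio))
```

Under this reading, the rule only differs from a plain OR of the two decisions when the modalities use different thresholds, which is worth keeping in mind when choosing them.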

The output file with the data resulting from running a benchmarking script will be timestamped and included in a database of results. Using these result files, an output module will be able to produce simple text reports or plots from the results of running benchmarking scripts. For the proposed scripts, a text report may include details such as functions and parameters used, number of iterations, database signals used, and quality measures obtained. An output plot will show ROC plots (probability of false alarm versus probability of detection) for different thresholds or noise levels.
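
The quality measures behind such an ROC plot can be computed along the following lines. The Python sketch assumes that each comparison yields a similarity score (for instance a fraction of matching bits), that scores of genuine and false signals are collected separately, and that a signal is accepted when its score reaches the threshold; the function name and the sample scores are made up for the example.

```python
def roc_points(genuine_scores, false_scores, thresholds):
    """Return a list of (threshold, P_false_alarm, P_detection) tuples."""
    if not genuine_scores or not false_scores:
        raise ValueError("both score lists must be non-empty")
    points = []
    for t in thresholds:
        # Detection: a genuine signal is accepted.
        p_detection = sum(s >= t for s in genuine_scores) / len(genuine_scores)
        # False alarm: a false (attacked or chimeric) signal is accepted.
        p_false_alarm = sum(s >= t for s in false_scores) / len(false_scores)
        points.append((t, p_false_alarm, p_detection))
    return points


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.95, 0.6]
    false = [0.5, 0.7, 0.4, 0.55]
    for t, p_fa, p_d in roc_points(genuine, false, [0.5, 0.65, 0.9]):
        print(f"threshold={t:.2f}  P_fa={p_fa:.2f}  P_d={p_d:.2f}")
```

Plotting P_fa on the horizontal axis against P_d on the vertical axis for a sweep of thresholds gives one ROC curve; repeating the sweep per noise level gives one curve per level.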

Benchmark Architecture

To speed up development, the GUI will be implemented in Matlab, and so will the functions. The database will be set up with MySQL, which can be interfaced with Matlab. The participants may of course suggest and use alternatives that they feel more at ease with; for instance, it is easy to interface C/C++ with Matlab through MEX files.

The main GUI window (see Figure 2) will allow users to browse and add/remove elements from the database and all four libraries. It will have a button for acquisition, through which the database will be updated: new identifiers will be created for new individuals in the database using the functions in the library, and existing individuals will be updated if new functions have been added since the last update. Another button in the main GUI will run all benchmark scripts that have not yet been completed.

Figure 2: Sketch of main GUI window

Workpackages

  1. (Pre-workshop preparation) Data collection and database creation.
  2. (Pre-workshop preparation) Development of a basic GUI and benchmarking system workflow in Matlab.
  3. (Week 1) Implementation/adaptation of the monomodal and multimodal identification methods chosen.
  4. (Week 1) Implementation/adaptation of the attacks chosen.
  5. (Week 2) Development and integration of the main benchmarking system blocks.
  6. (Week 3) Development of output module.
  7. (Week 3) Debugging and bug fixing.
  8. (Week 4) Benchmark tests.
  9. (Week 4) Project report and demo showcasing.

Workforce

The project will be undertaken by at least two people with good programming skills and some familiarity with signal processing techniques. The participants are expected to implement and test a basic version of the benchmark during the workshop, under the direction and with the collaboration of the coordinators.

References

[1] M. K. Mihcak and R. Venkatesan. New iterative geometric methods for robust perceptual image hashing. In Procs. of ACM Workshop on Security and Privacy in Digital Rights Management, Philadelphia, USA, 2001.
[2] J. Haitsma, T. Kalker, and J. Oostveen. Robust audio hashing for content identification. In Procs. of the International Workshop on Content-Based Multimedia Indexing, pages 117–125, Brescia, Italy, September 2001.

Felix Balado
fizihl.ucd.ie
LEADER/SENIOR
Professor
University College Dublin
Kivanc Mihcak
kivanc.mihcakboun.edu.tr
LEADER/SENIOR
Professor
Bogaziçi University
Neil Hurley
neil.hurleyucd.ie
LEADER/SENIOR
University College Dublin
Morgan Tirel
morgan.tireletudiant.univ-rennes1.fr
MS Student University of Rennes, France
Neslihan Gerek
neslihan.gerekgmail.com
MS Student Bogazici University
Ekin Olcan Sahin
ekin.sahinboun.edu.tr
MS Student Bogazici University
Guenole Silvestre
guenoleihl.ucd.ie
University College Dublin
Cliona Roche
cliona.rocheucd.ie
PhD Student University College Dublin
Sinan Kesici
sinan940yahoo.com
BS Student Bogaziçi Uni. Electrical-Electronics Eng.

E-mail list address: enterface07p12listeci.cmpe.boun.edu.tr

Openinterface Project
Marcos Serrano
marcos.serranoimag.fr
Researcher University of Grenoble
Lionel Lawson
jean-yves.lawsonuclouvain.be
Researcher UCL, Université catholique de Louvain
Yann Goffette
yann.goffettestudent.uclouvain.be
MS Student UCL-BCHI
Louvigny Henri-Nicolas
henri-nicolas.louvignystudent.uclouvain.be
MS Student UCL-BCHI

 

Participants come from 20 countries

Turkey (52), Spain (18), Belgium (14), Greece (11), France (7), Ireland (5), UK (5), Czech Republic (4),
Netherlands (4), Romania (4), USA (4), Italy (3), Canada (2), Finland (2), Croatia (1), Germany (1),
Israel (1), Russia (1), Senegal (1), Switzerland (1)


