Projects & Teams
eNTERFACE'07 Projects
1. Multi-Approach DT-MRI Data Analysis & Visualization Platform
Project Description :
Objective
This project aims to develop a freeware (not open-source) DT-MRI
analysis and visualization platform that will serve as a common platform
for future collaboration among the participating research groups and for dissemination
of their research outputs to a wider medical audience. The development
environment will be VAVframe, a software framework under development at Bogazici
University, Turkey. VAVframe is an ITK/VTK-based C++ framework for Linux
that will also support distributed computing. The framework allows
users to implement their own research algorithms (analysis/visualization/interaction)
as well as to use ITK/VTK functions.
Background
Human brain mapping refers to understanding the functional and
physiological structure of the human brain. This project concentrates on
the physiological structure as revealed by the DT-MRI imaging technique. DT-MRI
is a relatively new technique of increasing importance, especially in understanding
neurodegenerative diseases. The engineering challenge is to reconstruct
the connection network.
There are two basic approaches to utilizing the information DT-MRI provides:
Fiber Tractography and Connectivity Mapping. The former relies
on the principal diffusion direction and attempts to reconstruct the fiber
that passes through a given point. The basic tool used for tractography
is numerical integration along the principal diffusion direction (the major
eigenvector of the diffusion tensor), for which the most popular method
is 4th-order Runge-Kutta [1]. Fiber tractography is prone to cumulative
errors, cannot overcome the partial volume effect and disregards part
of the information embedded in the diffusion tensor (which itself is an
approximation based on a Gaussianity assumption). The latter approach attempts
to utilize the true nature of the DT-MRI data, i.e. the Gaussian diffusion
process, by estimating a connectivity map. Such methods consider each and every
possible connection, with weights set by the dataset. Several approaches
in this group are based on some form of Monte Carlo simulation of the
random walk model [2,3,4]. Lenglet et al., on the other hand, recast
the connectivity problem in a Riemannian differential geometry framework,
where they defined their local metric tensor using the DTI data and solved
for geodesics [5]. Probably the most important point that differentiates
these two approaches is their behaviour in problematic regions such as
crossing and kissing fibers. Tractography methods either pretend
to follow a single fiber by choosing a direction in which to proceed or stop tracking,
whereas connectivity mapping methods allow for branching. Although
branching is not anatomically correct, presenting the DT-MRI data in this
way is more faithful to the nature of the acquired data (localized Gaussian
maps of diffusing particles) and leaves the interpretation to the users. Thus,
connectivity mapping is a more direct way of communicating
the information embedded in DT-MRI data. However, connectivity maps are not
trivial to interpret.
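As a rough illustration of the tractography approach described above, the following sketch follows the major eigenvector of the diffusion tensor with a 4th-order Runge-Kutta step. It is written in Python with NumPy for brevity and is not part of VAVframe. The function names, the nearest-voxel tensor lookup, the step size and the synthetic tensor field are all assumptions made for this example.

```python
import numpy as np

def principal_direction(tensor_field, point, previous=None):
    """Major eigenvector of the tensor at the voxel nearest to `point`."""
    upper = np.array(tensor_field.shape[:3]) - 1
    idx = tuple(np.clip(np.rint(point).astype(int), 0, upper))
    _, eigenvectors = np.linalg.eigh(tensor_field[idx])
    direction = eigenvectors[:, -1]  # eigh returns eigenvalues in ascending order
    # An eigenvector has no preferred sign; keep the one aligned with the last step.
    if previous is not None and np.dot(direction, previous) < 0:
        direction = -direction
    return direction

def rk4_step(tensor_field, point, previous, h):
    """One 4th-order Runge-Kutta step along the principal diffusion direction."""
    k1 = principal_direction(tensor_field, point, previous)
    k2 = principal_direction(tensor_field, point + 0.5 * h * k1, k1)
    k3 = principal_direction(tensor_field, point + 0.5 * h * k2, k2)
    k4 = principal_direction(tensor_field, point + h * k3, k3)
    step = (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return point + h * step, step / np.linalg.norm(step)

def track_fiber(tensor_field, seed, initial_direction, h=0.5, n_steps=20):
    """Integrate from `seed` until the volume boundary or `n_steps` is reached."""
    point = np.asarray(seed, dtype=float)
    direction = np.asarray(initial_direction, dtype=float)
    upper = np.array(tensor_field.shape[:3]) - 1
    path = [point.copy()]
    for _ in range(n_steps):
        point, direction = rk4_step(tensor_field, point, direction, h)
        if np.any(point < 0) or np.any(point > upper):
            break
        path.append(point.copy())
    return np.array(path)

# Synthetic 16x4x4 volume (assumption): every tensor is elongated along x.
field = np.tile(np.diag([3.0, 1.0, 1.0]), (16, 4, 4, 1, 1))
path = track_fiber(field, seed=(1.0, 2.0, 2.0), initial_direction=(1.0, 0.0, 0.0))
```

In this toy field the track is a straight line along x. A practical tracker would also interpolate the tensors and stop in regions of low anisotropy, which is where the cumulative errors and partial volume problems mentioned above arise.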
Technical Description
A basic DTI analysis and visualization application developed under VAVframe
will be provided by VAVlab (www.vavlab.ee.boun.edu.tr),
Bogazici University, so the principal C++ classes will already have
been implemented. The participants will initially be given a tutorial/introduction
on this package, the coding conventions and the documentation procedures
that must be followed. The infrastructure is based on C++ classes implemented
under Linux (Fedora Core 6) with the ITK and VTK libraries. The participants
will either be asked to study and implement an algorithm assigned to them,
or they may choose to implement an algorithm of their own
(such as their own research results). The details will be set once the
participating labs are known. The deliverable will be a DTI application
usable in a clinical environment.
Workpackages
WP1: (now – June 2007) Implementation of the basic DTI application
at VAVlab.
WP2: (Weeks 1-2) Implementation of SIMILAR Tensor Standard based
I/O functions
WP3: (Weeks 1-2) Implementation of tensor visualization routines
WP4: (Weeks 1-2) Implementation of analysis routines (Tractography
/ Connectivity / Tensor Registration). Details will be set after the participating
labs are known.
WP5: (Week 3) Integration
WP6: (Week 4) Documentation and Demo
Preferred skills: C++ programming under Linux, familiarity with
VTK and ITK, Signal Processing / Computer Graphics background
References
[1] C.R. Tench, P.S. Morgan, M. Wilson, and L.D. Blumhardt, “White matter
mapping using diffusion tensor MRI,” Magnetic Resonance in Medicine, vol.
47, pp. 967–972, 2002.
[2] M.A. Koch, D.G. Norris, and M. Hund-Georgiadis, “An investigation
of functional and anatomical connectivity using magnetic resonance imaging,”
Neuroimage, vol. 16, pp. 241–250, 2002.
[3] P. Hagmann, J.P. Thiran, P. Vandergheynst, S. Clarke, and R. Meuli,
“Statistical fiber tracking on DT-MRI data as a potential tool for morphological
brain studies,” ISMRM Workshop on Diffusion MRI: Biophysical Issues,
2000.
[4] M.K. Chung, M. Lazar, A.L. Alexander, Y. Lu, and R. Davidson, “Probabilistic
connectivity measure in diffusion tensor imaging via anisotropic kernel
smoothing,” Tech. Rep. 1081, University of Wisconsin, 2003.
[5] C. Lenglet, R. Deriche, and O. Faugeras, “Diffusion tensor magnetic
resonance imaging: Brain connectivity mapping,” Tech. Rep. 4983, INRIA,
France, 2003.
Team:
Burak Acar (acarbu boun.edu.tr): LEADER/SENIOR Professor, BUMM / VAVlab, Bogaziçi University
Roland Bammer (rbammer stanford.edu): LEADER/SENIOR Professor, Stanford University
Marcos Martin Fernandez (marcma tel.uva.es): LEADER/SENIOR Professor, University of Valladolid, Spain
Suzan Uskudarli (suzan.uskudarli boun.edu.tr): LEADER/SENIOR Professor, BUMM / VAVlab, Bogaziçi University
Ali Vahit Sahiner (alivahit.sahiner boun.edu.tr): LEADER/SENIOR Professor, BUMM / VAVlab, Bogaziçi University
Deniz Diktas: BUMM, Bogaziçi University
Sila Girgin (silagirgin gmail.com): MS Student, BUMM / VAVlab, Bogaziçi University
Murat Aksoy (maksoy stanford.edu): PhD Student, Stanford University
DIA Ousmane Amadou (ousamdia gmail.com): MS Student, Ecole Superieure Polytechnique de Dakar
Ioannis Marras (imarras aiia.csd.auth.gr): PhD Student, Artificial Intelligence & Information Analysis lab, Department of Informatics, Aristotle University
Luis Miguel San Jose (lsanjose tel.uva.es): Professor, ETSI Telecomunicación, Valladolid, SPAIN
Emma Munoz-Moreno (emunmor lpi.tel.uva.es): PhD Student, ETSI Telecomunicación, Valladolid, SPAIN
Susana Merino Caviedes (smercav lpi.tel.uva.es): PhD Student, ETSI Telecomunicación, Valladolid, SPAIN
Miguel Angel Martin Fernandez (migmar tel.uva.es): PhD Student, ETSI Telecomunicación, Valladolid, SPAIN
Guldem Kucuk (guldemk istanbul.edu.tr): MS Student, Istanbul University
Neslihan Avcu (avcuneslehan yahoo.com): MS Student, Dokuz Eylül University
Erkin Tekeli (erkin.tekeli boun.edu.tr): PhD Student, BUMM, Vavlab, Bogaziçi University
E-mail list address:
enterfacedti@googlegroups.com
(You may use enterface07all listeci.cmpe.boun.edu.tr to send e-mail to all participants.
If you have any problems, please contact arman.savran boun.edu.tr)
2. Advanced Multimodal Interfaces for Flexible Communications
Project Description :
Using one of the most common IP multi-communication
clients available, we would like to implement the interface of an asymmetric
communication channel. The students will design and integrate the blocks
that permit automatic media translation. This media translation will
allow users to communicate using the media option that suits them best, even if
it is not the best option for their interlocutor (the translator will adapt
the content accordingly).
Goal
This project aims to develop a prototype that exemplifies how multimodal
services will enhance the way we communicate in the future. Using
one of the most common IP communication clients (Skype) as a platform, students
will develop an application whose interface will be plastic in as many modalities
as possible. The designed plasticity will enable adapting the application
not only to a PC and a PDA at the same time, but also to the user's current environment.
Students will develop and/or integrate media translators or adapters that are available
during communication. These media adapters will allow communication to
be asymmetric: any user can choose how
to communicate depending on his or her own status (e.g. can/can’t speak, video
available/unavailable), regardless of the other interlocutor’s choice. Following
the Next Generation Network standards and architecture, students will implement
the media adapters as services. They will develop a client application that
combines those services and offers the best communication option
to all users.
Workpackages
- Study and design of the interface: choice of graphical adaptation,
modality translations, etc.
- Development and integration of the text-to-speech adaptation and the
speech-to-text adaptation.
- Integration of a language translator to be introduced in the text
and speech adaptation service
- Development and integration of a video-speech adaptation and speech-video
adaptation (avatar).
- Programming of the prototype for PC and PDA to simulate how communication
would work.
- User tests
- Report
Deliverables
- Prototype
- Report
- Document with the design of the multimodal adaptation and how the
services should be integrated in a communications network.
Background
- Skype developer zone web site.
- TISPAN (ETSI), in charge of developing the standards for NGN
NGN Release 1. ETSI TR 180 001
NGN generic capabilities and their use to develop services. ETSI TR
181 004
- 3GPP, in charge of standardizing the IMS architecture
IMS Release 6
- IETF, which defines most of the protocols currently used for communications
over IP.
Team:
Ana C. Andrés (ana.c.andresdelvalle accenture.com): LEADER/SENIOR Researcher, Accenture Technology Labs
Allasia Jérôme (jerome.allasia irisa.fr): Researcher, IRISA
Ionut Petre (ipetre ici.ro): Researcher, Research Institute for Informatics ICI Bucharest
Saeed Usman (saeed eurecom.fr): PhD Student, Institute Eurecom
Nicolau Dragos (dragos ici.ro): Researcher, National Research and Development Institute for Informatics-Bucharest, Romania
Dragos Catalin Barbu (dbarbu ici.ro): MS Student, Research Institute for Informatics ICI Bucharest
Radut Valentin (vradut ici.ro): Researcher, Research Institute for Informatics, Bucharest, Romania
Jerome Urbain (jerome.urbain fpms.ac.be): PhD Student, Belgium
E-mail list address:
enterface07p2 listeci.cmpe.boun.edu.tr
3. A Multimodal Framework for the Communication of Disabled
Project Description :
Objective
This project aims to build a multimodal framework that combines visual,
aural and haptic interaction with gesture-speech-text recognition, speech
synthesis and sign language recognition and synthesis, in order to enable
the communication of people exhibiting different kinds of disabilities.
This project will use tools from Project 2 and Project 3 of the eNTERFACE
2006 workshop. Additionally, the project aims to construct a cross-modal
transformation framework that will be able to combine all the modalities
from an individual, recognize the transmitted message, and
translate it into another form that is perceivable by the receiver. The
project will focus on exploiting the correlation between modalities in
order to enhance the information perceivable by an impaired individual
who cannot perceive all incoming modalities. A collaborative VR game and
a multimodal video of news for the hearing impaired will be used as two
application environments.
Background
This project is based on Project 2 and Project 3 of eNTERFACE’06. In
Project 2, “Multimodal tools and interfaces for the intercommunication
between visually impaired and ‘deaf and mute’ people”, the system provided
alternative tools and interfaces to blind and deaf-and-mute persons so
as to enable their intercommunication as well as their interaction with
the computer. That application integrates haptics, audio and visual
output as well as computer vision, sign language analysis and synthesis,
and speech recognition and synthesis, in order to provide an interactive environment
where blind and deaf-and-mute users can collaborate. In Project 3,
“Sign Language Tutoring Tool”, a tutoring tool was developed for interactive
sign language education. Users can watch pre-recorded sign videos,
practice the signs and receive automatic feedback about the quality
of their performance. The application integrates manual and non-manual
sign language recognition and sign synthesis in an interactive and educational
environment for deaf and mute users.
The proposed project will develop/improve tools for multimodal communication:
continuous sign/speech segmentation and recognition, sliding text recognition,
cued speech recognition, and speech and sign synthesis. These tools will
then be used to develop a modality replacement framework. The basic idea
is that a modality, which would not be perceived due to a specific disability,
can be employed to improve the information that is conveyed in the perceivable
modalities and increase the accuracy rates of recognition. The correlations
between modalities will be explored and the framework will be integrated
in two environments:
- A treasure hunting game application that is played jointly by a
blind and a deaf-and-mute user. This involves developing a modality replacement framework
for unconstrained communication between blind and deaf-mute people
and modeling the virtual environment using “smart” objects that
also include information about their possible interaction mechanisms
with the users.
- Speech and text aided sign segmentation on Broadcast News videos
for the hearing impaired. In news for the hearing impaired, the speaker
also signs with her hands as she talks, and
corresponding text is superimposed on the video. The aim is to use the modalities
with less noise (speech and/or text) to segment/detect the modality
with noisy signals (sign): the signs
in the videos are segmented and annotated with the help of either the speech or both the speech and
the text, generating segmented and annotated sign videos to be
used in the Sign Language Tutor application. The annotated sign data
collected in this project will be integrated into the Sign Language
Tutoring Tool and will provide a large number of training signs for the
users.
Technical Description
The goals of the project are the following:
- To study the modalities and their characteristics better perceived
by the disabled users.
- To build and tune an information-theoretic framework on cross modal
transformations especially for the intercommunication between blind
and “deaf and mute” people.
- To develop an efficient mechanism for multimodal replacement through
the communication channels of the terminals and the collaborative virtual
environment.
Speech: For processing speech modality, automatic speech recognition
techniques will be used for speech to text conversion. For the news videos,
since the noise is high and the vocabulary is large, techniques that increase
the utterance retrieval rate [1] must be used. This step will also provide
the start and end frames of each spoken word.
Sliding text: In addition to the speech modality, the sliding text
will be processed with OCR techniques. The information extracted from
this modality is expected to be the same as that from the speech modality. Thus,
the results can be corrected by using both modalities to provide accurate
information for segmentation and annotation.
Sign: By using the information extracted from the speech and
sliding text modalities, the signs will be segmented and annotated. For
this purpose, the segmentation of the spoken words can aid sign segmentation,
and each sign can be annotated with the corresponding spoken word. The annotated signs
will form a new sign database after consistency and clustering analysis.
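The annotation idea above can be sketched as follows: each candidate sign segment is labelled with the spoken word whose frame interval overlaps it most. This is only an illustrative sketch in Python. The `Interval` type, the `min_overlap` threshold and the example frame numbers are assumptions, and a real system would also have to model the time lag between signing and speech.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    start: int       # first frame (inclusive)
    end: int         # last frame (exclusive)
    label: str = ""  # recognized word, empty if unknown

def overlap(a, b):
    """Number of frames shared by two intervals."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))

def annotate_signs(sign_segments, spoken_words, min_overlap=1):
    """Label each sign segment with the best-overlapping spoken word."""
    annotated = []
    for seg in sign_segments:
        best = max(spoken_words, key=lambda w: overlap(seg, w), default=None)
        if best is not None and overlap(seg, best) >= min_overlap:
            annotated.append(Interval(seg.start, seg.end, best.label))
        else:
            # No spoken word overlaps this segment: leave it unlabelled.
            annotated.append(Interval(seg.start, seg.end, ""))
    return annotated

# Hypothetical recognizer output: words with start and end frames.
words = [Interval(0, 12, "weather"), Interval(12, 30, "report"),
         Interval(40, 55, "tomorrow")]
# Hypothetical candidate sign segments found in the video.
signs = [Interval(2, 14), Interval(15, 33), Interval(34, 38)]
labels = [s.label for s in annotate_signs(signs, words)]
print(labels)  # ['weather', 'report', '']
```

Segments that receive no label, like the third one here, are natural candidates for the consistency and clustering analysis mentioned above.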
Cued Speech: Cued speech [2] is a specific gestural language
(different from sign language) used for communication between hearing
impaired people and other people; it consists of a combination of lip
shapes and gestures. Thus, the transmitted message is carried by three
modalities: audio, lip shapes, and hand shapes. Cued Speech differs from
sign language in that the hand shapes are made near the face and the exact
number and orientation of fingers has to be determined in order to deduce
the correct gesture.
Coupled Hidden Markov Models (CHMM) [3] will be employed to model the
inter-dependencies and the asynchronous nature of the different modalities.
Given the strict demands for real-time processing, the project will be
developed entirely in C++. However, for the feature extraction process
other programming environments (e.g. Matlab) may also be considered as
a reference point.
The target deliverables of the project are
- A modality replacement framework that will be integrated in the treasure
hunting game and hearing impaired news videos so as to allow the unconstrained
(up to a certain degree) communication of blind and deaf-mute people.
- A new sign database formed with the segmented and annotated signs
from the news recordings.
Workpackages
WP1: Pre-workshop preparations: Collection of news videos, preliminary
discussion on the architecture, software tools, etc.
WP2: Design of the architecture of the collaborative environment, the
terminals and the modalities used in each terminal: speech, sign, text,
cued speech
WP3: Definition and tuning of the information-theoretic framework on cross
modal transformations
WP4: Integration and synchronization of the interfaces into the collaborative
virtual environment.
WP5: Integration of the framework with the treasure hunting game:
- Development of the gesture-based interface for the terminal that
the “deaf and mute” persons will use and the speech and haptics-based
interface for the terminal that the visually impaired persons will use.
- Extension of the game-like application so as to employ all novel technologies.
WP6: Integration of the framework with the Hearing Impaired news videos
- Sign segmentation and alignment to provide annotated sign data
- Unsupervised consistency checking and clustering of sign data
- Extension of Sign Language Tutor with the new data
Participant requirements
- Programming experience (preferably C/C++)
- Multimodal signal processing
References
[1] Murat Saraclar and Brian Roark. Utterance classification with discriminative
language modeling. Speech Communication, 48(3-4):276-287, March-April
2006.
[2] P. Duchnowski, D. Lum, J. Krause, M. Sexton, M. Bratakos, and L. Braida,
“Development of Speechreading Supplements Based on Automatic Speech Recognition,”
IEEE Trans. on Biomedical Engineering, vol. 47, no. 4, pp. 487–496, 2000.
[3] T. Kristjansson, B. Frey, and T. Huang, “Event-coupled hidden Markov
models,” IEEE International Conference on Multimedia and Expo, ICME2000,
vol. 1, 2000.
Team:
Dimitrios Tzovaras (Dimitrios.Tzovaras iti.gr): LEADER/SENIOR Professor, Telematics Institute, Centre for Research and Technology Thessaloniki, Greece
Lale Akarun (akarun boun.edu.tr): LEADER/SENIOR Professor, BUMM, Bogaziçi University
Murat Saraçlar (murat.saraclar boun.edu.tr): LEADER/SENIOR Professor, BUMM, Bogaziçi University
Giovanna Varni (giovanna infomus.dist.unige.it): PhD Student, Universite di Genova-DIST-InfoMus Lab
Siddika Parlak (siddika.parlak gmail.com): MS Student, Department of Electrical-Electronic Engineering, Bogazici University
Konstantinos Moustakas (moustak iti.gr): PhD Student, Informatics and Telematics Institute / Centre for Research and Technology Hellas
Byungjun Kwon (byungjun gmail.com): MS Student, Koninklijke Conservatorium
Alexey Karpov (karpov iias.spb.su): Researcher, St. Petersburg Institute for Informatics and Automation
Deniz Kahramaner (dennizk gmail.com): Programmer, Robert College
Marek Hruz (mhruz kky.zcu.cz): PhD Student, Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
Pavel Campr (campr kky.zcu.cz): PhD Student, Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic
Savvas Argyropoulos (savvas iti.gr): PhD Student, Informatics and Telematics Institute
Erinç Dikici (erincdikici yahoo.com): MS Student, Bogazici University, Department of Electrical-Electronic Engineering
Ismail Ari (ismailar boun.edu.tr): MS Student, Bogazici University, Computer Engineering Department
Oya Aran (aranoya boun.edu.tr): PhD Student, Bogazici University
Harun Karabalkan (karabalkan su.sabanciuniv.edu): MS Student, Vision and Pattern Analysis Laboratory, Sabanci University
E-mail list address:
enterface07p3 listeci.cmpe.boun.edu.tr
4. Multimodal Speaker Identity Conversion
Project Description :
Objective
The goal of this project is to perform high quality identity transformation
on audiovisual recordings from a source speaker A to a
target speaker B. Transformation is achieved using both a voice conversion
technique and video morphing (or avatar control). The result will
be a set of audiovisual files of the target speaker B speaking with his/her
own voice and acting as A does. This project follows on from Project 4 of
eNTERFACE’06, Multimodal Character Morphing [1]. The conclusions of last year's
project highlighted various ideas for improving the quality of the multimodal
conversion.
Background
Voice conversion
The audio part of this project is the well-known problem of voice conversion.
From a complete corpus of a voice B and a limited corpus of a voice A,
we want to drive B’s speech using A’s voice. Much work has been
published on this subject in the last few years, most of it
based on the Gaussian Mixture Model [4-8] as introduced by Stylianou
et al. in [2]. Although those algorithms produce voices very
similar to the target, audio quality is still lacking. One important
technological constraint is the availability of parallel data
for the source and target speakers. A recent trend is to use a unit selection
framework for voice conversion, in which case no parallel data is required
[14].
The voice conversion system used in the previous eNTERFACE workshop
is built on a GMM-based mapping function between source and target spectral
envelopes followed by a frame selection algorithm to produce final spectral
envelopes and LP Analysis/Synthesis [1,15].
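To make the GMM-based mapping more concrete, the sketch below applies a mixture-of-linear-regressions conversion function to a single spectral feature vector, F(x) = sum_i P(i|x) [mu_y_i + C_yx_i inv(C_xx_i) (x - mu_x_i)], in the spirit of [2]. It is a Python illustration only and not the eNTERFACE’06 code. The GMM parameters are assumed to be already trained on aligned source/target frames, and the toy two-component values are invented.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def convert_frame(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map a source feature vector x towards the target speaker's space."""
    # Posterior probability of each mixture component given the source frame.
    likelihoods = np.array([w * gaussian_pdf(x, m, c)
                            for w, m, c in zip(weights, mu_x, cov_xx)])
    posteriors = likelihoods / likelihoods.sum()
    # Posterior-weighted sum of per-component linear regressions.
    y = np.zeros_like(mu_y[0])
    for p, mx, my, cxx, cyx in zip(posteriors, mu_x, mu_y, cov_xx, cov_yx):
        y += p * (my + cyx @ np.linalg.solve(cxx, x - mx))
    return y

# Toy two-component model in two dimensions (all values are made up).
weights = np.array([0.5, 0.5])
mu_x = np.array([[0.0, 0.0], [4.0, 4.0]])   # source means
mu_y = np.array([[1.0, 1.0], [5.0, 5.0]])   # target means
cov_xx = np.array([np.eye(2), np.eye(2)])   # source covariances
cov_yx = np.array([0.5 * np.eye(2), 0.5 * np.eye(2)])  # cross-covariances

converted = convert_frame(np.array([0.0, 0.0]), weights, mu_x, mu_y, cov_xx, cov_yx)
```

A source frame that sits on the first source mean is mapped almost exactly onto the first target mean. Because the output is a posterior-weighted average, converted envelopes tend to be over-smoothed, which is one motivation for following the mapping with a frame selection step.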
Video morphing
The video part of this project is exactly the same problem but with
a 2D/3D dataset instead of the 1D speech content. Starting from video
samples (face and shoulders) of A (minutes, hours ?) and some pictures
of B (or a few video samples ?), we want to get the same video but with
B acting instead of A. A 3D model of speaker B will be created [11] and
animated following the face movements of speaker A [10,12,13]. In project
4 of eNTERFACE’06, we just considered animation of an already built avatar,
without having to build an avatar corresponding to the target speaker
face.
Technical Description
During the eNTERFACE’06 workshop, a dedicated audiovisual database was
built (eNTERFACE06_arctic database [16]). In addition, recordings
will be necessary if real video morphing is performed (at least pictures
of the target speaker are needed to build his/her 3D model).
One part of the team will develop the voice conversion software, starting
from last year's eNTERFACE workshop system and/or from other established techniques.
Many suggestions for improving the system built last year are given in [1] and
[15] (F0 mapping, a weighted Euclidean distance or another kind of distance
to compute target and concatenation costs in the frame selection part,
the OLA method with its problem of phase discontinuities, pitch-synchronous
methods). Among other things, the lack of audio quality in voice conversion
is due to the difficulty of correctly separating the source and the vocal
tract in speech. Some attempts to process the vocal tract and the glottal
flow separately have been reported in the literature (e.g. [3]). Another approach
to source/filter separation for speech was presented in [9]. Such
ideas could be incorporated in this project.
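The frame selection step with target and concatenation costs can be sketched as a small dynamic-programming (Viterbi) search. The Python code below is a simplified illustration and not the system of [1,15]. The cost definitions, the weight vector `w`, the `concat_weight` factor and the toy data are assumptions chosen for the example.

```python
import numpy as np

def weighted_distance(a, b, w):
    """Weighted Euclidean distance along the last axis (broadcasts)."""
    return np.sqrt(np.sum(w * (a - b) ** 2, axis=-1))

def select_frames(predicted, database, w, concat_weight=1.0):
    """Choose one database frame per predicted frame with minimal total cost."""
    n_frames, n_cands = len(predicted), len(database)
    # target[t, j]: distance between predicted frame t and database frame j.
    target = weighted_distance(predicted[:, None, :], database[None, :, :], w)
    # concat[i, j]: cost of playing database frame j right after frame i.
    concat = concat_weight * weighted_distance(
        database[:, None, :], database[None, :, :], w)
    cost = target[0].copy()
    back = np.zeros((n_frames, n_cands), dtype=int)
    for t in range(1, n_frames):
        total = cost[:, None] + concat          # total[i, j]: arrive at j from i
        back[t] = np.argmin(total, axis=0)      # best predecessor for each j
        cost = total[back[t], np.arange(n_cands)] + target[t]
    # Trace the cheapest path backwards from the last frame.
    path = [int(np.argmin(cost))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy target-speaker frames and GMM-predicted envelopes (made-up numbers).
database = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
predicted = np.array([[0.1, 0.0], [0.9, 1.1], [5.2, 4.9]])
selection = select_frames(predicted, database, w=np.array([1.0, 1.0]),
                          concat_weight=0.1)
print(selection)  # [0, 1, 2]
```

Changing `w` emphasizes some coefficients over others, which is the role a weighted Euclidean distance would play in both costs. Raising `concat_weight` favours smoother sequences at the expense of closeness to the predicted envelopes.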
The second part of the team will work on the animation of 3D faces.
We will develop models of the lip movements and facial animations that
will be coherent with the speech pronounced and the emotion expressed.
These models will be trained using the database and should allow us to
animate any 3D model of the selected participants.
Both teams will have to work together, particularly on the synchronization
of speech/lips/expressiveness, to ensure that the results can be merged
correctly to produce the converted audiovisual files. More collaboration may
be necessary if multimodal features are used.
The project should result in a set of audiovisual files with a target
speaker (face and voice) saying what the source speaker says and moving
like him/her.
Equipment and software needed: whiteboard, large room with
network, Matlab, C/C++ compilers, eNTERFACE’06 multimodal character morphing
software and database
Workpackages
- Pre-workshop preparation *
- Voice analysis/resynthesis (using one voice database, extracting
10 sentences and re-synthesizing these 10 sentences using the remaining
database) **
- Face analysis/resynthesis (using one person model, using 10 video
sequences and re-synthesizing these 10 video sequences) ***
- Voice conversion (requires database from another speaker) **
- Face movements conversion (requires database from another speaker)
***
- Multimodal conversion *
(*) all the team, (**) 3 persons in speech synthesis, voice conversion,
speech analysis, (***) 2 persons in 3D face modelling, animation and
gesture analysis
References
[1] Dutoit, T., Holzapfel, A., Jottrand, M., Marqués, F., Moinet, A.,
Ofli, F., Stylianou, Y., “Multimodal Speaker Conversion — his master’s
voice. . . and face —“, eNTERFACE workshop 2006
[2] Stylianou, Y., Cappe, O. and Moulines, E., "Continuous probabilistic
transform for voice conversion", IEEE Trans. Speech & Audio processing,
vol. 6,pp 131-142, 1998
[3] Suenderman, D., Bonafonte, A., Ney, H., Hoege, H., "A Study on Residual
Prediction Techniques for Voice Conversion", ICASSP 2005
[4] Ye, H., Young, S., "Perceptually Weighted Linear Transformation for
Voice Conversion", Eurospeech 2003
[5] Ye, H., Young, S., "High Quality Voice Morphing", ICASSP 2004
[6] Ye, H., Young, S., "Voice conversion for unknown speakers", ICSLP
2004
[7] Chen, Y., Chu, M., Chang, E., Liu, J., Liu, R., "Voice Conversion
with Smoothed GMM and MAP Adaptation", Eurospeech 2003
[8] Qin, L., Chen, G., Ling, Z., Dai, L., "An Improved Spectral and Prosodic
Transformation Method in STRAIGHT-based Voice Conversion", ICASSP 2005
[9] Bozkurt, B., Doval, B., d`Alessandro, C., Dutoit, T., "Zeros of Z-Transform
(ZZT) decomposition of speech for source-tract separation", ICSLP 2004
[10] Ezzat, T., Geiger, G., Poggio, T., "Trainable Videorealistic Speech
Animation", Proc. of SIGGRAPH 2002
[11] Blanz, V., Vetter, T., "A Morphable Model for the Synthesis of 3D
Faces", SIGGRAPH 99
[12] Noh, J.Y., Neumann, U., "Expression cloning", SIGGRAPH 2001
[13] Pyun, H., Kim, Y., Chae, W., Woo Kang, H., Yong Shin, S., "An example-based
approach for facial expression cloning", SIGGRAPH 2003
[14] Sünderman, D., Bonafonte, A., Ney, H., Höge, H., "A first step towards
text-independent voice conversion", ICSLP 2004
[15] Dutoit, T., Holzapfel, A., Jottrand, M., Moinet, A., Pérez, J., Stylianou,
Y., "Towards a voice conversion system based on frame selection", to
be published in proceedings of ICASSP 2007
Team:
Thierry Dutoit (thierry.dutoit fpms.ac.be): LEADER/SENIOR Professor, Faculte Polytechnique de Mons (FPMs), BELGIUM
Yannis Stylianou (styliano ics.forth.gr): LEADER/SENIOR Professor, University of Crete, Heraklion
Ferran Marqués (ferran gps.tsc.upc.edu): LEADER/SENIOR Professor, Universitat Politècnica de Catalunya, Spain
Igor Pandzic (igor.pandzic fer.hr): LEADER/SENIOR Professor, Department of Telecommunications, University of Zagreb, Croatia
Murat Saraçlar (murat.saraclar boun.edu.tr): LEADER/SENIOR Professor, BUMM, Bogaziçi University
Maria Markaki (mmarkaki csd.uoc.gr): PhD Student, University of Crete
Kristina Stankovic (kristina.stankovic fer.hr): BS Student, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
Jottrand Matthieu (matthieu.jottrand fpms.ac.be): PhD Student, Faculté Polytechnique de Mons (FPMs)
Thanasis Krontiris (TDK.krontir gmail.com): BS Student, Computer Science Department, University of Crete
Maria Astrinaki (Astrinaki.Maria gmail.com): BS Student, Computer Science Department, University of Crete
Zara Aurélie (aurelie.zara orange-ftgroup.com): PhD Student, France Télécom R&D / LIMSI-CNRS
Elias Apostolopoulos (ilapost csd.uoc.gr): BS Student, University of Crete, Computer Science Department
Zeynep Inanoglu (zeynep gatesscholar.org): PhD Student, University of Cambridge
E-mail list address:
enterface07p4 listeci.cmpe.boun.edu.tr
5. Audio-Driven Human Body Motion Analysis and Synthesis
Project Description :
Objective
This project is on multicamera audio-driven human body motion analysis
towards automatic and realistic audio-driven avatar synthesis. We plan
to address this problem in the context of a dance performance, where the
gestures or the movements of a human actor are mainly driven by a musical
piece. We will analyze the relations between the audio (music) and the
body movements on a training video sequence acquired during the performance
of a dancer. The joint analysis will provide us with a correlation model
that can be used to animate a dancing avatar when driven with any musical
piece of the same genre.
Background
There exists almost no prior research work reported on the problem of
audio-driven human body motion analysis and synthesis. The most relevant
literature is on speechdriven lip animation [1]. Since lip movement is
physiologically tightly coupled with acoustic speech, it is relatively
an easy task to find a mapping between the phonemes of speech and the
visemes of lip movement. Many schemes exist to find such audio-tovisual
mappings among which the HMM (Hidden Markov Model)-based techniques are
the most common as they yield smooth animations exploiting temporal dynamics
of speech. Some of these works also incorporate synthesis of facial expressions
along with the lip movements to make animated faces look more natural
[2-4]. There are several challenges involved in audio-driven human body
motion analysis and synthesis: First, there does not exist a well-established
set of elementary audio and motion patterns, unlike phonemes and visemes
in speech articulation. Second, body motion patterns (e.g. dance figures)
are person dependent and open to interpretation, and may exhibit variations
in time even for the same person. Third, audio and body motion are not
physiologically coupled, and the synchrony between them may vary.
Moreover, motion patterns may span time intervals of different lengths
than their audio counterparts. A very recent work [5] addresses
challenges similar to those mentioned above in the context of prosody-driven
head gesture synthesis, using a multi-stream parallel HMM structure to
find the jointly recurring gesture-prosody patterns and the corresponding
audio-to-visual mapping. We think that the framework proposed in this
work can also be applied to our problem.
Technical Description
The whole analysis-synthesis system will consist of four main tasks,
explained briefly below:
Body motion capture and feature extraction: This involves automated capture
of body motion from multiview video recorded by a multicamera system (available
in MVGL Lab at Koç University). The motion capture process will be based
on 3D tracking of the markers attached to the person’s body in the scene.
We will fit a generic 3D skeleton model to detect and track markers. We
will make use of the multistereo correspondence information from multiple
cameras to obtain 3D positions of the markers. This task will provide
us with a set of features, 3D point locations over time, that express
the arrangement of the markers in the 3D world. All the executables related
to the body motion capture will be developed in C/C++.
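To illustrate how a 3D marker position can be recovered from several views, the following is a minimal sketch of two-view triangulation by the midpoint method. It assumes that each calibrated camera has already been reduced to a centre and a viewing ray through the detected marker; the function name `triangulate_midpoint` and the coordinates in the usage lines are hypothetical, and the project itself plans C/C++ executables rather than this Python illustration.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Estimate a 3D point from two viewing rays.

    c1, c2 are camera centres; d1, d2 are ray directions (any non-zero length).
    The estimate is the midpoint of the shortest segment joining the two rays.
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are (nearly) parallel")
    t1 = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t2 = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = _add(c1, _scale(d1, t1))
    p2 = _add(c2, _scale(d2, t2))
    return _scale(_add(p1, p2), 0.5)


# Hypothetical usage: two cameras whose rays both pass through (1, 2, 3).
marker = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 2.0, 3.0),
                              (4.0, 0.0, 0.0), (-3.0, 2.0, 3.0))
```

With more than two cameras, a least-squares intersection over all rays would be the natural extension; the two-ray case is shown only because it fits in a few lines.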
Audio feature extraction: An appropriate set of features will
be extracted from the audio signal that is synchronized with the body
motion parameters. The mel frequency cepstral coefficients (MFCC) along
with additional prosodic features can be considered as audio features.
Audio feature extraction will be performed using the well-known HTK toolkit.
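As a small illustration of the mel scale underlying MFCC features, the sketch below converts between hertz and mel using the common formula mel(f) = 2595 log10(1 + f/700) and places filterbank centre frequencies evenly on the mel axis. The function names, the number of filters, and the frequency range are assumptions chosen for the example; the actual extraction in the project is to be done with HTK, not with this code.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Centre frequencies (Hz) of n_filters filters spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical configuration: 26 filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(26, 0.0, 8000.0)
```

The centres are dense at low frequencies and sparse at high frequencies, which is the property that makes the cepstral coefficients derived from such a filterbank perceptually motivated.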
Multimodal analysis: The feature sets resulting from body motion
and audio will jointly be analyzed to model the correlation between audio
patterns and body motion patterns. For this purpose, we plan to use a
two-step HMM-based unsupervised analysis framework as proposed in [5].
In the first step, the audio and motion features will be analyzed separately
by a parallel HMM structure to learn and model the elementary patterns
for a particular performer. A multi-stream parallel HMM structure will
then be employed to find the jointly recurring audio-motion patterns and
the corresponding audio-to-visual mapping. All the simulations in this
second step will be implemented using the HTK Toolkit.
Synthesis and animation: The body motion synthesis system will
take an audio signal as an input and produce a sequence of body motion
features, which are correlated with the input audio. The synthesis will
be based on the HMM-based audio-body motion correlation model derived
from the multimodal analysis. The synthesized body motion will then be
animated on an avatar.
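To make the HMM-based mapping from audio to motion more concrete, here is a minimal sketch of Viterbi decoding for a single discrete HMM, where hidden states stand for motion patterns and observations stand for quantized audio patterns. It is not the multi-stream parallel HMM structure of [5]; the state names, observation symbols, and probabilities are invented for illustration, and all probabilities are assumed to be non-zero so that logarithms are defined.

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[str],
            states: Sequence[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the most likely hidden-state sequence for the observations.

    Works in the log domain; every probability used must be strictly positive.
    """
    # Log-score of the best path ending in each state after the first observation.
    prev = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers: List[Dict[str, str]] = []
    for o in obs[1:]:
        cur: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Best predecessor state for reaching s at this time step.
            best = max(states, key=lambda r: prev[r] + math.log(trans_p[r][s]))
            cur[s] = prev[best] + math.log(trans_p[best][s]) + math.log(emit_p[s][o])
            ptr[s] = best
        backpointers.append(ptr)
        prev = cur
    # Trace back from the best final state.
    path = [max(states, key=lambda s: prev[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two motion patterns driven by two audio patterns.
STATES = ["step", "spin"]
START = {"step": 0.5, "spin": 0.5}
TRANS = {"step": {"step": 0.8, "spin": 0.2}, "spin": {"step": 0.2, "spin": 0.8}}
EMIT = {"step": {"beat": 0.9, "sustain": 0.1}, "spin": {"beat": 0.1, "sustain": 0.9}}
motion = viterbi(["beat", "beat", "sustain", "sustain"], STATES, START, TRANS, EMIT)
```

In a synthesis setting, the decoded state sequence would then select or generate the body motion features associated with each state.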
Workpackages
- Acquisition of the calibrated multicamera audiovisual data (prior
to workshop)
- Tracking and capturing skeleton-based body motion parameters (week
1)
- Extraction of body motion features (week 1)
- Extraction of audio features (week 1)
- Individual analysis and temporal clustering of recurrent body motion
and audio patterns (week 2)
- Joint audio-body motion analysis and correlation modeling of concurrent
audio-body motion patterns (week 3)
- Audio-driven body motion synthesis and animation (weeks 3 and 4)
Benefits of the research
In this research work, we will first develop an automated human body
motion capture system based solely on image processing and computer vision
tools using standard digital video cameras. Second, we will provide a framework
for joint analysis of loosely correlated modalities such as motion and
audio and demonstrate how this framework can be used for audio-driven
motion synthesis.
Deliverables
- A multicamera motion capture system software
- Executables/Scripts for multimodal analysis
- Report and demonstration
References
[1] T. Chen, “Audiovisual speech processing,” IEEE Signal Processing
Mag., Vol. 18, pp. 9–21, 2001.
[2] C. Bregler, M. Covell, and M. Slaney, “Video rewrite: Driving visual
speech with audio,” Proc. ACM SIGGRAPH ’97, pp. 353–360, 1997.
[3] M. Brand, “Voice puppetry,” Proc. of the 26th annual conference on
Computer graphics and interactive techniques, pp. 21–28, 1999.
[4] Y. Li and H.-Y. Shum, “Learning dynamic audio-visual mapping with
input output hidden Markov models,” IEEE Trans. on Multimedia, vol. 8,
no. 3, pp. 542–549, 2006.
[5] M.E. Sargin, E. Erzin, Y. Yemez, A.M. Tekalp, A.T. Erdem, C. Erdem,
and M. Ozkan, “Prosody-driven head-gesture animation,” accepted for publication
in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing:
ICASSP 2007.
[6] J. K. Aggarwal and Q. Cai, “Human motion analysis: A review,” Computer
Vision and Image Understanding: CVIU, vol. 73, no. 3, pp. 428–440, 1999.
[7] S. Yonemoto, A. Matsumoto, D. Arita, and R.-I. Taniguchi, “A real-time
motion capture system with multiple camera fusion,” Proc. IEEE Int. Conf.
on Image Analysis and Processing: ICIAP, 1999, pp. 600–605. |
Ferda Ofli
fofli ku.edu.tr
|
LEADER/SENIOR MS Student |
Koc University |
|
Lale Akarun
akarun boun.edu.tr
|
LEADER/SENIOR Professor |
BUMM, Bogaziçi University |
|
Tanju Erdem
terdem momentum-dmt.com
|
LEADER/SENIOR Researcher |
Momentum Technologies |
|
Murat Tekalp
mtekalp ku.edu.tr
|
LEADER/SENIOR Professor |
Koc University |
|
Engin Erzin
eerzin alm.ku.edu.tr
|
LEADER/SENIOR Professor |
Koc University |
|
Yücel Yemez
yyemez ku.edu.tr
|
LEADER/SENIOR Professor |
Koc University |
|
Yasemin Demir
ydemir ku.edu.tr
|
MS Student |
Koc University |
|
Elif Bozkurt
ebozkurt momentum-dmt.com
|
Researcher |
Momentum Digital Media Technologies |
|
Cristian Canton-Ferrer
ccanton gps.tsc.upc.edu
|
PhD Student |
Technical University of Catalonia |
|
Tilmanne Joelle
joelle.tilmanne fpms.ac.be
|
PhD Student |
Faculté Polytechnique de Mons |
|
Idil Kizoglu
idilkizoglu yahoo.com
|
BS Student |
Bogazici University |
|
Koray Balci
koraybalci gmail.com
|
PhD Student |
Bogazici University |
|
|
|
E-mail list address:
enterface07p5 listeci.cmpe.boun.edu.tr
(You may use enterface07all listeci.cmpe.boun.edu.tr to send e-mail to all participants.
If you have any problems, please contact arman.savran boun.edu.tr)
|
6 |
Event Recognition for Meaningful Human-Computer Interaction in a Smart Environment
|
|
Project Description :
Objective
The localization and recognition of spatio-temporal events are problems
of great theoretical and practical interest. Two specific scenarios are
nowadays of special interest: home environment and smart rooms. In these
scenarios, context awareness is based on technologies like gesture and
motion segmentation, unsupervised learning of human actions, determination
of the focus of attention or intelligent allocation of computational resources
to different modalities. All these technologies pose interesting and difficult
research questions. In home environments it is interesting to use low-cost
sensor equipment connected to a home computer for activity monitoring
and helping the user in various settings. In smart rooms, more sophisticated
equipment is usually used, allowing for more robust applications. We would
like to explore interconnected issues in several simple but real scenarios,
where different sensors monitor an environment and their input is used
for intention sensing, authentication and analysis of interactions of
individuals.
Background
The project aims at two goals: a) implement a small-scale, hierarchical
biometrics system using audio, vision, and RFID modalities and b) analyze
the different interactions and focus of attention of individuals doing
a task. For the proposed scenario, the computer controls a door. The first
aim of the system is to recognize persons authorized to open the door.
For this purpose, the visual (face) and audio (voice) inputs are matched
against a small database. The project involves the fusion of both modalities.
In a typical scenario, the users will move about in front of the door.
The system will recognize the movement behaviours that lead to attempts
to open the door, before activating the authentication sequence. These
behaviours can be hardcoded, or learned in an unsupervised manner.
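The fusion of the face and voice modalities could, for example, be done at the score level. The sketch below is one simple possibility, a weighted sum of per-identity match scores followed by a threshold; it assumes both matchers already output similarity scores in [0, 1], and the names, weights, and threshold are hypothetical rather than taken from any of the partners' systems.

```python
from typing import Dict, Optional, Tuple


def fuse_and_decide(face: Dict[str, float],
                    voice: Dict[str, float],
                    w_face: float = 0.5,
                    threshold: float = 0.7) -> Tuple[Optional[str], float]:
    """Fuse per-identity scores and decide whether to open the door.

    face and voice map each enrolled identity to a similarity score in [0, 1].
    Returns (identity, fused score) if the best fused score reaches the
    threshold, otherwise (None, best fused score).
    """
    fused = {name: w_face * face[name] + (1.0 - w_face) * voice[name]
             for name in face}
    best = max(fused, key=fused.get)
    if fused[best] >= threshold:
        return best, fused[best]
    return None, fused[best]


# Hypothetical scores against a three-person database.
face_scores = {"alice": 0.9, "bob": 0.4, "carol": 0.1}
voice_scores = {"alice": 0.6, "bob": 0.7, "carol": 0.2}
identity, score = fuse_and_decide(face_scores, voice_scores)
```

Rank-based or decision-level schemes are alternatives to this weighted sum; the choice depends on how comparable the raw scores of the two matchers are.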
Different sensor settings can be used: In the first one, two low-cost
cameras, and a microphone are primarily employed to collect input. RFID
tags will be attached to objects for additional sensing capabilities.
The sensors feed their input to a computer that monitors the scene and
controls various environmental parameters. In the second setting, a similar
scenario is analyzed using more sophisticated sensor equipment. Recordings
from a smart room, using multiple cameras and microphones will be provided.
Algorithms can be tested in both scenarios to perform comparative analysis.
In a multi-sensory environment, supported with embedded computer technology,
the system can capture and interpret what the users are doing and assist
or collaborate with the users in real-time. Such an environment should
be aware of users’ intentions, tasks and feelings, and allow people to
interact with the environment in a natural way: by moving, pointing and
gesturing (Tangelder et al., 2005). The proposed team has experience in
modeling such an environment (Tangelder et al., 2005), in gesture recognition
(Aran and Akarun, 2006, Canton-Ferrer et al. 2005), in resource conscious
face recognition (Salah et al. 2002), and multimodal fusion (Gökberk et
al., 2005, Luque et al., 2006).
There are many possible applications. Tracking babies, kids, or elderly
people for particular events, intrusion detection, and gesture- or speech-based
control of environmental parameters (e.g. lights, audio volume of
the TV set, etc.) can be implemented. The aim of the project is to implement
the tools as black-box modules that would allow straightforward application
to flexible scenarios.
Technical Description
- The database: The database will partly be collected prior to the Workshop
to facilitate the implementation of the separate modules.
- Attention deployment: Monitoring of several modalities for "events",
defined or learned during the training phase. Coarse-to-fine processing
for events is proposed.
- Gesture and pose analysis: A previously implemented HMM-based gesture
recognition module will be adapted.
- Face detection: The Viola-Jones face detector will be employed.
- Multimodal person ID: A face recognition system is currently being
developed at BU. UPC also has face and speech recognition systems, and
a multimodal fusion scheme. Partners with experience in these areas
may contribute their own systems.
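The Viola-Jones detector mentioned above owes much of its speed to the integral image, which lets the sum of any rectangle be read with four lookups. The following is a minimal sketch of that data structure only, not of the full detector; the function names and the tiny test image are assumptions for illustration.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Sequence[Sequence[int]], top: int, left: int,
             height: int, width: int) -> int:
    """Sum of the pixels in the given rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii: Sequence[Sequence[int]], top: int, left: int,
                     height: int, width: int) -> int:
    """Haar-like feature: left half minus right half (width must be even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


# Hypothetical 3x4 grey-level patch.
patch = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
table = integral_image(patch)
```

Because each feature costs a constant number of lookups regardless of its size, thousands of such features can be evaluated per detection window.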
Workpackages
- WP-1: Data collection and proposal of technological modules and software
platform (pre-workshop). Additional data collection during the workshop
- WP-2: Person Identification. Face and speaker ID. Multimodal fusion.
- WP-3: Gesture & pose analysis.
References
- Aran, O., L. Akarun, "Recognizing Two Handed Gestures with Generative,
Discriminative and Ensemble Methods via Fisher Kernels", Int. Workshop
on Multimedia Content Representation, Classification and Security, 2006.
- Gökberk, B., A.A. Salah, L. Akarun, "Rank-based Decision Fusion
for 3D Shape-based Face Recognition," Int. Conf. Audio- and Video-Based
Biometric Person Authentication, LNCS 3546 pp.1019-1028, Springer Verlag,
2005.
- Salah, A.A., E. Alpaydın, L. Akarun, "A Selective Attention Based
Method for Visual Pattern Recognition with Application to Handwritten
Digit Recognition and Face Recognition," IEEE Trans. Pattern Analysis
and Machine Intelligence, Vol.24, No.3, pp. 420-425, 2002.
- Tangelder, J.W.H., Ben A.M. Schouten, Stefan Bonchev, "A Multi-Sensor
Architecture for Human-Centered Smart Environments," Proceedings
CAID&CD 2005 Conference.
- Tangelder, J.W.H., Ben A.M. Schouten, "Sparse face representations
for face recognition in smart environments" International Conference
on Pattern Recognition (ICPR 2006), Hong Kong, August 20-24, 2006.
- C. Canton-Ferrer, J. R. Casas, M. Pardàs. “Human Model and Motion
Based 3D Action Recognition in Multiple View Scenarios”. European Signal
Processing Conference (EUSIPCO) 2006.
- J. Luque, R. Morros, A. Garde, J. Anguita, M. Farrus, D. Macho, F.
Marqués, C. Martínez, V. Vilaplana, J. Hernando. Audio, Video and Multimodal
Person Identification in a Smart Room. CLEAR 2006, Lecture Notes in Computer
Science, Springer-Verlag, Berlin Heidelberg, 2007.
- A. Abad, C. Canton-Ferrer, C. Segura, J.L. Landabaso, D. Macho, J.R.Casas,
J. Hernando, M. Pardàs, C. Nadeu. UPC Audio, Video and Multimodal Person
Tracking Systems in the CLEAR Evaluation Campaign. CLEAR 2006, Lecture
Notes in Computer Science, Springer-Verlag, Berlin Heidelberg, 2007
- C. Segura, C. Canton-Ferrer, A. Abad, J.R. Casas, J.Hernando. Multimodal
Head Orientation Towards Attention Tracking in SmartRooms. IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu
(USA), April 16-20, 2007.
|
Lale Akarun
akarun boun.edu.tr
|
LEADER/SENIOR Professor |
BUMM, Bogaziçi University |
|
Ben Schouten
bens cwi.nl
|
LEADER/SENIOR Professor |
CWI, Amsterdam |
|
Ramon Morros
morros gps.tsc.upc.edu
|
LEADER/SENIOR Professor |
UPC |
|
Albert Ali Salah
salah boun.edu.tr
|
LEADER/SENIOR Researcher |
CWI, Amsterdam |
|
Cem Keskin
keskinc cmpe.boun.edu.tr
|
PhD Student |
Bogazici University |
|
Onkar Ambekar
onkar.ambekar cwi.nl
|
PhD Student |
Centrum voor Wiskunde en Informatica(CWI) |
|
Jordi Luque Serrano
luque tsc.upc.edu
|
PhD Student |
Technical University of Catalonia |
|
Carlos Segura Perales
csegura gps.tsc.upc.edu
|
PhD Student |
Technical University of Catalonia |
|
Ceren Kayalar
ckayalar su.sabanciuniv.edu
|
PhD Student |
Computer Graphics Laboratory (CGLAB) / Sabanci University |
|
|
|
E-mail list address:
enterface07p6 listeci.cmpe.boun.edu.tr
(You may use enterface07all listeci.cmpe.boun.edu.tr to send e-mail to all participants.
If you have any problems, please contact arman.savran boun.edu.tr)
|
7 |
3D Face Recognition Performance under Adversarial Conditions
|
|
Project Description :
The purpose of this project is to develop a
3D face biometric interface that is robust to unintentional and malicious
subject behaviors that compromise identification reliability.
A cooperative subject holds his or her face still in front of
the scanner, keeps a frontal pose, and avoids extreme expressions and any
occluding material. However, a subject who is aware of 3D person identification
cameras may try to avoid being recognized by posing awkwardly or, worse
still, by resorting to occlusions such as dangling hair, eyeglasses, facial
hair, etc. In this project, we will model attempts to invalidate 3D face
recognition, and any other effort to mislead the system or to assume a
false identity.
To this effect, we will capture 3D face data imitating difficult surveillance
conditions and non-cooperating subjects, trying various realistic but
effective occlusions and poses. We expect to collect a database of about
100 people in various poses, expressions and occlusion conditions using
the Inspeck Mega Capturor II. Using this database, we will test the performance
of 3D face identification algorithms. As a byproduct of the project, we
will also develop a recognition-by-parts algorithm, that is, person recognition
based on partial 3D evidence.
Workpackages:
- Data collection by 3D scanner from approximately 100 people, from
among eNTERFACE participants and BÜ students. If possible, data must
be collected twice with a lapse of time. The protocol will be defined
before the start of the project. The following strategies are envisioned:
- Various poses: Frontal view, various pan/tilt/rotate angles (-90
degrees to +90 degrees in 30-degree steps)
- Various exaggerated expressions: Grin, anger, puffing cheeks,
sulking, cheek wrinkling, eyes closed, random expressions, …
- Various occlusions: Mouth hidden behind a scarf, eyes or other
parts of the face hidden behind hair, eyes hidden by eyeglasses,
moustache and beard. (We plan to provide a false beard and moustache.)
Outputs: 3D data for various conditions listed
above from 100 subjects. (Nearly 20 scans for each subject)
Duration: 2 weeks (Week 1 and Week 2) (at least
half an hour for each subject)
- Preprocessing of scanner output:
- Some available tools will be used for noise removal and hole filling.
The tools and algorithms should be defined before the project starts.
Inputs: Output data from Stage 1, and available algorithms.
Outputs: Cleaned data.
Duration: 2 weeks concurrently with the data collection
effort (Week 1 and Week 2)
- Facial feature localization, 3D face segmentation and integration
of multiple views from 3D scanner outputs. We assume that the face has
been correctly localized:
- 3D fiducial point localization algorithms will be developed possibly
based on the existing BÜ software. These algorithms are already being
tested on well-known databases (FRGC). However, 3D faces collected
during the workshop will constitute a more challenging set.
- Face recognition by parts (say, nose patch, eye patches, cheek patches
etc.) has proven to be more flexible and robust. Hence face segmentation,
possibly based on fiducial points, will be completed.
- Face recognition under unknown look directions
Inputs: project database (obtained after stages 1
and 2), 3D landmarking algorithms
Outputs: Performance of algorithms on project database.
Duration: 2 weeks (Week 3 and Week 4)
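The preprocessing workpackage mentions hole filling without naming a method, since the tools are to be chosen before the project starts. As a purely illustrative stand-in, the sketch below fills missing samples in a depth map with the mean of their valid 4-neighbours, repeating until no hole can be filled; the grid representation (`None` for a hole) and the function name are assumptions, not the project's actual tool.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[Optional[float]]]


def fill_holes(depth: Grid, max_passes: int = 10) -> Grid:
    """Return a copy of depth with holes (None) filled from valid neighbours.

    Each pass replaces every hole that has at least one valid 4-neighbour by
    the mean of those neighbours. Larger holes close from the border inwards.
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for _ in range(max_passes):
        updates: Dict[Tuple[int, int], float] = {}
        for y in range(h):
            for x in range(w):
                if out[y][x] is not None:
                    continue
                nbrs = [out[ny][nx]
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= ny < h and 0 <= nx < w and out[ny][nx] is not None]
                if nbrs:
                    updates[(y, x)] = sum(nbrs) / len(nbrs)
        if not updates:
            break  # nothing left to fill, or no hole touches valid data
        for (y, x), value in updates.items():
            out[y][x] = value
    return out


# Hypothetical 3x3 depth patch with one missing sample in the middle.
scan = [[1.0, 1.0, 1.0],
        [1.0, None, 3.0],
        [1.0, 3.0, 3.0]]
cleaned = fill_holes(scan)
```

Updates are collected per pass and applied afterwards, so the result does not depend on the scan order within a pass.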
|
Ilkay Ulusoy
ilkay metu.edu.tr
|
LEADER/SENIOR Professor |
Middle East Technical University |
|
Lale Akarun
akarun boun.edu.tr
|
LEADER/SENIOR Professor |
BUMM, Bogaziçi University |
|
Tevfik Metin Sezgin
metin.sezgin cl.cam.ac.uk
|
LEADER/SENIOR Professor |
University of Cambridge, Computer Laboratory |
|
Bülent Sankur
bulent.sankur boun.edu.tr
|
LEADER/SENIOR Professor |
BUMM, Bogaziçi University |
|
Jana Trojanova
jeskynka.jana seznam.cz
|
PhD Student |
Department of Cybernetics, University of West Bohemia in Pilsen, Czech Republic |
|
Semih Esenlik
semihese yahoo.com
|
BS Student |
Bogazici University |
|
Nesli Bozkurt
e124410 metu.edu.tr
|
MS Student |
Middle East Technical University |
|
Aydin Akyol
akyol su.sabanciuniv.edu
|
PhD Student |
Istanbul Technical University |
|
Oya Çeliktutan
oyaxceliktutan yahoo.com
|
MS Student |
BUMM, Bogaziçi University |
|
Kerem Caliskan
kcaliskan infodif.com
|
PhD Student |
Informatics Institute - Medical Informatics |
|
Arman Savran
arman.savran boun.edu.tr
|
PhD Student |
BUMM, Bogaziçi University |
|
Hamdi Dibeklioglu
hamdi.dibeklioglu cmpe.boun.edu.tr
|
MS Student |
BUMM, Bogaziçi University |
|
Erdem Akagündüz
erdema metu.edu.tr
|
PhD Student |
Middle East Technical University |
|
Cem Demirkir
cemd boun.edu.tr
|
PhD Student |
BUMM, Bogaziçi University |
|
|
|
E-mail list address:
enterface07p7 listeci.cmpe.boun.edu.tr
(You may use enterface07all listeci.cmpe.boun.edu.tr to send e-mail to all participants.
If you have any problems, please contact arman.savran boun.edu.tr)
|
8 |
Audiovisual Content Generation Controlled by Physiological Signals for Clinical and Artistic Applications
|
|
Project Description :
Objective
This project proposes to pursue the research done during the first two
editions of the eNTERFACE workshops on the use of physiological signals (EEG,
EMG, ECG…) to control digital sound and image synthesis processes. Building
on our previous experience, we would like to make a new attempt at
this very exciting project of “biologically-driven musical instruments”.
Furthermore, we aim to broaden the field of our research by investigating
how this kind of brain-computer interface could be helpful in clinical
applications.
Background
On the one hand, advances in science and computing now enable musicians
to perform music using a computer with gestural controllers [1] or sensors
[6]. On the other hand, advances in Brain-Computer Interface (BCI)
research show that basic control through brain activity is possible [7].
Some recent works in BCI have tried to use sound as a way to better
understand brain activity, i.e. to provide an auditory display of
brain activity [4]. In the framework of the first two eNTERFACE workshops
in 2005 and 2006, we followed an approach closer to [5] and tried to build
digital musical instruments controlled by signals produced by the
human body. These experiments were successful: at the end of the workshops
we were able to demonstrate live instruments, with biomusicians interacting
with their digital musical instruments through their EEG and EMG signals
[2][3]. This year, we would like to pursue this work on biologically-driven
musical instruments, especially by investigating a more medically oriented
scenario of biological signal sonification. For a detailed description
of our previous projects, you can refer to the eNTERFACE’05 and ’06 proceedings
available at http://www.enterface.net.
Technical Description
During the first two projects, an important part of our work consisted
of developing a conceptual framework, i.e. the software architecture of
the system, allowing modules (data acquisition, data processing and analysis,
sound synthesis, visualization…) to communicate with one another. This year,
we will take advantage of this adaptive framework and focus on higher-level
aspects of biologically-driven musical interfaces.
- Data acquisition, analysis, fusion and interpretation: Four
types of data will be considered, with associated sensors: electroencephalogram
(EEG), electromyogram (EMG), electro-oculogram (EOG) and electrocardiogram
(ECG) data.
- Sound synthesis and interaction: two strategies of linking
biological signals to multimedia synthesis (sound and visual) will be
followed:
- Paradigm of “physiological data sonification”: here the synthesized
multimedia should be used to highlight some features of physiological
signals, i.e. to transcribe these features into sound or image. This
aims to improve the analysis of physiological signals and might be used
as a tool for computer-aided diagnosis.
- Paradigm of “digital musical instrument”: this approach relies
on more aesthetic considerations and aims to exploit the physiological
activity of a performer to drive the generation of digital music and
paintings in order to create a biologically-driven artistic experience.
- Software: Matlab, EEGLab, MedicalStudio for physiological
signals analysis + realtime sound synthesis software (Max-MSP, Pure
Data, CSound etc…) + image synthesis tools (Jitter, Processing etc…).
One of our objectives this year will be to integrate OpenInterface in
the existing system. OpenInterface is an open-source platform dedicated
to the development of multimodal interactive systems.
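The “physiological data sonification” paradigm can be illustrated with a very simple parameter mapping: each value of a physiological feature is mapped linearly to a pitch. The sketch below is only one possible mapping under stated assumptions; the feature values, the MIDI note range, and the function names are hypothetical, and a real system would send such notes to a synthesis environment such as those listed above.

```python
from typing import List, Sequence


def midi_to_hz(note: float) -> float:
    """Frequency of a MIDI note number in equal temperament (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)


def sonify(values: Sequence[float], low_note: int = 48, high_note: int = 84) -> List[int]:
    """Map each feature value linearly onto a MIDI note range.

    The smallest value maps to low_note and the largest to high_note.
    A constant input maps to the middle of the range.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    notes = []
    for v in values:
        frac = 0.5 if span == 0 else (v - lo) / span
        notes.append(round(low_note + frac * (high_note - low_note)))
    return notes


# Hypothetical feature track, e.g. band power measured once per second.
feature = [0.0, 2.5, 5.0, 10.0, 7.5]
notes = sonify(feature)
frequencies = [midi_to_hz(n) for n in notes]
```

Other mappings (to loudness, timbre, or spatial position) follow the same pattern; pitch is used here because the conversion to frequency is easy to verify.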
Workpackages
- WP0 (Pre-workshop preparation): setup testing and collection of every
type of physiological data (database)
- WP1: Bio-Muse platform
- WP2: Physiological signal analysis
- WP3: “digital musical instrument-oriented” sound synthesis (with visual
feedback)
- WP4: “data sonification-oriented” sound synthesis (with visual feedback)
- WP5: report and demos (live demos and videos)
References
[1] Arfib D., Couturier J.M., Kessous L., Verfaille V., “Mapping strategies
between gesture control parameters and synthesis models parameters using
perceptual spaces”, Organised Sound 7(2), Cambridge University
Press, pp. 135-152
[2] Arslan, B., Brouse, A., Castet, J., Filatriau, J.J., Lehembre, R.,
Noirhomme, Q., Simon, C., “A biologically-driven musical instrument”,
In Proceedings of the 1st summer workshop on multimodal interfaces
(eNTERFACE05), Mons, Belgium, 2005, pp.35-45.
[3] Brouse A., Filatriau J-J., Gaitanis K., Lehembre R., Macq B., Miranda
E., Zenon A., “An instrument of sound and visual creation driven by biological
signals”, In Proc. of the 2nd workshop on multimodal interfaces (eNTERFACE’06),
Dubrovnik, Croatia, 2006.
[4] Hermann T., Meinicke P., Bekel H., Ritter H., “Sonification for EEG
data analysis”, in Proceedings of the 2002 International Conference on
Auditory Display (ICAD02), Kyoto, Japan, 2002.
[5] Miranda E. and Brouse A., “Toward Direct Brain Computer Musical Interfaces”,
Conference on New Interfaces for Musical Expression (NIME05), Vancouver,
Canada, 2005.
[6] Tanaka, A., “Musical performance practice on sensor-based instruments”,
In Trends in Gestural Control of Music, M. M. Wanderley and M.
Battier, eds. IRCAM, pp. 389- 406, 2000.
[7] Wolpaw, J.R., Birbaumer, N., McFarland, D.J.; Pfurtscheller, G, Vaughan,
T.M., “Brain computer interfaces for communication and control”. Clinical
Neurophysiology 113 (2002), 767-791, 2002. |
Benoit Macq
Benoit.Macq UCLouvain.be
|
LEADER/SENIOR Professor |
TELE Lab, UCL Louvain La Neuve |
|
Ben Knapp
b.knapp qub.ac.uk
|
LEADER/SENIOR Professor |
Queens University, Belfast |
|
Lehembre Rémy
lehembre tele.ucl.ac.be
|
LEADER/SENIOR PhD Student |
UCL - Université catholique de Louvain |
|
Filatriau Jean-Julien
filatriau tele.ucl.ac.be
|
LEADER/SENIOR PhD Student |
Université Catholique de Louvain (UCL-TELE), Belgium |
|
Brouse Andrew
brouse tele.ucl.ac.be
|
PhD Student |
TELE Lab, Université Catholique de Louvain, Belgium |
|
Koray Tahiroglu
ktahirog uiah.fi
|
PhD Student |
University of Art and Design Helsinki |
|
Mohammad Soleymani
mohammad.soleymani cui.unige.ch
|
PhD Student |
Computer vision and Multimedia Lab., University of Geneva |
|
Alaattin Sayin
sayina istanbul.edu.tr
|
MS Student |
Istanbul University |
|
Miguel Angel Ortiz Perez
mortizperez01 qub.ac.uk
|
PhD Student |
Sonic Arts Research Centre, Queen's University Belfast |
|
Christian Muehl
cmuehl gmail.com
|
MS Student |
University of Osnabrück |
|
Benovoy Mitchel
benovoym cim.mcgill.ca
|
MS Student |
Centre for Intelligent Machines, McGill University, Montreal, Canada |
|
Christian Frisson
frisson tele.ucl.ac.be
|
PhD Student |
UCL-TELE |
|
Cumhur Erkut
Cumhur.Erkut tkk.fi
|
Researcher |
University of Art and Design Helsinki |
|
Hannah Drayson
hannah.drayson plymouth.ac.uk
|
PhD Student |
University of Plymouth |
|
Thomas Greg Corcoran
thomascorcra gmail.com
|
Researcher |
|
|
Umut Gundogdu
gunumut istanbul.edu.tr
|
MS Student |
Istanbul University Dept. of Electrical Electronical Eng. |
|
|
|
E-mail list address:
enterface07p8 listeci.cmpe.boun.edu.tr
(You may use enterface07all listeci.cmpe.boun.edu.tr to send e-mail to all participants.
If you have any problems, please contact arman.savran boun.edu.tr)
|
9 |
USIMAG Tool: A Software for Real-time Elastography and Tensorial Elastography
|
|
Project Description :
Objective
This project aims at developing a freeware Elastography Analysis and
Visualization application that would serve as a common platform for future
collaboration of the participating research groups and dissemination of
their research outputs to a wider medical audience. The development environment
will be C++.
Background
Changes in tissue stiffness correlate with pathological phenomena that
can aid the diagnosis of several diseases such as breast and prostate
cancer (GARR97,HILT01) or cardiovascular dysfunctions (SCHU05,MAUR05).
Many different approaches try to estimate and image the elastic properties
of tissues, but this is not possible with conventional ultrasound, MRI,
CT or nuclear imaging. There are mechanical ways to estimate the biomechanical
properties of the tissue such as indentation, which is mostly used for
thin layers of tissue ex-vivo (KROU98,SRIN04). Researchers have introduced
new techniques using imaging modalities such as MRI and ultrasound, and
also there are some investigations in the optical field using microscopes
(DUNC01), always imaging the tissue response to some stimulus. These techniques
may be referred to as Elasticity Imaging. A review is found in (PARK05).
Elastography (OPHI91) is one of the ultrasound quasi-static techniques
for imaging the elastic properties of soft tissues, and it is well established
in the literature. There are studies comparing results (DOYL01) which
show that freehand elastography, although it has a lower SNR, has proven
its capability to detect lesions such as breast carcinomas (OTAK03). The
displacement field from which researchers normally obtain the strain is
estimated with different techniques. We will refer to papers such as (SRIN02),
which uses time-domain cross-correlation techniques, or (PESA99), which
uses iterative phase zero estimation, among others. Some researchers visualize
the estimated displacement and strain fields, following the path in (OPHI91);
they focus on the Forward Problem. Others calculate mechanical properties
of the tissue, such as Young's modulus, from the displacement and strain
fields by using the constitutive elasticity equations, solving the so-called
Inverse Problem. In the former, either axial strain or lateral
strain (OPHI91), Poisson's ratio (RIGH04), or shear strain (KONO00) elastograms
are visualized. The Inverse Problem approach deals with visualization of
Young's modulus, the shear modulus (DOYL05) or other related parameters.
A comparative study of these two approaches can be found in (DOYL05).
Nowadays, elastographic software is starting to appear in commercial
systems, with very basic functionality.
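To make the time-domain cross-correlation approach more tangible, the following is a minimal 1D sketch: for each window of the pre-compression signal it searches for the integer lag that best matches the post-compression signal, and then takes the strain as the gradient of the resulting displacement profile. It is a simplification under stated assumptions (a single scan line, integer lags, no sub-sample interpolation or filtering); the function names and the synthetic signals are hypothetical and do not reproduce any of the cited algorithms.

```python
import math
import random
from typing import List, Sequence


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation of two equal-length windows (0.0 if flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0.0 else 0.0


def best_lag(pre: Sequence[float], post: Sequence[float],
             start: int, window: int, max_lag: int) -> int:
    """Integer lag in [-max_lag, max_lag] that best aligns one window."""
    ref = pre[start:start + window]
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        s = start + lag
        if s < 0 or s + window > len(post):
            continue  # candidate window falls outside the signal
        score = ncc(ref, post[s:s + window])
        if score > best_score:
            best, best_score = lag, score
    return best


def displacement_profile(pre: Sequence[float], post: Sequence[float],
                         window: int, step: int, max_lag: int) -> List[int]:
    """Displacement (in samples) estimated at regularly spaced windows."""
    starts = range(0, len(pre) - window + 1, step)
    return [best_lag(pre, post, s, window, max_lag) for s in starts]


def strain_profile(disp: Sequence[float], step: int) -> List[float]:
    """Strain as the finite-difference gradient of the displacement."""
    return [(disp[i + 1] - disp[i]) / step for i in range(len(disp) - 1)]


# Synthetic check: a rigid 3-sample shift gives constant displacement, zero strain.
rng = random.Random(0)
pre_signal = [rng.uniform(-1.0, 1.0) for _ in range(200)]
post_signal = [0.0] * 3 + pre_signal[:-3]
disp = displacement_profile(pre_signal, post_signal, window=32, step=16, max_lag=5)
strain = strain_profile(disp, step=16)
```

A rigid shift is the easiest case to verify; under real compression the displacement varies with depth, and stiffer regions show up as regions of lower strain.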
Technical Description
The target application will have the following functionality:
- Input data: B-mode images or RF signals (pre- and post-compression)
- Computation with different algorithms (Optical flow, cross-correlation,…)
- Filtering tools
- Scalar visualization tools
- Tensor visualization tools
USIMAG Tool will let the physician change filtering and visualization
parameters during real-time elastography, and will be ready for deployment
on different echographic systems. The participating
labs are encouraged to contribute their original methods to the remaining
sets of functions. The code development will be monitored by LPI-UVA and
CTM-ULPGC members for compatibility. USIMAG Tool is based on C++ and on
VTK/ITK functions accessed through a hidden layer, which means that participants
may import their own functions and/or use the VTK/ITK functions. Consequently,
experience in C++ programming and VTK/ITK, together with familiarity with
Linux, is important.
Workpackages
- April: after the team selection, data acquisition will take place so that
a database of elastographic signals is ready for the workshop start.
- Computation, Filtering, and Visualization teams will be formed.
- Each group will be monitored by the project supervisor for compatibility
issues, identifying integration problems.
- Last three days of the workshop will be used for preparing the demo
of the USIMAG-Tool application. (Live demos and videos).
- Final presentation
References
[1] N. Belaid, I. Céspedes, J. Thijssen, and J. Ophir. Lesion detection
in simulated elastographic and echographic images: A psycho-physical study.
Ultrasound in Medicine and Biology, 20:877–891, 1994.
[2] M. M. Doyley, J. C. Bamber, F. Fuechsel, and N. L. Bush. A freehand
elastographic imaging approach for clinical breast imaging: System development
and performance evaluation. Ultrasound in Medicine and Biology,
27:1347–1357, 2001.
[3] M. M. Doyley, S. Srinivasan, S. A. Pendergrass, Z. Wu, and J. Ophir.
Comparative evaluation of strain-based and model-based modulus elastography.
Ultrasound in Medicine and Biology, 31(6):787–802, 2005.
[4] D. D. Duncan and S. J. Kirkpatrick. Processing algorithms for tracking
speckle shifts in optical elastography of biological tissues. Journal of
Biomedical Optics, 6(4):418–426, July 2001.
[5] B. S. Garra, I. Céspedes, J. Ophir, S. Spratt, R. A. Zuurbier, C.
M. Magnant, and M. F. Pennanen. Elastography of breast lesions: initial
clinical results. Radiology, 202:79–86, 1997.
[6] K. M. Hiltawsky, M. Kruger, C. Starke, L. Heuser, H. Ermert, and A.
Jensen. Freehand ultrasound elastography of breast lesions: Clinical results.
Ultrasound Med. Biol., 27:1461–1469, 2001.
[7] E. E. Konofagou and J. Ophir. A new elastographic method for estimation
and imaging of lateral displacements, lateral strains, corrected axial
strains and Poisson’s ratios in tissues. Ultrasound in Medicine and Biology,
24(8):1183–1199, 1998.
Ruben I. Cardenes Almeida | ruben lpi.tel.uva.es | LEADER/SENIOR Professor | ETSI Telecomunicación, Valladolid, SPAIN
Dario Sosa Cabrera | dario ctm.ulpgc.es | LEADER/SENIOR Researcher | Universidad de Las Palmas de Gran Canaria
Javier Gonzalez Fernandez | jgonzalez ctm.ulpgc.es | LEADER/SENIOR Researcher | Universidad de Las Palmas de Gran Canaria
Karl Krissian | krissian dis.ulpgc.es | Researcher | Universidad de Las Palmas de Gran Canaria
Santiago Aja Fernandez | sanaja tel.uva.es | Professor | Universidad de Valladolid
Veronica Garcia Pérez | veronica lpi.tel.uva.es | PhD Student | University of Valladolid
Gonzalo Vegas Sánchez-Ferrero | gvegsan lpi.tel.uva.es | PhD Student | LPI, University of Valladolid (Spain)
Rodrigo de Luis Garcia | rodlui yllera.tel.uva.es | Researcher | ETSI Telecomunicación, Valladolid, SPAIN
E-mail list address:
enterface07p9 listeci.cmpe.boun.edu.tr
(You may use enterface07all listeci.cmpe.boun.edu.tr to send e-mail to all participants.
If you have any problems, please contact arman.savran boun.edu.tr)
10 | Realtime and Accurate Musical Control of Expression in Singing Synthesis (RAMCESS)
Project Description :
Objectives
The first purpose of this project is to continue the development of strategic
entities involved in expressive voice production: glottal signal synthesis,
physical modelling of the vocal tract, interpolation/extrapolation/navigation
mapping schemes, and noise and turbulence modelling. At a second level, the
"dimensionality" of voice quality (meaning the mapping of low-level synthesis
parameters onto perceptual features, e.g. tenseness, breathiness, etc.) will
be discussed and refined, targeting a coherent and unified descriptive framework
for voice timbre modification. Since in 2007 we can take advantage of the work
and experience of both preceding eNTERFACE workshops, we will target the
realization of a full computer-based musical instrument, also considering
gestural control issues (especially through bi-manual manipulation) and
artistic possibilities (e.g. possibly targeting a concert at the end
of the workshop).
Background
Expressivity is nowadays one of the most challenging topics studied by
researchers in speech synthesis. Indeed, recent synthesizers provide
acceptable speech in terms of intelligibility and naturalness, but the need
to improve human/computer interaction leads researchers to develop
more “human”, more expressive systems. Some recent realizations [1] have
shown that an interesting option is to record multiple databases corresponding
to a certain number of “labelled” expressions (e.g. happy, sad, angry,
etc.). At synthesis time, the expression of the virtual speaker is set
by choosing the units from the corresponding database.
Two years ago, during eNTERFACE’05, the group decided to investigate
the opposite option. We postulated that “emotion” in speech is not the
result of switches between labelled expressions but a continuous evolution
of voice features that is strongly correlated with context. This approach
returned to a more acoustic/psychoacoustic description of voice production
mechanisms, for which a large number of theories (e.g. [2,3]) have been
developed in recent years but remain underexploited in voice synthesis,
particularly in the realtime context. Thus, we developed a set of
flexible voice synthesizers “conducted” in realtime by an operator. At
this level, the synthesizers achieved really interesting, but quite rough,
expressive results (expressive accents: efforts, lax/pressed, hoarseness
and noise) [4].
After a lot of inter-workshop work, and a particular focus on gestural
control issues [5], it became clear that such a framework was particularly
efficient for singing synthesis. This approach was confirmed last year,
at eNTERFACE'06, where we focused on a singing synthesis scheme, with
particular constraints related to expressivity dimensions and gestural
control abilities. By the end of the workshop, we had built a large number
of voice quality control modules, such as a glottal signal generator, a
geometrical model of the vocal tract, parameter conversion functions,
interpolators, and implementations of mapping strategies [6]. This new
"library" can now serve as a basis for the development of concrete monophonic
singing prototypes and for discussions around these topics, with easy access
to tests and validation in realtime.
Technical Description
As we reach the eNTERFACE'07 workshop with the above-mentioned background,
the goals evolve to the following:
1. Review, extend and discuss the existing voice quality control library
The preceding eNTERFACE workshops have resulted in a set of realtime voice
quality control modules, including glottal pulse generators (CALM:
Causal/Anticausal Linear Model [7]), a physical model of the vocal tract
(LPC lattice filter), a collection of vector computation objects that implement
the coefficient conversion framework (conversions between filter coefficients,
formant features, reflection coefficients, tube sections, etc.), noise and
turbulence modules, vocal tract shape plotting, and software interfaces for
the usual controllers (tablet, dataglove, joystick, etc.). These modules are
now fully developed as Max/MSP objects, and a Pure Data port is in the pipeline.
2. Review, extend and discuss low-level two-handed mappings and dimensionality
of the voice source
On top of the production modules stand two really important and currently
underexploited mapping strategies: the low-level mapping of two-handed
movements (meaning first-level modifications of the gestural controller
data to make them more appropriate for the current application, e.g. how
to implement an expressive vibrato with an ordinary force-sensing resistor)
and the dimensionality of the voice source (meaning first-level modifications
of the synthesis parameters to better represent the perceptual axes of timbre
variation, e.g. implementing tenseness on top of the open quotient and
asymmetry coefficient). These behaviors will be implemented as Max/MSP
and/or Pure Data patches.
3. Develop a full musical instrument based on a natural singing behavior
Once two-handed movements and voice source perceptual features are correctly
interpreted, dedicated singing mapping strategies can be considered.
On the one hand, this concerns the mechanisms to be implemented in order to
produce natural singing voice timbre variations (e.g. singing formant,
harmonic/formant adaptation, changes in the size of the vocal tract, etc.).
These results will be achieved by discussing the singing voice literature
and analysing expressive singing databases. On the other hand, the performing
abilities of the system have to be optimized in order to make those natural
singing sounds interesting from an artistic point of view. Different instrumental
behaviors (keyboard-based, "fretless" control, conducting movements,
etc.) thus have to be considered and adapted to singing synthesis [8].
This work will also be implemented as Max/MSP and/or Pure Data patches.
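As an illustration of the coefficient conversion framework mentioned in item 1,
the sketch below converts reflection coefficients into direct-form LPC filter
coefficients (step-up recursion), back again (step-down recursion), and into
relative tube section areas. It is written in Python purely for illustration;
the project modules themselves are Max/MSP objects. The function names, the
sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the area-ratio
convention are assumptions, not taken from the existing library.

```python
def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients k[0..p-1] -> [1, a1, ..., ap]."""
    a = [1.0]
    for km in k:
        ext = a + [0.0]          # current polynomial, extended by one order
        rev = ext[::-1]          # same coefficients in reversed order
        a = [x + km * y for x, y in zip(ext, rev)]
    return a


def lpc_to_reflection(a):
    """Step-down recursion: [1, a1, ..., ap] -> reflection coefficients."""
    a = list(a)
    k = []
    while len(a) > 1:
        km = a[-1]               # highest-order coefficient is the reflection coefficient
        if abs(km) >= 1.0:
            raise ValueError("unstable filter: |k| must be < 1")
        k.append(km)
        denom = 1.0 - km * km
        n = len(a) - 1
        a = [(a[i] - km * a[n - i]) / denom for i in range(n)]
    return k[::-1]


def reflection_to_areas(k, first_area=1.0):
    """Relative tube section areas, assuming A[i+1] = A[i] * (1 - k) / (1 + k).

    The opposite sign convention is also common; only the ratios are meaningful.
    """
    areas = [first_area]
    for km in k:
        if abs(km) >= 1.0:
            raise ValueError("|k| must be < 1")
        areas.append(areas[-1] * (1.0 - km) / (1.0 + km))
    return areas


k = [0.5, -0.25]
a = reflection_to_lpc(k)         # [1.0, 0.375, -0.25]
k_back = lpc_to_reflection(a)    # recovers k up to rounding
areas = reflection_to_areas(k)
```

Keeping |k| < 1 at every stage guarantees a stable all-pole filter, which is one
reason why interpolating in the reflection or tube-area domain is usually
preferred over interpolating direct-form coefficients.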
We emphasize that, at the end of the workshop, the group should
provide a complete and usable monophonic musical tool (from interface
to sound production). This synthesizer will be made of publicly available
modules. We will also produce a report, which should ideally contain
some discussion of performance practice, possibly resulting
from real musical sessions.
Workpackages
1. WP1 - Workshop Preparation
These tasks concern work that will be achieved by the whole team before the
workshop: gathering a significant database of singing sounds (meaning that
the singing styles to be considered have to be decided at this step),
discussing the software architecture, and disseminating the existing voice
quality control library.
2. WP2 - Expressive Voice Analysis
This work concerns the processing of the gathered singing sounds
in order to extract usable features or tendencies for expressive voice
implementation. It requires general voice (e.g. speech) analysis expertise,
such as the implementation of offline algorithms for pitch tracking, formant
detection, phase processing, etc. (work in Matlab).
3. WP3 - Expressive Voice Synthesis
These tasks are related to the development of expressive voice production
modules. They mainly require a good understanding of voice (e.g. speech)
synthesis issues, such as source/filter implementations, harmonic/noise
modelling, and dynamics control. A particular focus is also placed on realtime
synthesis issues: latency, interpolability, computational load, and continuity
(e.g. phase, pitch, etc.) problems.
4. WP4 - Gestural Controllers & Low-Level Bi-Manual Mappings
This work concerns the evaluation of different gestural control strategies,
based on the choice of devices but also considering first-level
interpretation of gestural data, in order to imitate (keyboard-like,
string-like, trumpet-like, etc.) or innovate (based on general ergonomic
issues) in the context of efficient control of expressive sound synthesis.
5. WP5 - Dimensionality & Singing Synthesis Behavior
These tasks concern the gathering of various theories related to perceptual
aspects of voice timbre, with here a particular focus on singing voice
timbre description, in order to produce a unified framework for voice
quality control and singing mechanisms implementation. It is based on
an iterative process where various mapping strategies will be implemented
and tested.
6. WP6 - Digital Lutherie & Performing Issues
This last workpackage acts as a constant review of the performing abilities
(from an artistic perspective) of the different systems that will be assembled.
It requires good knowledge of the musical performance context (live
situation, performer constraints, etc.). This work will stand as the
final "summarizing" process where engineering prototypes will
evolve into musical instruments.
References
[1] http://www.loquendo.com
[2] G. Carlsson and J. Sundberg, "Formant Frequency Tuning in Singing",
Journal of Voice, vol. 6, no. 3, pp. 256–60, 1992.
[3] N. Henrich, C. d’Alessandro, M. Castellengo, and B. Doval, "Glottal
Open Quotient in Singing: Measurements and Correlation with Laryngeal
Mechanisms, Vocal Intensity, and Fundamental Frequency", Journal of the
Acoustical Society of America, vol. 117, pp. 1417–1430, March 2005.
[4] C. d’Alessandro, N. D’Alessandro, S. Le Beux, J. Simko, F. Cetin,
and H. Pirker, "The Speech Conductor: Gestural Control of Speech Synthesis",
in Proceedings of eNTERFACE’05 Summer Workshop on Multimodal Interfaces, 2005.
[5] N. D’Alessandro, C. d’Alessandro, S. Le Beux, and B. Doval, "Realtime
CALM Synthesizer, New Approaches in Hands-Controlled Voice Synthesis",
in Proceedings of the 6th International Conference on New Interfaces for
Musical Expression, pp. 266–271, 2006.
[6] N. D’Alessandro, B. Doval, S. Le Beux, P. Woodruff, Y. Fabre,
C. d’Alessandro, and T. Dutoit, "Realtime and Accurate Musical Control of
Expression in Singing Synthesis", Journal on Multimodal User Interfaces,
vol. 1, no. 1, pp. 31–39, March 2007.
[7] B. Doval and C. d’Alessandro, "The Voice Source as a Causal/Anticausal
Linear Filter", in Proceedings of VOQUAL’03, Voice Quality: Functions,
Analysis and Synthesis, ISCA Workshop, August 2003.
[8] N. D’Alessandro and T. Dutoit, "HandSketch Bi-Manual Controller:
Investigation on Expressive Control Issues of an Augmented Tablet",
[to be published in] Proceedings of the 7th International Conference on
New Interfaces for Musical Expression, June 2007.
Nicolas D'Alessandro | nicolas.dalessandro fpms.ac.be | LEADER/SENIOR PhD Student | Faculté Polytechnique de Mons
Moinet Alexis | alexis.moinet fpms.ac.be | PhD Student | FPMs
Holzapfel Andre | hannover csd.uoc.gr | PhD Student | University of Crete
Baris Bozkurt | barisbozkurt iyte.edu.tr | Professor | Izmir Inst. of Tech., Urla/Izmir
Onur Babacan | onurbabacan gmail.com | BS Student | Izmir Inst. of Tech., Urla/Izmir
Dubuisson Thomas | thomas.dubuisson fpms.ac.be | PhD Student | Faculté Polytechnique de Mons (FPMs)
Kessous Loic | kessous post.tau.ac.il | Senior Researcher | Tel Aviv University
Vlieghe Maxime | maxime.vlieghe gmail.com | MS Student | Faculté Polytechnique de Mons
E-mail list address:
enterface07p10 listeci.cmpe.boun.edu.tr
11 | Mobile-phone Based Gesture Recognition
Project Description :
Objective
The simple keypad that is present in many mobile phones is adequate for
placing voice calls but falls short when interacting with other mobile
applications, such as web browsing, image and video browsing, and location
services. Because mobile phones are handheld devices, the user's gestures
can easily be used as additional inputs; for example, moving the phone
to the right could pan to the right on a map shown on the screen.
The goal of the project is to develop fast gesture recognition by analyzing
the motion of the camera phone from its video input. The developed algorithm
can be arbitrarily complex and could even support handwriting with the
handheld device.
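To make the idea of recovering phone motion from video concrete, here is a
minimal sketch of global motion estimation by exhaustive block matching
between two consecutive grayscale frames. It is only one possible approach and
is not the project's algorithm; the function names, the tiny search range, the
frame representation (lists of pixel rows), and the mapping from image shift to
pan direction are all assumptions made for illustration, in Python rather than
the C++ used in the project.

```python
def shift_frame(frame, dx, dy, fill=0):
    """Return a copy of frame whose content is moved by (dx, dy) pixels."""
    h, w = len(frame), len(frame[0])
    return [[frame[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
             for x in range(w)] for y in range(h)]


def estimate_shift(prev, curr, max_shift=2):
    """Find the (dx, dy) content shift minimizing the mean absolute difference."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Compare only the region where both frames overlap for this shift.
            for y in range(max(0, dy), min(h, h + dy)):
                for x in range(max(0, dx), min(w, w + dx)):
                    total += abs(curr[y][x] - prev[y - dy][x - dx])
                    count += 1
            score = total / count
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]


def classify_pan(dx, dy):
    """Map an image content shift to a phone movement (assumed convention:
    the scene appears to move opposite to the phone; image y grows downward)."""
    if dx == 0 and dy == 0:
        return "none"
    if abs(dx) >= abs(dy):
        return "right" if dx < 0 else "left"
    return "down" if dy < 0 else "up"


# Synthetic 8x8 test frame and a copy whose content moved one pixel to the right.
prev = [[x * x + 10 * y for x in range(8)] for y in range(8)]
curr = shift_frame(prev, 1, 0)
shift = estimate_shift(prev, curr)
gesture = classify_pan(*shift)
```

An exhaustive search costs on the order of (2 * max_shift + 1)^2 frame
comparisons, so a practical phone implementation would likely work on
downsampled frames or use a coarse-to-fine search.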
Technical Description
Software:
Hardware:
Project coordinators will provide:
- A tutorial on the development environment, setup, and an introduction
to mobile programming
- Example source code that can perform simple image processing on video
input
- Image and video data, i.e. various gestures captured by a cell phone camera
Schedule:
- Week 1. Getting familiar with the development environment and implementing
basic i/o functions.
- Week 2. Implementing fast motion estimation for gesture recognition
from video stream.
- Week 3. Implementing the image browser user interface and integrating
with gesture recognition.
- Week 4. Improving speed and performance; documentation.
Workpackages
- 1-2 students who are proficient in C++ and knowledgeable in basic
image processing algorithms. Familiarity with C# and Microsoft Visual
C++ or the Visual Studio IDE is a plus.
- A project co-leader is also needed to oversee progress.
References
Berna Erol | berna_erol rii.ricoh.com | LEADER/SENIOR Researcher | Ricoh California Research Center, Menlo Park, CA, USA
Murat Saraçlar | murat.saraclar boun.edu.tr | LEADER/SENIOR Professor | BUMM, Bogaziçi University
Tevfik Metin Sezgin | metin.sezgin cl.cam.ac.uk | LEADER/SENIOR Professor | University of Cambridge, Computer Laboratory
Burcu Barla | burcubarla gmail.com | BS Student | Bogaziçi University Electrical-Electronics Department
Caglayan Dicle | cdicle gmail.com | MS Student | Bogazici University
Ögem Boymul | ogem.boymul boun.edu.tr | BS Student | Bogaziçi University
Baris Bahar | barisbahar86 yahoo.com | BS Student | Bogazici University Computer Engineering
Milos Zelezny | zelezny kky.zcu.cz | Professor | Department of Cybernetics, University of West Bohemia in Pilsen
Candan Herdem | kcandanherdem yahoo.com | MS Student | Gazi University, Ankara, Turkey
Deniz Türdü | denizturdu su.sabanciuniv.edu | MS Student | Vision and Pattern Analysis Laboratory, Sabanci University
E-mail list address:
enterface07p11 listeci.cmpe.boun.edu.tr
12 | Benchmark for Multimodal Biometric Authentication
Project Description :
Objective
This project aims at creating a small benchmark for testing the integration
of existing monomodal robust hashing techniques and feature extraction techniques
into multimodal methods for biometric authentication.
Abstract
It is possible to authenticate individuals by means of robust hashing or by
means of feature extraction functions. For instance, one may take a photo of
the face of a person and robustly hash it in order to obtain a low-dimensional
descriptor (hash). The same can be achieved by means of feature extraction functions.
The identifiers thus obtained can be compared to preexisting ones in a database
for a match. Systems based on multimodal (i.e., joint) robust hashing and feature
extraction strategies combine two or more monomodal functions (for instance,
one method to hash a face image and another method to hash a fingerprint image)
in order to increase security.
Technical Description
The main goal of the project is to create a small GUI-driven benchmark
in order to test multimodal identification strategies. This benchmarking system
will be just a prototype (target deliverable of the project). Despite the necessarily
limited scope of this demo, it should be usable and, if possible, easily extensible.
Extensibility (in order to include new items and features) may not be completely
automatic, to keep the implementation simple. That is, it will not be possible
to do all operations through the GUI, and manual adjustments in the source code
will probably be necessary. A basic scheme showing the relationships between
the different parts of the benchmarking system is shown in Figure 1.
The benchmark will have a database of authenticated individuals with corresponding
input signals and a database of verified identifiers (i.e., hash values, extracted
features) obtained from the authenticated individuals in the database. The proposed
initial database will include 2D face images and short audio clips (around 3
seconds) of the same individuals reading a password. Ten images of every individual
and ten utterances of the password will be included in order to minimize the
effects of intraindividual variability.
An individual entry in the database will resemble the following pattern:
- Name
- Authenticated (Boolean)
- Fake/Attacked (Boolean)
- If affirmative, attack from the library of attacks used (see below)
- List of image files and audio clip files associated
- For each image, audio: list of identifiers generated by each relevant
function in the library of monomodal functions (see below)
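The entry pattern above could be represented roughly as follows. This is a
sketch only, written in Python rather than the Matlab/MySQL setup proposed for
the benchmark; the class name, field names, file names, and the function name
"image_hash_A" are hypothetical and chosen just to mirror the listed fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Individual:
    name: str
    authenticated: bool
    attacked: bool = False
    attack: Optional[str] = None  # name of an attack from the library, if attacked
    image_files: List[str] = field(default_factory=list)
    audio_files: List[str] = field(default_factory=list)
    # identifiers[signal_file][function_name] -> identifier string
    identifiers: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def __post_init__(self):
        # An attack name is only meaningful for fake/attacked entries.
        if self.attack is not None and not self.attacked:
            raise ValueError("attack given for an entry not marked as attacked")

    def store_identifier(self, signal_file, function_name, identifier):
        """Acquisition mode: attach an identifier to one of this entry's signals."""
        if signal_file not in self.image_files + self.audio_files:
            raise KeyError(signal_file)
        self.identifiers.setdefault(signal_file, {})[function_name] = identifier


alice = Individual("Alice", authenticated=True,
                   image_files=["alice_01.png"], audio_files=["alice_01.wav"])
alice.store_identifier("alice_01.png", "image_hash_A", "10110010")
```

Keying the identifiers by signal file and then by function name makes it easy to
see which signals still lack identifiers after a new monomodal function is
added to the library.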
Figure 1: Relationships between the elements of the benchmark
The benchmark should be able to admit new modules in the following libraries
of functions (see Figure 1):
- Library of monomodal hashing and feature extraction methods.
It will include standard monomodal functions. For each function, the following
items will be defined:
- Function-dependent input parameters, including type of signal handled
by the function, thresholds, etc.
- The function must return an identifier string (binary or real-valued,
depending on the function).
- In acquisition mode, the identifier will be stored in the database
associated to the individual whose signal has been employed to derive
the identifier.
- In comparison mode, the function must be able to compare the
identifier obtained to those previously stored in the database during acquisition,
giving a Boolean decision of similarity.
Proposed initial functions:
- Robust image hashing: algorithm A of [1].
- Robust audio hashing: Philips method [2].
Both methods yield binary identifiers.
- Library of attacks. It will list attack functions on the
signals stored in the individuals database (chimeric characters, noise addition,
filtering, etc.). Attacked signals will be used to assess how robust multimodal
methods are to these attacks:
- Is it possible to fool the system into verifying non-authentic
signals?
- Are multimodal methods more robust to noisy versions of the signals?
The results of the attacks may be stored in the database, properly labelled
to note that these are not authenticated signals, or they may be used
on the fly.
Proposed initial attacks:
- Chimeric characters: random combinations of face images and audio clips
from different individuals in the database. Face images can also be morphed
from randomly chosen images in the database. It can be seen as an intentional,
malicious attack.
- Pseudorandom Gaussian noise addition to images and audio from the database.
It can be seen as an unintentional attack.
- Library of multimodal methods. It will list functions which, using the
library of monomodal techniques, specify ways to combine two (or more) monomodal
techniques in order to create multimodal identifiers. For instance, the system
could allow combining a method to robustly hash face images with a method
to extract features from a fingerprint; the newly created method should be
stored in the library as a multimodal function. It is important that each
multimodal function implements an overall comparison function, able to break
ties between the monomodal decisions when trying to match unauthenticated
signals with authenticated signals in the database.
Proposed initial multimodal method: a function combining [1] and [2] into
a multimodal fingerprint. If the monomodal methods give opposite Boolean decisions
in the comparison stage, the overall comparison function will output the decision
with higher percentage of binary matches.
- Library of benchmarking scripts. It will list scripts which may be run in
batch mode (i.e., autonomously), using suitable signals from the database,
one multimodal method, and one attack. Quality measures such as the rates
of detection and false alarm (obtained by comparison with the authentic identifiers)
will be computed during the execution of the script. In the scripts there
may be loops where some attack parameters are generated pseudorandomly. A
resettable Boolean variable will indicate if the script has been run by the
benchmark already.
Proposed initial benchmarking scripts:
- Script 1:
- Pseudorandomly generate a sufficiently large new database of false characters,
using the individuals database and the chimeric attacks.
- Run the proposed multimodal method on the new database.
- Compute and store multimodal and monomodal probabilities of detection
and false alarm, comparing with the authenticated individuals database,
using the methods proposed.
- Script 2:
- Pseudorandomly generate a sufficiently large new database of noisy characters,
using the individuals database and the Gaussian noise attack.
- Run the proposed multimodal method on the new database.
- Compute and store multimodal and monomodal probabilities of detection
and false alarm, comparing with the authenticated individuals database,
using the methods proposed.
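The overall comparison function described for the proposed initial multimodal
method can be sketched as below, assuming binary identifiers stored as strings
of '0' and '1' and a per-modality threshold on the fraction of matching bits.
The function names, the thresholds, and the choice to reject when both
fractions are exactly equal are assumptions; the actual hashing methods [1] and
[2] are not implemented here, and Python is used only for illustration.

```python
def match_fraction(a, b):
    """Fraction of positions where two equal-length binary strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("identifiers must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def monomodal_decision(candidate, reference, threshold):
    """Return (Boolean similarity decision, fraction of matching bits)."""
    frac = match_fraction(candidate, reference)
    return frac >= threshold, frac


def multimodal_decision(image_pair, audio_pair,
                        image_threshold=0.75, audio_threshold=0.75):
    """Combine two monomodal decisions; break ties by the higher match fraction."""
    img_ok, img_frac = monomodal_decision(image_pair[0], image_pair[1], image_threshold)
    aud_ok, aud_frac = monomodal_decision(audio_pair[0], audio_pair[1], audio_threshold)
    if img_ok == aud_ok:
        return img_ok
    if img_frac == aud_frac:
        return False  # assumption: reject when the tie cannot be broken
    # Opposite decisions: follow the modality with more matching bits.
    return img_ok if img_frac > aud_frac else aud_ok


accepted = multimodal_decision(("10110010", "10110011"),   # 7 of 8 bits match
                               ("11110000", "00001111"))   # 0 of 8 bits match
rejected = multimodal_decision(("10110010", "10110011"),   # 0.875 < 0.9: reject
                               ("11110000", "11110011"),   # 0.75 >= 0.5: accept
                               image_threshold=0.9, audio_threshold=0.5)
```

Note that when both modalities share the same threshold, the modality with the
higher match fraction is necessarily the accepting one, so the rule only
produces a rejection on disagreement if the thresholds differ.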
The output file with the data resulting from running a benchmarking script
will be timestamped and included in a database of results. Using these result
files, an output module will be able to produce simple text reports or plots
from the results of running benchmarking scripts. For the proposed scripts,
a text report may include details such as functions and parameters used, number
of iterations, database signals used, and quality measures obtained. An output
plot will show ROC plots (probability of false alarm versus probability of detection)
for different thresholds or noise levels.
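The quality measures behind such a ROC plot can be sketched as follows, assuming
each comparison yields a similarity score (for example, the fraction of
matching hash bits) and that scores are available separately for genuine and
for attacked or impostor signals. The function names and the sample scores are
made up for illustration; plotting is left out.

```python
def rates(genuine_scores, impostor_scores, threshold):
    """Return (probability of false alarm, probability of detection)."""
    p_d = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
    p_fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return p_fa, p_d


def roc_points(genuine_scores, impostor_scores, thresholds):
    """One (threshold, p_fa, p_d) triple per threshold, ready for a text report."""
    return [(t,) + rates(genuine_scores, impostor_scores, t) for t in thresholds]


genuine = [0.9, 0.8, 0.95, 0.6]     # hypothetical scores for authentic signals
impostor = [0.5, 0.55, 0.7, 0.4]    # hypothetical scores for chimeric signals
points = roc_points(genuine, impostor, [0.0, 0.65, 1.0])
```

Each triple gives one point of the curve; sweeping the threshold from 0 to 1
moves the operating point from accepting everything to rejecting everything.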
Benchmark Architecture
In order to speed up development, the GUI will be implemented using
Matlab, and so will the functions. The database will be set up with MySQL,
which can be interfaced with Matlab. The participants may of course suggest and
use alternatives that they feel more at ease with; for instance, it is easy
to interface C/C++ with Matlab through MEX files.
The main GUI window (see Figure 2) will allow browsing and adding/removing elements
from the database and all four libraries. It will have a button for acquisition,
through which the database will be updated: new identifiers will be created
for new individuals in the database using the functions in the library, or existing
individuals will be updated if new functions have been added since the last update.
Another button in the main GUI will run all uncompleted benchmark scripts.

Figure 2: Sketch of main GUI window
Workpackages
- (Pre-workshop preparation) Data collection and database creation.
- (Pre-workshop preparation) Development of a basic GUI and benchmarking
system workflow in Matlab.
- (Week 1) Implementation/adaptation of the monomodal and multimodal identification
methods chosen.
- (Week 1) Implementation/adaptation of the attacks chosen.
- (Week 2) Development and integration of the main benchmarking system blocks.
- (Week 3) Development of output module.
- (Week 3) Debugging and bug fixing.
- (Week 4) Benchmark tests.
- (Week 4) Project report and demo showcasing.
Workforce
The project will be undertaken by at least two people with good programming
skills and some familiarity with signal processing techniques. The participants
are expected to implement and test a basic version of the benchmark during the
workshop, under the direction and with the collaboration of the coordinators.
References
[1] M. K. Mihcak and R. Venkatesan. New iterative geometric methods for robust
perceptual image hashing. In Procs. of ACM Workshop on Security and Privacy
in Digital Rights Management, Philadelphia, USA, 2001.
[2] J. Haitsma, T. Kalker, and J. Oostveen. Robust audio hashing for content
identification. In Procs. of the International Workshop on Content-Based Multimedia
Indexing, pages 117–125, Brescia, Italy, September 2001.
Felix Balado | fiz ihl.ucd.ie | LEADER/SENIOR Professor | University College Dublin
Kivanc Mihcak | kivanc.mihcak boun.edu.tr | LEADER/SENIOR Professor | Bogaziçi University
Neil Hurley | neil.hurley ucd.ie | LEADER/SENIOR | University College Dublin
Morgan Tirel | morgan.tirel etudiant.univ-rennes1.fr | MS Student | University of Rennes, France
Neslihan Gerek | neslihan.gerek gmail.com | MS Student | Bogazici University
Ekin Olcan Sahin | ekin.sahin boun.edu.tr | MS Student | Bogazici University
Guenole Silvestre | guenole ihl.ucd.ie | | University College Dublin
Cliona Roche | cliona.roche ucd.ie | PhD Student | University College Dublin
Sinan Kesici | sinan940 yahoo.com | BS Student | Bogaziçi Uni. Electrical-Electronics Eng.
E-mail list address:
enterface07p12 listeci.cmpe.boun.edu.tr
Openinterface Project
Team
Marcos Serrano | marcos.serrano imag.fr | Researcher | University of Grenoble
Lionel Lawson | jean-yves.lawson uclouvain.be | Researcher | UCL, Université catholique de Louvain
Yann Goffette | yann.goffette student.uclouvain.be | MS Student | UCL-BCHI
Louvigny Henri-Nicolas | henri-nicolas.louvigny student.uclouvain.be | MS Student | UCL-BCHI
Participants come from 20 countries:
Turkey (52), Spain (18), Belgium (14), Greece (11), France (7), Ireland (5),
UK (5), Czech Republic (4), Netherlands (4), Romania (4), USA (4), Italy (3),
Canada (2), Finland (2), Croatia (1), Germany (1), Israel (1), Russia (1),
Senegal (1), Switzerland (1)