(17/2/2003)
Fikret Gürgen, Office: 204, Ext: 1863 (or 1323)
Textbook: Digital Speech Processing by Sadaoki Furui (Marcel Dekker, second edition).
Reserved in the library:
- Discrete-Time Speech Signal Processing, T. F. Quatieri, Prentice Hall (2002).
- Fundamentals of Speech Recognition, Rabiner L., Juang B. H., Prentice Hall (1993).
- Speech and Audio Signal Processing by Morgan and Gold, John Wiley (2000).
- Speech Processing and Synthesis Toolboxes, Childers, John Wiley (2000).
- Speaker and Speech Recognition by C. H. Lee, K. Paliwal, .. (1996)
- Advances in Speech Processing by S. Furui & M. Sondhi (1992)
- Connectionist Models for Speech Recognition by N. Morgan & H. Bourlard (1993)
- Speech Signal Processing by Proakis, et al. (1993)
- Automatic Speech Recognition by Kai-Fu Lee, Kluwer Academic Publishers (1989)
- Neural Nets and Speech Proc. by D. Morgan & C. Scofield, Kluwer Academic Pub. (1991)
Journals:
- IEEE Transactions on Speech and Audio Processing
- Speech Communication
- Computer Speech and Language, etc.
Conference Proceedings:
- ICASSP: Intl. Conf. on Acoustics, Speech, and Signal Processing
- ICSLP: Intl. Conf. on Spoken Language Processing
- EuroSpeech: European Conf. on Speech Communication and Technology
Tentative Topics:
1) a) Introduction to Speech Processing
   b) Digitization of Sound
2) Theory and Application of Discrete Signal Processing (overview notes)
   a) Signals and systems
   b) Introduction to statistical approaches
   c) Introduction to pattern recognition
   d) Transforms (Fourier, Laplace, Mellin, Hartley)
3) Speech Signals and Representations (Chapter 5) or Feature Extraction
   a) Speech production model
   b) Linear Predictive Coding (LPC) Analysis
      - Cepstrum coefficients
      - DFT (or FFT) coefficients
      - Line Spectrum Pair coefficients
      - Other coefficients
4) Speech Recognition (Chapter 8)
   a) Speech and speaker recognition issues
   b) Hidden Markov Model (HMM)
   c) Dynamic Time-Warping and Neural Networks
5) Speech Coding (Chapter 6)
   a) Coding in the time domain. Ex: Pulse code modulation (PCM) and its derivatives
   b) Coding in the frequency domain. Ex: Subband coding
   c) Analysis-synthesis and coding combination (LPC)
   d) Vector quantization (VQ): Linde-Buzo-Gray (LBG) algorithm
6) A brief overview of speech synthesis (Chapter 7) and speech enhancement
7) Future directions (Chapter 10) and our current projects: audio watermarking, speech-music discrimination, speaker identification, ultrasound and medical applications.
**********************************************************************
Course Grading: Project + Presentation (35%) + Midterm (30%) + Take-home Final (35%)
Hardware: A Pentium-based PC with a sound card (any brand)
Software Tools:
1) MATLAB (with the free VOICEBOX speech toolbox, available for download): http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
2) Oregon Graduate Institute (OGI): Center for Spoken Language Understanding (CSLU) Speech Toolkit
3) C programming language
Internet sites:
1. http://cslu.cse.ogi.edu
2. http://cslr.colorado.edu
3. http://svr-ftp.eng.cam.ac.uk/cstit
4. http://www.furui.cs.titech.ac.jp/english
etc.
Homework Assignments:
Homework #1
1) Audio file input/output (digitization of sound)
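As a starting point for Homework #1, the following is a minimal sketch of audio file input/output. It is written in Python with only the standard library, which is an assumption (the course tools listed are MATLAB, the CSLU Toolkit, and C). The function names write_wav and read_wav, the 16-bit mono format, and the 8000 Hz default rate are illustrative choices, not course requirements.

```python
import math
import struct
import wave


def write_wav(path, samples, sample_rate=8000):
    """Quantize floats in [-1, 1] to 16-bit PCM and write a mono WAV file."""
    pcm = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)           # mono
        wf.setsampwidth(2)           # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


def read_wav(path):
    """Read a 16-bit mono WAV file; return (samples in [-1, 1], sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono files")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [v / 32767.0 for v in pcm], rate


def sine_tone(freq_hz, duration_s, sample_rate=8000, amplitude=0.5):
    """Generate a test tone, standing in for a recorded utterance."""
    count = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(count)]
```

Writing sine_tone(440, 0.1) with write_wav and reading it back with read_wav returns the same samples up to the 16-bit quantization error, which illustrates the effect of digitization.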
Homework #2
1) Speech analysis by LPC and Fourier transform
   - Split the speech into frames using windows. Then create a “spectrogram” using the same tool. Give a brief explanation of what you have done.
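The framing and spectrogram steps of Homework #2 can be sketched as follows. This is an illustration only, in standard-library Python; the Hamming window, the frame length of 64 samples, and the hop of 32 samples are assumed values, and the direct DFT used here is slow compared with the FFT routines you would normally call in MATLAB or C.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def dft_magnitude(frame):
    """Magnitude of the DFT for bins 0 .. N/2 (direct O(N^2) evaluation)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum per windowed frame (rows = time, columns = frequency)."""
    window = hamming(frame_len)
    return [dft_magnitude([x * w for x, w in zip(frame, window)])
            for frame in split_frames(signal, frame_len, hop)]
```

Each row of the result is the short-time spectrum of one frame; plotting the rows side by side as an image (time against frequency, with magnitude as intensity) gives the spectrogram.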
Homework #3
1) Linear Predictive Coding routines and distance between two speech segments
   - Extract a feature of your choice from the speech frames above. Also find the distance between two speech segments.
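One possible approach to Homework #3 is sketched below: LPC coefficients computed by the autocorrelation method with the Levinson-Durbin recursion, and a plain Euclidean distance between two feature vectors. Both choices are assumptions made for illustration; the assignment leaves the feature and the distance measure open, and cepstral or Itakura-type distances are common alternatives. The code is standard-library Python and the function names are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k],
    obtained by the Levinson-Durbin recursion."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]                      # prediction error energy
    for i in range(1, order + 1):
        if err <= 0.0:              # silent or degenerate frame: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k              # reflection coefficient becomes a[i]
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:]


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

To compare two speech segments, compute lpc(frame, order) for a frame from each segment and pass the two coefficient vectors to euclidean_distance; averaging over all frame pairs gives a segment-level distance.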
Homework #4
1) Speaker verification for three speakers
   - Extract a feature of your choice for each of the three speakers. Build a K-means codebook (of size K) for each speaker, then find the distance from the unknown utterance to the vectors of each codebook.
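The codebook step of Homework #4 can be sketched as below, again in standard-library Python and under stated assumptions: plain K-means with a deterministic start (evenly spaced training vectors), squared Euclidean distance, and the average distortion of the unknown utterance as the score for each speaker. The function names are hypothetical, and the LBG splitting procedure from the Vector Quantization topic is not used here.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Build a size-k codebook from training vectors (requires len(vectors) >= k)."""
    step = len(vectors) // k
    # Assumed initialization: k training vectors evenly spaced through the data.
    codebook = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda c: sq_dist(v, codebook[c]))
            clusters[nearest].append(v)
        for c, members in enumerate(clusters):
            if members:             # keep the old centroid if a cluster is empty
                dim = len(members[0])
                codebook[c] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codebook entry."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def closest_speaker(unknown_vectors, codebooks):
    """Name of the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks,
               key=lambda name: average_distortion(unknown_vectors, codebooks[name]))
```

With codebooks = {name: kmeans(features, K)} for the three speakers, closest_speaker returns the best-matching speaker for an unknown utterance. For verification rather than identification, one would instead compare average_distortion for the claimed speaker against a threshold, whose value would have to be chosen experimentally.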
Homework #5
1) Prepare your project proposal and tools, and describe them in 2-4 pages.