CmpE 623 Computer Speech Processing

(17/2/2003)

Fikret Gürgen, Office: 204, Ext: 1863 (or 1323)

 

Textbook:

 Digital Speech Processing by Sadaoki Furui (Marcel Dekker, second edition).

            Reserved in the library

-      Discrete-Time Speech Signal Processing, T. F. Quatieri, Prentice Hall (2002).

-      Fundamentals of Speech Recognition, Rabiner L., Juang B. H., Prentice Hall (1993).

-      Speech and Audio Signal Processing by Morgan and Gold, John Wiley (2000).

-      Speech Proc. and Synthesis Toolboxes, Childers, John Wiley (2000).

-      Speaker and Speech Recognition by C H Lee, K Paliwal, .. (1996)

-      Advances in Speech Processing by S. Furui & M. Sondhi (1992)

-      Connectionist Models for Speech Recognition by N. Morgan & H. Bourlard (1993)

-      Speech Signal Processing by Proakis, et al. (1993)

-      Automatic Speech Recognition by Kai Fu Lee, Kluwer Academic Publishers (1989)

-      Neural Nets and Speech Proc. by D. Morgan & C. Scofield, Kluwer Academic Pub. (1991)

 

·         Journals:

- IEEE Transactions on Speech and Audio Processing - Speech Communication - Computer Speech and Language, etc.

·         Conference Proceedings:

- ICASSP: Intl Conf. on Acoustics Speech and Signal Processing

- ICSLP: Intl Conf. on Spoken Language Processing

- EuroSpeech: European Conf. on Speech Communication and Technology

                

Tentative Topics:

 

1) a)            Introduction to Speech Processing

    b)            Digitization of Sound

 

2) Theory and Application of Discrete Signal Processing (overview notes)

            a) Signals and systems

            b) Introduction to statistical approaches

            c) Introduction to pattern recognition

            d) Transforms (Fourier, Laplace, Mellin, Hartley)

 

3) Speech Signals and Representations (Chapter 5) or Feature Extraction

            a) Speech production model

            b) Linear Predictive Coding (LPC) Analysis

                - Cepstrum coefficients

                - DFT (or FFT) coefficients

                - Line Spectrum Pair (LSP) coefficients

                - Other coefficients

4) Speech Recognition (Chapter 8)

            a) Speech and speaker recognition issues

            b) Hidden Markov Model (HMM)

            c) Dynamic Time-Warping and Neural Networks

 

5) Speech coding (Chapter 6)

            a) Coding in the time domain - Ex: Pulse code modulation (PCM) and its derivatives

            b) Coding in the frequency domain - Ex: Subband coding

            c) Analysis-synthesis and coding combination (LPC)

            d) Vector quantization (VQ) - Linde-Buzo-Gray (LBG) algorithm

 

6) A brief overview of speech synthesis (Chapter 7) and speech enhancement

 

7) Future directions (Chapter 10) and our current projects: audio watermarking, speech-music discrimination, speaker identification, ultrasound and medical applications.

 

**********************************************************************

 

Course Grading: [Project + Presentation (35%)] + [Midterm (30%)] + [Take-home Final (35%)]

 

Hardware: A Pentium-based PC with a sound card (any brand)

 

Software Tools: 

                  1) MATLAB, with the free VOICEBOX speech processing toolbox:

                  http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

                  2) Oregon Graduate Institute (OGI):

                  Center for Spoken Language Understanding (CSLU) Speech Toolkit

                  3) C programming language

Internet sites: 1. http://cslu.cse.ogi.edu   2. http://cslr.colorado.edu  3. http://svr-ftp.eng.cam.ac.uk/cstit  4. http://www.furui.cs.titech.ac.jp/english  etc.

 

Homeworks:

 

Homework #1

1)      Audio File Input/Output (digitization of sound)

-Prepare three sound files: one each of speech, music and noise. Briefly explain what you have done.

You may use software tool (1) at the Internet address given above.
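
As an illustration of audio file input/output, here is a minimal sketch in Python (standard library only; the course tools themselves are MATLAB and C). It assumes 16-bit mono PCM WAV files at 8 kHz, and the helper names write_wav, read_wav, make_tone and make_noise are hypothetical. A synthetic tone stands in for music and uniform random samples for noise; the speech file would still have to be recorded through the sound card.

```python
import math
import os
import random
import struct
import tempfile
import wave


def write_wav(path, samples, rate=8000):
    """Write floats in [-1, 1] as a 16-bit mono PCM WAV file."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


def read_wav(path):
    """Read a 16-bit mono PCM WAV file; return (rate, floats in [-1, 1])."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        n = wf.getnframes()
        raw = wf.readframes(n)
    pcm = struct.unpack("<%dh" % n, raw)
    return rate, [v / 32767.0 for v in pcm]


def make_tone(freq, dur, rate=8000):
    """Sine tone of the given frequency (Hz) and duration (s)."""
    n = int(round(dur * rate))
    return [0.5 * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


def make_noise(dur, rate=8000, seed=0):
    """Uniform white noise in [-0.5, 0.5]."""
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(int(round(dur * rate)))]


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    write_wav(os.path.join(out_dir, "tone.wav"), make_tone(440.0, 1.0))
    write_wav(os.path.join(out_dir, "noise.wav"), make_noise(1.0))
    rate, samples = read_wav(os.path.join(out_dir, "tone.wav"))
    print(rate, len(samples))
```

Quantizing to 16 bits introduces a small rounding error, so samples read back from the file match the originals only approximately.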

 

Homework #2

1)      Speech analysis by LPC and Fourier transform

-Use the three files above. Briefly explain what you have done.

-Extract speech frames using windows, then create a “spectrogram” with the same tool. Briefly explain what you have done.
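
The framing and spectrogram step can be sketched as follows, in plain Python for illustration. The Hamming window, the frame length of 64 samples and the hop of 32 samples are assumptions, not course requirements, and the function names are hypothetical. A direct DFT is used for clarity; an FFT routine would be used in practice.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames and apply the window to each."""
    win = hamming(frame_len)
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        out.append([signal[start + i] * win[i] for i in range(frame_len)])
    return out


def dft_magnitude(frame):
    """Magnitude of the DFT for bins 0 .. N/2 (direct O(N^2) evaluation)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len=64, hop=32):
    """One row of DFT magnitudes per frame (time runs along the rows)."""
    return [dft_magnitude(f) for f in frames(signal, frame_len, hop)]
```

Bin k of a frame of length N sampled at rate Fs corresponds to the frequency k * Fs / N, so a 1000 Hz tone sampled at 8000 Hz with N = 64 peaks at bin 8.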

 

Homework #3

1)            Linear Predictive Coding routines and distance between two speech segments

-Extract a feature of your choice from the speech frames above. Also find the distance between two speech segments.
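
One possible choice of feature is the set of LPC predictor coefficients. The sketch below, in plain Python for illustration, assumes the autocorrelation method solved with the Levinson-Durbin recursion and a Euclidean distance between two coefficient vectors; other features (for example cepstrum coefficients) and other distance measures are equally valid, and the function names are hypothetical.

```python
import math


def autocorr(frame, order):
    """Autocorrelation r[0..order] of one (windowed) frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(order + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a[k-1] is the coefficient of x[t-k] in the
    predictor x[t] ~ sum_k a_k * x[t-k], and err is the residual energy.
    """
    r = autocorr(frame, order)
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:        # silent or perfectly predictable frame
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
```

For two segments made of several frames, one simple option (an assumption here) is to average the frame-by-frame distances; segments of different lengths would need time alignment first, for example by dynamic time warping.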

 

Homework #4

            1)            Speaker verification for three speakers

                        -Extract a feature of your choice for each of the three speakers. Build a K-means codebook (of size K) for each speaker, then find the distance from the unknown utterance to the vectors of each codebook.
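
The codebook step can be sketched as follows, in plain Python for illustration. It assumes squared Euclidean distance, random initialization from the training vectors, and a fixed number of iterations; the names kmeans, distortion, closest_speaker and verify are hypothetical, and the acceptance threshold is a free parameter that would have to be tuned on real data.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nearest(vec, codebook):
    """Index of the codebook vector closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def kmeans(vectors, k, iters=20, seed=0):
    """Build a K-entry codebook from one speaker's feature vectors."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def distortion(vectors, codebook):
    """Average distance from each vector to its nearest codebook entry."""
    return sum(sq_dist(v, codebook[nearest(v, codebook)])
               for v in vectors) / len(vectors)


def closest_speaker(vectors, codebooks):
    """Name of the speaker whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda name: distortion(vectors, codebooks[name]))


def verify(vectors, codebook, threshold):
    """Accept the claimed speaker if the distortion is below the threshold."""
    return distortion(vectors, codebook) <= threshold
```

Here codebooks is a dictionary mapping each speaker's name to that speaker's codebook. Picking the lowest-distortion codebook identifies the speaker, while comparing the distortion for a claimed speaker against a threshold verifies the claim.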

 

Homework #5

            1)            Prepare your project proposal & tools and describe them in 2-4 pages.