|
|
Theses by Author
| Thesis # |
Thesis Information |
| 1 |
A Unifying Theory for Rank-Based Multiple Classifier
by Afsar Saranli
from Middle East Technical University/Dept. of Electrical and Electronics Eng. in 2000
Details |
Supervisor(s): Prof. Dr. Mubeccel Demirekler
Keywords(s): statistical multiple classifier systems, rank based decision combination, fusion, classifier observation space, event-space partitioning, pattern recognition, independence, complementariness, speaker identification, speech recognition
Abstract: This thesis presents a theoretical investigation of the rank-based multiple classifier
decision combination problem and develops a unified framework to understand a
variety of such systems.
The rank-based combination is formulated as a discrete optimization problem
with the total probability of correct decision as the objective function to be
maximized. This formulation introduces a set of classifier observation statistics
to be estimated by observing the classifiers operate on a cross-validation test
set. The resulting binary programming problem is shown to have a simple global
optimum solution, but one that requires a prohibitive number of observation statistics. To
reduce this dimensionality, a method based on observation space partitioning is
developed. By this formalism the number of observation statistics can be reduced
to levels feasible to estimate from the available cross-validation test data. Specific
partitionings can be defined when reasonable assumptions or prior knowledge
about the classifiers are incorporated into the problem. Also, certain specific
partitionings effectively lead to the Highest Rank, Borda Count and Logistic
Regression methods from the literature and establish the links between Type 1
and Type 2 systems.
The concepts of independence and complementariness of combined rank-based
classifiers are investigated using basic concepts from Information Theory and measures
of independence and complementariness are developed. The Dominance
condition is developed as an indicator of performance improvement through combination.
Independence of classifiers is shown to have no direct role in classifier
complementariness.
Finally the potential of the theory and practical issues in implementation are
comparatively illustrated by applying the theory and the existing methods in two
real-life pattern recognition problems from speech processing with encouraging
results.
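As a rough illustration of two of the literature methods named in the abstract, the sketch below combines per-classifier rankings with the Borda Count and Highest Rank rules. It is a minimal sketch of the textbook rules only, not the thesis's partitioning framework; the function names and the speaker labels are hypothetical.

```python
from collections import defaultdict


def borda_count(rankings):
    """Combine rankings (each a list of labels, best first) by Borda count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            # The top-ranked label earns n - 1 points, the last earns 0.
            scores[label] += n - 1 - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda label: (-scores[label], label))


def highest_rank(rankings):
    """Order labels by the best position any single classifier gave them."""
    best = {}
    for ranking in rankings:
        for position, label in enumerate(ranking):
            best[label] = min(best.get(label, position), position)
    return sorted(best, key=lambda label: (best[label], label))


# Three hypothetical classifiers ranking three speakers.
rankings = [
    ["alice", "bob", "carol"],
    ["bob", "alice", "carol"],
    ["bob", "carol", "alice"],
]
combined = borda_count(rankings)
top_by_highest_rank = highest_rank(rankings)
```

Here Borda count favours "bob" (5 points against 3 and 1), while Highest Rank cannot separate "alice" and "bob" because each is ranked first by at least one classifier.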
Download: Thesis
|
| 2 |
Biologically Inspired 3D Face Recognition
by Albert Ali Salah
from Boğaziçi University / Department of Computer Engineering in 2007
Details |
Supervisor(s): Prof. Lale Akarun
Keywords(s): 3d face recognition, human face recognition, 3D registration, automatic landmarking, mixture models, factor analysis, GOLLUM, iterative closest point, thin-plate splines, average face models
Abstract: Face recognition has been an active area of study for both computer vision and image processing communities, not only for biometrics but also for human-computer interaction applications. The purpose of the present work is to evaluate the existing 3D face recognition techniques and seek biologically motivated methods to improve them. We especially look at findings in psychophysics and cognitive science for insights. We propose a biologically motivated computational model, and focus on the earlier stages of the model, whose performance is critical for the later stages. Our emphasis is on automatic localization of facial features. We first propose a strong unsupervised learning algorithm for flexible and automatic training of Gaussian mixture models and use it in a novel feature-based algorithm for facial fiducial point localization. We also propose a novel structural correction algorithm to evaluate the quality of landmarking and to localize fiducial points under adverse conditions. We test the effects of automatic landmarking under rigid and non-rigid registration methods. For the rigid registration approach, we implement the iterative closest point method (ICP). The most important drawback of ICP is the computational cost of registering a test scan to each scan in the gallery. By using an average face model in rigid registration, we show that the computation bottleneck can be eliminated. Following psychophysical arguments on the "other race effect", we reason that organizing faces into different gender and morphological groups will help us in designing more discriminative classifiers. We test this claim by employing different average face models for dense registration. We propose a shape-based clustering approach that assigns faces into groups with nondescript gender and race. Finally, we propose a regular re-sampling step that increases the speed and the accuracy significantly. These components make up a full 3D face recognition system.
Download: Thesis
|
| 3 |
Privacy Protecting Biometric Authentication Systems
by Alisher Kholmatov
from Sabanci University in 2008
Details |
Supervisor(s): Berrin Yanikoglu
Keywords(s): biometrics, privacy, signature, fingerprint, fuzzy vault
Abstract: As biometrics gains popularity and proliferates into daily life, there is an
increased concern over the loss of privacy and potential misuse of biometric data
held in central repositories. The major concerns are about i) the use of biometrics to
track people, ii) non-revocability of biometrics (e.g. if a fingerprint is compromised
it cannot be canceled or reissued), and iii) disclosure of sensitive information such
as race, gender and health problems which may be revealed by biometric traits. The
straightforward suggestion of keeping the biometric data in a user-owned token (e.g.
smart cards) does not completely solve the problem, since malicious users can claim
that their token is broken to avoid biometric verification altogether. Put together,
these concerns have created the need for privacy preserving biometric authentication
methods in recent years.
In this dissertation, we survey existing privacy preserving biometric systems and
implement and analyze fuzzy vault in particular; we propose a new privacy preserving
approach; and we study the discriminative capability of online signatures as it
relates to the success of using online signatures in the available privacy preserving
biometric verification systems. Our privacy preserving authentication scheme combines
multiple biometric traits to obtain a multi-biometric template that hides the
constituent biometrics and allows the possibility of creating non-unique identifiers
for a person, such that linking separate template databases is impossible. We provide
two separate realizations of the framework: one uses two separate fingerprints
of the same individual to obtain a combined biometric template, while the other one
combines a fingerprint with a vocal pass-phrase. We show that both realizations of
the framework are successful in verifying a person’s identity given both biometric
traits, while preserving privacy (i.e. biometric data is protected and the combined
identifier cannot be used to track people).
The Fuzzy Vault emerged as a promising construct which can be used in protecting
biometric templates. It combines biometrics and cryptography in order to
get the benefits of both fields; while biometrics provides non-repudiation and convenience,
cryptography guarantees privacy and adjustable levels of security. On the
other hand, the fuzzy vault is a general construct for unordered data, and as such, it
is not straightforward how it can be used with different biometric traits. In the scope
of this thesis, we demonstrate realizations of the fuzzy vault using fingerprints and
online signatures such that authentication can be done while biometric templates
are protected. We then demonstrate how to use the fuzzy vault for secret sharing,
using biometrics. Secret sharing schemes are cryptographic constructs where a secret
is split into shares and distributed amongst the participants in such a way that
it is reconstructed/revealed only when a necessary number of share holders come
together (e.g. in joint bank accounts). The revealed secret can then be used for
encryption or authentication. Finally, we demonstrate how correlation attacks can
be used to unlock the vault, showing that further measures are needed to protect
the fuzzy vault against such attacks.
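The threshold behaviour of secret sharing described above can be sketched with Shamir's polynomial scheme. This is a generic illustration under stated assumptions, not the biometric fuzzy-vault construction of the thesis: the field size `PRIME` and the function names are illustrative choices.

```python
import random

# A Mersenne prime used as the field modulus (an illustrative choice).
PRIME = 2**61 - 1


def split_secret(secret, threshold, num_shares, rng=random):
    """Split `secret` into shares; any `threshold` of them recover it."""
    # Random polynomial of degree threshold - 1 with the secret as constant term.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, num_shares=5, rng=random.Random(0))
recovered = recover_secret(shares[:3])
```

Any three of the five shares reconstruct the secret, while fewer than three leave it undetermined; in the setting of the abstract, each share would additionally be locked by a participant's biometric.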
The discriminative capability of a biometric modality is based on its uniqueness/entropy
and is an important factor in choosing a biometric for a large-scale
deployment or a cryptographic application. We present an individuality model for
online signatures in order to substantiate their applicability in biometric authentication.
In order to build our model, we adopt the Fourier domain representation of the
signature and propose a matching algorithm. The signature individuality is measured
as the probability of a coincidental match between two arbitrary signatures,
where model parameters are estimated using a large signature database. Based
on this preliminary model and estimated parameters, we conclude that an average
online signature provides a high level of security for authentication purposes.
Finally, we provide a public online signature database along with associated
testing protocols that can be used for testing signature verification systems.
Download: Thesis
|
| 4 |
WATERMARKING FOR 3D REPRESENTATIONS
by Alper Koz
from Middle East Technical University/ Electrical and Electronics Engineering in 2007
Details |
Supervisor(s): Assoc. Prof. A. Aydin Alatan
Keywords(s): Video Watermarking, Temporal Contrast Thresholds, 3D watermarking, Geometry Watermarking, Free View Watermarking, Human Visual System, Projective Invariants
Abstract:
In this thesis, a number of novel watermarking techniques for different 3D representations are presented. A novel watermarking method is proposed for the mono-view video, which might be interpreted as the basic implicit representation of 3D scenes. The proposed method solves the common flickering problem in the existing video watermarking schemes by means of adjusting the watermark strength with respect to temporal contrast thresholds of human visual system (HVS), which define the maximum invisible distortions in the temporal direction. The experimental results indicate that the proposed method gives better results in both objective and subjective measures, compared to some recognized methods in the literature.
The watermarking techniques for the geometry and image based representations of 3D scenes, denoted as 3D watermarking, are examined and classified into three groups, as 3D-3D, 3D-2D and 2D-2D watermarking, in which the pair of symbols identifies whether the watermark is embedded-detected in a 3D model or a 2D projection of it. A detailed literature survey on 3D-3D watermarking is presented that mainly focuses on protection of the intellectual property rights of the 3D geometrical representations. This analysis points out the specific problems in 3D-3D geometry watermarking, such as the lack of a unique 3D scene representation, standardization for the coding schemes and benchmarking tools on 3D geometry watermarking.
For the 2D-2D watermarking category, the copyright problem for the emerging free-view televisions (FTV) is introduced. The proposed watermarking method for this original problem embeds watermarks into each view of the multi-view video by utilizing the spatial sensitivity of HVS. The hidden signal in a selected virtual view is detected by computing the normalized correlation between the selected view and a generated pattern, namely the rendered watermark, which is obtained by applying to the original watermark the same rendering operations that were applied to the selected view. An algorithm for the estimation of the virtual camera position and rotation is also developed based on the projective planar relations between image planes. The simulation results show the applicability of the method to the FTV systems.
Finally, the thesis also presents a novel 3D-2D watermarking method, in which a watermark is embedded into the 3D representation of the object and detected from a 2D projection (image) of the same model. A novel solution based on projective invariants is proposed which modifies the cross-ratio of five coplanar points on the 3D model according to the watermark bit and extracts the embedded bit from the 2D projections of the model by computing the cross-ratio. After presenting the applicability of the algorithm via simulations, the future directions for this novel problem for 3D watermarking are addressed.
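The reason a cross-ratio can carry a watermark bit from the 3D model into an image is that it survives projection: a perspective view of coplanar points is a homography of the plane. The sketch below checks this numerically with the standard five-point determinant invariant. It is only an illustration; the exact invariant used in the thesis is not given in the abstract, and the points and the matrix `H` are made-up values.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose rows are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def five_point_invariant(p1, p2, p3, p4, p5):
    """Projective invariant of five coplanar points in homogeneous coordinates.

    Every point appears equally often in numerator and denominator, so the
    per-point scale factors and det(H) cancel under any homography H.
    """
    return (det3(p4, p3, p1) * det3(p5, p2, p1)) / (
        det3(p4, p2, p1) * det3(p5, p3, p1))


def apply_homography(h, p):
    """Multiply the 3x3 matrix h by the homogeneous point p."""
    return tuple(sum(h[r][k] * p[k] for k in range(3)) for r in range(3))


# Five coplanar points, no three collinear (hypothetical values).
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0),
          (1.0, 1.0, 1.0), (2.0, 3.0, 1.0)]
# An arbitrary invertible homography standing in for the camera projection.
H = [[1.2, 0.1, 3.0], [-0.2, 0.9, 1.0], [0.001, 0.002, 1.0]]

before = five_point_invariant(*points)
after = five_point_invariant(*(apply_homography(H, p) for p in points))
```

Both values agree up to floating-point error, so a detector that sees only the projected points can still read a value that was set on the model.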
Download: Thesis
|
| 5 |
Video Object Tracking with Feedback of Performance Measures
by Çiğdem Eroğlu Erdem
from Boğaziçi University, Department of Electrical and Electronics Engineering in 2002
Details |
Supervisor(s): Prof. Bülent Sankur
Keywords(s): Video segmentation, object tracking, active contours, performance evaluation, image segmentation, motion estimation, fuzzy clustering
Abstract: The task of segmentation and tracking of objects in a video sequence is an important high-level video processing problem for object-based video manipulation and representation. This task involves utilization of many low-level pre-processing tasks such as image segmentation and motion estimation. It is also very important to assess the performance of the video object segmentation and tracking algorithms quantitatively and objectively. Performance evaluation measures are proposed both when the ground-truth segmentation maps are available and when they are unavailable. A semi-automatic video object tracking method is introduced that uses the proposed performance evaluation measures in a feedback loop to adjust its parameters locally on the object boundary. New low-level image segmentation and motion estimation algorithms, namely, an illumination invariant fuzzy image segmentation algorithm and a motion estimation algorithm in the frequency domain using fuzzy c-planes clustering are also presented in this thesis.
Download: Thesis
|
| 6 |
Three-Dimensional Face Recognition
by Berk Gokberk
from Computer Engineering Department in 2006
Details |
Supervisor(s): Lale Akarun
Keywords(s): Face recognition, Biometrics, Fusion
Abstract: In this thesis, we attack the problem of identifying humans from their three dimensional facial characteristics. For this purpose, a complete 3D face recognition system is developed. We divide the whole system into sub-processes. These sub-processes can be categorized as follows: 1) registration, 2) representation of faces, 3) extraction of discriminative features, and 4) fusion of matchers. For each module, we evaluate the state-of-the-art methods, and also propose novel ones. For the registration task, we propose to use a generic face model which speeds up the correspondence establishment process. We compare the benefits of rigid and non-rigid registration schemes using a generic face model. In terms of face representation schemes, we implement a diverse range of approaches such as point clouds, curvature-based descriptors, and range images. In relation to these, various feature extraction methods are used to determine the discriminative facial features. We also propose to use local region-based representation schemes which may be advantageous in terms of both dimensionality reduction and for determining invariant regions under several facial variations. Finally, with the realization of diverse 3D face experts, we perform an in-depth analysis of decision-level fusion algorithms. In addition to the evaluation of baseline fusion methods, we propose to use two novel fusion schemes where the first one employs a confidence-aided combination approach, and the second one implements a two-level serial integration method.
Recognition simulations performed on the 3DRMA and the FRGC databases show that: 1) generic face template-based rigid registration of faces is better than the non-rigid variant, 2) principal curvature directions and surface normals have better discriminative power, 3) representing faces using local patch descriptors can both reduce the feature dimensionality and improve the identification rate, and 4) confidence-assisted fusion rules and serial two-stage fusion schemes have a potential to improve the accuracy when compared to other decision-level fusion rules.
Download: Thesis
|
| 7 |
A COMPARISON OF DIFFERENT APPROACHES TO TARGET DIFFERENTIATION WITH SONAR
by Birsel Ayrulu-Erdem
from Bilkent University, Department of Electrical and Electronics Engineering in 2001
Details |
Supervisor(s): Prof. Billur Barshan
Keywords(s): Sonar sensing, target differentiation, target localization, artificial neural networks, learning, feature extraction, statistical pattern recognition, Dempster-Shafer evidential reasoning, majority voting, sensing systems, acoustic signal processing, mobile robots, map building, Voronoi diagram.
Abstract: This study compares the performances of different classification schemes and fusion techniques for target differentiation and localization of commonly encountered features in indoor robot environments using sonar sensing. Differentiation of such features is of interest for intelligent systems in a variety of applications such as system control based on acoustic signal detection and identification, map-building, navigation, obstacle avoidance, and target tracking. The classification schemes employed include the target differentiation algorithm developed by Ayrulu and Barshan, statistical pattern recognition techniques, fuzzy c-means clustering algorithm, and artificial neural networks. The fusion techniques used are Dempster-Shafer evidential reasoning and different voting schemes. To solve the consistency problem arising in simple majority voting, different voting schemes including preference ordering and reliability measures are proposed and verified experimentally. To improve the performance of neural network classifiers, different input signal representations, two different training algorithms, and both modular and non-modular network structures are considered. The best classification and localization scheme is found to be the neural network classifier trained with the wavelet transform of the sonar signals. This method is applied to map-building in mobile robot environments. Physically different sensors such as infrared sensors and structured-light systems besides sonar sensors are also considered to improve the performance in target classification and localization.
Download: Thesis
|
| 8 |
DEVELOPMENT OF SINGLE AND MULTICHANNEL 2-D DELTA DOMAIN LATTICE FILTER AND APPLICATIONS TO IMAGE RESTORATION
by C. Mehmet Hendekli
from Bogazici University in 2000
Details |
Supervisor(s): Aysin Ertuzun
Keywords(s): 2-D lattice filters, delta domain lattice filter, image restoration
Abstract: In this work the theoretical development of single and multichannel 2-D delta
domain lattice filter has been done and the mathematical contributions have been supported
by the simulations for noise removal purposes which give quite satisfactory results even
under very low signal-to-noise ratio conditions. In the experiments, images degraded by
either additive Gaussian noise or multiplicative noise have been processed by using a joint
process estimation algorithm involving the newly developed 2-D delta domain lattice filter
structure. Finally, necessary discussions have been made and the conclusions have been
drawn.
Download: Thesis
|
| 9 |
Density-Based Shape Descriptors and Similarity Learning for 3D Object Retrieval
by Ceyhun Burak Akgül
from Boğaziçi University EE Dept., ENST Telecom ParisTech in 2007
Details |
Supervisor(s): Bülent Sankur, Francis Schmitt
Keywords(s): Content-based retrieval, 3D shape descriptors, kernel density estimation, statistical learning, risk minimization
Abstract: Next generation search engines will enable query formulations, other than text, relying
on visual information encoded in terms of images and shapes. The 3D search technology,
in particular, targets specialized application domains ranging from computer-aided design
and manufacturing to cultural heritage archival and presentation. Content-based retrieval
research aims at developing search engines that would allow users to perform a query by
similarity of content.
This thesis deals with two fundamental problems in content-based 3D object retrieval:
(1) How to describe a 3D shape to obtain a reliable representative for the subsequent
task of similarity search?
(2) How to supervise the search process to learn inter-shape similarities for more effective
and semantic retrieval?
Concerning the first problem, we develop a novel 3D shape description scheme based
on probability density of multivariate local surface features. We constructively obtain
local characterizations of 3D points on a 3D surface and then summarize the resulting
local shape information into a global shape descriptor. For probability density estimation,
we use the general purpose kernel density estimation methodology, coupled with a fast
approximation algorithm: the fast Gauss transform. The conversion mechanism from local
features to global description circumvents the correspondence problem between two shapes
and proves to be robust and effective. Experiments that we have conducted on several 3D
object databases show that density-based descriptors are very fast to compute and very
effective for 3D similarity search.
Concerning the second problem, we propose a similarity learning scheme that incorporates
a certain amount of supervision into the querying process to allow more effective
and semantic retrieval. Our approach relies on combining multiple similarity scores by
optimizing a convex regularized version of the empirical ranking risk criterion. This score
fusion approach to similarity learning is applicable to a variety of search engine problems
using arbitrary data modalities. In this work, we demonstrate its effectiveness in 3D object
retrieval.
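The idea of summarizing local surface features into a global, correspondence-free descriptor can be sketched with a plain Gaussian kernel density estimate sampled at fixed target values. This is a simplified one-dimensional sketch: the thesis uses multivariate features and the fast Gauss transform, neither of which is reproduced here, and the feature values, bandwidth and function names below are hypothetical.

```python
import math


def density_descriptor(features, targets, bandwidth):
    """Sample a Gaussian KDE of scalar local features at fixed target values."""
    norm = 1.0 / (len(features) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((t - f) / bandwidth) ** 2) for f in features)
        for t in targets
    ]


def l1_distance(desc_a, desc_b):
    """Dissimilarity between two descriptors sampled on the same targets."""
    return sum(abs(a - b) for a, b in zip(desc_a, desc_b))


# Hypothetical local features, e.g. distances of surface points to the centroid.
shape_a = [0.9, 1.0, 1.1, 1.0, 0.95]
shape_a_reordered = [1.0, 0.95, 0.9, 1.1, 1.0]  # same points, different order
shape_b = [0.2, 0.5, 1.5, 1.8, 1.0]

targets = [i * 0.1 for i in range(0, 21)]  # fixed evaluation grid 0.0 .. 2.0
desc_a = density_descriptor(shape_a, targets, bandwidth=0.1)
desc_a2 = density_descriptor(shape_a_reordered, targets, bandwidth=0.1)
desc_b = density_descriptor(shape_b, targets, bandwidth=0.1)
```

Because the density depends only on the set of feature values, reordering the points leaves the descriptor unchanged, which is how the approach sidesteps point-to-point correspondence between two shapes.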
Download: Thesis
|
| 10 |
Sequential Bayesian Modeling of Non-stationary Non-Gaussian Processes
by Deniz Gençağa
from Bogazici University in 2007
Details |
Supervisor(s): Aysin Ertuzun, Ercan Kuruoglu
Keywords(s): Bayesian signal processing, Time-varying autoregressive process, nonstationary mixtures, source separation
Abstract: are involved until the development of Sequential
Monte Carlo techniques which are also known as the particle filters. In particle filtering,
the problem is expressed in terms of state-space equations where the linearity and
Gaussianity requirements of the Kalman filtering are generalized. Therefore, we need
information about the functional form of the state variations. In this thesis, we bring a
general solution for the cases where these variations are unknown and the process
distributions cannot be expressed by any closed form probability density function. Here,
we propose a novel modeling scheme which is as unified as possible to cover all these
problems. Therefore we study the performance analysis of our unifying particle filtering
methodology on non-stationary Alpha Stable process modeling. It is well known that the
probability density functions of these processes cannot be expressed in closed form, except
for some limited number of cases. Moreover, this distribution family presents a direct
generalization from Gaussian to non-Gaussian distributions, since they have common
properties, such as the stability property and the Central Limit Theorem. To model time
structures of these processes, linear autoregressions are utilized, which are widely used in
the literature. We propose three novel techniques to model non-stationary alpha stable
processes. These include the modeling of time-varying autoregressive processes with
known, unknown and constant, unknown and time-varying distribution parameters,
respectively. Successful performances of these techniques have been shown by empirical
analysis. It has also been demonstrated that the empirical results approach their posterior
Cramer Rao Lower Bound values and time-varying alpha stable processes can be modeled
in their most general form successfully.
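A bootstrap particle filter tracking a time-varying autoregressive coefficient can be sketched as follows. This is a deliberately simplified sketch: it assumes Gaussian driving noise with known scale and a random-walk model for the coefficient, whereas the thesis targets alpha-stable processes with possibly unknown distribution parameters. All names and parameter values are illustrative.

```python
import math
import random


def track_ar1_coefficient(observations, num_particles=300, drift_std=0.02,
                          noise_std=1.0, seed=0):
    """Estimate a(t) in x(t) = a(t) * x(t-1) + e(t) with a bootstrap filter."""
    rng = random.Random(seed)
    particles = [rng.uniform(-1.0, 1.0) for _ in range(num_particles)]
    estimates = []
    for prev, curr in zip(observations, observations[1:]):
        # Propagate: the coefficient is assumed to follow a random walk.
        particles = [a + rng.gauss(0.0, drift_std) for a in particles]
        # Weight each particle by the Gaussian likelihood of the new sample.
        weights = [math.exp(-0.5 * ((curr - a * prev) / noise_std) ** 2)
                   for a in particles]
        total = sum(weights)
        if total == 0.0:  # guard against numerical underflow of all weights
            weights, total = [1.0] * num_particles, float(num_particles)
        weights = [w / total for w in weights]
        # Posterior-mean estimate, then multinomial resampling.
        estimates.append(sum(w * a for w, a in zip(weights, particles)))
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimates


# Simulate a process whose AR(1) coefficient drifts slowly over time.
sim = random.Random(1)
true_coeffs = [0.5 + 0.4 * math.sin(2.0 * math.pi * t / 200.0) for t in range(200)]
observations = [0.0]
for a in true_coeffs:
    observations.append(a * observations[-1] + sim.gauss(0.0, 1.0))

estimates = track_ar1_coefficient(observations)
```

Replacing the Gaussian likelihood with one suited to heavy-tailed noise is where a non-Gaussian treatment would differ; that step is not attempted here.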
Next, we extend our unifying approach to model non-stationary cross-correlated
vector autoregressive processes which are widely encountered in biomedical applications,
mobile communications and chemical process modeling. Here, we extend our particle
filtering scheme to multivariate cases so that the relationships between different processes
can also be modeled. By means of our novel methodology, relationships between non-Gaussian
vector autoregressive processes can also be modeled. Successful simulation
results show that this extension can be used as a building block to model more challenging
problems which are discussed below.
Finally, to provide a solution to model non-stationary mixtures of cross-correlated
processes, our methodology is expanded to its most unifying form. This modeling scheme
can also be interpreted as a Dependent Component Analysis where both the mixing matrix
and the latent processes (sources) are modeled by only observing their mixtures. Here, we
propose two novel techniques. The first method is used to model non-stationary mixtures of
cross-correlated processes which do not possess time structures, while the second one is
utilized for modeling non-stationary mixtures of cross-correlated autoregressive processes.
Successful simulation results verify that our particle filtering methodology is very flexible
and provides a unifying solution for the modeling of non-stationary processes in all cases
described above.
Download: Thesis
|
| 11 |
PRIORITIZED 3D SCENE RECONSTRUCTION AND RATE-DISTORTION
by Evren Imre
from Middle East Technical University / Electrical and Electronics Engineering in 2007
Details |
Supervisor(s): Assoc Prof. Dr. A. Aydın Alatan
Keywords(s): Feature tracking, structure-from-motion, rate-distortion efficient scene representation
Abstract: In this dissertation, a novel scheme performing 3D reconstruction of a scene from a 2D video sequence is presented. To this aim, first, the trajectories of the salient features in the scene are determined as a sequence of displacements via Kanade-Lucas-Tomasi tracker and Kalman filter. Then, a tentative camera trajectory with respect to a metric reference reconstruction is estimated. All frame pairs are ordered with respect to their amenability to 3D reconstruction by a metric that utilizes the baseline distances and the number of tracked correspondences between the frames. The ordered frame pairs are processed via a sequential structure-from-motion algorithm to estimate the sparse structure and camera matrices. The metric and the associated reconstruction algorithm are shown to outperform their counterparts in the literature via experiments. Finally, a mesh-based, rate-distortion efficient representation is constructed through a novel procedure driven by the error between a target image, and its prediction from a reference image and the current mesh. At each iteration, the triangular patch, whose projection on the predicted image has the largest error, is identified. Within this projected region and its correspondence on the reference frame, feature matches are extracted. The pair with the least conformance to the planar model is used to determine the vertex to be added to the mesh. The procedure is shown to outperform the dense depth-map representation in all tested cases, and the block motion vector representation, in scenes with large depth range, in rate-distortion sense.
Download: Thesis
|
| 12 |
Improving the Performance of Speaker Identification Systems by Classifier Combination Techniques
by Hakan ALTINÇAY
from Middle East Technical University / Electrical and Electronics Engineering Department in 2000
Details |
Supervisor(s): Mübeccel Demirekler
Keywords(s): Multiple Classifier Systems, Linear and Logarithmic Opinion Pool, Weight Estimation, Contextual Information, Model Clustering, Dempster-Shafer Formalism, Complementariness, Statistical Pattern Recognition, Speaker Identification
Abstract: In this thesis, the speaker identification problem is addressed and the use of multiple classifier systems for this purpose is studied. A method of selecting the classifiers which provide complementary information for the combination operation is proposed. Using this method, two classifiers are selected to be used in the combination operation.
The study can be considered in two parts. In the first part, we describe a relation between classification systems and information transmission systems. By looking at the classification systems from this perspective, we propose a method of classifier weight estimation for the linear and logarithmic opinion pool type classifier combination schemes for which some tools from information theory are used. These weights provide contextual information about the classifiers such as class dependent classifier reliability and global classifier reliability. A measure for decision consensus among the classifiers is also proposed which is formulated as a multiplicative part of the classifier weights. Simulation experiments in closed-set speaker identification have shown that the method of weight estimation described improved the identification rates of both linear and logarithmic opinion type combination schemes.
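The two combination schemes named above have simple closed forms: the linear opinion pool is a weighted sum of the classifiers' class posteriors, and the logarithmic opinion pool is a normalized weighted product. The sketch below shows both with fixed, hand-picked weights; the information-theoretic weight estimation proposed in the thesis is not reproduced, and the posterior values are made up.

```python
import math


def linear_opinion_pool(posteriors, weights):
    """Weighted arithmetic mean of per-classifier class posteriors."""
    num_classes = len(posteriors[0])
    return [sum(w * p[c] for w, p in zip(weights, posteriors))
            for c in range(num_classes)]


def logarithmic_opinion_pool(posteriors, weights, eps=1e-12):
    """Normalized weighted geometric mean, computed in the log domain."""
    num_classes = len(posteriors[0])
    log_scores = [sum(w * math.log(p[c] + eps) for w, p in zip(weights, posteriors))
                  for c in range(num_classes)]
    peak = max(log_scores)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(s - peak) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two hypothetical classifiers scoring three speakers.
posteriors = [[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3]]
weights = [0.7, 0.3]  # illustrative reliabilities that sum to one

linear = linear_opinion_pool(posteriors, weights)
logarithmic = logarithmic_opinion_pool(posteriors, weights)
```

With these numbers both pools pick the first speaker; the logarithmic pool penalizes a class more strongly when any single classifier assigns it a low posterior.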
In the second part, a completely different rank-based classifier combination technique is studied. The combination scheme is based on Dempster-Shafer theory of evidence as the theory is well suited for the representation and processing of uncertain or missing information. The method is based on the extraction of ranking statistics. These statistics are used to define confusion matrices for different ranks. Using these rank confusion matrices, the speakers are clustered into model sets where they share set specific properties. Some of these model sets are used to reflect the strengths and weaknesses of the classifiers where some others carry speaker dependent ranking statistics of the corresponding classifier. These information sets from multiple classifiers are combined to arrive at a joint decision. For the combination task, a rule-based algorithm is developed where Dempster's rule of combination is applied in the final step. Our simulation results have shown that the proposed method performs better compared to some other rank-based combination methods.
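Dempster's rule of combination, applied in the final step above, multiplies the masses of intersecting hypothesis sets and renormalizes by the non-conflicting mass. The sketch below shows the rule on its own; the rank confusion matrices and the rule-based stages of the thesis are not modeled, and the speaker labels and mass values are hypothetical.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions whose keys are frozensets of hypotheses."""
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = (combined.get(intersection, 0.0)
                                          + mass_a * mass_b)
            else:
                # Mass assigned to contradictory evidence.
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


frame = frozenset({"s1", "s2", "s3"})  # all candidate speakers
# Classifier 1 supports s1 and leaves the rest of its belief uncommitted.
m1 = {frozenset({"s1"}): 0.6, frame: 0.4}
# Classifier 2 supports s2, the set {s1, s2}, and the whole frame.
m2 = {frozenset({"s2"}): 0.3, frozenset({"s1", "s2"}): 0.5, frame: 0.2}

fused = dempster_combine(m1, m2)
```

In this example 0.18 of the product mass is conflicting, and after renormalization the singleton {s1} holds 0.42 / 0.82 of the belief.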
Download: Thesis
|
| 13 |
Audio Watermarking, Steganalysis Using Audio Quality Metrics, and Robust Audio Hashing
by Hamza ÖZER
from Boğaziçi University in 2005
Details |
Supervisor(s): Prof. Dr. Bülent Sankur, Prof. Dr. Emin Anarım, Prof. Dr. Nasir Memon
Keywords(s): Audio Watermarking, Steganalysis, Audio Hashing
Abstract: We propose a technique for the problem of detecting the very presence of hidden
messages in an audio object. The detector is based on the characteristics of the denoised
residuals of the audio file. Our proposition is established upon the presupposition that
the hidden message in a cover object leaves statistical evidence that can be detected
with the use of some audio distortion measures. The distortions caused by hidden
message are measured in terms of objective and perceptual quality metrics. The detector
discriminates between cover and stego files using a selected subset of features and an
SVM classifier. We have evaluated the detection performance of the proposed
steganalysis technique with the well-known watermarking and steganographic methods.
We present novel and robust audio fingerprinting techniques based on the
summarization of the time-frequency spectral characteristics of an audio object. The
perceptual hash functions are based on the periodicity series of the fundamental and on
the singular-value description of the cepstral frequencies. The proposed hash functions
are found, on the one hand, to perform very satisfactorily in identification and
verification tests, and on the other hand, to be very resilient to a large variety of attacks.
Moreover, we address the issue of hash security and propose a keying technique,
yielding key-dependent hashing.
We also present a non-oblivious, extremely robust watermarking scheme for audio
signals. The watermarking algorithm is based on the SVD of the spectrogram of the
signal. Thus the SVD of the spectrogram is modified according to the watermarking
bits. The algorithm is tested for inaudibility with audio quality measures
and for robustness with the audio StirMark benchmark tool, which applies a variety of
common signal processing distortions. The mean bit error rate is 0.629 percent.
Download: Thesis
|
| 14 |
IMAGE QUALITY STATISTICS AND THEIR USE IN STEGANALYSIS AND COMPRESSION
by Ismail Avcibas
from Bogazici University in 2001
Details |
Supervisor(s): Bulent Sankur
Keywords(s): Image Quality Measures, Analysis of Variance, Steganalysis, Image Coding
Abstract: We comprehensively categorize image quality measures, extend measures defined for gray scale images to their multispectral case, and propose novel image quality measures. The statistical behavior of the measures and their sensitivity to various kinds of distortions, data hiding and coding artifacts are investigated via Analysis of Variance techniques. Their similarities and differences are illustrated by plotting their Kohonen maps. Measures that give consistent scores across an image class and that are sensitive to distortions and coding artifacts are pointed out.
We present techniques for steganalysis of images that have been potentially subjected to watermarking or steganographic algorithms. Our hypothesis is that watermarking and steganographic schemes leave statistical evidence that can be exploited for detection with the aid of image quality features and multivariate regression analysis. The steganalyzer is built using multivariate regression on the selected quality metrics. In the absence of ground truth, a common reference image is obtained based on blurring. Simulation results with the chosen feature set and well-known watermarking and steganographic techniques indicate that our approach can distinguish between marked and unmarked images with reasonable accuracy.
We also present a technique that provides progressive transmission and near-lossless compression in one single framework. The proposed technique produces a bitstream that results in progressive reconstruction of the image just like what one can obtain with a reversible wavelet codec. In addition, the proposed scheme provides near-lossless reconstruction with respect to a given bound after each layer of the successively refinable bitstream is decoded. Experimental results for both lossless and near-lossless cases are presented, which are competitive with the state-of-the-art compression schemes.
Download: Thesis
|
| 15 |
Cross-Lingual Voice Conversion
by Oytun Türk
from Boğaziçi University Electrical and Electronics Engineering in 2007
Details |
Supervisor(s): Levent Arslan
Keywords(s): voice conversion, cross lingual
Abstract: Cross-lingual voice conversion refers to the automatic transformation of a
source speaker’s voice to a target speaker’s voice in a language that the target speaker
cannot speak. It involves a set of statistical analysis, pattern recognition, machine
learning, and signal processing techniques. This study focuses on the problems related
to cross-lingual voice conversion by discussing open research questions, presenting
new methods, and performing comparisons with the state-of-the-art techniques. In the
training stage, a Phonetic Hidden Markov Model based automatic segmentation and
alignment method is developed for cross-lingual applications which support text-independent
and text-dependent modes. The vocal tract transformation function is
estimated in more detail using weighted speech frame mapping. By adjusting the weights,
similarity to the target voice and output quality can be balanced depending on the
requirements of the cross-lingual voice conversion application. A context-matching
algorithm is developed to reduce the one-to-many mapping problems and enable non-parallel
training. Another set of improvements is proposed for prosody transformation
including stylistic modeling and transformation of pitch and the speaking rate. A high
quality cross-lingual voice conversion database is designed for the evaluation of the
proposed methods. The database consists of recordings from bilingual speakers of
American English and Turkish. It is employed in objective and subjective evaluations,
and in case studies for testing new ideas in cross-lingual voice conversion.
Download: Thesis
|
| 16 |
VOICE TRANSFORMATION AND DEVELOPMENT OF RELATED SPEECH ANALYSIS TOOLS FOR TURKISH
by Ozgul SALOR
from METU / Electrical and Electronics Engineering in 2005
Details |
Supervisor(s): Mubeccel Demirekler
Keywords(s): voice transformation, phonetic aligner, phoneme recognizer, phonetic alphabet, speech corpus
Abstract: In this dissertation, new approaches in the design of a voice transformation (VT) system for Turkish are proposed. The objectives of this thesis are twofold. The first objective is to develop standard speech corpora and segmentation tools for Turkish speech research. The second objective is to consider new approaches for VT.
A triphone-balanced set of 2462 Turkish sentences is prepared for analysis. An audio corpus of 100 speakers, each uttering 40 sentences out of the 2462-sentence set, is used to train a speech recognition system designed for English.
This system is ported to Turkish to obtain a phonetic aligner and a phoneme recognizer. The triphone-balanced sentence set and the phonetic aligner are used to develop a speech corpus for VT.
A new voice transformation approach based on the Mixed Excitation Linear Prediction (MELP) speech coding framework is proposed. Multi-stage vector quantization of MELP is used to obtain speaker-specific line-spectral frequency (LSF) codebooks for source and target speakers. Histograms mapping the LSF spaces of source and target speakers are used for transformation in the baseline system. The baseline system is improved by a dynamic programming approach to estimate the target LSFs. As a second approach to the VT problem, quantization of the LSFs using the k-means clustering algorithm is applied together with dimension reduction of the LSFs using principal component analysis. This approach provides speaker-specific codebooks out of the speech corpus instead of using MELP's pre-trained LSF codebook. Evaluations show that both dimension reduction and dynamic programming improve the transformation performance.
Download: Thesis
|
| 17 |
Multiple Objective Optimization for Video Streaming
by Tanir Ozcelebi
from Koc University in 2007
Details |
Supervisor(s): Prof. Murat Tekalp
Keywords(s): cross-layer optimization, video streaming, multiple objective optimization, quality of service
Abstract: In this thesis, we propose Multiple Objective Optimization
(MOO) frameworks for efficient video streaming. Firstly, we introduce
pre-roll delay-distortion optimization (DDO) for uninterrupted
content-adaptive video streaming over low capacity, constant bitrate
(CBR) channels using MOO. Content analysis is used to divide the input
video into shots with assigned relevance levels. The video is adaptively
encoded and streamed, aiming at minimum pre-roll delay and distortion with
the optimal spatial and temporal resolutions and quantization parameters
for each shot. With buffer and distortion constraints, the bitrate of
unimportant shots is reduced to achieve an acceptable quality in
important shots. Secondly, we introduce a cross-layer optimized video
rate adaptation and scheduling scheme to achieve maximum "application
layer" Quality-of-Service (QoS), maximum video throughput (video seconds
per transmission slot), and QoS fairness for wireless video streaming.
Using the MOO framework, these objectives are jointly optimized such
that the user with i) the least remaining playback time, ii) highest
available video throughput and iii) maximum video quality is served.
Finally, we propose an adaptive framework for compression and streaming
of stereo video using the existing network infrastructure. We employ
content-adaptive stereo video coding (CA-SC), where additional
compression is achieved by spatial and/or temporal downsampling
depending on the content. An end-to-end streaming system where the
end-users can view the video in mono or stereo mode depending on their
display capabilities is implemented and MOO formulations are proposed.
The improvements achieved are demonstrated with experimental results.
Download: Thesis
|
| 18 |
A comparative analysis of different approaches to target differentiation and localization using infrared sensors
by Tayfun Aytaç
from Bilkent University, Department of Electrical and Electronics Engineering in 2006
Details |
Supervisor(s): Prof. Billur Barshan
Keywords(s): infrared sensors, optical sensing, target differentiation, target localization, surface recognition, position estimation, feature extraction, statistical pattern recognition, artificial neural networks
Abstract: This study compares the performances of various techniques for the differentiation and localization of commonly encountered features in indoor environments, such as planes, corners, edges, and cylinders, possibly with different surface properties, using simple infrared sensors. The intensity measurements obtained from such sensors are highly dependent on the location, geometry, and surface properties of the reflecting feature in a way that cannot be represented by a simple analytical relationship, thereby complicating the localization and differentiation process. The techniques considered include rule-based, template-based, and neural network-based target differentiation, parametric surface differentiation, and statistical pattern recognition techniques such as parametric density estimation, various linear and quadratic classifiers, mixture of normals, kernel estimator, k-nearest neighbor, artificial neural network, and support vector machine classifiers. The geometrical properties of the targets are more distinctive than their surface properties, and surface recognition is the limiting factor in differentiation. The mixture of normals classifier with three components correctly differentiates three types of geometries with different surface properties, resulting in the best performance (100%) in geometry differentiation. For a set of six surfaces, we get a correct differentiation rate of 100% in parametric differentiation based on reflection modeling. The results demonstrate that simple infrared sensors, when coupled with appropriate processing, can be used to extract substantially more information than such devices are commonly employed for. The demonstrated system would find application in intelligent autonomous systems such as mobile robots whose task involves surveying an unknown environment made of different geometry and surface types. Industrial applications where different materials/surfaces must be identified and separated may also benefit from this approach.
Download: Thesis
|
| 19 |
Improved State Estimation for Jump Markov Linear Systems
by Umut Orguner
from Middle East Technical University / Department of Electrical and Electronics Engineering in 2006
Details |
Supervisor(s): Prof. Mübeccel Demirekler
Keywords(s): Multiple model, state estimation, jump Markov linear system, transition probability, Markov chain, interacting multiple model, IMM, risk sensitive
Abstract: This thesis presents a comprehensive example framework on how current multiple model state estimation algorithms for jump Markov linear systems can be improved. The possible improvements are categorized as:
-Design of multiple model state estimation algorithms using new criteria.
-Improvements obtained using existing multiple model state estimation algorithms.
In the first category, risk-sensitive estimation is proposed for jump Markov linear systems. Two types of cost functions, namely the instantaneous and cumulative cost functions related to risk-sensitive estimation, are examined, and for each one the corresponding multiple model state estimation algorithm is derived. For the cumulative cost function, the derivation involves the reference probability method, where one defines and uses a new probability measure under which the involved processes have independence properties. The performance of the proposed risk-sensitive filters is illustrated and compared with conventional algorithms using simulations.
The thesis addresses the second category of improvements by proposing:
-Two new online transition probability estimation schemes for jump Markov linear systems.
-A mixed multiple model state estimation scheme which combines desirable properties of two different multiple model state estimation methods.
The two proposed online transition probability estimators use the recursive Kullback-Leibler (RKL) procedure and the maximum likelihood (ML) criterion to derive the corresponding identification schemes. When used in state estimation, these methods result in an average decrease in the root mean square (RMS) state estimation error, which is shown using simulation studies.
The mixed multiple model estimation procedure, which utilizes the analysis of the single Gaussian approximation of Gaussian mixtures in Bayesian filtering, efficiently combines the IMM (Interacting Multiple Model) filter and the GPB2 (2nd Order Generalized Pseudo Bayesian) filter. The resulting algorithm reaches the performance of GPB2 with fewer Kalman filters.
Download: Thesis
|
|
|