Onur Güngör

Part-time faculty at Bogazici University & Data Science Manager at Udemy

Boğaziçi University

Welcome

I received my PhD at Boğaziçi University Computer Engineering Department. My research focuses on named entity recognition for morphologically rich languages [ 1, 2, 3], but I also write papers about explaining NLP predictions [ 4], compiling interesting corpora [ 5], and correcting annoying spelling errors [ 6].

I also work as a data science manager at Udemy developing systems that solve business problems using natural language processing methods. For details of my industrial experience, please refer to my LinkedIn profile.

Interests

Named entity recognition
Morphologically rich languages
Morphological disambiguation
Large language models

Education

PhD in Computer Eng., 2021

Boğaziçi University
MS in Computer Eng., 2009

Boğaziçi University
BS in Computer Eng., 2006

Boğaziçi University

News

I started giving a graduate level lecture called Transformer-based Pre-training Methods in Natural Language Processing at Bogazici University Computer Engineering in Fall 22/23.
I defended my PhD thesis on 19 February 2021 and awarded with the PhD title!
We recieved a lot of media attention about our latest online experiment ( BOUN Newsletter, Milliyet Pazar)!
We are running an experiment aiming to compare human and machine attention when solvind spelling errors related to ``-de/-da’’ clitics in Turkish. Turkish speaking people are invited to participate, and try out our error correction model with new sentences.

Featured Publications

Onur Güngör, Tunga Güngör, Suzan Uskudarli

January 2019 Natural Language Engineering

The effect of morphology in named entity recognition with sequence tagging

This work proposes a sequential tagger for named entity recognition in morphologically rich languages. Several schemes for representing the morphological analysis of a word in the context of named entity recognition are examined. Word representations are formed by concatenating word and character embeddings with the morphological embeddings based on these schemes. The impact of these representations is measured by training and evaluating a sequential tagger composed of a conditional random field layer on top of a bidirectional long short-term memory layer. Experiments with Turkish, Czech, Hungarian, Finnish and Spanish produce the state-of-the-art results for all these languages, indicating that the representation of morphological information improves performance.

PDF Publisher's site Code

Onur Gungor, Suzan Uskudarli, Tunga Gungor

August 2018 Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)

Improving Named Entity Recognition by Jointly Learning to Disambiguate Morphological Tags

Previous studies have shown that linguistic features of a word such as possession, genitive or other grammatical cases can be employed in word representations of a named entity recognition (NER) tagger to improve the performance for morphologically rich languages. However, these taggers require external morphological disambiguation (MD) tools to function which are hard to obtain or non-existent for many languages. In this work, we propose a model which alleviates the need for such disambiguators by jointly learning NER and MD taggers in languages for which one can provide a list of candidate morphological analyses. We show that this can be done independent of the morphological annotation schemes, which differ among languages. Our experiments employing three different model architectures that join these two tasks show that joint learning improves NER performance. Furthermore, the morphological disambiguator’s performance is shown to be competitive.

PDF ACL Anthology Code

Onur Güngör, Eray Yıldız, Suzan Uskudarli, Tunga Güngör

June 2017 arXiv preprint arXiv:1706.00506

Morphological embeddings for named entity recognition in morphologically rich languages

In this work, we present new state-of-the-art results of 93.59,% and 79.59,% for Turkish and Czech named entity recognition based on the model of (Lample et al., 2016). We contribute by proposing several schemes for representing the morphological analysis of a word in the context of named entity recognition. We show that a concatenation of this representation with the word and character embeddings improves the performance. The effect of these representation schemes on the tagging performance is also investigated.

PDF arXiv page

Onur Güngör, Eray Yıldız

May 2017 2017 25th Signal Processing and Communications Applications Conference (SIU)

Linguistic features in Turkish word representations

Distributed word representations which are learned using unsupervised methods are employed in many Natural Language Processing (NLP) tasks. They have led to state-of-the-art results in many NLP tasks for many languages. There have been studies reporting that word representations include morphological and semantical information. There are also work that aim to propose word representations which handle the morphological and syntactical information better. However, studies that evaluate the quality of the word representations for morphologically rich languages like Turkish are limited. In this study, we aim to explore the syntactic and morphological information captured by the Turkish word representations which are learned using skip-gram method on a large corpus. To assess the quality of information found in relations between Turkish word embeddings, analogical reasoning task is performed using couples consisting of root words and their inflected or derivative forms. We contribute with detailed experiments and show that word embeddings trained with skip-gram method have differing capabilities in capturing information for inflection and derivation groups in Turkish. We make the test sets and word embeddings publicly available to other researchers for further research.

PDF Publisher's site Code

Onur Güngör, Tunga Güngör

March 2010 International Conference on Intelligent Text Processing and Computational Linguistics

Morphological annotation of a corpus with a collaborative multiplayer game

In most of the natural language processing tasks, state-of-the-art systems usually rely on machine learning methods for building their mathematical models. Given that the majority of these systems employ supervised learning strategies, a corpus that is annotated for the problem area is essential. The current method for annotating a corpus is to hire several experts and make them annotate the corpus manually or by using a helper software. However, this method is costly and time-consuming. In this paper, we propose a novel method that aims to solve these problems. By employing a multiplayer collaborative game that is playable by ordinary people on the Internet, it seems possible to direct the covert labour force so that people can contribute by just playing a fun game. Through a game site which incorporates some functionality inherited from social networking sites, people are motivated to contribute to the annotation process by answering questions about the underlying morphological features of a target word. The experiments show that the 63.5% of the actual question types are successful based on a two-phase evaluation.

Recent Publications

Quickly discover relevant content by filtering publications.

Onur Güngör

March 2021 Bogazici University PhD Thesis

Neural Named Entity Recognition for Morphologically Rich Languages

Named entity recognition (NER) is an important task in natural language processing (NLP). Until the revival of neural network based models for NLP, NER taggers employed traditional machine learning approaches or finite-state transducers to detect the entities in a given sentence. Neural models improved the state-of-the-art performance with sequence-based models and word embeddings. These approaches neglect the morphological information embedded in the surface forms of the words. In this thesis, we introduce two NER taggers that utilize such information, which we show to be significant for morphologically rich languages. Using these taggers, we improve the state-of-the-art performance levels for Turkish, Czech, Hungarian, Finnish, and Spanish. The ablation studies show that these improvements result from the inclusion of morphological information. We also show that it is possible for the neural network to also learn how to disambiguate morphological analyses, thereby, eliminating the dependence on external morphological disambiguators that are not always available. In the second part of this thesis, we propose a model agnostic approach for explaining any sequence-based NLP task by extending a well-known feature-attribution method. We assess the plausibility of the explanations for our NER tagger for Turkish and Finnish through several novel experiments.

PDF

Onur Güngör, Tunga Güngör, Suzan Uskudarli

December 2020 PLOS ONE

EXSEQREG: Explaining sequence-based NLP tasks with regions with a case study using morphological features for named entity recognition

The state-of-the-art systems for most natural language engineering tasks employ machine learning methods. Despite the improved performances of these systems, there is a lack of established methods for assessing the quality of their predictions. This work introduces a method for explaining the predictions of any sequence-based natural language processing (NLP) task implemented with any model, neural or non-neural. Our method named EXSEQREG introduces the concept of region that links the prediction and features that are potentially important for the model. A region is a list of positions in the input sentence associated with a single prediction. Many NLP tasks are compatible with the proposed explanation method as regions can be formed according to the nature of the task. The method models the prediction probability differences that are induced by careful removal of features used by the model. The output of the method is a list of importance values. Each value signifies the impact of the corresponding feature on the prediction. The proposed method is demonstrated with a neural network based named entity recognition (NER) tagger using Turkish and Finnish datasets. A qualitative analysis of the explanations is presented. The results are validated with a procedure based on the mutual information score of each feature. We show that this method produces reasonable explanations and may be used for i) assessing the degree of the contribution of features regarding a specific prediction of the model, ii) exploring the features that played a significant role for a trained model when analyzed across the corpus.

PDF DOI Publisher's site Code

See all publications

Onur Güngör

Part-time faculty at Bogazici University & Data Science Manager at Udemy

Boğaziçi University

Welcome

Interests

Education

News

Recent Posts

26 Sheep and 10 Goats Question and Chatgpt

Is Programming Dead?

Featured Publications

The effect of morphology in named entity recognition with sequence tagging

Improving Named Entity Recognition by Jointly Learning to Disambiguate Morphological Tags

Morphological embeddings for named entity recognition in morphologically rich languages

Linguistic features in Turkish word representations

Morphological annotation of a corpus with a collaborative multiplayer game

Recent Publications

Neural Named Entity Recognition for Morphologically Rich Languages

EXSEQREG: Explaining sequence-based NLP tasks with regions with a case study using morphological features for named entity recognition

Contact