Onur Güngör

PhD Student

Boğaziçi University

Welcome

I am a PhD student in the Computer Engineering Department at Boğaziçi University. My research focuses on named entity recognition for morphologically rich languages [1, 2, 3], but I also write papers about explaining NLP predictions [submitted], compiling interesting corpora [4], and correcting annoying spelling errors [5].

I also work as a senior data scientist at sahibinden.com, developing systems that solve business problems using machine learning methods. For details of my industry experience, please refer to my LinkedIn profile.

Interests

  • Named entity recognition
  • Morphologically rich languages
  • Morphological disambiguation
  • Games with a purpose

Education

  • PhD in Computer Eng., 2020

    Boğaziçi University

  • MS in Computer Eng., 2009

    Boğaziçi University

  • BS in Computer Eng., 2006

    Boğaziçi University

News

  • We received a lot of media attention for our latest online experiment (BOUN Newsletter, Milliyet Pazar)!
  • We are running an experiment that compares human and machine attention when solving spelling errors related to the “-de/-da” clitics in Turkish. Turkish-speaking people are invited to participate and to try out our error correction model with new sentences.

Recent Publications

Detecting Clitics Related Orthographic Errors in Turkish

For the spell correction task, vocabulary-based methods have been replaced with methods that take morphological and grammar rules into account. However, such tools are fairly immature, and, worse, non-existent for many low-resource languages. Checking only if a word is well-formed with respect to the morphological rules of a language may produce false negatives due to the ambiguity resulting from the presence of numerous homophonic words. In this work, we propose an approach to detect and correct the “de/da” clitic errors in Turkish text. Our model is a neural sequence tagger trained with a synthetically constructed dataset consisting of positive and negative samples. The model’s performance with this dataset is presented according to different word embedding configurations. The model achieved an F1 score of 86.67% on a synthetically constructed dataset. We also evaluated the model on a manually curated dataset of challenging samples, where it proved superior to other spelling correctors, reaching 71% accuracy compared to 34% for the second-best (Google Docs).
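
As a rough illustration of what a synthetically constructed dataset of positive and negative samples could look like, the sketch below corrupts a correctly spelled sentence in two ways: attaching a standalone "de/da" clitic to the preceding word, and splitting a word-final "-de/-da" off as a separate word. This is not the procedure from the paper. The function name `make_samples`, the 0/1 token labels, and the purely surface-level matching are assumptions made for illustration; a realistic pipeline would need morphological analysis, since many words end in "de" or "da" without carrying the suffix.

```python
# Hypothetical sketch: names and label scheme are illustrative assumptions,
# not the dataset construction used in the paper.
CLITICS = {"de", "da"}

def make_samples(tokens):
    """Return (tokens, labels) pairs: the original sentence plus corrupted copies.

    Labels are per token: 0 = left as written, 1 = injected spelling error.
    """
    tokens = list(tokens)
    samples = [(list(tokens), [0] * len(tokens))]  # positive (unmodified) sample
    for i, tok in enumerate(tokens):
        if tok in CLITICS and i > 0:
            # Error type 1: standalone clitic wrongly attached to the previous word.
            merged = tokens[:i - 1] + [tokens[i - 1] + tok] + tokens[i + 1:]
            labels = [0] * len(merged)
            labels[i - 1] = 1
            samples.append((merged, labels))
        elif len(tok) > 3 and tok[-2:] in CLITICS:
            # Error type 2: word-final -de/-da wrongly written as a separate word.
            # Surface heuristic only; it does not check that this is a real suffix.
            split = tokens[:i] + [tok[:-2], tok[-2:]] + tokens[i + 1:]
            labels = [0] * len(split)
            labels[i + 1] = 1
            samples.append((split, labels))
    return samples

if __name__ == "__main__":
    for sentence, labels in make_samples(["Ben", "de", "evde", "kaldım"]):
        print(sentence, labels)
```

Each resulting pair could then serve as one training example for a sequence tagger, with the label sequence as the tagging target.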

The effect of morphology in named entity recognition with sequence tagging

This work proposes a sequential tagger for named entity recognition in morphologically rich languages. Several schemes for representing the morphological analysis of a word in the context of named entity recognition are examined. Word representations are formed by concatenating word and character embeddings with the morphological embeddings based on these schemes. The impact of these representations is measured by training and evaluating a sequential tagger composed of a conditional random field layer on top of a bidirectional long short-term memory layer. Experiments with Turkish, Czech, Hungarian, Finnish and Spanish produce state-of-the-art results for all of these languages, indicating that the representation of morphological information improves performance.