Detecting Subjectivity in The News Texts in Turkish

Summary:

Like many language technologies, subjectivity and sentiment analysis has gained increasing attention in recent years. Its aim is to investigate and develop techniques for recognizing subjectivity or sentiment in human-generated content such as text, speech, or images. While subjectivity and sentiment detection are closely related tasks, subjectivity detection is relatively understudied and needs more attention, as it is a challenging problem even for humans. Various linguistic properties are used to capture subjectivity clues in text, and machine learning methods are applied to predict the subjectivity of an unseen piece of text. In this respect, the subjectivity detection problem can be reduced to a text classification problem: a set of texts evaluated for predefined clues of subjectivity is input to a learning module, which then predicts whether a given unseen piece of text is subjective or objective. In this work, we study subjectivity detection in news items using machine learning methods and develop a framework that operates at the document level. We assume that the descriptive features of expressions are good candidates for capturing the subjective tone of texts and, based on this premise, propose a novel feature set for subjectivity classification. We implement a supervised scheme and extensively evaluate it on a dataset that we collected and annotated. Our findings present new directions and useful contributions to the subjectivity detection literature. We introduce the first subjectivity detection system for the Turkish language, present our new annotated dataset, and report high accuracy in subjectivity detection.
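To make the reduction to text classification concrete, the following is a minimal sketch of a supervised, document-level classifier. It is not the feature set or learning method used in this work: the clue lexicon, the two features, the nearest-centroid rule, and the English example sentences are all hypothetical and chosen only to keep the illustration short and self-contained.

```python
import math
from collections import Counter

# Hypothetical clue lexicon, for illustration only (not the feature set of this work).
SUBJECTIVE_CLUES = {"terrible", "wonderful", "shocking", "sadly", "brilliant", "outrageous"}


def extract_features(text):
    """Map a document to a small vector of predefined subjectivity clues."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    clue_hits = sum(1 for t in tokens if t.strip(".,!?") in SUBJECTIVE_CLUES)
    exclamations = text.count("!")
    # Both features are normalized by document length.
    return [clue_hits / n, exclamations / n]


def train_centroids(docs, labels):
    """Learning module: average the feature vectors of each class."""
    sums, counts = {}, Counter()
    for doc, label in zip(docs, labels):
        feats = extract_features(doc)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, text):
    """Assign the class whose centroid is closest to the document's features."""
    feats = extract_features(text)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Tiny hand-made training set (hypothetical examples, not the annotated dataset).
train_docs = [
    "The council approved the budget on Monday.",
    "The minister met the delegation in Ankara.",
    "What a shocking and outrageous decision!",
    "Sadly, the brilliant plan met a terrible end.",
]
train_labels = ["objective", "objective", "subjective", "subjective"]

centroids = train_centroids(train_docs, train_labels)
print(predict(centroids, "A wonderful and brilliant performance!"))
print(predict(centroids, "The bank announced new interest rates."))
```

In a realistic setting the feature extractor would use language-specific linguistic properties and the classifier would be a standard supervised learner trained on the annotated news dataset; only the overall shape (features in, subjective/objective label out) carries over from this sketch.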

Abstract (Özet):

Subjectivity and sentiment analysis has become a research area attracting considerable interest in recent years. The field mainly develops and investigates methods for determining whether content such as text, speech, or images exhibits properties such as subjectivity or sentiment. As their names suggest, subjectivity and sentiment analysis are closely related, but subjectivity analysis has received comparatively less attention and, being difficult even for humans, needs further study. Subjectivity detection can be divided into two subproblems: first, extracting subjectivity features, and second, predicting the subjectivity of a given new text. For the first problem, linguistic properties are among the main resources. For the prediction of subjectivity, machine learning methods are mostly used. In this respect, the subjectivity detection problem can be reduced to a kind of text classification problem. In this work, we studied subjectivity detection in news texts using machine learning methods and developed an application that operates at the document level. Under the assumption that descriptive elements in texts can be a good feature for capturing a subjective tone, we defined a new feature set for subjectivity classification. Using supervised learning algorithms, we tested and evaluated our method on a dataset that we collected and labeled ourselves. We found that our method and experimental findings are useful enough to contribute to the field of subjectivity detection; we also observed that the performance values in the experiments were not low. In conclusion, with this work we present the first subjectivity classification system built for Turkish, together with a new labeled dataset.
