Using Titles vs. Full-text as Source for Automated Semantic Document Annotation
A substantial portion of today's largest Knowledge Graph, the Linked Open Data (LOD) cloud, consists of metadata describing documents such as scientific publications, news reports, and media articles. Although broad access to this metadata is an important advance, annotating documents with semantic concepts and organizing them along those concepts remains limited in several respects. Traditional semantic annotation methods, such as assigning concepts from vocabularies modeled in the SKOS standard, typically require analysis of the documents' full text.
This study presents the first systematic evaluation of multiple classification approaches for semantic annotation based solely on metadata, specifically document titles, which are published as labels in the LOD cloud. We compare the results of classifying documents by their titles with those of classifying them by their full text. Besides established methods such as kNN and SVM, the study investigates more recent techniques, including Learning to Rank and neural networks, and revisits foundational models such as logistic regression, Rocchio, and Naive Bayes.
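To illustrate what title-only annotation involves, here is a minimal sketch of one of the foundational models named above, a Rocchio (nearest-centroid) classifier over TF-IDF vectors of titles, using only the Python standard library. It is not the study's implementation. The whitespace tokenizer, the smoothed IDF formula, top-k concept selection, the class and function names, and the toy titles and concept labels are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase + whitespace split only.
    return text.lower().split()


def normalize(vec):
    # L2-normalize a sparse vector stored as {term: weight}.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def cosine(a, b):
    # Inputs are L2-normalized, so the dot product equals cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class RocchioTitleClassifier:
    """Rocchio / nearest-centroid ranking of concepts for a document title."""

    def fit(self, titles, label_sets):
        docs = [tokenize(t) for t in titles]
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        # Smoothed IDF (an assumption; many variants exist).
        self.idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
        sums, counts = defaultdict(Counter), Counter()
        for doc, labels in zip(docs, label_sets):
            vec = self._vectorize(doc)
            for label in labels:
                sums[label].update(vec)  # accumulate TF-IDF weights per concept
                counts[label] += 1
        # Centroid = mean vector of a concept's training titles, then L2-normalized.
        self.centroids = {
            label: normalize({t: w / counts[label] for t, w in s.items()})
            for label, s in sums.items()
        }
        return self

    def _vectorize(self, tokens):
        # Terms never seen in training have no IDF and are dropped.
        tf = Counter(tok for tok in tokens if tok in self.idf)
        return normalize({t: c * self.idf[t] for t, c in tf.items()})

    def predict(self, title, k=1):
        # Rank all concepts by similarity to the title and return the top k.
        vec = self._vectorize(tokenize(title))
        ranked = sorted(self.centroids,
                        key=lambda label: cosine(vec, self.centroids[label]),
                        reverse=True)
        return ranked[:k]


# Made-up toy data for illustration only; not taken from the study's datasets.
titles = [
    "monetary policy and inflation targeting",
    "inflation expectations and central bank policy",
    "labour market effects of minimum wages",
    "unemployment and wages in the labour market",
]
labels = [{"monetary_policy"}, {"monetary_policy"},
          {"labour_market"}, {"labour_market"}]

clf = RocchioTitleClassifier().fit(titles, labels)
print(clf.predict("central bank inflation policy"))  # ['monetary_policy']
```

The same TF-IDF title vectors could feed the other baselines the abstract lists (a kNN vote, logistic regression, and so on); only the scoring step would change.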
Experimental results show that, in three out of four datasets, title-only classification achieves more than 90% of the quality of full-text classification. This finding demonstrates the viability of fully title-based automatic semantic annotation and opens up promising opportunities for expanding and enriching Knowledge Graphs.
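One way to read "more than 90% of the quality of full-text classification" is as the ratio between the title-only score and the full-text score on the same dataset. The sketch below assumes sample-averaged F1 as the quality measure. The abstract does not name the metric at this point, so treat that choice, the function names, and the label sets as hypothetical.

```python
def sample_f1(gold, predicted):
    """Sample-averaged F1: compute F1 per document, then average over documents.

    gold and predicted are parallel lists of label sets (one set per document).
    """
    scores = []
    for g, p in zip(gold, predicted):
        if not g and not p:
            scores.append(1.0)  # convention: nothing expected, nothing predicted
            continue
        tp = len(g & p)
        # F1 = 2TP / (2TP + FP + FN) = 2TP / (|gold| + |predicted|)
        scores.append(2 * tp / (len(g) + len(p)))
    return sum(scores) / len(scores)


def relative_quality(title_score, fulltext_score):
    # Share of full-text quality retained when only titles are used.
    return title_score / fulltext_score


# Hypothetical gold labels and predictions for four documents (illustrative only).
gold = [{"a", "b"}, {"c"}, {"a"}, {"d"}]
fulltext_pred = [{"a", "b"}, {"c"}, {"a"}, {"e"}]
title_pred = [{"a"}, {"c"}, {"a"}, {"e"}]

f_full = sample_f1(gold, fulltext_pred)   # (1 + 1 + 1 + 0) / 4 = 0.75
f_title = sample_f1(gold, title_pred)     # (2/3 + 1 + 1 + 0) / 4, about 0.667
print(round(relative_quality(f_title, f_full), 3))  # 0.889
```

In this made-up case the title-only classifier retains about 89% of the full-text score, just under the 90% level reported for three of the four datasets.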