Using Titles vs. Full-text as Source for Automated Semantic Document Annotation

Lukas Galke ZBW-Leibniz Information Centre for Economics, Kiel l.galke@zbw.eu

Category: Information Technology

Field: Information Technology

Language: English

Published at: https://arxiv.org/pdf/1705.05311

A substantial portion of today’s largest Knowledge Graph, the Linked Open Data (LOD) cloud, consists of metadata describing documents such as scientific publications, news reports, and media articles. Although broad access to this metadata is an important advance, assigning semantic annotations and organizing documents by semantic concepts remains limited in several respects. Traditional semantic annotation methods, such as concept assignment based on the SKOS standard, typically require full-text analysis of the documents.

This study presents the first systematic evaluation of multiple classification approaches to test whether semantic annotation is feasible using only metadata, specifically the document titles that are published as labels within the LOD cloud. Classification results obtained from titles are compared with those derived from full-text analysis. In addition to traditional methods such as kNN and SVM, the study investigates modern techniques including Learning to Rank and neural networks, and revisits foundational models such as logistic regression, Rocchio, and Naive Bayes.
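To make the setting concrete, the following is a minimal sketch of one of the foundational models named above, a Rocchio (nearest-centroid) classifier, applied to titles only. It is not the study's implementation. The toy titles, the concept names, the plain term-count features, and the similarity threshold of 0.3 are all assumptions chosen for illustration; the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Lowercased bag-of-words term counts, L2-normalized (assumed features)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / norm for term, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_rocchio(texts, label_sets):
    """Build one centroid per concept: the mean vector of its training texts."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for text, labels in zip(texts, label_sets):
        vec = vectorize(text)
        for label in labels:
            sizes[label] += 1
            for term, weight in vec.items():
                sums[label][term] += weight
    return {
        label: {term: w / sizes[label] for term, w in terms.items()}
        for label, terms in sums.items()
    }


def predict(centroids, text, threshold=0.3):
    """Multi-label prediction: every concept whose centroid is similar enough."""
    vec = vectorize(text)
    return {
        label
        for label, centroid in centroids.items()
        if cosine(vec, centroid) >= threshold
    }


# Hypothetical training data: titles with invented concept annotations.
titles = [
    "monetary policy and inflation",
    "inflation targeting by central banks",
    "labour market unemployment dynamics",
    "unemployment and wage growth",
]
concepts = [
    {"inflation"},
    {"inflation"},
    {"unemployment"},
    {"unemployment", "wages"},
]

centroids = train_rocchio(titles, concepts)
print(predict(centroids, "inflation and monetary policy"))
```

The same functions could be run on full-text strings instead of titles, which is the kind of side-by-side comparison the abstract describes. A real setup would likely use TF-IDF weighting and a controlled vocabulary of concepts rather than these toy labels.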

Experimental results show that, in three out of four datasets, title-only classification achieves more than 90% of the quality of full-text classification. This finding demonstrates the viability of fully title-based automatic semantic annotation and opens up promising opportunities for expanding and enriching Knowledge Graphs.
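The "more than 90% of the quality" statement is a ratio between two scores. The sketch below shows how such a ratio could be computed. It assumes a sample-averaged F1 as the quality measure, which the text above does not specify, and the gold annotations and predictions are invented purely for illustration; they are not results from the study.

```python
def sample_f1(true_sets, pred_sets):
    """Average of per-document F1 scores over sets of assigned concepts."""
    scores = []
    for true, pred in zip(true_sets, pred_sets):
        if not true and not pred:
            scores.append(1.0)  # nothing to find, nothing predicted
            continue
        hits = len(true & pred)
        if hits == 0:
            scores.append(0.0)
            continue
        precision = hits / len(pred)
        recall = hits / len(true)
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


def relative_quality(title_score, fulltext_score):
    """Title-only score expressed as a fraction of the full-text score."""
    if fulltext_score == 0:
        raise ValueError("full-text score must be non-zero")
    return title_score / fulltext_score


# Invented example annotations and predictions (not data from the study).
gold = [{"inflation"}, {"unemployment", "wages"}]
title_pred = [{"inflation"}, {"unemployment"}]
fulltext_pred = [{"inflation"}, {"unemployment", "wages"}]

title_f1 = sample_f1(gold, title_pred)
fulltext_f1 = sample_f1(gold, fulltext_pred)
print(f"title-only reaches {relative_quality(title_f1, fulltext_f1):.0%} of full-text")
```

Under this reading, a dataset counts toward the "three out of four" when `relative_quality` exceeds 0.9 for it.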

Other documents on the same topic

Current & Emerging Computing Technology

Don Bentley

Category: Information Technology

Embedded Controllers Using C and Arduino / 2E

James Fiore

Category: Open Books

Real-time prediction of the week-ahead flood index using hybrid deep learning algorithms with synoptic climate mode indices

A.A. Masrur Ahmed, Shahida Akther, Thong Nguyen-Huy, Nawin Raj, S. Janifer Jabin Jui, S.Z. Farzana

Category: Information Technology

Spatiotemporal performance evaluation of high-resolution multiple satellite and reanalysis precipitation products over the semiarid region of India

Elangovan Devadarshini, Kulanthaivelu Bhuvaneswari, Shanmugam Mohan Kumar, Vellingiri Geethalakshmi, Manickam Dhasarathan, Alagarsamy Senthil, Kandasam Senthilraja, Shahbaz Mushtaq, Thong Nguyen-Huy, ThanhMai, Louis Kouadio

Category: Information Technology

Forecasting Multi-Step Soil Moisture with Three-Phase Hybrid Wavelet-Least Absolute Shrinkage Selection Operator-Long Short-Term Memory Network (moDWT-Lasso-LSTM) Model

W. J. M. Lakmini Prarthana, Jayasinghe, Ravinesh C. Deo, Nawin Raj, Sujan Ghimire, Zaher Mundher Yaseen, Thong Nguyen-Huy, Afshin Ghahramani

Category: Information Technology

The Leaf Essential Oils of Syzygium oblatum and Syzygium abortivum: Chemical Composition, Antimicrobial Activity, and Molecular Docking Study

Do Ngoc Dai, Le Thi Huong, Pham Thi Nhu Quynh, Nguyen Thi Le Quyen, Nguyen Ngoc Linh, Phi Thi Tuyet Nhung, Nguyen Xuan Ha, Ninh The Son

Category: Information Technology

High security and privacy protection model for STI/HIV risk prediction

Zhaohui Tang, Thi Phuoc Van Nguyen, Wencheng Yang, Xiaoyu Xia, Huaming Chen, Amy B. Mullens, Judith A. Dean, Sonya R Osborne, Yan Li

Category: International Publications

Deep Learning-Assisted Sensitive 3C-SiC Sensor for Long-Term Monitoring of Physical Respiration

Thi Lap Tran, Duy Van Nguyen, Hung Nguyen, Thi Phuoc Van Nguyen, Pingan Song, Ravinesh C Deo, Clint Moloney, Viet Dung Dao, Nam-Trung Nguyen, Toan Dinh

Category: Information Technology