Using Titles vs. Full-text as Source for Automated Semantic Document Annotation

Author: Lukas Galke, ZBW-Leibniz Information Centre for Economics, Kiel, l.galke@zbw.eu
Category: Information Technology
Language: English
Published at: https://arxiv.org/pdf/1705.05311

A substantial portion of today’s largest Knowledge Graph, the Linked Open Data (LOD) cloud, consists of metadata describing documents such as scientific publications, news reports, and media articles. Although broad access to this metadata is an important advance, assigning semantic annotations to documents and organizing them by semantic concepts remains difficult. Established semantic annotation methods, such as assigning concepts from SKOS thesauri, typically rely on analysis of the documents' full text.
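Assigning thesaurus concepts to a document can be framed as multi-label classification: each document receives a set of concept identifiers, which most classifiers consume as a binary indicator vector over the concept vocabulary. The minimal sketch below assumes a tiny hypothetical vocabulary; identifiers such as `c:inflation` are placeholders rather than real SKOS concept URIs, and the `binarize` helper is illustrative, not part of the study.

```python
from typing import Dict, List, Sequence, Set

def binarize(label_sets: Sequence[Set[str]], concepts: Sequence[str]) -> List[List[int]]:
    """Turn each document's set of concept IDs into a 0/1 indicator row over `concepts`."""
    index: Dict[str, int] = {concept: i for i, concept in enumerate(concepts)}
    rows: List[List[int]] = []
    for labels in label_sets:
        row = [0] * len(concepts)
        for label in labels:
            row[index[label]] = 1  # raises KeyError for a concept outside the vocabulary
        rows.append(row)
    return rows

# Hypothetical concept IDs standing in for entries of a SKOS thesaurus.
CONCEPTS = ["c:inflation", "c:labour-market", "c:monetary-policy"]
annotations = [{"c:inflation", "c:monetary-policy"}, {"c:labour-market"}]
print(binarize(annotations, CONCEPTS))  # [[1, 0, 1], [0, 1, 0]]
```

Keeping the concept order fixed means that classifiers trained on titles and on full texts produce outputs indexed against the same vocabulary, so their predictions can be compared directly.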

This study presents the first systematic evaluation of multiple classification approaches to examine how far semantic annotation can be performed using only metadata, specifically the document titles that are published as labels in the LOD cloud. We compare the classification results obtained from titles with those derived from full-text analysis. In addition to the established text classification baselines kNN and SVM, the study investigates recent techniques, namely Learning to Rank and neural networks, and revisits classic methods: logistic regression, Rocchio, and Naive Bayes.
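Whichever classifier is used, the title-versus-full-text comparison amounts to feeding the same learner two different text fields. As one illustration, the sketch below implements a simple multi-label kNN annotator using bag-of-words vectors and cosine similarity: a concept is assigned when at least a fraction `min_votes` of the `k` nearest training documents carry it. The helper names, the voting threshold, and the toy titles are hypothetical, and this voting scheme is not necessarily the kNN configuration evaluated in the study.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

def vectorize(text: str) -> Counter:
    """Lower-cased bag of words; a real system would add stemming, stop-word removal, and TF-IDF weighting."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_annotate(query: str, train: List[Tuple[str, Set[str]]],
                 k: int = 3, min_votes: float = 0.5) -> Set[str]:
    """Assign each concept carried by at least a fraction `min_votes` of the k most similar training documents."""
    q = vectorize(query)
    neighbours = sorted(train, key=lambda doc: cosine(q, vectorize(doc[0])), reverse=True)[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {label for label, n in votes.items() if n / len(neighbours) >= min_votes}

# Hypothetical training titles with hypothetical concept IDs.
train = [
    ("Inflation expectations and monetary policy", {"c:inflation", "c:monetary-policy"}),
    ("Central bank monetary policy rules", {"c:monetary-policy"}),
    ("Minimum wage effects on the labour market", {"c:labour-market"}),
    ("Unemployment and labour market reforms", {"c:labour-market"}),
]
print(sorted(knn_annotate("Monetary policy and inflation targeting", train, k=2)))
# ['c:inflation', 'c:monetary-policy']
```

Switching the input source only means passing full texts instead of titles, both in the training pairs and as the query; the annotator itself stays unchanged.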

Experimental results show that on three of the four datasets, title-only classification achieves more than 90% of the quality of full-text classification. This finding demonstrates the viability of purely title-based automatic semantic annotation and opens up promising opportunities for expanding and enriching Knowledge Graphs.
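The "more than 90% of the quality" figure is a ratio of two scores measured on the same test documents. The sketch below assumes a sample-based F1 score (F1 computed per document, then averaged) as the quality measure and uses invented gold and predicted concept sets; the numbers illustrate the arithmetic only and are not results from the study. The function names are hypothetical.

```python
from typing import List, Set

def sample_f1(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Per-document F1 between gold and predicted concept sets, averaged over documents."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need one prediction per gold set, and at least one document")
    total = 0.0
    for g, p in zip(gold, predicted):
        if not g and not p:
            total += 1.0  # assumed convention: predicting nothing for an unlabelled document is perfect
            continue
        tp = len(g & p)  # correctly assigned concepts
        total += 2 * tp / (len(g) + len(p))  # F1 = 2TP / (2TP + FP + FN)
    return total / len(gold)

def relative_quality(title_score: float, fulltext_score: float) -> float:
    """Fraction of the full-text score that the title-only classifier reaches."""
    if fulltext_score <= 0:
        raise ValueError("full-text score must be positive")
    return title_score / fulltext_score

# Invented toy data: two test documents with their gold concept sets.
gold = [{"a", "b"}, {"c"}]
title_pred = [{"a"}, {"c"}]          # title-only classifier misses concept "b"
fulltext_pred = [{"a", "b"}, {"c"}]  # full-text classifier gets both documents right

ratio = relative_quality(sample_f1(gold, title_pred), sample_f1(gold, fulltext_pred))
print(round(ratio, 3), ratio >= 0.9)  # 0.833 False
```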



Other documents on the same topic

Current & Emerging Computing Technology

Don Bentley
Category: Open Books

Embedded Controllers Using C and Arduino / 2E

James Fiore
Category: Information Technology

Real-time prediction of the week-ahead flood index using hybrid deep learning algorithms with synoptic climate mode indices

A.A. Masrur Ahmed, Shahida Akther, Thong Nguyen-Huy, Nawin Raj, S. Janifer Jabin Jui, S.Z. Farzana
Category: Information Technology

Spatiotemporal performance evaluation of high-resolution multiple satellite and reanalysis precipitation products over the semiarid region of India

Elangovan Devadarshini, Kulanthaivelu Bhuvaneswari, Shanmugam Mohan Kumar, Vellingiri Geethalakshmi, Manickam Dhasarathan, Alagarsamy Senthil, Kandasam Senthilraja, Shahbaz Mushtaq, Thong Nguyen‑Huy, ThanhMai, Louis Kouadio
Category: Information Technology

Forecasting Multi-Step Soil Moisture with Three-Phase Hybrid Wavelet-Least Absolute Shrinkage Selection Operator-Long Short-Term Memory Network (moDWT-Lasso-LSTM) Model

W. J. M. Lakmini Prarthana, Jayasinghe, Ravinesh C. Deo, Nawin Raj, Sujan Ghimire, Zaher Mundher Yaseen, Thong Nguyen-Huy, Afshin Ghahramani
Category: Information Technology

The Leaf Essential Oils of Syzygium oblatum and Syzygium abortivum: Chemical Composition, Antimicrobial Activity, and Molecular Docking Study

Do Ngoc Dai, Le Thi Huong, Pham Thi Nhu Quynh, Nguyen Thi Le Quyen, Nguyen Ngoc Linh, Phi Thi Tuyet Nhung, Nguyen Xuan Ha, Ninh The Son
Category: International Publications

High security and privacy protection model for STI/HIV risk prediction

Zhaohui Tang, Thi Phuoc Van Nguyen, Wencheng Yang, Xiaoyu Xia, Huaming Chen, Amy B. Mullens, Judith A. Dean, Sonya R Osborne, Yan Li
Category: Information Technology

Deep Learning-Assisted Sensitive 3C-SiC Sensor for Long-Term Monitoring of Physical Respiration

Thi Lap Tran, Duy Van Nguyen, Hung Nguyen, Thi Phuoc Van Nguyen, Pingan Song, Ravinesh C Deo, Clint Moloney, Viet Dung Dao, Nam-Trung Nguyen, Toan Dinh
Category: Information Technology

Pyrolysis of wheat straw pellets in a pilot-scale reactor: Effect of temperature and residence time

Bidhan Nath, Guangnan Chen, Les Bowtell, Nguyen Huy Thong
Category: Information Technology

Pyrolytic Pathway of Wheat Straw Pellet by the Thermogravimetric Analyzer

Bidhan Nath, Les Bowtell, Guangnan Chen, Elizabeth Graham, Nguyen Huy Thong
Category: International Publications

Copula-Probabilistic Flood Risk Analysis with an Hourly Flood Monitoring Index

Ravinesh Chand, Nguyen Huy Thong, Ravinesh C. Deo, Sujan Ghimire, Mumtaz Ali, Afshin Ghahramani
Category: Information Technology

Detection of Vulnerabilities and Application of Methods for Ensuring Website Security (Обнаружение уязвимостей и применение методов обеспечения безопасности веб-сайта)

Nguyen Phuc Hau, Le Duc Huy, Nguyen Thuy Trang, R.S. Zaripova
Category: Information Technology