Russian Digital Libraries Journal

Published since 1998

ISSN 1562-5419

Search Results

Title extraction from English scientific books in PDF format

Дмитрий Сергеевич Филиппов

392-411

Abstract:

The issue is relevant because the methods proposed by other researchers, which rely on simple heuristics or machine learning algorithms, are weak. The purpose of the article is to provide a better way to extract titles from scientific PDF documents and to offer a sounder approach to title selection in general. The study's main approach is to consider as many of the cases and problems that arise during extraction as possible and to find a way to solve all of them. The results show that the chosen approach is effective on a document set that exhibits all of the problems considered. The research highlights that a deep analysis of the problems in the task at hand is a promising way to build the best solutions and tools. The article may be useful to researchers and developers who often encounter document structural analysis or title detection as a secondary task in a larger program workflow.

Keywords: PDF processing, title extraction, header extraction, strategy-based approach, title heuristic, structural analysis, style information, text analysis, document analysis, information extraction.
