• Main Navigation
  • Main Content
  • Sidebar

Russian Digital Libraries Journal

  • Home
  • About
    • About the Journal
    • Aims and Scopes
    • Themes
    • Editor-in-Chief
    • Editorial Team
    • Submissions
    • Open Access Statement
    • Privacy Statement
    • Contact
  • Current
  • Archives
  • Register
  • Login
  • Search
Published since 1998
ISSN 1562-5419
16+
Language
  • Русский
  • English

Search

Advanced filters

Search Results

Procedure for Comparing Text Recognition Software Solutions For Scientific Publications by the Quality of Metadata Extraction

Ilia Igorevich Kuznetsov , Oleg Panteleevich Novikov, Dmitry Yurievich ILIN
654-680
Abstract:

Metadata of scientific publications are used to build catalogs, determine the citation of publications, and perform other tasks. Automation of metadata extraction from PDF files provides means to speed up the execution of the designated tasks, while the possibility of further use of the obtained data depends on the quality of extraction. Existing software solutions were analyzed, after which three of them were selected: GROBID, CERMINE, ScientificPdfParser. A procedure for comparing software solutions for recognizing texts of scientific publications by the quality of metadata extraction is proposed. Based on the procedure, an experiment was conducted to extract 4 types of metadata (title, abstract, publication date, author names). To compare software solutions, a dataset of 112,457 publications divided into 23 subject areas formed on the basis of Semantic Scholar data was used. An example of choosing an effective software solution for metadata extraction under the conditions of specified priorities for subject areas and types of metadata using a weighted sum is given. It was determined that for the given example CERMINE shows efficiency 10.5% higher than GROBID and 9.6% higher than ScientificPdfParser.

Keywords: text recognition, scientific publications, metadata, data extraction quality, procedure.
1 - 1 of 1 items
Information
  • For Readers
  • For Authors
  • For Librarians
Make a Submission
Current Issue
  • Atom logo
  • RSS2 logo
  • RSS1 logo

Russian Digital Libraries Journal

ISSN 1562-5419

Information

  • About the Journal
  • Aims and Scopes
  • Themes
  • Author Guidelines
  • Submissions
  • Privacy Statement
  • Contact
  • eLIBRARY.RU
  • dblp computer science bibliography

Send a manuscript

Authors need to register with the journal prior to submitting or, if already registered, can simply log in and begin the five-step process.

Make a Submission
About this Publishing System

© 2015-2025 Kazan Federal University; Institute of the Information Society