Reference String Recognition based on Word Sequence Tagging and Post-processing: Evaluation with English and German Datasets

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2018, 23(5), pp.1-7
  • DOI : 10.9708/jksci.2018.23.05.001
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : March 28, 2018
  • Accepted : May 16, 2018
  • Published : May 31, 2018

In-Su Kang 1

1 Kyungsung University

Accredited

ABSTRACT

Reference string recognition is the task of extracting individual reference strings from the reference section of an academic article, which consists of a sequence of reference lines. The task has been addressed with heuristic-based, clustering-based, and classification-based approaches that exploit lexical and layout characteristics of reference lines. Most classification-based methods use sequence labeling to assign labels either to a sequence of tokens within reference lines or to a sequence of reference lines. Unlike the previous token-level sequence labeling approach, this study assigns different labels to the beginning, intermediate, and terminating tokens of a reference string. Post-processing is then applied to identify reference strings from their predicted beginning and/or terminating tokens. Experimental evaluation on English and German reference string recognition datasets shows that the proposed method achieves a macro-averaged F1 above 94%.
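The post-processing idea in the abstract can be illustrated with a minimal sketch. It assumes the tagger has already produced one label per token and that the labels are named "B" (beginning), "I" (intermediate), and "E" (terminating). The label names, the function `split_references`, and the exact splitting rule are hypothetical choices made for this illustration, not details taken from the paper.

```python
from typing import List, Optional


def split_references(tokens: List[str], labels: List[str]) -> List[str]:
    """Group tokens into reference strings from B/I/E token labels.

    Assumed rule (illustrative only): a new reference string starts at a
    token labeled "B", or at the token that follows one labeled "E".
    Either cue alone is enough, so one missed label can be recovered.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    references: List[List[str]] = []
    current: List[str] = []
    prev_label: Optional[str] = None

    for token, label in zip(tokens, labels):
        starts_new = label == "B" or prev_label == "E"
        if starts_new and current:
            # Close the reference collected so far before starting a new one.
            references.append(current)
            current = []
        current.append(token)
        prev_label = label

    if current:
        references.append(current)
    return [" ".join(ref) for ref in references]


# Made-up example: the tagger missed the "B" of the second reference,
# but the "E" that ends the first reference still marks the boundary.
tokens = ["Kang,", "I.", "2018.", "Smith,", "J.", "2017."]
labels = ["B", "I", "E", "I", "I", "E"]
print(split_references(tokens, labels))
# ['Kang, I. 2018.', 'Smith, J. 2017.']
```

Using both the beginning and the terminating cue is what lets such a scheme tolerate a single mislabeled boundary token, which a begin-only tagging scheme cannot do.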

This paper was written with support from the National Research Foundation of Korea.