본문 바로가기
  • Home

A Study on Collecting and Structuring Language Resource for Named Entity Recognition and Relation Extraction from Biomedical Abstracts

Seul-ki Kang 1 최윤수 2 Sung-Pil Choi 1

1경기대학교
2한국과학기술정보연구원

Accredited

ABSTRACT

This paper introduces an integrated model for systematically constructing a linguistic resource database that can be used by machine learning-based biomedical information extraction systems. The proposed method suggests an orderly process of collecting and constructing dictionaries and training sets for both named-entity recognition and relation extraction. Multiple heterogeneous structures for the resources which are collected from diverse sources are analyzed to derive essential items and fields for constructing the integrated database. All the collected resources are converted and refined to build an integrated linguistic resource storage. In this paper, we constructed entity dictionaries of gene, protein, disease and drug, which are considered core linguistic elements or core named entities in the biomedical domains and conducted verification tests to measure their acceptability.

Citation status

* References for papers published after 2023 are currently being built.