@article{ART001741556,
author={이연수 and 남성운 and 윤대현},
title={A study on the enhanced filtering method of the deduplication for bulk harvest of web records},
journal={The Korean Journal of Archival Studies},
issn={1229-7941},
year={2013},
number={35},
pages={133--160}
}
TY - JOUR
AU - 이연수
AU - 남성운
AU - 윤대현
TI - A study on the enhanced filtering method of the deduplication for bulk harvest of web records
JO - The Korean Journal of Archival Studies
PY - 2013
IS - 35
PB - Korean Society Of Archival Studies
SP - 133
EP - 160
SN - 1229-7941
AB - With the rapid development of networks and electronic devices, the web exerts a growing influence on daily life. Information created on the web plays an increasingly essential role as a record of each era, so there is strong demand for a standardized method of archiving it. One such method is the snapshot strategy, which crawls web content periodically using automated software. This strategy has two problems: it can harvest identical, duplicate content, and it can crawl meaningless or useless content because of the complex technologies used on the web. In this paper, we categorize the problems that can arise when crawling web content with the snapshot strategy and, drawing on crawls of web content from public institutions, present possible technical solutions.
KW - web records;web contents;web crawling;deduplication
ER -
이연수, 남성운 and 윤대현. (2013). A study on the enhanced filtering method of the deduplication for bulk harvest of web records. The Korean Journal of Archival Studies, 35, 133-160.
이연수, 남성운 and 윤대현. 2013, "A study on the enhanced filtering method of the deduplication for bulk harvest of web records", The Korean Journal of Archival Studies, no.35, pp.133-160.
이연수, 남성운, 윤대현 "A study on the enhanced filtering method of the deduplication for bulk harvest of web records" The Korean Journal of Archival Studies 35 pp.133-160 (2013) : 133.
이연수, 남성운, 윤대현. A study on the enhanced filtering method of the deduplication for bulk harvest of web records. 2013; 35 : 133-160.
이연수, 남성운 and 윤대현. "A study on the enhanced filtering method of the deduplication for bulk harvest of web records" The Korean Journal of Archival Studies no.35(2013) : 133-160.
이연수; 남성운; 윤대현. A study on the enhanced filtering method of the deduplication for bulk harvest of web records. The Korean Journal of Archival Studies, 35, 133-160.
이연수; 남성운; 윤대현. A study on the enhanced filtering method of the deduplication for bulk harvest of web records. The Korean Journal of Archival Studies. 2013; 35 133-160.