Performance Comparison of Logistic Regression Algorithms on RHadoop

  • Journal of The Korea Society of Computer and Information
  • Abbr : JKSCI
  • 2017, 22(4), pp.9-16
  • DOI : 10.9708/jksci.2017.22.04.009
  • Publisher : The Korean Society Of Computer And Information
  • Research Area : Engineering > Computer Science
  • Received : January 18, 2017
  • Accepted : April 13, 2017
  • Published : April 28, 2017

Byungho Jung 1, Lim Dong Hoon 1

1 Gyeongsang National University

Accredited

ABSTRACT

Machine learning is widely implemented and applied across many domains of everyday life. Logistic regression is a classification method in machine learning that is widely used in many fields, including medicine, economics, marketing, and the social sciences. In this paper, we present MapReduce implementations of three existing algorithms for logistic regression, namely the Gradient Descent, Cost Minimization, and Newton-Raphson algorithms, on RHadoop, which integrates R with the Hadoop environment and is applicable to large-scale data. We compare the performance of these algorithms in estimating logistic regression coefficients on real and simulated data sets. We also compare the performance of the RHadoop and RHIPE platforms. The experiments showed that the Newton-Raphson algorithm performed better than the Gradient Descent and Cost Minimization algorithms on all data tested. They also showed that RHadoop performed better than RHIPE on the real data, while the opposite held on the simulated data.
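To illustrate the kind of computation the abstract describes, the following is a minimal single-machine sketch in Python, not the authors' RHadoop code. It assumes a model with an intercept and one predictor, and it imitates MapReduce by having a map step compute partial gradient and Hessian sums per data chunk and a reduce step add them. The toy data and the names `CHUNKS`, `map_chunk`, `reduce_parts`, `aggregate`, `newton_raphson`, and `gradient_descent` are hypothetical. The Cost Minimization algorithm is omitted because the abstract does not define it.

```python
import math
from functools import reduce

# Hypothetical toy data (x, y), split into two chunks that stand in for data blocks.
CHUNKS = [
    [(-2.0, 0), (-1.0, 0), (0.0, 1), (1.0, 0), (2.0, 1)],
    [(-1.0, 1), (0.0, 0), (1.0, 1), (2.0, 1), (-2.0, 0)],
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_chunk(chunk, beta):
    """Map step: partial sums for one chunk.

    Returns (g0, g1, h00, h01, h11), where g is the log-likelihood gradient
    X'(y - p) and h holds the entries of X'WX with W = diag(p * (1 - p)).
    """
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in chunk:
        p = sigmoid(beta[0] + beta[1] * x)
        r = y - p
        w = p * (1.0 - p)
        g0 += r
        g1 += r * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1, h00, h01, h11)


def reduce_parts(a, b):
    """Reduce step: element-wise sum of two partial results."""
    return tuple(u + v for u, v in zip(a, b))


def aggregate(chunks, beta):
    """One simulated MapReduce pass over all chunks."""
    return reduce(reduce_parts, (map_chunk(c, beta) for c in chunks))


def newton_raphson(chunks, iters=25, tol=1e-10):
    """Newton-Raphson update: beta <- beta + (X'WX)^{-1} X'(y - p)."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        g0, g1, h00, h01, h11 = aggregate(chunks, beta)
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 linear system with the explicit inverse.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        beta = [beta[0] + d0, beta[1] + d1]
        if abs(d0) + abs(d1) < tol:
            break
    return beta


def gradient_descent(chunks, lr=0.1, iters=5000):
    """Gradient ascent on the log-likelihood (descent on its negative)."""
    beta = [0.0, 0.0]
    n = sum(len(c) for c in chunks)
    for _ in range(iters):
        g0, g1 = aggregate(chunks, beta)[:2]
        beta = [beta[0] + lr * g0 / n, beta[1] + lr * g1 / n]
    return beta


if __name__ == "__main__":
    print("Newton-Raphson:  ", newton_raphson(CHUNKS))
    print("Gradient descent:", gradient_descent(CHUNKS))
```

In this sketch both methods make one full pass over the data per iteration, so the number of iterations approximates the number of MapReduce jobs each would launch. On this toy data Newton-Raphson stops after a few passes, whereas gradient descent with the assumed learning rate runs for thousands. This is only an illustration and does not reproduce the paper's measurements.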
