TY - JOUR
T1 - A survey of open source data science tools
AU - Barlas, Panagiotis
AU - Lanning, Ivor
AU - Heavey, Cathal
N1 - Publisher Copyright: © Emerald Group Publishing Limited.
PY - 2015/8/10
Y1 - 2015/8/10
N2 - Purpose – Data science is the study of the generalizable extraction of knowledge from data. It includes a variety of components and builds on methods and concepts from many domains, including mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization and data warehousing, with the aim of extracting value from data. The purpose of this paper is to provide an overview of open source (OS) data science tools, proposing a classification scheme that can be used to study OS data science software. Design/methodology/approach – The proposed classification scheme is based on general characteristics, project activity, operational characteristics and data mining characteristics. The authors then use the proposed scheme to examine 70 identified OS software tools. From this the authors provide insight into the current status of OS data science tools and identify the state-of-the-art tools. Findings – The features of the 70 OS tools are recorded against the criteria of the four groups of characteristics: general characteristics, project activity, operational characteristics and data mining characteristics. Interesting results came from the analysis of these features and are recorded here. Originality/value – The contribution of this survey is the development of a new classification scheme for the examination and study of OS data science tools. In parallel, this study provides an overview of existing OS data science tools.
AB - Purpose – Data science is the study of the generalizable extraction of knowledge from data. It includes a variety of components and builds on methods and concepts from many domains, including mathematics, probability models, machine learning, statistical learning, computer programming, data engineering, pattern recognition and learning, visualization and data warehousing, with the aim of extracting value from data. The purpose of this paper is to provide an overview of open source (OS) data science tools, proposing a classification scheme that can be used to study OS data science software. Design/methodology/approach – The proposed classification scheme is based on general characteristics, project activity, operational characteristics and data mining characteristics. The authors then use the proposed scheme to examine 70 identified OS software tools. From this the authors provide insight into the current status of OS data science tools and identify the state-of-the-art tools. Findings – The features of the 70 OS tools are recorded against the criteria of the four groups of characteristics: general characteristics, project activity, operational characteristics and data mining characteristics. Interesting results came from the analysis of these features and are recorded here. Originality/value – The contribution of this survey is the development of a new classification scheme for the examination and study of OS data science tools. In parallel, this study provides an overview of existing OS data science tools.
KW - Data
KW - Data mining
KW - Data science
KW - Data science tools
KW - Genetic algorithms
KW - Image processing
KW - Information retrieval
KW - Knowledge acquisition
KW - Open source
UR - http://www.scopus.com/inward/record.url?scp=84938228091&partnerID=8YFLogxK
U2 - 10.1108/IJICC-07-2014-0031
DO - 10.1108/IJICC-07-2014-0031
M3 - Article
AN - SCOPUS:84938228091
SN - 1756-378X
VL - 8
SP - 232
EP - 261
JO - International Journal of Intelligent Computing and Cybernetics
JF - International Journal of Intelligent Computing and Cybernetics
IS - 3
ER -