Comparative Study for Text Document Classification Using Different Machine Learning Algorithms

Yin Min Tun, Phyu Hnin Myint


Classification is a supervised learning method whose goal is to assign labels to previously unseen objects. In practice, labeling documents by hand requires a large amount of tedious manual work. A classification system is instead first trained on labeled documents using a supervised machine learning algorithm, and the trained model is then applied to predict the labels of unlabeled documents. The framework for text document classification consists of four stages: text document input, pre-processing, feature extraction, and classification. This paper analyzes four common classification methods for text document classification: Naïve Bayes, Decision Tree, Support Vector Machine, and K-Nearest Neighbors. Its main focus is a comparative study of these existing classification methods. The experiments apply each method to the Enron Email Dataset and measure classification accuracy together with the numbers of true positives, true negatives, false positives, and false negatives in order to compare the performance of the methods.


Classification; Text Mining; Classification Methods; Enron Email Dataset.
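
As an illustration of the four-stage framework and the evaluation measures named in the abstract, the following is a minimal sketch in plain Python. It is not the implementation used in this paper. It assumes a binary labeling with the hypothetical class names "spam" and "ham", uses a handful of invented toy messages rather than the Enron Email Dataset, and shows only one of the four classifiers (a multinomial Naïve Bayes with add-one smoothing). The function names are illustrative.

```python
import math
import re
from collections import Counter, defaultdict


def preprocess(text):
    """Pre-processing: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(docs, labels):
    """Feature extraction (bag-of-words counts) plus Naive Bayes training."""
    class_docs = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)    # per-class token frequencies
    for text, label in zip(docs, labels):
        word_counts[label].update(preprocess(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def predict(model, text):
    """Classification: return the class with the highest log-probability."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in preprocess(text):
            if token in vocab:  # tokens unseen in training are ignored
                # Add-one (Laplace) smoothed likelihood.
                likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for a binary task."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


# Toy data invented for illustration only (assumption: binary spam/ham labels).
train_docs = ["win cash prize now", "cheap prize offer now",
              "meeting agenda for monday", "project report attached"]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = ["claim your cash prize", "monday project meeting", "cheap monday offer"]
test_labels = ["spam", "ham", "ham"]

model = train_naive_bayes(train_docs, train_labels)
predictions = [predict(model, doc) for doc in test_docs]
tp, tn, fp, fn = confusion_counts(test_labels, predictions, positive="spam")
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(predictions, tp, tn, fp, fn, round(accuracy, 3))
```

Reporting the four counts alongside accuracy matters because two classifiers with the same accuracy can make very different kinds of errors, for example one producing mostly false positives and the other mostly false negatives.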

