Analyzing the Trade-offs between Runtime and Accuracy in Classification Algorithms for Natural Language Processing

Authors

  • Bundit Anuyahong Educational Researcher and Assistant Professor Dr., Scarborough Street, Southport, Gold Coast, Queensland, 4215, Australia
  • Thanuttchayanin Doksroifa Business English Department, Faculty of Business Administration, Rajamangala University of Technology Rattanakosin, Wang Klai Kangwon Campus, Thailand
  • Urai Makkana Business English Department, Faculty of Business Administration, Rajamangala University of Technology Rattanakosin, Wang Klai Kangwon Campus, Thailand

Keywords

Trade-offs, Runtime, Accuracy, Classification algorithms, Natural language processing

Abstract

This research analyzes the trade-offs between runtime and accuracy in classification algorithms for Natural Language Processing (NLP) and proposes an optimization framework for balancing them. The study takes a quantitative approach and evaluates the performance of different classification algorithms using precision, recall, F1-score, and area under the curve (AUC). The study population is all publicly available datasets for NLP classification, and the data were collected using open-source NLP tools. The results show that certain classification algorithms, such as Random Forest, Decision Trees, Naive Bayes, SVM, or Neural Networks, perform better than others in terms of both runtime and accuracy. The trade-off is not uniform, however: some algorithms are faster but less accurate, while others are slower but more accurate. The analysis shows how the choice of algorithm affects this trade-off in NLP. Based on the results, the study proposes an optimization framework that helps NLP researchers and practitioners choose an algorithm for a given task and dataset according to the desired balance between runtime and accuracy.
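The kind of comparison the abstract describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' framework: the helper names (`precision_recall_f1`, `timed_f1`, `pareto_front`, `pick_within_budget`) are hypothetical, the metrics are restricted to the binary case, and the selection rule (keep non-dominated candidates, then take the most accurate one within a runtime budget) is one plausible way to balance runtime against accuracy.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def timed_f1(predict: Callable[[str], int], texts: Sequence[str], y_true: Sequence[int]) -> Tuple[float, float]:
    """Return (prediction time in seconds, F1) for one classifier's predict function."""
    start = time.perf_counter()
    y_pred = [predict(text) for text in texts]
    elapsed = time.perf_counter() - start
    return elapsed, precision_recall_f1(y_true, y_pred)[2]


def pareto_front(results: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of candidates that no other candidate beats on both runtime and F1."""
    front = []
    for name, (runtime, f1) in results.items():
        dominated = any(
            o_rt <= runtime and o_f1 >= f1 and (o_rt < runtime or o_f1 > f1)
            for other, (o_rt, o_f1) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


def pick_within_budget(results: Dict[str, Tuple[float, float]], max_runtime: float) -> Optional[str]:
    """Most accurate candidate whose runtime fits the budget, or None if none fits."""
    feasible = {n: v for n, v in results.items() if v[0] <= max_runtime}
    if not feasible:
        return None
    return max(feasible, key=lambda n: feasible[n][1])


# Placeholder (runtime_seconds, F1) pairs, invented for illustration only.
# They are NOT measurements from the study.
example_results = {
    "model_a": (0.5, 0.80),
    "model_b": (5.0, 0.90),
    "model_c": (6.0, 0.85),
}
```

With these placeholder numbers, `pareto_front(example_results)` keeps `model_a` and `model_b` and drops `model_c`, which is both slower and less accurate than `model_b`. Calling `pick_within_budget(example_results, max_runtime=1.0)` then selects `model_a`, while a looser budget of 10 seconds selects `model_b`.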

Published

2023-04-27

How to Cite

Anuyahong, Bundit, Thanuttchayanin Doksroifa, & Urai Makkana. (2023). Analyzing the Trade-offs between Runtime and Accuracy in Classification Algorithms for Natural Language Processing. International Journal of Computer (IJC), 48(1), 17–25. Retrieved from https://ijcjournal.org/index.php/InternationalJournalOfComputer/article/view/2066
