Reframing in Clustering: An Introductory Survey
Keywords:
Reframing, Clustering, Classification, Data Mining, Machine Learning.
Abstract
Reframing is an essential task for improving the performance of machine learning and data mining algorithms in settings where the context changes between the source and target domains. Many reframing algorithms assume that the target domain has some labelled data. However, in many real-world applications this assumption does not hold. For example, we may have a clustering task in one domain of interest but sufficient data only in another domain, where the latter data may lie in a different feature space or follow a different data distribution. Moreover, both source and target data may be unlabelled. In such cases, reframing in clustering, if done successfully, would greatly improve clustering performance while avoiding expensive data labelling effort. In recent years, reframing in clustering has emerged as a new clustering framework to address this problem. In this paper, we review state-of-the-art approaches to reframing in clustering; to the best of our knowledge, no such review exists in the literature. We give a definition of reframing in clustering and explore some potential future issues in this area of research.