Reframing in Clustering: An Introductory Survey
Reframing is an essential task for improving the performance of machine learning and data mining algorithms when the context changes between the source and target domains. Many reframing algorithms assume that the target domain has some labelled data. However, in many real-world applications this assumption does not hold. For example, we may have a clustering task in one domain of interest but sufficient source data only in another domain, where the latter data may lie in a different feature space or follow a different data distribution. Moreover, both source and target data may be unlabelled. In such cases, reframing in clustering, if done successfully, would greatly improve clustering performance while avoiding expensive data-labelling effort. In recent years, reframing in clustering has emerged as a new clustering framework to address this problem. In this paper, we review state-of-the-art approaches to reframing in clustering; to the best of our knowledge, no such review exists in the literature. We give a definition of reframing in clustering and discuss open issues for future research in this area.