The Number of Groups in an Aggregated Approach in Taxonomy with the Use of Stability Measures and Classical Indices – A Comparative Analysis

Dorota Rozmus

doi:10.18778/0208-6018.357.04

Author

Dorota Rozmus University of Economics in Katowice, Faculty of Finance, Department of Economic and Financial Analysis, Katowice, Poland https://orcid.org/0000-0002-0565-5319

Keywords:

taxonomy, clustering, aggregated approach, stability of taxonomic methods

Abstract

Two concepts are frequently discussed in the current literature on taxonomy: the aggregated approach and the stability of clustering methods. Until now, they have been considered separately. An interesting proposal for combining them was made by Y. Șenbabaoğlu, G. Michailidis and J.Z. Li, who suggested an aggregated approach in taxonomy combined with a stability measure of their own design as the criterion for choosing the optimal number of groups (k).

The aim of the article is to compare the values of the parameter k selected by this stability measure with those selected by classical indices (e.g. Caliński-Harabasz, Dunn).
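As a rough illustration of how a classical index is used to pick k, the sketch below computes the Caliński-Harabasz index, CH = [B / (k - 1)] / [W / (n - k)], where B and W are the between-group and within-group sums of squares, for a few candidate partitions and keeps the k with the highest value. The toy points and the hand-made partitions are assumptions made purely for illustration; they are not the data or the procedure used in the article, and the stability measure itself is not implemented here.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: [B / (k - 1)] / [W / (n - k)]."""
    n = len(points)
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    k = len(groups)
    overall = centroid(points)
    # B: size-weighted squared distances of group centroids from the overall centroid
    between = sum(len(members) * dist(centroid(members), overall) ** 2
                  for members in groups.values())
    # W: squared distances of points from their own group centroid
    within = sum(dist(point, centroid(members)) ** 2
                 for members in groups.values() for point in members)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: three tight, well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (0, 10), (0, 11), (1, 10)]

# Hypothetical candidate partitions, one per k (hand-made, not produced
# by any clustering algorithm).
candidates = {
    2: [0, 0, 0, 1, 1, 1, 0, 0, 0],  # first and third group merged
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],  # the three visible groups
    4: [0, 0, 3, 1, 1, 1, 2, 2, 2],  # first group split in two
}

scores = {k: calinski_harabasz(points, labels)
          for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In practice the candidate partitions would come from a clustering algorithm run for each k. The Dunn index is used in the same way, with the highest value indicating the preferred k.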

References

Aldenderfer M.S., Blashfield R.K. (1984), Cluster analysis, Sage, Beverly Hills.

Anderberg M.R. (1973), Cluster analysis for applications, Academic Press, New York–San Francisco–London.

Ben-Hur A., Guyon I. (2003), Detecting stable clusters using principal component analysis, “Methods in Molecular Biology”, no. 224, pp. 159–182.

Brock G., Pihur V., Datta S., Datta S. (2008), clValid: an R package for cluster validation, “Journal of Statistical Software”, vol. 25(4), pp. 1–22, https://doi.org/10.18637/jss.v025.i04

Caliński R.B., Harabasz J. (1974), A dendrite method for cluster analysis, “Communications in Statistics”, vol. 3, pp. 1–27.

Chiu D.S., Talhouk A. (2018), diceR: an R package for class discovery using an ensemble driven approach, “BMC Bioinformatics”, no. 19, 11, https://doi.org/10.1186/s12859-017-1996-y

Davies D.L., Bouldin D.W. (1979), A Cluster Separation Measure, “IEEE Transactions on Pattern Analysis and Machine Intelligence”, vol. 1(2), pp. 224–227.

Dudoit S., Fridlyand J. (2003), Bagging to improve the accuracy of a clustering procedure, “Bioinformatics”, vol. 19(9), pp. 1090–1099.

Dunn J.C. (1974), Well-Separated Clusters and Optimal Fuzzy Partitions, “Journal of Cybernetics”, vol. 4(1), pp. 95–104.

Eurostat (2019), Database, https://ec.europa.eu/eurostat/web/main/data/database (accessed: 20.11.2021).

Everitt B.S., Landau S., Leese M. (2001), Cluster analysis, Edward Arnold, London.

Fang Y., Wang J. (2012), Selection of the number of clusters via the bootstrap method, “Computational Statistics and Data Analysis”, no. 56, pp. 468–477.

Fred A., Jain A.K. (2002), Data clustering using evidence accumulation, “Proceedings of the Sixteenth International Conference on Pattern Recognition”, pp. 276–280.

Gordon A.D. (1987), A review of hierarchical classification, “Journal of the Royal Statistical Society”, ser. A, pp. 119–137.

Gordon A.D. (1996), Hierarchical classification, [in:] P. Arabie, L.J. Hubert, G. de Soete (eds.), Clustering and classification, World Scientific, Singapore, pp. 65–121.

Hennig C. (2007), Cluster-wise assessment of cluster stability, “Computational Statistics and Data Analysis”, no. 52, pp. 258–271.

Hornik K. (2005), A CLUE for CLUster ensembles, “Journal of Statistical Software”, no. 14, pp. 65–72.

Kaufman L., Rousseeuw P.J. (1990), Finding groups in data: an introduction to cluster analysis, Wiley, New York.

Kuncheva L.I., Vetrov D.P. (2006), Evaluation of stability of k-means cluster ensembles with respect to random initialization, “IEEE Transactions on Pattern Analysis & Machine Intelligence”, vol. 28(11), pp. 1798–1808.

Leisch F. (1999), Bagged clustering, “Adaptive Information Systems and Modeling in Economics and Management Science”, Working Papers, SFB, no. 51.

Lord E., Willems M., Lapointe F.J., Makarenkov V. (2017), Using the stability of objects to determine the number of clusters in datasets, “Information Sciences”, no. 393, pp. 29–46.

Marino V., Presti L.L. (2019), Stay in touch! New insights into end-user attitudes towards engagement platforms, “Journal of Consumer Marketing”, no. 36, pp. 772–783.

Monti S., Tamayo P., Mesirov J., Golub T. (2003), Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data, “Machine Learning”, no. 52, pp. 91–118.

Șenbabaoğlu Y., Michailidis G., Li J.Z. (2014), Critical limitations of consensus clustering in class discovery, “Scientific Reports”, no. 4, 6207, https://doi.org/10.1038/srep06207

Shamir O., Tishby N. (2008), Cluster stability for finite samples, “Advances in Neural Information Processing Systems”, no. 20, pp. 1297–1304.

Sokołowski A. (1995), Percentage points of the similarity measure for partitions, “Statistics in Transition”, vol. 2(2), pp. 195–199.

Suzuki R., Shimodaira H. (2006), Pvclust: an R package for assessing the uncertainty in hierarchical clustering, “Bioinformatics”, vol. 22(12), pp. 1540–1542.

Volkovich Z., Barzily Z., Toledano-Kitai D., Avros R. (2010), The Hotteling’s metric as a cluster stability index, “Computer Modelling and New Technologies”, vol. 14(4), pp. 65–72.