Abstract
A randomized algorithm is substantiated for the strongly NP-hard problem of partitioning a finite set of vectors in Euclidean space into two clusters of given sizes under the minimum sum-of-squared-distances criterion. The centroid of one cluster is to be optimized and is defined as the mean of all vectors in that cluster; the centroid of the other cluster is fixed at the origin. For a fixed parameter value, and for given values of the relative error and failure probability, the algorithm finds an approximate solution in time that is linear in the space dimension and in the input size of the problem. Conditions are established under which the algorithm is asymptotically exact and runs in time that is linear in the space dimension and quadratic in the input size of the problem.
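To make the criterion concrete, the following is a minimal sketch of the objective only, not of the randomized algorithm from the paper. It assumes the cost of a partition is the sum of squared distances from the vectors of the first cluster to their mean, plus the sum of squared norms of the remaining vectors (whose centroid is the origin). The names `objective`, `brute_force`, and `points` are hypothetical, and the exhaustive search is an illustrative baseline that is practical only for tiny inputs.

```python
from itertools import combinations


def objective(vectors, cluster):
    """Cost of a two-cluster partition.

    `cluster` holds the indices of the cluster whose centroid is the mean
    of its members; every other vector is charged its squared norm,
    because the second centroid is fixed at the origin.
    """
    cluster = set(cluster)
    dim = len(vectors[0])
    members = [vectors[i] for i in cluster]
    centroid = [sum(v[j] for v in members) / len(members) for j in range(dim)]
    inside = sum(
        sum((v[j] - centroid[j]) ** 2 for j in range(dim)) for v in members
    )
    outside = sum(
        sum(x * x for x in v) for i, v in enumerate(vectors) if i not in cluster
    )
    return inside + outside


def brute_force(vectors, m):
    """Exhaustive baseline (an assumption for illustration, exponential time):
    try every index subset of size m and keep the cheapest one."""
    return min(
        combinations(range(len(vectors)), m),
        key=lambda c: objective(vectors, c),
    )


# Hypothetical toy data: two vectors near the origin, three near (5, 5).
points = [(0.1, 0.0), (-0.1, 0.0), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
best = brute_force(points, 3)
```

On this toy input the cheapest cluster of size 3 consists of the three vectors far from the origin, since assigning any of them to the origin-centered cluster would add its large squared norm to the cost.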
Additional information
Original Russian Text © A.V. Kel’manov, V.I. Khandeev, 2015, published in Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 2015, Vol. 55, No. 2, pp. 335–344.
Cite this article
Kel’manov, A.V., Khandeev, V.I. A randomized algorithm for two-cluster partition of a set of vectors. Comput. Math. and Math. Phys. 55, 330–339 (2015). https://doi.org/10.1134/S096554251502013X