News

This report focuses on how to tune a Spark application to run on a cluster of instances. We define the concepts for the cluster/Spark parameters, and explain how to configure them given a specific set ...
The k-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, are common in many applications. Mainstream approaches to ...
A k-means-type algorithm is proposed for efficiently clustering data constrained to lie on the surface of a p-dimensional unit sphere, or data that are mean-zero-unit-variance standardized ...