This is a fairly direct translation of the ckmeans implementation in Simple Statistics.
The Simple Statistics implementation is based on a version of the algorithm developed by Song, Wang and Zhong in a series of papers and released as an R package.
See also Bill Mill's Python translation.
The command line program cmd/ckmeans/ckmeans.go reads from standard input. It expects input to look like this:
3
1.2
1.3
2.3
3.4
where the 3 on the first line is the number of clusters to create, and the remaining numbers are the array of data.
I haven't added unit tests yet. The compare-to-* directories provide Python and JS scripts with the same interface that should give the same results. The compare-to-simple-statistics directory has a script to create a random dataset.
The Simple Statistics implementation has || 0 in a few places. Bill Mill's Python implementation converts to int in the corresponding places. I don't know why. To add to my confusion, the R implementation has the corresponding lines commented out and uses slightly different code, which may or may not produce different results. I don't know!
As observed by Mill in a code comment, if all the values in the input are equal then the algorithm just returns one cluster.
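To make that edge case concrete, here is a small Go sketch of a check for all-equal input. It is only an illustration of the behaviour described, not how any of the implementations actually arrive at it. The names allEqual and singleClusterIfUniform are hypothetical.

```go
package main

import "fmt"

// allEqual reports whether every value in data is identical.
// An empty or single-element slice counts as all equal.
func allEqual(data []float64) bool {
	for i := 1; i < len(data); i++ {
		if data[i] != data[0] {
			return false
		}
	}
	return true
}

// singleClusterIfUniform is a hypothetical helper. When the input is
// non-empty and uniform it returns one cluster holding all the values,
// regardless of how many clusters were requested, and reports true.
func singleClusterIfUniform(data []float64) ([][]float64, bool) {
	if len(data) > 0 && allEqual(data) {
		return [][]float64{data}, true
	}
	return nil, false
}

func main() {
	clusters, ok := singleClusterIfUniform([]float64{2.5, 2.5, 2.5})
	fmt.Println(clusters, ok)

	clusters, ok = singleClusterIfUniform([]float64{1.2, 1.3, 2.3})
	fmt.Println(clusters, ok)
}
```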