Labs for the Clouds class at Eurecom
-
Python, IPython and Jupyter Notebooks
-
Python + Pandas + Matplotlib: A great environment for Data Science
-
PySpark
-
RDD and DataFrame APIs
-
Analysis of flight data using the DataFrame API and SparkSQL
-
Data exploration with SparkSQL
-
k-means algorithm
-
A simplified analysis of algorithm convergence
-
A technique for smart centroid initialization: k-means++
-
Determining the value of k: a simple, visual approach called the Elbow method
-
Distributed k-means with PySpark
-
SGD algorithm