Software engineer profile recognition through application of data mining techniques on GitHub.

Inglezos/Github-Engineer-Profile-Analysis

Github Engineer Profile Analysis

Software engineer profile recognition through application of data mining techniques on GitHub

This project, part of my semester course "Pattern Recognition" (Winter 2017), analyzes and processes previously mined data from a variety of open-source software repositories and tries to categorize it into groups according to its characteristics and features. More specifically, the goal is to identify the profiles that software engineers (contributors) can have, divide those profiles further into subcategories, and use a set of metrics to extract more detail about the contributors' skills and traits, thereby approximating each contributor's personal repository profile.
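
As a rough illustration of the grouping idea described above, the following is a minimal sketch that clusters synthetic per-contributor metrics with k-means. The column names (`commits`, `issues_closed`), the synthetic data, and the choice of two clusters are assumptions made only for this example; they are not taken from the project's dataset or scripts.

```r
# Minimal sketch: k-means on synthetic contributor metrics.
# Column names and values are hypothetical, not from the project's dataset.
set.seed(42)

contributors <- data.frame(
  commits       = c(rnorm(50, mean = 200, sd = 30), rnorm(50, mean = 40, sd = 10)),
  issues_closed = c(rnorm(50, mean = 10,  sd = 3),  rnorm(50, mean = 60, sd = 12))
)

# Standardize the metrics so that no single one dominates the distance.
scaled <- scale(contributors)

# Fit k-means with two clusters (an assumed value of k) and several restarts.
fit <- kmeans(scaled, centers = 2, nstart = 25)

# Attach the cluster labels and summarize each group by its mean metrics.
contributors$cluster <- factor(fit$cluster)
cluster_means <- aggregate(cbind(commits, issues_closed) ~ cluster,
                           data = contributors, FUN = mean)
print(cluster_means)

# One boxplot per cluster for a single metric.
boxplot(commits ~ cluster, data = contributors,
        main = "Commits per cluster (synthetic data)")
```

Standardizing before clustering is a common choice when metrics have very different ranges; whether the project's own scripts do this is not stated here.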




/**********************************************   INSTRUCTIONS    **********************************************/

  1. Set the current working directory and read the dataset into the R environment.

  2. Open the Inglezos_Charalampos_8145_Pattern_Recognition_Project_Code.R script file.

  3. Execute the commands one by one, following the detailed instructions I have included there as comments and adjusting the commands where necessary.

  4. At some point you will need to execute the included script files:
    i) "impute_NA.R"
    ii) "kmeans_k.R"
    iii) "fanny_k.R"
    iv) "diana_k.R"
    v) "hclust_k.R"
    vi) "pam_k.R"
    vii) "pam_routine.R"
    viii) "pam_dev_routine.R"
    ix) "pam_ops_routine.R"
    x) "pam_devops_routine.R"

  5. Check the other included scripts as well for a complete picture of my work.

  6. Finally, feel free to experiment and examine the boxplots produced by the different clustering models I created, which are described in detail in my assignment report.

  7. For questions or further explanations, please contact me at: [email protected]

  8. Note: Because the .RData files are too large, I have included only the latest version and not the earlier ones.
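
Steps 1 and 4 above can be sketched roughly as follows. This is only an illustration: `dataset.csv` is a hypothetical file name (the actual dataset name and format are not given here), `source_if_present` is a helper invented for this example, and the sketch assumes R is started from the repository root. The main script's comments remain the authority on when each helper script should be run.

```r
# Sketch of steps 1 and 4. Names marked "hypothetical" are assumptions.
project_dir  <- getwd()         # assumption: R was started in the repository root
dataset_file <- "dataset.csv"   # hypothetical file name, not from the repository

# Step 1: set the working directory and read the dataset if it is present.
setwd(project_dir)
if (file.exists(dataset_file)) {
  dataset <- read.csv(dataset_file, stringsAsFactors = FALSE)
} else {
  message("Dataset not found: ", dataset_file)
}

# Step 4: the helper scripts listed in the instructions.
helper_scripts <- c(
  "impute_NA.R", "kmeans_k.R", "fanny_k.R", "diana_k.R", "hclust_k.R",
  "pam_k.R", "pam_routine.R", "pam_dev_routine.R", "pam_ops_routine.R",
  "pam_devops_routine.R"
)

# Report any helper script that is missing from the working directory.
missing_scripts <- helper_scripts[!file.exists(helper_scripts)]
if (length(missing_scripts) > 0) {
  message("Missing scripts: ", paste(missing_scripts, collapse = ", "))
}

# Hypothetical helper: source one script, but only if it exists.
# Returns TRUE when the script was sourced and FALSE otherwise.
source_if_present <- function(script) {
  if (!file.exists(script)) {
    message("Skipping missing script: ", script)
    return(FALSE)
  }
  source(script)
  TRUE
}
```

A helper would then be loaded at the point the main script calls for it, for example `source_if_present("impute_NA.R")`. The scripts are deliberately not sourced all at once, since they may depend on objects created earlier in the main script.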
