ProgressReport

Progress Report

The following is an update on the progress of Obidroid in terms of development

Earlier Feedback

- Looking at the cluster analysis, the question is: 
    - what is special about the apps in cluster 1 and 7?
    - You can look at those and then do something else with the other apps the are lumped together in 2-6. This might be worth digging into more.
- Here is a way to better assess your classification algorithm. Split it into 50/50 positive and negative tests.
    - Based on the clustering results, choose some of the harder to discriminate items to be the positive items you compare against. Then see how well the classification algorithm works on the 50-50 split."

Executive Summary

We proceeded to dig deeper into each feature taken 1 at a time (Univariate Analysis) and taken 2 at a time (Bivariate Analysis)
- Univariate Analysis
  - We present histograms of features grouped on fair and unfair app Labels
- Bivariate Analysis
  - Here we plotted plots for all combinations of features taken 2 at a time
  - 2 kinds of plots are presented :
    - hexbin plots
    - density plots (based on kde)
  - Also instead of scatter plot, we chose to use hexbin plots where the dots are binned as hexagon points.
    - This was because we saw in our scatter plots a lot of points overlapped. And hence the plot was a little deceiving.
      - We could have either added jitter to the scatter plot to see all the points, but that felt like corrupting the data
      - We chose to bin the points with an alpha value in color shade. And hence a darker point represents that there are multiple points in those positions.
Did unsupervised learning (clustering) on the features to understand how they were interacting.
- We had already done k-means clustering previously.
- We proceeded to do MDS clustering in an aim to find how many apps were similar and dissimilar based on our feature extraction.
  - Although we succeeded to plot the apps using matplotlib, it was harder to discern the app names after the plot.
  - Hence we proceeded to develop an interactive D3js based plot, which
    - which can be zoomed in
    - mouseover interaction gives App Names as tooltip.
    - This plotting method is an engine, and can be used for various other plots like PCA, whose code is implemented. But we present here only the MDS output pertaining to our inquiry.
- We also plotted 2-dimensional Dendrogram, but frankly as with most dendrograms, it is harder to draw conclusions.
Did supervised learning to find the most suitable classifier algorithms.
- The code takes 2 approaches
  - Equal Split Approach
    - Divides the labeled apps into fair and unfair
    - Splits fair apps into multiple parts with the size of unfair apps
    - All features are scaled using MinMaxScaler
    - Randomly shuffle sample. Overall size is 46
    - Trains on first 36 apps and tests on last 10.
    - Calculates classifier outputs for each classifier on each split
    - No cross validation is performed.
    - The results are tabulated in table below.
    - To make the results comparable, the splits are performed first and each classifier is applied to the same split
  - All Apps
    - We also apply all the same classifiers to the overall dataset.
    - All features are scaled using MinMaxScaler
    - Sample dataset is randomly shuffled.
    - k=4 fold CrossValidation is performed
    - Classifier performance for each classifier in each fold is tabulated below.
    - Since the folds are performed for each classifier separately, after random shuffling, performance in each fold for separate classifiers may or may not be comparable
- Average precision for each classification operation is done below.
- Adjusted average is calculated, using the below mentioned philosophy:
  - As we have maintained from the start, false positives aren't an issue with us, but false negatives are.
  - So we calculated adjusted average by (TP + TN + FP)/Total
    - Where TP=True Positive, TN=True Negative, FP = False Positive.

Findings

Univariate Analysis
- This was done after talking to the FTC as this is the approach they currently take while sifting through the app store. They search for all apps based on 1 particular attribute.
- Analyze each app on 1 particular feature, although it should be noted that we have added more features of our own extracted from app attributes.
- We were mostly looking for a feature that really stood out for unfair apps.
  - There is no conclusive feature generally on their own that stand out for unfair apps.
Bivariate Analysis
- We found strong correlations between:
  - adjectiveCount x countCapital
  - revLength x countCapital
- and loose correlations between:
  - adjectiveCount x revLength
- To develop a more parsimonious statistical model, we might want to combine these features into their product and drop these individual features.
  - But for now, our classifiers are based on these individual features.
Clustering
- Based on the MDS plots:
  - there are some apps that are really very different from others and hence are easier to spot
  - there are some apps that are very similar to other fair apps. This may not necessarily be a bad thing
    - An example is Whatsapp, which is in fact a very popular app but looking at the reviews we labeled it as unfair, which in this case is more likely to be a false positive.
Classifier
- In the equal split exploration
  - Gaussian Naive Bayes and (k=3)NearestNeighbors were the best bet in most of the splits.
    - Especially GNB had significantly lower false negatives
  - SVM-linear & SVM-NonLinear (rbf kernel) did OK mostly but failed quite spectacularly in others
    - It could also be due to very small size of the training data.
- In overall exploration
  - SVM-linear was the most spectacular with the least false negatives
  - RandomForest was also a lot better than most.
- For our features there is not a lot of difference between kNN uniform or kNN distance weighted algorithms.

Data Preparation

The data was prepared from previously extracted exports/appFeatures.csv using our crawler and scraper scripts.
The data was then casted to appropriate datatypes explicitly to avoid any mistakes
All variables in the data (i.e. the features) were then normalized using min-max normalization.

For detailed feature descriptions, please refer Obidroid Project Report

Univariate Exploration

Following is an exploration of each variable taken one-by-one at a time.

Here we explore the distributions of each data/feature

Feature	Description	Hist (fair/unfair)
`adjectiveCount`	count of all adjectives
`hasPrivacy`	Does it have a valid privacy url
`revLength`	Total characters in a review
`installs`	Total installs of an app
`revSent`	Aggregate review sentiment
`countCapital`	Count capital characters in a sentence
`hasDeveloperWebsite`	Developer website
`hasDeveloperEmail`	Developer email
`avgRating`	average app Rating

Univariate Conclusion

It is hard to draw an inference about fair/unfair but just looking at 1 feature alone.
adjectiveCount tapers in both fair and unfair for large number of adjectives, so with higher adjectives it might seem it is more likely to be unfair
hasPrivacy is almost evenly distributed for both fair/unfair
revLength seems to be uniform in unfair apps
revSent for fair/unfair distribution is quite similar
avgRating is generally skewed towards higher values for both fair/unfair

Bivariate Exploration

Feature1 x Feature 2	Correlation Plot (as hexagon bins)	Density Plots (KDE)
`adjectiveCount` x `hasPrivacy`
`adjectiveCount` x `countCapital`
`adjectiveCount` x `hasDeveloperEmail`		`no plot` [^note-2]
`adjectiveCount` x `hasDeveloperWebsite`		`no plot` [^note-2]
`adjectiveCount` x `hasPrivacy`
`adjectiveCount` x `installs`
`adjectiveCount` x `revLength`
`adjectiveCount` x `revSent`
`countCapital` x `avgRating`
`countCapital` x `hasDeveloperEmail`		`no plot` [^note-2]
`countCapital` x `hasDeveloperWebsite`		`no plot` [^note-2]
`hasDeveloperEmail` x `avgRating`		`no plot` [^note-2]
`hasDeveloperWebsite` x `avgRating`		`no plot` [^note-2]
`hasDeveloperWebsite` x `hasDeveloperEmail`	`no plot` [^note-2]	`no plot` [^note-2]
`hasPrivacy` x `avgRating`
`hasPrivacy` x `countCapital`
`hasPrivacy` x `hasDeveloperEmail`		`no plot` [^note-2]
`hasPrivacy` x `hasDeveloperWebsite`		`no plot` [^note-2]
`hasPrivacy` x `installs`		`no plot` [^note-2]
`hasPrivacy` x `revLength`		`no plot` [^note-2]
`hasPrivacy` x `revSent`
`installs` x `avgRating`
`installs` x `countCapital`
`installs` x `countCapital`
`installs` x `hasDeveloperEmail`		`no plot` [^note-2]
`installs` x `hasDeveloperWebsite`		`no plot` [^note-2]
`revLength` x `avgRating`
`revLength` x `countCapital`
`revLength` x `hasDeveloperEmail`		`no plot` [^note-2]
`revLength` x `hasDeveloperWebsite`		`no plot` [^note-2]
`revLength` x `revSent`
`revSent` x `avgRating`
`revSent` x `countCapital`
`revSent` x `hasDeveloperEmail`		`no plot` [^note-2]
`revSent` x `hasDeveloperWebsite`		`no plot` [^note-2]

Clustering

MDS Plot

Static Plot:

"MDS Plot"

+ represent unfair app
. represent fair app

Interactive MDS Plot

refer plot at bottom.
use trackpad to zoom
use mouseover to view app titles.

Dendrogram

Dendrogram static plot

"Dendrogram"

Don't know how to interpret.

All other plots can be viewed at Github repo.

Classifier Output

Splitting Dataset into Equal `Fair/Unfair` ratio

Our process:

split the entire app sample into fair apps (300) and unfair apps (23)
split the fair apps sample into splits of the size of the unfair apps.
- Total 13 splits
Combined each fair app split with the unfair apps, to make the sample set for classification
randomly shuffled the classification sample
trained on n_sample=36 apps and tested on total-n_sample = 10 apps for each split
Calculated classifier reports for each split
Explicit SplitClassifierReport

Split #	Algorithm	Avg Precision	Avg Accuracy (adjusted)	Confusion Matrix `[[TP FN][FP TN]]`
`0th`	kNN `(wt = distance)`	`0.50`	`0.70`	`[[2 3] [2 3]]`
`0th`	kNN `(wt = uniform)`	`0.50`	`0.70`	`[[2 3] [2 3]]`
`0th`	GaussianNB	`0.60`	`0.80`	`[[3 2][2 3]]`
`0th`	DecisionTreeClassifier	`0.50`	`0.80`	`[[3 2][3 2]]`
`0th`	RandomForest ( `AdaBoostClassifier`)	`0.60`	`0.80`	`[[3 2][2 3]]`
`0th`	SVM-linear (`SVC`)	`0.62`	`0.70`	`[[2 3][1 4]]`
`0th`	SVM-Nonlinear (`NuSVC`)	`0.60`	`0.80`	`[[3 2][2 3]]`

`1st`	kNN `(wt = distance)`	`0.75`	`0.80`	`[[6 2][1 1]]`
`1st`	kNN `(wt = uniform)`	`0.80`	`0.90`	`[[7 1][1 1]]`
`1st`	GaussianNB	`0.64`	`1.0`	`[[8 0][2 0]]`
`1st`	DecisionTreeClassifier	`0.53`	`0.60`	`[[4 4][2 0]]`
`1st`	RandomForest ( `AdaBoostClassifier`)	`0.75`	`0.80`	`[[6 2][1 1]]`
`1st`	SVM-linear (`SVC`)	`0.04`	`0.2`	`[[0 8][0 2]]`
`1st`	SVM-Nonlinear (`NuSVC`)	`0.80`	`0.90`	`[[7 1][1 1]]`

`2nd`	kNN `(wt = distance)`	`0.71`	`0.90`	`[[4 1][2 3]]`
`2nd`	kNN `(wt = uniform)`	`0.71`	`0.90`	`[[4 1][2 3]]`
`2nd`	GaussianNB	`0.38`	`0.8`	`[[3 2][4 1]]`
`2nd`	DecisionTreeClassifier	`0.29`	`0.70`	`[[2 3][4 1]]`
`2nd`	RandomForest ( `AdaBoostClassifier`)	`0.29`	`0.70`	`[[2 3][4 1]]`
`2nd`	SVM-linear (`SVC`)	`0.50`	`0.9`	`[[1 4][1 4]]`
`2nd`	SVM-Nonlinear (`NuSVC`)	`0.60`	`0.80`	`[[3 2][2 3]]`

`3rd`	kNN `(wt = distance)`	`0.80`	`0.90`	`[[4 1][1 4]]`
`3rd`	kNN `(wt = uniform)`	`0.80`	`0.90`	`[[4 1][1 4]]`
`3rd`	GaussianNB	`0.81`	`1.0`	`[[5 0][3 2]]`
`3rd`	DecisionTreeClassifier	`0.60`	`0.80`	`[[3 2][2 3]]`
`3rd`	RandomForest ( `AdaBoostClassifier`)	`0.40`	`0.70`	`[[2 3][3 2]]`
`3rd`	SVM-linear (`SVC`)	`0.81`	`0.7`	`[[2 3][0 5]]`
`3rd`	SVM-Nonlinear (`NuSVC`)	`0.71`	`0.80`	`[[3 2][1 4]]`

`4th`	kNN `(wt = distance)`	`0.68`	`0.70`	`[[4 3][1 2]]`
`4th`	kNN `(wt = uniform)`	`0.68`	`0.70`	`[[4 3][1 2]]`
`4th`	GaussianNB	`0.84`	`1.0`	`[[7 0][2 1]]`
`4th`	DecisionTreeClassifier	`0.62`	`0.60`	`[[3 4][1 2]]`
`4th`	RandomForest ( `AdaBoostClassifier`)	`0.68`	`0.70`	`[[4 3][1 2]]`
`4th`	SVM-linear (`SVC`)	`0.07`	`0.30`	`[[0 7][1 2]]`
`4th`	SVM-Nonlinear (`NuSVC`)	`0.55`	`0.50`	`[[2 5][1 2]]`

`5th`	kNN `(wt = distance)`	`0.62`	`0.70`	`[[4 1][3 2]]`
`5th`	kNN `(wt = uniform)`	`0.62`	`0.70`	`[[4 1][3 2]]`
`5th`	GaussianNB	`0.25`	`1.0`	`[[5 0][5 0]]`
`5th`	DecisionTreeClassifier	`0.71`	`0.80`	`[[3 2][1 4]]`
`5th`	RandomForest ( `AdaBoostClassifier`)	`0.71`	`0.80`	`[[3 2][1 4]]`
`5th`	SVM-linear (`SVC`)	`0.60`	`0.80`	`[[3 2][2 3]]`
`5th`	SVM-Nonlinear (`NuSVC`)	`0.71`	`0.90`	`[[4 1][2 3]]`

`6th`	kNN `(wt = distance)`	`0.80`	`0.90`	`[[5 1][1 3]]`
`6th`	kNN `(wt = uniform)`	`0.80`	`0.90`	`[[5 1][1 3]]`
`6th`	GaussianNB	`0.80`	`1.0`	`[[6 0][3 1]]`
`6th`	DecisionTreeClassifier	`0.57`	`0.60`	`[[2 4][1 3]]`
`6th`	RandomForest ( `AdaBoostClassifier`)	`0.71`	`0.80`	`[[3 2][1 4]]`
`6th`	SVM-linear (`SVC`)	`0.16`	`0.40`	`[[0 6][0 4]]`
`6th`	SVM-Nonlinear (`NuSVC`)	`0.85`	`1.0`	`[[6 0][2 2]]`

`7th`	kNN `(wt = distance)`	`0.80`	`1.0`	`[[4 0][4 2]]`
`7th`	kNN `(wt = uniform)`	`0.83`	`1.0`	`[[4 0][3 3]]`
`7th`	GaussianNB	`0.16`	`1.0`	`[[4 0][6 0]]`
`7th`	DecisionTreeClassifier	`0.52`	`0.80`	`[[2 2][3 3]]`
`7th`	RandomForest ( `AdaBoostClassifier`)	`0.45`	`0.90`	`[[3 1][5 1]]`
`7th`	SVM-linear (`SVC`)	`0.16`	`1.0`	`[[4 0][6 0]]`
`7th`	SVM-Nonlinear (`NuSVC`)	`0.45`	`0.90`	`[[3 1][5 1]]`

`8th`	kNN `(wt = distance)`	`0.63`	`0.90`	`[[1 1][5 3]]`
`8th`	kNN `(wt = uniform)`	`0.63`	`0.90`	`[[1 1][5 3]]`
`8th`	GaussianNB	`0.63`	`0.90`	`[[1 1][5 3]]`
`8th`	DecisionTreeClassifier	`0.42`	`0.90`	`[[1 1][7 1]]`
`8th`	RandomForest ( `AdaBoostClassifier`)	`0.56`	`0.90`	`[[1 1][6 2]]`
`8th`	SVM-linear (`SVC`)	`0.04`	`1.0`	`[[2 0][8 0]]`
`8th`	SVM-Nonlinear (`NuSVC`)	`0.56`	`0.90`	`[[1 1][6 2]]`

`9th`	kNN `(wt = distance)`	`0.71`	`0.90`	`[[4 1][2 3]]`
`9th`	kNN `(wt = uniform)`	`0.71`	`0.90`	`[[4 1][2 3]]`
`9th`	GaussianNB	`0.78`	`1.0`	`[[5 0][4 1]]`
`9th`	DecisionTreeClassifier	`0.22`	`0.90`	`[[4 1][5 0]]`
`9th`	RandomForest ( `AdaBoostClassifier`)	`0.25`	`1.0`	`[[5 0][5 0]]`
`9th`	SVM-linear (`SVC`)	`0.25`	`1.0`	`[[5 0][5 0]]`
`9th`	SVM-Nonlinear (`NuSVC`)	`0.78`	`1.0`	`[[5 0][4 1]]`

`10th`	kNN `(wt = distance)`	`0.92`	`0.60`	`[[5 4][0 1]]`
`10th`	kNN `(wt = uniform)`	`0.93`	`0.7`	`[[6 3][0 1]]`
`10th`	GaussianNB	`0.93`	`0.80`	`[[7 2][0 1]]`
`10th`	DecisionTreeClassifier	`0.91`	`0.30`	`[[2 7][0 1]]`
`10th`	RandomForest ( `AdaBoostClassifier`)	`0.91`	`0.2`	`[[1 8][0 1]]`
`10th`	SVM-linear (`SVC`)	`0.01`	`0.1`	`[[0 9][0 1]]`
`10th`	SVM-Nonlinear (`NuSVC`)	`0.92`	`0.5`	`[[4 5][0 1]]`

`11th`	kNN `(wt = distance)`	`0.43`	`0.80`	`[[2 2][4 2]]`
`11th`	kNN `(wt = uniform)`	`0.78`	`1.0`	`[[4 0][5 1]]`
`11th`	GaussianNB	`0.45`	`0.90`	`[[3 1][5 1]]`
`11th`	DecisionTreeClassifier	`0.31`	`0.80`	`[[2 2][5 1]]`
`11th`	RandomForest ( `AdaBoostClassifier`)	`0.31`	`0.80`	`[[2 2][5 1]]`
`11th`	SVM-linear (`SVC`)	`0.16`	`1.0`	`[[4 0][6 0]]`
`11th`	SVM-Nonlinear (`NuSVC`)	`0.45`	`0.90`	`[[3 1][5 1]]`

`12th`	kNN `(wt = distance)`	`0.60`	`0.80`	`[[4 2][2 2]]`
`12th`	kNN `(wt = uniform)`	`0.60`	`0.80`	`[[4 2][2 2]]`
`12th`	GaussianNB	`0.33`	`0.90`	`[[5 1][4 0]]`
`12th`	DecisionTreeClassifier	`0.57`	`0.90`	`[[5 1][3 1]]`
`12th`	RandomForest ( `AdaBoostClassifier`)	`0.80`	`1.0`	`[[6 0][3 1]]`
`12th`	SVM-linear (`SVC`)	`0.16`	`0.4`	`[[0 6][0 4]]`
`12th`	SVM-Nonlinear (`NuSVC`)	`0.30`	`0.80`	`[[4 2][4 0]]`

For Entire Dataset

scaled the features on MinMaxScaler
performed k=4 fold Cross-Validation
Used same classifiers
Explicit OverallClassifierReport

Fold #	Algorithm	Avg Precision	Avg Accuracy (adjusted)	Confusion Matrix `[[TP FN][FP TN]]`
`1st`	kNN `(wt = distance)`	`0.97`	`0.95`	`[[74 4][ 1 1]]`
`2nd`	kNN `(wt = distance)`	`0.93`	`0.9875`	`[[76 1][ 3 0]]`
`3rd`	kNN `(wt = distance)`	`0.90`	`0.9875`	`[[75 1][ 4 0]]`
`4th`	kNN `(wt = distance)`	`0.79`	`0.9875`	`[[66 1][12 1]]`

`1st`	kNN `(wt = uniform)`	`0.97`	`0.975`	`[[76 2][ 1 1]]`
`2nd`	kNN `(wt = uniform)`	`0.93`	`1.00`	`[[77 0][ 3 0]]`
`3rd`	kNN `(wt = uniform)`	`0.90`	`1.00`	`[[76 0][ 4 0]]`
`4th`	kNN `(wt = uniform)`	`0.79`	`0.9875`	`[[66 1][12 1]]`

`1st`	GaussianNB	`0.96`	`0.9125`	`[[71 7][ 1 1]]`
`2nd`	GaussianNB	`0.96`	`0.95`	`[[73 4][ 1 2]]`
`3rd`	GaussianNB	`0.90`	`1.00`	`[[76 0][ 4 0]]`
`4th`	GaussianNB	`0.64`	`0.5875`	`[[14 53][ 5 8]]`

`1st`	DecisionTreeClassifier	`0.95`	`0.875`	`[[68 10][ 2 0]]`
`2nd`	DecisionTreeClassifier	`0.92`	`0.9125`	`[[70 7][ 3 0]]`
`3rd`	DecisionTreeClassifier	`0.93`	`0.8625`	`[[65 11][ 2 2]]`
`4th`	DecisionTreeClassifier	`0.82`	`0.975`	`[[65 2][10 3]]`

`1st`	RandomForest ( `AdaBoostClassifier`)	`0.95`	`0.9625`	`[[75 3][ 2 0]]`
`2nd`	RandomForest ( `AdaBoostClassifier`)	`0.93`	`0.9875`	`[[76 1][ 3 0]]`
`3rd`	RandomForest ( `AdaBoostClassifier`)	`0.90`	`1.0`	`[[76 0][ 4 0]]`
`4th`	RandomForest ( `AdaBoostClassifier`)	`0.82`	`0.975`	`[[65 2][10 3]]`

`1st`	SVM-linear (`SVC`)	`0.95`	`1.0`	`[[78 0][ 2 0]]`
`2nd`	SVM-linear (`SVC`)	`0.93`	`1.0`	`[[77 0][ 3 0]]`
`3rd`	SVM-linear (`SVC`)	`0.90`	`1.0`	`[[76 0][ 4 0]]`
`4th`	SVM-linear (`SVC`)	`0.70`	`1.0`	`[[67 0][13 0]]`

`1st`	SVM-Nonlinear (`NuSVC`) [^note-1]	`-`	`-`	`-`
`2nd`	SVM-Nonlinear (`NuSVC`) [^note-1]	`-`	`-`	`-`
`3rd`	SVM-Nonlinear (`NuSVC`) [^note-1]	`-`	`-`	`-`
`4th`	SVM-Nonlinear (`NuSVC`) [^note-1]	`-`	`-`	`-`

Classifier Details

Classifier Name	Classifier Parameters
kNN `(wt = distance)`	`(algorithm=auto, leaf_size=30, metric=minkowski, n_neighbors=3, p=2, weights=distance)`
kNN `(wt = uniform)`	`(algorithm=auto, leaf_size=30, metric=minkowski, n_neighbors=3, p=2, weights=uniform)`
GaussianNB	`default`
DecisionTreeClassifier	`(compute_importances=None, criterion=gini, max_depth=None, max_features=None, min_density=None, min_samples_leaf=1, min_samples_split=2, random_state=None, splitter=best)`
RandomForest ( `AdaBoostClassifier`)	AdaBoostClassifier(algorithm=SAMME, base_estimator=DecisionTreeClassifier(compute_importances=None, criterion=gini, max_depth=1, max_features=None, min_density=None, min_samples_leaf=1, min_samples_split=2, random_state=None, splitter=best), base_estimator__compute_importances=None, base_estimator__criterion=gini, base_estimator__max_depth=1, base_estimator__max_features=None, base_estimator__min_density=None, base_estimator__min_samples_leaf=1, base_estimator__min_samples_split=2, base_estimator__random_state=None, base_estimator__splitter=best, learning_rate=1.0, n_estimators=200, random_state=None)
SVM-linear (`SVC`)	`(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0, kernel=rbf, max_iter=-1, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)`
SVM-Nonlinear (`NuSVC`)	`(cache_size=200, coef0=0.0, degree=3, gamma=0.0, kernel=rbf, max_iter=-1, nu=0.5, probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)`

[^note-1]: Some bug in the Non-Linear SVM implementation for all apps needs to be resolved. Currently it exits with an error ValueError: specified nu is infeasible
[^note-2]: For some boolean variables no density plots are generated using kde, but for others it does. We don't really know why. Would love your comment on it. The general culprits seem to be hasDeveloperEmail and hasDeveloperWebsite. hasPrivacy works mostly fine.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ProgressReport

Progress Report

Earlier Feedback

Executive Summary

Findings

Data Preparation

Univariate Exploration

Univariate Conclusion

Bivariate Exploration

Clustering

MDS Plot

Dendrogram

Classifier Output

Splitting Dataset into Equal `Fair/Unfair` ratio

For Entire Dataset

Classifier Details

Clone this wiki locally

ProgressReport

Progress Report

Earlier Feedback

Executive Summary

Findings

Data Preparation

Univariate Exploration

Univariate Conclusion

Bivariate Exploration

Clustering

MDS Plot

Dendrogram

Classifier Output

Splitting Dataset into Equal Fair/Unfair ratio

For Entire Dataset

Classifier Details

Clone this wiki locally

Splitting Dataset into Equal `Fair/Unfair` ratio