
On the Use of Information Theory to Bound the Effectiveness of Unsupervised Software Retrieval

David N. Palacio edited this page Jan 13, 2021 · 16 revisions

Approach: Information Transmission to Understand Unsupervised Retrieval

In principle, software requirements can be transformed into multiple forms of information, such as source code, test cases, or design artifacts. We refer to these requirements, or to any initial (raw) form of information, as source artifacts. Conversely, information produced by a transformation or alteration is considered a target artifact. In the software engineering context, a transformation can be any action or intervention that a software engineer applies to the requirements. The "transmission of information" can therefore be understood as the programming task itself, or more generally as any generative process that produces target artifacts from source artifacts.
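To make the transmission view concrete, the sketch below models the generative process as a discrete channel from a source variable S to a target variable T with a known joint distribution P(S, T). This is a hedged illustration, not the project's implementation. The joint table and the function names are hypothetical. Reading the later "Loss" and "Noise" groupings as the standard channel quantities H(S|T) and H(T|S) is also an assumption.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical joint distribution P(S, T) of a discrete "transmission channel":
# keys are (source_symbol, target_symbol) pairs, values are probabilities summing to 1.
Joint = Dict[Tuple[str, str], float]


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginals(joint: Joint) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return the marginal distributions P(S) and P(T)."""
    p_s: Dict[str, float] = {}
    p_t: Dict[str, float] = {}
    for (s, t), p in joint.items():
        p_s[s] = p_s.get(s, 0.0) + p
        p_t[t] = p_t.get(t, 0.0) + p
    return p_s, p_t


def channel_measures(joint: Joint) -> Dict[str, float]:
    """Entropies, mutual information, and the assumed loss/noise terms of the channel."""
    p_s, p_t = marginals(joint)
    h_s = entropy(p_s.values())
    h_t = entropy(p_t.values())
    h_st = entropy(joint.values())  # joint entropy H(S, T)
    mi = h_s + h_t - h_st           # mutual information I(S;T)
    return {
        "H(S)": h_s,
        "H(T)": h_t,
        "I(S;T)": mi,
        "loss": h_s - mi,   # H(S|T): source information not recovered in the target (assumed meaning)
        "noise": h_t - mi,  # H(T|S): target information not explained by the source (assumed meaning)
    }
```

As a sanity check, a perfect channel such as `{("a", "a"): 0.5, ("b", "b"): 0.5}` gives I(S;T) = 1 bit with zero loss and noise, while a channel in which S and T are independent gives I(S;T) = 0, so all source information counts as loss and all target information as noise.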

The Manifold of Information Measures
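The experiments below report per-artifact measures such as entropy and extropy. The following minimal sketch assumes that each artifact is reduced to a bag of pre-tokenized terms and that logarithms are base 2. It computes self-information, Shannon entropy, and extropy (the complementary dual of entropy, J(X) = -Σ (1 - p) log(1 - p)) for one artifact. The function names and the tokenization step are hypothetical and do not come from the project's code. The "minimum shared" variants plotted in the experiments are not reproduced here.

```python
import math
from collections import Counter
from typing import Dict, List


def token_distribution(tokens: List[str]) -> Dict[str, float]:
    """Relative frequency of each token in a (hypothetically pre-tokenized) artifact."""
    if not tokens:
        raise ValueError("an empty artifact has no token distribution")
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def self_information(p: float) -> float:
    """Surprisal -log2 p(x) of a single outcome, in bits."""
    return -math.log2(p)


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy H(X) = -sum p log2 p, in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def extropy(dist: Dict[str, float]) -> float:
    """Extropy J(X) = -sum (1 - p) log2 (1 - p); outcomes with p = 1 contribute 0."""
    return -sum((1 - p) * math.log2(1 - p) for p in dist.values() if p < 1)
```

For the token list `["get", "user", "id", "get"]`, the distribution is {get: 0.5, user: 0.25, id: 0.25}, and the entropy is exactly 1.5 bits.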

Experiments

Baseline 0.0.0/1 [or libest]

Exploratory Data Analysis for Interpretability

Manifold Analysis for Entropy Measures

Scatter Matrix for Minimum Shared Entropy/Extropy

Manifold Entropy Measures Distributions

Shared Information Distributions

Manifold Entropy by Ground Truth

Shared Information by Ground Truth

Supervised Evaluation

Word2Vec Precision-Recall-ROC

Doc2Vec Precision-Recall-ROC

Word2Vec Precision-Recall-Gain [WMD]
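The [WMD] evaluation uses Precision-Recall-Gain rather than raw precision and recall. The sketch below is minimal and assumes the standard gain transformation of Flach and Kull (2015), in which π is the fraction of positive (relevant) pairs among all candidate source-target pairs. Under that assumption, it maps one (precision, recall) point into gain space. The function name is hypothetical. How the counts are derived from Word Mover's Distance (WMD) rankings, for example which threshold is used, is not specified here.

```python
from typing import Tuple


def precision_recall_gain(tp: int, fp: int, fn: int, pi: float) -> Tuple[float, float]:
    """Map one precision/recall point to gain space (Flach & Kull, 2015).

    pi is the proportion of positive pairs in the candidate set.
    precG = (prec - pi) / ((1 - pi) * prec)
    recG  = (rec  - pi) / ((1 - pi) * rec)
    """
    if tp == 0:
        raise ValueError("gains are undefined when there are no true positives")
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    prec_gain = (prec - pi) / ((1 - pi) * prec)
    rec_gain = (rec - pi) / ((1 - pi) * rec)
    return prec_gain, rec_gain
```

A gain of 0 means performance equal to the always-positive baseline (precision equal to π), and a gain of 1 is perfect. In gain space, the area under the curve is therefore comparable across datasets with different positive rates, such as the two baselines here.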

Entropy vs Distance

Corr WMD vs MSI-I

Corr WMD vs MSI-X

Mutual Information

Corr WMD vs MI

Composable Manifolds

Mutual Information - WMD Group by Loss

Mutual Information - WMD Group by Noise

Baseline 0.0.2/3 [or sacp]

Exploratory Data Analysis for Interpretability

Manifold Analysis for Entropy Measures

Scatter Matrix for Minimum Shared Entropy/Extropy

Manifold Entropy Measures Distributions

Shared Information Distributions

Manifold Entropy by Ground Truth

Shared Information by Ground Truth

Supervised Evaluation

Word2Vec Precision-Recall-ROC

Doc2Vec Precision-Recall-ROC

Word2Vec Precision-Recall-Gain [WMD]

Entropy vs Distance

Corr WMD vs MSI-I

Corr WMD vs MSI-X

Mutual Information

Corr WMD vs MI

Composable Manifolds

Mutual Information - WMD Group by Loss

Mutual Information - WMD Group by Noise