Commit
clustering docstrings
RobinL committed Jul 14, 2024
1 parent 02de96d commit d2b96a8
Showing 1 changed file with 12 additions and 9 deletions.
splink/internals/linker_components/clustering.py: 21 changes (12 additions, 9 deletions)
@@ -97,16 +97,17 @@ def _compute_metrics_nodes(
"""
Internal function for computing node-level metrics.
-Accepts outputs of `linker.predict()` and
-`linker.cluster_pairwise_at_threshold()`, along with the clustering threshold
-and produces a table of node metrics.
+Accepts outputs of `linker.inference.predict()` and
+`linker.clustering.cluster_pairwise_at_threshold()`, along with the clustering
+threshold and produces a table of node metrics.
Node metrics produced:
* node_degree (absolute number of neighbouring nodes)
Output table has a single row per input node, along with the cluster id (as
assigned in `linker.cluster_pairwise_at_threshold()`) and the metric
node_degree:
|-------------------------------------------------|
| composite_unique_id | cluster_id | node_degree |
|---------------------|-------------|-------------|
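The `node_degree` metric above counts, for each record, how many neighbouring records it is linked to by a pairwise prediction at or above the clustering threshold. The sketch below illustrates that idea in plain Python. It is not Splink's implementation, which operates on SplinkDataFrames via SQL. It assumes predictions arrive as hypothetical `(id_l, id_r, match_probability)` tuples with each pair listed once, and cluster assignments as a hypothetical `{node_id: cluster_id}` dict.

```python
from collections import Counter


def node_degrees(predictions, clusters, threshold):
    """Return {node_id: (cluster_id, node_degree)} for every clustered node.

    predictions: iterable of (id_l, id_r, match_probability) tuples (hypothetical format)
    clusters: dict mapping node_id -> cluster_id (hypothetical format)
    """
    degree = Counter()
    for id_l, id_r, prob in predictions:
        # Only edges at or above the clustering threshold count as neighbours
        if prob >= threshold:
            degree[id_l] += 1
            degree[id_r] += 1
    # Counter returns 0 for missing keys, so singleton nodes get degree 0
    return {node: (cluster_id, degree[node]) for node, cluster_id in clusters.items()}


predictions = [("a", "b", 0.99), ("b", "c", 0.97), ("c", "d", 0.40)]
clusters = {"a": "a", "b": "a", "c": "a", "d": "d"}
print(node_degrees(predictions, clusters, threshold=0.95))
# {'a': ('a', 1), 'b': ('a', 2), 'c': ('a', 1), 'd': ('d', 0)}
```

Here `b` is the hub of its cluster, while `d` falls below the threshold and forms a singleton with degree 0.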
@@ -153,9 +154,10 @@ def _compute_metrics_edges(
"""
Internal function for computing edge-level metrics.
-Accepts outputs of `linker._compute_node_metrics()`, `linker.predict()` and
-`linker.cluster_pairwise_at_threshold()`, along with the clustering threshold
-and produces a table of edge metrics.
+Accepts outputs of `linker._compute_node_metrics()`,
+`linker.inference.predict()` and
+`linker.clustering.cluster_pairwise_at_threshold()`, along with the clustering
+threshold and produces a table of edge metrics.
Uses `igraph` under-the-hood for calculations
@@ -193,7 +195,8 @@ def _compute_metrics_clusters(
Accepts output of `linker._compute_node_metrics()` (which has the relevant
information from `linker.predict() and
-`linker.cluster_pairwise_at_threshold()`), produces a table of cluster metrics.
+`linker.clustering.cluster_pairwise_at_threshold()`), produces a table of
+cluster metrics.
Cluster metrics produced:
* n_nodes (aka cluster size, number of nodes in cluster)
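The cluster-level `n_nodes` metric is the number of records sharing a cluster id. The docstring says Splink derives it from the node-metrics table. As a simplification, this sketch counts directly from a hypothetical `{node_id: cluster_id}` mapping.

```python
from collections import Counter


def cluster_sizes(clusters):
    """Return {cluster_id: n_nodes} from a {node_id: cluster_id} mapping (hypothetical format)."""
    # Counting cluster ids gives the number of nodes in each cluster
    return dict(Counter(clusters.values()))


clusters = {"a": "a", "b": "a", "c": "a", "d": "d"}
print(cluster_sizes(clusters))  # {'a': 3, 'd': 1}
```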
@@ -238,9 +241,9 @@ def compute_graph_metrics(
and returns a data class of Splink dataframes
Args:
-df_predict (SplinkDataFrame): The results of `linker.predict()`
+df_predict (SplinkDataFrame): The results of `linker.inference.predict()`
df_clustered (SplinkDataFrame): The outputs of
-`linker.cluster_pairwise_predictions_at_threshold()`
+`linker.clustering.cluster_pairwise_predictions_at_threshold()`
threshold_match_probability (float, optional): Filter the pairwise match
predictions to include only pairwise comparisons with a
match_probability at or above this threshold. If not provided, the value
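The `threshold_match_probability` argument keeps only comparisons whose `match_probability` is at or above the threshold. The comparison is therefore inclusive, and a pair scored exactly at the threshold survives. The sketch below illustrates that filter using hypothetical `(id_l, id_r, match_probability)` tuples. This excerpt cuts off before saying what happens when no threshold is passed, so the sketch always requires one.

```python
def filter_predictions(predictions, threshold_match_probability):
    """Keep (id_l, id_r, match_probability) tuples with probability >= threshold (hypothetical format)."""
    # Inclusive comparison: "at or above this threshold"
    return [p for p in predictions if p[2] >= threshold_match_probability]


predictions = [("a", "b", 0.95), ("b", "c", 0.9499), ("c", "d", 0.99)]
print(filter_predictions(predictions, 0.95))
# [('a', 'b', 0.95), ('c', 'd', 0.99)]
```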
