From a81062672cfbedefc23f34ff05d03d47cb9dcfd7 Mon Sep 17 00:00:00 2001 From: zslade Date: Wed, 24 Jan 2024 19:20:38 +0000 Subject: [PATCH 01/46] Start of metrics topic guide --- docs/topic_guides/evaluation/clusters.md | 60 +++++++++++++++++++++++- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/docs/topic_guides/evaluation/clusters.md b/docs/topic_guides/evaluation/clusters.md index aae5690d1a..b7da880a61 100644 --- a/docs/topic_guides/evaluation/clusters.md +++ b/docs/topic_guides/evaluation/clusters.md @@ -1,3 +1,61 @@ # Cluster Evaluation -This page is under construction - check back soon! \ No newline at end of file +Graphs provide a natural way to think about linked data. Visualising linked data as a graph and utilising graph metrics are powerful routes to assessing linkage quality, as well as enhancing understanding of datasets and models. Insights gained can be used to refine linking strategies, resulting in more accurate predictions. + +Graph metrics (see below) can be particularly useful for obtaining an overall picture of the quality of clusters generated by a Splink model. For example… + +At the individual cluster level, Splink’s [Cluster Studio Dashboard]() enables users to visualise clusters and interrogate their members and the links between them. Applying metrics to individual clusters can be useful for analysing graphs with many nodes when it can be impossible to spot spurious links by eye alone. + +!!! note + It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. It is often helpful to consider multiple metrics in conjunction with one another to build a comprehensive picture. + + It is also important to consider metrics within the context of their distribution and the underlying dataset.
For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. + + +Where do we answer the question of: what does good look like? + +## Graph metrics and their application to linked data + +A graph is defined as a collection of points (nodes) connected by lines (edges). In data linking, we refer to these collections of nodes as clusters, within which the nodes represent the entity to be linked (e.g. person or journey) and the edges represent a potential match, together with an associated Splink score. + +[Include picture] + +Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is cluster size which is the number of nodes in a cluster. + +For data linking with Splink, it is useful to sort graph metrics into three categories: cluster metrics, node metrics and edge metrics. These are defined below together with their relevance to data linking. + +### :fontawesome-solid-circle-nodes: Cluster metrics + +Cluster metrics refer to characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. + +#### Example: density +[picture] + +The density of a cluster is given by the number of edges a cluster contains divided by the maximum possible number of edges. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. + +Relevance to data linking: A high density (close to 1) is generally good as it means there are many edges in support of the records in a cluster being linked. A low density score might warrant further investigation. + +#### Example: cluster centralisation + +TBC + +### ⚫️ Node metrics + +Node metrics refer to features of the nodes within clusters; for example, node degree which is a count of how many edges (links) are joined to a node. 
+Example: node degree + +### 🔗 Edge metrics + +These are a measure of the properties of edges within a cluster. Examples include edge betweeness and bridges* + +#### Example: is bridge + +*acknowledge the slight difference between our definition and the literature. + +## ⚡ How to harness the power of graph metrics with Splink ## + +To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. + +Other possible things to include: +Querying with linker.sql_query? +We have also made one of the metrics computed so far available to use for sampling in cluster studio dashboard? From 144557107aadae98bfd3dff0f46361f6fcbbb914 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 30 Jan 2024 15:07:21 +0000 Subject: [PATCH 02/46] restructure intro --- docs/topic_guides/evaluation/clusters.md | 53 ++++++++++++++---------- 1 file changed, 32 insertions(+), 21 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters.md b/docs/topic_guides/evaluation/clusters.md index b7da880a61..50bfb8a9a9 100644 --- a/docs/topic_guides/evaluation/clusters.md +++ b/docs/topic_guides/evaluation/clusters.md @@ -1,39 +1,42 @@ # Cluster Evaluation -Graphs provide a natural way to think about linked data. Visualising linked data as a graph and utilising graph metrics are powerful routes to assessing linkage quality, as well as enhancing understanding of datasets and models. Insights gained can be used to refine linking strategies, resulting in more accurate predictions. +Graphs provide a natural way to think about linked data. Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. Insights gained can be used to refine linking strategies, resulting in more accurate predictions. -Graph metrics (see below) can be particularly useful for obtaining an overall picture of the quality of clusters generated by a Splink model.
For example… +Graph metrics can give a big-picture view of the quality of clusters generated by a Splink model. +For example, a distribution of cluster densities can help us understand whether clusters are unexpectedly sparse, potentially indicating missed links across the board. -At the individual cluster level, Splink’s [Cluster Studio Dashboard]() enables users to visualise clusters and interrogate their members and the links between them. Applying metrics to individual clusters can be useful for analysing graphs with many nodes when it can be impossible to spot spurious links by eye alone. +Metrics can also help home in on problematic clusters, such as those containing inaccurate links. For instance, analysing cluster sizes can reveal outliers, like exceptionally large clusters, that may require closer examination. + +Splink’s [Cluster Studio Dashboard]() empowers users to visualise individual clusters and interrogate the links between their member records. Applying metrics at the individual cluster level within the Dashboard is also useful; for example, for analysing large clusters containing many nodes where it can be impossible to spot spurious links by eye alone. !!! note - It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. It is often helpful to consider multiple metrics in conjunction with one another to build a comprehensive picture. + It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. A more comprehensive picture can be built by considering various metrics in conjunction with one another. It is also important to consider metrics within the context of their distribution and the underlying dataset.
For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. +## Graphs and graph metrics -Where do we answer the question of: what does good look like? - -## Graph metrics and their application to linked data - -A graph is defined as a collection of points (nodes) connected by lines (edges). In data linking, we refer to these collections of nodes as clusters, within which the nodes represent the entity to be linked (e.g. person or journey) and the edges represent a potential match, together with an associated Splink score. +For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). In data linking, we refer to these collections of nodes as clusters, within which the nodes represent the entity to be linked (e.g. person or journey) and the edges represent a potential match, together with an associated Splink score. -[Include picture] +[Include picture here] -Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is cluster size which is the number of nodes in a cluster. +Graph metrics quantify the characteristics of a graph (a cluster). A simple example of a graph metric is cluster size which is the number of nodes in a cluster. -For data linking with Splink, it is useful to sort graph metrics into three categories: cluster metrics, node metrics and edge metrics. These are defined below together with their relevance to data linking. +For data linking with Splink, it is useful to sort graph metrics into three categories: cluster metrics, node metrics and edge metrics. These are defined below together with examples of each and their application to linked data. The examples given are of all metrics currently available in Splink. 
### :fontawesome-solid-circle-nodes: Cluster metrics -Cluster metrics refer to characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. +Cluster metrics refer to the characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. #### Example: density -[picture] The density of a cluster is given by the number of edges a cluster contains divided by the maximum possible number of edges. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. -Relevance to data linking: A high density (close to 1) is generally good as it means there are many edges in support of the records in a cluster being linked. A low density score might warrant further investigation. +[picture: edges vs max possible edges] + +When examining linked data, a high density (close to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. A low density score could indicate links being missed, which could happen, for example, if blocking rules are too tight or the clustering threshold is too high. + + #### Example: cluster centralisation @@ -42,20 +45,28 @@ TBC ### ⚫️ Node metrics Node metrics refer to features of the nodes within clusters; for example, node degree which is a count of how many edges (links) are joined to a node. -Example: node degree + +#### Example: node degree ### 🔗 Edge metrics These are a measure of the properties of edges within a cluster. Examples include edge betweeness and bridges* -#### Example: is bridge +#### Example: 'is bridge' *acknowledge the slight difference between our definition and the literature. -## ⚡ How to harness the power of graph metrics with Splink ## + +## How to apply graph metrics + +Trying to answer the question of 'what does good look like?'
+Specific example of applying graph metrics and feeding back into a linking strategy +(Might be for future if we can give enough guidance above for now) +How to combine graph metrics +Communicate that it's an iterative process of feedback, which builds up expectations +Metrics don't + +## ⚡ How to compute graph metrics with Splink To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. -Other possible things to include: -Querying with linker.sql_query? -We have also made one of the metrics computed so far available to use for sampling in cluster studio dashboard? + From 36602592f1b716db14388f9c08854f84be70ede8 Mon Sep 17 00:00:00 2001 From: zslade Date: Thu, 1 Feb 2024 11:48:00 +0000 Subject: [PATCH 03/46] update --- docs/topic_guides/evaluation/clusters.md | 100 ++++++++++++++++++----- 1 file changed, 79 insertions(+), 21 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters.md b/docs/topic_guides/evaluation/clusters.md index 50bfb8a9a9..34bba81106 100644 --- a/docs/topic_guides/evaluation/clusters.md +++ b/docs/topic_guides/evaluation/clusters.md @@ -2,41 +2,76 @@ Graphs provide a natural way to think about linked data. Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. Insights gained can be used to refine linking strategies, resulting in more accurate predictions. -Graph metrics can give a big-picture view of the quality of clusters generated by a Splink model. -For example, a distribution of cluster densities can help us understand whether clusters are unexpectedly sparse, potentially indicating missed links across the board. +Graph metrics can help determine a big-picture view of the quality of clusters generated by a Splink model. For example, the distribution of cluster sizes can reveal outliers, such as very large clusters, that may require closer examination.
-Metrics can also help home in on problematic clusters, such as those containing inaccurate links. For instance, analysing cluster sizes can reveal outliers, like exceptionally large clusters, that may require closer examination. +Metrics can also help us to home in on problematic clusters, such as those containing inaccurate links. For example, the 'is bridge' metric (see below) can be a signaller of false positives. Splink’s [Cluster Studio Dashboard]() empowers users to visualise individual clusters and interrogate the links between their member records. Applying metrics at the individual cluster level within the Dashboard is also useful; for example, for analysing large clusters containing many nodes where it can be impossible to spot spurious links by eye alone. +Should we be saying this if the metrics aren't available in the dashboard yet? !!! note It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. A more comprehensive picture can be built by considering various metrics in conjunction with one another. It is also important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. -## Graphs and graph metrics +## Graphs -For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). In data linking, we refer to these collections of nodes as clusters, within which the nodes represent the entity to be linked (e.g. person or journey) and the edges represent a potential match, together with an associated Splink score. +For clarity, let us first define what we mean by a graph.
A graph is a collection of points (nodes) connected by lines (edges). [Include picture here] -Graph metrics quantify the characteristics of a graph (a cluster). A simple example of a graph metric is cluster size which is the number of nodes in a cluster. +In data linking, we refer to these collections of nodes as clusters, within which the nodes represent the entity to be linked (e.g. person or journey) and the edges represent a potential match. -For data linking with Splink, it is useful to sort graph metrics into three categories: cluster metrics, node metrics and edge metrics. These are defined below together with examples of each and their application to linked data. The examples given are of all metrics currently available in Splink. +[Include picture here] + +Edges come with an associate Splink score (the probability of two records being a match). This makes graphs (clusters) produced by Splink so called weighted graphs, as each edge has a weight (Splink score). + +Graphs can also be directed or undirected. Directed (undirected) graphs are those in which edges (do not) have an associated direction. For example, linking court cases/journeys. +Linking person records, where the ... + +[insert image] + +Impact of directed versus non-directed on the definitions below... + +Other graph properties like loops etc +Loops when it comes to cross dataset linking and deduping... +Does this apply to all cases of data linking? + +## Graph metrics + +Graph metrics quantify the characteristics of a graph (a cluster). A simple example of a graph metric is cluster size, which is the number of nodes in a cluster. + +For data linking with Splink, it is useful to sort graph metrics into three categories: cluster metrics, node metrics and edge metrics. These are defined below together with examples of each and their application to linked data. The examples given are of all metrics currently available to compute in Splink. +What about availability to use in the dashboard? 
### :fontawesome-solid-circle-nodes: Cluster metrics Cluster metrics refer to the characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. +#### Example: size + +Cluster size is defined as the number of nodes a cluster contains. + +Big clusters, max size +Small clusters, min size +Modal size +What constitute 'big' is dataset + #### Example: density -The density of a cluster is given by the number of edges a cluster contains divided by the maximum possible number of edges. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. +The density of a cluster is given by the number of edges a cluster contains divided by the maximum possible number of edges. -[picture: edges vs max possible edges] +[insert formula] -When examining linked data, a high density (close to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. A low density score could indicate links being missed, which could happen, for example, if blocking rules are too tight or the clustering threshold is too high. +Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. +[picture: edges vs max possible edges] +When examining linked data, a high density (close to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. +A low density score could indicate links being missed - why aren't more links being formed between records? - data quality? Blokcing rules too tight?, which could happen, for example, if blocking rules are too tight or the clustering threshold is too high. + +Explain the relationship between density and cluster size and it's consequences. +Stratified sampling in cluster studio. 
#### Example: cluster centralisation @@ -44,29 +79,52 @@ TBC ### ⚫️ Node metrics -Node metrics refer to features of the nodes within clusters; for example, node degree which is a count of how many edges (links) are joined to a node. +Node metrics quantify the properties of the nodes within clusters. #### Example: node degree +A node degree is the number of edges (links) connected to a node. + +within clusters or across clusters? + +Nodes with high node degree are more connected; they support many links. + +and can be signalers of more reliable links. + +However, high node degree also places more pressure on a node to be a legitimate member of a cluster as its removal could dramatically change the cluster’s structure. Therefore... + +Low node degree + ### 🔗 Edge metrics -These are a measure of the properties of edges within a cluster. Examples include edge betweeness and bridges* +Edge metrics quantify the properties of edges within a cluster. #### Example: 'is bridge' -*acknowledge the slight difference between our definition and the literature. +An edge is classified as a bridge if its removal breaks a cluster into two smaller clusters. -## How to apply graph metrics +[insert picture] -Trying to answer the question of 'what does good look like?' -Specific example of applying graph metrics and feeding back into a linking strategy -(Might be for future if we can give enough guidance above for now) -How to combine graph metrics -Communicate that it's an iterative process of feedback, which builds up expectations -Metrics don't +Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters, so... + +Where to acknowledge the slight difference between our definition and the literature/directed etc? + +## How to evaluate cluster quality + +- Trying to answer the question of 'what does good look like?'
+- General points about what to expect - 'unexpectedly sparse' - how do we know what is reasonable before making the clusters? +-- Previous model runs, labelled data, a gold standard +-- Build up intuition - iterative process of feedback which builds up expectations + +This is not intended to be a complete guide to... +best case scenario - labelled data (gold standard, synthetic) +Linked/deduplicated similar datasets before +Knowledge of datasets to give an idea of what's expected +No idea but can build up expectations + +Reiterate that metrics don't give the whole answer. ## ⚡ How to compute graph metrics with Splink To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. - From e705a1a3c54a50e3872614186d5765e22910b0f4 Mon Sep 17 00:00:00 2001 From: zslade Date: Thu, 1 Feb 2024 18:55:09 +0000 Subject: [PATCH 04/46] rearrange and fill in gaps --- docs/topic_guides/evaluation/clusters.md | 114 ++++++++++++----------- 1 file changed, 59 insertions(+), 55 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters.md b/docs/topic_guides/evaluation/clusters.md index 34bba81106..c12e378aab 100644 --- a/docs/topic_guides/evaluation/clusters.md +++ b/docs/topic_guides/evaluation/clusters.md @@ -4,17 +4,30 @@ Graphs provide a natural way to think about linked data. Visualising linked data Graph metrics can help determine a big-picture view of the quality of clusters generated by a Splink model. For example, the distribution of cluster sizes can reveal outliers, such as very large clusters, that may require closer examination. -Metrics can also help us to home in on problematic clusters, such as those containing inaccurate links.
For example, the 'is bridge' metric (see below) can be a signaller of false positives. +Metrics can also help us to home in on problematic clusters, such as those containing inaccurate links. For example, the 'is bridge' metric (see below) can be a signaller of false positives. -Splink’s [Cluster Studio Dashboard]() empowers users to visualise individual clusters and interrogate the links between their member records. Applying metrics at the individual cluster level within the Dashboard is also useful; for example, for analysing large clusters containing many nodes where it can be impossible to spot spurious links by eye alone. -Should we be saying this if the metrics aren't available in the dashboard yet? +Splink’s [Cluster Studio Dashboard]() empowers users to visualise individual clusters and interrogate the links between their member records, as well as view graph metrics of individual clusters. -!!! note - It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. A more comprehensive picture can be built by considering various metrics in conjunction with one another. - - It is also important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. +This topic guide... -## Graphs +## Evaluating cluster quality + +Determining what makes a 'good' cluster is not straightforward and will look different for different datasets and use cases. + +In ideal circumstance, we'd already have some idea of what good quality clusters look like. This could be from having labelled data. +Perhaps linked and quality assured similar datasets before or having SME knowledge of datasets can also help set expectations... +However in a lot of cases, no prior knowledge.
+ +This means that cluster evaluation is often an iterative process in which we develop an understanding/buildup an intuition of what good looks like for our linkage by using graph metrics to guide our QA efforts. Lessons learnt, affirmative or negative, then fed back and update. Use metrics to get traction + +Even then, graph metrics are not a one stop shop when it comes to cluster evaluation. +It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. A more comprehensive picture can be built by considering various metrics in conjunction with one another. + +It is also important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. + +However, they are still great + +## Linked data as graphs For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). @@ -26,56 +39,62 @@ In data linking, we refer to these collections of nodes as clusters, within whic Edges come with an associate Splink score (the probability of two records being a match). This makes graphs (clusters) produced by Splink so called weighted graphs, as each edge has a weight (Splink score). -Graphs can also be directed or undirected. Directed (undirected) graphs are those in which edges (do not) have an associated direction. For example, linking court cases/journeys. -Linking person records, where the ... +Graphs can also be directed or undirected. Directed (undirected) graphs are those in which edges (do not) have an associated direction. For example, ... [insert image] -Impact of directed versus non-directed on the definitions below... 
+[Impact of directed versus non-directed on the definitions below...] +[Are there any differences between our definitions and those in the literature?] -Other graph properties like loops etc -Loops when it comes to cross dataset linking and deduping... -Does this apply to all cases of data linking? +Other properties of graphs such as self-loops and multi-edges are not be present in clusters produced with Splink. ## Graph metrics Graph metrics quantify the characteristics of a graph (a cluster). A simple example of a graph metric is cluster size, which is the number of nodes in a cluster. -For data linking with Splink, it is useful to sort graph metrics into three categories: cluster metrics, node metrics and edge metrics. These are defined below together with examples of each and their application to linked data. The examples given are of all metrics currently available to compute in Splink. -What about availability to use in the dashboard? +For data linking with Splink, it is useful to sort graph metrics into three categories: +- cluster metrics, +- node metrics, and +- edge metrics + +Each of these are defined below together with examples and how they can be applied to linked data to evaluate cluster quality. The examples given are of all metrics currently available in Splink. ### :fontawesome-solid-circle-nodes: Cluster metrics Cluster metrics refer to the characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. -#### Example: size +#### Example: cluster size + +Cluster size is defined as the number of nodes within a cluster. -Cluster size is defined as the number of nodes a cluster contains. +When thinking about cluster size, one important thing to consider is the size of the biggest clusters produced and to ask - does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the data resulting in a cluster of size 100+ nodes? 
If the answer is no, then false positives links are probably being formed. This could be due to having blocking rules which are too loose or the clustering threshold which is too low. -Big clusters, max size -Small clusters, min size -Modal size -What constitute 'big' is dataset +If you don't have prior knowledge of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink Cluster Studio to validate or invalidate links[link to guidance]. From there you can develop an understanding of what maximum cluster size to expect. -#### Example: density +There also might be an expected cutoff on minimum cluster size. For example, when linking two datasets in which you know people appear least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through comparisons on true matches. -The density of a cluster is given by the number of edges a cluster contains divided by the maximum possible number of edges. +Lisewise, the modal cluster size... -[insert formula] -Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. +#### Example: cluster density + +The density of a cluster is given by the number of edges it contains divided by the maximum possible number of edges. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. [picture: edges vs max possible edges] -When examining linked data, a high density (close to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. -A low density score could indicate links being missed - why aren't more links being formed between records? - data quality? Blokcing rules too tight?, which could happen, for example, if blocking rules are too tight or the clustering threshold is too high. 
+When evaluating clusters, a high density (close to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. + +A low density could indicate links being missed. This could happen for example if blocking rules are too tight or the clustering threshold is too high. +A sample of low density clusters can be inspected in Splink Cluster Studio by choosing [inser option here]. Ask yourself the question: why aren't more links being formed between records? + +It is important to consider cluster density within the context of cluster size. Bigger clusters can have a greater range of densities than smaller ones +This is why `sampling_method = "lowest_density_clusters_by_size"` performs a stratified sample...] -Explain the relationship between density and cluster size and it's consequences. -Stratified sampling in cluster studio. +[Explain the relationship between density and cluster size and it's consequences. Stratified sampling in cluster studio.] #### Example: cluster centralisation -TBC +[TBC] ### ⚫️ Node metrics @@ -87,13 +106,11 @@ A node degree is the number of edges (links) connected to a node. within clusters or across clusters? -Nodes with high node degree are more connected; they support many links. +High node degree also places more pressure on a node to be a legitimate member of a cluster as its removal could dramatically change the cluster’s structure. Therefore... -and can be signalers of more reliable links. +Low node degree -However, high node degree also places more pressure on a node to be a legitimate member of a cluster as its removal could dramatically change the cluster’s structure. Therefore... - -Low node degree +[TBC] ### 🔗 Edge metrics @@ -101,30 +118,17 @@ Edge metrics quantify the properties of edges within a cluster. #### Example: 'is bridge' -An edge is classified as a bridge if its removal breaks a cluster into two smaller clusters.
+An edge is classified as a bridge if its removal splits a cluster into two smaller clusters. [insert picture] -Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters, so... - -Where to acknowledge the slight difference between our definition and the literature/directed etc? +Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. -## How to evaluate cluster quality +## How to compute graph metrics with Splink -- Trying to answer the question of 'what does good look like?' -- General points about what to expect - 'unexpectedly sparse' - how do we know what is reasonable before making the clusters? --- Previous model runs, labelled data, a gold standard --- Build up intuition - iterative process of feedback which builds up expectations - -This is not intended to be a complete guide to... -best case scenario - labelled data (gold standard, synthetic) -Linked/deduplicated similar datasets before -Knowledge of datasets to give an idea of what's expected -No idea but can build up expectations - -Reiterate that metrics don't give the whole answer. +To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. -## ⚑ How to compute graph metrics with Splink +Code snippets and outputs -To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. 
+## How to Splink Cluster Studio From 455755844ccb9d807cf3f19180f8859247cba75d Mon Sep 17 00:00:00 2001 From: zslade Date: Fri, 2 Feb 2024 09:46:30 +0000 Subject: [PATCH 05/46] updates --- docs/topic_guides/evaluation/clusters.md | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters.md b/docs/topic_guides/evaluation/clusters.md index c12e378aab..67bae07088 100644 --- a/docs/topic_guides/evaluation/clusters.md +++ b/docs/topic_guides/evaluation/clusters.md @@ -1,6 +1,6 @@ # Cluster Evaluation -Graphs provide a natural way to think about linked data. Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. Insights gained can be used to refine linking strategies, resulting in more accurate predictions. +Graphs provide a natural way to think about linked data [link to intro page for a refresher]. Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. Insights gained can be used to refine linking strategies, resulting in more accurate predictions. Graph metrics can help determine a big-picture view of the quality of clusters generated by a Splink model. For example, the distribution of cluster sizes can reveal outliers, such as very large clusters, that may require closer examination. @@ -8,26 +8,28 @@ Metrics can also help us to home in on problematic clusters, such as those conta Splink’s [Cluster Studio Dashboard]() empowers users to visualise individual clusters and interrogate the links between their member records, as well as view graph metrics of individual clusters. -This topic guide... - ## Evaluating cluster quality -Determining what makes a 'good' cluster is not straightforward and will look different for different datasets and use cases. +What is a high quality cluster? 
+When it comes to data linking, the highest quality clusters will be those containing all possible true matches and no false matches (no false positives). +This idealised situation is often not realised in practice, at least not across all clusters (for instance, blocking rules necessary to make computations tractable can prevent record comparisons between some true matches ever being made). +However, graph metrics can help us get closer to a satisfactory level of quality and monitor it going forward. -In ideal circumstance, we'd already have some idea of what good quality clusters look like. This could be from having labelled data. -Perhaps linked and quality assured similar datasets before or having SME knowledge of datasets can also help set expectations... +What does high quality look like for your clusters? +It can be hard to know what good quality looks like for you. +In the rare circumstances that we have access to labelled data, it gives an idea of what good looks like and a benchmark that we can iterate towards. However, this is often not the case. +Having experience with linking similar datasets which have undergone quality assurance via clerical review or having SME knowledge of datasets can also help set expectations... However in a lot of cases, no prior knowledge. +How much variation from this you can tolerate, and in which direction, will depend on use case. - this will inform when you stop iterating -This means that cluster evaluation is often an iterative process in which we develop an understanding/buildup an intuition of what good looks like for our linkage by using graph metrics to guide our QA efforts. Lessons learnt, affirmative or negative, then fed back and update. Use metrics to get traction +This means that cluster evaluation is often an iterative process. An understanding is developed... in which we develop an understanding/build up an intuition of what good looks like for our linkage by using graph metrics to guide our QA efforts.
Lessons learnt, affirmative or negative, then fed back and update. Use metrics to get traction Even then, graph metrics are not a one stop shop when it comes to cluster evaluation. It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. A more comprehensive picture can be built by considering various metrics in conjunction with one another. It is also important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. -However, they are still great - -## Linked data as graphs +## Linked data as graphs <-- to go somewhere upstream For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). 
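Reviewer note (not part of the patch series): the guide text in these patches defines cluster size as the number of nodes, density as the number of edges divided by the maximum possible number of edges, and node degree as the number of edges connected to a node. The following is a minimal sketch of those three definitions in plain Python, assuming an undirected cluster supplied as a list of node pairs. The function name `cluster_metrics` and the input shape are illustrative assumptions, not Splink's API.

```python
from collections import defaultdict


def cluster_metrics(edges):
    """Return (size, density, node_degrees) for one undirected cluster.

    `edges` is a list of (node_a, node_b) pairs. This input shape is an
    assumption made for illustration; it is not how Splink stores clusters.
    """
    degrees = defaultdict(int)
    unique_edges = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        edge = frozenset((a, b))
        if edge in unique_edges:
            continue  # ignore repeated edges between the same pair
        unique_edges.add(edge)
        degrees[a] += 1
        degrees[b] += 1

    n_nodes = len(degrees)
    n_edges = len(unique_edges)
    # An undirected graph with n nodes can hold at most n * (n - 1) / 2 edges.
    max_edges = n_nodes * (n_nodes - 1) / 2
    density = n_edges / max_edges if max_edges else None
    return n_nodes, density, dict(degrees)


# A chain of four records: a - b - c - d
example_size, example_density, example_degrees = cluster_metrics(
    [("a", "b"), ("b", "c"), ("c", "d")]
)
```

In this chain, 3 of the 6 possible edges are present, so the density is 0.5, and the two end nodes have degree 1 while the middle nodes have degree 2. A cluster described only by its edges cannot represent a singleton record, so single-node clusters would need separate handling.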
From 5a185879889796527b9503cbf8c4d1e883e15a68 Mon Sep 17 00:00:00 2001
From: zslade
Date: Mon, 5 Feb 2024 13:38:57 +0000
Subject: [PATCH 06/46] merge latest

---
 docs/comparison_level_composition.md |  7 ++++++-
 docs/comparison_level_library.md     | 18 +++++++++++++++++-
 docs/comparison_library.md           | 14 +++++++++++++-
 docs/comparison_template_library.md  |  9 ++++++++-
 docs/datasets.md                     | 16 ++++++++++++++--
 5 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/docs/comparison_level_composition.md b/docs/comparison_level_composition.md
index b0487df816..9ec815dee5 100644
--- a/docs/comparison_level_composition.md
+++ b/docs/comparison_level_composition.md
@@ -13,7 +13,12 @@ For example, `or_(null_level("first_name"), null_level("surname"))` creates a ch
 The Splink comparison level composition functions available for each SQL dialect are as given in this table:
-{% include-markdown "./includes/generated_files/comparison_composition_library_dialect_table.md" %}
+||:simple-duckdb: DuckDB|:simple-apachespark: Spark|:simple-amazonaws: Athena|:simple-sqlite: SQLite|:simple-postgresql: PostgreSql|
+|:-:|:-:|:-:|:-:|:-:|:-:|
+|[and_](#splink.comparison_level_composition.and_)|✓|✓|✓|✓|✓|
+|[not_](#splink.comparison_level_composition.not_)|✓|✓|✓|✓|✓|
+|[or_](#splink.comparison_level_composition.or_)|✓|✓|✓|✓|✓|
+

diff --git a/docs/comparison_level_library.md b/docs/comparison_level_library.md
index 6805395419..c40c84e970 100644
--- a/docs/comparison_level_library.md
+++ b/docs/comparison_level_library.md
@@ -21,7 +21,23 @@ However, not every comparison level is available for every [Splink-compatible SQ
 The pre-made Splink comparison levels available for each SQL dialect are as given in this table:
-{% include-markdown "./includes/generated_files/comparison_level_library_dialect_table.md" %}
+||:simple-duckdb: DuckDB|:simple-apachespark: Spark|:simple-amazonaws: Athena|:simple-sqlite: SQLite|:simple-postgresql: PostgreSql|
+|:-:|:-:|:-:|:-:|:-:|:-:|
+|[array_intersect_level](#splink.comparison_level_library.ArrayIntersectLevelBase)|✓|✓|✓||✓|
+|[columns_reversed_level](#splink.comparison_level_library.ColumnsReversedLevelBase)|✓|✓|✓|✓|✓|
+|[damerau_levenshtein_level](#splink.comparison_level_library.DamerauLevenshteinLevelBase)|✓|✓||✓||
+|[datediff_level](#splink.comparison_level_library.DatediffLevelBase)|✓|✓|✓||✓|
+|[distance_function_level](#splink.comparison_level_library.DistanceFunctionLevelBase)|✓|✓|✓|✓|✓|
+|[distance_in_km_level](#splink.comparison_level_library.DistanceInKmLevelBase)|✓|✓|✓||✓|
+|[else_level](#splink.comparison_level_library.ElseLevelBase)|✓|✓|✓|✓|✓|
+|[exact_match_level](#splink.comparison_level_library.ExactMatchLevelBase)|✓|✓|✓|✓|✓|
+|[jaccard_level](#splink.comparison_level_library.JaccardLevelBase)|✓|✓||||
+|[jaro_level](#splink.comparison_level_library.JaroLevelBase)|✓|✓||✓||
+|[jaro_winkler_level](#splink.comparison_level_library.JaroWinklerLevelBase)|✓|✓||✓||
+|[levenshtein_level](#splink.comparison_level_library.LevenshteinLevelBase)|✓|✓|✓|✓|✓|
+|[null_level](#splink.comparison_level_library.NullLevelBase)|✓|✓|✓|✓|✓|
+|[percentage_difference_level](#splink.comparison_level_library.PercentageDifferenceLevelBase)|✓|✓|✓|✓|✓|
+

diff --git a/docs/comparison_library.md b/docs/comparison_library.md
index 459faec833..783c490bda 100644
--- a/docs/comparison_library.md
+++ b/docs/comparison_library.md
@@ -17,7 +17,19 @@ However, not every comparison is available for every [Splink-compatible SQL back
 The pre-made Splink comparisons available for each SQL dialect are as given in this table:
-{% include-markdown "./includes/generated_files/comparison_library_dialect_table.md" %}
+||:simple-duckdb: DuckDB|:simple-apachespark: Spark|:simple-amazonaws: Athena|:simple-sqlite: SQLite|:simple-postgresql: PostgreSql|
+|:-:|:-:|:-:|:-:|:-:|:-:|
+|[array_intersect_at_sizes](#splink.comparison_library.ArrayIntersectAtSizesBase)|✓|✓|✓||✓|
+|[damerau_levenshtein_at_thresholds](#splink.comparison_library.DamerauLevenshteinAtThresholdsBase)|✓|✓||✓||
+|[datediff_at_thresholds](#splink.comparison_library.DatediffAtThresholdsBase)|✓|✓|✓||✓|
+|[distance_function_at_thresholds](#splink.comparison_library.DistanceFunctionAtThresholdsBase)|✓|✓|✓|✓|✓|
+|[distance_in_km_at_thresholds](#splink.comparison_library.DistanceInKmAtThresholdsBase)|✓|✓|✓||✓|
+|[exact_match](#splink.comparison_library.ExactMatchBase)|✓|✓|✓|✓|✓|
+|[jaccard_at_thresholds](#splink.comparison_library.JaccardAtThresholdsBase)|✓|✓||||
+|[jaro_at_thresholds](#splink.comparison_library.JaroAtThresholdsBase)|✓|✓||✓||
+|[jaro_winkler_at_thresholds](#splink.comparison_library.JaroWinklerAtThresholdsBase)|✓|✓||✓||
+|[levenshtein_at_thresholds](#splink.comparison_library.LevenshteinAtThresholdsBase)|✓|✓|✓|✓|✓|
+

diff --git a/docs/comparison_template_library.md b/docs/comparison_template_library.md
index cfd5dc8959..f185a0c6f1 100644
--- a/docs/comparison_template_library.md
+++ b/docs/comparison_template_library.md
@@ -13,7 +13,14 @@ However, not every comparison is available for every [Splink-compatible SQL back
 The pre-made Splink comparison templates available for each SQL dialect are as given in this table:
-{% include-markdown "./includes/generated_files/comparison_template_library_dialect_table.md" %}
+||:simple-duckdb: DuckDB|:simple-apachespark: Spark|:simple-amazonaws: Athena|:simple-sqlite: SQLite|:simple-postgresql: PostgreSql|
+|:-:|:-:|:-:|:-:|:-:|:-:|
+|[date_comparison](#splink.comparison_template_library.DateComparisonBase)|✓|✓||||
+|[email_comparison](#splink.comparison_template_library.EmailComparisonBase)|✓|✓||||
+|[forename_surname_comparison](#splink.comparison_template_library.ForenameSurnameComparisonBase)|✓|✓||✓||
+|[name_comparison](#splink.comparison_template_library.NameComparisonBase)|✓|✓||✓||
+|[postcode_comparison](#splink.comparison_template_library.PostcodeComparisonBase)|✓|✓|✓|||
+

diff --git a/docs/datasets.md b/docs/datasets.md
index a1a9a17f45..8a372f5f8b 100644
--- a/docs/datasets.md
+++ b/docs/datasets.md
@@ -48,7 +48,16 @@ which also contains information on available datasets, and which have already be
 The datasets available are listed below:
-{% include-markdown "./includes/generated_files/datasets_table.md" %}
+|dataset name|description|rows|unique entities|link to source|
+|-|-|-|-|-|
+|`fake_1000`|Fake 1000 from splink demos. Records are 250 simulated people, with different numbers of duplicates, labelled.|1,000|250|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/fake_1000.csv)|
+|`historical_50k`|The data is based on historical persons scraped from wikidata. Duplicate records are introduced with a variety of errors.|50,000|5,156|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/historical_figures_with_errors_50k.parquet)|
+|`febrl3`|The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany. FEBRL3 data set contains 5000 records (2000 originals and 3000 duplicates), with a maximum of 5 duplicates based on one original record.|5,000|2,000|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/febrl/dataset3.csv)|
+|`febrl4a`|The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany. FEBRL4a contains 5000 original records.|5,000|5,000|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/febrl/dataset4a.csv)|
+|`febrl4b`|The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany. FEBRL4b contains 5000 duplicate records, one for each record in FEBRL4a.|5,000|5,000|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/febrl/dataset4b.csv)|
+|`transactions_origin`|This data has been generated to resemble bank transactions leaving an account. There are no duplicates within the dataset and each transaction is designed to have a counterpart arriving in 'transactions_destination'. Memo is sometimes truncated or missing.|45,326|45,326|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/transactions_origin.parquet)|
+|`transactions_destination`|This data has been generated to resemble bank transactions arriving in an account. There are no duplicates within the dataset and each transaction is designed to have a counterpart sent from 'transactions_origin'. There may be a delay between the source and destination account, and the amount may vary due to hidden fees and foreign exchange rates. Memo is sometimes truncated or missing.|45,326|45,326|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/transactions_destination.parquet)|
+
 ## `splink_dataset_labels`
@@ -59,7 +68,10 @@ Some of the `splink_datasets` have corresponding clerical labels to help assess
 The datasets available are listed below:
-{% include-markdown "./includes/generated_files/dataset_labels_table.md" %}
+|dataset name|description|rows|unique entities|link to source|
+|-|-|-|-|-|
+|`fake_1000_labels`|Clerical labels for fake_1000 |3,176|NA|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/fake_1000_labels.csv)|
+
 ## `splink_dataset_utils` API

From 932528d4f7c90a3bfd2423de2d02f5c6e0b0da1b Mon Sep 17 00:00:00 2001
From: zslade
Date: Mon, 5 Feb 2024 16:21:26 +0000
Subject: [PATCH 07/46] split out sections

---
 .../graph_metrics.md}                         | 72 +++++--------------
 .../evaluation/clusters/how_to_compute.md     |  5 ++
 .../evaluation/clusters/overview.md           | 32 +++++++++
 3 files changed, 56 insertions(+), 53 deletions(-)
 rename docs/topic_guides/evaluation/{clusters.md => clusters/graph_metrics.md} (52%)
 create mode 100644 docs/topic_guides/evaluation/clusters/how_to_compute.md
 create mode 100644 docs/topic_guides/evaluation/clusters/overview.md

diff --git a/docs/topic_guides/evaluation/clusters.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md
similarity index 52%
rename from docs/topic_guides/evaluation/clusters.md
rename to docs/topic_guides/evaluation/clusters/graph_metrics.md
index 67bae07088..7bfa84e194 100644
--- a/docs/topic_guides/evaluation/clusters.md
+++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md
@@ -1,35 +1,4 @@
-# Cluster Evaluation
-
-Graphs provide a natural way to think about linked data [link to intro page for a refresher].
Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. Insights gained can be used to refine linking strategies, resulting in more accurate predictions. - -Graph metrics can help determine a big-picture view of the quality of clusters generated by a Splink model. For example, the distribution of cluster sizes can reveal outliers, such as very large clusters, that may require closer examination. - -Metrics can also help us to home in on problematic clusters, such as those containing inaccurate links. For example, the 'is bridge' metric (see below) can be a signaller of false positives. - -Splink’s [Cluster Studio Dashboard]() empowers users to visualise individual clusters and interrogate the links between their member records, as well as view graph metrics of individual clusters. - -## Evaluating cluster quality - -What is a high quality cluster? -When it comes to data linking, the highest quality clusters will be those containing all possible true matches and no false matches (no false positives). -This idealised situation is often not realised in practice, at least not across all clusters (for instance, blocking rules necessary to make computations tractable can prevent record comparisons between some true matches ever being made). -However, graph metrics can help us get closer to a satisfactory level of quality and monitor it going forward. - -What does high quality look like for your clusters? -It can be hard to know what good quality looks like for you. -In the rare circumstances that access to labelled data to give an idea of what good looks like and a benchmark that we can iterate towards. However often not the case. -Having experience with linking similar datasets which have undergone quality assurance via clerical review or or having SME knowledge of datasets can also help set expectations... -However in a lot of cases, no prior knowledge. 
-How much variation from this you can tolerate, and in which direction, will depend on use case. - this will inform when you stop iterating - -This means that cluster evaluation is often an iterative process. An understanding is developed... in which we develop an understanding/buildup an intuition of what good looks like for our linkage by using graph metrics to guide our QA efforts. Lessons learnt, affirmative or negative, then fed back and update. Use metrics to get traction - -Even then, graph metrics are not a one stop shop when it comes to cluster evaluation. -It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. A more comprehensive picture can be built by considering various metrics in conjunction with one another. - -It is also important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. - -## Linked data as graphs <-- to go somewhere upstream +# Linked data as graphs <-- to go somewhere upstream For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). @@ -50,7 +19,7 @@ Graphs can also be directed or undirected. Directed (undirected) graphs are thos Other properties of graphs such as self-loops and multi-edges are not be present in clusters produced with Splink. -## Graph metrics +# Graph metrics Graph metrics quantify the characteristics of a graph (a cluster). A simple example of a graph metric is cluster size, which is the number of nodes in a cluster. 
@@ -61,24 +30,30 @@ For data linking with Splink, it is useful to sort graph metrics into three cate Each of these is defined below together with examples and how they can be applied to linked data to evaluate cluster quality. The examples given are of all metrics currently available in Splink. -### :fontawesome-solid-circle-nodes: Cluster metrics +!!! note + + It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. A more comprehensive picture can be built by considering various metrics in conjunction with one another. + + It is important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. + +## :fontawesome-solid-circle-nodes: Cluster metrics Cluster metrics refer to the characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. -#### Example: cluster size +### Example: cluster size Cluster size is defined as the number of nodes within a cluster. When thinking about cluster size, one important thing to consider is the size of the biggest clusters produced and to ask - does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the data resulting in a cluster of size 100+ nodes? If the answer is no, then false positive links are probably being formed. This could be due to having blocking rules which are too loose or a clustering threshold which is too low. -If you don't have prior knowledge of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink Cluster Studio to validate or invalidate links[link to guidance].
From there you can develop an understanding of what maximum cluster size to expect. +If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink Cluster Studio to validate or invalidate links [link to guidance]. From there you can develop an understanding of what maximum cluster size to expect. There also might be an expected cutoff on minimum cluster size. For example, when linking two datasets in which you know people appear at least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through comparisons on true matches. -Lisewise, the modal cluster size... +Likewise, the modal cluster size...bimodal distributions. -#### Example: cluster density +### Example: cluster density The density of a cluster is given by the number of edges it contains divided by the maximum possible number of edges. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. @@ -94,15 +69,15 @@ This is why `sampling_method = "lowest_density_clusters_by_size"` performs a str [Explain the relationship between density and cluster size and its consequences. Stratified sampling in cluster studio.] -#### Example: cluster centralisation +### Example: cluster centralisation [TBC] -### ⚫️ Node metrics +## ⚫️ Node metrics Node metrics quantify the properties of the nodes within clusters. -#### Example: node degree +### Example: node degree A node degree is the number of edges (links) connected to a node. @@ -114,23 +89,14 @@ Low node degree [TBC] -### 🔗 Edge metrics +## 🔗 Edge metrics Edge metrics quantify the properties of edges within a cluster. -#### Example: 'is bridge' +### Example: 'is bridge' An edge is classified as a bridge if its removal splits a cluster into two smaller clusters.
[insert picture] -Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. - -## How to compute graph metrics with Splink - -To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. - -Code snippets and outputs - -## How to Splink Cluster Studio - +Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. \ No newline at end of file diff --git a/docs/topic_guides/evaluation/clusters/how_to_compute.md b/docs/topic_guides/evaluation/clusters/how_to_compute.md new file mode 100644 index 0000000000..6e368cf257 --- /dev/null +++ b/docs/topic_guides/evaluation/clusters/how_to_compute.md @@ -0,0 +1,5 @@ +# How to compute graph metrics with Splink + +To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. + +Code snippets and outputs \ No newline at end of file diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md new file mode 100644 index 0000000000..34d74a310b --- /dev/null +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -0,0 +1,32 @@ +# Cluster Evaluation + +Graphs provide a natural way to think about linked data (see [link to intro page] for a refresher). Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. + +Graph metrics help to give a big-picture view of the clusters generated by a Splink model. Through metric distributions and statistics, we can gauge the quality of clusters and monitor how adjustments to models impact results. 
+ + +Graph metrics can also help us home in on problematic clusters, such as those containing inaccurate links (false positives). Clusters can be spot-checked with Splink’s [Cluster Studio Dashboard]() which enables users to visualise individual clusters and interrogate the links between their member records. + + + + +## Evaluating cluster quality + +### What is a high quality cluster? +When it comes to data linking, the highest quality clusters will be those containing all possible true matches (no false negatives) and no false matches (no false positives). +This idealised situation is rarely realised in practice, at least not across all clusters generated. +Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made. Limitations of data and on resources can place an upper limit on the level of quality possible to achieve. +However, graph metrics can help us get closer to a satisfactory level of quality and monitor it going forward. + +### What does good look like for you? + +The extent of cluster evaluation efforts and what is considered good enough will vary greatly with linkage use case. + +Might have a gold standard or labeled data or other model you want to be just as good as...which gives a clear something to aim towards. + +It can be difficult to make a judgement on quality without prior... + +Using domain knowledge can help set expectations of what is reasonable to expect. For example,... + +This guide is intended to help users to obtain a better understanding of better understand the over all shape, build up an idea of their clusters, identify potential problematic clusters, where to target their efforts, regardless of prior knowledge. +And in that way can create an expectation/baseline of what good looks like. 
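Reviewer note (not part of the patch series): the 'is bridge' definition in the patch above (an edge whose removal splits a cluster in two) can be illustrated with a brute-force check. This is a sketch under stated assumptions: the cluster is a single connected, undirected graph given as a list of unique node pairs, and the helper names `is_connected_without` and `find_bridges` are hypothetical, not Splink functions. It favours clarity over speed by re-checking connectivity once per edge.

```python
from collections import defaultdict, deque


def is_connected_without(edges, skipped):
    """True if all nodes stay in one component once the edge `skipped` is dropped."""
    adjacency = defaultdict(set)
    nodes = set()
    for edge in edges:
        a, b = edge
        nodes.update((a, b))
        if edge == skipped:
            continue  # pretend this edge has been removed
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Breadth-first search from an arbitrary node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == nodes


def find_bridges(edges):
    """Return the edges whose removal disconnects the cluster."""
    return [edge for edge in edges if not is_connected_without(edges, edge)]


# Two fully connected triangles joined by the single edge (3, 4).
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
example_bridges = find_bridges(example_edges)
```

Here only the edge (3, 4) is a bridge, which matches the case the guide flags as a possible false positive: a single link joining two otherwise highly connected groups of records.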
From 253f26a3346460691a2c474390a534cb8141b492 Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 5 Feb 2024 17:53:49 +0000 Subject: [PATCH 08/46] fix sections --- mkdocs.yml | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/mkdocs.yml b/mkdocs.yml index 87a38f95e9..cda2bbcabb 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -164,7 +164,10 @@ nav: - Overview: "topic_guides/evaluation/overview.md" - Model: "topic_guides/evaluation/model.md" - Edges (Links): "topic_guides/evaluation/edges.md" - - Clusters: "topic_guides/evaluation/clusters.md" + - Clusters: + - Overview: "topic_guides/evaluation/clusters/overview.md" + - Graph metrics: "topic_guides/evaluation/clusters/graph_metrics.md" + - How to compute graph metrics: "topic_guides/evaluation/clusters/how_to_compute_metrics.md" - Performance: - Run times, performance and linking large data: "topic_guides/performance/drivers_of_performance.md" - Spark Performance: From 232a14a1ff7fa943d3b12e590e11d56562ae8591 Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 5 Feb 2024 17:54:02 +0000 Subject: [PATCH 09/46] Update sections --- .../evaluation/clusters/graph_metrics.md | 29 ++----------- ...o_compute.md => how_to_compute_metrics.md} | 0 .../evaluation/clusters/overview.md | 42 ++++++++++++++----- 3 files changed, 36 insertions(+), 35 deletions(-) rename docs/topic_guides/evaluation/clusters/{how_to_compute.md => how_to_compute_metrics.md} (100%) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 7bfa84e194..054bf717e6 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -1,32 +1,11 @@ -# Linked data as graphs <-- to go somewhere upstream - -For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). 
- -[Include picture here] - -In data linking, we refer to these collections of nodes as clusters, within which the nodes represent the entity to be linked (e.g. person or journey) and the edges represent a potential match. - -[Include picture here] - -Edges come with an associate Splink score (the probability of two records being a match). This makes graphs (clusters) produced by Splink so called weighted graphs, as each edge has a weight (Splink score). - -Graphs can also be directed or undirected. Directed (undirected) graphs are those in which edges (do not) have an associated direction. For example, ... - -[insert image] - -[Impact of directed versus non-directed on the definitions below...] -[Are there any differences between our definitions and those in the literature?] - -Other properties of graphs such as self-loops and multi-edges are not be present in clusters produced with Splink. - # Graph metrics -Graph metrics quantify the characteristics of a graph (a cluster). A simple example of a graph metric is cluster size, which is the number of nodes in a cluster. +Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is cluster size, which is the number of nodes in a cluster. For data linking with Splink, it is useful to sort graph metrics into three categories: -- cluster metrics, -- node metrics, and -- edge metrics +* cluster metrics, +* node metrics, and +* edge metrics Each of these are defined below together with examples and how they can be applied to linked data to evaluate cluster quality. The examples given are of all metrics currently available in Splink. 
diff --git a/docs/topic_guides/evaluation/clusters/how_to_compute.md b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md similarity index 100% rename from docs/topic_guides/evaluation/clusters/how_to_compute.md rename to docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 34d74a310b..e7a702291b 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -5,7 +5,7 @@ Graphs provide a natural way to think about linked data (see [link to intro page Graph metrics help to give a big-picture view of the clusters generated by a Splink model. Through metric distributions and statistics, we can gauge the quality of clusters and monitor how adjustments to models impact results. -Graph metrics can also help us home in on problematic clusters, such as those containing inaccurate links (false positives). Clusters can be spot-checked with Splink’s [Cluster Studio Dashboard]() which enables users to visualise individual clusters and interrogate the links between their member records. +Graph metrics can also help us home in on problematic clusters, such as those containing inaccurate links (false positives). spot-checking can be performed with Splink’s [Cluster Studio Dashboard]() which enables users to visualise individual clusters and interrogate the links between their member records. @@ -13,20 +13,42 @@ Graph metrics can also help us home in on problematic clusters, such as those co ## Evaluating cluster quality ### What is a high quality cluster? + When it comes to data linking, the highest quality clusters will be those containing all possible true matches (no false negatives) and no false matches (no false positives). -This idealised situation is rarely realised in practice, at least not across all clusters generated. 
-Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made. Limitations of data and on resources can place an upper limit on the level of quality possible to achieve. -However, graph metrics can help us get closer to a satisfactory level of quality and monitor it going forward. -### What does good look like for you? +This idealised situation is rarely realised in practice, at least not across all clusters generated. Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made, and the limitations of data and resources can place an upper bound on the level of quality that's possible to achieve. However, graph metrics can help us get closer to a satisfactory level of quality as well as monitor it going forward. -The extent of cluster evaluation efforts and what is considered good enough will vary greatly with linkage use case. +### What does high quality look like for you? -Might have a gold standard or labeled data or other model you want to be just as good as...which gives a clear something to aim towards. +The extent of cluster evaluation efforts and what is considered 'good enough' will vary greatly with linkage use-case. You might have a labeled dataset or quality assured outputs from another model which provide a clear target for cluster quality. -It can be difficult to make a judgement on quality without prior... +Domain knowledge can also help set expectations of what is reasonable when it comes to evaluating clusters. For example, ...cluster size. -Using domain knowledge can help set expectations of what is reasonable to expect. For example,... +However, you also might not have a clear idea of wat good looks like which...makes it difficult to make a judgement call on quality with little or no idea... 
-This guide is intended to help users to obtain a better understanding of better understand the over all shape, build up an idea of their clusters, identify potential problematic clusters, where to target their efforts, regardless of prior knowledge. +This topic guide is intended to help users develop a better understanding of their clusters and help them wisely focus quality assurance efforts, regardless of how much prior knowledge they have about what qualifies quality... And in that way can create an expectation/baseline of what good looks like. + +
+
+ +# Linked data as graphs <-- to go somewhere upstream + +For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). + +[Include picture here] + +In data linking, we refer to these collections of nodes as clusters, within which the nodes represent the entity to be linked (e.g. person or journey) and the edges represent a potential match. + +[Include picture here] + +Edges come with an associate Splink score (the probability of two records being a match). This makes graphs (clusters) produced by Splink so called weighted graphs, as each edge has a weight (Splink score). + +Graphs can also be directed or undirected. Directed (undirected) graphs are those in which edges (do not) have an associated direction. For example, ... + +[insert image] + +[Impact of directed versus non-directed on the definitions below...] +[Are there any differences between our definitions and those in the literature?] + +Other properties of graphs such as self-loops and multi-edges are not be present in clusters produced with Splink. From 744226da51e7dceeeaa27dc0a7162c15c7d6eab1 Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 5 Feb 2024 20:31:35 +0000 Subject: [PATCH 10/46] update overview/intro --- .../evaluation/clusters/overview.md | 20 ++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index e7a702291b..89c628b792 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -14,25 +14,27 @@ Graph metrics can also help us home in on problematic clusters, such as those co ### What is a high quality cluster? -When it comes to data linking, the highest quality clusters will be those containing all possible true matches (no false negatives) and no false matches (no false positives). 
+When it comes to data linking, the highest quality clusters will be those containing all possible true matches (there will be no false negatives) and no false matches (no false positives). -This idealised situation is rarely realised in practice, at least not across all clusters generated. Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made, and the limitations of data and resources can place an upper bound on the level of quality that's possible to achieve. However, graph metrics can help us get closer to a satisfactory level of quality as well as monitor it going forward. +Generating clusters which all adhere to this ideal is rare in practice. +Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made, and data limitations can place an upper bound on the level of quality achievable. +Despite this, graph metrics can help us get closer to a satisfactory level of quality as well as monitor it going forward. -### What does high quality look like for you? +### What does cluster quality look like for you? -The extent of cluster evaluation efforts and what is considered 'good enough' will vary greatly with linkage use-case. You might have a labeled dataset or quality assured outputs from another model which provide a clear target for cluster quality. +The extent of cluster evaluation efforts and what is considered 'good enough' will vary greatly with linkage use-case. +You might already have gold standard/labelled data or quality assured outputs from another model which define a clear benchmark for cluster quality. -Domain knowledge can also help set expectations of what is reasonable when it comes to evaluating clusters. For example, ...cluster size. +Domain knowledge is also very instructive for guiding evaluation efforts and setting expectations of what is considered reasonable or good. 
For example, you might already know that a large cluster (containing say 100 nodes) is suspicious for a particular deduped dataset. -However, you also might not have a clear idea of wat good looks like which...makes it difficult to make a judgement call on quality with little or no idea... +However, you may have little or no clear idea of what good quality clusters look like for your linkage. -This topic guide is intended to help users develop a better understanding of their clusters and help them wisely focus quality assurance efforts, regardless of how much prior knowledge they have about what qualifies quality... -And in that way can create an expectation/baseline of what good looks like. +Whatever level of prior knowledge, this topic guide is designed to help users develop a better understanding of their clusters and help focus quality assurance efforts to get the best out of their linkage models.

-# Linked data as graphs <-- to go somewhere upstream +## Linked data as graphs <-- to go somewhere upstream For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). From 7bd6878d07584a87a57c48c72bfad7c9307d2a71 Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 5 Feb 2024 20:40:25 +0000 Subject: [PATCH 11/46] tweaking intro --- .../topic_guides/evaluation/clusters/overview.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 89c628b792..d42e36ae01 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -5,7 +5,7 @@ Graphs provide a natural way to think about linked data (see [link to intro page Graph metrics help to give a big-picture view of the clusters generated by a Splink model. Through metric distributions and statistics, we can gauge the quality of clusters and monitor how adjustments to models impact results. -Graph metrics can also help us home in on problematic clusters, such as those containing inaccurate links (false positives). spot-checking can be performed with Splink’s [Cluster Studio Dashboard]() which enables users to visualise individual clusters and interrogate the links between their member records. +Graph metrics can also help us home in on problematic clusters, such as those containing inaccurate links (false positives). Spot-checking can be performed with Splink’s [Cluster Studio Dashboard]() which enables users to visualise individual clusters and interrogate the links between their member records. @@ -14,22 +14,20 @@ Graph metrics can also help us home in on problematic clusters, such as those co ### What is a high quality cluster? 
-When it comes to data linking, the highest quality clusters will be those containing all possible true matches (there will be no false negatives) and no false matches (no false positives). +When it comes to data linking, the highest quality clusters will be those containing all possible true matches (there will be no missed links a.k.a. false negatives) and no false matches (no false positives). -Generating clusters which all adhere to this ideal is rare in practice. -Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made, and data limitations can place an upper bound on the level of quality achievable. -Despite this, graph metrics can help us get closer to a satisfactory level of quality as well as monitor it going forward. +Generating clusters which all adhere to this ideal is rare in practice. Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made, and data limitations can place an upper bound on the level of quality achievable. Despite this, graph metrics can help us get closer to a satisfactory level of quality as well as monitor it going forward. ### What does cluster quality look like for you? The extent of cluster evaluation efforts and what is considered 'good enough' will vary greatly with linkage use-case. -You might already have gold standard/labelled data or quality assured outputs from another model which define a clear benchmark for cluster quality. +You might already have gold standard, labelled data or quality assured outputs from another model which define a clear benchmark for cluster quality. -Domain knowledge is also very instructive for guiding evaluation efforts and setting expectations of what is considered reasonable or good. For example, you might already know that a large cluster (containing say 100 nodes) is suspicious for a particular deduped dataset. 
+Domain knowledge is also very instructive for guiding evaluation efforts and setting expectations of what is considered reasonable or good. For example, you might already know that a large cluster (containing say 100 nodes) is suspicious for a your deduped dataset. -However, you may have little or no clear idea of what good quality clusters look like for your linkage. +However, you may also have little or no clear idea of what good quality clusters look like for your linkage. -Whatever level of prior knowledge, this topic guide is designed to help users develop a better understanding of their clusters and help focus quality assurance efforts to get the best out of their linkage models. +Whatever level of prior expectation, this topic guide is designed to help users develop a better understanding of their clusters and help focus quality assurance efforts to get the best out of their linkage models.

From 18851d3ef178d636a9ac4fb87f6533d307cac978 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 6 Feb 2024 09:48:35 +0000 Subject: [PATCH 12/46] tweaks --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 2 +- docs/topic_guides/evaluation/clusters/overview.md | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 054bf717e6..df604c8374 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -7,7 +7,7 @@ For data linking with Splink, it is useful to sort graph metrics into three cate * node metrics, and * edge metrics -Each of these are defined below together with examples and how they can be applied to linked data to evaluate cluster quality. The examples given are of all metrics currently available in Splink. +Each of these are defined below together with examples and explanations of how they can be applied to linked data to evaluate cluster quality. The examples given are of all metrics currently available in Splink. !!! note diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index d42e36ae01..01e32de3b0 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -29,6 +29,8 @@ However, you may also have little or no clear idea of what good quality clusters Whatever level of prior expectation, this topic guide is designed to help users develop a better understanding of their clusters and help focus quality assurance efforts to get the best out of their linkage models. +What this topic guide includes... +

From 20026b25dfbd2fb90c95dafc13bb1b88949270ae Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 6 Feb 2024 11:58:25 +0000 Subject: [PATCH 13/46] update density --- .../evaluation/clusters/graph_metrics.md | 36 ++++++++++--------- .../evaluation/clusters/overview.md | 14 ++++---- 2 files changed, 28 insertions(+), 22 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index df604c8374..4b047384e0 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -1,11 +1,12 @@ # Graph metrics -Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is cluster size, which is the number of nodes in a cluster. +Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is [cluster size](), which is the number of nodes in a cluster. For data linking with Splink, it is useful to sort graph metrics into three categories: -* cluster metrics, -* node metrics, and -* edge metrics + +* [Cluster metrics]() +* [Node metrics]() +* [Edge metrics]() Each of these are defined below together with examples and explanations of how they can be applied to linked data to evaluate cluster quality. The examples given are of all metrics currently available in Splink. @@ -21,13 +22,13 @@ Cluster metrics refer to the characteristics of a cluster as a whole, rather tha ### Example: cluster size -Cluster size is defined as the number of nodes within a cluster. +Cluster size refers to the number of nodes within a cluster. -When thinking about cluster size, one important thing to consider is the size of the biggest clusters produced and to ask - does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the data resulting in a cluster of size 100+ nodes? 
If the answer is no, then false positives links are probably being formed. This could be due to having blocking rules which are too loose or the clustering threshold which is too low. +When thinking about cluster size, it is important to consider the size of the biggest clusters produced and ask yourself, does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the data resulting in a cluster of over 100 nodes? If the answer is no, then false positives links are probably being formed. This could be due to having blocking rules which are too loose or the clustering threshold which is too low. -If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink Cluster Studio to validate or invalidate links[link to guidance]. From there you can develop an understanding of what maximum cluster size to expect. +If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink Cluster Studio to validate or invalidate links. From there you can develop an understanding of what maximum cluster size to expect. -There also might be an expected cutoff on minimum cluster size. For example, when linking two datasets in which you know people appear least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through comparisons on true matches. +There also might be a lower bound on cluster size. For example, when linking two datasets in which you know people appear least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through all record comparisons of true matches. Lisewise, the modal cluster size...bimodal distributions. 
@@ -38,15 +39,16 @@ The density of a cluster is given by the number of edges it contains divided by [picture: edges vs max possible edges] -When evaluating clusters, a high density (close to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. +When evaluating clusters, a high density (closer to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. A low density could indicate links being missed. This could happen for example if blocking rules are too tight or the clustering threshold is too high. -A sample of low density clusters can be inspected in Splink Cluster Studio by choosing [inser option here]. Ask yourself the question: why aren't more links being formed between records? +A sample of low density clusters can be inspected in Splink Cluster Studio Dashboard with the option `sampling_method = "lowest_density_clusters_by_size"`. When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? + +Bear in mind, small clusters are more likely to achieve a higher density as fewer record comparisons are required to form the maximum edges possible (for instance, a density of 0.66 for a cluster of size 3 can be achieved with only 2 comparisons). -It is important to consider cluster density within the context of cluster size. Bigger clusters can have a greater range of densities than smaller ones -This is why `sampling_method = "lowest_density_clusters_by_size"` performs a stratified sample...] +So it's important to consider a range of sizes when looking evaluating density so ensure you're not just focussed on very big clusters. Smaller clusters can also have the advantage of being easier to spot-check by eye. This is why Cluster Studio Dashboard option `sampling_method = "lowest_density_clusters_by_size"` performs stratified sampling across different cluster sizes. 
-[Explain the relationship between density and cluster size and it's consequences. Stratified sampling in cluster studio.] + ### Example: cluster centralisation @@ -60,9 +62,9 @@ Node metrics quantify the properties of the nodes within clusters. A node degree is the number of edges (links) connected to a node. -within clusters or across clusters? + Low node degree @@ -78,4 +80,6 @@ An edge is classified as a bridge if its removal splits a cluster into two small [insert picture] -Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. \ No newline at end of file +Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. + +To see how to compute these metrics in Splink, see the next chapter, [How to...]() \ No newline at end of file diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 01e32de3b0..c28b21edef 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -20,16 +20,18 @@ Generating clusters which all adhere to this ideal is rare in practice. Blocking ### What does cluster quality look like for you? -The extent of cluster evaluation efforts and what is considered 'good enough' will vary greatly with linkage use-case. -You might already have gold standard, labelled data or quality assured outputs from another model which define a clear benchmark for cluster quality. +The extent of cluster evaluation efforts and what is considered 'good enough' will vary greatly with linkage use-case. 
You might already have labelled data or quality assured outputs from another model which define a clear benchmark for cluster quality. -Domain knowledge is also very instructive for guiding evaluation efforts and setting expectations of what is considered reasonable or good. For example, you might already know that a large cluster (containing say 100 nodes) is suspicious for a your deduped dataset. +Domain knowledge can also set expectations of what is deemed reasonable or good. For example, you might already know that a large cluster (containing say 100 nodes) is suspicious for your deduplicated dataset. -However, you may also have little or no clear idea of what good quality clusters look like for your linkage. +However, you may also have little or no knowledge about the data or a clear idea of what good quality clusters look like for your linkage. -Whatever level of prior expectation, this topic guide is designed to help users develop a better understanding of their clusters and help focus quality assurance efforts to get the best out of their linkage models. +Whatever the starting point, this topic guide is designed to help users develop a better understanding of their clusters and help focus quality assurance efforts to get the best out of their linkage models. -What this topic guide includes... +## What this topic guide contains: + +* An introduction to the graph metrics currently available in Splink and how to apply them to linked data +* A how-to guide on computing graph metrics in Splink 

From 4a8bc93ff88e021e0332662995ec920df8bea9d4 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 6 Feb 2024 12:44:38 +0000 Subject: [PATCH 14/46] update node degree --- .../evaluation/clusters/graph_metrics.md | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 4b047384e0..44d0333e17 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -30,9 +30,6 @@ If you don't have an intuition of what seems reasonable, then it is worth inspec There also might be a lower bound on cluster size. For example, when linking two datasets in which you know people appear least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through all record comparisons of true matches. -Lisewise, the modal cluster size...bimodal distributions. - - ### Example: cluster density The density of a cluster is given by the number of edges it contains divided by the maximum possible number of edges. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. @@ -44,9 +41,9 @@ When evaluating clusters, a high density (closer to 1) is generally considered g A low density could indicate links being missed. This could happen for example if blocking rules are too tight or the clustering threshold is too high. A sample of low density clusters can be inspected in Splink Cluster Studio Dashboard with the option `sampling_method = "lowest_density_clusters_by_size"`. When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? 
-Bear in mind, small clusters are more likely to achieve a higher density as fewer record comparisons are required to form the maximum edges possible (for instance, a density of 0.66 for a cluster of size 3 can be achieved with only 2 comparisons). +Bear in mind, small clusters are more likely to achieve a higher density as fewer record comparisons are required to form the maximum edges possible (for instance, a maximum density of 1 for a cluster of size 3 can be achieved with only 3 record comparisons). -So it's important to consider a range of sizes when looking evaluating density so ensure you're not just focussed on very big clusters. Smaller clusters can also have the advantage of being easier to spot-check by eye. This is why Cluster Studio Dashboard option `sampling_method = "lowest_density_clusters_by_size"` performs stratified sampling across different cluster sizes. +So it's important to consider a range of sizes when evaluating density to ensure you're not just focussed on very big clusters. Smaller clusters also have the advantage of being easier to assess by eye. This is why the Cluster Studio Dashboard option `sampling_method = "lowest_density_clusters_by_size"` performs stratified sampling across different cluster sizes. @@ -60,13 +57,13 @@ Node metrics quantify the properties of the nodes within clusters. ### Example: node degree -A node degree is the number of edges (links) connected to a node. +Node degree is the number of edges (links) connected to a node. - +However, erroneous links (false positives) could also be the reason for high node degree, so it can be worth inspecting the edges of highly connected nodes. -Low node degree +Just like with density, it is important to bear in mind cluster size when looking at node degree, as bigger clusters can achieve higher node degree than smaller ones. Low node degree for bigger clusters can be more significant than for smaller clusters. 
[TBC] @@ -82,4 +79,4 @@ An edge is classified as a bridge if its removal splits a cluster into two small Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. -To see how to compute these metrics in Splink, see the next chapter, [How to...]() \ No newline at end of file +A guide on [how to compute these graph metrics with Splink]() is given in the next chapter. \ No newline at end of file From 5b70ed75cfd7dd91f572ff229559238ab07fd54e Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 6 Feb 2024 15:17:42 +0000 Subject: [PATCH 15/46] remove directed etc --- docs/topic_guides/evaluation/clusters/overview.md | 9 --------- 1 file changed, 9 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index c28b21edef..e3c5d03a15 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -47,12 +47,3 @@ In data linking, we refer to these collections of nodes as clusters, within whic [Include picture here] Edges come with an associate Splink score (the probability of two records being a match). This makes graphs (clusters) produced by Splink so called weighted graphs, as each edge has a weight (Splink score). - -Graphs can also be directed or undirected. Directed (undirected) graphs are those in which edges (do not) have an associated direction. For example, ... - -[insert image] - -[Impact of directed versus non-directed on the definitions below...] -[Are there any differences between our definitions and those in the literature?] - -Other properties of graphs such as self-loops and multi-edges are not be present in clusters produced with Splink. 
From 2bca2d73d7f6313ca3280bc4ff8222eddad874e1 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 6 Feb 2024 15:17:51 +0000 Subject: [PATCH 16/46] tweak explanations --- .../evaluation/clusters/graph_metrics.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 44d0333e17..727633aa93 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -24,9 +24,9 @@ Cluster metrics refer to the characteristics of a cluster as a whole, rather tha Cluster size refers to the number of nodes within a cluster. -When thinking about cluster size, it is important to consider the size of the biggest clusters produced and ask yourself, does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the data resulting in a cluster of over 100 nodes? If the answer is no, then false positives links are probably being formed. This could be due to having blocking rules which are too loose or the clustering threshold which is too low. +When thinking about cluster size, it is important to consider the size of the biggest clusters produced and ask yourself, does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the linked data resulting in a cluster of over 100 nodes? If the answer is no, then false positive links are probably being formed. This could be due to having blocking rules which are too loose or a clustering threshold which is too low. -If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink Cluster Studio to validate or invalidate links. From there you can develop an understanding of what maximum cluster size to expect. 
+If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink's [Cluster Studio Dashboard]() to validate or invalidate links. From there you can develop an understanding of what maximum cluster size to expect. There also might be a lower bound on cluster size. For example, when linking two datasets in which you know people appear least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through all record comparisons of true matches. @@ -38,12 +38,12 @@ The density of a cluster is given by the number of edges it contains divided by When evaluating clusters, a high density (closer to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. -A low density could indicate links being missed. This could happen for example if blocking rules are too tight or the clustering threshold is too high. +A low density could indicate links being missed. This could happen, for example, if blocking rules are too tight or the clustering threshold is too high. A sample of low density clusters can be inspected in Splink Cluster Studio Dashboard with the option `sampling_method = "lowest_density_clusters_by_size"`. When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? -Bear in mind, small clusters are more likely to achieve a higher density as fewer record comparisons are required to form the maximum edges possible (for instance, a maximum density of 1 for a cluster of size 3 can be achieved with only 3 record comparisons). +Bear in mind, small clusters are more likely to achieve a higher density as fewer record comparisons are required to form the maximum edges possible (the maximum density of 1 for a cluster of size 3 can be achieved with only 3 pairwise record comparisons). 
-So it's important to consider a range of sizes when evaluating density to ensure you're not just focussed on very big clusters. Smaller clusters also have the advantage of being easier to assess by eye. This is why the Cluster Studio Dashboard option `sampling_method = "lowest_density_clusters_by_size"` performs stratified sampling across different cluster sizes. +So, it's important to consider a range of sizes when evaluating density to ensure you're not just focussed on very big clusters. Smaller clusters also have the advantage of being easier to assess by eye. This is why the option `sampling_method = "lowest_density_clusters_by_size"` performs stratified sampling across different cluster sizes. @@ -57,15 +57,13 @@ Node metrics quantify the properties of the nodes within clusters. ### Example: node degree -Node degree is the number of edges (links) connected to a node. +Node degree is the number of edges connected to a node. High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked. Nodes with low node degree could indicate links being missed (false negatives). -However, erroneous links (false positives) could also be the reason for high node degree, so it can be worth inspecting the edges of highly connected nodes. +However, erroneous links (false positives) could also be the reason for high node degree, so it can be useful to inspect the edges of highly connected nodes. -Just like with density it is important to bear in mind custer size when looking at node degree, as bigger clusters can achieve higher node degree than smaller ones. Low node degree for bigger clusters can be more significant than for smaller clusters. - -[TBC] +Just like with density it is important to bear in mind custer size when looking at node degree. 
By consequence of having more nodes to form links between, nodes within bigger clusters can achieve higher node degree than those in smaller ones, meaning that low node degree for big clusters can be more significant. ## 🔗 Edge metrics From 7e4cc9136a714a5c740b9939f66b7079a9ff8ef0 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 6 Feb 2024 17:38:42 +0000 Subject: [PATCH 17/46] fleshing out how to guide --- .../clusters/how_to_compute_metrics.md | 44 ++++++++++++++++++- .../evaluation/clusters/overview.md | 6 ++- 2 files changed, 47 insertions(+), 3 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md index 6e368cf257..32084c9b7c 100644 --- a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md +++ b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md @@ -2,4 +2,46 @@ To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. -Code snippets and outputs \ No newline at end of file + """ + Generates tables containing graph metrics (for nodes, edges and clusters), + and returns a data class of Splink dataframes + + Args: + df_predict (SplinkDataFrame): The results of `linker.predict()` + df_clustered (SplinkDataFrame): The outputs of + `linker.cluster_pairwise_predictions_at_threshold()` + threshold_match_probability (float): Filter the pairwise match predictions + to include only pairwise comparisons with a match_probability at or + above this threshold. + + Returns: + GraphMetricsResult: A data class containing SplinkDataFrames + of cluster IDs and selected node, edge or cluster metrics.
+ attribute "nodes" for nodes metrics table + attribute "edges" for edge metrics table + attribute "clusters" for cluster metrics table + + """ + +The `threshold_match_probability` provided should be the same as the clustering threshold passed to `cluster_pairwise_predictions_at_threshold()`. If this information is available to Splink then it will be passed automatically, otherwise the user will have to provide it themselves and take care to ensure that threshold values align. + +As stated above, `compute_graph_metrics()` returns a set of Splink Dataframes. The individual Splink Dataframes containing node, edge and cluster metrics (as introduced and defined in [Graph metrics]()) can be accessed as follows + + """ + `compute_graph_metrics.nodes` for node metrics + `compute_graph_metrics.edges` for edge metrics + `compute_graph_metrics.clusters` for cluster metrics + """ + +The metrics which are computed by `compute_graph_metrics()` include all those mentioned in [Graph metrics](), namely + +* Cluster size +* Cluster density +* Node degree +* Cluster centrality +* 'is bridge' + +All of these metrics are calculated by default. If you are unable to install the packages...required to compute 'is bridge', this metric won't be calculated, however all other metrics will still be produced. + +This topic guide is a work in progress. +Worked through example of computing metrics and applying metrics to evaluate and improve cluster quality. 
diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index e3c5d03a15..35b7727246 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -28,11 +28,13 @@ However, you may also have little or no knowledge about the data or a clear idea Whatever the starting point, this topic guide is designed to help users develop a better understanding of their clusters and help focus quality assurance efforts to get the best out of their linkage models. -## What this topic guide contains: +## What this topic guide contains -* An introduction to the graph metrics currently available in Splink and how to apply them to linked data +* An introduction to the [graph metrics]() currently available in Splink and how to apply them to linked data * A how-to guide on computing graph metrics in Splink +This topic guide is a work in progress and will be updated as new functionality and metrics are released. +

From 3c4685251ccb0b847d4c06bc0d662480da72383a Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 6 Feb 2024 18:50:02 +0000 Subject: [PATCH 18/46] update how to and small tweaks --- .../clusters/how_to_compute_metrics.md | 21 +++++++------------ .../evaluation/clusters/overview.md | 2 +- 2 files changed, 9 insertions(+), 14 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md index 32084c9b7c..5456e97ebe 100644 --- a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md +++ b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md @@ -2,7 +2,6 @@ To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. - """ Generates tables containing graph metrics (for nodes, edges and clusters), and returns a data class of Splink dataframes @@ -21,27 +20,23 @@ To enable users to calculate a variety of graph metrics for their linked data, S attribute "edges" for edge metrics table attribute "clusters" for cluster metrics table - """ - The `threshold_match_probability` provided should be the same as the clustering threshold passed to `cluster_pairwise_predictions_at_threshold()`. If this information is available to Splink then it will be passed automatically, otherwise the user will have to provide it themselves and take care to ensure that threshold values align. -As stated above, `compute_graph_metrics()` returns a set of Splink Dataframes. The individual Splink Dataframes containing node, edge and cluster metrics (as introduced and defined in [Graph metrics]()) can be accessed as follows +As stated above, `compute_graph_metrics()` returns a set of Splink dataframes. 
The individual Splink dataframes containing node, edge and cluster metrics (as introduced in [Graph metrics]()) can be accessed as follows: - """ - `compute_graph_metrics.nodes` for node metrics - `compute_graph_metrics.edges` for edge metrics - `compute_graph_metrics.clusters` for cluster metrics - """ + compute_graph_metrics.nodes for node metrics + compute_graph_metrics.edges for edge metrics + compute_graph_metrics.clusters for cluster metrics -The metrics which are computed by `compute_graph_metrics()` include all those mentioned in [Graph metrics](), namely +The metrics computed by `compute_graph_metrics()` include all those mentioned in [Graph metrics](), namely * Cluster size * Cluster density * Node degree * Cluster centrality -* 'is bridge' +* 'Is bridge' -All of these metrics are calculated by default. If you are unable to install the packages...required to compute 'is bridge', this metric won't be calculated, however all other metrics will still be produced. +All of these metrics are calculated by default. If you are unable to install the packages...required for 'is bridge', this metric won't be calculated, however all other metrics will still be generated. This topic guide is a work in progress. -Worked through example of computing metrics and applying metrics to evaluate and improve cluster quality. +We are developing a worked through example of computing metrics and applying them to evaluate and improve cluster quality. 
\ No newline at end of file diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 35b7727246..5693a5749a 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -31,7 +31,7 @@ Whatever the starting point, this topic guide is designed to help users develop ## What this topic guide contains * An introduction to the [graph metrics]() currently available in Splink and how to apply them to linked data -* A how-to guide on computing graph metrics in Splink +* Instructions on how to compute graph metrics with Splink This topic guide is a work in progress and will be updated as new functionality and metrics are released. From d3f99983d714faddf39b546780f964dae3dbcb0d Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 12 Feb 2024 10:14:32 +0000 Subject: [PATCH 19/46] reorder --- .../evaluation/clusters/graph_metrics.md | 55 ++++++++++--------- 1 file changed, 28 insertions(+), 27 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 727633aa93..05f6c2fcc6 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -16,6 +16,33 @@ Each of these are defined below together with examples and explanations of how t It is important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. + +## ⚫️ Node metrics + +Node metrics quantify the properties of the nodes within clusters. + +### Example: node degree + +Node degree is the number of edges connected to a node.
+ +High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked. Nodes with low node degree could indicate links being missed (false negatives). + +However, erroneous links (false positives) could also be the reason for high node degree, so it can be useful to inspect the edges of highly connected nodes. + +Just like with density it is important to bear in mind custer size when looking at node degree. By consequence of having more nodes to form links between, nodes within bigger clusters can achieve higher node degree than those in smaller ones, meaning that low node degree for big clusters can be more significant. + +## 🔗 Edge metrics + +Edge metrics quantify the properties of edges within a cluster. + +### Example: 'is bridge' + +An edge is classified as a bridge if its removal splits a cluster into two smaller clusters. + +[insert picture] + +Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. + ## :fontawesome-solid-circle-nodes: Cluster metrics Cluster metrics refer to the characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. @@ -49,32 +76,6 @@ So, it's important to consider a range of sizes when evaluating density to ensur ### Example: cluster centralisation -[TBC] - -## ⚫️ Node metrics - -Node metrics quantify the properties of the nodes within clusters. - -### Example: node degree - -Node degree is the number of edges connected to a node. - -High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked. Nodes with low node degree could indicate links being missed (false negatives).
- -However, erroneous links (false positives) could also be the reason for high node degree, so it can be useful to inspect the edges of highly connected nodes. - -Just like with density it is important to bear in mind custer size when looking at node degree. By consequence of having more nodes to form links between, nodes within bigger clusters can achieve higher node degree than those in smaller ones, meaning that low node degree for big clusters can be more significant. - -## 🔗 Edge metrics - -Edge metrics quantify the properties of edges within a cluster. - -### Example: 'is bridge' - -An edge is classified as a bridge if its removal splits a cluster into two smaller clusters. - -[insert picture] - -Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. +Cluster centralisation is the average absolute deviation from maximum node degree normalised with respect to maximum possible value. A guide on [how to compute these graph metrics with Splink]() is given in the next chapter. \ No newline at end of file From 21e2fd3d4a3c7d3c84124bdf2e31213f8272fecd Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 12 Feb 2024 12:28:31 +0000 Subject: [PATCH 20/46] cluster centralisation --- .../evaluation/clusters/graph_metrics.md | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 05f6c2fcc6..8588e6ce89 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -27,9 +27,11 @@ Node degree is the number of edges connected to a node. High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked.
Nodes with low node degree could indicate links being missed (false negatives). -However, erroneous links (false positives) could also be the reason for high node degree, so it can be useful to inspect the edges of highly connected nodes. +However, erroneous links (false positives) could also be the reason for high node degree, so it can be useful to validate the edges of highly connected nodes. -Just like with density it is important to bear in mind custer size when looking at node degree. By consequence of having more nodes to form links between, nodes within bigger clusters can achieve higher node degree than those in smaller ones, meaning that low node degree for big clusters can be more significant. +It is important to consider [custer size]() when looking at node degree. By definition, larger clusters contain more nodes to form links between, allowing nodes within them to attain higher degrees compared to those in smaller clusters. Consequently, low node degree within larger clusters can carry greater significance. + +Bear in mind, that the degree of a single node in a cluster isn't necessarily representative of the overall connectedness of a cluster. This is where [cluster centralisation]() can help. ## πŸ”— Edge metrics @@ -76,6 +78,12 @@ So, it's important to consider a range of sizes when evaluating density to ensur ### Example: cluster centralisation -Cluster centralisation is the average absolute deviation from maximum node degree normalised with respect to maximum possible value. +Cluster centralisation is defined as the average absolute deviation from maximum [node degree]() normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. It ranges from 0 to 1. + +A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. 
This can help locate nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. + +Low centralisation suggests that edges are more evenly distributed amoung nodes. Low centralisation can be good if all nodes within a clusters enjoy many connections. However low centralisation could also indicate that nodes are not as highly connected as they could be. + +
-A guide on [how to compute these graph metrics with Splink]() is given in the next chapter. \ No newline at end of file +A guide on [how to compute these graph metrics with Splink]() is given in the next chapter. From c7d460e5782e21d8f5091b68b4bd9903322f2d10 Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 12 Feb 2024 15:28:04 +0000 Subject: [PATCH 21/46] small improvements --- .../evaluation/clusters/graph_metrics.md | 30 +++++++++---------- .../evaluation/clusters/overview.md | 6 ++-- 2 files changed, 17 insertions(+), 19 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 8588e6ce89..87ac76393f 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -1,12 +1,12 @@ # Graph metrics -Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is [cluster size](), which is the number of nodes in a cluster. +Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is [cluster size](), which is the number of nodes within a cluster. For data linking with Splink, it is useful to sort graph metrics into three categories: -* [Cluster metrics]() * [Node metrics]() * [Edge metrics]() +* [Cluster metrics]() Each of these are defined below together with examples and explanations of how they can be applied to linked data to evaluate cluster quality. The examples given are of all metrics currently available in Splink. @@ -14,12 +14,12 @@ Each of these are defined below together with examples and explanations of how t It is important to bear in mind that whilst graph metrics can be very useful for assessing linkage quality, they are rarely definitive, especially when taken in isolation. A more comprehensive picture can be built by considering various metrics in conjunction with one another. 
- It is important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. + It is also important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. ## ⚫️ Node metrics -Node metrics quantify the properties of the nodes within clusters. +Node metrics quantify the properties of the nodes which live within clusters. ### Example: node degree @@ -29,21 +29,21 @@ High node degree is generally considered good as it means there are many edges i However, erroneous links (false positives) could also be the reason for high node degree, so it can be useful to validate the edges of highly connected nodes. -It is important to consider [custer size]() when looking at node degree. By definition, larger clusters contain more nodes to form links between, allowing nodes within them to attain higher degrees compared to those in smaller clusters. Consequently, low node degree within larger clusters can carry greater significance. +It is important to consider [cluster size]() when looking at node degree. By definition, larger clusters contain more nodes to form links between, allowing nodes within them to attain higher degrees compared to those in smaller clusters. Consequently, low node degree within larger clusters can carry greater significance. Bear in mind, that the degree of a single node in a cluster isn't necessarily representative of the overall connectedness of a cluster. This is where [cluster centralisation]() can help. 
## 🔗 Edge metrics -Edge metrics quantify the properties of edges within a cluster. +Edge metrics quantify the properties of the edges within a cluster. ### Example: 'is bridge' -An edge is classified as a bridge if its removal splits a cluster into two smaller clusters. +An edge is classified as a 'bridge' if its removal splits a cluster into two smaller clusters. [insert picture] -Bridges can be signalers of false positives in linked data, especially when joining two highly connected clusters. Examining bridges can shed light on potential errors in the linking process leading to false positive links. +Bridges can be signalers of false positives in linked data, especially when joining two highly connected sub-clusters. Examining bridges can shed light on potential errors in the linking process leading to the formation of false positive links. ## :fontawesome-solid-circle-nodes: Cluster metrics @@ -53,9 +53,9 @@ Cluster metrics refer to the characteristics of a cluster as a whole, rather tha Cluster size refers to the number of nodes within a cluster. -When thinking about cluster size, it is important to consider the size of the biggest clusters produced and ask yourself, does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the linked data resulting in a cluster of over 100 nodes? If the answer is no, then false positives links are probably being formed. This could be due to having blocking rules which are too loose or a clustering threshold which is too low. +When thinking about cluster size, it is important to consider the size of the biggest clusters produced and ask yourself, does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the linked data resulting in a cluster of over 100 nodes? If the answer is no, then false positive links are probably being formed.
This could be, for example, due to having blocking rules which are too loose. -If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink's [Cluster Studio Dashboard]() to validate or invalidate links. From there you can develop an understanding of what maximum cluster size to expect. +If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink's [Cluster Studio Dashboard]() to validate (or invalidate) links. From there you can develop an understanding of what maximum cluster size to expect. There also might be a lower bound on cluster size. For example, when linking two datasets in which you know people appear least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through all record comparisons of true matches. @@ -68,22 +68,22 @@ The density of a cluster is given by the number of edges it contains divided by When evaluating clusters, a high density (closer to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. A low density could indicate links being missed. This could happen, for example, if blocking rules are too tight or the clustering threshold is too high. -A sample of low density clusters can be inspected in Splink Cluster Studio Dashboard with the option `sampling_method = "lowest_density_clusters_by_size"`. When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? +A sample of low density clusters can be inspected in Splink [Cluster Studio Dashboard]() via the option `sampling_method = "lowest_density_clusters_by_size"`. When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? 
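As a rough illustration of the density definition (edges present divided by the maximum possible number of edges), the arithmetic can be sketched in plain Python. The function name `cluster_density` is hypothetical and is not part of Splink's API; the sketch assumes an undirected cluster with no self-loops or duplicate edges, so that a cluster of `n` nodes can hold at most `n * (n - 1) / 2` edges.

```python
def cluster_density(n_nodes: int, n_edges: int) -> float:
    """Illustrative only: edges present / maximum possible edges.

    Assumes an undirected graph with no self-loops or duplicate edges.
    """
    if n_nodes < 2:
        # Density is undefined here for a single record (no possible edges).
        raise ValueError("density needs a cluster of at least 2 nodes")
    max_edges = n_nodes * (n_nodes - 1) // 2
    return n_edges / max_edges


# A cluster of size 3 with all 3 possible edges is fully connected.
print(cluster_density(3, 3))
# A cluster of size 5 holding 4 of its 10 possible edges.
print(cluster_density(5, 4))
```

Whether a given value is high or low still depends on the distribution of densities across the dataset being linked.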
Bear in mind, small clusters are more likely to achieve a higher density as fewer record comparisons are required to form the maximum edges possible (the maximum density of 1 for a cluster of size 3 can be achieved with only 3 pairwise record comparisons). -So, it's important to consider a range of sizes when evaluating density to ensure you're not just focussed on very big clusters. Smaller clusters also have the advantage of being easier to assess by eye. This is why the option `sampling_method = "lowest_density_clusters_by_size"` performs stratified sampling across different cluster sizes. +Therefore it's important to consider a range of sizes when evaluating density to ensure you're not just focussed on very big clusters. Smaller clusters also have the advantage of being easier to assess by eye. This is why the option `sampling_method = "lowest_density_clusters_by_size"` performs stratified sampling across different cluster sizes. ### Example: cluster centralisation -Cluster centralisation is defined as the average absolute deviation from maximum [node degree]() normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. It ranges from 0 to 1. +Cluster centralisation is defined as the average absolute deviation from maximum [node degree]() normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. This can help locate nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. -Low centralisation suggests that edges are more evenly distributed amoung nodes. Low centralisation can be good if all nodes within a clusters enjoy many connections. 
However low centralisation could also indicate that nodes are not as highly connected as they could be. +Low centralisation suggests that edges are more evenly distributed amongst nodes. Low centralisation can be good if all nodes within a clusters enjoy many connections. However low centralisation could also indicate that all nodes are not as highly connected as they could be, so it can be insightful to look at low centralisation in conjunction with low node degree.
-A guide on [how to compute these graph metrics with Splink]() is given in the next chapter. +A guide on [how to compute all the graph metrics mentioned above with Splink]() is given in the next chapter. diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 5693a5749a..bb318d03d2 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -2,14 +2,12 @@ Graphs provide a natural way to think about linked data (see [link to intro page] for a refresher). Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. +[copy of diagram here?] + Graph metrics help to give a big-picture view of the clusters generated by a Splink model. Through metric distributions and statistics, we can gauge the quality of clusters and monitor how adjustments to models impact results. - Graph metrics can also help us home in on problematic clusters, such as those containing inaccurate links (false positives). Spot-checking can be performed with Splink’s [Cluster Studio Dashboard]() which enables users to visualise individual clusters and interrogate the links between their member records. - - - ## Evaluating cluster quality ### What is a high quality cluster? 
From bea498a3b98ffdcd2da1216772156ac499716e7f Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 12 Feb 2024 15:52:00 +0000 Subject: [PATCH 22/46] improvements --- .../clusters/how_to_compute_metrics.md | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md index 5456e97ebe..12e9648c50 100644 --- a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md +++ b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md @@ -2,8 +2,12 @@ To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. - Generates tables containing graph metrics (for nodes, edges and clusters), - and returns a data class of Splink dataframes +The method is called on the `linker` like so: + +``` +linker.compute_graph_metrics(df_predict, df_clustered, threshold_match_probability=0.95) +``` +with arguments Args: df_predict (SplinkDataFrame): The results of `linker.predict()` @@ -13,30 +17,26 @@ To enable users to calculate a variety of graph metrics for their linked data, S to include only pairwise comparisons with a match_probability at or above this threshold. - Returns: - GraphMetricsResult: A data class containing SplinkDataFrames - of cluster IDs and selected node, edge or cluster metrics. - attribute "nodes" for nodes metrics table - attribute "edges" for edge metrics table - attribute "clusters" for cluster metrics table +!!! warning -The `threshold_match_probability` provided should be the same as the clustering threshold passed to `cluster_pairwise_predictions_at_threshold()`. If this information is available to Splink then it will be passed automatically, otherwise the user will have to provide it themselves and take care to ensure that threshold values align.
+ `threshold_match_probability` should be the same as the clustering threshold passed to `cluster_pairwise_predictions_at_threshold()`. If this information is available to Splink then it will be passed automatically, otherwise the user will have to provide it themselves and take care to ensure that threshold values align. -As stated above, `compute_graph_metrics()` returns a set of Splink dataframes. The individual Splink dataframes containing node, edge and cluster metrics (as introduced in [Graph metrics]()) can be accessed as follows: +The method generates tables containing graph metrics (for nodes, edges and clusters), and returns a data class of Splink dataframes. The individual Splink dataframes containing node, edge and cluster metrics can be accessed as follows: - compute_graph_metrics.nodes for node metrics - compute_graph_metrics.edges for edge metrics - compute_graph_metrics.clusters for cluster metrics +``` +compute_graph_metrics.nodes for node metrics +compute_graph_metrics.edges for edge metrics +compute_graph_metrics.clusters for cluster metrics +``` -The metrics computed by `compute_graph_metrics()` include all those mentioned in [Graph metrics](), namely +The metrics computed by `compute_graph_metrics()` include all those mentioned in the [Graph metrics]() chapter, namely: +* Node degree +* 'Is bridge' * Cluster size * Cluster density -* Node degree * Cluster centrality -* 'Is bridge' -All of these metrics are calculated by default. If you are unable to install the packages...required for 'is bridge', this metric won't be calculated, however all other metrics will still be generated. +All of these metrics are calculated by default. If you are unable to install the `igraph` package required for 'is bridge', this metric won't be calculated, however all other metrics will still be generated. -This topic guide is a work in progress. 
-We are developing a worked through example of computing metrics and applying them to evaluate and improve cluster quality. \ No newline at end of file +This topic guide is a work in progress. Please check back for more detailed examples of how `compute_graph_metrics()` can be used to evaluate linked data. From 5fea4833aa1809bd6d8cd7970156fe9eedeb449e Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 12 Feb 2024 19:34:14 +0000 Subject: [PATCH 23/46] remove average and absolute --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 87ac76393f..5bac6bcb9c 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -78,7 +78,7 @@ Therefore it's important to consider a range of sizes when evaluating density to ### Example: cluster centralisation -Cluster centralisation is defined as the average absolute deviation from maximum [node degree]() normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. +[Cluster centralisation]("https://en.wikipedia.org/wiki/Centrality#Degree_centrality") is defined as the deviation from maximum [node degree]() normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. This can help locate nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. 
From e0e7495fc5918db1405bcc8886d5cde79a29683e Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 12 Feb 2024 19:53:11 +0000 Subject: [PATCH 24/46] improving centralisation explaination --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 5bac6bcb9c..882849ce68 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -80,9 +80,11 @@ Therefore it's important to consider a range of sizes when evaluating density to [Cluster centralisation]("https://en.wikipedia.org/wiki/Centrality#Degree_centrality") is defined as the deviation from maximum [node degree]() normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. -A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. This can help locate nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. +A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. This can help identify clusters containing nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. -Low centralisation suggests that edges are more evenly distributed amongst nodes. Low centralisation can be good if all nodes within a clusters enjoy many connections. However low centralisation could also indicate that all nodes are not as highly connected as they could be, so it can be insightful to look at low centralisation in conjunction with low node degree. 
+Low centralisation suggests that edges are more evenly distributed amongst nodes in a cluster. Low centralisation can be good if all nodes within a clusters enjoy many connections. However low centralisation could also indicate that all nodes are not as highly connected as they could be. To check for this, look at low centralisation in conjunction with low node degree. + +[maybe include a picture to help aid understanding]
From 7f28ee722ed93ecdbe496ea309a79f5cc3f346bf Mon Sep 17 00:00:00 2001 From: zslade Date: Mon, 12 Feb 2024 19:55:18 +0000 Subject: [PATCH 25/46] update link --- docs/topic_guides/evaluation/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/topic_guides/evaluation/overview.md b/docs/topic_guides/evaluation/overview.md index 0bd52f49da..bddc26cd1e 100644 --- a/docs/topic_guides/evaluation/overview.md +++ b/docs/topic_guides/evaluation/overview.md @@ -18,7 +18,7 @@ Once you have trained a model, you will use it to predict the probability of lin ### :fontawesome-solid-circle-nodes: Cluster Evaluation -Once you have chosen a linkage threshold, the edges are used to generate clusters of records. To see how to evaluate these clusters, check out the [Cluster Evaluation guide](./clusters.md). +Once you have chosen a linkage threshold, the edges are used to generate clusters of records. To see how to evaluate these clusters, check out the [Cluster Evaluation guide](./clusters/overview.md).
From 05ea72d26f5084fca5c4c3a92a6b842dc06d3a46 Mon Sep 17 00:00:00 2001 From: zslade Date: Sat, 17 Feb 2024 20:28:44 +0000 Subject: [PATCH 26/46] small tweak --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 882849ce68..16a720168e 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -82,7 +82,7 @@ Therefore it's important to consider a range of sizes when evaluating density to A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. This can help identify clusters containing nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. -Low centralisation suggests that edges are more evenly distributed amongst nodes in a cluster. Low centralisation can be good if all nodes within a clusters enjoy many connections. However low centralisation could also indicate that all nodes are not as highly connected as they could be. To check for this, look at low centralisation in conjunction with low node degree. +Low centralisation suggests that edges are more evenly distributed amongst nodes in a cluster. Low centralisation can be good if all nodes within a clusters enjoy many connections. However low centralisation could also indicate that all nodes are not as highly connected as they could be. To check for this, you can look at low centralisation in conjunction with low [node degree]() or [low density](). 
[maybe include a picture to help aid understanding] From f8e880cabbef4880545cb719e2d8f31cec1c11ad Mon Sep 17 00:00:00 2001 From: zslade Date: Sat, 17 Feb 2024 20:31:18 +0000 Subject: [PATCH 27/46] remove graph definition --- .../evaluation/clusters/overview.md | 17 +---------------- 1 file changed, 1 insertion(+), 16 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index bb318d03d2..51835f8836 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -31,19 +31,4 @@ Whatever the starting point, this topic guide is designed to help users develop * An introduction to the [graph metrics]() currently available in Splink and how to apply them to linked data * Instructions on how to compute graph metrics with Splink -This topic guide is a work is work in progress and will be updated as new functionality and metrics are released. - -
-
- -## Linked data as graphs <-- to go somewhere upstream - -For clarity, let us first define what we mean by a graph. A graph is a collection of points (nodes) connected by lines (edges). - -[Include picture here] - -In data linking, we refer to these collections of nodes as clusters, within which the nodes represent the entity to be linked (e.g. person or journey) and the edges represent a potential match. - -[Include picture here] - -Edges come with an associate Splink score (the probability of two records being a match). This makes graphs (clusters) produced by Splink so called weighted graphs, as each edge has a weight (Splink score). +This topic guide is a work is work in progress and will be updated as new functionality and metrics are released. \ No newline at end of file From 6200739fa69b50d22f4144732911421c9e65e397 Mon Sep 17 00:00:00 2001 From: Ross Kennedy Date: Thu, 28 Mar 2024 17:29:19 +0000 Subject: [PATCH 28/46] minor edits --- .../evaluation/clusters/graph_metrics.md | 26 +++++++++++-------- .../evaluation/clusters/overview.md | 15 +++++++---- 2 files changed, 25 insertions(+), 16 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 16a720168e..c758c487b7 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -4,9 +4,9 @@ Graph metrics quantify the characteristics of a graph. A simple example of a gra For data linking with Splink, it is useful to sort graph metrics into three categories: -* [Node metrics]() -* [Edge metrics]() -* [Cluster metrics]() +* [Node metrics](#purple_circle-node-metrics) +* [Edge metrics](#link-edge-metrics) +* [Cluster metrics](#fontawesome-solid-circle-nodes-cluster-metrics) Each of these are defined below together with examples and explanations of how they can be applied to linked data to evaluate cluster quality. 
The examples given are of all metrics currently available in Splink. @@ -17,14 +17,18 @@ Each of these are defined below together with examples and explanations of how t It is also important to consider metrics within the context of their distribution and the underlying dataset. For example: a cluster density (see below) of 0.4 might seem low but could actually be above average for the dataset in question; a cluster of size 80 might be suspiciously large for one dataset but not for another. -## ⚫️ Node metrics +## :purple_circle: Node metrics Node metrics quantify the properties of the nodes which live within clusters. -### Example: node degree +### Node Degree Node degree is the number of edges connected to a node. +For example, in the cluster below A has a node degree of 1, whereas D has a node degree of 3. + +![Basic Graph - Records](../../../img/clusters/basic_graph_records.drawio.png){:width="80%"} + High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked. Nodes with low node degree could indicate links being missed (false negatives). However, erroneous links (false positives) could also be the reason for high node degree, so it can be useful to validate the edges of highly connected nodes. @@ -33,11 +37,11 @@ It is important to consider [cluster size]() when looking at node degree. By def Bear in mind, that the degree of a single node in a cluster isn't necessarily representative of the overall connectedness of a cluster. This is where [cluster centralisation]() can help. -## 🔗 Edge metrics +## :link: Edge metrics Edge metrics quantify the properties of the edges within a cluster. -### Example: 'is bridge' +### 'is bridge' An edge is classified as a 'bridge' if its removal splits a cluster into two smaller clusters. 
@@ -49,7 +53,7 @@ Bridges can be signalers of false positives in linked data, especially when join Cluster metrics refer to the characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. -### Example: cluster size +### Cluster Size Cluster size refers to the number of nodes within a cluster. @@ -59,7 +63,7 @@ If you don't have an intuition of what seems reasonable, then it is worth inspec There also might be a lower bound on cluster size. For example, when linking two datasets in which you know people appear least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through all record comparisons of true matches. -### Example: cluster density +### Cluster Density The density of a cluster is given by the number of edges it contains divided by the maximum possible number of edges. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. @@ -76,7 +80,7 @@ Therefore it's important to consider a range of sizes when evaluating density to -### Example: cluster centralisation +### Cluster Centralisation [Cluster centralisation]("https://en.wikipedia.org/wiki/Centrality#Degree_centrality") is defined as the deviation from maximum [node degree]() normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. @@ -86,6 +90,6 @@ Low centralisation suggests that edges are more evenly distributed amongst nodes [maybe include a picture to help aid understanding] -
+
A guide on [how to compute all the graph metrics mentioned above with Splink]() is given in the next chapter. diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 51835f8836..92edef92bd 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -1,12 +1,12 @@ # Cluster Evaluation -Graphs provide a natural way to think about linked data (see [link to intro page] for a refresher). Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. +Graphs provide a natural way to think about linked data (see the ["Linked data as graphs" guide](../../theory/linked_data_as_graphs.md) for a refresher). Visualising linked data as a graph and employing graph metrics are powerful ways to evaluate linkage quality. -[copy of diagram here?] +![Basic Cluster](../../../img/clusters/basic_graph_cluster.drawio.png){:width="80%"} Graph metrics help to give a big-picture view of the clusters generated by a Splink model. Through metric distributions and statistics, we can gauge the quality of clusters and monitor how adjustments to models impact results. -Graph metrics can also help us home in on problematic clusters, such as those containing inaccurate links (false positives). Spot-checking can be performed with Splink’s [Cluster Studio Dashboard]() which enables users to visualise individual clusters and interrogate the links between their member records. +Graph metrics can also help us hone in on problematic clusters, such as those containing inaccurate links (false positives). Spot-checking can be performed with Splink’s [Cluster Studio Dashboard](../../../charts/cluster_studio_dashboard.ipynb) which enables users to visualise individual clusters and interrogate the links between their member records. 
## Evaluating cluster quality @@ -14,7 +14,12 @@ Graph metrics can also help us home in on problematic clusters, such as those co When it comes to data linking, the highest quality clusters will be those containing all possible true matches (there will be no missed links a.k.a. false negatives) and no false matches (no false positives). -Generating clusters which all adhere to this ideal is rare in practice. Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made, and data limitations can place an upper bound on the level of quality achievable. Despite this, graph metrics can help us get closer to a satisfactory level of quality as well as monitor it going forward. +Generating clusters which all adhere to this ideal is rare in practice. For example, + +* Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made +* Data limitations can place an upper bound on the level of quality achievable. + +Despite this, graph metrics can help us get closer to a satisfactory level of quality as well as monitor it going forward. ### What does cluster quality look like for you? @@ -28,7 +33,7 @@ Whatever the starting point, this topic guide is designed to help users develop ## What this topic guide contains -* An introduction to the [graph metrics]() currently available in Splink and how to apply them to linked data +* An introduction to the [graph metrics](./graph_metrics.md) currently available in Splink and how to apply them to linked data * Instructions on how to compute graph metrics with Splink This topic guide is a work is work in progress and will be updated as new functionality and metrics are released. 
\ No newline at end of file From 016bb006cf16cb4f75b09dee27d29c94262d6073 Mon Sep 17 00:00:00 2001 From: zslade Date: Thu, 28 Mar 2024 19:48:59 +0000 Subject: [PATCH 29/46] changes based off comments --- docs/topic_guides/evaluation/clusters/overview.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 92edef92bd..24cdaf21e9 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -6,18 +6,18 @@ Graphs provide a natural way to think about linked data (see the ["Linked data a Graph metrics help to give a big-picture view of the clusters generated by a Splink model. Through metric distributions and statistics, we can gauge the quality of clusters and monitor how adjustments to models impact results. -Graph metrics can also help us hone in on problematic clusters, such as those containing inaccurate links (false positives). Spot-checking can be performed with Splink’s [Cluster Studio Dashboard](../../../charts/cluster_studio_dashboard.ipynb) which enables users to visualise individual clusters and interrogate the links between their member records. +Graph metrics can also help us home in on problematic clusters, such as those containing inaccurate links (false positives). Spot-checking can be performed with Splink’s [Cluster Studio Dashboard](../../../charts/cluster_studio_dashboard.ipynb) which enables users to visualise individual clusters and interrogate the links between their member records. ## Evaluating cluster quality ### What is a high quality cluster? -When it comes to data linking, the highest quality clusters will be those containing all possible true matches (there will be no missed links a.k.a. false negatives) and no false matches (no false positives). 
+When it comes to data linking, the highest quality clusters will be those containing all possible true matches (there will be no missed links a.k.a. false negatives) and no false matches (no false positives). In other words, clusters only containing precisely those nodes corresponding to records about the same entity. Generating clusters which all adhere to this ideal is rare in practice. For example, * Blocking rules, necessary to make computations tractable, can prevent record comparisons between some true matches ever being made -* Data limitations can place an upper bound on the level of quality achievable. +* Data limitations can place an upper bound on the level of quality achievable Despite this, graph metrics can help us get closer to a satisfactory level of quality as well as monitor it going forward. @@ -25,9 +25,9 @@ Despite this, graph metrics can help us get closer to a satisfactory level of qu The extent of cluster evaluation efforts and what is considered 'good enough' will vary greatly with linkage use-case. You might already have labelled data or quality assured outputs from another model which define a clear benchmark for cluster quality. -Domain knowledge can also set expectations of what is deemed reasonable or good. For example, you might already know that a large cluster (containing say 100 nodes) is suspicious for a your deduplicated dataset. +Domain knowledge can also set expectations of what is deemed reasonable or good. For example, you might already know that a large cluster (containing say 100 nodes) is suspicious for your deduplicated dataset. -However, you may also have little or no knowledge about the data or a clear idea of what good quality clusters look like for your linkage. +However, you may currently have little or no knowledge about the data or no clear idea of what good quality clusters look like for your linkage. 
Whatever the starting point, this topic guide is designed to help users develop a better understanding of their clusters and help focus quality assurance efforts to get the best out of their linkage models. From 66ef851e7e4ec135173104563d1daaa45f17155f Mon Sep 17 00:00:00 2001 From: Zoe Slade Date: Thu, 28 Mar 2024 19:56:47 +0000 Subject: [PATCH 30/46] Delete docs/comparison_level_library.md --- docs/comparison_level_library.md | 209 ------------------------------- 1 file changed, 209 deletions(-) delete mode 100644 docs/comparison_level_library.md diff --git a/docs/comparison_level_library.md b/docs/comparison_level_library.md deleted file mode 100644 index ce8b1c2e40..0000000000 --- a/docs/comparison_level_library.md +++ /dev/null @@ -1,209 +0,0 @@ ---- -tags: - - API - - comparisons - - Damerau-Levenshtein - - Levenshtein - - Jaro-Winkler - - Jaccard - - Date Difference - - Distance In KM - - Array Intersect - - Columns Reversed - - Percentage Difference -toc_depth: 2 ---- -# Documentation for `comparison_level_library` - -The `comparison_level_library` contains pre-made comparison levels available for use to -construct custom comparisons [as described in this topic guide](./topic_guides/comparisons/customising_comparisons.html#method-3-comparisonlevels). -However, not every comparison level is available for every [Splink-compatible SQL backend](./topic_guides/splink_fundamentals/backends/backends.md). - -The pre-made Splink comparison levels available for each SQL dialect are as given in this table: - -||:simple-duckdb:
DuckDB|:simple-apachespark:
Spark|:simple-amazonaws:
Athena|:simple-sqlite:
SQLite|:simple-postgresql:
PostgreSql| -|:-:|:-:|:-:|:-:|:-:|:-:| -|[array_intersect_level](#splink.comparison_level_library.ArrayIntersectLevelBase)|✓|✓|✓||✓| -|[columns_reversed_level](#splink.comparison_level_library.ColumnsReversedLevelBase)|✓|✓|✓|✓|✓| -|[damerau_levenshtein_level](#splink.comparison_level_library.DamerauLevenshteinLevelBase)|✓|✓||✓|| -|[datediff_level](#splink.comparison_level_library.DatediffLevelBase)|✓|✓|✓||✓| -|[distance_function_level](#splink.comparison_level_library.DistanceFunctionLevelBase)|✓|✓|✓|✓|✓| -|[distance_in_km_level](#splink.comparison_level_library.DistanceInKmLevelBase)|✓|✓|✓||✓| -|[else_level](#splink.comparison_level_library.ElseLevelBase)|✓|✓|✓|✓|✓| -|[exact_match_level](#splink.comparison_level_library.ExactMatchLevelBase)|✓|✓|✓|✓|✓| -|[jaccard_level](#splink.comparison_level_library.JaccardLevelBase)|✓|✓|||| -|[jaro_level](#splink.comparison_level_library.JaroLevelBase)|✓|✓||✓|| -|[jaro_winkler_level](#splink.comparison_level_library.JaroWinklerLevelBase)|✓|✓||✓|| -|[levenshtein_level](#splink.comparison_level_library.LevenshteinLevelBase)|✓|✓|✓|✓|✓| -|[null_level](#splink.comparison_level_library.NullLevelBase)|✓|✓|✓|✓|✓| -|[percentage_difference_level](#splink.comparison_level_library.PercentageDifferenceLevelBase)|✓|✓|✓|✓|✓| - - - - -The detailed API for each of these are outlined below. 
- -## Library comparison level APIs - -::: splink.comparison_level_library.NullLevelBase - handler: python - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.ExactMatchLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.ElseLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.DistanceFunctionLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.LevenshteinLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.DamerauLevenshteinLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.JaroLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.JaroWinklerLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.JaccardLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.ColumnsReversedLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- 
- -::: splink.comparison_level_library.DistanceInKMLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.PercentageDifferenceLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.ArrayIntersectLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_level_library.DatediffLevelBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 From 13e12b97dcf2df0787c3c768728495e8b7e163ca Mon Sep 17 00:00:00 2001 From: Zoe Slade Date: Thu, 28 Mar 2024 19:58:09 +0000 Subject: [PATCH 31/46] Delete docs/datasets.md --- docs/datasets.md | 102 ----------------------------------------------- 1 file changed, 102 deletions(-) delete mode 100644 docs/datasets.md diff --git a/docs/datasets.md b/docs/datasets.md deleted file mode 100644 index 8a372f5f8b..0000000000 --- a/docs/datasets.md +++ /dev/null @@ -1,102 +0,0 @@ ---- -tags: - - API - - Datasets - - Examples ---- - -# In-built datasets - -Splink has some datasets available for use to help you get up and running, test ideas, or explore Splink features. -To use, simply import `splink_datasets`: -```py -from splink.datasets import splink_datasets - -df = splink_datasets.fake_1000 -``` -which you can then use to set up a linker: -```py -from splink.datasets import splink_datasets -from splink.duckdb.linker import DuckDBLinker -import splink.duckdb.comparison_library as cl - -df = splink_datasets.fake_1000 -linker = DuckDBLinker( - df, - { - "link_type": "dedupe_only", - "comparisons": [cl.exact_match("first_name"), cl.exact_match("surname")], - }, -) -``` - -??? 
tip "Troubleshooting" - - If you get a `SSLCertVerificationError` when trying to use the inbuilt datasets, this can be fixed with the `ssl` package by running: - - `ssl._create_default_https_context = ssl._create_unverified_context`. - -## `splink_datasets` - -Each attribute of `splink_datasets` is a dataset available for use, which exists as a pandas `DataFrame`. -These datasets are not packaged directly with Splink, but instead are downloaded only when they are requested. -Once requested they are cached for future use. -The cache can be cleared using [`splink_dataset_utils`](#splink_dataset_utils-object), -which also contains information on available datasets, and which have already been cached. - -### Available datasets - -The datasets available are listed below: - -|dataset name|description|rows|unique entities|link to source| -|-|-|-|-|-| -|`fake_1000`|Fake 1000 from splink demos. Records are 250 simulated people, with different numbers of duplicates, labelled.|1,000|250|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/fake_1000.csv)| -|`historical_50k`|The data is based on historical persons scraped from wikidata. 
Duplicate records are introduced with a variety of errors.|50,000|5,156|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/historical_figures_with_errors_50k.parquet)| -|`febrl3`|The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany.FEBRL3 data set contains 5000 records (2000 originals and 3000 duplicates), with a maximum of 5 duplicates based on one original record.|5,000|2,000|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/febrl/dataset3.csv)| -|`febrl4a`|The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany.FEBRL4a contains 5000 original records.|5,000|5,000|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/febrl/dataset4a.csv)| -|`febrl4b`|The Freely Extensible Biomedical Record Linkage (FEBRL) datasets consist of comparison patterns from an epidemiological cancer study in Germany.FEBRL4b contains 5000 duplicate records, one for each record in FEBRL4a.|5,000|5,000|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/febrl/dataset4b.csv)| -|`transactions_origin`|This data has been generated to resemble bank transactions leaving an account. There are no duplicates within the dataset and each transaction is designed to have a counterpart arriving in 'transactions_destination'. Memo is sometimes truncated or missing.|45,326|45,326|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/transactions_origin.parquet)| -|`transactions_destination`|This data has been generated to resemble bank transactions arriving in an account. There are no duplicates within the dataset and each transaction is designed to have a counterpart sent from 'transactions_origin'. 
There may be a delay between the source and destination account, and the amount may vary due to hidden fees and foreign exchange rates. Memo is sometimes truncated or missing.|45,326|45,326|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/transactions_destination.parquet)| - - - -## `splink_dataset_labels` - -Some of the `splink_datasets` have corresponding clerical labels to help assess model performance. These are requested through the `splink_dataset_labels` module. - -### Available datasets - -The datasets available are listed below: - -|dataset name|description|rows|unique entities|link to source| -|-|-|-|-|-| -|`fake_1000_labels`|Clerical labels for fake_1000 |3,176|NA|[source](https://raw.githubusercontent.com/moj-analytical-services/splink_datasets/master/data/fake_1000_labels.csv)| - - - -## `splink_dataset_utils` API - -In addition to `splink_datasets`, you can also import `splink_dataset_utils`, -which has a few functions to help managing `splink_datasets`. -This can be useful if you have limited internet connection and want to see what is already cached, -or if you need to clear cache items (e.g. if datasets were to be updated, or if space is an issue). 
- -For example: -```py -from splink.datasets import splink_dataset_utils - -splink_dataset_utils.show_downloaded_data() -splink_dataset_utils.clear_cache(['fake_1000']) -``` - -::: splink.datasets._SplinkDataUtils - handler: python - options: - members: - - list_downloaded_datasets - - list_all_datasets - - show_downloaded_data - - clear_downloaded_data - show_root_heading: false - show_source: false - heading_level: 3 From f45d9bfeef8b0e099e063cae222f461d06e59ae4 Mon Sep 17 00:00:00 2001 From: Zoe Slade Date: Thu, 28 Mar 2024 20:00:42 +0000 Subject: [PATCH 32/46] Delete docs/comparison_library.md --- docs/comparison_library.md | 159 ------------------------------------- 1 file changed, 159 deletions(-) delete mode 100644 docs/comparison_library.md diff --git a/docs/comparison_library.md b/docs/comparison_library.md deleted file mode 100644 index 783c490bda..0000000000 --- a/docs/comparison_library.md +++ /dev/null @@ -1,159 +0,0 @@ ---- -tags: - - API - - comparisons - - Levenshtein - - Jaro-Winkler - - Jaccard - - Distance In KM - - Date Difference - - Array Intersect -toc_depth: 2 ---- -# Documentation for `comparison_library` - -The `comparison_library` contains pre-made comparisons available for use directly [as described in this topic guide](./topic_guides/comparisons/customising_comparisons.html#method-1-using-the-comparisonlibrary). -However, not every comparison is available for every [Splink-compatible SQL backend](./topic_guides/splink_fundamentals/backends/backends.html). - -The pre-made Splink comparisons available for each SQL dialect are as given in this table: - -||:simple-duckdb:
DuckDB|:simple-apachespark:
Spark|:simple-amazonaws:
Athena|:simple-sqlite:
SQLite|:simple-postgresql:
PostgreSql| -|:-:|:-:|:-:|:-:|:-:|:-:| -|[array_intersect_at_sizes](#splink.comparison_library.ArrayIntersectAtSizesBase)|✓|✓|✓||✓| -|[damerau_levenshtein_at_thresholds](#splink.comparison_library.DamerauLevenshteinAtThresholdsBase)|✓|✓||✓|| -|[datediff_at_thresholds](#splink.comparison_library.DatediffAtThresholdsBase)|✓|✓|✓||✓| -|[distance_function_at_thresholds](#splink.comparison_library.DistanceFunctionAtThresholdsBase)|✓|✓|✓|✓|✓| -|[distance_in_km_at_thresholds](#splink.comparison_library.DistanceInKmAtThresholdsBase)|✓|✓|✓||✓| -|[exact_match](#splink.comparison_library.ExactMatchBase)|✓|✓|✓|✓|✓| -|[jaccard_at_thresholds](#splink.comparison_library.JaccardAtThresholdsBase)|✓|✓|||| -|[jaro_at_thresholds](#splink.comparison_library.JaroAtThresholdsBase)|✓|✓||✓|| -|[jaro_winkler_at_thresholds](#splink.comparison_library.JaroWinklerAtThresholdsBase)|✓|✓||✓|| -|[levenshtein_at_thresholds](#splink.comparison_library.LevenshteinAtThresholdsBase)|✓|✓|✓|✓|✓| - - - - - - - -The detailed API for each of these are outlined below. 
- -## Library comparison APIs - -::: splink.comparison_library.ExactMatchBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.DistanceFunctionAtThresholdsBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.LevenshteinAtThresholdsBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.DamerauLevenshteinAtThresholdsBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.JaccardAtThresholdsBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.JaroAtThresholdsBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.JaroWinklerAtThresholdsBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.ArrayIntersectAtSizesBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.DatediffAtThresholdsBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_library.DistanceInKMAtThresholdsBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - 
show_source: false - heading_level: 3 From 38ab7b64ddbd8149b24281240c342694946d37e2 Mon Sep 17 00:00:00 2001 From: Zoe Slade Date: Thu, 28 Mar 2024 20:00:53 +0000 Subject: [PATCH 33/46] Delete docs/comparison_template_library.md --- docs/comparison_template_library.md | 89 ----------------------------- 1 file changed, 89 deletions(-) delete mode 100644 docs/comparison_template_library.md diff --git a/docs/comparison_template_library.md b/docs/comparison_template_library.md deleted file mode 100644 index f185a0c6f1..0000000000 --- a/docs/comparison_template_library.md +++ /dev/null @@ -1,89 +0,0 @@ ---- -tags: - - API - - comparisons - - Date Comparison -toc_depth: 2 ---- - -# Documentation for `comparison_template_library` - -The `comparison_template_library` contains pre-made comparisons with pre-defined parameters available for use directly [as described in this topic guide](./topic_guides/comparisons/customising_comparisons.html#method-2-using-the-comparisontemplatelibrary). -However, not every comparison is available for every [Splink-compatible SQL backend](./topic_guides/splink_fundamentals/backends/backends.html). More detail on creating comparisons for specific data types is also [included in the topic guide.](./topic_guides/comparisons/customising_comparisons.html#creating-comparisons-for-specific-data-types) - -The pre-made Splink comparison templates available for each SQL dialect are as given in this table: - -||:simple-duckdb:
DuckDB|:simple-apachespark:
Spark|:simple-amazonaws:
Athena|:simple-sqlite:
SQLite|:simple-postgresql:
PostgreSql| -|:-:|:-:|:-:|:-:|:-:|:-:| -|[date_comparison](#splink.comparison_template_library.DateComparisonBase)|✓|✓|||| -|[email_comparison](#splink.comparison_template_library.EmailComparisonBase)|✓|✓|||| -|[forename_surname_comparison](#splink.comparison_template_library.ForenameSurnameComparisonBase)|✓|✓||✓|| -|[name_comparison](#splink.comparison_template_library.NameComparisonBase)|✓|✓||✓|| -|[postcode_comparison](#splink.comparison_template_library.PostcodeComparisonBase)|✓|✓|✓||| - - - - -The detailed API for each of these are outlined below. - -## Library comparison APIs - -::: splink.comparison_template_library.DateComparisonBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_template_library.NameComparisonBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_template_library.ForenameSurnameComparisonBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_template_library.PostcodeComparisonBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- - -::: splink.comparison_template_library.EmailComparisonBase - handler: python - selection: - members: - - __init__ - rendering: - show_root_heading: true - show_source: false - heading_level: 3 - ---- \ No newline at end of file From 7d694ca703f10d6a97b4d1fb4c73c316f7757bcc Mon Sep 17 00:00:00 2001 From: Zoe Slade Date: Thu, 28 Mar 2024 20:01:14 +0000 Subject: [PATCH 34/46] Delete docs/comparison_level_composition.md --- docs/comparison_level_composition.md | 35 ---------------------------- 1 file changed, 35 deletions(-) delete mode 100644 
docs/comparison_level_composition.md diff --git a/docs/comparison_level_composition.md b/docs/comparison_level_composition.md deleted file mode 100644 index 9ec815dee5..0000000000 --- a/docs/comparison_level_composition.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -tags: - - API - - comparisons ---- -# Documentation for `comparison_level_composition` functions - -`comparison_composition` allows the merging of existing comparison levels by a logical SQL clause - `OR`, `AND` or `NOT`. - -This extends the functionality of our base comparison levels by allowing users to "join" existing comparisons by various SQL clauses. - -For example, `or_(null_level("first_name"), null_level("surname"))` creates a check for nulls in *either* `first_name` or `surname`, rather than restricting the user to a single column. - -The Splink comparison level composition functions available for each SQL dialect are as given in this table: - -||:simple-duckdb:
DuckDB|:simple-apachespark:
Spark|:simple-amazonaws:
Athena|:simple-sqlite:
SQLite|:simple-postgresql:
PostgreSql| -|:-:|:-:|:-:|:-:|:-:|:-:| -|[and_](#splink.comparison_level_composition.and_)|✓|✓|✓|✓|✓| -|[not_](#splink.comparison_level_composition.not_)|✓|✓|✓|✓|✓| -|[or_](#splink.comparison_level_composition.or_)|✓|✓|✓|✓|✓| - - - - -The detailed API for each of these are outlined below. - -## Library comparison composition APIs - -::: splink.comparison_level_composition - handler: python - selection: - members: - - and_ - - or_ - - not_ From 9bfac5a1a78dcffcd5cbd67424011884efd7a9f4 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 2 Apr 2024 10:57:13 +0100 Subject: [PATCH 35/46] tweaks --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 2 ++ docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md | 2 +- docs/topic_guides/evaluation/clusters/overview.md | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index c758c487b7..f3f1993d67 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -93,3 +93,5 @@ Low centralisation suggests that edges are more evenly distributed amongst nodes 
A guide on [how to compute all the graph metrics mentioned above with Splink]() is given in the next chapter. + +Please note, this topic guide is a work in progress and we welcome any feedback. diff --git a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md index 12e9648c50..ac0b974209 100644 --- a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md +++ b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md @@ -21,7 +21,7 @@ with arguments `threshold_match_probability` should be the same as the clustering threshold passed to `cluster_pairwise_predictions_at_threshold()`. If this information is available to Splink then it will be passed automatically, otherwise the user will have to provide it themselves and take care to ensure that threshold values align. -The method generates tables containing graph metrics (for nodes, edges and clusters), and returns a data class of Splink dataframes. The individual Splink dataframes containing node, edge and cluster metrics can be accessed as follows: +The method generates tables containing graph metrics (for nodes, edges and clusters), and returns a data class of [Splink dataframes](../../../SplinkDataFrame.md). 
The individual Splink dataframes containing node, edge and cluster metrics can be accessed as follows: ``` compute_graph_metrics.nodes for node metrics diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 24cdaf21e9..a1d26162d0 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -36,4 +36,4 @@ Whatever the starting point, this topic guide is designed to help users develop * An introduction to the [graph metrics](./graph_metrics.md) currently available in Splink and how to apply them to linked data * Instructions on how to compute graph metrics with Splink -This topic guide is a work is work in progress and will be updated as new functionality and metrics are released. \ No newline at end of file +Please note, this topic guide is a work in progress and we welcome any feedback. \ No newline at end of file From 2e3f18091c6faa4386c8dd4245ff5e25b5902113 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 2 Apr 2024 11:10:56 +0100 Subject: [PATCH 36/46] tweak --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index f3f1993d67..50f8929b9d 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -8,7 +8,7 @@ For data linking with Splink, it is useful to sort graph metrics into three cate * [Edge metrics](#link-edge-metrics) * [Cluster metrics](#fontawesome-solid-circle-nodes-cluster-metrics) -Each of these are defined below together with examples and explanations of how they can be applied to linked data to evaluate cluster quality. The examples given are of all metrics currently available in Splink. 
+Each of these are defined below together with examples and explanations of how they can be applied to linked data to evaluate cluster quality. The examples cover all metrics currently available in Splink. !!! note @@ -31,7 +31,7 @@ For example, in the cluster below A has a node degree of 1, whereas D has a node High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked. Nodes with low node degree could indicate links being missed (false negatives). -However, erroneous links (false positives) could also be the reason for high node degree, so it can be useful to validate the edges of highly connected nodes. +However, erroneous links (false positives) could also be the reason for _high_ node degree, so it can be useful to validate the edges of highly connected nodes. It is important to consider [cluster size]() when looking at node degree. By definition, larger clusters contain more nodes to form links between, allowing nodes within them to attain higher degrees compared to those in smaller clusters. Consequently, low node degree within larger clusters can carry greater significance. From 858a16e05ede9ff2c930eeae51e3852cd8ce9a80 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 2 Apr 2024 13:19:17 +0100 Subject: [PATCH 37/46] resolving comments and more tweaks --- .../evaluation/clusters/graph_metrics.md | 34 ++++++++----------- .../evaluation/clusters/overview.md | 2 +- 2 files changed, 16 insertions(+), 20 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 50f8929b9d..7cb29a26ed 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -1,6 +1,6 @@ # Graph metrics -Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is [cluster size](), which is the number of nodes within a cluster. 
+Graph metrics quantify the characteristics of a graph. A simple example of a graph metric is [cluster size](#cluster-size), which is the number of nodes within a cluster. For data linking with Splink, it is useful to sort graph metrics into three categories: @@ -29,13 +29,13 @@ For example, in the cluster below A has a node degree of 1, whereas D has a node ![Basic Graph - Records](../../../img/clusters/basic_graph_records.drawio.png){:width="80%"} -High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked. Nodes with low node degree could indicate links being missed (false negatives). +High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked. Nodes with low node degree could indicate links being missed (false negatives) or be the result of a small number of false links (false positives). However, erroneous links (false positives) could also be the reason for _high_ node degree, so it can be useful to validate the edges of highly connected nodes. -It is important to consider [cluster size]() when looking at node degree. By definition, larger clusters contain more nodes to form links between, allowing nodes within them to attain higher degrees compared to those in smaller clusters. Consequently, low node degree within larger clusters can carry greater significance. +It is important to consider [cluster size](#cluster-size) when looking at node degree. By definition, larger clusters contain more nodes to form links between, allowing nodes within them to attain higher degrees compared to those in smaller clusters. Consequently, low node degree within larger clusters can carry greater significance. -Bear in mind, that the degree of a single node in a cluster isn't necessarily representative of the overall connectedness of a cluster. This is where [cluster centralisation]() can help. 
+Bear in mind, that the degree of a single node in a cluster isn't necessarily representative of the overall connectedness of a cluster. This is where [cluster centralisation](#cluster-centralisation) can help. ## :link: Edge metrics @@ -47,7 +47,7 @@ An edge is classified as a 'bridge' if its removal splits a cluster into two sma [insert picture] -Bridges can be signalers of false positives in linked data, especially when joining two highly connected sub-clusters. Examining bridges can shed light on potential errors in the linking process leading to the formation of false positive links. +Bridges can be signalers of false positives in linked data, especially when joining two highly connected sub-clusters. Examining bridges can shed light on issues with the linking process leading to the formation of false positive links. ## :fontawesome-solid-circle-nodes: Cluster metrics @@ -57,11 +57,13 @@ Cluster metrics refer to the characteristics of a cluster as a whole, rather tha Cluster size refers to the number of nodes within a cluster. -When thinking about cluster size, it is important to consider the size of the biggest clusters produced and ask yourself, does this seem reasonable for the dataset being linked? For example, does it make sense that one person is appearing hundreds of times in the linked data resulting in a cluster of over 100 nodes? If the answer is no, then false positives links are probably being formed. This could be, for example, due to having blocking rules which are too loose. +[include picture] -If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink's [Cluster Studio Dashboard]() to validate (or invalidate) links. From there you can develop an understanding of what maximum cluster size to expect. +When thinking about cluster size, it is often useful to consider the biggest clusters produced and ask yourself if the sizes seem reasonable for the dataset being linked. 
For example, when linking people, does it make sense that an individual is appearing hundreds of times in the linked data resulting in a cluster of over 100 nodes? If the answer is no, then false positive links are probably being formed. -There also might be a lower bound on cluster size. For example, when linking two datasets in which you know people appear least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. This could be due to blocking rules not letting through all record comparisons of true matches. +If you don't have an intuition of what seems reasonable, then it is worth inspecting a sample of the largest clusters in Splink's [Cluster Studio Dashboard](../../../charts/cluster_studio_dashboard.ipynb) to validate (or invalidate) links. From there you can develop an understanding of what maximum cluster size to expect for your linkage. Bear in mind that a large and highly dense cluster is usually less suspicious than a large low-density cluster. + +There also might be a lower bound on cluster size. For example, when linking two datasets in which you know people appear at least once in each, the minimum expected size of cluster will be 2. Clusters smaller than the minimum size indicate links have been missed. ### Cluster Density @@ -71,24 +73,18 @@ The density of a cluster is given by the number of edges it contains divided by When evaluating clusters, a high density (closer to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. -A low density could indicate links being missed. This could happen, for example, if blocking rules are too tight or the clustering threshold is too high. -A sample of low density clusters can be inspected in Splink [Cluster Studio Dashboard]() via the option `sampling_method = "lowest_density_clusters_by_size"`. 
When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? - -Bear in mind, small clusters are more likely to achieve a higher density as fewer record comparisons are required to form the maximum edges possible (the maximum density of 1 for a cluster of size 3 can be achieved with only 3 pairwise record comparisons). - -Therefore it's important to consider a range of sizes when evaluating density to ensure you're not just focussed on very big clusters. Smaller clusters also have the advantage of being easier to assess by eye. This is why the option `sampling_method = "lowest_density_clusters_by_size"` performs stratified sampling across different cluster sizes. +A low density could indicate links being missed. A sample of low density clusters can be inspected in Splink's [Cluster Studio Dashboard](../../../charts/cluster_studio_dashboard.ipynb) via the option `sampling_method = "lowest_density_clusters_by_size"`, which performs stratified sampling across different cluster sizes. When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? - ### Cluster Centralisation -[Cluster centralisation]("https://en.wikipedia.org/wiki/Centrality#Degree_centrality") is defined as the deviation from maximum [node degree]() normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. +[Cluster centralisation](https://en.wikipedia.org/wiki/Centrality#Degree_centrality) is defined as the deviation from maximum [node degree](#node-degree) normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. 
-A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. This can help identify clusters containing nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. +[include picture] -Low centralisation suggests that edges are more evenly distributed amongst nodes in a cluster. Low centralisation can be good if all nodes within a clusters enjoy many connections. However low centralisation could also indicate that all nodes are not as highly connected as they could be. To check for this, you can look at low centralisation in conjunction with low [node degree]() or [low density](). +A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. This can help identify clusters containing nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. -[maybe include a picture to help aid understanding] +Low centralisation suggests that edges are more evenly distributed amongst nodes in a cluster. This can be good if all nodes within a cluster enjoy many connections. However, low centralisation could also indicate that most nodes are not as highly connected as they could be. To check for this, look at low centralisation in conjunction with low [density](#cluster-density).
diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index a1d26162d0..42ae38d17e 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -34,6 +34,6 @@ Whatever the starting point, this topic guide is designed to help users develop ## What this topic guide contains * An introduction to the [graph metrics](./graph_metrics.md) currently available in Splink and how to apply them to linked data -* Instructions on how to compute graph metrics with Splink +* Instructions on [how to compute graph metrics](./how_to_compute_metrics.md) with Splink Please note, this topic guide is a work in progress and we welcome any feedback. \ No newline at end of file From 22bf41d2a5f80c36e50a57f7feddc9258b2a0525 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 2 Apr 2024 14:09:51 +0100 Subject: [PATCH 38/46] update to notebook --- .../clusters/how_to_compute_metrics.ipynb | 378 ++++++++++++++++++ .../clusters/how_to_compute_metrics.md | 42 -- mkdocs.yml | 2 +- 3 files changed, 379 insertions(+), 43 deletions(-) create mode 100644 docs/topic_guides/evaluation/clusters/how_to_compute_metrics.ipynb delete mode 100644 docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md diff --git a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.ipynb b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.ipynb new file mode 100644 index 0000000000..549e04bf26 --- /dev/null +++ b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.ipynb @@ -0,0 +1,378 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# How to compute graph metrics with Splink" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction to the `compute_graph_metrics()` method" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To enable users to calculate a variety of graph metrics for 
their linked data, Splink provides the `compute_graph_metrics()` method.\n", + "\n", + "The method is called on the `linker` like so:\n", + "\n", + "```\n", + "linker.compute_graph_metrics(df_predict, df_clustered, threshold_match_probability=0.95)\n", + "```\n", + "with arguments\n", + "\n", + " Args:\n", + " df_predict (SplinkDataFrame): The results of `linker.predict()`\n", + " df_clustered (SplinkDataFrame): The outputs of\n", + " `linker.cluster_pairwise_predictions_at_threshold()`\n", + " threshold_match_probability (float): Filter the pairwise match predictions\n", + " to include only pairwise comparisons with a match_probability at or\n", + " above this threshold.\n", + "\n", + "!!! warning\n", + "\n", + " `threshold_match_probability` should be the same as the clustering threshold passed to `cluster_pairwise_predictions_at_threshold()`. If this information is available to Splink then it will be passed automatically, otherwise the user will have to provide it themselves and take care to ensure that threshold values align.\n", + "\n", + "The method generates tables containing graph metrics (for nodes, edges and clusters), and returns a data class of [Splink dataframes](../../../SplinkDataFrame.md). The individual Splink dataframes containing node, edge and cluster metrics can be accessed as follows:\n", + "\n", + "```\n", + "compute_graph_metrics.nodes for node metrics\n", + "compute_graph_metrics.edges for edge metrics\n", + "compute_graph_metrics.clusters for cluster metrics\n", + "```\n", + "\n", + "The metrics computed by `compute_graph_metrics()` include all those mentioned in the [Graph metrics](./graph_metrics.md) chapter, namely:\n", + "\n", + "* Node degree\n", + "* 'Is bridge'\n", + "* Cluster size\n", + "* Cluster density\n", + "* Cluster centralisation\n", + "\n", + "All of these metrics are calculated by default. 
If you are unable to install the `igraph` package required for 'is bridge', this metric won't be calculated, however all other metrics will still be generated.\n", + "\n", + "This topic guide is a work in progress and we welcome any feedback." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Full code example" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This code snippet computes graph metrics for a simple Splink dedupe model. A pandas dataframe of cluster metrics is displayed as the final output." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/var/folders/nd/c3xr518x3txg5kcqp1h7zwc80000gp/T/ipykernel_13654/2355919473.py:39: SplinkDeprecated: target_rows is deprecated; use max_pairs\n", + " linker.estimate_u_using_random_sampling(target_rows=1e6)\n", + "----- Estimating u probabilities using random sampling -----\n", + "\n", + "Estimated u probabilities using random sampling\n", + "\n", + "Your model is not yet fully trained. 
Missing estimates for:\n", + " - first_name (no m values are trained).\n", + " - surname (no m values are trained).\n", + " - postcode_fake (no m values are trained).\n", + "\n", + "----- Starting EM training session -----\n", + "\n", + "Estimating the m probabilities of the model by blocking on:\n", + "(l.\"first_name\" = r.\"first_name\") AND (l.\"surname\" = r.\"surname\")\n", + "\n", + "Parameter estimates will be made for the following comparison(s):\n", + " - postcode_fake\n", + "\n", + "Parameter estimates cannot be made for the following comparison(s) since they are used in the blocking rules: \n", + " - first_name\n", + " - surname\n", + "\n", + "Iteration 1: Largest change in params was -0.352 in probability_two_random_records_match\n", + "Iteration 2: Largest change in params was 0.108 in the m_probability of postcode_fake, level `All other comparisons`\n", + "Iteration 3: Largest change in params was 0.019 in the m_probability of postcode_fake, level `All other comparisons`\n", + "Iteration 4: Largest change in params was 0.00276 in the m_probability of postcode_fake, level `All other comparisons`\n", + "Iteration 5: Largest change in params was 0.000388 in the m_probability of postcode_fake, level `All other comparisons`\n", + "Iteration 6: Largest change in params was 5.44e-05 in the m_probability of postcode_fake, level `All other comparisons`\n", + "\n", + "EM converged after 6 iterations\n", + "\n", + "Your model is not yet fully trained. 
Missing estimates for:\n", + " - first_name (no m values are trained).\n", + " - surname (no m values are trained).\n", + "\n", + "----- Starting EM training session -----\n", + "\n", + "Estimating the m probabilities of the model by blocking on:\n", + "(l.\"dob\" = r.\"dob\") AND (SUBSTR(l.\"postcode_fake\", 1, 3) = SUBSTR(r.\"postcode_fake\", 1, 3))\n", + "\n", + "Parameter estimates will be made for the following comparison(s):\n", + " - first_name\n", + " - surname\n", + "\n", + "Parameter estimates cannot be made for the following comparison(s) since they are used in the blocking rules: \n", + " - postcode_fake\n", + "\n", + "Iteration 1: Largest change in params was 0.508 in probability_two_random_records_match\n", + "Iteration 2: Largest change in params was 0.0868 in probability_two_random_records_match\n", + "Iteration 3: Largest change in params was 0.0212 in probability_two_random_records_match\n", + "Iteration 4: Largest change in params was 0.00704 in probability_two_random_records_match\n", + "Iteration 5: Largest change in params was 0.00306 in probability_two_random_records_match\n", + "Iteration 6: Largest change in params was 0.00149 in probability_two_random_records_match\n", + "Iteration 7: Largest change in params was 0.000761 in probability_two_random_records_match\n", + "Iteration 8: Largest change in params was 0.000395 in probability_two_random_records_match\n", + "Iteration 9: Largest change in params was 0.000206 in probability_two_random_records_match\n", + "Iteration 10: Largest change in params was 0.000108 in probability_two_random_records_match\n", + "Iteration 11: Largest change in params was 5.66e-05 in probability_two_random_records_match\n", + "\n", + "EM converged after 11 iterations\n", + "\n", + "Your model is fully trained. 
All comparisons have at least one estimate for their m and u values\n", + "Completed iteration 1, root rows count 316\n", + "Completed iteration 2, root rows count 63\n", + "Completed iteration 3, root rows count 12\n", + "Completed iteration 4, root rows count 0\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
       cluster_id    n_nodes  n_edges  density   cluster_centralisation
0      Q98761652-1   5        8.0      0.800000  0.333333
1      Q10307857-1   11       35.0     0.636364  0.200000
2      Q18910925-1   20       172.0    0.905263  0.105263
3      Q13530025-1   11       32.0     0.581818  0.266667
4      Q15966633-11  3        3.0      1.000000  0.000000
...    ...           ...      ...      ...       ...
21530  Q5006750-7    1        0.0      NaN       NaN
21531  Q5166888-13   1        0.0      NaN       NaN
21532  Q5546247-8    1        0.0      NaN       NaN
21533  Q6698372-5    1        0.0      NaN       NaN
21534  Q7794499-6    1        0.0      NaN       NaN
\n", + "

21535 rows × 5 columns

\n", + "
" + ], + "text/plain": [ + " cluster_id n_nodes n_edges density cluster_centralisation\n", + "0 Q98761652-1 5 8.0 0.800000 0.333333\n", + "1 Q10307857-1 11 35.0 0.636364 0.200000\n", + "2 Q18910925-1 20 172.0 0.905263 0.105263\n", + "3 Q13530025-1 11 32.0 0.581818 0.266667\n", + "4 Q15966633-11 3 3.0 1.000000 0.000000\n", + "... ... ... ... ... ...\n", + "21530 Q5006750-7 1 0.0 NaN NaN\n", + "21531 Q5166888-13 1 0.0 NaN NaN\n", + "21532 Q5546247-8 1 0.0 NaN NaN\n", + "21533 Q6698372-5 1 0.0 NaN NaN\n", + "21534 Q7794499-6 1 0.0 NaN NaN\n", + "\n", + "[21535 rows x 5 columns]" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import splink.duckdb.comparison_library as cl\n", + "from splink.datasets import splink_datasets\n", + "from splink.duckdb.blocking_rule_library import block_on\n", + "from splink.duckdb.linker import DuckDBLinker\n", + "\n", + "import ssl\n", + "\n", + "ssl._create_default_https_context = ssl._create_unverified_context\n", + "\n", + "df = splink_datasets.historical_50k\n", + "\n", + "settings_dict = {\n", + " \"link_type\": \"dedupe_only\",\n", + " \"blocking_rules_to_generate_predictions\": [\n", + " block_on([\"postcode_fake\", \"first_name\"]),\n", + " block_on([\"first_name\", \"surname\"]),\n", + " block_on([\"dob\", \"substr(postcode_fake,1,2)\"]),\n", + " block_on([\"postcode_fake\", \"substr(dob,1,3)\"]),\n", + " block_on([\"postcode_fake\", \"substr(dob,4,5)\"]),\n", + " ],\n", + " \"comparisons\": [\n", + " cl.exact_match(\n", + " \"first_name\",\n", + " term_frequency_adjustments=True,\n", + " ),\n", + " cl.jaro_winkler_at_thresholds(\n", + " \"surname\", distance_threshold_or_thresholds=[0.9, 0.8]\n", + " ),\n", + " cl.levenshtein_at_thresholds(\n", + " \"postcode_fake\", distance_threshold_or_thresholds=[1, 2]\n", + " ),\n", + " ],\n", + " \"retain_intermediate_calculation_columns\": True,\n", + "}\n", + "\n", + "\n", + "linker = DuckDBLinker(df, settings_dict)\n", + 
"\n", + "linker.estimate_u_using_random_sampling(target_rows=1e6)\n", + "\n", + "linker.estimate_parameters_using_expectation_maximisation(\n", + " block_on([\"first_name\", \"surname\"])\n", + ")\n", + "\n", + "linker.estimate_parameters_using_expectation_maximisation(\n", + " block_on([\"dob\", \"substr(postcode_fake, 1,3)\"])\n", + ")\n", + "\n", + "df_predict = linker.predict()\n", + "df_clustered = linker.cluster_pairwise_predictions_at_threshold(df_predict, 0.95)\n", + "\n", + "graph_metrics = linker.compute_graph_metrics(df_predict, df_clustered)\n", + "\n", + "graph_metrics.clusters.as_pandas_dataframe()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "splink-bxsLLt4m", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md b/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md deleted file mode 100644 index ac0b974209..0000000000 --- a/docs/topic_guides/evaluation/clusters/how_to_compute_metrics.md +++ /dev/null @@ -1,42 +0,0 @@ -# How to compute graph metrics with Splink - -To enable users to calculate a variety of graph metrics for their linked data, Splink provides the `compute_graph_metrics()` method. 
- -The method is called on the `linker` like so: - -``` -linker.computer_graph_metrics(df_predict, df_clustered, threshold_match_probability=0.95) -``` -with arguments - - Args: - df_predict (SplinkDataFrame): The results of `linker.predict()` - df_clustered (SplinkDataFrame): The outputs of - `linker.cluster_pairwise_predictions_at_threshold()` - threshold_match_probability (float): Filter the pairwise match predictions - to include only pairwise comparisons with a match_probability at or - above this threshold. - -!!! warning - - `threshold_match_probability` should be the same as the clustering threshold passed to `cluster_pairwise_predictions_at_threshold()`. If this information is available to Splink then it will be passed automatically, otherwise the user will have to provide it themselves and take care to ensure that threshold values align. - -The method generates tables containing graph metrics (for nodes, edges and clusters), and returns a data class of [Splink dataframes](../../../SplinkDataFrame.md). The individual Splink dataframes containing node, edge and cluster metrics can be accessed as follows: - -``` -compute_graph_metrics.nodes for node metrics -compute_graph_metrics.edges for edge metrics -compute_graph_metrics.clusters for cluster metrics -``` - -The metrics computed by `compute_graph_metrics()` include all those mentioned in the [Graph metrics]() chapter, namely: - -* Node degree -* 'Is bridge' -* Cluster size -* Cluster density -* Cluster centrality - -All of these metrics are calculated by default. If you are unable to install the `igraph` package required for 'is bridge', this metric won't be calculated, however all other metrics will still be generated. - -This topic guide is a work in progress. Please check back for more detailed examples of how `compute_graph_metrics()` can be used to evaluate linked data. 
diff --git a/mkdocs.yml b/mkdocs.yml index 6bdb8750d3..912a44e5a2 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -172,7 +172,7 @@ nav: - Clusters: - Overview: "topic_guides/evaluation/clusters/overview.md" - Graph metrics: "topic_guides/evaluation/clusters/graph_metrics.md" - - How to compute graph metrics: "topic_guides/evaluation/clusters/how_to_compute_metrics.md" + - How to compute graph metrics: "topic_guides/evaluation/clusters/how_to_compute_metrics.ipynb" - Performance: - Run times, performance and linking large data: "topic_guides/performance/drivers_of_performance.md" - Spark Performance: From 01b60b62146bf1284391aefbb214fb82cdb85ebe Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 2 Apr 2024 14:10:12 +0100 Subject: [PATCH 39/46] update and fix links --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 10 +++++----- docs/topic_guides/evaluation/clusters/overview.md | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 7cb29a26ed..3e38aa1511 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -4,9 +4,9 @@ Graph metrics quantify the characteristics of a graph. A simple example of a gra For data linking with Splink, it is useful to sort graph metrics into three categories: -* [Node metrics](#purple_circle-node-metrics) -* [Edge metrics](#link-edge-metrics) -* [Cluster metrics](#fontawesome-solid-circle-nodes-cluster-metrics) +* [Node metrics](#node-metrics) +* [Edge metrics](#edge-metrics) +* [Cluster metrics](#cluster-metrics) Each of these are defined below together with examples and explanations of how they can be applied to linked data to evaluate cluster quality. The examples cover all metrics currently available in Splink. @@ -78,7 +78,7 @@ A low density could indicate links being missed. 
A sample of low density cluster ### Cluster Centralisation -[Cluster centralisation]("https://en.wikipedia.org/wiki/Centrality#Degree_centrality") is defined as the deviation from maximum [node degree](#node-degree) normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. +[Cluster centralisation](https://en.wikipedia.org/wiki/Centrality#Degree_centrality) is defined as the deviation from maximum [node degree](#node-degree) normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. [include picture] @@ -88,6 +88,6 @@ Low centralisation suggests that edges are more evenly distributed amongst nodes
-A guide on [how to compute all the graph metrics mentioned above with Splink]() is given in the next chapter. +A guide on [how to compute the graph metrics](./how_to_compute_metrics.ipynb) mentioned above with Splink is given in the next chapter. Please note, this topic guide is a work in progress and we welcome any feedback. diff --git a/docs/topic_guides/evaluation/clusters/overview.md b/docs/topic_guides/evaluation/clusters/overview.md index 42ae38d17e..650d8652ef 100644 --- a/docs/topic_guides/evaluation/clusters/overview.md +++ b/docs/topic_guides/evaluation/clusters/overview.md @@ -34,6 +34,6 @@ Whatever the starting point, this topic guide is designed to help users develop ## What this topic guide contains * An introduction to the [graph metrics](./graph_metrics.md) currently available in Splink and how to apply them to linked data -* Instructions on [how to compute graph metrics](./how_to_compute_metrics.md) with Splink +* Instructions on [how to compute graph metrics](./how_to_compute_metrics.ipynb) with Splink Please note, this topic guide is a work in progress and we welcome any feedback. 
\ No newline at end of file From 39d0f343ee8cb43c7737d57da5c890890bc64757 Mon Sep 17 00:00:00 2001 From: zslade Date: Tue, 2 Apr 2024 14:10:20 +0100 Subject: [PATCH 40/46] spellcheck --- scripts/pyspelling/custom_dictionary.txt | 2 ++ 1 file changed, 2 insertions(+) diff --git a/scripts/pyspelling/custom_dictionary.txt b/scripts/pyspelling/custom_dictionary.txt index 9e2b30d3c4..f320c0506d 100644 --- a/scripts/pyspelling/custom_dictionary.txt +++ b/scripts/pyspelling/custom_dictionary.txt @@ -135,6 +135,7 @@ comparator comparators conda config +connectedness csv customisable customizations @@ -207,6 +208,7 @@ runtime scalable schemas sdt +signalers subclassed subdistricts substring From fd77b6e0c7edbfd3ce50fae59ee7435a5687692c Mon Sep 17 00:00:00 2001 From: Ross Kennedy Date: Tue, 2 Apr 2024 17:46:32 +0100 Subject: [PATCH 41/46] add more graphic metric visuals --- docs/img/clusters/cluster_density.drawio.png | Bin 0 -> 68182 bytes docs/img/clusters/cluster_size.drawio.png | Bin 0 -> 23892 bytes docs/img/clusters/is_bridge.drawio.png | Bin 0 -> 50851 bytes .../evaluation/clusters/graph_metrics.md | 57 +++++++++++++++--- is_bridge.drawio.png | Bin 0 -> 50851 bytes 5 files changed, 49 insertions(+), 8 deletions(-) create mode 100644 docs/img/clusters/cluster_density.drawio.png create mode 100644 docs/img/clusters/cluster_size.drawio.png create mode 100644 docs/img/clusters/is_bridge.drawio.png create mode 100644 is_bridge.drawio.png diff --git a/docs/img/clusters/cluster_density.drawio.png b/docs/img/clusters/cluster_density.drawio.png new file mode 100644 index 0000000000000000000000000000000000000000..73b170528163075a372c8fb2f12bce8aa433c8e6 GIT binary patch literal 68182 zcmeEv1wd5W+BOY>64G5FC0#@JfJlRa(mfz03?su_uD04CLOMbW42(m^)m8K{ zFt8ahFfcU;@IZ@iqVOm1g6XNRri4-PoM9FN!=4YWY7BSrwRd#3!{886{_>7PSkT4I z*3sV4&Xz+&g+o|aSWsA$PZ0bQHWU)$5K$CE{ty-A7leqO0bhgTVQ!xAFUMNByYt&R zS~+-Fxp?xs+QGkkZ{=X;iaJ2c?YyJ2vlWLhRE%E`oXVwTW#i}yck_gCh$w;2$6eue 
z&fpa^gP%GE;D<5zFC=IoBxWJO4c;ocyF1$%*;$`(1XqWOiAeB^h=XRX6Y7RqnjFH) z;JvfsSv&BeW@mHO4S9+(%)`wEv_OS~h4}@Me?g;xmA#dR<5!nJ-iaOD%HhkM!0g0@ z9o4R4cyP&?n|?So123(=uwzM zM3qBO5nLJhPed6tH{gRKYIdC=Fk>wx^|KB(sv0&1MuINZ_OLHkbF~6v{=5RZU;_C* z_wn`B(ARYFRr9cNhiSRlex5m7-!CUYge1RQ$JXylgM<*|OM`<4x^vXHkqrip=TW^0 zfw3az!okbY*3R?Gx#$~ob906}x_{ef*BYtk4Ie?!*&j zW$Whi<^0c0|2&Q8$@Fk@1DE@1&Axo`^-m;FlK=OaN z9E6Wg9cFEC<~01ch^?)IhO3H%5T6LD6HmAwdUow>!DOR8y1Kc7Ial&<^K!NQjCvrd zN^Tx-n45!}tCh2+o0~hBJR#8NWCw@)p+v>X3+@J*U~m^_)VH2)ULH1fx^^CpE_QG` z4+D23wvl)O8bis>)m9NH72qSPKhOdN$^bMV`-59~IM~7ehm$?w9&Tst&|)A04*#;$ z?pFVpHS|4x9t5~3IKt+%s`CkJUp)smi8Ci`UHBZ&DH-qq*+z=@m;Qd`IMFZ1At&m~ zQJ*{bb=g3R5x}G!P&sf9KhTqhowF6((fjMG2n$-F(BbgQao_5SuO;rE-y>yE*9|C6 zaCsEo#U)U5ebP)Jl#80bh5ArVm&I?x@X7VFDJkQ4H+rT6P`{AdJ%B44Hq zJpu_)eql+7n4pBHxUe`xTnzO!dIYEy|My2A0+c-Tt115VH9;Ny8M6QC$bUli58;H? z_kY6)zzGGre+efP?0yzFiAtihFiMz((4W!L`#a$0I|K_$2=R+a3JMBKNC*l-&^jF5 z|IdpalqCF(o`1s_iTz~82=Xm@gwdbT()*uBkBEdMzvS0{zO_Z)58$VTkI3Kf`G+ru z^3ndK@%hQ%BPNO(Fv|Fd3ZfSMcZ3fFC~0d9wK;{UfH2I_QCq=ybfMG)<@h@n3F{3C!CMK!^`3Z2ny zXz}_wDETdc!T-s{{^6pbQPYDMO^)?t*fgc;80Yp1?>$T&ZC3wuT9vWg3tSo9S|1$fxY|^I{XT8 z_gfi>&+6*e8KEzX7n1Axuk(4LlHdA(AP@7!wxf;o?+`)>qPjrg3|C2DfM#>@YVt3NDWgvC(K4O+SX{&*3A ze3w^*{8{jV@~{5#c=?}k@Efw^pFOKTEM7z;zQc>~cRA?)yb~=Z_$^*Ug-}EI>FtXU zsIx+ni+=;=haov?+5feL`5a{ZVIc$g&cyr|65m3CXd(t3c>eU6?;q+^wCDN9;M#uQ zFjNBat>zW_8B!5HTs1WB@>f_jNkNp3M+5O^$VL2k)llO1S6DUi@3HnXBqM&jYJVZ3 zPW(GV`}1Zagi*ouUjb_p-$xWbExjuIcl`S$iS%Pl^qUEzKXwEx@tv0WFEFeSbgkW= zj$sx4I|BYS$e>)Ve;omT#?~J;0u~oSc_IHjl2rm79ihAY(~zv_O<8}FMn58r(4N!3 zhRh$EG?GA9C8Oo{_vcw5=pYAeW&Sigt1v3T@|TCq4-OZWF5kV6xDCrgc z{Q)B?@@;jG2%0tc=@S#eDC+%h!2D>x>kk0)LtuRl2miQ$`L0p~JQLveE{7BS4x=Js zC^`G-0fW{A-w-z9&S2-ek~N30xC8PHc>Ua9@Ai3@@jtL7e$ETIM_vi}1)t~V#|9uD za3BP62Nh_@(@@`|FQj7U>gfpgL+&s}ZpsGFasaz`kw4I%(LI6YFP(im0MPu^puS5$ zf}In{v3%KXf!>IL+Fx$v?C5}eOoI*BSZ@b5XneC-19i+-dpnT%5J&L%0Y%hlE{?Xg z$ldZk+!ONcUimMQ`1>YQL`1&Zpo3n-uj=}Q&?^1w%^m3D&{F$f-Q4j*ZUw!S_5X|b 
zf!OnF=@aA!4PPYvZ}~%}x&SJ_Eq6xMm4ZFU{NGk2|Kd>mgc67<1NlY*zf&#${J;uZ zD^J+x=VKrrh4Gc`vN^9uN)-#uXCi_-XaE8c$yDD(pf z{$&{XB?JAxF!En5F`BCQTF0Q<{ySBp65ko-Uv^BPRos8Rst592RkbksGrG$!Q;&Kk z2U3!NHu;U=2RQPp{RO|iIRB@ec0ZII5fst=XMqBWyZe_=_=1^VpY0#&!#@-h#L;dr znt1vB3qrroTMDDW|MNouW#9fRPyjjzsfqq7P=KH`85#=T%jJLGDo6;TsJdTR9MqKm z{7^t;y8bLs_^npq&*316OMV9hw8Mm!%ijSC=%)*!q40g!`SUtRsJQ#j0tJ+4{7b9w z=YWEQ*tfKa$bZ2qpjj9+6p(Ht(h#H3_wzylW$ykgQ24D@;mZR4?6HD`(05h=U8jSV z%iqB&h>O|7ry8#I?2Y_%G1-)=35f!TSoa?LYqTa1PT=d ze||_S{Z=CBXO~||H0OfSt3v1sfZtDkA%fpRMi@K+`#+<~(bNKXCax7SH-f$gcjV(x zJ&~RKmgV|#74)_K>6x$)wBPsrjQv(}>R(+SUHpo+bm-~%_pbk$)cP_JzvZS;9M~^U zG1?am+X^Ww~6tvnzKXM+e zr2q4dlBnqSh6eQ+-R1wJs-__8Fp=syuXYAiI1n9B;lOYrP&UF<=fByHb|Fany zRDALc^nWiy^S5#NS}XiR5`S?setFWqCmW5FYf61glGi}3>J*zDvE}_=JRF;?rOhSH-9&Km{z!tgse-Z zknLg|UT=s}r)jU{Q_ZVgniYO=U8Y>Bw7eo0PMezAK2LkSzrQ#nm$kk5?!&?DOt+iU z^(5;fo@?gY!_yTX${PpH8(oz4S$xAv1q~sy#G%3b`bATT<$yHfQPwAI!dM9y?9Bv% zzr2Jfgk%v1k$!tT_|$BO^J-R*(wr&tpRV?5cFf@ieW-#98H`W0(0Ixj9X?e8qr9CmbxsWDsFs<^dvV6=8wQRuy?3^;*vM1o1iI2cF_-Ed6%cdTF>QXo ziS=c6Lzl=b=^5p`131B);v{28Gq%_~G>VE*7{%;!{FrnWzNqn>^ux}F70uZsy;>_As8x8Ah%p2U;z7=~2AJ(zhZnf65*6FWUuG^4x$Hq-JMmOV?Z!ZAeTn1b znW)6a*h0uvBTWQ@PI_cEEhd7s#@?(Qz3V^5b7;^fq3GuJ$1AnhFXATm+*0JzqR+v% zM~Xr7GjP23$HkRF8q7Onmc_~oyXO+T_jf*$+;Nh>I-xx!VKyCbnnZ7Y>3W)!ms8*L z#DULi<&KTzTrhkq{yrYG`8C0of?xJKqqjWrLydsg1%wASP$n`5!=zWl^w$v3K8oKK7BbRp$0#DLV2 zejzQ(T9y{m2YDa&)?co^xSGYx0K+t$BqLzmlLePl5dotp@LTs=NC-ItMv*rV1ks=u z%+)lx((j!`-{^a7+wWBqQ8`kmOQM?FZ^Xl-hF~vhBR^T;)yHG0MRYmi^0pGZiUPt_tN7tf?8zHbODZX6rRRgMoBNLA zZmkWuMY7%VUdU%tjoUDsYua8cNafVgN93!{Cd=;;oc8nEr+2%r*jKDWvKiDAO6Jas zRod@yEAiC&fz)W_^M|k*DQ?`$XwNWv5R}DlexEDo3ac`n#sxN2^-_F)cdB`>3v=Wb zSK=(5YDZ7d7v`Vm8%oTRReR=!MKd=@G%cr6#ePLe!thPP{XK9C1rOt?<5__kI!+?D zk?W`(y%SPyP)z*nnnZV}-I8%aNC6pa`8mxXePw=PQo$>-n~aG!>C;B*+?ZvTnz79< z0~O|4YFaP+X$BEaNX^f%;z8CpQe_i;SK#bGd76mB^OO#PH9QN?ane$~0;n}jfcTgQ zoAKt#6M-8+yIT`0s_|5MH_zO;Khkh+63Sy#k-1?Oj(2D#UZ817QuWA|WzVZu#!fFv 
z(`kaY*A@)#`z~8vE_mZL59g}mnB!eF-F*z79SrQ^rUQsSBU%>gab7i5DG$QvIsLqd z*Qg?R?Ls*=WyG7lqt=jN-ESm0fXm^z870hJ$ zjF5S{3^$}}q;ZjnLQBm$wWFP+^Y6;Nmzd8@+9lyFo64>5yAg1DIvb0N$T=l0aspoC zGIU8hM^4Lc#`XluL|&Vb2^a@Wib+H%y$5DmTl`dXim z>l?_KI@hd?z&xeM5^#TtPR}n3VJ#|H4sS8`I|9ArGk;yOtVf#h)@|m1rO+!gJ$cF# z?R(ww`xK)u%Z!qj4l~7-jKLWo?jb>&Q`C9h={zU7+(zyz3bY(7W zLiHD27deQ`ngXuk5i_XE@2--wJ}fi5p})E@ncNr69Hf3jzv%FKxp7t4qhw}*JrakR zdmow#B8jN=yVAI4(~ZpY0DceaH}5=)>>oiCsGYUxjA;k2v#C0Ad85JmSrZcz?c-Df zy7#7tp?cc* zc#rV0JkF9WG~=@t!e@JH9Q%$g2L~2nlU)QdADh!=)q;O9^`m+C8_A{wvb#i~OuN^6 zcxn+lwi}AgIenp*Uz^q+Htx1OAz*r!#oXD#e?crm*1z_p-Z)${Tjn%`;aP^*z4vqw z8ufY}liG4BRfGl3P~sp_;ow}ylJ^EqGbStJR$^RbZBBAq zu=^69Nr^%z)^)Jry^UdJoP^`DSxNi5AA(og$#q~k@n-kDi-5j*Mb<4N&6xD+s8`xf z?i1g^8Z|~bsE%zHRZ_9ewuqN2n`xl|W(ONTe@jbGuvsJB7WP2dCHbH@bs#Ozk=76*+l*OBNK(v015 z+sB^e$Xo2pjkSccy*LTwVG3N;J9FDU(Ouj;@YBg>WRAlPV#9aOg+~!HmO3`rK0bV| zH;L zVO0*xCL&I7V3mXi?(eQYq;XHkrDvd?!JrkfKWJrUuOfvd$#`Y-rBrD8$gJ^$AlXmz zuMJc}ShxhE7qzXJJRAE$=^~yD5%(;UPtWvD(SuH1G%g+Eepq-yQL@@0zn6nhchkcZ zo3#9@>!{GB+Qn}2`Ke15M%80vbmClb(xXiB+BHFe6O}~NJ0Gym+rI7AN#h_QE5S9m zuDQbz92Go8T_GgYJAS+%-u*#>_eKGC`vFp=_riv5o7kP?Q3)7BGbyYe{oC#_T00H_ zJO3!(X#%#T%<6`(jK@Ui=P9*wQ}g=mQ#a3?^dwH>az`5XfrpZal-8wm( zy|C#~2f(1_%!@Q`gQMrZ{F*2<=vM`yP#4CCseOvw#1%7-rATFtI@|PRjbgdlJf4$msXp=Yo_NnG^jI z?sV(N{q+(0iNYoU37$rc>|2kohIvW3@8&9o6Kvi*HpkX1K6t;;ir$5(mMpnko&swe zmpEo`fpgr`nSrpyKAeoFRXI;LGf6v3Ds~H>)JHuUbDe!IWQ(w~-gA0pBv@Xa^c<8h zRYDD8_lj;MCvH+S#f~EmkF` zZ(qO5$54#TZ6mxuWW!P&Mo04c>40`IdyiD0BxCdP(s&!&?knyI!eVT+8=T<(xRb#% z%}MsENGUrr#7q4$zgfP+)z@XtJAF~%tXYR3OetOIvCZt}<;GzJ7dl322!V(IrzoC; zTjz#uURZAPrw(&UjY4H{e3h8m$F7ej33}v z%xp;c#IW22Q$#BFl@{3>T<27G~3*HOt3okhA7C6mq8B| zUCQ^73g8}S3iQ_tfz|Yd>IOFjY}SxmvU_srKm(M@AH*Uq#Cx659r3-u;S!6}ds^e# zMm$#t7vP{CgkcDfM812@Ar&ZP)ki5!LHKeOL+tEmQ{ck|F6YI*;$-#Ptsm>oe{x#( z>^C19Q~N;B+@c^w?nJ!GUCm|M0Riq01@16uN^l6y5z}I2`O6ODYX&;99O~*U7crOT zUX;0;2rBboaW5KA+T9+Yc zjIpF5!+ZG7$8P!l(p}p)>GV-}=0`hOm8!OD`c4swW-KEN35?R-zGkmKy?+^Lbd%ak 
z`eIJ5>g|jM%ji8bteLa^t07`K8;Z*l`;zpygk1I7qD!gcP8Nh+skEZFIBpG1FjFbE zm43}6%xo`s$frlr^@zKY`+wn2Ar!rgT^4>;4b18;q2%7tbiHWgXeU8ZjL4tOiPG6>v_+>XNhr8}c>Rm!EQBzV zfxEC3L&o0fA+@uMYdP%hoB)?1^}`yvd@QVEnu7tt889H(I~x+}vbIJc<;`!gG{E1= z>-i<>f;?X)c-XRpN7h=b9*sEh#M=het@kmjQMaz#eNmZ0I$x`#?uOrPw>-Jm2kjbf zm$0C~ts?>5tppGZr1Ew?Y@XCJi6P(*8D0h2__}U}s3XC1xxMt>{9&5GvcfOkc_L=V zqMkb!&NUgBZrD*`2*t&e@vE)3as8GbQaiSU;bJ1%G0F7(ZEclDLez1y+knq|)+S)> zwL|`RPLQ1F==OLtGl9VR7IhHAkne-5Sm{q?FF@~n(Cvec9E$Mh z+A@G4*p^6+nqCm$L}W^;?7w>^<-}O-+z)Jm)ZLqE+LAD}3)jP_33L=)$jya&o?s-t z2&3%p(R83^JZ73_G400{q2BSJzbudBUAgqq;Y0#q*EjL;VwA&8A7d@UUxSw9z)^wb7+&f4uifej@P)$c=x|>|gNoCco>WZDhib{e) z1mFUQnH1jMnWUf)&70Yn=k^tPa!tZ*345-h{Np2EDIe7=XhlU8B0E~CWPHrN&`P!- zz+jbijH_5VoqP={FjAZLqN}eZu1Oj%mrsNa zB_^EFeGG2H8ivK%N_9&P*@`l+xqN*Ypv{MI3?;4V2BF(nbw-uuqTchy7kK%lrQTQK zCUPdn98&>5hk8YhKT2@W;HA14}C@i)zh!+G?ms-p600gCVcJ@9FIM zoB846M4Vo0lU&b`q5Xp4wA?Md(07ztX+1@kBFWroumr>y)Eorp)y$X{R}}qRBeg;r z&4GV6_89+lXmJPvzZ?QP6?;AX6wwe2n+)K*!L8%ra$ZX4trcBFWQa{Bu|)PD0+tjmZeX1Et^Alrw?U@dC#fz zmr?6=XX72}Y7@j7VAb7DT!xR>iQ$;kxl-y8Qrr(S8Z}t**EPy#)+{@9vm9GG;_54C zG{&XM6Pvo77^Hp#(2!0t?P#7m4A~?JmG{Th855$J&Z~>|9%yGP>cW1iDTX(%viL@2 zd0FMQ19{_4H7527X+nq>L5W8`Q+eWm1%8I+Dk}zga&bs5cDY_8A!S4S%TXyfdrqti zJc1_QNPqfy02c*oLVz^;7(sE{@j%vrnv7)`_P3Gsfd2WZ!53>TA`%u~j=63(%`SlPva@;qTF>%Tj zP0`J@q5Dl|?l?u@!L%uB5oC**G89;We!~r9#EiIy*CZ z`7(Yny+!O{*sVY>9t=J6>1f~7`*rSOFU`^iT*2?O@FP4y+*psCx5%520K2ig2uM`=#nNN8a<9R(hG4+H>r;s`G0E9U^ajo~dR@uB% zg8tL_9jo9#{Uv?GJs-ENxw1-?N#>5uBt{hyS(S*z>+fGmLtRd#kzruHANY`~jop6- zA^lP`&!AY_=~-uf;7ic+22Q7P-W8hscxvVQR|OmZwI_?!CCzBVd8|<2a-pvGOH^nr zbRno@HUD&K}gfqbAPxIZW^KOW^AbupE9f+-^qk3ABUb%$AN!J z=pgaSLD*{h`3);JUA9US2O4u3QY1Uc^#Q&*FA;gl_4)Pr5kv04i|2j(4NmXQ8XgnqR}sv>vZ+l z;$#^OCNAj=!j15zHT}(I?T8Ajp&1rIlg5k^4?8wdw>NbHG4vaU0im}P8M30?fp7Mx z@49S#+HgyCZ=d- zIZWN(Q|a{O9gd0;Ltq63L zVVScR`eDc?C5cH^<3^u-Zuw~smAXW{#~tasPx)c1%$&*8UdT0D;?5ZCw!J3qG6Nhk z>a>e((IiIsA=|67eQXm|0h=C&?jP04@O4=%I@q9H zQl+(hx!}E>u*1HKZ|YZjPYArONSSmNRhJWYJ#yw&$jrlbvCP`=$Ap4d#7YW&#o=eA 
z`U5lGJ-w=2d4DDD`WSFSO3qC_>f+h~5p>_vt0H27P6OrKgirgPWYcuPG*oheH2U+f z5!c)%jP*I58B;@>1+Yx}^7cTiy#r3fWxP=9q?%NPxb?)K`0ydRG|S@CfKFvOhJe%r&@j&XLw*sYBR1L$& z`qhZY5*_=lANYj0vA#YUT&#Ijn7ckR6dV8WoK&HL1%wGuQu=D6eZ{fW_EUJFbMpje z4=e#=2qj%=z^t&fq5gkg&Ifu;QR+j<$v z5+%+}e5_=7Rtl0Gipi1Ybe3sAmKYzlx9fsHDXdB24UwMu>zdhM|8zE0!g1gJ8wQ+j zSQ8(R;AY1#gz|9&2TgWNlhIYFyhnwKH=WBt2q;svWwuJCd2} z4rXXw6F<;l^Pu9RPo;Tdi9>hxrP(nrwjQCdOJp9iy}|o+M!eZsWCVnvu?Ex#IIau) zIh<&AV$&w#!MT1<)o|yb+C;#?90&LoMUpyJ0-Gojyf0rhz66jVR>|aZiCu#;`b9Cd zu3EX`0w&`9cFUhYu0TNrjQW;=7oB#l=Vecx+m177vcZprwJ^av9*rwy_8r5tgNtd9 zsfZ5Nx|AJSANCnVh;+-4uPphF3mBH9+$F&sK9A4xwx#th=ED!}G6NN6Wy6fx?MD-C z1dg(KM#WPc9}9B;3L;7&OX5*qJ@1^TM`WVhY6s0=gIAfbVbv+;*=oBd?GB{Q3>gJc zyj)8o_x*0?DhcT2tK4mi@XGNJKNHLqakRp*FX|3|Q!H7GHx9!je3el!3*y&hCFyh?&>^6i0KC ziRBEL?lAQI`qawQ6T>QNRi{_CSiRmxWp>F`YrKLTzZ&0UXc2t#*r{?F_FO5fBoC>h zS|*x(R7a{94J__3%&Za8y>@Hbd{mayc_H%Sc+Ctx8yyh%-hmpY7d*f$ceITFCVE5h zExoc+e@ULCh!qLVcw*-|nBO{a4G@GyPU_J`4xfr~K4@G{gAjN?x@%(V_OM?DpWX5Z zNcD{38n{IhbOO3k^<=3bD~0*(lK7CD;u$27{egnUA{|KI9($(t1I-DO_35fZGx{KZ z*rUj|`#gB8856-pVi8Tk-1L4>#&3mS8k>yt^pNFH7?x(a4sBZzl){X?=(zqoThW2% z8N*mqW#TS-Hf-4Us2VQIZ_nR;SGxAn;(%V#T}b``@|X-U5g$3GEHXh_OwLy@1q?hn zhQm|cIXTEAR;u3kIJ;`IIQ>m|imgvZ`Kz<+W`v{N1S))3NuB^Ci6%*TcMS#%X5KwH zA9MXt6!ZeU<|_8{jwhW{M!KYklv$m0UZd;aKeU`fZl=nfK%d?=wIv2W$sGftT=Fr+ zi8_nL^dM_~d zvNDI|zV1m*uDfvd zRe27avzxvzk#scKRRiP>2+uM%_e5Mi8;l@1y`87AiY;0lxV73XtyqVy1wafC9Zl8G zJ(Qrk%6Y2xHq@^i_tQ~#ndWRZ++mMH-J5$068ThoF|}@-N)B+ikI}}%qpf_?Lj?yh zIR|08RO4f;CbiCNbk%2H^7CMwl2pb0NQ!lx<^d zUvLf}`fiCG&WTB%(W_MqOsQlsJU9%qaLeeWyn&VBwf@mXRBu zhPBlRi1nv5Vi&XrsyPr>MO0?q%GnV%9HA2tPi`gP3JXbf5oOt#EzVKNiFZG}RX&LK zY7V4iRAw@c#0rTeyVnZab*Zes@k|~d7%2UOBa%ETP8Y58>>ByPE6m!|2<5!}T>+XG z8+1soFFeF1i@5&!y40gyD)O^38kAB>%J^HA;{@vm!p9>%h7!b74w68aHP`zScAEW*!c5GFrI8lPiX&e9{FY_Nc!D-onNL^ zh3Fans0*7jbRBR)WUq*V_;pzuzm}la{8iX0QY?a-JgXP!v7Fq~1M`Za4ly|7aV7k-n`Nx;*v^->iYR+^hpTyc2LpwvyH z^GDOwDN zQB0jUb(2j=nLEJcRf;>MH2w8Oa*_%;EZ18Ih0TrS$=26^7CxHJXL2fV%&fx7Sp1Cj 
z#H_QW{Dm_Qqm}rBoqYYgS;lyZm9b(B?s#1z?yE|@F*^auGA2Nh7I^U9i9DAcpFM5M zo~y6IR-U+8rn>U>;p9;j`IPe^WlaF*BVnqnd`8*w2QEjREUiu^w`zU*XuM*AdRne|41uJ{@A`2}5q z=XeZFkjU@=vtI$1T9v+M)yG`p_dW>Js>py^m8`F7Rk+pG5)2z`S$T}Em-Lb- z3nEA!2T&ddvz};tznwgGm9_BX)YGed?#?~eZ1#3Gtxa`V27p(0R}%P^fTp`08l^=d zqfm2}8iXY+!_4C+EdrKt8<(L5p z{nj`IoCD$wDf=MJIx$yt;&NQY+n}k9bNQIqF^`fs8iIBooG8}e+iP?dzPw>%&q1cXANBeq@14(#Iait5LQFu5O&p>5K9!tm`6VQrz z4hnaiay73AhJs*-zv)rlo&rbJrmnhDmGf)ixz)Kc_&;O@$;*Jq9A$tgLug)gi(9}#-zI`PmM;LUiN) zq>LRw#&{lg_UV?cpT&Z~RngqE8nQIz0Yj9?JXvL{L z-u?Ybn>N{?Aj}vG!TB} zen_P*buI|x!*jgN4o%2ql08ov!>vU5O1nS3NTNzaoQ=nFMyXJmB{s$^TIp7I12*gO zejm{RP>ba;Tq%R6Onuq&_5<04?i?kxMw;EIK zVK;9*&6YKx$k{I4RanKIHBvvja&(K1h@#YHcg#WJ1Sn8oeHRjPiF%ataORaybE-I} zu9(!_FAKF^SBgB$#w_7_WT$LjJ}9a(c<=bUfBnvX`C(@|w<3@pc1X?#Vr}S=DD-*JE+E4$@v+@>Zn2iJHW}&$d;*}?FxT^bhfRtjCDaP3z1zs#;}Zw4Z`(hk zL;BV?)Gx5LGJ)7HL$^*J6lt_A6ZtDdGZH@bs!k;K@7RPwn+0)9-@}{LK~01%1}@M7~>Kqazr70y{UZ&jj1z8c*<$B<2up2gcP7`yy(0en|6nG63 zdrbtxES`5zTNw8iu)p$R*TR|g)z>f9=3^ik>uLo$>k-=+l>SfzF(8$emzVSJHtFUm zvEh0F?|$wz89o>fXR=F8RdBcr9XlxxR)v?;su%W9tqb|QW?r1=#$A%th+J-Xci1iW2&Emjx`ex%CT4ao1ylRu_k_X_P5_ znoCh;-PEj3tevY&owmYRgf^#Z<2z9t+19VNeLS=E>3yvB%kV%_-8WpcCsQ2mk--D_ zNdt2ZLL3ODw&R0xH)Lk;f-sjbDGSPz>pwJYt9&d^Q4Q*B*;Eiw?)Tw3LND%2niPHP zQnV9DGZX>i$X&=$1~^(~%(_9Cg>?8S=*WIbJ3~JcqRe`LW(anNiE-fQ zc~bYHK1?Rh)v%C|B+8odaRTk}W$PUhm{DwdykuNu)-i0ngSntG19n1zSm@T^ZHMGX z0g*jT#VlK_S1Z|KgOEi{wlCoj8FfYiljm8=K8jPB#O2cneg#L zkbiM8*>r+}dQsC^c*ujslVzu{m<5fs)TZYIryAEsKrUQqX`Ji5N-M+jfuVs)3mwYl zjnzuP!$s@5f8Z|JRalnny>u^tlX=1ROVuG#B!On} zCNI0+TjK<qok7|8p{^ztd<|1 zz7FK=uuC>?1{ql3ykdP(HVzH%l!xKH4PIyIMC=nLvuR}`1=C1g_$iH&3tDW(1%>b( zd}{M;qI|*nKfZropa_&MCGHDr= zC3h!q+gp)zA`vnLII}ddh!HA0w+kz6Rf(s9;*AxlS;jPqk=0J{wfWOkvNM|55^f_g zwS-Su0f$U&!@`G`xB!rF7;)v)NJ0!0WG!pBqYu(5sUIg$b1wyvbg>gGV}?7%w|R4k za`%dSa1-<~5k4+sPk}0FpfZIvp-0iUZ|(jRNeC+AVTP#Z}cy zTyG*ivc4`z=#{@R6K&gcrU8CKT`xTM)jr^)B{U9oBw%L;LYTIqQp52{s2nLjJ4sAf z+IJA+H^o=)DR4XhoXNsRgeMM!-CfgIeO1Yd{+SLIyQK`WbK4j?U0N+Y!a?K0vx-M` 
zvX8~B8D0W#J|*aS9J9W-deV7K*hRi*yFElMnplL%X7n$P3 z;*HeneGnYnau@4{aaH~!9J~GvXp-kt7p0zNOcM=h=3t1>V;;p-Wnl}yvfU6Su8ruA zQPhzz3c-yrJ6m=Z ztSNOZy_QY-Sp}MLDdom27kEhwpc_MiaZIP-x%d(DyBJcZdaCI-xU|)7E4V*VPMas( z{$Ndpppg%{zd@Y9p9~%mwR63_ zZi*mas(gKp;2^VH=3t$hz}`E+b6c0}X;&j4q)G7% z@4p546AOd0ppbPHqy>g5-T{3P!DOksd~_ov^{EkpYBz;5@h~-{`<0sR`)WZfS0#+u zbWkWS$rTk zhII&IT;oD_K^!xnHOfVCw}D4_d_zXHogqg6aST=P4n>B&*moF8WQLNc~+IEG7sjkWkF6*5+m3sdS^X*UF%TU6&Cv-KB;}x>yi8PD+%+G2$e{Z_%WA z{Nj;3P|{MmXH1jNJOop=e5;sGft(LyMI$CQUd+vTkoC7dl{ld|NsWS?%u--K)hU4jh{M4Jf?e_-S z#ie*^uTnc=VS!S(o*e8zDHaGLo=xiMu%rr0tI!Z@(#>q|u9k3oz(IomGlH&E)$)k= z7M?k%BWDfxY(`hHE!<~haPxvrZ;ub`Sf{WyFJlm0Ndvs9L5^JDe85v*S7@^kL5Y0< zBerQwTvy3qHVolDmUdPKKgl7Onkzl8iwLs!RfYxomuJv%W^H%*)&A~ zT(W)`hge$}QtdPmL)?zfQ;RHbSuSz_L4Bdc?DmC{+IQ}sLfddVXv>B5n>C_>c#$e? zj2ZKVbIrANB0VpaDn!VDK}rVhfSLs15KLl`r}D8Tj_D*kghvQiAFMKvHR}bI|7p8c zLY9bw_RTWUx#PMzR?+H&8j|avpjI8&qdMjE5YK%2hC(aA=UDU%BU)C%`guQ-lH=EB zIZl8286*FPP! z`^iYDX0xh;iM+r50s+kt#TdN?xTBKoP-3h2VB)>5v%*A}FCmPOLlyc(am_bC3hc~W z>gi^yHzSSyL{5qXG*PE2Oda3B%{%aR-zt;H7J>qLzMO!S*s1e4(yVeJwp|p29c{CW zVKexdB!LF9yE`Sj?k1~~pgbNi!&`Mqv^ZjN{3t%}D~Fg$PkyXhCvTn3eFp9CbJlBt z+9b-JUnLcb$~}N*lHL`D7pguf%9cgW-I;p{|5IK0l!4#Eg_%dJ)+Q(zO zH8SSPozILwlau@#irpE&82gf?gh9<=Y%y=VsG!XU2tbNGVr_AlHv{_&E6*!fIe8&z ziBB_Y`dn}&S8GR$HrT3guq=Kx-8%AMB$7IcQ){^>391pXM06%Bk*i$eI-Jp*kBNC% zymkFlOw&gR-&`OY z4(0#o`>rtL`r#yebqb{F8FR&%uR+{r-Xp^pqGYSRHSAhtsVhXr!`JyScsYU zAos?r8`Dn&9vs^eV|7Z*)^Y`E*Y)*r^JoOY-AA*XfQ8M29Xia+?B-a-)Y_v?AChbA z4(d0dANtHDS@Wx0hozQ3g2Hx3?o-!w7+P@Lv1||FztX&4q@pUgV^+tu6}rl6QQsaD z#IifLIh`ZnapuIM){(Yl5cK!GQ@-H(vcin-Y7VCrPG&?1qxj%)Fq$%XZ(dFji>k!h zk2fajPpURkYU97sRWKoXe2I_VPkLP36WhI%m3|=49`mDr0kNnCzKZDT!-*M+GJ&HK zPD}+Ih!5mai6^Ld3#OMD1b_=uzU!{PItjK;v@U~^?#5c%liRniG%x)Ng=$M#Jl)iU_Zfe;OpIH@DALM z1*z(cQ(Uq6=VTcd0f#UHGJ|&l8pkyWrq$P`6xfwTjjGRJsmd>5Mz8V~tH@WF*0W_0 z1Z^+W(WSYoq;Nh%2Aun>mg$L)5^k|tx@L$tBysdNyS|wQtjhrgJ>CQPG{r#I6fZKo zcfmu_!zrhFZ<Hn0=MS+9eTJ~1*2(CPtrAzauw*xK9lnZkC2p}APxjwOw^O#*~Wvt;~w z^NR}pIL{exRWcGxdv{nu39F!X3H 
zEX#xj`2n6be>0an!lj1XJJw5-9McHG2$&dZJSMDDoU3KGlvnW{&WKhbca|&+-nnBJ zdg%i&p4B#2t-51L@I@EHE%`RgITUe@3tNm+JTn05s8Gn_2Hh@=q{orG3S|vYMFA*l zr|9$PYdKy*)R?dASIc}s9pBHVPdY;)KbE}^ltZ5p z=}=!5`WU1lZ@Um(iHs!lxwoo-QXlu7d4E!FdtT%`$ksoLi0`n+NmFY+NPlxDzd%#* zIDdHODNqufZYnzjF7pBGRkY-emZaKk>6G+`mWnJtmzMu?evuFyiD#NMe*gYWD zq*mA#8+O@zRVXyqMnd=TGsqg=7?4e%9s@0|hL{#+07+8@SB@{s}@Xl~>EwfWbn z=ReKa9X4zsiMMvVdMTWQEl?+MOpO&GL*jgrfLxh$S%ELwpa>Ke5AKaA(~U~&*@C12 zX}!dj7t6%ad)dIh8XDjT+r*`4I8Qe#F&dBxVj$--a~Vl?OHM54Ygjqv8e?N>w%XfJuEv?7|P(8f!3QCR~nQqz!2sq+pmD8{h#P{q# z)lYsN+1nBS50u>ek5f5GPV;5zv#*n5UE;|!E>p0gSy9?+%mcyFnYuOL%aJiai;3n8 zNnJjk+ztgo0A^cq~5noHJ~?mJqpAkmE_o$<)4J3*I%bebAZGJ7}64B;G+Om4PX? z`4%WUk1Bk5_uOPV{#A;%B~nFJS(TCb$p`MPu)@}kY-dE^B;-@OjW(ItGyyvb44Sv9 zd`hplZ71m=#U9PpWei?wB6dqEtljN(>fjGCuC*{kYq_HmlY~A!<4OeaT}*@*#FmBs zhG~5@wkWaL!@GAOiHcf5&}g|hNuOJZ;`l~PCmC@V7T{x3+EHpP#<=fzPTGRKfoJ%+ z58jolluJp11S+1PY;7ZbTbTc(z7JPmIEGLmB_o33hy-CivvH1G(eVe`(gn;hf=JW@ z&8{Mvr~@0`I8s`5v~WGojf~CZ5S^&5GB;aXJCye}K{(}wPyF6%2135a<)XvG(}7mH z7eo?u8x*^xmKxr$PUnez2dZs}=4EcE5FRc>v&#G-4F zN_UDhNW17pT0#&cqy+^emF|`n5Lh&*fRrFerwWUb5F|taDHRMt&%NIF`^FjP-x>SI z-aq!<<8eLnne)1@ngBiIlkCY9DFO$m`?`GABHJ%22_A*XoTY@v4}rU7fVWg$I70fx z0cm+R#0{!UenAe%#2^lvMA553hq8>>}JS`xADpnEi)y zWxg?~0rH?6J(FV!w7Q?Y&=RG=eZ~z@xE4yd%{Gc~^@`lduN6bd!osY_1Oy(C?-`u#8_O>r1 z?Zf|i0l;>t+Jb4PdLju^Azi(>{=n+H+awyFyYrqJ-Lz_PH`jfl+_jx_o7qc#Krvzv z`@RA{WoQx0RU->I#?3Azi61Gj278vY5%M56FOxGe5vnw_-hA*!qoP8+o{U9E)&hCa zv@44K+bSgs4h95qS>qpk@efKUE8$nIe7X-=M1Uz^jF!;?UvYEf!)dF{_|W{$=sjL0 zZn@nbApAdHk}P9y^Iit@YUxo99GuonP&t5(nN5(a!qG_LIn8m6(o$y1==7;^8e8Jk zdyIa+M=aBi6D&Buph=n)M0pA6QxCC53n_=FPjB)~(QjMunvK2$t|#!jZjNQI%iP(% z^c!dT1reubT4kg+t*ng?C1-b`Xxme{nPWuagC055Nj4)$CGzNJYNW+FMmvid-T&&=I8_SXg6rmm9#ZQ>94@vMO1TVWGF;qbMwo-h&4TnVQDYy$lyHF< zx$F!~ygRQq=Uea>38x6rlv+h^9m6RG(ZPYV+Uw8vUtn%%KN1QQ|NcZ2g4j}OxG(7C z8cF+*S-RNq{nCGs$tsy)n=#%TgY1wAG9_}uxKcmAp~|xD0g=~A>yy-f{>AV)sKENE z1Xj(Y1DX$p!x(G)MkCEiZ+vSQ{=z(~!2xrls+>BJ+1B4(vKo|(wQY3D>@J9Km0zX~3zXC!&DDg0$}^ 
zCry~KOWq}s2d{ply1FQU(>{f9s%OpiD&3UH1z}6n*0F<2be>o9S9ad_B9o1VL`?Oi z@9~|!e;Q3Vp{#DV@k#7CrRc5sHk87wp^k6NBv^tIi@A?3`AH}(i=1VWlP9=Cw>G|* z1_aWFiU2^kQlIOz(d@rhqb2X1pZI$f>pbE!hCk_9-|YP!zs2Sap`zC`%0DbBo4=WL z1iJ66yeu_sjXge!!eM$^`rF$7n+v*B((s?kcWR2r4iP8n)dA%!wOH34zIo{r$8Oak zmfc!vSSiGGLF9G+St%FlVT&eyb%@8Va}q~ive8T>SMq(}07OpuqSD@QJe3WU6wbn7@Kx_tfqR2F%xA{OSG8hxF{}vMu1f%(u7rp>Z#HWN6)a zcx!j_gTus^Aji*nXtN)qg)D!>$c@=X*zt;Gfq}esHftTu{Ztfzm6RUVrb7%>`{Pf5~NU8YMV9w zr>Tw2{tt^sE^IhP|35tzJb1y(=Y3?NllLZ+_@|Q7-%T}eYOVa%Az_bzv&R%L8@R1> zt>@H!qtEZ(7Lz$yrfh89gA_aW;b(C!wVn(^qQqysMV|lDiV#mukhBj}MB8<{U%*+J zOjjCQp6V-hTQyTUJM6!8S1hC?AZE}K0Z?vLG5n9v%+@fFNX;{{T|+*9ZCak_`&PXX z1zzwNIY@a&x5l(ehCzxgg?f^6)eQ|>meAcF&MJyggT#3;IKjM|0EO!NwI0fJIG>LC z!Q!-98rq1B4~JjyQ;uG$U-`q5ky$}5vbW{k#LE+T;{AdXGe4*Ksn5<|RmEuN12ezq7hfG@dV*%9rBZm#QVkJn%j`1*hQd zJ5cXGe1w3yy3~{d)fkkH;3Cu23G1#u|Cv9;L>ljjR#QoO-a?8sumLfuOu;H|W*_b2 z41a+7zDw&t00i^ZkGiY%GR`%**lw)hPKy7nsVb&n>d0rXY`L6Xa~+5ehuzIfcI z+;?0&M_pFYUvC`9E~et3-YxaN zV9}IQIgj^4OcRzVreC-jYCK8WWEqDurS5cMUXG!9;USAx+9%Bxfo%Hsi|m33UBF9l zDm6ol>y)0<7+-2gecD318R!`LmP^{aYVW-m{E-kFBMk#lTlwpbqO^2pEH_fZ6OGSF z*Y)~7JOyD$=D8~zRMrtqV@vUrAFACBg@0;L@*5(SMgHrq8DhSbkI6KtTrV!WmguL9 zBG@rF)3}PQl9^^${CKkhlhl?kt}#fod$XwW+7l5e$um<#MsLXeOD!qUnEiCu zMvSEgZ088a_pqX5JJ6GA3c*LN!7#X!)I2O%KIXTlDSQ^0@+>T_K&G|=nOL7lN-PUH z&y_o)eNmL}v0-;oS;85WC;Z8=R$C7_iYWMx7)jcDa4u{;W^8fAs1{Z?&L_e5CviR| zd4fB{HribIzR4Ty>-Ra+x4C_dD_!n0@!O)IWQ(=+;a|%ss2ev5=VF*2GRKds-%fXJ zz*zDJ>+CJHrEy~t&m%bYb~e(kGU&T^TOumPAYye}z(1(nDExcqelx?`%DZ1L>9CQt zlh|MZeY?(pr{8TPK?yNV7a-Usw7;{n*i!e~Sc=YqicunZQjRUMpb-3m`hUWb7zn7( z)8EA86ZgPu{fA-RP)>1|M>c$*@Ei@ z&zmzMxQ05g5eNiKnY3bzPw8YG*T6q|B>LP?R(!aITHqE&PyJtC$|Gl~@lFs#f6Flu zjjDRwR+KMwTd2CmzO(S6s=Y=ljcum${=p>SYTacOl;%|)l(O)ow706g zRBANkgx?-O&a}X$MZKj8IDZ>yufD#=9kT`DhnZ(8JK+3MEDcMeyClnHGWiMlmU3AD zTr{)4_t#uo-1?iV3>UxT>))HfFKd4{+RbsS%2wgCCOFs9D2AFV40S;qRf>(==hw^7{+Pe-Uy=UuUiYsVMD#$~Ku zhBfMOYRR#JR5Hbd8}#GzY@Wt0-o(dO@jdm{ExI=%sXhV{3+!U~umpdnd##r(lwE>H 
zdBj=09*n<@SosCn{lxKfO@{ag?#aNjkgrI~*A7JfyU#_Q{`gM`%fk8`ZlB~I4IIn| z@E(&A@`#~ogut;yYao96U7-#v==y8YXaLXhe%eQoI1%a)r88TWnHwkjzcH_qDV|6|y!ME(|VgwJ9=h3?&1(JGy;M&n(_8^Fie)t@T zr*vXaYEHVe#XnY9X2;Gewd9oGjbPdI+q&n0`DPK&ZsA5ojX|a1YXgTze74X_%*0mq zjnfIF!ZMGx#|W17%$FWNKEI7LEgWPd4xMyX^jdtRXXQNpKV2fvnwUS)41VxN+cp^( zza-!E^2-Q3IiiJ))#>?jF^?YTGNmpY&{RKI*BkGW*~c`+R{4& zt_K|O>KWuP-kC&ffhfO@nBC>h+=g7?cc_xEVavsit7%PRIl_U_u@vk3pT@k?_2}+( zrUZ=2XlzQX>>>4*D62itdjByjH>ISG#ZA{34g)bLs1{4Edd{PhIf>=qteTw0b~(A2 zPUH#vnh6X%8}6F?}64tS*lgZ(#P%oID4kLnVClf-e$$!YBPhK|ofm%C!sDadaou z(DAc`neSD51;q0r%?B^z#ZjknBc!sYBK*crKD?rQw(av{Uvpxy`}aB-br7L~Y5Teh zGph(+=#z6|=$|+b>0v^|LBKT?l3B)3P_Xurkb()Rj54DR+fbNX_Wsyi1U#_POoBC1 zX`Fs4gOowIdQ5_Q3g5V9sqIdZI4lDV0NW|9irowHh`g>b`T8PdTGEUkUnZ_qzc$sO zWLMS)y0jADj0ul{<+3N*d{K0cR|@`BA*olt-A|D|$;^1o~ zL@`+a$EF|OiJd6N=c6~epgQ?B6le({VM#7l;8Rqg!+_SY#BCn1D>qZxomK|i<5Ry+ z=-!5n+L4P-PY+%{`Fr-`TIZ~e)YYs*0WtFu3!_^$g*mKjl-lpcWO25yXi{_4SJQN*#=tNVOdtN|M1Spp}Nn1+5|%h z*zSPAo3Q;4av3}RM_b0uFc7-J_S7;bYO-spnvtKhyb^|~~V61|#rPaWDsAV_oAlxTX9SiUM!?}NA7hxW$L zPODDDp5_@3N*oM$^>Sw1Ez+M&LL)_IE)(XAh>C5dAi5@ylnJwbBS5mulSAsTT*`aF zn|x2M;O5I&hIe7{x3rp#4Mi^ua9M29N4~torbGFy5Y4&r zzq5M|zd1D$=}MtcNJFA~+6yPnxA{RhXCC z;os|%kzZspc=FXhdOBh{w?Nt>QKNdYA8*?%FhMhzz8v0eG!-WrLGWH%D>I`AoV5jZ zL7VjG*tXC@Y_}C#&?jo~JA;YzB>GOmKRJDN=Nr{iW2^XJ^;JQfZn?s~D3*5XgR_5i zs$@k<)gQ_BwXCpfCP(jfWtdsJ6l#8s8AnO^uB-QU@O_LpVtRh_buoC&1l(&l{8ykt z?pd{ITEC3--j{L8J(e{;awe&GdH)@rLluNQ*q{RJE8Zr=6RH*<&l)!%oiUabyE6kq z&k5}9gLx3f;JIwm`eyRmQx@`$tH(2@&n%O&+ZpBFgF-H?g{`qtjw9@++NnUzq{C8B ze}DYnXqacYHFb)4S0CffT*AExZahrIl_--gNJOrS5gRr$uJbW_5h^RXd?F?X=!Z>_ z1jCO?f=Nb|6{jfcEz5UNYkU0-qvM2VIT}p8QlKR?UjB_#aq1O;swDlg%7-Hn4`I%w zxUTiYB(@i1yt-pS@lc6JMBt3zoPB0%^XB_{lIT<<0hxYrMjR2IEWsl;QwYx=ljs6( z7mfY~l_MX`$Jc^P|2HItU-;jUSYLpWC9X6z>IoW5+?hbUqhy;U)IK<&-}b}}le6K@ z)I_>3njR(}?=4-Q|8l`Dkh2=syb)t{D|L#c6{E6eC1v}GxC&q6>Guk^Hr&{65m2w`V6TiUn96kd-_Vw@N*s&= zclj+{gkW{>XKsaU3rzWQ&=3~?a*O``Hq3tIJP6c6!gE-&w~9eenFRavt&>8{YaBmJ 
z#6|2350&(O6ZbTxW5~q}d%qp>0ZWqm8;-$p5J>{C}-B@ZM(#UA-v4{$pMVbjaxiAG+cQQo+n|%@9w>V2Q(2K|xMx7b(r^OPriZC~(Vlb+r1~Yj9ZZ<;(CU z{Ze}@*j@+!r7m01}ROIDGhU_iDzw z&AkVoLP6CrJ%i+-aRTYW+q{V_@r>)2ljMlVFP4aBFe;@j4XLp{@{=;6+s7sXO68d7 zh0MfQ2-W`>rGFLtZRGK1pGS|2?N1YHnQ#B&#(sz2`yGxw0BT6BF;G#r(8N)-`NLcgx#onU-eec#^MR> zTHhdL>Hm}S0@Xqcx9jJkhh0_Bb4Ot>@x~!E-4~+Aja|Y>s5~ ze{Ydn*0iC_OX5THj)PC%jZD(9L1uB6iK-PN5?o?nm-yX~XiCEfMWGTF*H)^jZ(KA?2UkA+J(dE6VA=-j%XSQR1d2oiE_j$v3@&@V{or~bx$EJ- z2N*@rT#s_JN_j&K+b|-5DMdQh8`Z{9TZTe>&wf~^Tg(V59inU6UoZZ$CJ~_Z0 zrTcG1d_kutljMIB0=7qNM)n-O-UsLji)S%j&DpL$ZfJ6qK&jQES$JVd$Ll<}Hx4aG zxR{=V{mjp!Tppr!Rw4cL7oJ}!$>BoDUZCGO~qRECRAfcskz*%U#L+O6#-vscoej+ zO>MNejr#Qs`irXH*L$YM<*J^Ov3H~ceI{>G&)v=b=eXo!l$6I@GM|1P@>`w-d5P#i zTcZ1XqCY_aKELA>CS4t|vaezgo9t?-(!Z;BM_uyh%`M=d*dc%z1$qFRYt4g)um697 zVw#{QQ(u-Mz0)0b+Un{VD{usKrT(QXf6`{~#=`tdv;9bRq!0II^C(0&)$KVwpaZq~ z@e$n_Tfky!FlUztcnkrKUPyHU5GBS zSeg%;P^d}YDZ6H>);#u>-5vVh>gMXHFyc2O7m2N&VEg%&x?@h>jTsS&vTyOE(JF%( ziXCrjB4R~6&><G6%Ijlo=zV7Dh?tlh*gOCM4kXSmPq;EK>E3r?;deGLPNL2&n zR)t?0U47SGo`~ z=;eycvZR3a`oy#(AzN9Rta5+eg|2}qeZ{=>h=1Yeq0=GGmvDfVMW$N zN7Di-|D9h-M!938M;zC8^Pain=Ts;CR!m08@UPnV@Fqhd*SjjYWTSvWx%gNm!xis+ zPL|!w5FIY@C+Mn6|-!mXb;}6!<*F&mE_VfNMtW6FC{$l-}DYS629tR@!t3TihczjZojv^1nI-+%&WBt-^8Sek1O5qCvd z=*f1Kb(kK&`DQy(p&inq!}ZHIr+V)`0&lLlxk32_lfH5^^xA$mW2?V4g@ICqg6|&F zSFai|#Pn!a)tXkNe(!h@nH)}w^eIRQz^jb#V-d3tZ*bJ6*DI1gw#|*g`yjaM4rK10 z%)gZy?Bg6Ae0&0JEJ8kB@1GS3e3#HeVo(`5bPblTptQur;KH}|t)3)W<=`(NGWmCUTKNBd@d~8orfNb{t+|~E*YaVT)ZI_od$(cy zpnfM)cpY2CE=GvUzu@X<;rUO1<26ch8lwK13Z>y4gZU z?-UNy<(Te^eiQ^79y5`aH!v@e9j!iVefMipI#R$ZjofXfg^ii^eu_UOJ+5;Rr7T|N zUAi|kB^RaDkQi)W;UZI2cb@62V(9q^cup=SKc}NFUGlg>~MXR1PoVB=b|YP^<@=2jF?!}=X7+*3eq1Ji1h=>Z*ztB ziNtmd7Or1&3#y^@S<>3(17BTY`AFv!bLM<}`HeFb>*OJ6ZH9{JbXfRn5CzG~ym7D}^kp zH@(ui5VQr-zHx8`2d;}fTLN5woxFDV5iO(RY z0%{+_Lg82Ui&r~05*;Bb>+H?2ic=3x*r%6&w7j>Y)#x_dPFeqazSg24iYK(>dg=Zq zR$Rk)qp!YydP9xpUSmosLaF;b1$NPJ4BK^Um;J=6H_=Q`o}sAK`qU_Id!%I0^LBvrtZY=Ygq<7oDm057}!;@V&!gK>d(N00t~f~ 
z86YL{?v8_O5xF|bnb~VqV$i5zmSg8|C=zDzn;7s(5%Zr1|I|SH8~G}>VubcdzdA|d zjmfcAU3)T@Oy7}sa*L6as2Z4nqz)oM0gD%Pt|BbwVaxa~TvB9+uNOW+WeG>2Kn+KU z3}T5G@#+}(OcLwM4S8Fv6gRI#j#(VI0=dbvCHKLHvrNZFw#QhM;Em^&Tz?;NUIdOn zuJQ92DMoI_f0NR|P}f~nvS3iwlp;fOJ7^@j`!bA;Vs~A&(gtM~F-yZ^%<m?)4+b2a)=i&OV`tERJ)^b4Phcf9Q(rZNxaC%pm55ryLv(*uX`F3A)ykF2GFeeHw*-!2e z!Lpx?{}EctT~at5LD?p8z+NM(J(^&azQV1L$K_1cTnh%MWIxNzeWI>IU{|8;d_Kfd zv3qx>o=hvrX8mT1yrsH8&mtL-cZ&2UJyo7X$c^}Teh{!f>l^1sP~=PsPwuO_tJi^)%H~;6#GEX6^E=6NC&43ZDMpk zJzXjK>u=CO29P5c2t#5Y6ZrKsVK}D>Cf$}rA{-;AM55QvNtEZE7PEG>J#ya(O_1HD zpr%y1cYz6~grjq^g#RwQJMm0lh$UMc$s^Ndkk0fk>%SD#9RvXtZp=og|3R7W} zHwATKz_4c_e4098%d`a)byeAj9+H&D&VA7&kp&Eqe0o!BP3~`rUye5mmK*({Ih@ne zQlqcDTqj0)n=Ksy=U&iN(jm-gG&W-h3gfbjHCK$btR*?b(GnI6fG-xKye+3slNWaI6&e1vh$-3pJBhTFu8l5XINw| z%~3Eg1Ciy(tj0=FFx$IO4OPshtIf98=stcu^(slDU`ZWj3waZ;Yzc|F8wk(?3`AcY zMh2$#TExtZxl-3bBo)u4k0sF*RV4hbw^Jk4+%;Z`TY)0JZl4mPKI7&Rxuf*V3; zX>M33%jtb3li(6dSbaBV<7qa zizIgTp=v8fyXpk4k7#MR`uF!nnLSG?5jZ@h@gRxL`AUqN>+-a0|Lhvp90eveV-N_4 zWe}}g)&rza0i&n>+7|~Apw__B6pej7hm9Kw|9L%>g^>@zbbgB;l}&b=lSM@xeMD$ZXm-VxhZU5*W}7M#)hbl-9!vCa|E+H$xnvsH*V*ohwt z6y0})ylE-d4m-M2f3DM;yo{bsD%|@o zm+87|?;Jv&4O=moFoR$farMQv1NJBpM|+@o@w9zoHgfr|6?dX~zp!l$`L5DwL#mGZ zgpbpoHY~&PjJrps;0aQbBNUP4E03zPes)8yF%xcFaivZ(%8899lpWxVFW`78vAPIR0oyh z`}4M!*+7f@{H9(?oz_0H?bW7&qW#bqTD{IxREH{)o*Xne2LO1IdV$e znn2J@!i&_Osj0Y+mMXMlv{~@XL-Q$t;a8)1DwP6c_iY`NJ!C`s)=(eu1zmCGOVx;C zy6NzvS^dpQzmN*)Z6=Fpt~^+D{s_R65k{?707`!F&>Sz0(byp}i=>Nd&(gL=Yo z3)!4|_+SrV$+~R!Dlc3V(aPv)VnL!z1-GR{|A>m0BBO>JHR(m%dq|@`s6W?W#YGsC zhN($C93N_Veg5-tdpT^zbXze?EW~?^^nOWG7@QxuqP;qdO!zy0-U9>!Cx^L&t#A1& z&SQ#5z6c3|cSi~I#i*N|2x*KH1;Iw}7b}NjD`^6PcuqSsw*Ka8JI8c#--Kr!gZXnwb9tKt&KP?+OLor9rJzcim^y-fh54y1mI0j}B(t5=yv= zx4kp;@kL6&#Kx7%X42aRG**EsstdUVR@q;RqfOGxFL zD3il6#2gf48HQnxOweD9f)tOERSyKt_K5mX&T%Ogj8`xDL#Gb!qT+Qa?vm(f&MP=s zZ353O!DTlel~jD2gmnX!NKU{=LnFu39c)wbqOH6|gNkB-s-N8_I99Ci)i$?=%Ncbt z`_=bNat9#fYgL630b?~IBE(U6*7pcwbSED@oH>zf?z=H{?n-Li%i7RVlfB2 
za{$cI!K!y6TS62N-(zBCWfr|cXx;@wTXGcB$A04II5^|9#Po3P%O$?z2*hamA%eac zGSGeFEW>RT40}yDAzey*L!N0u@rVA1Ua`V`k*>4v+2fx?3p#Bu`OZ5?^0Ey{hp#0#R&W^Z3KKBL%KgmXP{ zyGMN*xS#3O{%v-{V@`n_b)se0jDH>nH&(rBTdyjYl@+qlYzDH>U)T&&3^u`RtTViS zx)sZ^iPTv(G8W)X`!0|crI_li>yOsHzB6R7dD-}vy4~ z8P%`*TH4l;vV=ilH_!#eMpQbC!lOw}aRrDH`o={?p8v7$#SM;NIQ7y|Z+hX+?DK|(N!qvJ=+c|} zVWLBN)&AV=uNC*baZ#jNg%2nqUf>UpP&8JC4=h@R8d|8h2zg%o2(w>RLQQ)buhr=l ztEJ8I6iPp~n9<#aIB(dX$4V#lR17AsSje1!(d9a|Wx2(~__$>0_Oopi^B$8mgFap> zN&C8=;;(gbtC#!nXMXL4UK*jI!a?24-`bu&Q zde>03OM&rsV#V4hV>(XvJ2ye9voI)xLZuJ>?`^-iKKtswzDps?0#x0PnV5xsvphLJ zR*_3SG2zoPOt+PiFN&+C^D&Ie&}#Ud9K)+8p}ZljamyKw$JAU?ao-2KQles*Q6EgepUY8}EN&I!63WDLSarxJJ-#KV6t@dOIoTzTm`p3pkn zr>Rg{we38q`FKTbMl_pP_Osv7&LeIQ84I5j+oG{Q67H6*9-B*o7pZZ>EI)$YiN6xr z-*EJo{7p%|09XTaU_%NtyCfjsbLQlkl@`q5^M>h)wjQu*7kP3Ez0N|Y>rB8>z|xu zS-^Ed(k5$n4euF^b^tR)ARJE_aDIL~m=B+sy(o#q81Pi#@7-O}>^xZ=Z1h>&lI^=a z%Y%$YnZbQ$FWxCtj;7)>iboaN-L2DSTpE=g#`jS6hoQ;cx6zh21JD0_qe{F@vDu^L;aMY9Lr)po@n{a(7l%Th3dmi+C9{Gqps+% z-ky9^8R;(n%Z*e8mNge$PVXgG)5Yy_wq}gsl#>$yEepIMP)O<`RJi^M1pD;phAS3z;nu;OcA3MN_TxDlwVG_7 z*IGmQKtEsVYGiST$4YG}1w0xlRgex_zqar+p2A%GRZ?K!*YAFzOHi1mykDGEFZS*SHh4CSv4k24)H%qL zn+s1cVC-mbn>HY5z|cg=I?S;{%#ZKRcwxTzU-B>=ERR*K}WO{Me(C^%@5PHx4b$-=O7C2Hz*S#iw5rO%ngTzebJj z-=fF!6yLhSnpBxZ##G@L#4QyL&no;l-LR&5Ghx3M=2|Z_Ps<;IIS0|m$s~D2RtqdL zthnR7g3ZB_fn^6*8VD}+Xv}}|@^3~`*MbR0N`v(0=+@g~ZL5Td1fKb*NQ#X)NlTx0 zP8Lda=jpg(%tc$tZfM#rqMC<7G{j5Wu3;aYaW{tmRC|}W6>FK$qnhxew2iS z1mbHGaNuAb-SmN3Y5^@liNm5C8dxS35nL%20ktP^*O8}mrr|0_8-U<)p92E#j%gYB z49;my{<{apzs&fMInxefc9b6}f3&sXXHXU@WE2e4}8rS`6VIuKh@qUV%| zrP#Q?5>$1X{eqCjCmfYqyzwan-|HyheA;=m%<^gOZNlf zvB4%{D}K`kkrRA+;K%RE$>Nm9kHfmI`RWXowR@6KSoo$^UkF z?8~eUGi!yA`0FAW`OhO(oFuY_J+0G?o`2-GB?mtE#q<}~3Q&Cg*IgJ#BZs@&-}8W6 zTT1R7$Z%oTqH(Nx4;R$`Ko`W!T_Gpg>AMhUsiWACCC^mFOHl3cV^(@~GgfNkm5{Ly zQ(cv*3=_q`N*6W-XSe0QPBqQ+7mkBErGREk_8#jS7C z^xhM-lk=Tv^_->x^)Ldrl2vpMe6I$IU{12+522k}R?l>Y93m?wH(3Bj1^rE=BK^z>ZYck9S=90M7n&8}FCAySJY5 z2VRByZQ(a3p;U?PFTExi>n)38q;P|8Th~^*hZ1+%>nJ5Z#M%D!$J{$6T>Pt9DY1kj 
z46*%08AVhf^ZGQ4u8)GFI?}@>5uOUH{{ru3*1x&*MgOTtmdR`Ph~$fRgLiIRsMS5; z3OU;1gfHKG&vUvmC`}1Sk{RcXn5S;o`yZ525aMe3UzBn< zJ?a=I-RPmJ-3b}1Voi?a%Q6;m+#?T;p&G5u28tl7njG>uv={z>B{SHpH#W%g;vMi2 zlk6B1(ZuIre|PWfP)4~d8Toa})(>0JDFLcfSH@5QLQXYPs?2+Y&N(`G-#G1;w#6PW zq-0QPJt=|Ksd2y$=J~KjvOt23Gw(j$Wb;UW>PFcwG*X&$tw_E?CfVj!4#znU3uRb* zshWA?MVfdr+zl@&Zq`)g3z$V$t!&>CvBH#SS4ppCd@(Bo$wggQ&kiTS4*T0aKS>^g zye)NSdm!ItEk-7)EXqK;I$`y^?81B^JFeZjoqd$?{w`>0H>Rs8HJ0QB(+DCT+ zRp!{4Szj-lefI*$BXrv{g34d2h3#Hh%CY?U^M`HO7pKHOA$w2rn<|YM_9rV?s4ILJ zSk{Pn=Rs?g(2U9O-LU0JkH@Z!$}i#UKh0vL>L*wZ4V9yzgwKMXSS{U<+Kqm142ohd zLyJ*s#~GB1(1fP8*l(M`J;+_PgM@>y0SsEx&uQU3qZM1$q&m2+!{ij(cIAz~D~X+o zluR(NDiRKT2<*Ks70?po%c;W5dSSOd3$d;VzZ@>bTcwQ9+&4w}^jG3XZ}0>4}6TkmGKb_Xp!q51XiHiKcJMVRgX9?!xcQOjaR|Jh-!%^~b zuXSmLk+y9mxbB3Vh|{=_Tg-KmgAs}aI!G9c;US#r!T~rF_kB*shRqJ;EHaCVE2XG=bHi3F?tF4%w8(UzT_(WQU$;p@jG^eW9r$(Wmg?-#^dZPmpSJtRVDTFswF3nTCL16xlKsU9V@67;k|OY=%u?aVv20 zb`cNqA)bRAt5KW(>FC`OF{Qr+|4LOx?~#4#dVlSz59zj%_#+1G^b8#>)Zs%gb*H#J za!bS_zk6k}P0C&0hL}0c>j5u++T5)9+I_mO(0lueGnkv*#ZKXtO{LjY98}my>0|#F zo-|?!I9|;#PT zDT`eIC@+py4fYWy|G?pedpxAq=r71nNHcbTKsKU2sgA?3Sq{yiD&IJk%pDv`Ss+ct z7P2cY2)AZ?$bw-Q%|ba-L9kGhUXiGZSEJdZ$lP`F<>8 zh|Hy|0ib+hgl0rvM?}d%Q z)Qw-@0Mmn>I|MV%w~1(j1yF=?sJS%*FK#&b%Cd}%sB%V_@!5~9P(BQi+x z`sVGjlp(|{7^&5nS?#WWcc;;|#-j$g6jOICwSK{p)7yT>;$3b=e-mrnL#Qq*78n7S zSY*ypy_F# z_Ta|&$ifXRJAWuBao)Ur%X^821D74o_XWneb>xTyE-AX24jnp13IDmJd3&X+)3KII zl^`xkW>LIcI3AH13nL;nF2YlHYPR6Z5fHRCExIUVI_#n&d?lFMpnfGi_2_voLUKP? zEf^dTnV(fiX4C(_C?bxIF7YQ1PHLFm967{Hk`UoQZ?kMx@R<@xIS=mQ)^ul)%*lP` z^RN}Bg|;g<-Dk$K;GUdm&8qSkd$bNmE28YKe2EZY%dS@C0MB}cu}hLZ%d%W0W!#;Mn!>WsOd0QI%qr--B+M_S;E>r*6 z_K{P_J?$d=Qukk$r|)5h<5vT0%#r+}SkDi0h~Cb``WDJwSjU(mMQqb1WgYoOoemP%e^>W(qg>>U1=sDV;CukJG z|2})cn94&_SY1Q^g1XG-%O5yT$%^!HlbeU%KCUE64h;pT%H4j7Dl8-#p~&pRJol=&mnIB*)(Hd8EW);jH}cfSrq+Z>(4&Nv&7%! 
zQ+)AR4wouhH(I1$Q(j%=5L=_JV(vZPPFsS#}jA1-LSiZi!=;7}9bN9bIOgYrmR<+>9QLo>iBqd1?zlX?5v1>F4 z*sVx0bYd=A_NettQsL8aN-a+b*gr_)%|O*s`^;`P^xf#b+f{G=)?3iaSguY4P%h#8`?+YCA`z^Zudx+R>#t6Ey&O7~|_O3i0%J%DrvCEcyDO;#4 zgVHn}yDX7?uNXTuBq1qTLk!tdWXoEzFOhuSXv6Aah zEVcPFi+F-2u|QPv?^UhjU8_LT%=jyp)cE*h9fm6xi3JpLnj0K2u;sGQECxMi)_o6V z%$r-=G*g{H&3tz2c%<$nvMcgjixvkd2z5Wiga9TW!t)-Tn(%|N{&b4drI z+kqD5QS2w`fWz+_8^qUfwe%=fmDHp9056A{pZ~pnf5Clcj3)M-7@X;eC6dZkF%Vll zNdzV+hoYrfl|Ocx<`~xfl|jr<`a=U@Gg`_hg_vV$1JM}nJ=kU4y{H(;EyGeo3hU|j zu_%{qrfq3%LG-gNlzvOrhmE<(iDmVh^FA;b0s7m-#YO$T&hq{JgLBbJO6W;giB!C0 zR{4n-)g_4C(Ts1JZ+SkHOQb&p4^|FyBaB6Kan29wF`8G)Q9oB>?Xqy`AM*R<@KfhE zi2~Y$wJySQt54H`KHg{g{qG@VA6`8p2RzQBAT}y$YLbeT>ZO8M$oktWo2sn4S;B?F z_9#9?F>iwtua#k8Z2;Hl$NL=v0L3SEHb=Op!R?u>bJ>83*>Oq%aUwK#$9Sa$Enx|% zF-OA|3z#AtYU-ry#6xaAalNU=D4&Vtpv*)PEdlMW^jKUiSI$Srm2c`ZIf<8rh;~8# zqYm+XCQudLdPBS&xP#`N-6&YUZs6H_E2gqG=GM1-^9SLxxV!DB+FW|ds3Vl9iw%(x zD;sO5Uv&)%oY$;g&Ei(RGEpTcgRW%4?F5|OseZ^$_?F$t%PI6uP2cnEb|&+<=xqnj zuxyG?piv6Mlsah_$4}{dE$e?7yJb;4xk2Pyc57ehp?bqv5VF=%Gzwq(+nu$&l6;-+ z+P9*d)0die4$jwy)Fqj~5TX>UkC|3upKto~U;W2pE#@`22P$C#0(ND-`I$-1YJ=adN%t_Q2Xw=F|<{s`#6h2{&A>4uzk;4AhWfo5CiDGcVv$l9ni{17o=e(TcJ=KeS@ ze`$2yakO+Wf=dwIC#$qa=2r`0A+Wb0wHYcjxL~Pz^b3Br&6A!PRM0iPfxq_l?9z#@ z@_vt3v+vu}2}l_NkTOQXpb@JIIfLhaduBP+&*nf*0z7~FS(HBw_3Ri!ML}7L=bW+5 z69ZP*5nTz879;%%ep(q+KWc$6b>4L7O!-oi3y7Fjf|&iN3B|soxgs$Lb)IbaLJL!6 zRs2(5ZMDChny+$-gE)kG!)0R4Cxx+fWT{zaYb;MXMcvIxqA-<)X$l~`hSS^&nLA%o z)ax|A_H?-{3~ECT>9HH5(|vbAkdWfCA7g$9d(&P2NtPbuMHoTqGI3 zQohI1!Tvq!mbV@+VbGx`{l(}W$F5S-l=U-LTa+1W!;|rJ0`%C&_j&g&KS6^9Nc=~#d*j*3#ug%9YikINE!cXsi~`;h4|8X-E@(0H zdR!-g7mxSxhDdGYryeEvjg2-XHfn7v&8#e$aP1no-|VltQlE7$oJgK`TWH{0$u zs#^x>E_2}Ji^#bAD7F6m8&4G`oZbKS{KsY;f~uqaoM5sOhxy6y0}}U=v6Gy;n>hBY z%JxmJ?DlgDdTFQOB_?AGO^OgJy_l_&79qVBopwn8a~2d#^^VT{cp10ZP+O1G(=xQS zEn3l=;T4^O#b)~Xrp1Xv9bp7J$io1eAR+5QJqJehYal)S%2nrbu0ZoGegkX9^>*RX zi-M6WS|QEGy=#1gA=0Nud^I4bsW3rJFX^faZlVFl=4|1t@e_Q0hxBE4~< z&c8Z;Ku)0Q$Eqcv3x-}=zzUnm1YOlXTuK(T6R*`>9LnQ6Dp{yyCWl24)V3uSobtc$ 
zr7j>4`m}+@d646uM}rYO!qwLIO<-XD{9wU-jU7bz=kdRqUgG+%vpg_;_sRae6#l<@ z66}&u4H%h!f1pjC`tN)G$MlL*!%Y17V=z>Qe~(E0|A+d2lj>UFHWU-*_}pt_cX%_# z5Y2Khpd;S_@R6WH%H8a%u8pM8EU`t~``h)0@!-#P+C8}@c>a@@O;bf+SnLtXQK2Yx z=^b$B-PWcRhu#2&YuSTd64^a^?&qa4Jn(qyaa6cWnv2z}DFv2iT_@-_DA>#Z^__`Z zA@|oB$#Z^mL%&WTlWQLF?yH}!LAs#au^nx^Ysbka?NB>A3mI>50Gqs8^;{n11&H5B zUfBU9kWImrNkE$*e_$3m02#_(nFEqo$p?F0{!;#1(Ck9jT;wQh*niqZNPXmfT(N^Vm?N5hOfP>dz#J@^rGB}3)W8+o#W zyLefJ&=wbi-R36ILIE>AMUS80yF-=AZ)E) zxvA!txXvh5%_UqJG;u9iBVKjqc5!X?dqD8EfU5x^t^xc=o-a56n$^(20E$USvvq8DMZta0uhB159gpj9#!qc6;KBT}b-Pj+m3RhBaTG-H7|r2Xypz zA@)49s`Rmf&@H3j?Mn5)GpcuUpf4t|S}y8uY&3beAur?%(9CV^RqLe6Nka7FyrKP5 zfv7mJwLjfo4dENYWt7^~BDtb#YW@O>0Nnf{WMFst5s`d8(5-4sl$TNIafCW) zliOD`z{b;VgV5A6@fVsK3Rt4|e)akq*VjW|_Bmj$jfdhiY8qLDR5BoODZ4Xi)2*LD z{Su(!uqhs$ygcoY>SIjE4u$NxlpIquw>G1_S)M`z*T?hCah~-z%-1rNN*y4yD~ufJC)}A$U!8WZ&&^ zF3n1y1<UL6n}EIuPZyMXe}7N0K?1*%Iu z%@FW*#2`x}B2L`$Qy^P^Coq<3Utk`!3^BWpVkw$0`bdvE3Q?0EagWUIjMj}<+`H@( zQhY?3L-X>wEaFQWzW^j3`` z(dAmTIm#newRDl>bt6{OfP{2H0&cEY;?xK|4%hU7=w-o#Z?~T*Aah7Oct7}euu~*hT7Nwr%LJy` zM0}vm4b=M8BMz1DAK@np)*W(YTrq|0(m~CtjknevIM?sgLc^YORbSrLP*r1;I!*e- zYE4MDZS*yDl+4~mLYnxmN!;<*J^^w7(COP~0@@nC;{CJ$;ER>4z`(5r61rSF<8*A2 z$A%M7pueQ0SnP}BZ6lw&qaCRTHuE6Gq^S0D*cn4h8=7i-cG94oJ39Gb$^LGKHoX=febi_Enib^Vj3|*oTs0bMw`BC zq3~+ehZ`&>q#P*ZZHj{U16Yj)UyvW4lI4cx^6`)kn)lI{!lFk4*~H$*K+=^vgQjrz z0yzxzqN}HO``PN@{0RBe_o1 zdjCLo;}ulckC}Lxh43^*r0NMKgyH9&gz{jL6$Uvd6?ux9hwbW&k^N&rHw3I9KN7VA~@^6F=S{)N@`N5j)s-qXy|H|Py~)It8V#*AeA7`|T=YwF7_Ekf#+50T>y4FK^+=i>_WN^HHL*w%req@>?hC zN$|)oMYE17htL=~%Ja?DF}7D&v2%=({DK#fnUf+z%lq6BuGr<}<$+F|Jq?uucKw}^ zS`YF7J6Ubi5T>oR_>sF}l$~HWP{A5n4`_uZ*My z`rX*fmwhn&YyR~V-VKSgZa%BOwdmb+3ms*pviecM9E0LQ$5LiXu-%M^jqFdCaQa{a{KSvR=sjZ{)jc1ks3B&{}jB%y}Iky z8T)=q47K#pt*gz(bCToeQQI7ebblXm$i9=*(DGBrrw#T*7ddfDeK-B&0$s( zS;3W{XFn0Q(Q5$(+jI-8HbEIyINro~ssx9Lf&qV=zNrCqqwFY81QC5t#BqdsqAXVq z$(+?wmX%g4GfsHxBIfbK`cjFr&m5J;>mFMoLymA#cvNe$d186q#YpM#u3B8YeTjHoh`tN&_pOCd$XRIJEnN1PHy=;j>r$~osyTO& 
zYt@FAm3SompIB>!@{FZ9Ct8$n6bx9&`chPXxyTi8D-Aa5L^3=ayS5%^Vk18U51UXL zIp22Z^lq$zHu+ZXf)5wers~CBr?C=SXsAZW zTH9e^97mL&FNh)lTEPhyePtD|MDC$)FuOG7AGPp%Erl5k_oElF|BDcuEGI+`O-;pb zjNlW|9xn+w%=Z`~Ri&fL%FZ&!_h~<--G{|z+7QEE#j=zib3n_<8eM^mQn|7AdXDTi zMg9-WFDCoJKF{axxY>2+!```3+D4Ho`nOZS_DCh7 zn2TKe5H-W;yNh1*^SKquG;P<%aXGj)@U?2TfX{K#1S_jlMU-~Pxfe_&=XnEp+Nf+B z0~T6X)01gYN^EB~HDBFS;kLss2Q`LXv`H;`Oi!C+6on*>K0%Qo%{CGhgebZD-98Tz z2`xQ{KT2x};Dyf#)=ZaKU0~3vJdJE1NsrXF8N8P4q32>cCE4!r$ ze$2dzTTO%Yl%L#V3h_7IUrOY=H(PW;TA!|Dzb_qgyq`1Bv~wGsFFw8VU@5%jW~aMW z4YWd)5jGV1cI@=VXR(eMsQnOv5UgM(5P8jGQRY&nZA+f8rqscbGn6~?)`cB!@MmEg`%KxsGeGM zSW<Fqz|yrG6SGt5LSLQeV^w&ne@0(p@r-;@2v`|-7ki2G}0(}5(Z|J z9BOWOCHNehnyqc4Dt2!m;Hh92N-^I|WqFY~duLI1^VO5&}4( zH*rySQC?zVW6;rMr?MOC>wHJmD5R!CJ=3}r$b)t&j=T?dJ%L593{~v6NAssFlunlV zX5<`u%h=~4;uB~P9I>IU#CRrs@-eg1urhHD0gqPFA;tYQeB4G*Xpm!!si`UZH=xge zm}au-rQFiR+@%vRv~8ql%|j4Sw^1J06Om=wRw45Ei^WW+wR?v+hhwuMu;jm?r|p9kSOtO_)) zao5^DQ7kj(UwbS4ln|Rx&os1bNE#eu*~uE1i1YVEG;Gg@b1h7Rq23# zo$ex_OA~eJ_Jra5b3`%0Jp?YEFJbuaIrtDG2C(1Pt)u4%AoV|oekLuhR`z z!J2n{zUCdGdN%B326{%I7kYT2M%3_RJuD49fy{Z6afX45&;pA?v#=S5fG3Tq`Rb% z#&4tezR&aC=hpxKz4wlB@5dMp&t7}&x%Qg#H|PA#we|{EQIf^MBF6%OKsfSpFf|Ye zi3#|Jfq@RRP^EEa0RJJms>w=%iu)+nfTkWdX&pC*hn6<>a1aBJ)URI*Jkr2#3ma1_ z7gGlYZcZBu1|Ap#H@7|1T1Qb*&d$nQTESdho72I}(i&(6dN`Uoz<>2oF|~rTb6vIm z6yoqu*2NTIt>|O{w+H%JJp9#*3kp1S^`M2+8N;eAm;s>MjSjv9i7%8p_T8>n$Bje|Nr~iL14#g_Gy6=YKZ+ z)SHWw6Y!+;Pm@T&?XPtATQ9$!llfQf{_Xu;;Eva0|Jx5h+1cf+&D8aD-Q;;JEUXk9 zVfGcgqXqm*ah$-FlZ%_Rla-UBsr>^dCj`*Q z1vJ{i-Q2vc&1dTF<^(iZyE)ij-*t6zcQJ>nz+G${;BIgib%d$8jiVKC69V*uJ6cFw zSqiwk9uH`du>o}PYdkko7c02izw)rFn~Rej{I_X%0CB(6k1+kaUVhK<=Su)Wfga|% z()Rbw9;#Y7@$22UaA3D_mQ-iIhU%~J{*8XTzlOL{)URGY2l(v?||0tJ}=hp;&%jM_2$1MPbaPsr=aq~g>AlJ8l%e~hBKgtCL0G$5! 
zEPl@ZXVXs<|23lj5`x^n&GY{tsQxRUa&!K>5WI%!KO2I)0@uOo8Y*1BuYY6XKa9D5 zz%n;K*FA^;Cnq;QKPM;jcNF?9^`DKnYwi8Np!c5$dXWF3pa=ak;<$fb|Hj6D6mdNK z0`~;|`1j|S_m5)#$&lmue~|lop1sbW|89``FM=F|_j;h~AjivjZLI$^x+afAP^D6V?_ORjZ&mlqiDXT9+|VL`61J^xPLd9RzS|3KS+xBWKxAKBS& zBK7aFo4>TlwFUlQlWTVM$K>-r+a~|$U)+5E%{IA?KYy?Z12>;NP(=MiCP2XdZW}&l z_pAEs=M8q(pVg@Z(1Dv10RRpF?C<-)FRQD5fD8VT1c2oV=D*v2f%5Nixq(vef3hF) z{G%W8|2NzGI>i6M=GQI%du+n}o4jAgMxNh6@4p~6a!Nb!$=O*rI%)tFrnDKSrG}OF z?_%eV7WiMg_HdoM|6NXW9lQSdh2Nh!5%+I@{+%HI!!(@p+VQVJ_b&wf-{VAoiLrl~ zf3G$8N7Vh#j>Wvc*HVA`J^!@|{=5ACFBs7O2EYHe@cHlh{k1#(v;CguA94imZ}@Tr+osa+77~|pp2aIw5!=}c6l)peY{~?ZXT?_vI z9mjwQ@$Z{|Ki3xiu!H)0JLUJ@?CNN#Xyvgs7LX0gIjNY+8Oat8&)%Pz>mv4S#M z_CQ3)5`^eP!ulHxu^+!{4DL(y1cc0Q7Jvf^@YVi9$e)iQl6K2zl zjN{gQTMhqBqOXjkS;4d85G{uVX3?@2v z+|O&nJgI)CyfHJDALTj1W?bdEQ9TsL0FRex7>Y(_+TP46y4$tM@2&NcTHC;UjF#f= za-UJtX|JlsVaaK{G7478;#Zy=t4TqoCWJ)iGdioUB0KNa=VSH;GqA(0o2Jb3E<476 zSKB@M*h}_qO9au{I@}km>}g8~NhW1y*XLU^KvPk&F_?5FhKASk7x%La_a}1+j@8jDF%$84tTCbQfA|2im@#j=+g@a#wZ*QE!(Uk= z0XLcWq8Lx}-o&{&Ur3%etyw-M(#0jIHaJzs23`{|Ub{Wf=nFU5m8!M;THxVgCmt9( zd1iI8NnkQw)Q|Hx!PCCH?XbY=!MM(@CrPf2nm;$c`s|(kt#Qty_d)V-mb?p>Pi|B+ zG_oklqN#mQhDdVqutwwhWKrWida6P@&LKl}3RujITgsd8I+PI=Jsf64jK@adn}=+T z*^|>$xvK+1%yj8XcQv)g#HCA0GSJ>Qa*LQQ`EVBr=25UBV!Y1^U;9SC|GMJd_x@qx zfKk1hV8V2g_b5Sg`>Bonow|!l4jaWK>bHjcB2Be>Z@o5)l6{ zeVYaPHKp6`;S2A*e(N(3PwMG(x?N{XzPeQ1ox6|OIhaN2mi3nRzOCPgQC%M5MV|cj zl%Z{F;I{d`{gnEdPRHc8*A(@fabxZJcqc9@7X~xIL zxbAqY_W|qK(b&+*+e>rrL3mdGtznIq%%aRg2J!GT4hrv$m+eE~?uywf)MBq=zTTlun%#>R8`!3~6wJmf zFsM%)W`djNHOo6|eLafOH5GWxUqf`Un<^X;Ep)Eu_Z~rXvD6{{iAXy^{S#|kF$znw z{6b-4v;88shqSZS6Mug5LxJ5sr)L_*Gjg%UhGgnH3}WU>t}(d!W~rs0XEUJ%=F`Rv zxKRulKX#j*Exo(<)|JA0GUMfs%C69f~laEi3 zXvy&pTm!no_Z|mNscy5VCkUD=6a<7T14a zlj%WO+>>zQf^rUDp`Too@9q)b7#U|UzeE{f*s0!x5Qk}Vmm`h(9SGK9upr|{F;ww# z%CLij7y3IIx@W@%>b^=Z-0YW+>OD}qk)Ja1tu!<5#-oRTa&#Vs6iya(H*YLZ77J6{ zgdq%$cuKQ#*}y0ls$-jWjGh~=-?k$w%l36ULTqw8I=1vN5QL!>b<6ddfgCiv$ha~d z+ECiTs#J85#&W+25dCNpfe7rPa2`KePczl`vpa$?vckJ<{gt#PiK|%^WhPRFic=MH 
zBh!-I-koJw1M^r5%dI}!cioqM(QhmIh11vG@T_PJM!jAx(A}P<)~PVth0%+W<^{@a zE0(d2-CO79T5X9WakVNkzF2nZeW(3xaV(%<)*`-S?e)t?(rZlfl25H$V`m1nvmjjG z9>7gbYZT+R#b`2Obpov!2;ybxAWv=fZt;0+e0Q_cbnYV8+#PFu9anKzyQPcG3Zf3F zDcEk}d3kzh>o0ev6)?1XxW8PYziW$l-){puh!1o?$1mIPL6;j{a=o1>cQcytkQ|zR zr)b}#9VP&VB;R6ZKg(v>x~aOU;9H3FOQssbb8Z5KLQqaa{oVYRFNx7*6@pO~)B5UL zQ3yboCU+?C*ZgkAxa?tzZ6$Io+SiLQC3S>^U8pq2DZsvdOATH(ro^d1S1Lnj;azB2 zoi5Y}e{II3Dq&|Qf1gzcg^t}!LwSycYklrNZ=WZ^Q}*=+30>bNE7&_Rte7!F^LwGn zA!`!40ZNWe`Rvy!ktC&!XpTY4S5Us_+mD}w?2%2w!+L8V`sD+KSeZBXCm8mhk4a+gChv(Ud5< zEgN+-MOPdTwcK%xs9v2*HsEVQh?A38daqAEyV3I z5=>>rEr?v{zI$b{;bm6~$wg$CjPpWZ2X13t18$!L4!q*0VjVYv+|4@6lz>=xzT7c= zK@4VHtdmH*A&J9kzO8pi8}zdmwBR);rz)$Tq-q-2}eY0(coT8t0VOnV=*>UUV?@ecH;M|rD@YgZSxK9Wq!iSeJ9lSbQ&%0 z`RDg>XEZ-T6Z}Z^6WRc{t`zC-b~6}FKa4K=CfTc+`l-n7CPBk`nfs}&Kygqk^LvRF zp)F+k>=Glv4i zdPl-91!93KhMzRHnbo~cCeIj}W1eAKb{i=Sg3zKu!{>dl7VO9+U}HotQe%A^(bc+> z2se1p-w@?$B^C~Wu;Ta?P{#uBXHltiU)fp7tf(#z!U+gs0(Xg62i)JHtCi-CCcS!M zYhH_vush-&vn#r#cv-eYpNGpn62LhdqFMVv*U-?QJiBt$AUG-VLlUxFFXT&!_oO#Y z^AA)p;&asp_%w$Jhf;aeMS}qfK4@>%EX^ zJjWf7O3i?I5XOE^q-T~MG#zQ68Pm=qs-kY(Y;d8F@2$U%Iw)^;9;1-sI6Dpkm#G-7 zY$3(8)>hGdh3Rq-0g*jXF7qt%E|o-)3?|Y~YfBl?T7_uPhzQJRyu`=TaOL*{rea5`_Z?dX<@j}C zT05y(dUpz$Q6!-fidq1Xen4=_h({BoG#jx=k5VvK41V%xf-ztg48Fh$0~!j;FwEXQI-QdW0mVf^~`dTp8}!i0dI-v*QK_ES7x1WC0Gd3SCD`hd*RpUKxcuXlMa;koJz2hs4-f z`uuK@2(y7i0o~i?B(p}qEyk;Z@$d&?4US6HOjZ5hGtcv_ti@#zrvYK1c3+>|oz+FJ z-XkyW)dC9+nD`LTG$~Zj9v0@|L73|H6kmO9NDjwv1TmgL0)mX>YRxm^ zLr4xTQ!73_gt_q&KoUQ%`@s&LjpD|ewGeDU!apPh5gCqnK7F*JJOYaF6UbPSf`!bJ zDzJ?P*w4B#5(HwCmj{I@@#WzS^M93r6;*wQ;<9s&zg+EAns+=sC}i;%$|=nISdW>2 z35$BuCvplVn4+r9^KK18w?JWPXD*WruPLkV`)C%N$_~!(sr=D~L%MI?@i}Wp(NuBn z;}dXPI;to!Tv2-ZgBH2??)J^&g*j$^^DbHHfhWoz(Up!v>(syIV8RYw-u}+U#C%4n zb-R#D4C88C(!L~Ql{`Dw3;uZsIA|ukisBgwQHX;Ax@!v3qU@Mv0Jq~voa^}# zIGcx&3{_+!gd+B|yZK2@>CY3FJ){e-qA7O$ql8e^?45c1fZMj772fjmkqj^)p4GYZmSm}QP$Ax18J zlSgc!gu)$26y^NUDTo$r$=>z8XaANB6w-sL^n(GMSs<=L>m2^L{sxm!5O?(8={bbJ&&>3P(Kjq-k>h3{ 
z+LcaBUpyQ8yf3U3cWYBru!DHhnNS^A2L^!Q+O<2I3Fg$2J>qiAk;+6cw&D|q zfGI8~4_0JQwkdc^}`0y;yqpqmJbps#d0uS-!)5Z_ap z%V@uFr5c~2KDp(~;XO+HFng!vu+9mq@xkO`)e0XqW}7c4$9OtCY$FaW1%IYJYs8n5L6nO^jjRa|e>np#`$XnAkXfW{OA; zvXU=l%v2+-#<HV2!8y2k4m$jBXFCDh!$DE zIz*SsGZ8muQBxTJ(BN}$kEc%s;RTRP!2&x}aU%;tZSs&-sX9eP?68n35E(sN$*uLF zo>7<)p2%rB<@MXs*ksK%WeLSWy)Bvt{*)70zTndYiUKBrB3EmB$fpTC$1a82KnVe1F+9&2EY zrej9)z(Aj>U4IW3WZS}0@~{tAbLQ4npW+Q5|5!BbNrs&fOoWH~hHwC%RJ2f-LXwU^ zWTlmI`1uP6epgab0o9pBS$5Pw5QJknA?h>A1W>;_X}MWM-@MsZwhY%))1n9(RY z8xt5NaF;+ErSuA^sZJ5Qih(B@ZW7|002Ni>4QQ2b9RdqN=MtXzsn;ysMm5B)Nxf|) zVB3^up(r4F3nT6Eb?OvnIE}pBn6@k3i)v4HY)}|yPVuQ%%?+WKrTWsuJ-76J`m^m1jRM0tL)p*c=fK5PMaF4?anwGc{VoVpP^b zweBqqhiw?mtcZCnq7ZZ?1w|7W%VivABUn%ZkbUmPHX%~1Qor1K(XvLyZip7~DDHl{ zyZSC|u2m-z{;bICvc>S11iJuT_0n85sRLZ>ROxi5a$h%ds-Pm2bTr?Gz504KJmw;p zrMSEiX7BrGnEIDw1P#@~LLP!Ya)TwgGns>)IitegjWD{sVr^mX9)8NjJE8nrhas^T(rU5R77)bt&8<-Uu`7@4-leRqBewt>{s50NwrtzM{wAdE+yDf;} z&Foc-oE}M8RXWKQzl$m>aJJXJzD*0>$&hEr2qJo;e8;TZK?}c{sJzp!SC4r_->Ki? z6$EYLahBjNeT!AM<0)dXT>aDzgoew_m@)ZKw!smYDkuafzA$gQsVXiB_W`wY@L8`;>Ex26=za2LIHvjb@XxqTHL4MWu7VM08sxIGBQ+vWf@@A6fuY1H zO|XfMw84PQ#QBcfD)sVN!6)$R@y%xu!^}zOU|sbcqPZ_tPauyGqyT@OMaL^saeOQa zJ5&q9PKO0IVo_yj2r8Nf9+ zNwi_NQ0opqGd`ryX4hF1D5wojN1{C0dApKaK9O`vMbNU{jygO9%FQjRIw#7#`CByf8zC0;KkWHVyFD`=Q+ON!*MWFCe(5uvvCVDJJQbw89W5d zh6{^_sY{C)>SU7-%HS+IH|8x!K0`s*0=QKH#655MiC_}N3L_YKBedA<_oBU;+8LdG zTioUZHS>%*k&cQc7X=#@fiw)!D#Pg=r*6h2L|;6ebR!1x#Lup%{^>JcB3&M|k*801 zn<-Z@V7LrG{jn&8`&w5(3Wa$mZiI-yT1O@58%AK(@Lb5b{@0vCS+z=r@&e0~{MY(T zuWxn55aOilAgy37>7fzy>1I$;2(ru{_o_}h*Wj4v-n+;`(gFMlsF=!J4GOU27p2n5zAg@f&s^phY^?fleRpShFPnkKuu8Fqv_Nra zWvn%&@s1mcQCP30&Z&lWT%fPduj9T&fx~Hsb&KRUB?!&|K_=-;}5IC?iF}N4xr;-s!giJ=}EM)dXhhZbM9PBH7~y_Dv z)1Vhf25fZT;m}xBM#6zNRpI{9urLo9yVPSZr9bHnFlS_Q!*9%s4f90OO)U zlJr80?fxY5D5p{+DKupyyu7X|>*8!9@4XA2-`_RC`vG_>H9oA9U-(x$to@w&bniPg*%b?p zOiz($W&{v*-{tAaNUHxvXiJZyw2XWY>9#Qq#)~iJrFe%mg8>fY(6TcpbgMVQ-?4d6 z-v9XEH7uGnp779Z&r@R$?kavdV=VSWka{Ks<;Qpo&DsmACY~WbbqdvHOdA@x9{~aS 
zA4*@@=%IY)Ta3>4rv+;`0M?`pK_B!ndgQM)lYFA*soHOMd_v}C;G-c=pR~KB*UEKI zR5VJ|+XtgcfUj)r2hI6?7WT^t;kI|ovj>R)>4Qk<5@n3We8J1eBw_6mA0;B-Zi#2c z!AK7*7I~N+5 zoK_(oz=BF*u39A>J{Yh;zB;|~fsAZidw%~`?BpXqC2V6N!LyYhoP+a&FIxdHB&O8$ zQ9Cz!JfZ>miLC|S9|eiis3STy&0rjz`1?bC~~YHSuo~d_17BEfKkwlg(j* z&kFQW5t80F1~qAsz{pmSW2gk%)wWNQdPx=zCIm;k?+ga$zEM{zKxOsbtjvz57XBi> zKrNY|Rbk*MjaTQT;W0ErK~AwYNg#8wjeA!Gkkt`0R>_k7Y0!ybV#Ycr!Z+Kg_K9jT zUY6IDV`K*&q|$$|I&A0~#&Oy8A^og6!L{|l+4ys4WN%24vX~ny&gPqmgCm{S^WXp? zWa2AO53)fc8<>B9!Jl79rZQyg8X08j#R`+J#tCGmSsBfNd3YY3omAx73F#=E^!Ss9D9qxE3Ei{Q3$Z`?NWoL$x?E~%DJb$|(v4g!dxoIe z?d(!p7Ef4HA#mF>Y3*HJBu1@!Zt`O{l8^VO20ezbiA(2hgS~Z=N!R!39Ap8gP?0yk z!cGew!nm}*56jKIfQsp`6`~ft*&^#we3TvRepp#U;T$#w=hS zQp|LRPpTLU)_`hd@R~eE7eqr+L%it;%RNboj0JXTmw=sG9XkEH%LM8)*SkI6JtoMA#K$HlI+^FbMZfTl{|Q8$SjzVC#scaSXzLiFmfn^vMg z@xXa)4Z>EHM2^cebd=oIzHG;f(#weC0Ik$T?_BlPMNm{?hnKEE)Ot@n`R$KXW6xF7 zndF2!(T3ie7^@?KCnb~%E6W#abVr`H)8VdAs&APzcz?4bWkm?39uk3YQmufvGsnvr zf|A*mDKGmj0D{Yj%q(r#z$~&+UZz`Hopu~p09HGU7mTtK>X7x7&CRZh^TB@q=>FO| zHpklRN2EEbiBHO8?v?V?ud9Up~)8y)!Zd<1ZbT>|a zA_w=gawZQ{KLmU8yv%JPyBp64V{Dim(b`K1Sfs|`v+FsT!MnKVU>>TL+~@v81P=@t zTx`>*tf-I@qegVm37cL4hD|<^*(5b+1|<&Gq*E^?>bYBeW;k2#`ZlE0Sn6l%2rb~) z#KJ#mu>0ySCX=^oov?ljij=1-Pu?zw6J}XX?T0Rn_t25)m=<3+2ZoD7o0~v2cAG&d z_m$H!2=hQdsizmlD`Ks3K0_hH7?bM0HYMMg3a$L+JYL(yGHys;g;w7@{ia)M>&MJH zWw5S7i(f68k}q;0mZZJtkLW?Zl@eG;KD$Eu2#6KRBhHG!qj>x*@v%h&e;1gfSx|D2 zgJ{@JvJ7XRvW~NfyN_&`m22tQaeF54LCo8Ujsey1$bIic%o$2WUo#%erPgq%sKCpADg>it@le)|s{2t@r()X&K&)H|zOWp!9ME5kS0! 
z5gee0F1OqhsZ&oIM;|QR_XOo!3Wc#Jh_C3iOV2d$3edJW;wQ788oTId~5*ao?LfkiIgki8v{S9{=C zG1=BJ6rOd#S>cS5vJ)8sdVq1<)*mmU|6Tb&wC{&jcSpt-vGOJv@k0u9xw(uq=cnCK zcGwIm1V1iK>SlAF_+ye(WfAc=W3t=@5@_09H*~GiawbI8gqWWNY9xLSoPqWoXQd!j zHT5Z)^!&DWc?!y%#*~+<-SY*vO^ee4KSWYCE@lz~0=gR6?if)SM)^8K@bXYL?V3wg z#?#!)f%7I}Ib^bDNoM%fq#112(Ew97O9OYkfq= zq5a2Fulw38+E!uBORo9P??j-gFuZ_({0X+jg2Y(jcu`NH5lD1zX{WI+8B%eG0uu`B zDxcX2@>=DztTD42K2`+f{t(N9Sobf^Fl5}VDmJFe5ZWaBK-4fV6WMQSE;~Ql@eLRG zjw{+oyDFI^0`!2YhD8Jho6S86f#N8MKh)yK=2!2P9*IY_n-#w2O**qz3d-agV4pT? zr{ofsWQ$2n+q~^+?Zc9=`J2 z!<_649q(k^w5xxa!M|8kmL_+seSk{a4(jR|7UF@>nt6lpL;`FYi0f?-F-pNi#T{e} z7{08}vPWkt{q~`1bUG95)?n|R8+T zejM>}p97xv1#Lc-VsUIYV;&d-(GZ2|_aZvUbPqpvTUO}v#r5JcAWZgI=zS~&rg zw?~)rUYL(~b0bqGhmrXsG;IKhbN7w<45Al!+RAQ?^^50cz7Tp3(#0u!Gja|W_{0m% zN00FJLtOuwuwbv`%g~JrS`Re(Ls3^^jPaC?{jI z1wl-u5&{As%r-HBGi{i|l~6G1YczZ`4t!z5M&_MWa_|b>n`*=Ginr=7E0qIfFgl@& z8r&@mgaP5sTledsZ;3n!QxH=aCa`B?K}qeo^EyHZN}|M#~eF9W88b((LSRA{!9p8_q(eUsE!v^ zp;IQDPsg+?sWR1K}Io z&^9(t3{BTr6oV)(Jno3EAzUCAA3%|r!4^Y}Lh`_Jp(CS}W=-xrm-UG2edikK_`TQa zfeCdO3MwbNhpSq~vv0&uI(>4WXj-dzCOyu^KD9#NAj(Wgu%$4)S*eN{ilmlVJ`N;% z)?^0{oQ!IeD(om{X0$%kthYOlQ3BKmi6`MXlK8fGj^-GW82Acl?ZD>*v>7zDmSI#2 z_Z7&t%WBd31Br|6`648^bNC3A;%|?Md*USOsPTITpxrc4Yqe=+>A`ds03uj(-d&=L zJZp|m8^d!N=;+A2#> z3Q&)csqE9Z!OfXj!pxO}v}3$Ki+ifum{Mz0vPQWi_j3x9%*hT)cjv3;bs4|dtLogE zLspQqZpK6iN&~cbNA3WAD{caRpB!Y(8Q_%=_F=+VI_mmv>N52XO2T+!*z2`AJmJ}#b1F44H$XLwZ%v$>Dc9>gI9m6ne~-_eySDNUqX8 zeY$mCD($o24%%=vK7HDFJwX!G{CSzE90C~#6|^ya`yQ625RwwJdaxS_hP77~+YagD zd#`;_9|wJh`ajkr{Bh?gzeSGFhgchp}T>>1Al#2 zuhH4W?+Uz1DYi;Zx91ZozJjr@T%kQ*#y6*7-r#cmT;uF^0i4P~${ArPmfONe4-huY zTzSC2Yb7w$6mY5%Bn3jT95!D*s8_JW66!WTloW6!U z0E4vohRoF9_)#~l7gY|jkt0LK+IO6_rc@HBZqo|1m0LVQgD?3cx)5PTKYtr=%9%L4 z@<(Uj?-*Bt%FDfw==!i$0B?+)LgHU*T^CjqsmyjZn3C%^bbJ0Ur@%D|D zT^unYA`8K0Gc$q6X@lCQ1>9&#Yq~EZ6>W!{t-s<500dKn2i)twz2&Z1CDem*cI3b^ zt#Lv(c+VovJQ(`?HU?@58pqx3&(q&#&AcYGkBdQR3T8tf$NAY~bUaZaRK5A{y5W)J zoTy5M6(vbAF?_Q@al?=1R`uJIJjcF^pQI)=2jxeMA5GNUE2u*^=C6n{p<7$SzJ>j! 
zZCPu4bCyEYw^YSYH(UUR)&sM-UHhdfgtT{U5K)lZjG3VJglNyn;mzoql05iB&c>GF zq@@B6#|w|7fP`Ur%3q3=2ZF~jXXKz;Bt@27R}0s-w@)Y*dWZ~kABaB>(;$uu(Vh=L z2M4KJHUU+}LCrjQ{qpaXQwti#`q1#(z z3dGFVoikQD82oT>duV8`InQ0`(OggznUr{jQ-OWKz01P)l^B-_l&)ViAwj_xvkg)2G}mL2xns+$nX)??+31`j4^8~k*d_2%^p2>Zrlz(l zEpm|vm@Oa2X}|;6GbrgiXn8uo#te$nY<<~_PD>}x69+`Pn{ncUjP->}j)~6^ zz4>T8jrW9>qgVZo*P|8$Wz!f6s;oN`^ZPN#p~xwl-?^kG2(W!s06%UH<3fazCjfp) zOUK`9=RzZk2sLRz9Nab1uYYk!`wdli<~{A_J$CxnxmHDE;@H4vV^`mUvvq$t9mv=$ z1cBBnIonX7KVFRWSTYd}@PW-RKadg(1Tj}Pb3}_s!2B0)Jq^mLc@g|Z zHN1!18cFGIUyWCN%Rh0{ob9>s;_WRZohyyhxTxm}^#_43wD{`7=-;B$ULq=d=36{w z8|$L8uKbirai~Y$%-RL6L+Kyak~hrj;+Doi7TrB(pZS>IiKU4#ZE0U6JIQ z4yD!9=!(1nc|z5N2x<9-D(^0>AylQZ<0MgEJB%v39@(Sn;bt*X`?QrDeT)zCFS;r07tGTEbI(<6LHv6Yu`(s59No1i$CXn>aK5$#?p%Y4}Wa&I1 zbL2&4zeeET1GxxGXCS}bkr%{c z9me+QnoG<$P6pq%igS&O)G+wu8Om&Cy}`)U5Y)nCf>(FUD3o$XInb!!rXNreRT^Z% zo5{maXw>`9CDpY)%Ikhxt+HyakU1!sk39I)GYhTI8cn!`Gp%O$jDjDK1adaKxq5-$ zIiw}@VV4Q{^T+1hB37Et@Am`-1@8fCGrPdrjIe`crKwc?XFu~Kr}LG*8S$d1sGO+0 zqO~XM4S-xljRvw&SNn2NPQUXH-A5;zJ6{A3oHoK?2UZRzxmmZOBy~L4=dwZ;gkI`n zAhY{3owc6x{-`Mypf?6qUuxIg$uDu>JjR8r zZ{FH|O4g>c8<)WD7FIZmHui}-B5%nCWfLpGF)A&ZDeD~X zopanIWMp5UYD8V=x>kW0)$9Ie{}h zn)JqI@f$<-h?zw3>$NhGFZi8QnLC^kT3=g%;q?(;l!<|A@k z-Y%>iHWCYOd>db-rr3&(xjC%4qEQ$5SnZF7exsJgyL~cwiox6A*r)DfPdRRhenBV@ z{EiicB{z1s1U>}#GUXe*`CBfXK@ zN11jdtIn`ThP_c2ZbvBh6}bpY1Nlqdh&fF~Xy3z3!f#&K25lJUf) z;;9-lpK6XbJD$*<C5M(f8s4n+&|>=#Y9JU#Wu--eaoL8MA&sPVyh}tzM%f2OP|{iBMmc@( zifRaorRH}@E_~ReL9=Hqo*U7mU-0z)Xxy#Wc5hGddpZKJ{NG{oXp!8|C@uE*3gsAq zs3xF4@ST!C)k`w_q=FT1jN-jGBj13o#4NudeC(E!pi|*BFjx+cRf(9Ce*!7)e`PMB zCT-vwT%q#VDP{2K*M$HUh$z>|vygTzvR%xgi4Us2X7rF@$S_TAQHhYsluETzI2ytE zv1HIfGTLLa@mYZzGqr?oY{i?nkyO6rM*9ZwqS$23Q_*zh)Is+knts1Jzwi9%$JLMhZyGWtgBSgC)QGwr$9(3 z*oqD9I)*8K1@`yx8E=M+UA$?Q{djSMyAxP6Lk}Q)V@C^yN{IbhJDwZXFJjY-LIhrI z*IUjfk4Cbi2$0ZdchHCht)SeEXhI5!XQ*seTSrJX4xM4YM8dM;oG>vE2PK|saT~3A zw)J*Nqbb=38CJ+D$;Ne~k5J>Sped@fiYW;HQNK&)yT5eyMoYjCci5~6b)+VI6 zQ@W<3OmuZ%XPo{3EdBuDCY7B@)#m6UP0jk6no)oiVyEn|I9?M~(T)qcQK8f@{4wBs 
zWZGk3S&e80#nF|JFjhHQ*?t6a%$JkCC9df=)tGf_a`|JW14<3ONoFYqCHxD)j`|JK z@w%gCOl)IO5Z}Q;5a`qcpzgn9B z2TSg!lqLp^~M2II&=5P5XAF=-NZN#y!-EW!Ven`1&$IXgbG9(r~`5L`%9hL`L8Tn!N1KFQa9x#O(K zHPr0ZOD2#79B;#t*4?_Mi5P7(j1fk|!njw*6e*oh7ehEGJB`*fB2sj9N0w zb-+yn#=ZuWhFk18E1-)WXahs98yl}+>)IYQvEg^Jj_>n0_M^}NYdEde~RvucaKlgc_Kwdm1^N$CU(UQ4IkFrYQJ7y?z<{qr+xQVXe7)o;`l@n2Sm~QJ8MFxR0Zmico*%K6 zN(S#|Nx%}qmy&=54kfFNni_gGx&8ftBaVw>y-clKL!Fv|gwxcczBKN|l zlSe-ttJRl8DQIt6pPqI3396jMX({^809?38ry(}=5sSFkwrqpiLIxYtg5dsr;P@Q% zj7^1TP9GDsl)n#4-NY?=Cyw+Yh*q6OQE$`po-`}}Ct z3uc_jk4C=8xh)l8+jJ_qQo5|a>s#$RUh%q;xS{k)3d?($B;$DigISf7?-eBxHCL#z zpgI2>j?6@}Sx!PQ|3FbIh^S@>IO9dFG`innl*=Gf0xU!fY$!}H`7B1h8WV8+!g=@B zPJ4(i>+LU;Atxt;MF%@PJ&K<7wJw+M-2=7naREm$S8aeZoO=h~`8fJmwU7%mvu<-0 zjpkOLs?RPKT?`v40_fiZYW(aC9K9mOVUC;3qD%!N>v@;NP%m4dLsEKbqGciV3^v`Z zSSq!in^)(fX&Q)rm2~o675VCv^Olg65)a^Hapg?za7w&Sk4E*&LSIG4?EuP!yNDR` zH*Cr9>rFxDNH`VwrrgpXnMPtXjrQzfTd!xq{Prkt(ixH^{X(Jg3?VxlBXxCFRcz*I zrV$ELT&6pm$dcJ)5yWNFsInbQ`(3RFn59=3`P~ z&hdO3C->s!F8wo`hY*jmw7BT5orGNUBmixB+JS==q0!$E^DNbs%2c2GS>0(3Ojbsn zavbu{RYv|#0B!)0|E04^sXpsRfs@ZRJlt}-WM4hGecH(vwolE?F{>6te($LXEKmWK zW;LkVF%2kiQt!eM8bCZ$Qk#xL+jcl9H#sFcV!b=C3)Jqc2<+dtrC3=?>V>jO1OFeK W^j7Zl$*3a$0000MZsg1gJ|^TN4!1wivfMR^TnC03Xe z_}kjj$`riFnHpQ!Auo}#aI~`lEwVf?7#9@zA81rJG&6Lx{OK0RCoy$4G(Ub4UV9Bs zcMn4sJ8M-dd-v;_rf_MQob#$_{vqdcqjEjd0#&i6MvxmLuakIIdow+p_5yr|Z z!wQuEcSio>l|rozcw>oLU2BAema?S0mASEug0cDysEv`C#qr&24Z)m`c0d&@Am`B- zKi>^~PaAhRM?-rHWjm9jl{0Zaz6il1czhodkK+b758}AN+z~xEYTn2ObxTjwXgpx9 z$h9zcu{1GtI=&YDfOdA)&X)GSZZx*DwKX+H?JugMp`)Xn+poKs*;%6=8P&%gJk_u6 zj%-jjHU#T-^wd9fyJ2bK37!Q%0N#@nXO&K@XHF?4aZ15FmrHrA+5o$Op3jZIZe9W8B4olPCp z?UA~T)F)szBu#BiB#=@8-lE0>EwaEEfCl7v&W4WWrp|xmVkc)uJ1bMP81RDb$6IZ0 z_{*xH7V>x!;HIF5@l6?PMI(1Lb33@UqKOTsrGun8Cy;HVcps1VH~R4&_d_n!aj&BR z{y;IN$M%&@(`7 z@qc{=yuhl<{-is4`S-HqCWy~lwUy$Gn9u5 z4&~t!gaM4f%friu@-Ap${>Q760Q5KN1n1*|2_pERa6SQ;073w*ZRi>NVbuwZmz@4j 
zoz#zU-QVGkP;TpwqE38*zjjAF==W&p{U_+pZ!{PN=i%ZPghF9(I24NDMcHul_VrVx!@oAA;xLH^_n<4{$WLGzkmjZ1+We5zN?2l7LW|r>AOoJrIJz7iIS=%`t2_G-7 zEI*GVGF$NLfFGG3I3DzO0)CK>b2dca|uJ6&p(`|pFt#>y za(V)^r1Ydd4BzpySSGVQ0{e--!q&oe&^HA(8?@KZ8Az`*1>T>E9+Mj<5gu_wUdz zznwlWFKYV#RbmJ9Kc48{GpT9yOTR9Wi9T zL@ELfE=Yb?68Qlq&{`0sAjx?igguA=ke8u8N8d;iFyny!L){Ln)A1dDJql?4X=H%B zp}!0W1EC3Wz+YPz6ai{zZE23IBQOTcuPKOteiaR&`ut?Ckb#aR;KwCUm)TgFm>`YN z?*?7JHg12_i2MVCCxPFf16~1C5b~3mK|}MO%?z5HM(fDGGn7nJ`FkM!5|>fcshVEia>gEsE}e0||X z{HE#%@xMi1P_gKrUSIyY9{jyUKmWG+!VCY6zQ9nR0&VR73H}B8YklG4K}`WYB)>;|p<<9fxWD;rcmd@2IJQEY_=rFTbGRuVoJ=$j>E+5P*Y&i~LY1A20uJjQ2lY`~|_c zFn|-1f>1u-D)Pdf=QV(`{Ii{eAg=>rfc%I6of(&i9+lKa7Wu3*azbC_e%eCxCAMYqaFg%$)tf zp_Cv$l#7RlhaV2-=j8dAcD8K{bhw*>|i+_PYXvp)A_cI76DtCuYAMpSC zD#?*^`YT^2gLYiMd~eUsP~+%R^fv_^e}K9G`Atp7m7|fyJO>yUj2Ks>B{baimUK@h zDu3SdgPG9CJT8p%y(js&M&8R(YDinPSS|fDr)%gv4pyVv(W2}oCpoh(n$nQ6$xwy! z%eFPI{P^XNKzgSL)eT@PBLgD}j) zcdq>9O4JPy&{^}J?swEW8O|ey{p$&Y`v+sFvuzvx_VOnRHUD9xY7Kl}T-5FV@{)ON zjn{w>>kHgEOR9`XvMxWZ<@FV5!w&SD0sw>rBcCdcA&;Tb~ z+#lPRMRkV9n;AM2@wCWtgi1k7lim7mDB0-Mw=ZWi0(^#pMR6yZe_85qaVW&zfR;eP zTUj2$C{^M7*;goD?R@&4V_AvC(&gLUZk7tx6NC9Y9*jSav;i(^>$sBw*_e-2N^pD3 z@2qrSz<+5ac(M**pJR0(adX$F<*ltzgXp$d@5)2 zux9?;8rQK=>i*NC*b{?|E;oOztiu-gNu@(M?O?$nvwcd`{sr?3K2u3KqR6yh#MOHA zV;~qXiNc?~Z_0d~?-Q4vZ4z2Z5!klAF=MWz&v5l8X<0$8kK584%a?o!>a@)cv$S}U z_kKM4Wv+oT{@Bw7F`vE^GOZnMu<|*naka)+4s0-gq0^~4we%MuGw{PfK#PCSYZnKS z+g#`_P?|nBRI3X~AD!)(r(Ie0X82i}vB2bZBp|0S`Bt?`t1Ps(;vLlM?xyi2!{L}b z*gEtN(c{~nLO!)=7NMT`n+xZ=*3~lz>G}7kLdoRQPP|FN_rs_{brrY7>9-q{O(_N3A8INr--2v#!*w{Ge4NHfYCgArKEjf9HoM&*s6 zxevxvslY;Nj?49+*N?}Dvx96Vw%`ecar*%u5rBi>{ zbpBeE_l`RaCFXOaq!Y`^Ln>g-)Oq6?%;{dK9_!0*iF+n7=~4w*I~tB>PI4X#5-KbT zkJ!v=3KEo@e;3RCKtcVy^UNf|dH$9I9$w?CL9=P{oof?U9u-Q_Jr!7u?R$|Zn-=oM zI>ABsBg>HuG@!>?EaHSOm{g%wEi1w2b{OG zm_7^m$_?r|nwLKwepNPJp#WU&oC|l4V@{r9i!gZuQ3a}Lmz#wBnLrO zw5^S@@_m{S+VIPAx>it)pBiWB7JcoXxyvW-ZVR(U@KAK33~VRJt}(*8OU zRb9Lvk=rbM*?wMi{jJzTXD~-_8dcgiV>i2>km^4A9z_>MgjEa)N7^>K`(m#g(k1M_FZQurzI-`ZNZOe`rut33g1kZa 
zHvZ9rjYa}kFm{MDXnX)JDi?V!Exb7-1$P@ObJ2M3dywqLAd&w_{$Wi%_x++|yrRk4 zo~*0tpWdij`gz{!HRIMi7n^yW^>o=cG-Sz+!f*=jn_$`OJpXJpz9>n7%DlHnG`!vG z-edO#`XphuNUrz$j?X(DUxvs+!^H<}CGy`B69!Imjb*7Wl77=S_!PRDd4F1HhafbH zR#37bjAE`s;MjTi8$t5fgN$fuSGJ|acf!|q2}iOA>|I=lvKUcOo-yyWwl1|^l@a>-X- zzIc>kFjqcUfoJ(4QF@x9jb)e!^E3AJjS!VI%>;#ur>st*TgDS8Yxp)2N%#LwkCsXIn&beLXhWM1i7mq}W zUpDChe;8n7WaR$iRp*5`E-)&QJB0Lq%yEmv8*( z#rmCglD+uVFR$NcUeD9+nyL$V>*xC=@7;V5g`WLTQ+!I>C6XW@KVP)63=k4O#-g|^UFRK z+iO!QcOMQM;T`{F5{tw7^m{6&BlPm$SK+vP>CcFuf#2@ow;oi*X@)!!lo#-%=%u>4 zQw_}iq6mlU`poP1ho_%T$6mT|o6Bu!FeX#!sZktj{_`Vyh#<%OE;K}+hFST1t(xY_ z;~dR{3zi6Z*QMzzEqILpiLTU{Z?@3;?K{MadKC*Or^!Z0tLddk2DK%{H<=%AKr?2* z%`erRY>ILyObcgHnX+ z@5tP(4&$zpH*O5?YV?0ZKQqd%kxkN7;>~-k0pi<`>{|ZK2F}g5>_iXhCalWGUU(bi zRz*78l1|hr7lpcRxYgey#YrkCf|<75n=FyIgX?;WXs_*Pt5`_#%Yw77XH@c;NykMO zNJocrzzRQjw$PhDkP5!`f#wl$ctCvFvsJAv)el+2A?oT`DjD_lKls?c&3s@9G$vEa zM};L?cf|L&uBa_MS*$E8IbCkK>(wD$VL|kOC0Ml-)0&jh&ExWY*X(RBEu@bK+rN`B zsy?L}@|aWSeCSt|B#{?+Yh!K`8ClPLhf^-injYOWSpX~R>hsKc^Q`W}jgo9Z?Zq68 zcp;{kue0q|^NBUMA1p2hr4zi=p~Qc0cxQ+i8cehjE>lXNQ>2Fd*2*s69NGRo@F6&Z{D&6lOS+v z41siJg8THaM2Vapqx(D*C^cosVoOyV)=!;H~1(SPIsnZbF-P1kq!FDb2 zNC!-qpw9J53PYYksYD)#p@`Q>+&ZTf&xQ-iUpKuVa*(fSAY!_EExuqL)*7?X#F-@K zce(uu&mD4Zy``R}^UCq}FrL0?er%T#f;~w|OfRCfXLt6}b+Q_V<>9w|j*lu0ky=u1 z3Sqoc!DULELHFS*mH8{8l3idAUpTM#JW{=&a-YCKJmg2EIPQn&0G$G(potO0Lt2j1 zDSh*o)s(G#33tdKE%9f!x<@b(3gxcLsS7<>VPamJOCJPGh6+t({KJ(TGZbR6lt28S zKY^31vfjq~gDw1)L&Mor#c^z65t5)Gk7EZ&86XbHo!I8J8LoIpM5~n5*5>4VSS!;) z_b?2fyiJPrHJjvHtLx6v`j|uyutiVK9+od-8CM%VT`^oA;}aF+^C&hbY=%TL_--3I zxy8#7-^18$oTR(s$3eK;L?`47ap%@Y1zaTbSgIM@%kL)U8)@ta>@e^qK4&sg5snGi z46*8Y!(bg~`jAF#uzV&ctEq-M#5+F2qd4xbZS7H%&gs z%yJYibW~29Ox6gPq-;6c-6{C`#%sNP!3N%ARBM2<`~8btT?la&g`Rr=E)lJ~OemSd z+U+`h?TZqy(9PtBxt^5-*)S{h&CbKR9kyv}sK!&Nn;K8I3z3oZw!y4@78pCX>0?<) zGc$|$;YQC@A6vqb=0q8R&mP3LRV(U6HV2K1EPKu`?^5V{$BcV_YfoO83qwT^fl`xO z6MkFK@Mi)x@!JQ+tyGxhcpp@$M~apSGM?k>2fvfAInXG-tw%kWm6s9~W`k{fKX52< 
z+5piowF1F{;xsp(+nZ*dK859{O43L!ildT^Zm5vCUtLsBVe9baLWFch&CVrLu1yg;@eGadD|A^4SdL z2Q2o!IMe0sT7L60;}F+8G=O{c@$18^}Km6Ph+e5~Q2%~-WI`^{A zE{S_%cg~I1Tr;Du-XHGHBdXsV;p|CqoMV@>_b1KIW|RMG7qm zC9hd{6NUBt)2!0=;oPnQ$965Clcf3zt;$R~6Pt;2q4}r!`@I>8*GJ|ul@stoF=^ED zCBEvqjj@JqhO>`Ksg5p`S{8|ka@`(L&5>tUe%i^iG&vX(P(bfE(sDL(&M>2k9m+6A%kzMPCO4LqNCn#7reCtaK-H` z57l=F*QF;+(?XWf7Puc=9}L8LEp1m1ULoe~KYcN5zf+3%WX=imnD!wLGJ)iatMgrH z!#KuGkr`k;#uHC^XdFEeZE$rolhl0d4L#DA?3y`b2kT7?k3hIpcYVP%$>E}SJk`0Q z7~`WhurE(A2#35QY62XHnwhuN-ZtakGt-kfS}JN7##B z)WNy-4iapO`ThRWFvlQRCEGwT@M<&tleCn3gQ3rTcke%V_^{zQFfsv){dra`!BqV7 zz-7pE1&VROko#F=(N&wxrYrUl;4~Tg+&FTiyeTJ%41FJjVjP+DZK=r@SxMbh7tWkM zZ2|0X@Ev~3?sJQyrEe(XJQ}BJXso06Te;t>#9f8ttjt zAU09au{FU_zGQ+2YRx6yJB~IB^3<>N-(RL7vluUMtQ2~XNq8o*+kdGp9|UICeSD_v3Nz z)?k0RV`8DQwLAhJAvlh|V+Gu2*Fr8`V~ikg^JJgLq@|%un<&e53?ky(%?~m1^l7-E z?-PqlNKMwV+;S_7qwr{nDr~3(te<4(J`P?I37iJjIswiWbqv*%NyH2BH+U_RIhZX? zAEz{$&9*%?+U`n|>rv#cH0jrPMYmNObc*cEndt&w&60-kgwyx~g9QOR!}MI*Me+vN zPxAz9hVO2yeSFF*S3VbS^@^#2b!cqOHL|VrJqN5rYa3?B<>)SaaJaX6YB7Qj63&9a zePw;LJ0(;BAc(nrqSc}-AQiz&_?>=|x6WRYeHxQ@Zl>TqDhY52Pv$cWgch%?RDA^n zptvFAr=j_rI>nt4@Nr5q#@#TWSkeH7k^MNCI`vhG{Hu8A7~>mY(z7x$OurGysb1^C z>Z`M^Ve!Y3H<5wy^1e+-+*rbi!qNYL#H!-tSO8Y1f<3zb!>3R4#{G%eORp=sHQb+= zTeDaA;@OSfW;ac$9FG6s^yIDTxYWi%;t%SUnVDo3?CCQXR4f>T(sw$NL={UNgScn{ zr{j`Dz3}%xO;1}+H%7GBxTX`Gcqk)VLti$TePt&CUdw7s<*6Ouf@OYnf?>b#Hed|%L1agv$V zL*oY`hen@&@*+qzeUE+3f4euiJpqn~Q#)|t!queom#?ehJG)8DuAi{c`bgz{X8ct~ zYS4R*LXDzH4o8Evt)(Frp-&4L_maF;1U=`TxnWJT-8eS;BsiV>^1iDwU%hCFXg!m7 z3j4W1AdRS76O~|ITS*Vv=rWJP^KjYVwi?@}by*;LGCbd0O}SflRqUWhoUVoJfj`4X zfr_=-QyZHjc40bWwd#`ORD5Bkt&ex^5+vUI$c*P3|1nVz0DaMSgt3E@bRkZGr0%j+ zfk){K7m~pu%ZCV?SOLGo{p|kyd5=co%iD~cdKFr9_O$aKE=gr|Eh@7<(jB5(m$Oj) zv59%$WMUBBHhTBunfgx??$d;eW94q*jQhIDtL2ea(LRkjuW1O7wlqeaoJsPs7nQ&>DE9 z3o=n`qjQbLTN1bNG*0>5G|Bd#91JaOu6nnP^a>)zmfMB6nYJDu3YYJ6L;^e|t_7J) zCwPCVJGc9^S(1a+2t~j0nAiE9$GWC!bJOIivwdhl z4}s-;^WDkqaj(_4U#6>lj{)5oOx26x8U8n)s@C4mv|eivF?M3O{*f{v%x*9(ZT?De zpsas*s1^%>4ToAUp42pn4=)}|XCmw*Pd{} 
zy@`TbuQ#|M&UK`?=G{xaZU*|WqwP*3r4;iO0g<#aL4uf{Z%XGKUySQd^z_H^g+k6+ z72j5wrPK-X_VOM^O5``$we88D&I5yd zwx*!UlE56c-GhwTRm3q0tyQ^%9B1xn5tV&R-OJV7+*^I$<1(OoA@lvG&`a~cK@_)~ zZR%*vAq%x#zembaM4Zy~Rcm+r3ov8Xq}(vy;5hQlg$Hopv9CAjh4*JpPsgdv=I*kPJ6jN>flzC%om zOd~^ueMy%pQ^e#p2-`a!6DtJG)RmH+(L=`TgA@V!zB@s-B~$OKjr((NZfs9S=~OYE zD_HZwp%HXU?=NX}Y@(ePe|TM;O{(UjTY(30_XENE5=<*4)g@-_@u?^HtlQ)Hdy^&Z zB>O{VnaMU%rkZxX%{6=mw`Co3>JY6NU0k$cR1P-iV4$G7X@b+-Dd(ED5GIxHOE%5) zE%KAdF@jqP6#vqn7evNNaNi9MHozpTzZq*-ds z&Kk8$-TR5|EM;0UGGVN3c&0>pl8nmIGhzk=@TWsx3>3-|C&#iKJ#Pnp# z=ZBc341m!NygkJoHkvAPij+_2Sy(vB6|&D6Sy@*8RUinVZ$XYU$kaJ1Fr6olIc^$F&VOXUAM+z6@%!F)gFdWs|7)vE5ASOH z6LdlgTm3kpC3sGZ_AKQc4PrWD%0lbTQ2tf`O%-Y2wq$kRMzRVnwsOE=AKVw#1UgO5 zQQQ=8;_1i!A}58(7HFL9B~OE}XRHL^t6#W@{>ZEkou zFmASEP05wT7_OdNQK(x@)$q$7hU@z#P|NKWXovB`B7W^AE!QVYe%(y)0dE zc43za5;EU!DdpRG-kN(-O-U0<-kHC>?H(~pLe4HRjzixE%uIcsDv`ekVZsiC5d?-o zRXoAI*J-~XIK#rRLe{67tXk_6gaVB(RN_4OF|$Sh(4qku*6zfu@cXOA7Wpr&A)iRw z>QpbBdlARJM@u-drW3f7|C(|ZBs<7zpn*nYcS3F_8=t!@9E&6bJ*@@@;>y@#Ggai% zD?Gjh^9LJaO`kk--lE|OBy>i{P%9h9;|we=y7$3mJAA^T#v@v4&6JeU9|@K0rN}+z zJ5xZKlaH(-nG~|UatvLi&tUScX_Y!=JUJa9>TODuwr~~d6u_)nU?k%JzWp}$YP>jf z8Zi|g9!?NL*T^t0;xz(NL5x^NETLnhv~CNRS=pZ&n}lA7Tqbi>C7`7}L+Q3WtXZ*M zos75Ft;j9&dFaHPQ+N86;$6pk_)xVZ9Q_v6qO~{~SzdCor6Hhr=7b%IMOY{B?@dw$ zWufMTjakXZJ}dR$)9#noQH{N%9Rbypm@tBFWB8b}dcEKQp7Nz^VV6Y>fq@!Q zIoA@EXF|c;t=1nMN)nA$gM`jQ`@_PdRum@!n%2hp2%kK7@M#@ms$H=vCSH&`YMBO; zOt)plGdg-*U@1bQNc_ZGI-B)R4S{aq@9ZseC@Kd@%{QL{ukOv_ADIZrH|WI1)gR!> zoRg6nf9tk7?&=`QECvz`Rxxz4%nc*2s&;Q97pGU{sinFZM@Gh_Zh~0HJLhyjzVXhA zHHC#wVc z6=LQS2?=LL||m1Sv-O0C{o>lG47` zf3K$_>Si<(g$0cFlOT-Sf{Y!TAO&%cZE4V5o`D+?YaV_s<&jQZSlGJ&&yb~jd@>P} z=mDEislBL~w1*&dxjo-FD=MF%n3%G={+Xba#Ovzfy=pCnC*wX_V?{;{q5hN9J`%v0 zpH!wY0LlFndQneG(GyYT7+3*?&BS8+`N`UYtYHL^l+*#8}TbuGLSW`;JB!hnw{b6Zuhdw)>8Q0;NmpZk=zWp$3m z3rTnJpDW`HV$lgV)8L7K*ENFc&Z>T-x+)p z&>KvlCt`NbemnqRQugz&fAgZyw; zRYFY?GL-5i6>tA|io1ewHfZs5VO~S%*|0JO*{hJynn!Hv_4(}I!n=y_F?kSU`sH%Cc3x`32$mar`^<{c2fjt(Ii(v#R 
zf_lLkQ_N`?WHkkC8wPlcX2fb<$6geR$$o#^fvF=nkd2aWH>GbCN2)eWDul?04P>gt zd^U&76*x~x2wyzoDX6kwaIpJbMx5i|MvH^kNAVpg$CMC48d;ycP0fCr5xhoGuT3(} zWJ7>i*a@lmnjQ03E4TEPA@m7RA^z=xkY)~8l@=hdAJ`ZoN~6cVw>)AqUZ|~+yW^dp zS4*Ow38I*{z1&HOz%KX7O=iUDrM?^YR78Xt>D_EitC|ZCy<5c{=u0^tmu#qQvn?94 zZwP3(VscJxNK3uw)~NB@mrU*N9X~7NePlgY=?H&sD zTP}T?pO~7T3BT*JUEe>?h_S+@f(l&j&G&(!&8-F{oJ2(J@58? zzZew|UPpCFM$zeBXV+QZ+Y!fWHJe`Pj}n-4k`#6+U~eQ!U%;(+fl9 z&oyf}H*3z}tg!@Q2xcMkVmh%h{pNlcQXoSqV;{~;cD!>NkjV_aYiCx=S1)s7em?go z$@}XS&mRX{4G(e=;~`44+jYCR%BLqlrq1ew3t&--IhV5cB<=7l-A{`1WVM)msitfq zDhvoI5OkQoSB49tYXN3BFTgi{8|uhl}&cen#2z$}#r7{GQLZweNv+#DmRY{Zu{u+Cc9 zETg{m{(yCm^^@s572NmoMiijxvk1lk$4O2ohGa6l11W{1yM{*ZfU!Xs!9Xz-%28M~ zk5i5rPaJ5xHZXV;!-_k^?TgCXXup5vg3xI)vL=16<+nyQW)}vC=tZ)-Sl8ZH5;|k! z)Vqzj#2M#bc3FIM?@qc3A8a~@ENc#@Fb~_@Ec-7tDQU@>B~2-2K!z#kM{Y!A$Dm*cWlX$6M16+hYU^bS8{@!5VfDyRXEhq_fiuzU=hxfYQHg4-Ua#7Jm(AZ zr8abw)5_Oz>jwRS3w=uH3GHVxQcqYA znNe4zlhC$)I@6XBS{U~9%WMG<&1yK7`Nu?31;V|pRSJp<5JB+SPm0UcWz10l4D+S; zwZ52Cf7sGRJ(%~~gjcF5fsG8o0!8Y}0DN8O^{rWYN^;I*PAW?Y**e z`dSTEC;2?q!QR%uj0`CwOX;+8&$nd_Y2whfisj-F+cQ+3Kr$9VcU`UI zz9%(h`cNn1DZ1SygJPm^V`F3fWeI!NYE!lL>#y~c-%a?I&yr)sJYZGKnLWQ}%!GBZ z?GY+Cu$Cp?0Xbi0L~vVj(QJFdsd0xS5f2XO;4Bc@1z0foKV<2|PnY;IszP|Kw&6+# zFYa5+ia}Mv7Og_vH?KV(009j1*J3zzS14rc)xegMI63{0d-)SUyB&m-XuP-k$qS9H6NeTtev^JR(D;x^gEuy71=jZ?gZ&Cz7NUldb?s2y!pW+`&sd|NunV!(;D ztyi-}On^x{+)~~kZny_rR=s0PTiN%1ojck>^D)gtt^M7*3DG4Fm*c-bYDvZ|-jQ)# z#jqZ!-Uie`SG>n$JI+D9=37ChSqr%f1Gk(A=>Q+1d1HuM1|Ud~Kbm#L=IZf644rI0PvteJh;g$oVAQV;9Z+#&12Ssh>?Lv z2qS`sRZ-M*_q29gVMur6OjbB=0GkTP>ZmPz`k^Z6EeOhn&rxb^QvNAN zD5$y=MOUH<83lAV+vdrkcYx3cE;sjnu1GpqXd2cibJ=slLV%2Mi{&gQb2~HltrFF> zNRSPG*?wobHj++w_TFJM+^jp(CJPjoZHH3J;ICv2YkuplzHFmg)>C0E%jgMl{}O-a z+GMfq69CoksyO9)?|khN`>~&9Qw?HFzQeTwXF-18^Yh%7=$xo4EkEeFngX-b3NJSMb$DYg=@R-#8% zM~IO6I}ZYo$aI)C7MdPDs_3JHnZ(1F=-?SlfoE6#wdE3jDgO{-y}bUC2zmVp6YMjc z@;YyCo0fkbgh6HPU(BtDd#!(M<-S3`k0i>+efMmp=A?*4rNhWMuZ^3#s@mX0%RIhi z05mrvw~y$K~DKpFb&_K{dk4P7(_kNDwvkj{%*=;nb$FP7xF>H 
zq}^b?M3H~+l*9o@d)WjC-hIX7&PgZ~0+@vSpw*@HT?kGKGfVY}KIUEy*sw=x?eNH2 z;I4Dji$|X4zCgN_!Y)-MJZL&!wkbEFWbI+?#++|Aak;B)8hBGvaP1+S$=1&c&Cz`+vX4v+$zgIJU?jMgQn z-v>Yx->u74ojb}jjz9nTZI&ZeXYW(})Mq8fn7Y1R*d-baC`E&X^MnL=4q%po@1Ju@eI1MaqwuAAO{Z7gw|6p1`_6JPJ zO@r!!exY~k-_B450K`RNvSa{)L6HbLVSzQ}P(8!e>TN*JAXjDZ9)Q_w_^<8Wbt0y~ zh!j5{krc!Bg*;eh%oxLw>DtoqG6JBm9uk72k|{xTec*jiUB4EFF|X0QRznYJMwZEX z7NS!UF2L%MRSU%#zzyh{AO>=8vK+SCUV&fQBKqo5eU3@i?U!I&$tfwxEjzhpqNeBoGO8d1?Q%QILz!-&Rt>4U zQV$|*3LiOw#4gu>b%6h5%&936*S-$ci7TbTde^4vg3Qk4`X21!e59aZF*t$M%-o7` z(=0~)+O=$C0rE<+M^nppnLPWhxx`~qc@#(bDWRdEQNJ-X04 zFR$mF9x~LI*MP7AKV*8f`H z&WixlEKe_!198dDX`o4=Bcu|lFimMcM<6GFC^Iz1$(0s2i!pV0IUWZ}m@_xjGjg3f zj8~XXbX(ML*yg=z3(jXzU|9nd1t6Eqmk>C`JDw&-oZ&;%Pzs8=!`w021uOD^7&0>w z$wm?lC?_J+2m**~vGIrq8VCn;PgNSd*dd@ZZXO>R0~1MOT?Q39I4hGyB3q+Q#bemp zmkn!iro=W#0Uu;O@-~0^qpbK@+X8q?8^7zOQ`1XcBFxUT#t51VxD}vi`($CHD;#I1 znTUmPvo9K3fd=9XbfzXb4nG6G$Rk)n0za$Qx{6fXyc|HDdw$Z(6xpv-pd%n3i)mTT z;FLTsF)@i9q0pIsH-c1^FyLdyg)Y0jHoFBA|ka3K(;Ff(Ul4 z=1yIY&#`7o4aP7mL#BKHX%_i!>0YT(JDAS4t?=RJts5q*x=#u@9RLcU)q66P-@Y8k z)0s=DCz*jh))R1=RU#$XV=s=1ilP`YWGT&~yxPyr(=oW!pXSBBc>*^m3Iw(j-RAYp zACU9lB;|}Xgq#O9;`TZ6#}OG>A>oufk}uba+L)4_kajYyp=zF6jJnI#1sA&)-ZS`` z(wTxYA*V>P)w75sOmX|)THR1kP$;Yp>&z1I-7jfm0Gr@ej7je~&lE%V_#9N_v<0eqcR5MjUFP|;-@dpD3>c~3UEJ?>uXFvDC|L~7Vu4D=EZecN`wJ3M z``O{gxM!c$g#o9Nu6P4)UN0O1Raju-XZ$hPlW2}-P5E7ml)1#(DTSk$u}7I; zM^pY)z}}J#B1z7b5|-B`0AjZH9f$%m7KsKAkmoo-W)WNn;_#*#BTy*n8DxP;Sm%Hr zsRb&;Y2PpbR_L5YpTe|)R+Dw!GDR(w@rWH0H>lSX|41Ma#@DB!m2weK42R!7Gt`O@`V`yuriuTqQ0%>4P!`p?O}8WafN(LSGcg0GG3)694jozPB- z6G1NEmhTdu2PT;x%XNkl)IF%%e^&$8&*Wf2)el%jP=^}NmAr$cl#xs{d{D^L1QbBD zm=REt)VitKZ*M=gi%K#ln5gD;y$0i=^#_S9b;&Lk4grDdozCD;3ByNKg$ZNuY@F2V zxF_pE=*KJ*&9Z4_PZShQ`2Bd+X&VzkntvKoIOm$k{#qTM-=X*O{FH22hKpZ*z)KFn zf!R0)T&t}@8kLh1*Qsq+3!U^RNO)zWB7RrDVD zkf`c6m5B63Y%NjVkGwIzc{bWg*eT>F((X@;nn+D(pNj z4$?;vi~GADl5N5W)`4|?v*Ulp;9Plf$jcmwaiYVz&OXAivC-SvUqC6(}sA@u%Fa&Y_vVh)4+rCDgiiKYH2nA1Gq&Uh&@Ol8X$g1)_0ITYNA?;udui 
zw+cTzdgNJ|o2x#B$|aKh1zv0|^|-l-47$DEy*5p*1ceXA<+y~x?h%JrX8Uez5@Z|x zWv5cy7TeAHfB4-Lq5U}MP0W%kgoQ|va6MZjXLl#K(0U-38(+gm*}{q71ww0Ov~*!( zq36{M70YIzbizzf8VTvBOIAE5ma#@2$dlv(ckjG3$dk1aqBQ_cZ76n#d<);6(9F^3 z@+=ni?5aR0k~B9rlO*ya)=sX1I-}^<Jjf0}%Mj1XP4TRZ8<1*fL zWuvfa%6gxn%`QUf_wJYKMxM4s`|YncZ8Wn)X?+i0t00oXXiC^M81(W0C9&ULunixV zxo08r$41;@(ix#Yq)xiDsAX+-WsxhC-R0)`k@s$#9agVmp%HZM>;p~YWI@5fRS-jB zVb80)vs+zKNdu>hn642OI8xGg1G*1!xaU&skGTQzGhr?{=iW`bPtui%&&RhIxPy86 z-nd!nXj(HN=U@d6(}=vH`P!(YU0Rh5gYYXMWU)QPQo_YLwL92G!i@**2``ump7>}R z7lTi6lUQE4jLy41`NP{k^PH8QR5GkLkW|oGHHDr+`zhvUqw6 z6n5b6Xgs0uXyi3(>%=nbRa+W*)6{FO6jWnFN||>1EY@deMd_49<3eUultG?Ve{Pp^ z;0~W9&GXx*5)~g^ByCakws}M%4#M2f180t&k`DE*L7(33m1~bI>~TFn z%@8wVaLe6y5-()FhCIu@=HFqq(0!AebCOD`gVL|O!M5}>IjLWDvTI=avz~VH03P4m z&%}Emk)$|y5l%U5@M#b$IPMOpjxRSHs%V6eh^yTt2u^99?6AiV5U0d+2*N)bnB>s1 zk-=x&Wh$X>*B>P76xRNgchF(o_=}>16mB=*9BPPSmxb(w57;ax-#gVk%ugZQYg`e1 zXxYm3Z8&$}DDtK9NN}DfN=se=4!NWFI!+CY(X}i*1=wg)e3QsA0}$Q3ZPIp7@Yg<> zl>ijZ`B{t%cT*HY7P78yVUM~09pG3boa1Jl`H5~==hTFWcx~q5>1%!MuFR@2mp2Pq zcRQzVE9%MGGy81zAR8|bb&zF7Y&T$5ZDFDdHZE-cSi&h?E@5=-oTqREMRJyxQbsS! 
zXXWR7ueNO!=o92s0a{m`W-jlM_}V@q{!m<;VEC?%QU`DFW1f;mTd1nMIBS4mM@HtG zo`F~11{E$}t^=1Gyd}e{j13$yr+w<+F2dq(Phv}OER?XXRqWd#hfyy~cY7I&pNgo$= zDtnVbqYu~QF!80HIJ+^Bnrh#SXk!{TJTHEovRx?(f^DDzizFZAzjQ60SWKg1W)5im z0~`fkt4CzF->)`$mM%^Xm{W#5zb*j=m&g1Wz@jB?YJWeL@1Coo+ibx_Qjv9R0%w-^ zr`H!DKj#`FzodKZ3=?SK-_)Pdjyh?&=NOzW)8@0Iy_0YL{dp`@5vwa#R{(Nm`8}xI zv{@2}X&UwWzpyAyU3xvxP5h>Sj#>R}h{b|_ zDDO3&Y(|T>8PmS>LJlhgRr+OETq3&C?smGrAH+|eyoYNRD=vElB0JELoyuPNNs63r^2FNdgjb5*bGMGeXY09$=;|*Eo7c5(fx( z`hZVD7)PIUm6_3Y?sA7FCRATnH}Mp^@EAEq@4*4uUD3qKcKio1I&@p2iowr@M83zb z;PM((1cP|pnkXf0qp6IUIQum3L8uM7NwMyl=nSwaVaptLk<_>^t01H{q_jhe&01SV z0^SN*aq^fX*cK(xuufBp-Ep?JlQNc3QVb_*K2rad-TpH9IbGh(wLLYsf*c1P*JRok zu^QJCLO_cDB3@jHoU~8@7C{4IReJ}qt^4_Lvg5=a9CD$SFZDFaJC=I%911Y3IJD)a z=&@wuTA5i2Wi!rSXrbO`t7$92OGKC1e`-?c)V+&@24+T2tM!-eN=PqZ$C>71|) z;(T7hN$}NhNBP0mMA31M-F2=%f#Ns*fm)6p8}UQU6{_XX3ZQRU-X#iJ!u%e&J-CU` zljK(k#|0JnsNg)XwH`6hSJTecI#L*v80pZ^XXpvrnpa~(ica6OA5_^4TuKr!`P@se zpRISFpuk4IVbAs-w=%t~>+0%?_m{`=G5JD26C}=%DNUl}_XCw+iWJhn+mhiY)a4tR z!1V3D0s*bsuNg7sSVt1!_?E=bl;u<&iaS{lP^F~$wPe0HsF&HFa9H|8cUuh zQ#$v5*S)nDl|FO{v`i|=<(jp0MP9kbm>ilb*Q?1eTmZ{#ZhMq&Mu*95N!o884YeBS zggO0XXKmH7dwg}AM`8X3J&jL0|I^_3v;vGHyY{^r;zdp0K%zn3v7Si?ujLZc(C{d% zK(j7+d)(_$D1&&+DG0SlKfhD@YsA1~d3EI`9cC?L9UpMzWr%oPYa}dv_}x@$Je=L; zOVZ&FQz=rHpmmpfzl_s-qL`Ol{8s-4Umfe~8qZsRlwicI;K@94ib_BbyQppyKyfb?_C1puJaTn@~@G&uWUTEnjTehUR5qdEgM^;vM2*zJdF`4lF`-_Qn2C?tFIdjM-vP zeX;zJS}Z48W&9YaQ$))-`G9Og4lkj!3{Erv&EUFL{z_C#h$mcKT}s23znxvXKi~8@%=B=A^+hy#_oN|-yq;$}B+uBD?R-}(Pw`ci+|QCq8$vuy{^l3y zO!*uI!Ub04_?apn#gZ!qN90KURVx@a1h2gqV8AF}bOt^XSEsnIz~7)Ge-|I;p7gSd zC>YWjgbJSh6j(560lWB%V_BaPy7L@u{I&#He3`+qfefB!P&ZaCq0af<{PQt)f3BBH zk}tNDWZjcg+Uz!g)HP=NQMVw9qoNIuJUtAi#unA&6M4}5{j<>6V5*WhL5M&7AE%#6 zc}ty$Oc z-x<+9<&fwE<7uMgp>nYXTy{Fvb>4@T&JLs)%z95sj4JILJCA)>_^nhPw*AP(XrQvW z;(kZP5R?H^O=Iwn|Ggj=kSZ8fB!`~&X|_my61&)M#CoRcW66iT1jipGG&L-!GzH2d zjQj&GnQ@d4Vzh6J0qc4JI3g`Ag^$pjwM8FBw}?3m7CD)hu-7X>2F$o0iQITl}gsLz~NbwFRw-AZz;_(KH6mkdi$PFV*UJ0 
z>*Ha+$ZLYI3qpoOVQxKBqQ+!z@&O!yrA9rRfRaBrJb2z(4?{H&RC^s4cG&?BjZ-vI ztRjA>WAy;E#fiOqgp;AX8d;?e;{otTI4CpxV~$`|=_7<_$Hs$YVs~Qr&j25gdy)xR zg7ghePceE#=M8NNvZZXqIU_Tzo9{-FZr-`PIh~t0!4MDpXIOgU8(KPfpX! zDwQlFlJxHpFmVh@qnjj?1Iqq4ca!y`a z*#aJ&rAt_+HP`s)6?d`}&3E*HyQ@@{dnDbZ;ojxV&DDO8rb(61_gX7#on$VhBDSfP zxm3sJ2Osoz{VCE<7V;!p!SvEXj3O!bz7BB2iDclCaj_PWw1p3$!}?OB=kCX0JX}paD~KO(FajT1j}wu`+T(ni6pP52c@X zLcoKsb6VQ5qeg$7hayxf_Hb+RlG+dAin>m$?3?eLb?6R{NS2popQYMm<$_zDp$8N* zjH7B7^h(HVxrK)tnMQm-QTxkw(v{9^3NAP4vacS>Zpz zcmcW;-t!#PPoc?15F)!8UrVzxByyFc?OQ#lCvwpQiir18W<{zQ!cn*LjI~|xNb24?$K~kCiHG{dWdQ)Ej-a~9GIV#feVq) zwSr?dY=&pT?VjQv$`8s;QP@4e4Bl-8O7QR0Mh88ZDroBTuy}J|C?yN^HHaD+&XU<~ zr@%^?O<5hz{PB|S&ec5TruaX8C|BCCES;@{je)xE3x2n0rZ`9ykM5;Swo-ToP@~MeAm&K2ce4FG z&J;A2ZBjyZM5|^i;$+X&934wbI>{y8>zg-utZSk!y#jP-k5euazt54i?^bci2P@&{ zQ=rLXkD=r(Vh@Lf!@E|0iQ?|s^i#CF3Oc~ZL-h+45C!wv*55$_Jow}YQ&c6#=05ZD z>kR4CB0Y8*zEG390q4R%RX`d2l?jl#|0oO7i3HmG3M@RMWWU2njG=vXIRp~M+OVlG zJN#114BigyrWTAsZ6{f{HW$BXg*key{g{GylsQh7l~XQ@_6L2=-2zCH9&K~_Q*o8V zc7EK=ORA_2Rx@$Hrh1SPATop4}1h_v-S^F>HTxRm{SrvUgIY> z2vWAtmpV^}s4N1dAw=>J2#gLVXj)A>8+aTc^znQ(3=r|W-T6$I^FpCUb@t6pr51E> zjrQUv>3R)WA2EyEHyTY>IPHKnnQK81bX4~d7guqgkV@fHPfT@M1Eam_m5!6s9oJ$8 zXc8`Rtc9HJk$L>nWryj5tRFA233dAu*=o~s2?Br{>y_BsJzgrgVXMQ02Gj?s5Cm3EASi z+T4P1kWngnQGiODgYl&ckFhr_XaDToz1LksrN(u)pR3|cYOfR!hL63OPRT6}9mHLB zy$t$v0GG2Ofwlu$&I}_*g<>vU`T)^XRalX&1}Q#~@@LaF8#Uukv$-s`x?jG1g}>-i zRBb%nIwN>Gk9bE4v2K1wG!a8YtEidGBL&I!Kf=kaB*!PM1NQcZRBxO7Xq^B|==7K9 z`mWAxeykJq@*BItmgO)0kms?$vg!>I`XCdn57g%1vb`U5TBO5Fj6H|ZcNK3NVRarw z++JMtmzRUu4!)J&b$2QgQmgFGiwwp%z#LXsJax}wR*rbKemmqt^uSD}l%tS`DS=?$ zw(eiP#>PtOM+{yW(S3O?lb`sLaeNdH!^I16V(DeLLYBdrXkobuo8DUv+6T;{gd5rE zylwYGO&{yCWdXMrZ(71&rqp*0W0G%=lW~wBsl@^2Ah77?*~E4}CR<=7l5697X-ukz zq!#(YF6jLdJT$=LVVy#z$woa2T3_bW!yN3HH=3FhZcE2e)fop*kA}z3?Yo6&e@Kc( zxQ&d9y}SIE#1yBevi7LKZHc+NYb9?gBt@^^5f`*2l8am}3EpFbEMU6D9Rg^>1O{0A zw*5NW6-H#)A+sRh3|d$B3JVLPu~@a45b~qJ200MdXhDR4(0so*VFu?D0&{Pr^{lNAWA;VtfZEZT8j8`gfC)S2vEx?8W!8Isyonk2V 
zh+8Yr7GII=buoPTDeL(UHH`$m;)r#WRvud#gf1%aq=hOs2{Mf!!Iq@`DkC6{# zRU88whw~qpm+W_7F#8>^4edr13kJR{yaGLKlC;1T>kVP{vkq9qSY+&wKQxCkl3A*G zmhRMizv(#k?DaAJANd?(vAo0ONR?RDt?93_fz^evN@!77u=tToslrmiqXbwiFuJYw ziC|ZA98^VrPLh^MA zS!Esfle@P*|2n{-<)`(TT+ib&_@pLtSJa8|gCQYVs#3bx$8`q36$fa1p90a_q@)!D zMj{`_^)tP@5LC~-^0c(J!;PBLk3#xv<@IAN3~v9*GZt%< zws#C1KBXeFa&M-x=YR{B+U~SryNo0yH+r!q$oNa%>oL%|)Hq3DKP5y6*$u;5Qm!}P z7ojbB05VQKq18^WdV|PJOrcJvikb~jrAPk{uRnR$7KBkUr2VZmsomPDk>c7ev0DHFm(t_jusbI}UQlf))5{GtK z)`^@*Y*w6UN<ew(z1KOe%Xn1i+(f6qimv>}`|N0y{;Q%OsXR?!YlegP zl}ZR9xpS64lEUcdvbbqpWCU4Dhu;_QQ0+7hx!PCP(3s>P_LVW-PtJAI#OhpnKA8$g zSGjDg_=!%iZ$>d4P3v|b*%A7cs97UTII```(h91efM(KT8k7n6GTN5(7zuh{<(c*Q5)&X{lL4#Tbo|*i^*ka#3vC7_ z*lGVck{2{ZfZu`IY$_r|!r#c+Ps=|dE8x$gXd<^iHS@iZ@x2(?Z-2N;`YmVxs6&|2 zsHf{hg#D)O*Y`^ZD)fB#RGn)!EcWoKunu?`BYrliCv#EB6cj82PXIECoae(Ckw(@b zpmuO@6!RA`JZ=m?clC~BL2wVdy{2OpP3hb3fyF=r`$S*n$%m%7}8Vi@PXtdc*;kUNFP47D@ZT8T$;Y{ug zvMLgqp185u+7&`rW{wCJ2&c!6rLFX?E#yeMJ8XTz-$hVPHqkvt{PY)~uv8#BnuvnvEWaM@6}XHPMh*a9fV@S{R3-2& z(yX~rC6rz)YNw6+O*o4Ps%lBuDr0YdH)-mj@ldnUkC{dOQ|s=N!hin>%dao zSx)#cUYW9Iup_^o8qpJ@v!*E=B&!omUKSog;pz%)Ia{NFGj1pQR-mXUPor=>O><@a zXkzB^wSRZ?zZ2f0r=9+4rEV_$Uko-`i>zZp$VIW|WRq}(){nl2lwply1}e5;aC}9R z&J-p+ZcbuY6jQZrm=AYQ#ju5SBbu`%w7V}{?Qdyl5)UFOokZTh4Va5kf9>8Ec6w{k zc_fDnHh|&Nh$7EV_V)#3IPCi9#hhm*xayd*D7JJt+)_M$cHsc1p?25uISrZvL)8t& z%(IbHq28L$v-}mg2m!&a`-Xdk`BXOeon`;vwmDY?`=mT~Mnjf>M094tNA1*SPDwzg zFA!iy%fbF2OZpMncc;}}0I}O_j|0_Vw*CHWmCb$Q(wCXF=#Y2^zYV55zdfe&qx&@C zgS>(e1YE9}AnG_?f1D;w7ZHMXCLomt)-bh*E%R{1eSZ;Zv@1t@=zHA~i=MP0`mYsj z_5aj{((}ecQGT1d2_}O*EWL*bkCLf7N1y((BRYNKzXTw;##_~`{lJl!|6Rt*9MAcq zc2|U@3J4$p6<&i!TTu~N9t%T1h;A1O$J%L?EP%)KiU~n*nSxjFH^{VK#D|w~hgd!D zszp7xY;~0Y_)2}W=N01P6%fv62)I@+W5oGYFrxm`)HZwaehQKbSN za9jfwB-W{Vm7aD@`Y{6`6nF`eLrA>{|L}xr0!gN0Lp9 zwZ?`k+CAwhB$ZH+v+?iE!h{kr>Jt@GN*0d2U$eq9+4RldItrrRjVIyIY2d}SlYrj% zWenj5K1C~h88b@BT2G;a_?eogEVEcQR8O9M%7C8&MOR51X^H(b-p&ssyQm~p+Wety zjZd210*GoPz2D&FnKzb78w5(E&n$;!qvQIY{4gX=A$vUC4yoYZS40K6x*9Cnn&!Z! 
zRRflCEU&zFjo_F)|6xc={q=$ORA`{GrP9DlYd3sKOVcZv$c$*s1W9uD7sc<;j#}vA zO7Q)NAo!tWp^LT*GQRqa&kh#bf8jPH0L<-}oOxe-$)49}0G$XIwa^hQG8o2X2fP5Z zUZ_0+Y+xH?N$*I2dxd1$O6tIVX&dl`Lbo|aRRxFCUN&Vq5!n`|V6h1BTa{FxfT-OF zavIL{Q=q1dhFI(e?wx-IbGRI{^JL!7m9ft0WFJ6>$kSf&_cg2OJo~}ZW&LXG3$P3g z(^oInmxJ))?UBJwA&4YY_TtXR3p!k%*6Xxf3n*<|8%k)6I&S)09p4GPF$x(cE6n|yuIG3 zhl%K!`Rrxny`(_(Ovb8zoL!AQOf}Ye@j7FV00KevFe5e79{D400~wlM1m;mx|7_qL z3{9GlpCI0&hr#fqQs>DIY38d0yM8Kk+H8e{Jb8>f5l2cMm9}IU<3-e>xP-*C)fF(0 z$HC%37{36@TeKw4Lx_Llx6+pYKYOt?!6_d!2wZZ@lw?k-D@mO$w>bG9h8gcPW4F}z z^ie9NP-t+oDtD9&3u$`aZi>8ob&D!Td>C`0MDIcjwMi}d=*vwSMe{Nt zyYOGClp7=xA~ezV%oGc>y5_U1XRI@oBbO@lxqGUsf?%tKj6#2!s&4OUR9;PyZn*Y4 zKo<{%I&tF9J#X@xlA@z*>SqCZH#U0RLwWZb zQ2&q3v|OxAaDr;4nqJxWXWU6}1mXvp>qnW>Xr)jWNk8qyWnv6I4YNtVP6Dxr9HZoA z@f0TvJ|QEh-1g>VG|Z%|P0SXRGnda+dWONCSW^Gw|Y_Ga}-~o zo^0F+a9krJaqKu+*s-A@w~*h#h6&B#A89yZb_`>;CX0L0>6jp+&YBt;Wcl^-g(qE+ zBq5UMtHe)HwLM3bq9O%2{d>?)uIvhaPoB?8Kabd`8c2HnI@1zbBDVPccz%lq}B^zEddwfprw=7Olmkt@%&%7w&078hsGk3s#99bNG_I;q=SrT`Mb7k&=fH#i0@nnJ>Y^4j<8k$r@mQ! 
zvGyI%yc<4Ya_2l0W{Qyp0z>L#uxlx}qqB+u$&OjH+5%(5`||Yg`48v1Fq4ysVmULW_ZHvFkQt4&~YO= zc!5A{-b}TVjusi~>WpF$RYf0&j^u>^75Y~jhEgB%)q&#k^)+xfG)!pfm?2gi_?x>R zjzR(I;Rhuj20C!qHHFwgquwmwH(4$x)jvmY^6x$+y?7WZrH|bCnq>sndA+e7EJraN zpeq?WJq7mzpn>5zk|R4HR|T^T$lzrSp7~1P=%!;va#M;wxxyve1nF zY~(X(V2#>Ylbhc%dnru^Ffp_sKJ`fn?YwBi_jfzClJm^eGn^e^79wXTqpBnb`+~*H zODtdI`Y8CADaME$JwzTwvJnY3mFrfnSdMgoxRbK#b3xEldPoJ#o*W$O!Hj1&rV!Ija$ zs{e~&m-nMnYFJQ?`^#I%N6E}YNy8Uj`XerK-BJmsc^jB>`svf4*P^W5LLv)8p^5`Fg$h(ogZ31Xu`ak%-Y-6s5K##37$wmZPxDbPyC- zs{Ig@NiuHo+;*%wa@DZ!C`y&?r_A@ug&h=psLq;$;eaZ}h4*z<*fz;RiV*5Epj4R~ z_5wV6+U5D3)7&#c^oTqLkmbF^hNdlU&a^Z7m$^PApoSHZfVk?Xmlr!MDBA1m+wk25 zt%+O=_;wmr91Wm9dNWl3M!I}C3OOKijTz!~n+&3eQcPS^k6G)PPS*Uql4 z8baXBKlbiB1%n^jn;-FW`&kjHx%=E5^)1Y;VHD~=Z3L^oFD3UxR7Y?75XU-3!W?da z;`ZT?GAgpKoG=IIQ$KY>fl35I#_qSXxed()i0zCezcO(R}_`-I-bG zDi>e#zrX*ipP-*6n9G9Qi*;9-3HX;1%b&PrA@3D!JHB;G!us_`L+u8v@$G>GlUFKm zEsMwGfA?X5IvL>%CdBU39jGoy{$V9_}{284b8 zj!Fa+0mKWySem(3>zF8cB=o~=x8d&|F){*zy)?&|V}_UD+_&uP~H1A!oVClDo27$aRsA}aPM z%$R`Xi(fY~^m0>_ekCDvj zn;aNiBgKBq>k*5=L^{*@iVVz>$|T!6O?PTiAJorK)QL|$_1v2+ws{S|lP&q;%YG@x z%()h-uVe|Ikh->q2@@7B) zMTD;hfsG0f{P19u7uBJhIr3GVG}f1}0bvO%J=lW&zeIaw(Ws`ykpg zTGbEGAH#<&prnaH3SLiqFyP*h*a>8c`cc-f#Wj%PzBenOV=d;j_hg8I6sq8p_ke0! 
zOYjbM1YHF!dWIUN$9U+@SMWr{IHyrJN6z3v=lKs<4k7(0($N~@D4-52vQ2mV7EB`f z<+koN5MY7bgH{xf!m1vl@Vnr|_U%ydJ)p^jSX;loxxR319Z`JpB!isw#>~%--q;|y z_i#ZiyebK*6!lmg``r#!!M8Cb!8_OO%y10o1EO$249PyrzES)teC-~D0D}6?n^N7| z@IE5SoX)#)9hMook97pNxLc%Y@)PI4 zkk9S-y^P>_qx*7r4fyI&eEq%wzd#_)4v-F}wmFeLT8Y6z&?ZukReX5RU}8MKi<48* zY-;zgiiNDNY1G|8GWM6t@p%t04>QHx^QJRj37EF9h=<5J^ zCE&k^hl?9>Ta~(fd|Y?A7Sug>vfO|67x)e=lu-D6&^4QLOP7;1s5dUc=LXFRk|amf zm!m3&=>D=xk;7nZhrHYcWGjJrv24mt6DM$$gOXYPv*&Ia3hD(KEo}GS+z}{KPQEWq zpy+6C&#$YybVD2Bd}!@7nj^2`k9eKkJiJS3$BdZ5L8jI)H-$ry^wptXB%(0?BCv@DSS`@{s)rU9kycFwaDQQPtIkAwA=*_>9NvB3?X)lmlL-@GMb+< zC{1^p_t;NmaGNN`3ZJSkl1ngY} z*p57N3u?6saEmzAdau1%frG^aUL|+1&CBy%<%`6(ay1>49b{#2H6eTQ9*(=wxUpN- zqWTAPvz&~fV+VC#L~b$o8-%LjSRL2GcqL^8yIiVxf{WIIKbl9g^q*rYQCTO(BPrXOI2_5??-_d`iwqE3k zYe)P)YaBX4al#i^1+{9Ne|+0q^^hfwhzx#2xjaEu_}Tsxb06Rs!WSx@S)k;3My>is zloVuSnCCfW92OP?Zf660^RaIC7QngIvhoCnY~@#|USDHS=q_Ac9a zaDN(!*zg606Fvj`|5#dfMBi)UgMNw4PHbv58}*d9@fOp2R@GS;J@L&=1HW3C@?-51 z^z-0>yffR~xu_oO--+)E=MpKZ8hw9;iocOjGDi~k_dVA!F`@e;rD;3Z(m$k}(`%_H z{k=o%aOiXQ@M*l5-US_|@>@@?EIMljF_;o=%a6|@XirHza@hn6u&vzk*NB!YMo`C7cn zk6hoq*i7r@PMm9K(y}NS&XK~{oel6T8e)niRm0#TYdS~ z{Cf{eQUcW493XuLv*l(Rixj83o&ZMO6HCVpq)#$vPi}5-FThGWU(ytQ?vpB^4$;(E zf;$7^gi1UuwdS(~O^r;VWMfPiNPKqOf zZzjsXioLu}n9aB;LhAj)fnnmSXNJ9v^nX~;rC^L-F+;^?g%6-Oo0DmuW@aACuS(T` zb?h=VbY)3Nf{$;+I{;*)uX^%ZJYO;e7HQ@1w7;E?zYy)h?L)(A0bY`2?4nR5Ng)?v`;o$3MY^qJ8*k z%NFb`6oT6kPvP(Vf=1iA0C_ooP09B{y4rCWlaj2amXmS zauLyqO+iArxVRvgatfu{lRJrrWs7OcY}yY4)sN=CO2i$yCUOlJHeFDbF@#l4Dv3~! z+)0F&y10Oz9lAd0e901OsA38a~=b8RFzTrjsREY(<7Bg8X1OgRp-lyna9h z!b&8AFuYBEzd1a4UssQ}zeQ`W7q_BYHMm1u)+KA(z`%fPB^|kC1RNKZqms6Ryh;1K z4n`Nv_mfEj4%kNo3fo!bAmDY)${*mFoAK@NLU6f>l&oo0YvWdFRL2z4sOj6ai~1kJ3niw*D^z8sH)-J+ZzOOeA9M4@`R44! 
z;8n&Jzx&F_(Heu%#T@5+#w(~=FjOta2Xb!Xex7}2u+hxsHBTUfwCAw;*kFY1jA6hc z#c7hL-EHdyLX}hUYlbLbVnnM5n^#Q^Xt5*MG-@6^`B*5WA`8v_GU|;lyXA6;tNAD)lY!WFkh%3UyM9 z{Wwm%oXO)DBF{=&qHS2?6P+gpHcOoBsK!PybSQt|Pu=evd27P4u=(!}i%&o3Y`MJejmcL`r|u7I5Q+Bc(g34L+EaOW^n0>l#mWiS=jg2&tJe{S zS@rr))F&GZy}Xac-e*Hb@$Qd0_u1yd^$0u-jnAx`PwQ5~cFLrgV$}v_@HmjAtD9$I zrhZoW6C%>ZLy`ySM`b}(6|Y2T6cqpYGsMi=riZoqS<)q97V|VM)(^nrU{T>BL1VoqGeSxEf^6#sIB!_=( z4xSi1=R?V25%fv5w#2l1W zp=L06*=vFWIVO1aRr*_?*K67J+kfXOxP08w(84OecFBk`IJWvZb=vd(`wQ`MvrrD? zvP#Mmx`k9T*6i&53gjCOQN4<#H&~cYJEC*0_C*n^761OY9Hfjmhu@Q|wT|6x*UNREq4b|g3yuPITh3xos)HB;-T_DDmcoDD zVUZaf*NS4v!{NUt3cd&c=ek`@QS+bqiyEaiW$>Q79qSYSeJ}9EIT+k;c6|0Y|2Zcx z6v5Hpv`j)rivK&j;MadK$yt<)n2r+v@7qw|Xj3-Avi1I%_5Tdxh9Vfo?B*P{|9u-? n)LFr03;#Q&;DknBwcViCkPUxNN4JeZ1OM(RYbuo~n1}p7EcLU! literal 0 HcmV?d00001 diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 3e38aa1511..9932ccacda 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -23,12 +23,18 @@ Node metrics quantify the properties of the nodes which live within clusters. ### Node Degree -Node degree is the number of edges connected to a node. +##### Definition -For example, in the cluster below A has a node degree of 1, whereas D has a node degree of 3. +Node degree is the **number of edges connected to a node**. + +##### Example + +In the cluster below A has a node degree of 1, whereas D has a node degree of 3. ![Basic Graph - Records](../../../img/clusters/basic_graph_records.drawio.png){:width="80%"} +##### Application in Data Linkage + High node degree is generally considered good as it means there are many edges in support of records in a cluster being linked. Nodes with low node degree could indicate links being missed (false negatives) or be the result of a small number of false links (false positives). 
However, erroneous links (false positives) could also be the reason for _high_ node degree, so it can be useful to validate the edges of highly connected nodes. @@ -37,27 +43,47 @@ It is important to consider [cluster size](#cluster-size) when looking at node d Bear in mind that the degree of a single node in a cluster isn't necessarily representative of the overall connectedness of a cluster. This is where [cluster centralisation](#cluster-centralisation) can help. +
+ ## :link: Edge metrics Edge metrics quantify the properties of the edges within a cluster. ### 'is bridge' -An edge is classified as a 'bridge' if its removal splits a cluster into two smaller clusters. +##### Definition + +An edge is classified as a 'bridge' if its **removal splits a cluster into two smaller clusters**. -[insert picture] +##### Example + +For example, the removal of the link labelled "Bridge" below would break this cluster of 9 nodes into two clusters of 5 and 4 nodes, respectively. + +![](../../../img/clusters/is_bridge.drawio.png){:width="70%"} + +##### Application in Data Linkage Bridges can be signalers of false positives in linked data, especially when joining two highly connected sub-clusters. Examining bridges can shed light on issues with the linking process leading to the formation of false positive links. +
+ ## :fontawesome-solid-circle-nodes: Cluster metrics Cluster metrics refer to the characteristics of a cluster as a whole, rather than the individual nodes and edges it contains. ### Cluster Size -Cluster size refers to the number of nodes within a cluster. +##### Definition -[include picture] +Cluster size refers to the **number of nodes within a cluster**. + +##### Example + +The cluster below is of size 5. + +![](../../../img/clusters/cluster_size.drawio.png){:width="30%"} + +##### Application in Data Linkage + When thinking about cluster size, it is often useful to consider the biggest clusters produced and ask yourself if the sizes seem reasonable for the dataset being linked. For example, when linking people, does it make sense that an individual is appearing hundreds of times in the linked data resulting in a cluster of over 100 nodes? If the answer is no, then false positive links are probably being formed. @@ -67,9 +93,17 @@ There also might be a lower bound on cluster size. For example, when linking two ### Cluster Density -The density of a cluster is given by the number of edges it contains divided by the maximum possible number of edges. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. +##### Definition + +The density of a cluster is given by the **number of edges it contains divided by the maximum possible number of edges**. Density ranges from 0 to 1. A density of 1 means that all nodes are connected to all other nodes in a cluster. +##### Example -[picture: edges vs max possible edges] +The left cluster below has links between all nodes (giving a density of 1), whereas the right cluster has the minimum number of edges (4) to link 5 nodes together (giving a density of 0.4).
+ +![](../../../img/clusters/cluster_density.drawio.png){:width="80%"} + +##### Application in Data Linkage + When evaluating clusters, a high density (closer to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. @@ -78,10 +112,17 @@ A low density could indicate links being missed. A sample of low density cluster ### Cluster Centralisation +##### Definition + [Cluster centralisation](https://en.wikipedia.org/wiki/Centrality#Degree_centrality) is defined as the deviation from maximum [node degree](#node-degree) normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. +##### Example + [include picture] + +##### Application in Data Linkage + A high cluster centralisation (closer to 1) indicates that a few nodes are home to significantly more connections compared to the rest of the nodes in a cluster. This can help identify clusters containing nodes with a lower number of connections (low node degree) relative to what is possible for that cluster. Low centralisation suggests that edges are more evenly distributed amongst nodes in a cluster. This can be good if all nodes within a cluster enjoy many connections. However, low centralisation could also indicate that most nodes are not as highly connected as they could be. To check for this, look at low centralisation in conjunction with low [density](#cluster-density). 
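The node degree, density and centralisation definitions above can be made concrete with a small sketch in plain Python. This is an illustration only, not Splink's implementation: the function names are hypothetical, and the centralisation formula assumes the standard degree-centralisation normalisation, where the summed deviation from the maximum node degree is divided by (n - 1)(n - 2), the value reached by a star-shaped cluster of n nodes.

```python
from itertools import combinations


def node_degrees(nodes, edges):
    """Return the number of edges attached to each node."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def cluster_density(nodes, edges):
    """Edges present divided by the maximum possible number of edges."""
    n = len(nodes)
    if n < 2:
        return None  # undefined for a single-node cluster
    return len(edges) / (n * (n - 1) / 2)


def cluster_centralisation(nodes, edges):
    """Summed deviation from the maximum degree, normalised (assumed formula)."""
    n = len(nodes)
    if n < 3:
        return None  # the normalising term (n - 1)(n - 2) would be zero
    degree = node_degrees(nodes, edges)
    max_degree = max(degree.values())
    deviation = sum(max_degree - d for d in degree.values())
    return deviation / ((n - 1) * (n - 2))


nodes = ["A", "B", "C", "D", "E"]
complete = list(combinations(nodes, 2))      # every node linked to every other
star = [("A", other) for other in "BCDE"]    # one hub, minimum number of edges
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

print(cluster_density(nodes, complete))         # 1.0
print(cluster_density(nodes, star))             # 0.4
print(cluster_centralisation(nodes, star))      # 1.0
print(cluster_centralisation(nodes, complete))  # 0.0
```

The star and the chain both have a density of 0.4, but only the star has a centralisation of 1, which is why the two metrics are worth reading together.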
diff --git a/is_bridge.drawio.png b/is_bridge.drawio.png new file mode 100644 index 0000000000000000000000000000000000000000..03743a6759e991602133e141ace5707d04808451 GIT binary patch literal 50851 zcmeEu2S8Lwwl+yMZsg1gJ|^TN4!1wivfMR^TnC03Xe z_}kjj$`riFnHpQ!Auo}#aI~`lEwVf?7#9@zA81rJG&6Lx{OK0RCoy$4G(Ub4UV9Bs zcMn4sJ8M-dd-v;_rf_MQob#$_{vqdcqjEjd0#&i6MvxmLuakIIdow+p_5yr|Z z!wQuEcSio>l|rozcw>oLU2BAema?S0mASEug0cDysEv`C#qr&24Z)m`c0d&@Am`B- zKi>^~PaAhRM?-rHWjm9jl{0Zaz6il1czhodkK+b758}AN+z~xEYTn2ObxTjwXgpx9 z$h9zcu{1GtI=&YDfOdA)&X)GSZZx*DwKX+H?JugMp`)Xn+poKs*;%6=8P&%gJk_u6 zj%-jjHU#T-^wd9fyJ2bK37!Q%0N#@nXO&K@XHF?4aZ15FmrHrA+5o$Op3jZIZe9W8B4olPCp z?UA~T)F)szBu#BiB#=@8-lE0>EwaEEfCl7v&W4WWrp|xmVkc)uJ1bMP81RDb$6IZ0 z_{*xH7V>x!;HIF5@l6?PMI(1Lb33@UqKOTsrGun8Cy;HVcps1VH~R4&_d_n!aj&BR z{y;IN$M%&@(`7 z@qc{=yuhl<{-is4`S-HqCWy~lwUy$Gn9u5 z4&~t!gaM4f%friu@-Ap${>Q760Q5KN1n1*|2_pERa6SQ;073w*ZRi>NVbuwZmz@4j zoz#zU-QVGkP;TpwqE38*zjjAF==W&p{U_+pZ!{PN=i%ZPghF9(I24NDMcHul_VrVx!@oAA;xLH^_n<4{$WLGzkmjZ1+We5zN?2l7LW|r>AOoJrIJz7iIS=%`t2_G-7 zEI*GVGF$NLfFGG3I3DzO0)CK>b2dca|uJ6&p(`|pFt#>y za(V)^r1Ydd4BzpySSGVQ0{e--!q&oe&^HA(8?@KZ8Az`*1>T>E9+Mj<5gu_wUdz zznwlWFKYV#RbmJ9Kc48{GpT9yOTR9Wi9T zL@ELfE=Yb?68Qlq&{`0sAjx?igguA=ke8u8N8d;iFyny!L){Ln)A1dDJql?4X=H%B zp}!0W1EC3Wz+YPz6ai{zZE23IBQOTcuPKOteiaR&`ut?Ckb#aR;KwCUm)TgFm>`YN z?*?7JHg12_i2MVCCxPFf16~1C5b~3mK|}MO%?z5HM(fDGGn7nJ`FkM!5|>fcshVEia>gEsE}e0||X z{HE#%@xMi1P_gKrUSIyY9{jyUKmWG+!VCY6zQ9nR0&VR73H}B8YklG4K}`WYB)>;|p<<9fxWD;rcmd@2IJQEY_=rFTbGRuVoJ=$j>E+5P*Y&i~LY1A20uJjQ2lY`~|_c zFn|-1f>1u-D)Pdf=QV(`{Ii{eAg=>rfc%I6of(&i9+lKa7Wu3*azbC_e%eCxCAMYqaFg%$)tf zp_Cv$l#7RlhaV2-=j8dAcD8K{bhw*>|i+_PYXvp)A_cI76DtCuYAMpSC zD#?*^`YT^2gLYiMd~eUsP~+%R^fv_^e}K9G`Atp7m7|fyJO>yUj2Ks>B{baimUK@h zDu3SdgPG9CJT8p%y(js&M&8R(YDinPSS|fDr)%gv4pyVv(W2}oCpoh(n$nQ6$xwy! 
z%eFPI{P^XNKzgSL)eT@PBLgD}j) zcdq>9O4JPy&{^}J?swEW8O|ey{p$&Y`v+sFvuzvx_VOnRHUD9xY7Kl}T-5FV@{)ON zjn{w>>kHgEOR9`XvMxWZ<@FV5!w&SD0sw>rBcCdcA&;Tb~ z+#lPRMRkV9n;AM2@wCWtgi1k7lim7mDB0-Mw=ZWi0(^#pMR6yZe_85qaVW&zfR;eP zTUj2$C{^M7*;goD?R@&4V_AvC(&gLUZk7tx6NC9Y9*jSav;i(^>$sBw*_e-2N^pD3 z@2qrSz<+5ac(M**pJR0(adX$F<*ltzgXp$d@5)2 zux9?;8rQK=>i*NC*b{?|E;oOztiu-gNu@(M?O?$nvwcd`{sr?3K2u3KqR6yh#MOHA zV;~qXiNc?~Z_0d~?-Q4vZ4z2Z5!klAF=MWz&v5l8X<0$8kK584%a?o!>a@)cv$S}U z_kKM4Wv+oT{@Bw7F`vE^GOZnMu<|*naka)+4s0-gq0^~4we%MuGw{PfK#PCSYZnKS z+g#`_P?|nBRI3X~AD!)(r(Ie0X82i}vB2bZBp|0S`Bt?`t1Ps(;vLlM?xyi2!{L}b z*gEtN(c{~nLO!)=7NMT`n+xZ=*3~lz>G}7kLdoRQPP|FN_rs_{brrY7>9-q{O(_N3A8INr--2v#!*w{Ge4NHfYCgArKEjf9HoM&*s6 zxevxvslY;Nj?49+*N?}Dvx96Vw%`ecar*%u5rBi>{ zbpBeE_l`RaCFXOaq!Y`^Ln>g-)Oq6?%;{dK9_!0*iF+n7=~4w*I~tB>PI4X#5-KbT zkJ!v=3KEo@e;3RCKtcVy^UNf|dH$9I9$w?CL9=P{oof?U9u-Q_Jr!7u?R$|Zn-=oM zI>ABsBg>HuG@!>?EaHSOm{g%wEi1w2b{OG zm_7^m$_?r|nwLKwepNPJp#WU&oC|l4V@{r9i!gZuQ3a}Lmz#wBnLrO zw5^S@@_m{S+VIPAx>it)pBiWB7JcoXxyvW-ZVR(U@KAK33~VRJt}(*8OU zRb9Lvk=rbM*?wMi{jJzTXD~-_8dcgiV>i2>km^4A9z_>MgjEa)N7^>K`(m#g(k1M_FZQurzI-`ZNZOe`rut33g1kZa zHvZ9rjYa}kFm{MDXnX)JDi?V!Exb7-1$P@ObJ2M3dywqLAd&w_{$Wi%_x++|yrRk4 zo~*0tpWdij`gz{!HRIMi7n^yW^>o=cG-Sz+!f*=jn_$`OJpXJpz9>n7%DlHnG`!vG z-edO#`XphuNUrz$j?X(DUxvs+!^H<}CGy`B69!Imjb*7Wl77=S_!PRDd4F1HhafbH zR#37bjAE`s;MjTi8$t5fgN$fuSGJ|acf!|q2}iOA>|I=lvKUcOo-yyWwl1|^l@a>-X- zzIc>kFjqcUfoJ(4QF@x9jb)e!^E3AJjS!VI%>;#ur>st*TgDS8Yxp)2N%#LwkCsXIn&beLXhWM1i7mq}W zUpDChe;8n7WaR$iRp*5`E-)&QJB0Lq%yEmv8*( z#rmCglD+uVFR$NcUeD9+nyL$V>*xC=@7;V5g`WLTQ+!I>C6XW@KVP)63=k4O#-g|^UFRK z+iO!QcOMQM;T`{F5{tw7^m{6&BlPm$SK+vP>CcFuf#2@ow;oi*X@)!!lo#-%=%u>4 zQw_}iq6mlU`poP1ho_%T$6mT|o6Bu!FeX#!sZktj{_`Vyh#<%OE;K}+hFST1t(xY_ z;~dR{3zi6Z*QMzzEqILpiLTU{Z?@3;?K{MadKC*Or^!Z0tLddk2DK%{H<=%AKr?2* z%`erRY>ILyObcgHnX+ z@5tP(4&$zpH*O5?YV?0ZKQqd%kxkN7;>~-k0pi<`>{|ZK2F}g5>_iXhCalWGUU(bi zRz*78l1|hr7lpcRxYgey#YrkCf|<75n=FyIgX?;WXs_*Pt5`_#%Yw77XH@c;NykMO zNJocrzzRQjw$PhDkP5!`f#wl$ctCvFvsJAv)el+2A?oT`DjD_lKls?c&3s@9G$vEa 
zM};L?cf|L&uBa_MS*$E8IbCkK>(wD$VL|kOC0Ml-)0&jh&ExWY*X(RBEu@bK+rN`B zsy?L}@|aWSeCSt|B#{?+Yh!K`8ClPLhf^-injYOWSpX~R>hsKc^Q`W}jgo9Z?Zq68 zcp;{kue0q|^NBUMA1p2hr4zi=p~Qc0cxQ+i8cehjE>lXNQ>2Fd*2*s69NGRo@F6&Z{D&6lOS+v z41siJg8THaM2Vapqx(D*C^cosVoOyV)=!;H~1(SPIsnZbF-P1kq!FDb2 zNC!-qpw9J53PYYksYD)#p@`Q>+&ZTf&xQ-iUpKuVa*(fSAY!_EExuqL)*7?X#F-@K zce(uu&mD4Zy``R}^UCq}FrL0?er%T#f;~w|OfRCfXLt6}b+Q_V<>9w|j*lu0ky=u1 z3Sqoc!DULELHFS*mH8{8l3idAUpTM#JW{=&a-YCKJmg2EIPQn&0G$G(potO0Lt2j1 zDSh*o)s(G#33tdKE%9f!x<@b(3gxcLsS7<>VPamJOCJPGh6+t({KJ(TGZbR6lt28S zKY^31vfjq~gDw1)L&Mor#c^z65t5)Gk7EZ&86XbHo!I8J8LoIpM5~n5*5>4VSS!;) z_b?2fyiJPrHJjvHtLx6v`j|uyutiVK9+od-8CM%VT`^oA;}aF+^C&hbY=%TL_--3I zxy8#7-^18$oTR(s$3eK;L?`47ap%@Y1zaTbSgIM@%kL)U8)@ta>@e^qK4&sg5snGi z46*8Y!(bg~`jAF#uzV&ctEq-M#5+F2qd4xbZS7H%&gs z%yJYibW~29Ox6gPq-;6c-6{C`#%sNP!3N%ARBM2<`~8btT?la&g`Rr=E)lJ~OemSd z+U+`h?TZqy(9PtBxt^5-*)S{h&CbKR9kyv}sK!&Nn;K8I3z3oZw!y4@78pCX>0?<) zGc$|$;YQC@A6vqb=0q8R&mP3LRV(U6HV2K1EPKu`?^5V{$BcV_YfoO83qwT^fl`xO z6MkFK@Mi)x@!JQ+tyGxhcpp@$M~apSGM?k>2fvfAInXG-tw%kWm6s9~W`k{fKX52< z+5piowF1F{;xsp(+nZ*dK859{O43L!ildT^Zm5vCUtLsBVe9baLWFch&CVrLu1yg;@eGadD|A^4SdL z2Q2o!IMe0sT7L60;}F+8G=O{c@$18^}Km6Ph+e5~Q2%~-WI`^{A zE{S_%cg~I1Tr;Du-XHGHBdXsV;p|CqoMV@>_b1KIW|RMG7qm zC9hd{6NUBt)2!0=;oPnQ$965Clcf3zt;$R~6Pt;2q4}r!`@I>8*GJ|ul@stoF=^ED zCBEvqjj@JqhO>`Ksg5p`S{8|ka@`(L&5>tUe%i^iG&vX(P(bfE(sDL(&M>2k9m+6A%kzMPCO4LqNCn#7reCtaK-H` z57l=F*QF;+(?XWf7Puc=9}L8LEp1m1ULoe~KYcN5zf+3%WX=imnD!wLGJ)iatMgrH z!#KuGkr`k;#uHC^XdFEeZE$rolhl0d4L#DA?3y`b2kT7?k3hIpcYVP%$>E}SJk`0Q z7~`WhurE(A2#35QY62XHnwhuN-ZtakGt-kfS}JN7##B z)WNy-4iapO`ThRWFvlQRCEGwT@M<&tleCn3gQ3rTcke%V_^{zQFfsv){dra`!BqV7 zz-7pE1&VROko#F=(N&wxrYrUl;4~Tg+&FTiyeTJ%41FJjVjP+DZK=r@SxMbh7tWkM zZ2|0X@Ev~3?sJQyrEe(XJQ}BJXso06Te;t>#9f8ttjt zAU09au{FU_zGQ+2YRx6yJB~IB^3<>N-(RL7vluUMtQ2~XNq8o*+kdGp9|UICeSD_v3Nz z)?k0RV`8DQwLAhJAvlh|V+Gu2*Fr8`V~ikg^JJgLq@|%un<&e53?ky(%?~m1^l7-E z?-PqlNKMwV+;S_7qwr{nDr~3(te<4(J`P?I37iJjIswiWbqv*%NyH2BH+U_RIhZX? 
zAEz{$&9*%?+U`n|>rv#cH0jrPMYmNObc*cEndt&w&60-kgwyx~g9QOR!}MI*Me+vN zPxAz9hVO2yeSFF*S3VbS^@^#2b!cqOHL|VrJqN5rYa3?B<>)SaaJaX6YB7Qj63&9a zePw;LJ0(;BAc(nrqSc}-AQiz&_?>=|x6WRYeHxQ@Zl>TqDhY52Pv$cWgch%?RDA^n zptvFAr=j_rI>nt4@Nr5q#@#TWSkeH7k^MNCI`vhG{Hu8A7~>mY(z7x$OurGysb1^C z>Z`M^Ve!Y3H<5wy^1e+-+*rbi!qNYL#H!-tSO8Y1f<3zb!>3R4#{G%eORp=sHQb+= zTeDaA;@OSfW;ac$9FG6s^yIDTxYWi%;t%SUnVDo3?CCQXR4f>T(sw$NL={UNgScn{ zr{j`Dz3}%xO;1}+H%7GBxTX`Gcqk)VLti$TePt&CUdw7s<*6Ouf@OYnf?>b#Hed|%L1agv$V zL*oY`hen@&@*+qzeUE+3f4euiJpqn~Q#)|t!queom#?ehJG)8DuAi{c`bgz{X8ct~ zYS4R*LXDzH4o8Evt)(Frp-&4L_maF;1U=`TxnWJT-8eS;BsiV>^1iDwU%hCFXg!m7 z3j4W1AdRS76O~|ITS*Vv=rWJP^KjYVwi?@}by*;LGCbd0O}SflRqUWhoUVoJfj`4X zfr_=-QyZHjc40bWwd#`ORD5Bkt&ex^5+vUI$c*P3|1nVz0DaMSgt3E@bRkZGr0%j+ zfk){K7m~pu%ZCV?SOLGo{p|kyd5=co%iD~cdKFr9_O$aKE=gr|Eh@7<(jB5(m$Oj) zv59%$WMUBBHhTBunfgx??$d;eW94q*jQhIDtL2ea(LRkjuW1O7wlqeaoJsPs7nQ&>DE9 z3o=n`qjQbLTN1bNG*0>5G|Bd#91JaOu6nnP^a>)zmfMB6nYJDu3YYJ6L;^e|t_7J) zCwPCVJGc9^S(1a+2t~j0nAiE9$GWC!bJOIivwdhl z4}s-;^WDkqaj(_4U#6>lj{)5oOx26x8U8n)s@C4mv|eivF?M3O{*f{v%x*9(ZT?De zpsas*s1^%>4ToAUp42pn4=)}|XCmw*Pd{} zy@`TbuQ#|M&UK`?=G{xaZU*|WqwP*3r4;iO0g<#aL4uf{Z%XGKUySQd^z_H^g+k6+ z72j5wrPK-X_VOM^O5``$we88D&I5yd zwx*!UlE56c-GhwTRm3q0tyQ^%9B1xn5tV&R-OJV7+*^I$<1(OoA@lvG&`a~cK@_)~ zZR%*vAq%x#zembaM4Zy~Rcm+r3ov8Xq}(vy;5hQlg$Hopv9CAjh4*JpPsgdv=I*kPJ6jN>flzC%om zOd~^ueMy%pQ^e#p2-`a!6DtJG)RmH+(L=`TgA@V!zB@s-B~$OKjr((NZfs9S=~OYE zD_HZwp%HXU?=NX}Y@(ePe|TM;O{(UjTY(30_XENE5=<*4)g@-_@u?^HtlQ)Hdy^&Z zB>O{VnaMU%rkZxX%{6=mw`Co3>JY6NU0k$cR1P-iV4$G7X@b+-Dd(ED5GIxHOE%5) zE%KAdF@jqP6#vqn7evNNaNi9MHozpTzZq*-ds z&Kk8$-TR5|EM;0UGGVN3c&0>pl8nmIGhzk=@TWsx3>3-|C&#iKJ#Pnp# z=ZBc341m!NygkJoHkvAPij+_2Sy(vB6|&D6Sy@*8RUinVZ$XYU$kaJ1Fr6olIc^$F&VOXUAM+z6@%!F)gFdWs|7)vE5ASOH z6LdlgTm3kpC3sGZ_AKQc4PrWD%0lbTQ2tf`O%-Y2wq$kRMzRVnwsOE=AKVw#1UgO5 zQQQ=8;_1i!A}58(7HFL9B~OE}XRHL^t6#W@{>ZEkou zFmASEP05wT7_OdNQK(x@)$q$7hU@z#P|NKWXovB`B7W^AE!QVYe%(y)0dE zc43za5;EU!DdpRG-kN(-O-U0<-kHC>?H(~pLe4HRjzixE%uIcsDv`ekVZsiC5d?-o 
zRXoAI*J-~XIK#rRLe{67tXk_6gaVB(RN_4OF|$Sh(4qku*6zfu@cXOA7Wpr&A)iRw z>QpbBdlARJM@u-drW3f7|C(|ZBs<7zpn*nYcS3F_8=t!@9E&6bJ*@@@;>y@#Ggai% zD?Gjh^9LJaO`kk--lE|OBy>i{P%9h9;|we=y7$3mJAA^T#v@v4&6JeU9|@K0rN}+z zJ5xZKlaH(-nG~|UatvLi&tUScX_Y!=JUJa9>TODuwr~~d6u_)nU?k%JzWp}$YP>jf z8Zi|g9!?NL*T^t0;xz(NL5x^NETLnhv~CNRS=pZ&n}lA7Tqbi>C7`7}L+Q3WtXZ*M zos75Ft;j9&dFaHPQ+N86;$6pk_)xVZ9Q_v6qO~{~SzdCor6Hhr=7b%IMOY{B?@dw$ zWufMTjakXZJ}dR$)9#noQH{N%9Rbypm@tBFWB8b}dcEKQp7Nz^VV6Y>fq@!Q zIoA@EXF|c;t=1nMN)nA$gM`jQ`@_PdRum@!n%2hp2%kK7@M#@ms$H=vCSH&`YMBO; zOt)plGdg-*U@1bQNc_ZGI-B)R4S{aq@9ZseC@Kd@%{QL{ukOv_ADIZrH|WI1)gR!> zoRg6nf9tk7?&=`QECvz`Rxxz4%nc*2s&;Q97pGU{sinFZM@Gh_Zh~0HJLhyjzVXhA zHHC#wVc z6=LQS2?=LL||m1Sv-O0C{o>lG47` zf3K$_>Si<(g$0cFlOT-Sf{Y!TAO&%cZE4V5o`D+?YaV_s<&jQZSlGJ&&yb~jd@>P} z=mDEislBL~w1*&dxjo-FD=MF%n3%G={+Xba#Ovzfy=pCnC*wX_V?{;{q5hN9J`%v0 zpH!wY0LlFndQneG(GyYT7+3*?&BS8+`N`UYtYHL^l+*#8}TbuGLSW`;JB!hnw{b6Zuhdw)>8Q0;NmpZk=zWp$3m z3rTnJpDW`HV$lgV)8L7K*ENFc&Z>T-x+)p z&>KvlCt`NbemnqRQugz&fAgZyw; zRYFY?GL-5i6>tA|io1ewHfZs5VO~S%*|0JO*{hJynn!Hv_4(}I!n=y_F?kSU`sH%Cc3x`32$mar`^<{c2fjt(Ii(v#R zf_lLkQ_N`?WHkkC8wPlcX2fb<$6geR$$o#^fvF=nkd2aWH>GbCN2)eWDul?04P>gt zd^U&76*x~x2wyzoDX6kwaIpJbMx5i|MvH^kNAVpg$CMC48d;ycP0fCr5xhoGuT3(} zWJ7>i*a@lmnjQ03E4TEPA@m7RA^z=xkY)~8l@=hdAJ`ZoN~6cVw>)AqUZ|~+yW^dp zS4*Ow38I*{z1&HOz%KX7O=iUDrM?^YR78Xt>D_EitC|ZCy<5c{=u0^tmu#qQvn?94 zZwP3(VscJxNK3uw)~NB@mrU*N9X~7NePlgY=?H&sD zTP}T?pO~7T3BT*JUEe>?h_S+@f(l&j&G&(!&8-F{oJ2(J@58? zzZew|UPpCFM$zeBXV+QZ+Y!fWHJe`Pj}n-4k`#6+U~eQ!U%;(+fl9 z&oyf}H*3z}tg!@Q2xcMkVmh%h{pNlcQXoSqV;{~;cD!>NkjV_aYiCx=S1)s7em?go z$@}XS&mRX{4G(e=;~`44+jYCR%BLqlrq1ew3t&--IhV5cB<=7l-A{`1WVM)msitfq zDhvoI5OkQoSB49tYXN3BFTgi{8|uhl}&cen#2z$}#r7{GQLZweNv+#DmRY{Zu{u+Cc9 zETg{m{(yCm^^@s572NmoMiijxvk1lk$4O2ohGa6l11W{1yM{*ZfU!Xs!9Xz-%28M~ zk5i5rPaJ5xHZXV;!-_k^?TgCXXup5vg3xI)vL=16<+nyQW)}vC=tZ)-Sl8ZH5;|k! 
z)Vqzj#2M#bc3FIM?@qc3A8a~@ENc#@Fb~_@Ec-7tDQU@>B~2-2K!z#kM{Y!A$Dm*cWlX$6M16+hYU^bS8{@!5VfDyRXEhq_fiuzU=hxfYQHg4-Ua#7Jm(AZ zr8abw)5_Oz>jwRS3w=uH3GHVxQcqYA znNe4zlhC$)I@6XBS{U~9%WMG<&1yK7`Nu?31;V|pRSJp<5JB+SPm0UcWz10l4D+S; zwZ52Cf7sGRJ(%~~gjcF5fsG8o0!8Y}0DN8O^{rWYN^;I*PAW?Y**e z`dSTEC;2?q!QR%uj0`CwOX;+8&$nd_Y2whfisj-F+cQ+3Kr$9VcU`UI zz9%(h`cNn1DZ1SygJPm^V`F3fWeI!NYE!lL>#y~c-%a?I&yr)sJYZGKnLWQ}%!GBZ z?GY+Cu$Cp?0Xbi0L~vVj(QJFdsd0xS5f2XO;4Bc@1z0foKV<2|PnY;IszP|Kw&6+# zFYa5+ia}Mv7Og_vH?KV(009j1*J3zzS14rc)xegMI63{0d-)SUyB&m-XuP-k$qS9H6NeTtev^JR(D;x^gEuy71=jZ?gZ&Cz7NUldb?s2y!pW+`&sd|NunV!(;D ztyi-}On^x{+)~~kZny_rR=s0PTiN%1ojck>^D)gtt^M7*3DG4Fm*c-bYDvZ|-jQ)# z#jqZ!-Uie`SG>n$JI+D9=37ChSqr%f1Gk(A=>Q+1d1HuM1|Ud~Kbm#L=IZf644rI0PvteJh;g$oVAQV;9Z+#&12Ssh>?Lv z2qS`sRZ-M*_q29gVMur6OjbB=0GkTP>ZmPz`k^Z6EeOhn&rxb^QvNAN zD5$y=MOUH<83lAV+vdrkcYx3cE;sjnu1GpqXd2cibJ=slLV%2Mi{&gQb2~HltrFF> zNRSPG*?wobHj++w_TFJM+^jp(CJPjoZHH3J;ICv2YkuplzHFmg)>C0E%jgMl{}O-a z+GMfq69CoksyO9)?|khN`>~&9Qw?HFzQeTwXF-18^Yh%7=$xo4EkEeFngX-b3NJSMb$DYg=@R-#8% zM~IO6I}ZYo$aI)C7MdPDs_3JHnZ(1F=-?SlfoE6#wdE3jDgO{-y}bUC2zmVp6YMjc z@;YyCo0fkbgh6HPU(BtDd#!(M<-S3`k0i>+efMmp=A?*4rNhWMuZ^3#s@mX0%RIhi z05mrvw~y$K~DKpFb&_K{dk4P7(_kNDwvkj{%*=;nb$FP7xF>H zq}^b?M3H~+l*9o@d)WjC-hIX7&PgZ~0+@vSpw*@HT?kGKGfVY}KIUEy*sw=x?eNH2 z;I4Dji$|X4zCgN_!Y)-MJZL&!wkbEFWbI+?#++|Aak;B)8hBGvaP1+S$=1&c&Cz`+vX4v+$zgIJU?jMgQn z-v>Yx->u74ojb}jjz9nTZI&ZeXYW(})Mq8fn7Y1R*d-baC`E&X^MnL=4q%po@1Ju@eI1MaqwuAAO{Z7gw|6p1`_6JPJ zO@r!!exY~k-_B450K`RNvSa{)L6HbLVSzQ}P(8!e>TN*JAXjDZ9)Q_w_^<8Wbt0y~ zh!j5{krc!Bg*;eh%oxLw>DtoqG6JBm9uk72k|{xTec*jiUB4EFF|X0QRznYJMwZEX z7NS!UF2L%MRSU%#zzyh{AO>=8vK+SCUV&fQBKqo5eU3@i?U!I&$tfwxEjzhpqNeBoGO8d1?Q%QILz!-&Rt>4U zQV$|*3LiOw#4gu>b%6h5%&936*S-$ci7TbTde^4vg3Qk4`X21!e59aZF*t$M%-o7` z(=0~)+O=$C0rE<+M^nppnLPWhxx`~qc@#(bDWRdEQNJ-X04 zFR$mF9x~LI*MP7AKV*8f`H z&WixlEKe_!198dDX`o4=Bcu|lFimMcM<6GFC^Iz1$(0s2i!pV0IUWZ}m@_xjGjg3f zj8~XXbX(ML*yg=z3(jXzU|9nd1t6Eqmk>C`JDw&-oZ&;%Pzs8=!`w021uOD^7&0>w 
z$wm?lC?_J+2m**~vGIrq8VCn;PgNSd*dd@ZZXO>R0~1MOT?Q39I4hGyB3q+Q#bemp zmkn!iro=W#0Uu;O@-~0^qpbK@+X8q?8^7zOQ`1XcBFxUT#t51VxD}vi`($CHD;#I1 znTUmPvo9K3fd=9XbfzXb4nG6G$Rk)n0za$Qx{6fXyc|HDdw$Z(6xpv-pd%n3i)mTT z;FLTsF)@i9q0pIsH-c1^FyLdyg)Y0jHoFBA|ka3K(;Ff(Ul4 z=1yIY&#`7o4aP7mL#BKHX%_i!>0YT(JDAS4t?=RJts5q*x=#u@9RLcU)q66P-@Y8k z)0s=DCz*jh))R1=RU#$XV=s=1ilP`YWGT&~yxPyr(=oW!pXSBBc>*^m3Iw(j-RAYp zACU9lB;|}Xgq#O9;`TZ6#}OG>A>oufk}uba+L)4_kajYyp=zF6jJnI#1sA&)-ZS`` z(wTxYA*V>P)w75sOmX|)THR1kP$;Yp>&z1I-7jfm0Gr@ej7je~&lE%V_#9N_v<0eqcR5MjUFP|;-@dpD3>c~3UEJ?>uXFvDC|L~7Vu4D=EZecN`wJ3M z``O{gxM!c$g#o9Nu6P4)UN0O1Raju-XZ$hPlW2}-P5E7ml)1#(DTSk$u}7I; zM^pY)z}}J#B1z7b5|-B`0AjZH9f$%m7KsKAkmoo-W)WNn;_#*#BTy*n8DxP;Sm%Hr zsRb&;Y2PpbR_L5YpTe|)R+Dw!GDR(w@rWH0H>lSX|41Ma#@DB!m2weK42R!7Gt`O@`V`yuriuTqQ0%>4P!`p?O}8WafN(LSGcg0GG3)694jozPB- z6G1NEmhTdu2PT;x%XNkl)IF%%e^&$8&*Wf2)el%jP=^}NmAr$cl#xs{d{D^L1QbBD zm=REt)VitKZ*M=gi%K#ln5gD;y$0i=^#_S9b;&Lk4grDdozCD;3ByNKg$ZNuY@F2V zxF_pE=*KJ*&9Z4_PZShQ`2Bd+X&VzkntvKoIOm$k{#qTM-=X*O{FH22hKpZ*z)KFn zf!R0)T&t}@8kLh1*Qsq+3!U^RNO)zWB7RrDVD zkf`c6m5B63Y%NjVkGwIzc{bWg*eT>F((X@;nn+D(pNj z4$?;vi~GADl5N5W)`4|?v*Ulp;9Plf$jcmwaiYVz&OXAivC-SvUqC6(}sA@u%Fa&Y_vVh)4+rCDgiiKYH2nA1Gq&Uh&@Ol8X$g1)_0ITYNA?;udui zw+cTzdgNJ|o2x#B$|aKh1zv0|^|-l-47$DEy*5p*1ceXA<+y~x?h%JrX8Uez5@Z|x zWv5cy7TeAHfB4-Lq5U}MP0W%kgoQ|va6MZjXLl#K(0U-38(+gm*}{q71ww0Ov~*!( zq36{M70YIzbizzf8VTvBOIAE5ma#@2$dlv(ckjG3$dk1aqBQ_cZ76n#d<);6(9F^3 z@+=ni?5aR0k~B9rlO*ya)=sX1I-}^<Jjf0}%Mj1XP4TRZ8<1*fL zWuvfa%6gxn%`QUf_wJYKMxM4s`|YncZ8Wn)X?+i0t00oXXiC^M81(W0C9&ULunixV zxo08r$41;@(ix#Yq)xiDsAX+-WsxhC-R0)`k@s$#9agVmp%HZM>;p~YWI@5fRS-jB zVb80)vs+zKNdu>hn642OI8xGg1G*1!xaU&skGTQzGhr?{=iW`bPtui%&&RhIxPy86 z-nd!nXj(HN=U@d6(}=vH`P!(YU0Rh5gYYXMWU)QPQo_YLwL92G!i@**2``ump7>}R z7lTi6lUQE4jLy41`NP{k^PH8QR5GkLkW|oGHHDr+`zhvUqw6 z6n5b6Xgs0uXyi3(>%=nbRa+W*)6{FO6jWnFN||>1EY@deMd_49<3eUultG?Ve{Pp^ z;0~W9&GXx*5)~g^ByCakws}M%4#M2f180t&k`DE*L7(33m1~bI>~TFn z%@8wVaLe6y5-()FhCIu@=HFqq(0!AebCOD`gVL|O!M5}>IjLWDvTI=avz~VH03P4m 
z&%}Emk)$|y5l%U5@M#b$IPMOpjxRSHs%V6eh^yTt2u^99?6AiV5U0d+2*N)bnB>s1 zk-=x&Wh$X>*B>P76xRNgchF(o_=}>16mB=*9BPPSmxb(w57;ax-#gVk%ugZQYg`e1 zXxYm3Z8&$}DDtK9NN}DfN=se=4!NWFI!+CY(X}i*1=wg)e3QsA0}$Q3ZPIp7@Yg<> zl>ijZ`B{t%cT*HY7P78yVUM~09pG3boa1Jl`H5~==hTFWcx~q5>1%!MuFR@2mp2Pq zcRQzVE9%MGGy81zAR8|bb&zF7Y&T$5ZDFDdHZE-cSi&h?E@5=-oTqREMRJyxQbsS! zXXWR7ueNO!=o92s0a{m`W-jlM_}V@q{!m<;VEC?%QU`DFW1f;mTd1nMIBS4mM@HtG zo`F~11{E$}t^=1Gyd}e{j13$yr+w<+F2dq(Phv}OER?XXRqWd#hfyy~cY7I&pNgo$= zDtnVbqYu~QF!80HIJ+^Bnrh#SXk!{TJTHEovRx?(f^DDzizFZAzjQ60SWKg1W)5im z0~`fkt4CzF->)`$mM%^Xm{W#5zb*j=m&g1Wz@jB?YJWeL@1Coo+ibx_Qjv9R0%w-^ zr`H!DKj#`FzodKZ3=?SK-_)Pdjyh?&=NOzW)8@0Iy_0YL{dp`@5vwa#R{(Nm`8}xI zv{@2}X&UwWzpyAyU3xvxP5h>Sj#>R}h{b|_ zDDO3&Y(|T>8PmS>LJlhgRr+OETq3&C?smGrAH+|eyoYNRD=vElB0JELoyuPNNs63r^2FNdgjb5*bGMGeXY09$=;|*Eo7c5(fx( z`hZVD7)PIUm6_3Y?sA7FCRATnH}Mp^@EAEq@4*4uUD3qKcKio1I&@p2iowr@M83zb z;PM((1cP|pnkXf0qp6IUIQum3L8uM7NwMyl=nSwaVaptLk<_>^t01H{q_jhe&01SV z0^SN*aq^fX*cK(xuufBp-Ep?JlQNc3QVb_*K2rad-TpH9IbGh(wLLYsf*c1P*JRok zu^QJCLO_cDB3@jHoU~8@7C{4IReJ}qt^4_Lvg5=a9CD$SFZDFaJC=I%911Y3IJD)a z=&@wuTA5i2Wi!rSXrbO`t7$92OGKC1e`-?c)V+&@24+T2tM!-eN=PqZ$C>71|) z;(T7hN$}NhNBP0mMA31M-F2=%f#Ns*fm)6p8}UQU6{_XX3ZQRU-X#iJ!u%e&J-CU` zljK(k#|0JnsNg)XwH`6hSJTecI#L*v80pZ^XXpvrnpa~(ica6OA5_^4TuKr!`P@se zpRISFpuk4IVbAs-w=%t~>+0%?_m{`=G5JD26C}=%DNUl}_XCw+iWJhn+mhiY)a4tR z!1V3D0s*bsuNg7sSVt1!_?E=bl;u<&iaS{lP^F~$wPe0HsF&HFa9H|8cUuh zQ#$v5*S)nDl|FO{v`i|=<(jp0MP9kbm>ilb*Q?1eTmZ{#ZhMq&Mu*95N!o884YeBS zggO0XXKmH7dwg}AM`8X3J&jL0|I^_3v;vGHyY{^r;zdp0K%zn3v7Si?ujLZc(C{d% zK(j7+d)(_$D1&&+DG0SlKfhD@YsA1~d3EI`9cC?L9UpMzWr%oPYa}dv_}x@$Je=L; zOVZ&FQz=rHpmmpfzl_s-qL`Ol{8s-4Umfe~8qZsRlwicI;K@94ib_BbyQppyKyfb?_C1puJaTn@~@G&uWUTEnjTehUR5qdEgM^;vM2*zJdF`4lF`-_Qn2C?tFIdjM-vP zeX;zJS}Z48W&9YaQ$))-`G9Og4lkj!3{Erv&EUFL{z_C#h$mcKT}s23znxvXKi~8@%=B=A^+hy#_oN|-yq;$}B+uBD?R-}(Pw`ci+|QCq8$vuy{^l3y zO!*uI!Ub04_?apn#gZ!qN90KURVx@a1h2gqV8AF}bOt^XSEsnIz~7)Ge-|I;p7gSd zC>YWjgbJSh6j(560lWB%V_BaPy7L@u{I&#He3`+qfefB!P&ZaCq0af<{PQt)f3BBH 
zk}tNDWZjcg+Uz!g)HP=NQMVw9qoNIuJUtAi#unA&6M4}5{j<>6V5*WhL5M&7AE%#6 zc}ty$Oc z-x<+9<&fwE<7uMgp>nYXTy{Fvb>4@T&JLs)%z95sj4JILJCA)>_^nhPw*AP(XrQvW z;(kZP5R?H^O=Iwn|Ggj=kSZ8fB!`~&X|_my61&)M#CoRcW66iT1jipGG&L-!GzH2d zjQj&GnQ@d4Vzh6J0qc4JI3g`Ag^$pjwM8FBw}?3m7CD)hu-7X>2F$o0iQITl}gsLz~NbwFRw-AZz;_(KH6mkdi$PFV*UJ0 z>*Ha+$ZLYI3qpoOVQxKBqQ+!z@&O!yrA9rRfRaBrJb2z(4?{H&RC^s4cG&?BjZ-vI ztRjA>WAy;E#fiOqgp;AX8d;?e;{otTI4CpxV~$`|=_7<_$Hs$YVs~Qr&j25gdy)xR zg7ghePceE#=M8NNvZZXqIU_Tzo9{-FZr-`PIh~t0!4MDpXIOgU8(KPfpX! zDwQlFlJxHpFmVh@qnjj?1Iqq4ca!y`a z*#aJ&rAt_+HP`s)6?d`}&3E*HyQ@@{dnDbZ;ojxV&DDO8rb(61_gX7#on$VhBDSfP zxm3sJ2Osoz{VCE<7V;!p!SvEXj3O!bz7BB2iDclCaj_PWw1p3$!}?OB=kCX0JX}paD~KO(FajT1j}wu`+T(ni6pP52c@X zLcoKsb6VQ5qeg$7hayxf_Hb+RlG+dAin>m$?3?eLb?6R{NS2popQYMm<$_zDp$8N* zjH7B7^h(HVxrK)tnMQm-QTxkw(v{9^3NAP4vacS>Zpz zcmcW;-t!#PPoc?15F)!8UrVzxByyFc?OQ#lCvwpQiir18W<{zQ!cn*LjI~|xNb24?$K~kCiHG{dWdQ)Ej-a~9GIV#feVq) zwSr?dY=&pT?VjQv$`8s;QP@4e4Bl-8O7QR0Mh88ZDroBTuy}J|C?yN^HHaD+&XU<~ zr@%^?O<5hz{PB|S&ec5TruaX8C|BCCES;@{je)xE3x2n0rZ`9ykM5;Swo-ToP@~MeAm&K2ce4FG z&J;A2ZBjyZM5|^i;$+X&934wbI>{y8>zg-utZSk!y#jP-k5euazt54i?^bci2P@&{ zQ=rLXkD=r(Vh@Lf!@E|0iQ?|s^i#CF3Oc~ZL-h+45C!wv*55$_Jow}YQ&c6#=05ZD z>kR4CB0Y8*zEG390q4R%RX`d2l?jl#|0oO7i3HmG3M@RMWWU2njG=vXIRp~M+OVlG zJN#114BigyrWTAsZ6{f{HW$BXg*key{g{GylsQh7l~XQ@_6L2=-2zCH9&K~_Q*o8V zc7EK=ORA_2Rx@$Hrh1SPATop4}1h_v-S^F>HTxRm{SrvUgIY> z2vWAtmpV^}s4N1dAw=>J2#gLVXj)A>8+aTc^znQ(3=r|W-T6$I^FpCUb@t6pr51E> zjrQUv>3R)WA2EyEHyTY>IPHKnnQK81bX4~d7guqgkV@fHPfT@M1Eam_m5!6s9oJ$8 zXc8`Rtc9HJk$L>nWryj5tRFA233dAu*=o~s2?Br{>y_BsJzgrgVXMQ02Gj?s5Cm3EASi z+T4P1kWngnQGiODgYl&ckFhr_XaDToz1LksrN(u)pR3|cYOfR!hL63OPRT6}9mHLB zy$t$v0GG2Ofwlu$&I}_*g<>vU`T)^XRalX&1}Q#~@@LaF8#Uukv$-s`x?jG1g}>-i zRBb%nIwN>Gk9bE4v2K1wG!a8YtEidGBL&I!Kf=kaB*!PM1NQcZRBxO7Xq^B|==7K9 z`mWAxeykJq@*BItmgO)0kms?$vg!>I`XCdn57g%1vb`U5TBO5Fj6H|ZcNK3NVRarw z++JMtmzRUu4!)J&b$2QgQmgFGiwwp%z#LXsJax}wR*rbKemmqt^uSD}l%tS`DS=?$ zw(eiP#>PtOM+{yW(S3O?lb`sLaeNdH!^I16V(DeLLYBdrXkobuo8DUv+6T;{gd5rE 
zylwYGO&{yCWdXMrZ(71&rqp*0W0G%=lW~wBsl@^2Ah77?*~E4}CR<=7l5697X-ukz zq!#(YF6jLdJT$=LVVy#z$woa2T3_bW!yN3HH=3FhZcE2e)fop*kA}z3?Yo6&e@Kc( zxQ&d9y}SIE#1yBevi7LKZHc+NYb9?gBt@^^5f`*2l8am}3EpFbEMU6D9Rg^>1O{0A zw*5NW6-H#)A+sRh3|d$B3JVLPu~@a45b~qJ200MdXhDR4(0so*VFu?D0&{Pr^{lNAWA;VtfZEZT8j8`gfC)S2vEx?8W!8Isyonk2V zh+8Yr7GII=buoPTDeL(UHH`$m;)r#WRvud#gf1%aq=hOs2{Mf!!Iq@`DkC6{# zRU88whw~qpm+W_7F#8>^4edr13kJR{yaGLKlC;1T>kVP{vkq9qSY+&wKQxCkl3A*G zmhRMizv(#k?DaAJANd?(vAo0ONR?RDt?93_fz^evN@!77u=tToslrmiqXbwiFuJYw ziC|ZA98^VrPLh^MA zS!Esfle@P*|2n{-<)`(TT+ib&_@pLtSJa8|gCQYVs#3bx$8`q36$fa1p90a_q@)!D zMj{`_^)tP@5LC~-^0c(J!;PBLk3#xv<@IAN3~v9*GZt%< zws#C1KBXeFa&M-x=YR{B+U~SryNo0yH+r!q$oNa%>oL%|)Hq3DKP5y6*$u;5Qm!}P z7ojbB05VQKq18^WdV|PJOrcJvikb~jrAPk{uRnR$7KBkUr2VZmsomPDk>c7ev0DHFm(t_jusbI}UQlf))5{GtK z)`^@*Y*w6UN<ew(z1KOe%Xn1i+(f6qimv>}`|N0y{;Q%OsXR?!YlegP zl}ZR9xpS64lEUcdvbbqpWCU4Dhu;_QQ0+7hx!PCP(3s>P_LVW-PtJAI#OhpnKA8$g zSGjDg_=!%iZ$>d4P3v|b*%A7cs97UTII```(h91efM(KT8k7n6GTN5(7zuh{<(c*Q5)&X{lL4#Tbo|*i^*ka#3vC7_ z*lGVck{2{ZfZu`IY$_r|!r#c+Ps=|dE8x$gXd<^iHS@iZ@x2(?Z-2N;`YmVxs6&|2 zsHf{hg#D)O*Y`^ZD)fB#RGn)!EcWoKunu?`BYrliCv#EB6cj82PXIECoae(Ckw(@b zpmuO@6!RA`JZ=m?clC~BL2wVdy{2OpP3hb3fyF=r`$S*n$%m%7}8Vi@PXtdc*;kUNFP47D@ZT8T$;Y{ug zvMLgqp185u+7&`rW{wCJ2&c!6rLFX?E#yeMJ8XTz-$hVPHqkvt{PY)~uv8#BnuvnvEWaM@6}XHPMh*a9fV@S{R3-2& z(yX~rC6rz)YNw6+O*o4Ps%lBuDr0YdH)-mj@ldnUkC{dOQ|s=N!hin>%dao zSx)#cUYW9Iup_^o8qpJ@v!*E=B&!omUKSog;pz%)Ia{NFGj1pQR-mXUPor=>O><@a zXkzB^wSRZ?zZ2f0r=9+4rEV_$Uko-`i>zZp$VIW|WRq}(){nl2lwply1}e5;aC}9R z&J-p+ZcbuY6jQZrm=AYQ#ju5SBbu`%w7V}{?Qdyl5)UFOokZTh4Va5kf9>8Ec6w{k zc_fDnHh|&Nh$7EV_V)#3IPCi9#hhm*xayd*D7JJt+)_M$cHsc1p?25uISrZvL)8t& z%(IbHq28L$v-}mg2m!&a`-Xdk`BXOeon`;vwmDY?`=mT~Mnjf>M094tNA1*SPDwzg zFA!iy%fbF2OZpMncc;}}0I}O_j|0_Vw*CHWmCb$Q(wCXF=#Y2^zYV55zdfe&qx&@C zgS>(e1YE9}AnG_?f1D;w7ZHMXCLomt)-bh*E%R{1eSZ;Zv@1t@=zHA~i=MP0`mYsj z_5aj{((}ecQGT1d2_}O*EWL*bkCLf7N1y((BRYNKzXTw;##_~`{lJl!|6Rt*9MAcq zc2|U@3J4$p6<&i!TTu~N9t%T1h;A1O$J%L?EP%)KiU~n*nSxjFH^{VK#D|w~hgd!D 
zszp7xY;~0Y_)2}W=N01P6%fv62)I@+W5oGYFrxm`)HZwaehQKbSN za9jfwB-W{Vm7aD@`Y{6`6nF`eLrA>{|L}xr0!gN0Lp9 zwZ?`k+CAwhB$ZH+v+?iE!h{kr>Jt@GN*0d2U$eq9+4RldItrrRjVIyIY2d}SlYrj% zWenj5K1C~h88b@BT2G;a_?eogEVEcQR8O9M%7C8&MOR51X^H(b-p&ssyQm~p+Wety zjZd210*GoPz2D&FnKzb78w5(E&n$;!qvQIY{4gX=A$vUC4yoYZS40K6x*9Cnn&!Z! zRRflCEU&zFjo_F)|6xc={q=$ORA`{GrP9DlYd3sKOVcZv$c$*s1W9uD7sc<;j#}vA zO7Q)NAo!tWp^LT*GQRqa&kh#bf8jPH0L<-}oOxe-$)49}0G$XIwa^hQG8o2X2fP5Z zUZ_0+Y+xH?N$*I2dxd1$O6tIVX&dl`Lbo|aRRxFCUN&Vq5!n`|V6h1BTa{FxfT-OF zavIL{Q=q1dhFI(e?wx-IbGRI{^JL!7m9ft0WFJ6>$kSf&_cg2OJo~}ZW&LXG3$P3g z(^oInmxJ))?UBJwA&4YY_TtXR3p!k%*6Xxf3n*<|8%k)6I&S)09p4GPF$x(cE6n|yuIG3 zhl%K!`Rrxny`(_(Ovb8zoL!AQOf}Ye@j7FV00KevFe5e79{D400~wlM1m;mx|7_qL z3{9GlpCI0&hr#fqQs>DIY38d0yM8Kk+H8e{Jb8>f5l2cMm9}IU<3-e>xP-*C)fF(0 z$HC%37{36@TeKw4Lx_Llx6+pYKYOt?!6_d!2wZZ@lw?k-D@mO$w>bG9h8gcPW4F}z z^ie9NP-t+oDtD9&3u$`aZi>8ob&D!Td>C`0MDIcjwMi}d=*vwSMe{Nt zyYOGClp7=xA~ezV%oGc>y5_U1XRI@oBbO@lxqGUsf?%tKj6#2!s&4OUR9;PyZn*Y4 zKo<{%I&tF9J#X@xlA@z*>SqCZH#U0RLwWZb zQ2&q3v|OxAaDr;4nqJxWXWU6}1mXvp>qnW>Xr)jWNk8qyWnv6I4YNtVP6Dxr9HZoA z@f0TvJ|QEh-1g>VG|Z%|P0SXRGnda+dWONCSW^Gw|Y_Ga}-~o zo^0F+a9krJaqKu+*s-A@w~*h#h6&B#A89yZb_`>;CX0L0>6jp+&YBt;Wcl^-g(qE+ zBq5UMtHe)HwLM3bq9O%2{d>?)uIvhaPoB?8Kabd`8c2HnI@1zbBDVPccz%lq}B^zEddwfprw=7Olmkt@%&%7w&078hsGk3s#99bNG_I;q=SrT`Mb7k&=fH#i0@nnJ>Y^4j<8k$r@mQ! 
zvGyI%yc<4Ya_2l0W{Qyp0z>L#uxlx}qqB+u$&OjH+5%(5`||Yg`48v1Fq4ysVmULW_ZHvFkQt4&~YO= zc!5A{-b}TVjusi~>WpF$RYf0&j^u>^75Y~jhEgB%)q&#k^)+xfG)!pfm?2gi_?x>R zjzR(I;Rhuj20C!qHHFwgquwmwH(4$x)jvmY^6x$+y?7WZrH|bCnq>sndA+e7EJraN zpeq?WJq7mzpn>5zk|R4HR|T^T$lzrSp7~1P=%!;va#M;wxxyve1nF zY~(X(V2#>Ylbhc%dnru^Ffp_sKJ`fn?YwBi_jfzClJm^eGn^e^79wXTqpBnb`+~*H zODtdI`Y8CADaME$JwzTwvJnY3mFrfnSdMgoxRbK#b3xEldPoJ#o*W$O!Hj1&rV!Ija$ zs{e~&m-nMnYFJQ?`^#I%N6E}YNy8Uj`XerK-BJmsc^jB>`svf4*P^W5LLv)8p^5`Fg$h(ogZ31Xu`ak%-Y-6s5K##37$wmZPxDbPyC- zs{Ig@NiuHo+;*%wa@DZ!C`y&?r_A@ug&h=psLq;$;eaZ}h4*z<*fz;RiV*5Epj4R~ z_5wV6+U5D3)7&#c^oTqLkmbF^hNdlU&a^Z7m$^PApoSHZfVk?Xmlr!MDBA1m+wk25 zt%+O=_;wmr91Wm9dNWl3M!I}C3OOKijTz!~n+&3eQcPS^k6G)PPS*Uql4 z8baXBKlbiB1%n^jn;-FW`&kjHx%=E5^)1Y;VHD~=Z3L^oFD3UxR7Y?75XU-3!W?da z;`ZT?GAgpKoG=IIQ$KY>fl35I#_qSXxed()i0zCezcO(R}_`-I-bG zDi>e#zrX*ipP-*6n9G9Qi*;9-3HX;1%b&PrA@3D!JHB;G!us_`L+u8v@$G>GlUFKm zEsMwGfA?X5IvL>%CdBU39jGoy{$V9_}{284b8 zj!Fa+0mKWySem(3>zF8cB=o~=x8d&|F){*zy)?&|V}_UD+_&uP~H1A!oVClDo27$aRsA}aPM z%$R`Xi(fY~^m0>_ekCDvj zn;aNiBgKBq>k*5=L^{*@iVVz>$|T!6O?PTiAJorK)QL|$_1v2+ws{S|lP&q;%YG@x z%()h-uVe|Ikh->q2@@7B) zMTD;hfsG0f{P19u7uBJhIr3GVG}f1}0bvO%J=lW&zeIaw(Ws`ykpg zTGbEGAH#<&prnaH3SLiqFyP*h*a>8c`cc-f#Wj%PzBenOV=d;j_hg8I6sq8p_ke0! 
zOYjbM1YHF!dWIUN$9U+@SMWr{IHyrJN6z3v=lKs<4k7(0($N~@D4-52vQ2mV7EB`f z<+koN5MY7bgH{xf!m1vl@Vnr|_U%ydJ)p^jSX;loxxR319Z`JpB!isw#>~%--q;|y z_i#ZiyebK*6!lmg``r#!!M8Cb!8_OO%y10o1EO$249PyrzES)teC-~D0D}6?n^N7| z@IE5SoX)#)9hMook97pNxLc%Y@)PI4 zkk9S-y^P>_qx*7r4fyI&eEq%wzd#_)4v-F}wmFeLT8Y6z&?ZukReX5RU}8MKi<48* zY-;zgiiNDNY1G|8GWM6t@p%t04>QHx^QJRj37EF9h=<5J^ zCE&k^hl?9>Ta~(fd|Y?A7Sug>vfO|67x)e=lu-D6&^4QLOP7;1s5dUc=LXFRk|amf zm!m3&=>D=xk;7nZhrHYcWGjJrv24mt6DM$$gOXYPv*&Ia3hD(KEo}GS+z}{KPQEWq zpy+6C&#$YybVD2Bd}!@7nj^2`k9eKkJiJS3$BdZ5L8jI)H-$ry^wptXB%(0?BCv@DSS`@{s)rU9kycFwaDQQPtIkAwA=*_>9NvB3?X)lmlL-@GMb+< zC{1^p_t;NmaGNN`3ZJSkl1ngY} z*p57N3u?6saEmzAdau1%frG^aUL|+1&CBy%<%`6(ay1>49b{#2H6eTQ9*(=wxUpN- zqWTAPvz&~fV+VC#L~b$o8-%LjSRL2GcqL^8yIiVxf{WIIKbl9g^q*rYQCTO(BPrXOI2_5??-_d`iwqE3k zYe)P)YaBX4al#i^1+{9Ne|+0q^^hfwhzx#2xjaEu_}Tsxb06Rs!WSx@S)k;3My>is zloVuSnCCfW92OP?Zf660^RaIC7QngIvhoCnY~@#|USDHS=q_Ac9a zaDN(!*zg606Fvj`|5#dfMBi)UgMNw4PHbv58}*d9@fOp2R@GS;J@L&=1HW3C@?-51 z^z-0>yffR~xu_oO--+)E=MpKZ8hw9;iocOjGDi~k_dVA!F`@e;rD;3Z(m$k}(`%_H z{k=o%aOiXQ@M*l5-US_|@>@@?EIMljF_;o=%a6|@XirHza@hn6u&vzk*NB!YMo`C7cn zk6hoq*i7r@PMm9K(y}NS&XK~{oel6T8e)niRm0#TYdS~ z{Cf{eQUcW493XuLv*l(Rixj83o&ZMO6HCVpq)#$vPi}5-FThGWU(ytQ?vpB^4$;(E zf;$7^gi1UuwdS(~O^r;VWMfPiNPKqOf zZzjsXioLu}n9aB;LhAj)fnnmSXNJ9v^nX~;rC^L-F+;^?g%6-Oo0DmuW@aACuS(T` zb?h=VbY)3Nf{$;+I{;*)uX^%ZJYO;e7HQ@1w7;E?zYy)h?L)(A0bY`2?4nR5Ng)?v`;o$3MY^qJ8*k z%NFb`6oT6kPvP(Vf=1iA0C_ooP09B{y4rCWlaj2amXmS zauLyqO+iArxVRvgatfu{lRJrrWs7OcY}yY4)sN=CO2i$yCUOlJHeFDbF@#l4Dv3~! z+)0F&y10Oz9lAd0e901OsA38a~=b8RFzTrjsREY(<7Bg8X1OgRp-lyna9h z!b&8AFuYBEzd1a4UssQ}zeQ`W7q_BYHMm1u)+KA(z`%fPB^|kC1RNKZqms6Ryh;1K z4n`Nv_mfEj4%kNo3fo!bAmDY)${*mFoAK@NLU6f>l&oo0YvWdFRL2z4sOj6ai~1kJ3niw*D^z8sH)-J+ZzOOeA9M4@`R44! 
z;8n&Jzx&F_(Heu%#T@5+#w(~=FjOta2Xb!Xex7}2u+hxsHBTUfwCAw;*kFY1jA6hc z#c7hL-EHdyLX}hUYlbLbVnnM5n^#Q^Xt5*MG-@6^`B*5WA`8v_GU|;lyXA6;tNAD)lY!WFkh%3UyM9 z{Wwm%oXO)DBF{=&qHS2?6P+gpHcOoBsK!PybSQt|Pu=evd27P4u=(!}i%&o3Y`MJejmcL`r|u7I5Q+Bc(g34L+EaOW^n0>l#mWiS=jg2&tJe{S zS@rr))F&GZy}Xac-e*Hb@$Qd0_u1yd^$0u-jnAx`PwQ5~cFLrgV$}v_@HmjAtD9$I zrhZoW6C%>ZLy`ySM`b}(6|Y2T6cqpYGsMi=riZoqS<)q97V|VM)(^nrU{T>BL1VoqGeSxEf^6#sIB!_=( z4xSi1=R?V25%fv5w#2l1W zp=L06*=vFWIVO1aRr*_?*K67J+kfXOxP08w(84OecFBk`IJWvZb=vd(`wQ`MvrrD? zvP#Mmx`k9T*6i&53gjCOQN4<#H&~cYJEC*0_C*n^761OY9Hfjmhu@Q|wT|6x*UNREq4b|g3yuPITh3xos)HB;-T_DDmcoDD zVUZaf*NS4v!{NUt3cd&c=ek`@QS+bqiyEaiW$>Q79qSYSeJ}9EIT+k;c6|0Y|2Zcx z6v5Hpv`j)rivK&j;MadK$yt<)n2r+v@7qw|Xj3-Avi1I%_5Tdxh9Vfo?B*P{|9u-? n)LFr03;#Q&;DknBwcViCkPUxNN4JeZ1OM(RYbuo~n1}p7EcLU! literal 0 HcmV?d00001 From 2032c8f7ae43d92916cf45aa75b9bfed518d9615 Mon Sep 17 00:00:00 2001 From: Ross Kennedy Date: Wed, 3 Apr 2024 12:35:50 +0100 Subject: [PATCH 42/46] add cluster centralisation caveat --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 9932ccacda..44dd43d518 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -112,14 +112,19 @@ A low density could indicate links being missed. A sample of low density cluster ### Cluster Centralisation +!!! info "Work in Progress" + + We are still working out where Cluster Centralisation can be best used in the context of record linkage. At this stage, we do not have clear recommendations or guidance on the best places to use it - so if you have any expertise in this area we would love to [hear from you](https://github.com/moj-analytical-services/splink/discussions)! + + We will update this guidance as and when we have clearer strategies in this space. 
+ ##### Definition [Cluster centralisation](https://en.wikipedia.org/wiki/Centrality#Degree_centrality) is defined as the deviation from maximum [node degree](#node-degree) normalised with respect to the maximum possible value. In other words, cluster centralisation tells us about the concentration of edges in a cluster. Centralisation ranges from 0 to 1. ##### Example -[include picture] - +Coming Soon ##### Application in Data Linkage From d898a0aeb24daf44f74ce51e98e0d3e6dce524c5 Mon Sep 17 00:00:00 2001 From: zslade Date: Wed, 3 Apr 2024 13:48:23 +0100 Subject: [PATCH 43/46] add back useful density text --- docs/topic_guides/evaluation/clusters/graph_metrics.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/topic_guides/evaluation/clusters/graph_metrics.md b/docs/topic_guides/evaluation/clusters/graph_metrics.md index 44dd43d518..2bf641fdde 100644 --- a/docs/topic_guides/evaluation/clusters/graph_metrics.md +++ b/docs/topic_guides/evaluation/clusters/graph_metrics.md @@ -107,7 +107,9 @@ The left cluster below has links between all nodes (giving a density of 1), wher When evaluating clusters, a high density (closer to 1) is generally considered good as it means there are many edges in support of the records in a cluster being linked. -A low density could indicate links being missed. A sample of low density clusters can be inspected in Splink's [Cluster Studio Dashboard](../../../charts/cluster_studio_dashboard.ipynb) via the option `sampling_method = "lowest_density_clusters_by_size"`, which performs stratified sampling across different cluster sizes. When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? +A low density could indicate links being missed. This could happen, for example, if blocking rules are too tight or the clustering threshold is too high. 
+ +A sample of low density clusters can be inspected in Splink's [Cluster Studio Dashboard](../../../charts/cluster_studio_dashboard.ipynb) via the option `sampling_method = "lowest_density_clusters_by_size"`, which performs stratified sampling across different cluster sizes. When inspecting a cluster, ask yourself the question: why aren't more links being formed between record nodes? ### Cluster Centralisation From c85ca71a5dca0fac60646b8b79eb024d5b917f49 Mon Sep 17 00:00:00 2001 From: Ross Kennedy Date: Thu, 4 Apr 2024 08:48:02 +0100 Subject: [PATCH 44/46] re-add comparison libraries docs --- docs/comparison_level_composition.md | 30 +++++ docs/comparison_level_library.md | 193 +++++++++++++++++++++++++++ docs/comparison_library.md | 147 ++++++++++++++++++++ docs/comparison_template_library.md | 82 ++++++++++++ 4 files changed, 452 insertions(+) create mode 100644 docs/comparison_level_composition.md create mode 100644 docs/comparison_level_library.md create mode 100644 docs/comparison_library.md create mode 100644 docs/comparison_template_library.md diff --git a/docs/comparison_level_composition.md b/docs/comparison_level_composition.md new file mode 100644 index 0000000000..b0487df816 --- /dev/null +++ b/docs/comparison_level_composition.md @@ -0,0 +1,30 @@ +--- +tags: + - API + - comparisons +--- +# Documentation for `comparison_level_composition` functions + +`comparison_level_composition` allows existing comparison levels to be merged using a logical SQL clause - `OR`, `AND` or `NOT`. + +This extends the functionality of our base comparison levels by allowing users to "join" existing comparison levels using these SQL clauses. + +For example, `or_(null_level("first_name"), null_level("surname"))` creates a check for nulls in *either* `first_name` or `surname`, rather than restricting the user to a single column.
+ +The Splink comparison level composition functions available for each SQL dialect are given in this table: + +{% include-markdown "./includes/generated_files/comparison_composition_library_dialect_table.md" %} + + + +The detailed API for each of these is outlined below. + +## Library comparison composition APIs + +::: splink.comparison_level_composition + handler: python + selection: + members: + - and_ + - or_ + - not_ diff --git a/docs/comparison_level_library.md b/docs/comparison_level_library.md new file mode 100644 index 0000000000..e8941eadc2 --- /dev/null +++ b/docs/comparison_level_library.md @@ -0,0 +1,193 @@ +--- +tags: + - API + - comparisons + - Damerau-Levenshtein + - Levenshtein + - Jaro-Winkler + - Jaccard + - Date Difference + - Distance In KM + - Array Intersect + - Columns Reversed + - Percentage Difference +toc_depth: 2 +--- +# Documentation for `comparison_level_library` + +The `comparison_level_library` contains pre-made comparison levels available for use to +construct custom comparisons [as described in this topic guide](./topic_guides/comparisons/customising_comparisons.html#method-3-comparisonlevels). +However, not every comparison level is available for every [Splink-compatible SQL backend](./topic_guides/splink_fundamentals/backends/backends.md). + +The pre-made Splink comparison levels available for each SQL dialect are given in this table: + +{% include-markdown "./includes/generated_files/comparison_level_library_dialect_table.md" %} + + + +The detailed API for each of these is outlined below.
+ +## Library comparison level APIs + +::: splink.comparison_level_library.NullLevelBase + handler: python + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.ExactMatchLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.ElseLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.DistanceFunctionLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.LevenshteinLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.DamerauLevenshteinLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.JaroLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.JaroWinklerLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.JaccardLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.ColumnsReversedLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- 
+ +::: splink.comparison_level_library.DistanceInKMLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.PercentageDifferenceLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.ArrayIntersectLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_level_library.DatediffLevelBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 diff --git a/docs/comparison_library.md b/docs/comparison_library.md new file mode 100644 index 0000000000..459faec833 --- /dev/null +++ b/docs/comparison_library.md @@ -0,0 +1,147 @@ +--- +tags: + - API + - comparisons + - Levenshtein + - Jaro-Winkler + - Jaccard + - Distance In KM + - Date Difference + - Array Intersect +toc_depth: 2 +--- +# Documentation for `comparison_library` + +The `comparison_library` contains pre-made comparisons available for use directly [as described in this topic guide](./topic_guides/comparisons/customising_comparisons.html#method-1-using-the-comparisonlibrary). +However, not every comparison is available for every [Splink-compatible SQL backend](./topic_guides/splink_fundamentals/backends/backends.html). + +The pre-made Splink comparisons available for each SQL dialect are given in this table: + +{% include-markdown "./includes/generated_files/comparison_library_dialect_table.md" %} + + + + + + +The detailed API for each of these is outlined below.
+ +## Library comparison APIs + +::: splink.comparison_library.ExactMatchBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.DistanceFunctionAtThresholdsBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.LevenshteinAtThresholdsBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.DamerauLevenshteinAtThresholdsBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.JaccardAtThresholdsBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.JaroAtThresholdsBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.JaroWinklerAtThresholdsBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.ArrayIntersectAtSizesBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.DatediffAtThresholdsBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_library.DistanceInKMAtThresholdsBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + 
+ show_source: false + heading_level: 3 diff --git a/docs/comparison_template_library.md b/docs/comparison_template_library.md new file mode 100644 index 0000000000..cfd5dc8959 --- /dev/null +++ b/docs/comparison_template_library.md @@ -0,0 +1,82 @@ +--- +tags: + - API + - comparisons + - Date Comparison +toc_depth: 2 +--- + +# Documentation for `comparison_template_library` + +The `comparison_template_library` contains pre-made comparisons with pre-defined parameters available for use directly [as described in this topic guide](./topic_guides/comparisons/customising_comparisons.html#method-2-using-the-comparisontemplatelibrary). +However, not every comparison is available for every [Splink-compatible SQL backend](./topic_guides/splink_fundamentals/backends/backends.html). More detail on creating comparisons for specific data types is also [included in the topic guide](./topic_guides/comparisons/customising_comparisons.html#creating-comparisons-for-specific-data-types). + +The pre-made Splink comparison templates available for each SQL dialect are given in this table: + +{% include-markdown "./includes/generated_files/comparison_template_library_dialect_table.md" %} + + + +The detailed API for each of these is outlined below.
+ +## Library comparison APIs + +::: splink.comparison_template_library.DateComparisonBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_template_library.NameComparisonBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_template_library.ForenameSurnameComparisonBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_template_library.PostcodeComparisonBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- + +::: splink.comparison_template_library.EmailComparisonBase + handler: python + selection: + members: + - __init__ + rendering: + show_root_heading: true + show_source: false + heading_level: 3 + +--- \ No newline at end of file From 100f2d2f22c9251d0c54599bf0e60ee61baa1b19 Mon Sep 17 00:00:00 2001 From: Ross Kennedy Date: Thu, 4 Apr 2024 08:52:39 +0100 Subject: [PATCH 45/46] add missing md doc --- docs/datasets.md | 90 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 90 insertions(+) create mode 100644 docs/datasets.md diff --git a/docs/datasets.md b/docs/datasets.md new file mode 100644 index 0000000000..a1a9a17f45 --- /dev/null +++ b/docs/datasets.md @@ -0,0 +1,90 @@ +--- +tags: + - API + - Datasets + - Examples +--- + +# In-built datasets + +Splink has some datasets available for use to help you get up and running, test ideas, or explore Splink features. 
+To use, simply import `splink_datasets`: +```py +from splink.datasets import splink_datasets + +df = splink_datasets.fake_1000 +``` +which you can then use to set up a linker: +```py +from splink.datasets import splink_datasets +from splink.duckdb.linker import DuckDBLinker +import splink.duckdb.comparison_library as cl + +df = splink_datasets.fake_1000 +linker = DuckDBLinker( + df, + { + "link_type": "dedupe_only", + "comparisons": [cl.exact_match("first_name"), cl.exact_match("surname")], + }, +) +``` + +??? tip "Troubleshooting" + + If you get an `SSLCertVerificationError` when trying to use the inbuilt datasets, this can be worked around with the standard-library `ssl` module by running the following after `import ssl` (note that it disables certificate verification for HTTPS requests made in that session): + + `ssl._create_default_https_context = ssl._create_unverified_context`. + +## `splink_datasets` + +Each attribute of `splink_datasets` is a dataset available for use, which exists as a pandas `DataFrame`. +These datasets are not packaged directly with Splink, but instead are downloaded only when they are requested. +Once requested they are cached for future use. +The cache can be cleared using [`splink_dataset_utils`](#splink_dataset_utils-api), +which also contains information on the available datasets and on which of them have already been cached. + +### Available datasets + +The datasets available are listed below: + +{% include-markdown "./includes/generated_files/datasets_table.md" %} + + +## `splink_dataset_labels` + +Some of the `splink_datasets` have corresponding clerical labels to help assess model performance. These are requested through the `splink_dataset_labels` module. + +### Available datasets + +The datasets available are listed below: + +{% include-markdown "./includes/generated_files/dataset_labels_table.md" %} + + +## `splink_dataset_utils` API + +In addition to `splink_datasets`, you can also import `splink_dataset_utils`, +which has a few functions to help manage `splink_datasets`.
+This can be useful if you have a limited internet connection and want to see what is already cached, +or if you need to clear cache items (e.g. if datasets were to be updated, or if space is an issue). + +For example: +```py +from splink.datasets import splink_dataset_utils + +splink_dataset_utils.show_downloaded_data() +splink_dataset_utils.clear_downloaded_data(['fake_1000']) +``` + +::: splink.datasets._SplinkDataUtils + handler: python + options: + members: + - list_downloaded_datasets + - list_all_datasets + - show_downloaded_data + - clear_downloaded_data + show_root_heading: false + show_source: false + heading_level: 3 From 63496b3cde7b10423bb1ba978a16ea1c44db50f4 Mon Sep 17 00:00:00 2001 From: Ross Kennedy Date: Thu, 4 Apr 2024 08:59:00 +0100 Subject: [PATCH 46/46] fix clusters doc link --- docs/topic_guides/evaluation/edge_overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/topic_guides/evaluation/edge_overview.md b/docs/topic_guides/evaluation/edge_overview.md index 24bab33432..83531091d8 100644 --- a/docs/topic_guides/evaluation/edge_overview.md +++ b/docs/topic_guides/evaluation/edge_overview.md @@ -51,4 +51,4 @@ Evaluating the edges (links) of a linkage model depends on your use case. Defini Your desired metric should help give an initial estimation for a linkage threshold, then you can use spot checking to help settle on a final threshold. -In general, the links between pairs of records are not the final output of linkage pipeline. Most use-cases use these links to group records together into clusters. In this instance, evaluating the links themselves is not sufficient, you have to [evaluate the resulting clusters as well](./clusters.md). \ No newline at end of file +In general, the links between pairs of records are not the final output of linkage pipeline. Most use-cases use these links to group records together into clusters.
In this instance, evaluating the links themselves is not sufficient, you have to [evaluate the resulting clusters as well](./clusters/overview.md). \ No newline at end of file
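Reviewer note on the final hunk: the sentence about grouping links into clusters can be illustrated with a small, library-free sketch. It assumes clusters are formed as connected components over links scoring at or above a chosen threshold; the function name `cluster_links`, the record ids and the scores are all hypothetical and are not part of Splink's API or of this patch.

```python
from collections import defaultdict


def cluster_links(links, threshold):
    """Group record ids into clusters from (left, right, score) links.

    Hypothetical helper: links at or above `threshold` join two records
    into the same cluster (connected components via union-find).
    """
    parent = {}

    def find(x):
        # Register unseen records as their own root, then walk to the root
        # while halving the path to keep later lookups short.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right, score in links:
        # Calling find on both sides registers records even when the link
        # is below the threshold, so they still appear as singletons.
        root_left, root_right = find(left), find(right)
        if score >= threshold and root_left != root_right:
            parent[root_right] = root_left

    clusters = defaultdict(set)
    for record in parent:
        clusters[find(record)].add(record)
    return sorted(sorted(members) for members in clusters.values())


# Made-up pairwise scores, purely for illustration.
links = [("a", "b", 0.99), ("b", "c", 0.97), ("c", "d", 0.40), ("e", "f", 0.95)]
clusters = cluster_links(links, threshold=0.9)
print(clusters)
```

In this sketch, records `a` and `c` end up in the same cluster although no link between them was scored directly, which is one reason evaluating individual links is not sufficient on its own.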