Topic 11: Knowledge Fusion, Cleaning, Evaluation and Truth Discovery
Sherry Lin edited this page Oct 9, 2020 · 1 revision
Surveys
- A Survey on Truth Discovery [Paper]
- Truth Discovery Algorithms: An Experimental Evaluation [Paper]
Data Fusion
(I think this is a relatively old topic; people are moving to knowledge fusion.) (To be classified: single truth/multi-truth, copy detection, source reliability...)
- Truth Discovery with Multiple Conflicting Information Providers on the Web (TKDE 2008), one of the classic papers.
- Integrating conflicting data: the role of source dependence (VLDB 2009), another classic paper.
- Fusing data with correlations (SIGMOD 2014)
- Truth discovery and copying detection in a dynamic world (VLDB 2009)
- Global detection of complex copying relationships between sources (VLDB 2010)
- Online data fusion (VLDB 2011)
- Compact explanation of data fusion decisions (WWW 2013)
- Truth finding on the Deep Web: Is the problem solved? (VLDB 2013)
- A Confidence-Aware Approach for Truth Discovery on Long-Tail Data (VLDB 2014)
- Dynamic Truth Discovery on Numerical Data (ICDM 2018)
- Scaling up Copy Detection (ICDE 2015)
Knowledge Fusion, Cleaning and Evaluation
- Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion (KDD 2014) [Paper]
- From data fusion to knowledge fusion (VLDB 2014) [Paper] [Slides]
- Data X-Ray: A diagnostic tool for data errors (SIGMOD 2015) [Paper] [Slides] [Demo]
- Knowledge-based trust: estimating the trustworthiness of web sources [Paper] [Slides]
- Knowledge verification for long tail verticals (VLDB 2017)
- Efficient knowledge graph accuracy evaluation (VLDB 2019) [Link]
- MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps (ICDE 2019)
- Distilling relations using knowledge bases (VLDBJ 2018)
Given a relational table, we study the problem of detecting and repairing erroneous data, as well as marking correct data, using well curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rules that can make actionable decisions on relational data, by building connections between a relation and a KB.
- HoloDetect: Few-Shot Learning for Error Detection [PDF] (SIGMOD 2019), from the same team as HoloClean
- Unsupervised String Transformation Learning for Entity Consolidation [PDF] (ICDE 2019)
- Normalization of Duplicate Records from Multiple Sources (TKDE 2019)
- Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise (VLDB 2020)
- Learning Over Dirty Data Without Cleaning [Paper] (SIGMOD 2020)
- CoClean: Collaborative Data Cleaning [Paper] (SIGMOD 2020, demo)
- T-REx: Table Repair Explanations [Paper] (SIGMOD 2020, demo)
Datasets
- Fusion Datasets [Link]
Notes
- Data Fusion - Resolving Data Conflicts for Integration [Tutorial Proposal]
- Data Integration and Machine Learning: A Natural Synergy