CITATION.cff
cff-version: 1.2.0
title: Genomator Paper Software Supplementary
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Mark A.
    family-names: Burgess
    email: [email protected]
    affiliation: CSIRO
  - given-names: Roc
    family-names: Reguant
    affiliation: CSIRO
  - given-names: Anubhav
    family-names: Kaphle
    affiliation: CSIRO
  - given-names: Mitchel
    family-names: O'Brien
    affiliation: CSIRO
  - given-names: Letitia M.F.
    family-names: Sng
    affiliation: CSIRO
  - given-names: Yatish
    family-names: Jain
    affiliation: CSIRO
  - given-names: Denis
    family-names: Bauer
    affiliation: CSIRO
url: 'https://bioinformatics.csiro.au/genomator/'
abstract: >-
  Machine-generated data is a valuable resource for training
  Artificial Intelligence algorithms, evaluating rare
  workflows, and sharing data under stricter data
  legislation. The challenge is to generate data that is
  accurate and private. Current statistical and deep
  learning methods struggle with large data volumes, are
  prone to hallucinating scenarios incompatible with
  reality, and seldom quantify privacy meaningfully. Here we
  introduce Genomator, a logic-solving approach (SAT
  solving), which efficiently produces private and realistic
  representations of the original data. We demonstrate the
  method on genomic data, which is arguably the most complex
  and private information. Synthetic genomes hold great
  potential for balancing underrepresented populations in
  medical research and advancing global data exchange. We
  benchmark Genomator against state-of-the-art methodologies
  (Markov generation, Restricted Boltzmann Machine,
  Generative Adversarial Network and Convolutional
  Restricted Boltzmann Machines), demonstrating 89-97%
  accuracy improvement and 90-99% higher privacy. Genomator
  is also 1000 times more efficient, making it the only
  tested method that scales to whole genomes. We show the
  universal trade-off between privacy and accuracy, and use
  Genomator’s tuning capability to cater to all applications
  along the spectrum, from provably private representations
  of sensitive cohorts to datasets with indistinguishable
  pharmacogenomic profiles. Demonstrating the
  production-scale generation of tuneable synthetic data can
  increase trust and pave the way into the clinic.
keywords:
  - Synthetic Data
  - Privacy Analysis
  - Genome Data