diff --git a/README.md b/README.md
index 4670ef85..704c33cf 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Validation pipeline for integrative and hybrid models
 ## Project objective
-- `1.` Develop a validation pipeline (including data validation, model validation, fit of input to model, fit of data not used for modeling and uncertainty of the model) for assessing IHM structures deposited to [PDB-Dev](https://pdb-dev.wwpdb.org/index.html)
+- `1.` Develop a validation pipeline (including data validation, model validation, fit of input to model, fit of data not used for modeling and uncertainty of the model) for assessing IHM structures deposited to [PDB-IHM](https://pdb-ihm.org/index.html)
 ## List of files and directories:
 - `docs` documentation for all classes and functions (sphinx)
diff --git a/about_validation.html b/about_validation.html
deleted file mode 100644
index 8acc09aa..00000000
--- a/about_validation.html
+++ /dev/null
@@ -1,216 +0,0 @@
- - - - - - - -
- - -This validation report was created based on guidelines and recommendations from IHM TaskForce (Berman et al. 2019). The first version of the PDB-Dev validation report consists of four categories, (i) model composition, (ii) data quality assessments, (iii) model quality assessments, and (iv) fit to data used to build the model. A fifth category, fit to data used to validate the model is under development. - - Data quality assessments and fit to data used for modeling categories are dependent on the different types of experimental data used in integrative modeling. We are breaking down this section based on experimental data type and addressing each method separately. The first version of the validation report is focused on validating models built using Small Angle Scattering (SAS) data and is based on the model and data validation guidelines published by the wwPDB SAS validation task force (Trewhella et al. 2017). -
- -
Validation of models built using Chemical Crosslinking Mass Spectrometry (CX-MS), Förster Resonance Energy Transfer (FRET) and 3D Electron Microscopy (3DEM) data are under development and will be included in subsequent versions of the validation report.
- -- -
Berman, Helen M., Paul D. Adams, Alexandre A. Bonvin, Stephen K. Burley, Bridget Carragher, Wah Chiu, Frank DiMaio, et al. 2019. “Federating Structural Models and Data: Outcomes from A Workshop on Archiving Integrative Structures.” Structure 27 (12): 1745–59.
Trewhella, Jill, Anthony P. Duff, Dominique Durand, Frank Gabel, J. Mitchell Guss, Wayne A. Hendrickson, Greg L. Hura, et al. 2017. “2017 Publication Guidelines for Structural Modelling of Small-Angle Scattering Data from Biomolecules in Solution: An Update.” Acta Crystallographica. Section D, Structural Biology 73 (Pt 9): 710–28.
\n')
for line in output_list:
@@ -410,7 +657,7 @@ def find_end_line(ind_beg, ind_end, match_word):
bond_outliers = line[bond_index_beg:bond_index_end]
return bond_outliers, total_bonds
- def process_bonds_list(self, line: list, chains: list) -> (dict, int):
+ def process_bonds_list(self, line: list, chains: list, chains_map: dict) -> (dict, int):
""" process molprobity files to extract relevant information """
bond_outliers, total_bonds = self.process_bonds(line)
@@ -423,28 +670,58 @@ def process_bonds_list(self, line: list, chains: list) -> (dict, int):
for ind, outlier in enumerate(bond_outliers):
sub_line = outlier.split()
+ found_residue = False
+
if sub_line[0] in chains or len(sub_line[0]) == 1:
- bonddict['Chain'].append(sub_line[0])
- bonddict['Residue ID'].append(sub_line[1])
+ val1 = sub_line[0]
+ val2 = sub_line[1]
+ try:
+ chid, resid = chains_map[(val1, val2)]
+ except KeyError:
+ logging.warning(f'Skipping line: {outlier}')
+ continue
+
+ bonddict['Chain'].append(chid)
+ bonddict['Residue ID'].append(resid)
bonddict['Residue type'].append(sub_line[2])
+ found_residue = True
+
else:
temp = sub_line[0]
if temp[:1] in chains:
val1 = temp[:1]
val2 = temp[1:]
- bonddict['Chain'].append(val1)
- bonddict['Residue ID'].append(val2)
+ try:
+ chid, resid = chains_map[(val1, val2)]
+ except KeyError:
+ logging.warning(f'Skipping line: {outlier}')
+ continue
+ bonddict['Chain'].append(chid)
+ bonddict['Residue ID'].append(resid)
bonddict['Residue type'].append(sub_line[1])
+ found_residue = True
+
elif temp[:2] in chains:
val1 = temp[:2]
val2 = temp[2:]
- bonddict['Chain'].append(val1)
- bonddict['Residue ID'].append(val2)
+ try:
+ chid, resid = chains_map[(val1, val2)]
+ except KeyError:
+ logging.warning(f'Skipping line: {outlier}')
+ continue
+ bonddict['Chain'].append(chid)
+ bonddict['Residue ID'].append(resid)
bonddict['Residue type'].append(sub_line[1])
+ found_residue = True
+
+ if not found_residue:
+ logging.warning(f'Skipping line: {outlier}')
+ continue
+
bonddict['key'].append('-'.join(sub_line[:4]))
bonddict['Number'].append(ind+1)
@@ -480,13 +757,26 @@ def process_bonds_list(self, line: list, chains: list) -> (dict, int):
else:
return "Your molprobity processing is incorrect, please check the code", 0
+ @staticmethod
+ def get_model_id_str(line: str) -> str:
+ """ extract MODEL X substring """
+ m = re.search('MODEL\s*(?P
+ This validation report was created based on guidelines and recommendations from IHM TaskForce (Berman et al. 2019). The first version of the PDB-IHM validation report consists of four categories, (i) model composition, (ii) data quality assessments, (iii) model quality assessments, and (iv) fit to data used to build the model. A fifth category, fit to data used to validate the model is under development.
+
+ Data quality assessments and fit to data used for modeling categories are dependent on the different types of experimental data used in integrative modeling. We are breaking down this section based on experimental data type and addressing each method separately. The first version of the validation report is focused on validating models built using Small Angle Scattering (SAS) data and is based on the model and data validation guidelines published by the wwPDB SAS validation task force (Trewhella et al. 2017).
+
+ Validation of models built using Chemical Crosslinking Mass Spectrometry (CX-MS), Förster Resonance Energy Transfer (FRET) and 3D Electron Microscopy (3DEM) data are under development and will be included in subsequent versions of the validation report.
+
+ References
+ Berman, Helen M., Paul D. Adams, Alexandre A. Bonvin, Stephen K. Burley, Bridget Carragher, Wah Chiu, Frank DiMaio, et al. 2019. “Federating Structural Models and Data: Outcomes from A Workshop on Archiving Integrative Structures.” Structure 27 (12): 1745–59.
+ Trewhella, Jill, Anthony P. Duff, Dominique Durand, Frank Gabel, J. Mitchell Guss, Wayne A. Hendrickson, Greg L. Hura, et al. 2017. “2017 Publication Guidelines for Structural Modelling of Small-Angle Scattering Data from Biomolecules in Solution: An Update.” Acta Crystallographica. Section D, Structural Biology 73 (Pt 9): 710–28.
+
+ SAS data used in this integrative model could not be validated as the sascif file is currently unavailable.
+
+ SAS data used in this integrative model was obtained from {{sasdb_code_html|length|int}} deposited SASBDB entry (entries).
+
+ Scattering profile for {{sasdb_code_html[i]}}: data from solutions of biological macromolecules are presented as both log I(q) vs q and log I(q) vs log (q) based on SAS validation task force (SASvtf) recommendations. I(q) is the intensity (in arbitrary units) and q is the modulus of the scattering vector.
+
+ Molecular weight (MW) estimates from experiments and analysis: true molecular weight can be compared to the Porod estimate from scattering profiles.
+
+ Volume estimates from experiments and analysis: estimated volume can be compared to Porod volume obtained from scattering profiles.
+
+ Flexibility analysis for {{sasdb_code_html[i]}}: In a Porod-Debye plot, a clear plateau is observed for globular (partial or fully folded) domains, whereas fully unfolded domains are devoid of any discernible plateau. For details, refer to Figure 5 in Rambo and Tainer, 2011. In a Kratky plot, a parabolic shape is observed for globular (partial or fully folded) domains and a hyperbolic shape is observed for fully unfolded domains.
+
+ P(r) analysis: P(r) represents the distribution of distances between all pairs of atoms within the particle weighted by the respective electron densities. P(r) is the Fourier transform of I(s) (and vice versa). Rg can be estimated from integrating the P(r) function. Agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. Rg is a measure for the overall size of a macromolecule; e.g. a protein with a smaller Rg is more compact than a protein with a larger Rg, provided both have the same molecular weight (MW). The point where P(r) is decaying to zero is called Dmax and represents the maximum size of the particle.
+
+ P(r) for {{sasdb_code_html[i]}}: The value of P(r) should be zero beyond r=Dmax.
+
+ Guinier analysis: agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. Molecular weight estimates can also be compared to Porod and sample molecular weights for consistency.
+
+ The linearity of the Guinier plot is a sensitive indicator of the
+ quality of the experimental SAS data; a linear Guinier plot is a
+ necessary but not sufficient demonstration that a solution
+ contains monodisperse particles of the same size.
+ Deviations from linearity usually point to strong interference effects,
+ polydispersity of the samples or improper background subtraction.
+ Residual value plot and coefficient of determination (R2)
+ are measures to assess linear fit to the data. A perfect fit has
+ an R2 value of 1. Residual values should be equally
+ and randomly spaced around the horizontal axis.
+
+
+ SAS data used in this integrative model could not be validated as the sascif file is currently unavailable.
+
+ ATSAS datcmp was used for hypothesis testing. The null hypothesis is that all data sets (i.e. the fit and the data collected) are similar. The p-value is a measure of evidence against the null hypothesis: the smaller the value, the stronger the evidence that the null hypothesis should be rejected.
+
+ Model(s) and/or fit for this entry has not been deposited.
+
+ The experimental scattering curve (in blue) can be compared with the theoretical curve calculated from a model (in red). Residual value plot is a measure to assess fit to the data. Residual values should be equally and randomly spaced around the horizontal axis.
+ Restraint types in this entry are not supported at the moment. There are {{ cx_num_of_restraints }} crosslinking restraints
+ combined in {{ cx_num_of_restraint_groups }} restraint groups.
+ Restraint types are summarized in the table below.
+ Only atomic or per-residue restraints are supported at the moment.
+ Distograms of individual restraints Distograms for this entry are unavailable at the moment. Restraints with identical thresholds are grouped into one plot. Only the best distance per restraint per model group/ensemble is plotted. Intra- and intermolecular self-link restraints are also grouped into one plot. Satisfaction rates for this entry are unavailable at the moment.
+
+ Satisfaction of restraints is calculated on a restraint
+ group level. Satisfaction of a restraint group depends on
+ satisfaction of individual restraints in the group and the
+ conditionality (all/any). Restraint group is considered
+ satisfied, if condition was met in at least one model of
+ the model group/ensemble. Only deposited models are used
+ for validation right now.
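The group-level rule described above can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the function name `group_satisfied`, the input layout (one list of per-restraint booleans per model), and the `"ALL"`/`"ANY"` strings are assumptions made for the example.

```python
def group_satisfied(per_model_flags, conditionality="ANY"):
    """Return True if the restraint group is satisfied in at least one model.

    per_model_flags: one inner list per model; each inner list holds one
    boolean per individual restraint in the group (hypothetical layout).
    conditionality: "ALL" (every restraint must hold) or "ANY" (one suffices).
    """
    rules = {"ALL": all, "ANY": any}
    try:
        combine = rules[conditionality.upper()]
    except KeyError:
        raise ValueError(f"unknown conditionality: {conditionality!r}")
    # The group counts as satisfied if the condition is met in any single model.
    return any(combine(flags) for flags in per_model_flags)
```

For example, a two-restraint group evaluated over two models, `[[True, False], [False, False]]`, would count as satisfied under `"ANY"` but not under `"ALL"`.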
+
+ Per-model satisfaction rates in model groups/ensembles Every point represents one model in a model group/ensemble. Where possible, boxplots with quartile marks are also plotted. The following software was used in the production of this report: This is a PDB-IHM IM Structure Validation Report for a publicly released PDB-IHM entry.
+ We welcome your comments at helpdesk@pdb-ihm.org
+
+ A user guide is available at https://pdb-ihm.org/validation_help.html with specific help available everywhere you see the ? symbol.
+
+ List of references used to build this report is available here.
+
+ This validation report contains model quality assessments for all structures, data quality assessment for SAS datasets and fit to model assessments for SAS datasets. Data quality and fit to model assessments for other datasets and model uncertainty are under development. Number of plots is limited to {{MAXPLOTS}}.
+
+ MolProbity assessments and/or excluded volume assessments cannot be evaluated for the current model.
+ This entry consists of {{ num_ensembles|int }} distinct ensemble(s).
+ This entry consists of {{ number_of_molecules }} unique models, with {{ num_chains|int }} subunits in each model. A total of {{ number_of_datasets|int }} datasets or restraints were used to build this entry. Each model is represented by {{ Rigid_Body|int }} rigid bodies and {{ Flexible_Unit|int }} flexible or non-rigid units.
+
+ {% if number_of_molecules|int < 2 %}
+ There is {{ number_of_molecules|int }} unique type of models in this entry.
+ {% else %}
+ There are {{ number_of_molecules|int }} unique types of models in this entry.
+ {% endif %}
+
+ {% if model_names|length < 2 %}
+ This model is titled {{ model_names[0] }}.
+ {% else %}
+ These models are titled {{ model_names|join(', ') }} respectively.
+ {% endif %}
+
+ {% if number_of_datasets|int < 2 %}
+ There is {{ number_of_datasets|int }} unique dataset used to build the models in this entry.
+ {% else %}
+ There are {{ number_of_datasets|int }} unique datasets used to build the models in this entry.
+ {% endif %}
+
+ This entry has only one representation and includes {{ Rigid_Body }} rigid bodies and {{ Flexible_Unit }} flexible units.
+
+ This entry is a result of {{ Protocols_number|int }} distinct protocol(s).
+
+ SAS data used in this integrative model was obtained from {{ sasdb_code_html|length }} deposited SASBDB entry (entries).
+
+ Scattering profile for {{ sasdb_code_html[i] }}:
+ data from solutions of biological macromolecules are presented as both log I(q) vs q and log I(q) vs log (q) based on
+ SAS validation task force (SASvtf) recommendations.
+ I(q) is the intensity (in arbitrary units) and q is the modulus of the scattering vector.
+
+ Molecular weight (MW) estimates from experiments and analysis:
+ true molecular weight can be compared to the Porod estimate from scattering profiles.
+
+ Volume estimates from experiments and analysis: estimated volume can be compared to Porod volume obtained from scattering profiles.
+
+ Flexibility analysis for {{ sasdb_code_html[i] }}: In a Porod-Debye plot, a clear plateau is observed for globular (partial or fully folded) domains, whereas fully unfolded domains are devoid of any discernible plateau. For details, refer to Figure 5 in Rambo and Tainer, 2011. In a Kratky plot, a parabolic shape is observed for globular (partial or fully folded) domains and a hyperbolic shape is observed for fully unfolded domains.
+
+ P(r) analysis: P(r) represents the distribution of distances between all pairs of atoms within the particle weighted by the respective electron densities. P(r) is the Fourier transform of I(s) (and vice versa). Rg can be estimated from integrating the P(r) function. Agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. Rg is a measure for the overall size of a macromolecule; e.g. a protein with a smaller Rg is more compact than a protein with a larger Rg, provided both have the same molecular weight (MW). The point where P(r) is decaying to zero is called Dmax and represents the maximum size of the particle.
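As an illustration of how Rg can be estimated by integrating P(r), the sketch below uses the standard relation Rg² = ∫r²P(r)dr / (2∫P(r)dr) with trapezoidal integration. The function names and the sphere test profile are assumptions for the example; the report itself takes its values from the deposited SAS analysis.

```python
import math

def rg_from_pr(r, p):
    """Estimate Rg from a sampled P(r) using the trapezoidal rule.

    Uses Rg^2 = integral(r^2 P(r) dr) / (2 * integral(P(r) dr)).
    r and p are equal-length sequences; r must be increasing.
    """
    def trapz(y):
        return sum(0.5 * (y[i] + y[i + 1]) * (r[i + 1] - r[i])
                   for i in range(len(r) - 1))

    numerator = trapz([ri * ri * pv for ri, pv in zip(r, p)])
    denominator = 2.0 * trapz(p)
    return math.sqrt(numerator / denominator)

def sphere_pr(r, radius):
    """Unnormalised analytical P(r) of a uniform sphere; zero beyond Dmax = 2R."""
    x = r / (2.0 * radius)
    return x * x * (2.0 - 3.0 * x + x ** 3) if 0.0 <= x <= 1.0 else 0.0

# Hypothetical check on a sphere of radius 30 (arbitrary length units).
radius = 30.0
r_grid = [2.0 * radius * i / 2000 for i in range(2001)]
p_grid = [sphere_pr(ri, radius) for ri in r_grid]
rg = rg_from_pr(r_grid, p_grid)
```

For a uniform sphere the result should be close to the analytical value sqrt(3/5)·R, which gives a quick sanity check of the integration.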
+
+ P(r) for {{ sasdb_code_html[i] }}: The value of P(r) should be zero beyond r=Dmax.
+
+ Guinier analysis: agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. Molecular weight estimates can also be compared to Porod and sample molecular weights for consistency.
+
+ Guinier analysis for {{ sasdb_code_html[i] }}: the linearity of the Guinier plot is a sensitive indicator of the quality of the experimental SAS data; a linear Guinier plot is a necessary but not sufficient demonstration that a solution contains monodisperse particles of the same size. Deviations from linearity usually point to strong interference effects, polydispersity of the samples or improper background subtraction. Residual value plot and coefficient of determination (R2) are measures to assess linear fit to the data. A perfect fit has an R2 value of 1. Residual values should be equally and randomly spaced around the horizontal axis.
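The linear fit behind a Guinier plot can be sketched as below. It assumes the Guinier approximation ln I(q) ≈ ln I(0) − (Rg²/3)·q² and an ordinary least-squares fit in plain Python; the function name `guinier_fit` and its return layout are hypothetical, and selecting the valid low-q range is left to the caller.

```python
import math

def guinier_fit(q, intensity):
    """Least-squares fit of ln I(q) against q^2.

    Returns (Rg, I0, R2, residuals), where residuals are in ln I units.
    """
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: Guinier Rg is undefined")
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(res * res for res in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return math.sqrt(-3.0 * slope), math.exp(intercept), r_squared, residuals

# Synthetic, noise-free profile with Rg = 20 and I(0) = 100 (hypothetical values).
q_vals = [0.005 * k for k in range(1, 11)]
i_vals = [100.0 * math.exp(-(qv * 20.0) ** 2 / 3.0) for qv in q_vals]
rg_fit, i0_fit, r2_fit, resid = guinier_fit(q_vals, i_vals)
```

On real data, residuals that trend systematically rather than scattering around zero are the kind of deviation from linearity the text describes.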
+
+ SAS data used in this integrative model could not be validated as the sascif file is currently unavailable.
+
+ Validation for this section is under development.
+ For models with atomic structures, molprobity analysis is performed. For models with coarse-grained or multi-scale structures, excluded volume analysis is performed.
+ NOTE: Based on existing standards, there are no model quality assessments for this entry.
+
+ The following all-atom clashscore is based on a MolProbity analysis. All-atom clashscore is defined as the number of clashes found per 1000 atoms (including hydrogen atoms). The table below contains clashscores for all the models in this entry.
+
+ In the following table, Ramachandran outliers are listed. The Analysed column shows the number of residues for which the backbone conformation was analysed.
+
+ Model and fits displayed below were obtained from SASBDB. χ² values are a measure of fit of the model to data. A perfect fit has a χ² value of 1.0.
+ ATSAS datcmp was used for hypothesis testing. The null hypothesis is that all data sets (i.e. the fit and the data collected) are similar. The p-value is a measure of evidence against the null hypothesis: the smaller the value, the stronger the evidence that the null hypothesis should be rejected.
+
+
+ Model fit for {{sasdb_code_html[i]}} (fit/model number {{k}}): Residual value plot is a measure to assess fit to the data. Residual values should be equally and randomly spaced around the horizontal axis.
+
+ Model(s) and/or fit for this entry have not been deposited.
+
+ SAS data used in this integrative model could not be validated as the sascif file is currently unavailable.
+
+ Validation for this section is under development.
+
+ Validation for this section is under development.
Acknowledgements Development of integrative model validation metrics, implementation of a model validation pipeline, and creation of a validation report for integrative structures, are funded by NSF ABI awards (DBI-1756248, DBI-2112966, DBI-2112967, DBI-2112968, and DBI-1756250). The PDB-IHM team and members of Sali lab contributed model validation metrics and software packages. Implementation of validation methods for SAS data and SAS-based models are funded by RCSB PDB (grant number DBI-1832184). Dr. Stephen Burley, Dr. John Westbrook, and Dr. Jasmine Young from RCSB PDB, Dr. Jill Trewhella, Dr. Dina Schneidman, and members of the SASBDB repository are acknowledged for their advice and support in implementing SAS validation methods. Members of the wwPDB Integrative/Hybrid Methods Task Force provided recommendations and community support for the project. The following software were used in the production of this report. This validation report contains model quality assessments for all structures. For more detail for each assessment, use the dropdown menus at the top of this page. Acknowledgements Development of integrative model validation metrics, implementation of a model validation pipeline, and creation of a validation report for integrative structures, are funded by NSF ABI awards (DBI-1756248, DBI-2112966, DBI-2112967, DBI-2112968, and DBI-1756250). The PDB-Dev team and members of Sali lab contributed model validation metrics and software packages. Implementation of validation methods for SAS data and SAS-based models are funded by RCSB PDB (grant number DBI-1832184). Dr. Stephen Burley, Dr. John Westbrook, and Dr. Jasmine Young from RCSB PDB, Dr. Jill Trewhella, Dr. Helen M. Berman, Dr. Dina Schneidman, and members of SASBDB repository are acknowledged for their advice and support in implementing SAS validation methods. Members of the wwPDB Integrative/Hybrid Methods Task Force provided recommendations and community support for the project.
The following software was used in the production of this report.
+ This validation report contains model quality assessments for all structures. For more detail for each assessment, use the dropdown menus at the top of this page. Number of plots is limited to {{MAXPLOTS}}.
+
+
+ Data quality assessments and fit to model assessments for SAS datasets are also included in this report. Data quality and fit to model assessments for other datasets and model uncertainty are under development.
+
+
+ Data quality and fit to model assessments for other datasets and model uncertainty are under development.
+
+
+ MolProbity assessments and/or excluded volume assessments cannot be evaluated for the current model.
+
+ Acknowledgements Development of integrative model validation metrics, implementation of a model validation pipeline, and creation of a validation report for integrative structures, are funded by NSF ABI awards (DBI-1756248, DBI-2112966, DBI-2112967, DBI-2112968, and DBI-1756250). The PDB-IHM team and members of Sali lab contributed model validation metrics and software packages. Implementation of validation methods for SAS data and SAS-based models are funded by RCSB PDB (grant number DBI-1832184). Dr. Stephen Burley, Dr. John Westbrook, and Dr. Jasmine Young from RCSB PDB, Dr. Jill Trewhella, Dr. Helen M. Berman, Dr. Dina Schneidman, and members of SASBDB repository are acknowledged for their advice and support in implementing SAS validation methods. Members of the wwPDB Integrative/Hybrid Methods Task Force provided recommendations and community support for the project. The following software was used in the production of this report. This entry has only one representation and includes {{Rigid_Body}} rigid bodies and {{Flexible_Unit}} flexible units. This entry consists of {{num_ensembles|int}} distinct ensemble. This entry consists of {{num_ensembles|int}} distinct ensembles.
+ This entry consists of {{number_of_molecules|int}} unique models, with {{num_chains|int}} subunits in each model. A total of {{number_of_datasets|int}} datasets or restraints were used to build this entry. Each model is represented by {{Rigid_Body|int}} rigid bodies and {{Flexible_Unit|int}} flexible or non-rigid units.
+ This entry has only one representation and includes {{Rigid_Body|int}} rigid bodies and {{Flexible_Unit|int}} flexible units. For models with atomic structures, molprobity analysis is performed. For models with coarse-grained or multi-scale structures, excluded volume analysis is performed. For models with atomic structures, molprobity analysis is performed. For models with coarse-grained or multi-scale structures, excluded volume analysis is performed.
+ The following all-atom clashscore is based on a MolProbity analysis. All-atom clashscore is defined as the number of clashes found per 1000 atoms (including hydrogen atoms). The table below contains clashscores for all the models in this entry.
+
+ The table below contains the detailed list of all clashes based on a MolProbity analysis. Bad clashes are >= 0.4 Angstrom.
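The clashscore definition given here (clashes per 1000 atoms, with bad clashes at overlaps of at least 0.4 Angstrom) amounts to a simple normalisation. The sketch below is only an illustration: the function name and the input, a plain list of overlap magnitudes, are assumptions, whereas the report takes these numbers from MolProbity output.

```python
def clashscore(overlaps, n_atoms, threshold=0.4):
    """Number of clashes (overlap >= threshold, in Angstrom) per 1000 atoms.

    overlaps: hypothetical list of atom-pair overlap magnitudes in Angstrom.
    n_atoms: total number of atoms in the model, hydrogens included.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    n_clashes = sum(1 for overlap in overlaps if overlap >= threshold)
    return 1000.0 * n_clashes / n_atoms
```

With four overlaps of 0.45, 0.39, 0.6 and 0.4 Angstrom in a 2000-atom model, three reach the threshold, so the score is 1000 × 3 / 2000 = 1.5.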
+
+ In the following table, Ramachandran outliers are listed. The Analysed column shows the number of residues for which the backbone conformation was analysed.
+ Supported by
+ Integrative Modeling Validation Package: Version {{ version }}
+ The following software was used in the production of this report: This is a PDB-Dev IM Structure Validation Report for a publicly released PDB-Dev entry. We welcome your comments at pdb-dev@mail.wwpdb.org A user guide is available at https://pdb-dev-beta.wwpdb.org/validation_help.html with specific help available everywhere you see the ? symbol. List of references used to build this report is available here. This validation report contains model quality assessments for all structures, data quality assessment for SAS datasets and fit to model assessments for SAS datasets. Data quality and fit to model assessments for other datasets and model uncertainty are under development. Molprobity assessments and/or excluded volume assessments can not be evaluated for this current model. This entry has only one representation and includes {{ Rigid_Body }} rigid bodies and {{ Flexible_Unit }} flexible units.
- SAS data used in this integrative model was obtained from {{ sasdb_code_html|length }} deposited SASBDB entry (entries).
-
- Scattering profile for {{ sasdb_code_html[i] }}:
- data from solutions of biological macromolecules are presented as both log I(q) vs q and log I(q) vs log (q) based on
- SAS validation task force (SASvtf) recommendations.
- I(q) is the intensity (in arbitrary units) and q is the modulus of the scattering vector.
-
-
- Molecular weight (MW) estimates from experiments and analysis:
- true molecular weight can be compared to the Porod estimate from scattering profiles.
-
- Volume estimates from experiments and analysis: estimated volume can be compared to Porod volume obtained from scattering profiles.
-
-
-
- Flexibility analysis for {{ sasdb_code_html[i] }}: In a Porod-Debye plot, a clear plateau is observed for globular (partial or fully folded) domains, whereas, fully unfolded domains are devoid of any discernable plateau. For details, refer to Figure 5 in Rambo and Tainer, 2011. In a Kratky plot, a parabolic shape is observed for globular (partial or fully folded) domains and a hyperbolic shape is observed for fully unfolded domains.
-
- P(r) analysis: P(r) represents the distribution of distances between all pairs of atoms within the particle weighted by the respective electron densities. P(r) is the Fourier transform of I(s) (and vice versa). Rg can be estimated from integrating the P(r) function. Agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. Rg is a measure for the overall size of a macromolecule; e.g. a protein with a smaller Rg is more compact than a protein with a larger Rg, provided both have the same molecular weight (MW). The point where P(r) is decaying to zero is called Dmax and represents the maximum size of the particle.
-
- P(r) for {{ sasdb_code_html[i] }}: The value of P(r) should be zero beyond r=Dmax.
-
- Residuals and error weighted residuals for P(r) analysis for {{ sasdb_code_html[i] }}: Residual value plot is a measure to assess fit to the data. Residual values should be equally and randomly spaced around the horizontal axis.
-
- Guinier analysis: agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. Molecular weight estimates can also be compared to Porod and sample molecular weights for consistency.
-
- Guinier analysis for {{ sasdb_code_html[i] }}: the linearity of the Guinier plot is a sensitive indicator of the quality of the experimental SAS data; a linear Guinier plot is a necessary but not sufficient demonstration that a solution contains monodisperse particles of the same size. Deviations from linearity usually point to strong interference effects, polydispersity of the samples or improper background subtraction. Residual value plot and coefficient of determination (R2) are measures to assess linear fit to the data. A perfect fit has an R2 value of 1. Residual values should be equally and randomly spaced around the horizontal axis.
-
- SAS data used in this integrative model could not be validated as the sascif file is currently unavailable.
- For models with atomic structures, molprobity analysis is performed. For models with coarse-grained or multi-scale structures, excluded volume analysis is performed. Validation for this section is under development. Acknowledgements Development of integrative model validation metrics, implementation of a model validation pipeline, and creation of a validation report for integrative structures, are funded by NSF ABI awards (DBI-1756248, DBI-2112966, DBI-2112967, DBI-2112968, and DBI-1756250). The PDB-Dev team and members of Sali lab contributed model validation metrics and software packages. Implementation of validation methods for SAS data and SAS-based models are funded by RCSB PDB (grant number DBI-1832184). Dr. Stephen Burley, Dr. John Westbrook, and Dr. Jasmine Young from RCSB PDB, Dr. Jill Trewhella, Dr. Dina Schneidman, and members of the SASBDB repository are acknowledged for their advice and support in implementing SAS validation methods. Members of the wwPDB Integrative/Hybrid Methods Task Force provided recommendations and community support for the project.
+
+ 1. Understanding the PDB-IHM Validation Report
+
+ This validation report was created based on the guidelines and recommendations from IHM TaskForce (Berman et al. 2019). The first version of the PDB-IHM validation report consists of four categories as follows:
+
+ 1.1 Model composition: This section outlines model details and includes information on ensembles deposited, chains and residues of domains, model representation, software, protocol, and methods used. All deposited structures have this section.
+
+ 1.2. Data quality assessment: Data quality assessments are only available for Small Angle Scattering (SAS) datasets. This section was developed in collaboration with the SASBDB community. For details on the metrics, guidelines, and recommendations used, refer to the 2017 community article (Trewhella et al. 2017). All experimental datasets used to build the model are listed; however, validation criteria for other experimental datasets are currently under development.
+
+ 1.3. Model quality assessment: Model quality for models at atomic resolution is assessed using MolProbity (Williams et al. 2018), consistent with PDB. Model quality for coarse-grained or multi-resolution structures is assessed by computing excluded volume satisfaction based on reported distances and sizes of beads in the structures.
+
+ 1.4. Fit to data used to build the model: Fit to data used to build the model is only available for SAS datasets. This section was developed in collaboration with the SASBDB (Valentini et al. 2015). For details on the metrics, guidelines, and recommendations used, refer to the 2017 community article (Trewhella et al. 2017). All experimental datasets used to build the model are listed; however, validation criteria for other experimental datasets are currently under development.
+
+ A fifth category, fit to data used to validate the model, is under development.
+
+ 2. Overview
+
+ 2.1 Overall Quality Assessment: This is a set of plots that represent a snapshot view of the validation results. There are four tabs, one for each validation criterion: (i) model quality, (ii) data quality, (iii) fit to data used for modeling, and (iv) fit to data used for validation.
+
+ 2.1.1. Model quality: For atomic structures, MolProbity is used for evaluation. We evaluate bond outliers, side chain outliers, clash score, rotamer satisfaction, and Ramachandran dihedral satisfaction (Williams et al. 2018). Details on MolProbity evaluation and tables can be found here. For coarse-grained structures of beads, we evaluate excluded volume satisfaction. An excluded volume violation or overlap between two beads occurs if the distance between the two beads is less than the sum of their radii (S. J. Kim et al. 2018). Excluded volume satisfaction is the percentage of pair distances in a structure that are not violated (higher values are better).
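The excluded volume criterion above can be written out as a short sketch. It is a naive all-pairs illustration under stated assumptions: beads are given as hypothetical `(x, y, z, radius)` tuples, and the function name is invented for the example rather than taken from the pipeline.

```python
from itertools import combinations
from math import dist

def excluded_volume_satisfaction(beads):
    """Percentage of bead pairs that do not overlap.

    beads: list of (x, y, z, radius) tuples (hypothetical layout).
    A pair is violated when the centre-to-centre distance is less than
    the sum of the two radii.
    """
    pairs = list(combinations(beads, 2))
    if not pairs:
        return 100.0  # no pairs, hence nothing to violate
    violations = sum(
        1 for a, b in pairs if dist(a[:3], b[:3]) < a[3] + b[3]
    )
    return 100.0 * (len(pairs) - violations) / len(pairs)

# Three unit-radius beads: the first two overlap, the third is far away.
example_beads = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 1.0), (10.0, 0.0, 0.0, 1.0)]
satisfaction = excluded_volume_satisfaction(example_beads)
```

In this example one of the three pairs is violated, so two thirds of the pair distances are satisfied. The all-pairs loop scales quadratically with the number of beads, so a spatial index would be a sensible design choice for large structures.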
+
+ 2.1.2. Data quality: Data quality assessments are only available for SAS datasets. The current plot displays radius of gyration (Rg) for each dataset used to build the model. Rg is obtained from both a P(r) analysis (see more here), and a Guinier analysis (see more here).
+
+ 2.1.3. Fit to data used for modeling: Fit to data used for modeling assessments are available for SAS datasets. The current plot displays Χ² Goodness of Fit Assessment for SAS-model fits (see more here).
+
+ 2.1.4. Fit to data used for validation: Fit to data used for validation is currently under development.
+ 3.1. Ensemble Information: Number of ensembles deposited, where each ensemble consists of two or more structures.
+
+ 3.2. Summary: Summary of the structure, including number of models deposited, datasets used to build the models, and information on model representation.
+
+ 3.3. Entry Composition: Number of chains present in the integrative structure.
+
+ 3.4. Datasets Used: Number and type of experimental datasets used to build the model.
+
+ 3.5. Representation: Number and details of rigid and non-rigid elements of the structure.
+
+ 3.6. Methods and Software: Methods, protocols, and software used to build the integrative structure.
+
+ 4. Data Quality
+
+ 4.1. SAS: Scattering Profiles: Data from solutions of biological macromolecules are presented as both log I(q) vs q and log I(q) vs log (q) based on SAS validation task force (SASvtf) recommendations (Trewhella et al. 2017). I(q) is the intensity (in arbitrary units) and q is the modulus of the scattering vector.
+
+ 4.2. SAS: Experimental Estimates: Molecular weight (MW) and volume data are displayed. The true MW can be compared to the Porod estimate from scattering profiles, and the estimated volume can be compared to the Porod volume obtained from scattering profiles (Trewhella et al. 2017).
+
+ 4.3. SAS: Flexibility Analysis: Flexibility of chains is assessed from Porod-Debye and Kratky plots. In a Porod-Debye plot, a clear plateau is observed for globular (partial or fully folded) domains, whereas fully unfolded domains are devoid of any discernible plateau. For details, refer to Figure 5 in Rambo and Tainer (2011). In a Kratky plot, a parabolic shape is observed for globular (partial or fully folded) domains and a hyperbolic shape is observed for fully unfolded domains.
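
The two plots described above are simple transforms of the scattering profile. The following is a minimal sketch, assuming the conventional axes (q²·I(q) versus q for a Kratky plot, q⁴·I(q) versus q⁴ for a Porod-Debye plot); the function names are hypothetical and are not taken from the validation pipeline.

```python
def kratky(q, intensity):
    """Return (q, q^2 * I(q)) pairs, the coordinates of a Kratky plot."""
    return [(qi, qi ** 2 * ii) for qi, ii in zip(q, intensity)]


def porod_debye(q, intensity):
    """Return (q^4, q^4 * I(q)) pairs, the coordinates of a Porod-Debye plot."""
    return [(qi ** 4, qi ** 4 * ii) for qi, ii in zip(q, intensity)]


if __name__ == "__main__":
    # Synthetic profile with an ideal q^-4 decay (assumed for illustration):
    # the Porod-Debye transform of such a profile is flat, i.e. a plateau.
    q = [0.10, 0.15, 0.20, 0.25]
    intensity = [qi ** -4 for qi in q]
    for x, y in porod_debye(q, intensity):
        print(f"q^4 = {x:.6f}  q^4*I = {y:.3f}")
```

A profile that decays more slowly than q⁻⁴, as expected for unfolded chains, would keep rising in the Porod-Debye transform instead of levelling off.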
+
+ 4.4. SAS: P(r) Analysis: P(r) represents the distribution of distances between all pairs of atoms within the particle weighted by the respective electron densities (Moore 1980). P(r) is the Fourier transform of I(q) (and vice versa). Rg can be estimated by integrating the P(r) function. Agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. Rg is a measure of the overall size of a macromolecule; e.g. a protein with a smaller Rg is more compact than a protein with a larger Rg, provided both have the same molecular weight (MW). The point where P(r) decays to zero is called Dmax and represents the maximum size of the particle. The value of P(r) should be zero beyond r=Dmax.
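
As an illustration of how Rg follows from P(r), the sketch below uses the standard relation Rg² = ∫ r² P(r) dr / (2 ∫ P(r) dr) with trapezoidal integration. The helper names and the analytical sphere used as test input are assumptions made for this example; they are not part of the report software.

```python
import math


def trapezoid(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def rg_from_pr(r, p):
    """Estimate Rg from a tabulated P(r): Rg^2 = int r^2 P dr / (2 int P dr)."""
    numerator = trapezoid(r, [ri ** 2 * pv for ri, pv in zip(r, p)])
    denominator = 2.0 * trapezoid(r, p)
    return math.sqrt(numerator / denominator)


def sphere_pr(r, radius):
    """Unnormalised P(r) of a homogeneous sphere; zero beyond Dmax = 2 * radius."""
    if r > 2.0 * radius:
        return 0.0
    return r ** 2 * (1.0 - 3.0 * r / (4.0 * radius) + r ** 3 / (16.0 * radius ** 3))


if __name__ == "__main__":
    radius = 10.0
    r = [i * 0.01 for i in range(2001)]          # 0 .. Dmax = 20
    p = [sphere_pr(ri, radius) for ri in r]
    # For a homogeneous sphere the exact value is sqrt(3/5) * radius.
    print(rg_from_pr(r, p), math.sqrt(0.6) * radius)
```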
+
+ 4.5. SAS: Guinier Analysis: Agreement between the P(r) and Guinier-determined Rg (table below) is a good measure of the self-consistency of the SAS profile. The linearity of the Guinier plot is a sensitive indicator of the quality of the experimental SAS data; a linear Guinier plot is a necessary but not sufficient demonstration that a solution contains monodisperse particles of the same size. Deviations from linearity usually point to strong interference effects, polydispersity of the samples or improper background subtraction (Feigin and Svergun 1987). The residual value plot and the coefficient of determination (R²) are used to assess the linear fit to the data. A perfect fit has an R² value of 1. Residual values should be equally and randomly spaced around the horizontal axis.
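
The Guinier analysis described above can be sketched as an ordinary least-squares line through ln I(q) versus q², using the Guinier approximation ln I(q) ≈ ln I(0) − (Rg²/3)·q². The function name `guinier_fit` is hypothetical, and the sketch assumes the caller has already restricted the data to the low-q Guinier region; it is an illustration, not the code used to produce the report.

```python
import math


def guinier_fit(q, intensity):
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2 and return (Rg, I0, R^2, residuals)."""
    x = [qi ** 2 for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: no valid Guinier region")
    # Residuals should scatter randomly around zero for a good fit.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(res ** 2 for res in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return math.sqrt(-3.0 * slope), math.exp(intercept), r_squared, residuals


if __name__ == "__main__":
    # Synthetic, noise-free profile with Rg = 20 and I(0) = 100 (assumed values).
    q = [0.005 * i for i in range(1, 13)]
    intensity = [100.0 * math.exp(-(20.0 ** 2) * qi ** 2 / 3.0) for qi in q]
    rg, i0, r2, _ = guinier_fit(q, intensity)
    print(rg, i0, r2)
```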
+
+ 5. Model Quality Assessment
+
+ Excluded volume assessments are performed for coarse-grained structures and MolProbity analysis is performed for atomic structures.
+
+ 5.1a. Excluded Volume Analysis: Excluded volume violation is defined as percentage of overlaps between coarse-grained beads in a structure. This percentage is obtained by dividing the number of overlaps/violations by the total number of pair distances in a structure. An overlap or violation between two beads occurs if the distance between the two beads is less than the sum of their radii (S. J. Kim et al. 2018).
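
The definition above translates directly into a pairwise check. The sketch below is a minimal illustration, assuming each bead is given as a hypothetical `(x, y, z, radius)` tuple and that every pair of beads is counted; any exclusion of sequence-connected beads is left out of this example.

```python
import math
from itertools import combinations


def excluded_volume(beads):
    """Return (satisfaction %, violation %) over all bead pairs.

    beads: list of (x, y, z, radius) tuples (an assumed input layout).
    A pair is violated if the centre distance is less than the sum of radii.
    """
    pairs = list(combinations(beads, 2))
    if not pairs:
        raise ValueError("need at least two beads")
    violations = sum(
        1 for a, b in pairs
        if math.dist(a[:3], b[:3]) < a[3] + b[3]
    )
    violation_pct = 100.0 * violations / len(pairs)
    return 100.0 - violation_pct, violation_pct


if __name__ == "__main__":
    beads = [
        (0.0, 0.0, 0.0, 1.0),
        (1.0, 0.0, 0.0, 1.0),   # overlaps the first bead (distance 1 < 2)
        (10.0, 0.0, 0.0, 1.0),  # far from both
    ]
    satisfaction, violation = excluded_volume(beads)
    print(f"satisfaction = {satisfaction:.2f}%, violation = {violation:.2f}%")
```

The all-pairs loop is quadratic in the number of beads, which is acceptable for a sketch; a spatial grid or k-d tree would be the usual choice for large assemblies.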
+
+ 5.1b. MolProbity Analysis: MolProbity analysis for atomic structures reported is consistent with PDB standards for X-ray structures (Williams et al. 2018). Summarized information is available in both the HTML and PDF reports. Detailed information is available for download as csv files, both from the HTML and the PDF reports. Please refer to the PDB user guide for details.
+
+ 6. Fit to Data Used for Modeling Assessment
+
+ Recommendations from SAS validation task force (SASvtf) for model fit assessment include:
+
+ All software, including version numbers, used for modelling; three-dimensional shape, bead or atomistic modelling.
+
+ All modelling assumptions clearly stated, including adjustable parameter values. In the case of imposed symmetry, especially in the case of shape models, comparison with results obtained in the absence of symmetry restraints.
+
+ For atomistic modelling, a description of how the starting models were obtained (e.g. crystal or NMR structure of a domain, homology model etc.), connectivity or distance restraints used and flexible regions specified and the basis for their selection.
+
+ Any additional experimental or bioinformatics-based evidence supporting modelling assumptions and therefore enabling modelling restraints or independent model validation.
+
+ For three-dimensional models, values for adjustable parameters, constant adjustments to intensity, χ² and associated p-values and a clear representation of the model fit to the experimental I(q) versus q including a residual plot that clearly identifies systematic deviations.
+
+ Analysis of the ambiguity and precision of models, e.g. based on cluster analysis of results from multiple independent optimizations of the model against the SAS profile or profiles, with examples of any distinct clusters in addition to any final averaged model.
+
+ 6.1. SAS: Χ² Goodness of Fit Assessment: Model fits displayed in this section are obtained from SASBDB. χ² values are a measure of fit of the model to data. A perfect fit has a χ² value of 1.0 (Trewhella et al. 2013; Schneidman-Duhovny, Kim, and Sali 2012; Rambo and Tainer 2013).
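
For readers unfamiliar with the metric, the sketch below shows one common way a reduced χ² is computed for a SAS fit: the model intensity is scaled by a least-squares factor c and the error-weighted squared residuals are averaged. The function name, the single scale factor, and the N − 1 normalisation are assumptions for this example; fitting programs differ in these conventions, and the values shown in the report are taken from SASBDB rather than computed this way.

```python
def reduced_chi_squared(i_exp, sigma, i_model):
    """Return (chi^2, c) for experimental vs. scaled model intensities.

    chi^2 = 1/(N - 1) * sum(((I_exp - c * I_model) / sigma)^2), where c is
    the error-weighted least-squares scale factor (an assumed convention).
    """
    weights = [1.0 / s ** 2 for s in sigma]
    scale = (
        sum(w * e * m for w, e, m in zip(weights, i_exp, i_model))
        / sum(w * m * m for w, m in zip(weights, i_model))
    )
    total = sum(
        w * (e - scale * m) ** 2
        for w, e, m in zip(weights, i_exp, i_model)
    )
    return total / (len(i_exp) - 1), scale


if __name__ == "__main__":
    # Toy data (assumed): every residual equals one standard deviation.
    i_exp = [2.0, 0.0, 2.0, 0.0]
    sigma = [1.0, 1.0, 1.0, 1.0]
    i_model = [1.0, 1.0, 1.0, 1.0]
    chi2, scale = reduced_chi_squared(i_exp, sigma, i_model)
    print(chi2, scale)
```

With this normalisation, values near 1 indicate that the model agrees with the data to within the reported experimental errors, while much larger values indicate systematic misfit and much smaller values suggest overestimated errors.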
+
+ 6.2. SAS: Cormap Analysis: ATSAS datcmp (Manalastas-Cantos et al. 2021) was used for hypothesis testing, using the null hypothesis that all data sets (i.e. the fit and the data collected) are similar. The reported p-value is a measure of evidence against the null hypothesis; the smaller the value, the stronger the evidence that the null hypothesis should be rejected.
+ 7. Fit to Data Used for Validation Assessment
+
+ This includes assessing model fit to data that was not used explicitly or implicitly in modeling. This section is currently under development.
+
+ 8. Understanding the Summary Table
+
+ 8.1. Entry composition: List of unique molecules that are present in the entry.
+
+ 8.2. Datasets used for modeling: List of input experimental datasets used for modeling.
+
+ 8.3. Representation: Representation of modeled structure.
+
+ 8.3.1. Atomic structural coverage: Percentage of modeled structure or residues for which atomic structures are available. These structures can include X-ray, NMR, EM, and other comparative models.
+
+ 8.3.2. Rigid bodies: A rigid body consists of multiple coarse-grained (CG) beads or atomic residues. In a rigid body, the beads (or residues) have their relative distances constrained during conformational sampling.
+
+ 8.3.3. Flexible units: Flexible units consist of strings of beads that are restrained by the sequence connectivity.
+
+ 8.3.4. Interface units: An automatic definition based on identified interface for each model. Applicable to models built with HADDOCK.
+
+ 8.3.5. Resolution: An automatic definition based on identified interface for each model. Applicable to models built with HADDOCK.
+
+ 8.4. Restraints: A set of restraints used to compute modeled structure.
+
+ 8.4.1. Physical restraints: A list of restraints derived from physical principles to compute modeled structure.
+ 8.4.2. Experimental information: A list of restraints derived from experimental datasets to compute modeled structure.
+
+ 8.5. Validation: Assessment of models based on validation criteria set by the IHM task force (Sali et al. 2015 and Berman et al. 2019).
+
+ 8.5.1. Sampling validation: Validation metrics used to assess sampling convergence for stochastic sampling. Sampling precision is defined as the largest allowed root-mean-square deviation (RMSD) between the cluster centroid and a model within any cluster in the finest clustering for which each sample contributes structures proportionally to its size (considering both the significance and magnitude of the difference) and for which a sufficient proportion of all structures occur in sufficiently large clusters (Viswanath et al. 2017).
+
+ 8.5.2. Clustering algorithm: Clustering algorithm used to analyze resulting solution.
+
+ 8.5.3. Clustering feature: Feature or reaction coordinate used to cluster solution.
+
+ 8.5.4. Number of ensembles: Number of solutions or ensembles of modeled structure.
+
+ 8.5.5. Number of models in ensemble(s): Number of structures in the solution ensemble(s).
+
+ 8.5.6. Model precision: Measurement of variation among the models in the ensemble upon a global least-squares superposition.
+
+ 8.5.7. Data quality: Assessment of data on which modeled structures are based. See section 4 for more details.
+
+ 8.5.8. Model quality: Assessment of modeled structures based on physical principles. See section 5 for more details.
+
+ 8.5.9. Assessment of atomic segments: Assessment of atomic segments in the integrative structure. See section 5 for more details.
+
+ 8.5.10. Excluded volume satisfaction: Assessment of excluded volume satisfaction of coarse-grained beads in the modeled structure. Excluded volume between two beads not connected in sequence is satisfied if the distance between them is greater than the sum of their radii. See section 5 for more details.
+
+ 8.5.11. Fit to data used for modeling: Assessment of modeled structure based on data used for modeling. See section 6 for more details.
+
+ 8.5.12. Fit to data used for validation: Assessment of modeled structure based on data not used for modeling. See section 7 for more details.
+
+ 8.6. Methodology and software: List of methods on which modeled structures are based and software used to obtain structures.
+
+ 8.6.1. Method name: Name(s) of method(s) used to generate modeled structures.
+
+ 8.6.2. Method details: Details of method(s) used to generate modeled structures.
+
+ 8.6.3. Software details: Software used to compute the modeled structure, including scripts used to generate and analyze models.
+
+ 9. References for Validation Report
+
+ Berman, Helen M., Paul D. Adams, Alexandre A. Bonvin, Stephen K. Burley, Bridget Carragher, Wah Chiu, Frank DiMaio, et al. 2019. “Federating Structural Models and Data: Outcomes from A Workshop on Archiving Integrative Structures.” Structure 27 (12): 1745–59.
+
+ Manalastas-Cantos, Karen, Petr V. Konarev, Nelly R. Hajizadeh, Alexey G. Kikhney, Maxim V. Petoukhov, Dmitry S. Molodenskiy, Alejandro Panjkovich, et al. 2021. “ATSAS 3.0: Expanded Functionality and New Tools for Small-Angle Scattering Data Analysis.” Journal of Applied Crystallography 54 (Pt 1): 343–55.
+
+ Rambo, Robert P., and John A. Tainer. 2011. “Characterizing Flexible and Intrinsically Unstructured Biological Macromolecules by SAS Using the Porod-Debye Law.” Biopolymers 95 (8): 559–71.
+
+ Sali, Andrej, Helen M. Berman, Torsten Schwede, Jill Trewhella, Gerard Kleywegt, Stephen K. Burley, John Markley, et al. 2015. “Outcome of the First wwPDB Hybrid/Integrative Methods Task Force Workshop.” Structure 23 (7): 1156–67.
+
+ Trewhella, Jill, Anthony P. Duff, Dominique Durand, Frank Gabel, J. Mitchell Guss, Wayne A. Hendrickson, Greg L. Hura, et al. 2017. “2017 Publication Guidelines for Structural Modelling of Small-Angle Scattering Data from Biomolecules in Solution: An Update.” Acta Crystallographica. Section D, Structural Biology 73 (Pt 9): 710–28.
+
+ Valentini, Erica, Alexey G. Kikhney, Gianpietro Previtali, Cy M. Jeffries, and Dmitri I. Svergun. 2015. “SASBDB, a Repository for Biological Small-Angle Scattering Data.” Nucleic Acids Research 43 (Database issue): D357–63.
+
+ Viswanath, Shruthi, Ilan E. Chemmama, Peter Cimermancic, and Andrej Sali. 2017. “Assessing Exhaustiveness of Stochastic Sampling for Integrative Modeling of Macromolecular Structures.” Biophysical Journal 113 (11): 2344–53.
+
+ Williams, Christopher J., Jeffrey J. Headd, Nigel W. Moriarty, Michael G. Prisant, Lizbeth L. Videau, Lindsay N. Deis, Vishal Verma, et al. 2018. “MolProbity: More and Better Reference Data for Improved All-Atom Structure Validation.” Protein Science: A Publication of the Protein Society 27 (1): 293–315.
+ 10. References for Modeling Software
+
+ Ahmed, Aqeel, Friedrich Rippmann, Gerhard Barnickel, and Holger Gohlke. 2011. “A Normal Mode-Based Geometric Simulation Approach for Exploring Biologically Relevant Conformational Transitions in Proteins.” Journal of Chemical Information and Modeling 51 (7): 1604–22.
+
+ Berjanskii, Mark, Yongjie Liang, Jianjun Zhou, Peter Tang, Paul Stothard, You Zhou, Joseph Cruz, et al. 2010. “PROSESS: A Protein Structure Evaluation Suite and Server.” Nucleic Acids Research 38 (Web Server issue): W633–40.
+
+ Brannetti, B., A. Zanzoni, L. Montecchi-Palazzi, G. Cesareni, and M. Helmer-Citterich. 2001. “iSPOT: A Web Tool for the Analysis and Recognition of Protein Domain Specificity.” Comparative and Functional Genomics 2 (5): 314–18.
+
+ Brunger, Axel T. 2007. “Version 1.2 of the Crystallography and NMR System.” Nature Protocols 2 (11): 2728–33.
+
+ Bryson, Kevin, Liam J. McGuffin, Russell L. Marsden, Jonathan J. Ward, Jaspreet S. Sodhi, and David T. Jones. 2005. “Protein Structure Prediction Servers at University College London.” Nucleic Acids Research 33 (Web Server issue): W36–38.
+
+ Buchan, Daniel W. A., and David T. Jones. 2019. “The PSIPRED Protein Analysis Workbench: 20 Years on.” Nucleic Acids Research 47 (W1): W402–7.
+
+ Chaudhury, Sidhartha, Sergey Lyskov, and Jeffrey J. Gray. 2010. “PyRosetta: A Script-Based Interface for Implementing Molecular Modeling Algorithms Using Rosetta.” Bioinformatics 26 (5): 689–91.
+
+ Cherry, J. Michael, Eurie L. Hong, Craig Amundsen, Rama Balakrishnan, Gail Binkley, Esther T. Chan, Karen R. Christie, et al. 2012. “Saccharomyces Genome Database: The Genomics Resource of Budding Yeast.” Nucleic Acids Research 40 (Database issue): D700–705.
+
+ Dimura, Mykola, Thomas-Otavio Peulen, Hugo Sanabria, Dmitro Rodnin, Katherina Hemmen, Christian A. Hanke, Claus A. M. Seidel, and Holger Gohlke. 2020. “Automated and Optimally FRET-Assisted Structural Modeling.” Nature Communications 11 (1): 5394.
+
+ Ding, Feng, Douglas Tsao, Huifen Nie, and Nikolay V. Dokholyan. 2008. “Ab Initio Folding of Proteins with All-Atom Discrete Molecular Dynamics.” Structure 16 (7): 1010–18.
+
+ Dominguez, C., R. Boelens, and A. M. Bonvin. 2003. “HADDOCK: A Protein-Protein Docking Approach Based on Biochemical or Biophysical Information.” Journal of the American Chemical Society 125 (7): 1731–37.
+
+ Feigin, L. A., and D. I. Svergun. 1987. Structure Analysis by Small-Angle X-Ray and Neutron Scattering. Edited by George W. Taylor. Springer, Boston, MA.
+
+ Finn, Robert D., Jody Clements, and Sean R. Eddy. 2011. “HMMER Web Server: Interactive Sequence Similarity Searching.” Nucleic Acids Research 39 (Web Server issue): W29–37.
+
+ Gautier, Romain, Dominique Douguet, Bruno Antonny, and Guillaume Drin. 2008. “HELIQUEST: A Web Server to Screen Sequences with Specific Alpha-Helical Properties.” Bioinformatics 24 (18): 2101–2.
+
+ Hummer, Gerhard, and Jürgen Köfinger. 2015. “Bayesian Ensemble Refinement by Replica Simulations and Reweighting.” The Journal of Chemical Physics 143 (24): 243150.
+
+ Jones, David T., and Domenico Cozzetto. 2015. “DISOPRED3: Precise Disordered Region Predictions with Annotated Protein-Binding Activity.” Bioinformatics 31 (6): 857–63.
+
+ Källberg, Morten, Gohar Margaryan, Sheng Wang, Jianzhu Ma, and Jinbo Xu. 2014. “RaptorX Server: A Resource for Template-Based Protein Structure Modeling.” Methods in Molecular Biology 1137: 17–27.
+
+ Kelley, Lawrence A., Stefans Mezulis, Christopher M. Yates, Mark N. Wass, and Michael J. E. Sternberg. 2015. “The Phyre2 Web Portal for Protein Modeling, Prediction and Analysis.” Nature Protocols 10 (6): 845–58.
+
+ Kim, David E., Dylan Chivian, and David Baker. 2004. “Protein Structure Prediction and Analysis Using the Robetta Server.” Nucleic Acids Research 32 (Web Server issue): W526–31.
+
+ Kim, Seung Joong, Javier Fernandez-Martinez, Ilona Nudelman, Yi Shi, Wenzhu Zhang, Barak Raveh, Thurston Herricks, et al. 2018. “Integrative Structure and Functional Anatomy of a Nuclear Pore Complex.” Nature 555 (7697): 475–82.
+
+ Li, Yunqi, and Yang Zhang. 2009. “REMO: A New Protocol to Refine Full Atomic Protein Models from C-Alpha Traces by Optimizing Hydrogen-Bonding Networks.” Proteins 76 (3): 665–76.
+
+ Ludtke, S. J. 2016. “Single-Particle Refinement and Variability Analysis in EMAN2.1.” Methods in Enzymology 579 (July): 159–89.
+
+ Lupas, A., M. Van Dyke, and J. Stock. 1991. “Predicting Coiled Coils from Protein Sequences.” Science 252 (5009): 1162–64.
+
+ Manalastas-Cantos, Karen, Petr V. Konarev, Nelly R. Hajizadeh, Alexey G. Kikhney, Maxim V. Petoukhov, Dmitry S. Molodenskiy, Alejandro Panjkovich, et al. 2021. “ATSAS 3.0: Expanded Functionality and New Tools for Small-Angle Scattering Data Analysis.” Journal of Applied Crystallography 54 (Pt 1): 343–55.
+
+ Matthew Allen Bullock, Joshua, Jannik Schwab, Konstantinos Thalassinos, and Maya Topf. 2016. “The Importance of Non-Accessible Crosslinks and Solvent Accessible Surface Distance in Modeling Proteins with Restraints From Crosslinking Mass Spectrometry.” Molecular & Cellular Proteomics: MCP 15 (7): 2491–2500.
+
+ Moore, P. B. 1980. “Small-Angle Scattering. Information Content and Error Analysis.” Journal of Applied Crystallography 13 (2): 168–75.
+
+ Ovchinnikov, Sergey, Hetunandan Kamisetty, and David Baker. 2014. “Robust and Accurate Prediction of Residue-Residue Interactions across Protein Interfaces Using Evolutionary Information.” eLife 3 (May): e02030.
+
+ Pettersen, Eric F., Thomas D. Goddard, Conrad C. Huang, Gregory S. Couch, Daniel M. Greenblatt, Elaine C. Meng, and Thomas E. Ferrin. 2004. “UCSF Chimera--a Visualization System for Exploratory Research and Analysis.” Journal of Computational Chemistry 25 (13): 1605–12.
+
+ Pires, Douglas E. V., David B. Ascher, and Tom L. Blundell. 2014. “mCSM: Predicting the Effects of Mutations in Proteins Using Graph-Based Signatures.” Bioinformatics 30 (3): 335–42.
+
+ Pronk, Sander, Szilárd Páll, Roland Schulz, Per Larsson, Pär Bjelkmar, Rossen Apostolov, Michael R. Shirts, et al. 2013. “GROMACS 4.5: A High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit.” Bioinformatics 29 (7): 845–54.
+
+ Rambo, Robert P., and John A. Tainer. 2013. “Super-Resolution in Solution X-Ray Scattering and Its Applications to Structural Systems Biology.” Annual Review of Biophysics 42 (March): 415–41.
+
+ Rohl, Carol A., Charlie E. M. Strauss, Kira M. S. Misura, and David Baker. 2004. “Protein Structure Prediction Using Rosetta.” Methods in Enzymology 383: 66–93.
+
+ Russel, Daniel, Keren Lasker, Ben Webb, Javier Velázquez-Muriel, Elina Tjioe, Dina Schneidman-Duhovny, Bret Peterson, and Andrej Sali. 2012. “Putting the Pieces Together: Integrative Modeling Platform Software for Structure Determination of Macromolecular Assemblies.” PLoS Biology 10 (1): e1001244.
+
+ Scheres, Sjors H. W. 2012. “RELION: Implementation of a Bayesian Approach to Cryo-EM Structure Determination.” Journal of Structural Biology 180 (3): 519–30.
+
+ Schneider, Michael, and Oliver Brock. 2014. “Combining Physicochemical and Evolutionary Information for Protein Contact Prediction.” PloS One 9 (10): e108438.
+
+ Schneidman, D., M. Hammel, J. Tainer, and A. Sali. 2016. “FoXS, FoXSDock, and MultiFoXS: Single-State and Multi-State Structural Modeling of Proteins and Their Complexes Based on SAXS Profiles.” Nucleic Acids Research 44 (W1): W424–29.
+
+ Schneidman-Duhovny, Dina, Seung Joong Kim, and Andrej Sali. 2012.
“Integrative Structural Modeling with Small Angle X-Ray Scattering Profiles.” BMC Structural Biology 12 (July): 17. Serra, F., D. Bau, M. Goodstadt, D. Castillo, G. J. Filion, and M. A. Marti-Renom. 2017. “Automatic Analysis and 3D-Modelling of Hi-C Data Using TADbit Reveals Structural Features of the Fly Chromatin Colors.” PLoS Computational Biology 13 (7): e1005665. Shen, Yang, Oliver Lange, Frank Delaglio, Paolo Rossi, James M. Aramini, Gaohua Liu, Alexander Eletsky, et al. 2008. “Consistent Blind Protein Structure Generation from NMR Chemical Shift Data.” Proceedings of the National Academy of Sciences of the United States of America 105 (12): 4685–90. Söding, Johannes, Andreas Biegert, and Andrei N. Lupas. 2005. “The HHpred Interactive Server for Protein Homology Detection and Structure Prediction.” Nucleic Acids Research 33 (Web Server issue): W244–48. Steinegger, Martin, Markus Meier, Milot Mirdita, Harald Vöhringer, Stephan J. Haunsberger, and Johannes Söding. 2019. “HH-suite3 for Fast Remote Homology Detection and Deep Protein Annotation.” BMC Bioinformatics 20 (1): 473. Trigg, Jason, Karl Gutwin, Amy E. Keating, and Bonnie Berger. 2011. “Multicoil2: Predicting Coiled Coils and Their Oligomerization States from Sequence in the Twilight Zone.” PloS One 6 (8): e23519. Trnka, Michael J., Peter R. Baker, Philip J. J. Robinson, A. L. Burlingame, and Robert J. Chalkley. 2014. “Matching Cross-Linked Peptide Spectra: Only as Good as the Worse Identification.” Molecular & Cellular Proteomics: MCP 13 (2): 420–34. Tubiana, Thibault, Jean-Charles Carvaillo, Yves Boulard, and Stéphane Bressanelli. 2018. “TTClust: A Versatile Molecular Simulation Trajectory Clustering Program with Graphical Summaries.” Journal of Chemical Information and Modeling 58 (11): 2178–82. Vries, Sjoerd J. de, and Alexandre M. J. J. Bonvin. 2011. “CPORT: A Consensus Interface Predictor and Its Performance in Prediction-Driven Docking with HADDOCK.” PloS One 6 (3): e17695. 
Wang, Yan, Jian Wang, Ruiming Li, Qiang Shi, Zhidong Xue, and Yang Zhang. 2017. “ThreaDomEx: A Unified Platform for Predicting Continuous and Discontinuous Protein Domains by Multiple-Threading and Segment Assembly.” Nucleic Acids Research 45 (W1): W400–407. Waterhouse, Andrew, Martino Bertoni, Stefan Bienert, Gabriel Studer, Gerardo Tauriello, Rafal Gumienny, Florian T. Heer, et al. 2018. “SWISS-MODEL: Homology Modelling of Protein Structures and Complexes.” Nucleic Acids Research 46 (W1): W296–303. Webb, B., and A. Sali. 2014. “Comparative Protein Structure Modeling Using Modeller.” In Current Protocols in Bioinformatics. John Wiley and Sons. Weinkam, Patrick, Jaume Pons, and Andrej Sali. 2012. “Structure-Based Model of Allostery Predicts Coupling between Distant Sites.” Proceedings of the National Academy of Sciences of the United States of America 109 (13): 4875–80. Williams, Christopher J., Jeffrey J. Headd, Nigel W. Moriarty, Michael G. Prisant, Lizbeth L. Videau, Lindsay N. Deis, Vishal Verma, et al. 2018. “MolProbity: More and Better Reference Data for Improved All-Atom Structure Validation.” Protein Science: A Publication of the Protein Society 27 (1): 293–315. Wriggers, Willy. 2012. “Conventions and Workflows for Using Situs.” Acta Crystallographica. Section D, Biological Crystallography 68 (Pt 4): 344–51. Yang, Jianyi, Ivan Anishchenko, Hahnbeom Park, Zhenling Peng, Sergey Ovchinnikov, and David Baker. 2020. “Improved Protein Structure Prediction Using Predicted Interresidue Orientations.” Proceedings of the National Academy of Sciences of the United States of America 117 (3): 1496–1503. Yang, Jianyi, Renxiang Yan, Ambrish Roy, Dong Xu, Jonathan Poisson, and Yang Zhang. 2015. “The I-TASSER Suite: Protein Structure and Function Prediction.” Nature Methods 12 (1): 7–8. Yu, Jinchao, Geraldine Picord, Pierre Tuffery, and Raphael Guerois. 2015. 
“HHalign-Kbest: Exploring Sub-Optimal Alignments for Remote Homology Comparative Modeling.” Bioinformatics 31 (23): 3850–52. Zundert, G. C. P. van, and A. M. J. J. Bonvin. 2015. “DisVis: Quantifying and Visualizing Accessible Interaction Space of Distance-Restrained Biomolecular Complexes.” Bioinformatics 31 (19): 3222–24. Zundert, Gydo C. P. van, Adrien S. J. Melquiond, and Alexandre M. J. J. Bonvin. 2015. “Integrative Modeling of Biomolecular Complexes: HADDOCKing with Cryo-Electron Microscopy Data.” Structure 23 (5): 949–60. Integrative Modeling Validation Package : Version 1.0 Supported by
or empty div as the final element following the last floated div (within the #container) if the #footer is removed or taken out of the #container */
+ clear:both;
+ height:0;
+ font-size: 1px;
+ line-height: 0px;
+}
+ body {
+ background-color: #669966;
+ margin: 0;
+ padding: 0;
+}
+ .home-header {
+ display: flex;
+}
+ .dropdown-submenu {
+ position: relative;
+}
+ .dropdown-submenu a::after {
+ transform: rotate(-90deg);
+ position: absolute;
+ right: 6px;
+ top: .8em;
+}
+ .dropdown-submenu .dropdown-menu {
+ top: 0;
+ left: 100%;
+ margin-left: .1rem;
+ margin-right: .1rem;
+}
diff --git a/css/main.css b/static/css/main.css
similarity index 100%
rename from css/main.css
rename to static/css/main.css
diff --git a/css/old.css b/static/css/old.css
similarity index 100%
rename from css/old.css
rename to static/css/old.css
diff --git a/images/PDBDEV_00000001.png b/static/images/PDBDEV_00000001.png
similarity index 100%
rename from images/PDBDEV_00000001.png
rename to static/images/PDBDEV_00000001.png
diff --git a/images/PDBDEV_00000001_resize.png b/static/images/PDBDEV_00000001_resize.png
similarity index 100%
rename from images/PDBDEV_00000001_resize.png
rename to static/images/PDBDEV_00000001_resize.png
diff --git a/images/PDBDEV_00000001_resize_1.png b/static/images/PDBDEV_00000001_resize_1.png
similarity index 100%
rename from images/PDBDEV_00000001_resize_1.png
rename to static/images/PDBDEV_00000001_resize_1.png
diff --git a/images/PDBDEV_00000001_transparent.png b/static/images/PDBDEV_00000001_transparent.png
similarity index 100%
rename from images/PDBDEV_00000001_transparent.png
rename to static/images/PDBDEV_00000001_transparent.png
diff --git a/images/PDBDEV_00000004.png b/static/images/PDBDEV_00000004.png
similarity index 100%
rename from images/PDBDEV_00000004.png
rename to static/images/PDBDEV_00000004.png
diff --git a/images/PDBDEV_00000004_resize.png b/static/images/PDBDEV_00000004_resize.png
similarity index 100%
rename from images/PDBDEV_00000004_resize.png
rename to static/images/PDBDEV_00000004_resize.png
diff --git a/images/PDBDEV_00000004_resize_1.png b/static/images/PDBDEV_00000004_resize_1.png
similarity index 100%
rename from images/PDBDEV_00000004_resize_1.png
rename to static/images/PDBDEV_00000004_resize_1.png
diff --git a/images/PDBDEV_00000004_transparent.png b/static/images/PDBDEV_00000004_transparent.png
similarity index 100%
rename from images/PDBDEV_00000004_transparent.png
rename to static/images/PDBDEV_00000004_transparent.png
diff --git a/images/PDBDEV_00000014_resize.png b/static/images/PDBDEV_00000014_resize.png
similarity index 100%
rename from images/PDBDEV_00000014_resize.png
rename to static/images/PDBDEV_00000014_resize.png
diff --git a/images/PDBDEV_00000014_resize_1.png b/static/images/PDBDEV_00000014_resize_1.png
similarity index 100%
rename from images/PDBDEV_00000014_resize_1.png
rename to static/images/PDBDEV_00000014_resize_1.png
diff --git a/images/PDBDEV_00000014_transparent.png b/static/images/PDBDEV_00000014_transparent.png
similarity index 100%
rename from images/PDBDEV_00000014_transparent.png
rename to static/images/PDBDEV_00000014_transparent.png
diff --git a/images/favicon.jpg b/static/images/favicon.jpg
similarity index 100%
rename from images/favicon.jpg
rename to static/images/favicon.jpg
diff --git a/images/logo11.png b/static/images/logo11.png
similarity index 100%
rename from images/logo11.png
rename to static/images/logo11.png
diff --git a/images/logon.png b/static/images/logon.png
similarity index 100%
rename from images/logon.png
rename to static/images/logon.png
diff --git a/images/rcsb_logo.png b/static/images/rcsb_logo.png
similarity index 100%
rename from images/rcsb_logo.png
rename to static/images/rcsb_logo.png
diff --git a/images/search.svg b/static/images/search.svg
similarity index 100%
rename from images/search.svg
rename to static/images/search.svg
diff --git a/images/wwpdb-logo11.png b/static/images/wwpdb-logo11.png
similarity index 100%
rename from images/wwpdb-logo11.png
rename to static/images/wwpdb-logo11.png
diff --git a/js/bootstrap.min.js b/static/js/bootstrap.min.js
similarity index 100%
rename from js/bootstrap.min.js
rename to static/js/bootstrap.min.js
diff --git a/js/bootstrap3-typeahead.min.js b/static/js/bootstrap3-typeahead.min.js
similarity index 100%
rename from js/bootstrap3-typeahead.min.js
rename to static/js/bootstrap3-typeahead.min.js
diff --git a/js/bootstrap4.1.3.min.js b/static/js/bootstrap4.1.3.min.js
similarity index 100%
rename from js/bootstrap4.1.3.min.js
rename to static/js/bootstrap4.1.3.min.js
diff --git a/js/jquery-3.3.1.min.js b/static/js/jquery-3.3.1.min.js
similarity index 100%
rename from js/jquery-3.3.1.min.js
rename to static/js/jquery-3.3.1.min.js
diff --git a/js/jquery.min.js b/static/js/jquery.min.js
similarity index 100%
rename from js/jquery.min.js
rename to static/js/jquery.min.js
diff --git a/js/main.js b/static/js/main.js
similarity index 100%
rename from js/main.js
rename to static/js/main.js
diff --git a/js/popper1.12.9.min.js b/static/js/popper1.12.9.min.js
similarity index 100%
rename from js/popper1.12.9.min.js
rename to static/js/popper1.12.9.min.js
diff --git a/templates/about_validation.html b/templates/about_validation.html
new file mode 100644
index 00000000..429d0d6c
--- /dev/null
+++ b/templates/about_validation.html
@@ -0,0 +1,67 @@
+{% extends "layout.html" %}
+
+
+
+
+{% block navbar %}
+
+
+ {% include 'static_navbar.j2' %}
+{% endblock %}
+
+{% block body %}
+
+
+ Data quality?
-
- Data quality?
+
+
+ SAS:Scattering profile
+
+
+ SAS:Scattering profile?
+
+
+ Key experimental estimates?
+
+
+ Flexibility analysis ?
+
+ {% endif %}
+
+ Pair-distance distribution analysis?
+
+ Guinier analysis ?
+
+ Fit of model(s) to SAS data
+ {% if sasdb_sascif|length < 1 %}
+
+ Cormap p-value analysis of fits ?
+
+ {% if number_of_fits > 0 %}
+ Fit of model(s) to CX-MS data
+ Restraint types
+ {% if cx_ertypes is none %}
+
+
+
+
+
+
+
+ {% for i in range(cx_ertypes['data']|length) %}
+
+ Type #
+
+ {% for k in cx_ertypes['columns'] %}
+
+ {{ k }}
+
+ {% endfor %}
+
+
+ {% endfor %}
+ {{ i }}
+ {% for v in cx_ertypes['data'][i] %}
+ {{ v }}
+ {% endfor %}
+ Satisfaction of restraints
+ {% if cx_stats is none %}
+
+
+
+
+
+
+ {% for sg, sgv in cx_stats.items() %}
+ {% set state_group_loop = loop %}
+ {% for st, stv in sgv.items() %}
+ {% for mg, mgv in stv.items() %}
+ {% for k, v in mgv["cx_stats"].items() %}
+ {% set rowspan = mgv["cx_stats"]|length %}
+
+ State group
+
+
+ State
+
+
+ Model group
+
+
+ # of Deposited models/Total
+
+
+ Restraint group type
+
+
+ Satisfied (%)
+
+
+ Violated (%)
+
+
+ Count
+
+
+ {% if loop.index == 1 %}
+
+ {% endfor %}
+ {% endfor %}
+ {% endfor %}
+ {% endfor %}
+ {{ sg }}
+ {{ st }}
+ {{ mg }}
+ {{ mgv["ens_stats"]["num_models_deposited"] ~ "/" ~ mgv["ens_stats"]["num_models"]}}
+ {% endif %}
+ {{ k | replace("/", "/
+
")}}{{ v["Satisfied"] }}
+ {{ v["Violated"] }}
+ {{ v["Count"] }}
+ Integrative Structure Validation Report ?
+ {{ date }}
+
+
+
+
+
+
+
+ {% for k, v in ranked_id_list %}
+
+
+
+ {% endfor %}
+ {{ k }}
+ {{ v }}
+
+
+ Structure Title
+ {{ Molecule }}
+
+
+ Structure Authors
+ {{ Authors }}
+
+ Overall quality ?
+
+
+ Model Quality: MolProbity Analysis
+
+ {% else %}
+
+ Model Quality: Excluded Volume Analysis
+
+ {% endif %}
+
+ {% for i in range([NumModels, MAXPLOTS] | min) %}
+
+
+
+
+ Ensemble information
+ ?
+
+
+ Summary
+ ?
+
+
+ Entry composition?
+
+
+ Datasets used for modeling ?
+
+
+ Representation ?
+
+ Methodology and software ?
+
+
+
+
+
+ Data quality ?
+
+
+
+ {% if ( sas|length > 0 ) and ( sasdb_sascif|length > 0 ) %}
+
+
+ Scattering profile ?
+
+
+
+
+
+ Key experimental estimates ?
+
+
+ Flexibility analysis ?
+
+ {% for i in range ( sasdb_code_html|length ) %}
+
+
+
+ Pair-distance distribution analysis ?
+
+
+
+
+ Guinier analysis ?
+
+
+
+
+ SAS:Scattering profile
+
+
+
+ {{ Unique_dataset[i] }}
+
+
+
+
+ Model quality ?
+
+ Excluded volume satisfaction ?
+
+ Excluded volume satisfaction for the models in the entry is listed below.
+
+ {{ write_table(excluded_volume) }}
+
+ {% else %}
+
+
+
+
+ Standard geometry: bond outliers?
+
+
+ {% if molp_b|length > 1 %}
+ There are {{ bond }} bond outliers in this entry. A summary is provided below, and a detailed list of outliers can be found here.
+ {{ write_table(molp_b) }}
+ {% else %}
+ Bond length outliers cannot be evaluated for this model
+ {% endif %}
+
+
+
+ Standard geometry: angle outliers?
+
+ {% if molp_a|length > 1 %}
+ There are {{ angle }} angle outliers in this entry. A summary is provided below, and a detailed list of outliers can be found here.
+ {{ write_table(molp_a) }}
+ {% else %}
+ Bond angle outliers do not exist or cannot be evaluated for this model
+ {% endif %}
+
+
+
+ Too-close contacts?
+
+
+ Torsion angles: Protein backbone?
+
+
+
+ Torsion angles: Protein sidechains ?
+
+ In the following table, sidechain outliers are listed. The Analysed column shows the number of residues for which the sidechain conformation was analysed.
+
+ {% if rotascore|length > 1 %}
+ {{ write_table(rotascore) }}
+
+ A detailed list of outliers is tabulated below.
+
+ {% if rotalist|length > 1 %}
+ {{ write_table(rotalist) }}
+
+ {% endif %}
+ {% endif %}
+
+
+ {% endif %}
+
+
+ {% endif %}
+
+
+
+
+ Fit of model to data used for modeling ?
+
+
+ {% if sas|length > 0 %}
+ {% if sasdb_sascif|length > 0 %}
+
+ Fit of model(s) to SAS data
+
+
+
+ χ² goodness of fit and cormap analysis ?
+
+
+ {% if number_of_fits > 0 %}
+
+
+ {{ Unique_dataset[i] }}
+
+
+
+
+
+ Fit of model to data used for validation ?
+
+
+
+
+
+
+ Released Entries: 36
- Validation report
-
-
- Validation report
-
-
- Validation report
-
-
- Validation report
-
-
- Validation report
-
-
- Validation report
-
-
- Validation report
-
-
- Validation report
-
-
- Validation report
-
-
-
-
-
-
+
+
+
+ {% for i in range(list_to_write[0]|length) %}
+
+
+ {% for j in range(1, list_to_write|length) %}
+
+ {{ list_to_write[0][i] }}
+
+ {% endfor %}
+
+ {% for i in range(list_to_write[j]|length) %}
+
+ {% endfor %}
+
+
+ {% endfor %}
+
+ {% endfor %}
+ {% endif %}
+{% endmacro %}
+
+
+
+
+{% macro insert_sas_plot(id, sasdb_id, plot_name) %}
+ Overall quality?
-
-
-
-
-
+ {% for k, v in ranked_id_list %}
+
+
+
+ {% endfor %}
+ {{ k }}
+ {{ v }}
+
+
+ Structure Title
+ {{ Molecule }}
+
+
+ {% if Citation_Title is not none %}
+ Structure Authors
+ {{ Authors }}
+
+
+ Publication Title
+ {{ Citation_Title }}
+
+
+ {% endif %}
+ Authors
+ {{ Citation_Authors }}
+
+ Overall quality
+ ?
+
+
+
+
+
+
+ Entry composition?
-
-
- Datasets used for modeling ?
-
- Representation ?
-
+ Ensemble information?
+
+ {% if num_ensembles < 2 %}
+
+ Summary?
+
+
+ Entry composition?
+
+ {% if number_of_molecules < 2 %}
+ There is {{number_of_molecules|int}} unique type of model in this entry.
+ {% else %}
+ There are {{number_of_molecules|int}} unique types of models in this entry.
+ {% endif %}
+
+ {% if model_names|length < 2 %}
+ This model is titled {{model_names[0]}}.
+ {% else %}
+ These models are titled {{model_names|join(', ')}} respectively.
+ {% endif %}
+
+ {{ write_table(Entry_list) }}
+ Datasets used for modeling ?
+ {% if number_of_datasets < 2 %}
+ There is {{number_of_datasets|int}} unique dataset used to build the models in this entry.
+ {% else %}
+ There are {{number_of_datasets|int}} unique datasets used to build the models in this entry.
+ {% endif %}
+
+ {{ write_table(Datasets_list) }}
+ Representation ?
+
+ Methodology and software?
+
+
+ {% if Protocols_number < 2 %}
+ This entry is a result of {{Protocols_number|int}} distinct protocol.
+ {% else %}
+ This entry is a result of {{Protocols_number|int}} distinct protocols.
+ {% endif %}
+
+ {{ write_table(Sampling_list) }}
+
+
+ {% if number_of_software < 1 %}
+ Software packages used for modeling were either not reported or not used.
+ {% elif number_of_software < 2 %}
+ There is {{number_of_software|int}} software package reported in this entry.
+ {% else %}
+ There are {{number_of_software|int}} software packages reported in this entry.
+ {% endif %}
+
+ {{ write_table(soft_list) }}
+ Model quality?
- Model quality?
+ Excluded volume satisfaction?
+
+ Excluded volume satisfaction for the models in the entry is listed below.
+
+ {{ write_table(excluded_volume) }}
+
+
+ {% else %}
+
+ Standard geometry: bond outliers?
+
+
+ {% if molp_b|length > 1 %}
+ There are {{ bond|int }} bond outliers in this entry ({{ "%.1f"|format(bond / total_bonds * 100) }}% of all bonds). A summary is provided below, and a detailed list of outliers can be found here.
+
+
+ {{ write_table(molp_b) }}
+
+ {% else %}
+ Bond length outliers cannot be evaluated for this model
+
+ {% endif %}
+
+
+ Standard geometry: angle outliers?
+
+
+ {% if molp_a|length > 1 %}
+ {% if total_angles|int > 1 %}
+
+ There are {{ angle|int }} angle outliers in this entry ({{ "%.1f"|format(angle / total_angles * 100) }}% of all angles). A summary is provided below, and a detailed list of outliers can be found here.
+
+ {% else %}
+
+ There are {{ angle|int }} angle outliers in this entry. A summary is provided below, and a detailed list of outliers can be found here.
+
+ {% endif %}
+
+ {{ write_table(molp_a) }}
+
+ {% else %}
+ Bond angle outliers do not exist or cannot be evaluated for this model
+
+ {% endif %}
+
+ {% if clashscore_list|length > 1 %}
+
+ Too-close contacts?
+
+
+ Torsion angles: Protein backbone?
+
+
+ Torsion angles : Protein sidechains?
+
+ In the following table, sidechain rotameric outliers are listed. The Analysed column shows the number of residues for which the sidechain conformation was analysed.
+
+ {{ write_table(rotascore) }}
+
+ {% endif %}
+
+ {% if rotalist|length > 1 %}
+ A detailed list of outliers is tabulated below.
+
+ {{ write_table(rotalist) }}
+
+ {% endif %}
+ {% endif %}
+ {% endif %}
+ Released Entries: 36
-
-
National Science Foundation
Summary of integrative structure determination of {{complex_name}} (
+ {%- for k, v in ranked_id_list -%}
+ {{ comma()}}{{ k ~": " ~ v }}
+ {%- endfor -%}
+ )
+
+
+
+
+ 1. Model Composition
+
+
+
+ Entry composition
+
+ {{ write_bullet(Subunits) }}
+
+
+
+ Datasets used for modeling
+
+ {{ write_bullet(datasets) }}
+
+
+
+ 2. Representation
+
+
+
+ Resolution
+
+ {{ resolution }}
+
+
+
+ {% if Rigid_Body|int > 0 %}
+ Number of rigid bodies, flexible units
+ {{ Rigid_Body }}, {{ Flexible_Unit }}
+
+
+ {% endif %}
+
+ {% if Flexible_Unit|int > 0 %}
+ Rigid bodies
+
+ {{ write_bullet(RB) }}
+
+
+
+ {% endif %}
+
+ Flexible units
+
+ {{ write_bullet(flex) }}
+
+
+
+ Structural coverage (rigid bodies)
+ {{ struc }}
+
+
+ 3. Restraints
+
+
+
+ Physical principles
+
+ {{ write_bullet(physics) }}
+
+
+
+
+ Experimental data
+
+ {% for i in range(restraint_info|length) %}
+ - {{ restraint_info[i][0] }}
+
+ {% endfor %}
+
+
+
+ {% if sampling_validation is not none %}
+ 4. Validation
+
+
+
+ {% endif %}
+
+ Sampling validation
+
+ {{ write_bullet(sampling_validation) }}
+
+
+
+
+ Number of ensembles
+ {{ num_ensembles }}
+
+
+ Number of models in ensembles
+ {{ models }}
+
+
+ Number of deposited models
+ {{ number_of_models }}
+
+
+ Model precision (uncertainty of models)
+ {{ model_precision }}
+
+
+ {% if assess_atomic_segments %}
+ Data quality
+
+ {{ write_bullet(Data_quality) }}
+
+
+
+ {% endif %}
+ {% if disclaimer > 0 %}
+ Model quality: assessment of atomic segments
+
+ {{ write_bullet(assess_atomic_segments) }}
+
+
+
+ {% elif (disclaimer < 1 ) and (assess_excluded_volume is not none) %}
+ Model quality
+
+ Based on existing standards, there are no model quality assessments for this entry.
+
+
+
+ {% endif %}
+ Model quality: assessment of excluded volume
+
+ {{ assess_excluded_volume }}
+
+
+
+ Fit to data used for modeling
+
+ {{ write_bullet(validation_input) }}
+
+
+
+
+ Fit to data used for validation
+
+ {{ write_bullet(cross_validation) }}
+
+
+
+ {% for i in range(method_info['Step number']|length) %}
+ 5. Methodology and Software
+
+
+
+ {{ i + 1 }}. Method
+ {{ method_info['Method type'][i] }}
+
+
+ {% if method_info['Method description'][i] is not none %}
+ Name
+
+ {{ method_info['Method name'][i] }}
+
+
+
+ {% endif %}
+ {% if method_info['Number of computed models'][i] is not none %}
+ Description
+
+ {{ method_info['Method description'][i] }}
+
+
+
+ {% endif %}
+ {% endfor %}
+ Number of computed models
+
+ {{ method_info['Number of computed models'][i] }}
+
+
+
+
+
+
+
diff --git a/templates/supplementary_template.html b/templates/supplementary_template.html
deleted file mode 100644
index b40a3426..00000000
--- a/templates/supplementary_template.html
+++ /dev/null
@@ -1,244 +0,0 @@
-
-
-
-
-
-
-
-
-
-
- Software
+
+ {{ write_bullet(software) }}
+
+ Summary of integrative structure determination of {{complex_name}} ({{ID_T}})
-
-
-
- 1. Model Composition
-
-
-
- Entry composition
-
-
-
-
-
- Datasets used for modeling
-
-
-
-
-
- 2. Representation
-
-
-
- Atomic structural coverage
- {{struc}}
-
-
- Number of rigid bodies, flexible units
- {{Rigid_Body}}, {{Flexible_Unit}}
-
-
- Rigid bodies
-
-
-
-
-
- Flexible units
-
-
-
-
-
-
- Resolution
-
-
-
-
-
- 3. Restraints
-
-
-
- Physical principles
- {{physics}}
-
-
-
- Experimental data
-
-
-
-
-
-
- 4. Validation
-
- Sampling validation
-
-
-
-
-
-
-
- Clustering algorithm ,clustering feature
- {{clustering}}, {{feature}}
-
-
-
- Number of ensembles
- {{num_ensembles}}
-
-
- Number of models in ensembles
- {{models}}
-
-
- Model precision (uncertainty of models)
- {{model_precision}}
-
-
- Data quality
-
-
-
-
-
- Model quality: assessment of atomic segments
-
-
-
- Model quality: assessment of excluded volume
-
-
-
-
-
-
- Fit to data used for modeling
-
-
-
-
-
-
- Fit to data used for validation
-
-
-
-
-
- 5. Methodology and Software
-
-
-
- Method
- {{method_type}}
-
-
- Name
-
- {{method}}
-
-
-
- Details
-
-
-
-
-
-
-
-
-
diff --git a/templates/template_pdf.html b/templates/template_pdf.html
deleted file mode 100644
index 43f515f2..00000000
--- a/templates/template_pdf.html
+++ /dev/null
@@ -1,986 +0,0 @@
-
-
-
-
-
-
-
-
-
- Software
-
-
-
- Integrative Structure Validation Report ?
- {{ date }}
-
-
-
-
-
-
-
-
-
-
- PDB ID
- {{ ID_T }}
-
-
- Structure Name
- {{ Molecule }}
-
-
- Publication Title
- {{ Title }}
-
-
- Authors
- {{ Authors }}
- Overall quality ?
- Model Quality: Molprobity Analysis
- {% else %}
- Model Quality: Excluded Volume Analysis
- {% endif %}
-
- {% for i in range(NumModels) %}
-
- {% endfor %}
- {% elif disclaimer == 1 -%}
- Entry composition ?
-
-
- Datasets used for modeling ?
-
- Representation ?
-
-
-
-
- Data quality ?
-
-
- {% if ( sas|length > 0 ) and ( sasdb_sascif|length > 0 ) %}
-
-
-
- Scattering profile
-
-
- ?
-
-
-
- Key experimental estimates ?
-
-
-
- Flexibility analysis ?
-
-
- {% for i in range ( sasdb_code_html|length ) %}
-
- Pair-distance distribution analysis ?
-
-
-
- Guinier analysis ?
-
-
- SAS:Scattering profile
-
-
-
-
- Model quality ?
-
- Excluded volume satisfaction ?
-
-
- Excluded volume satisfaction for the models in the entry are listed below.
-
-
-
- Fit of model to data used for validation ?
-
-
-
-
+ User guide
+ User guide
National Science Foundation