-
Notifications
You must be signed in to change notification settings - Fork 0
Relations
xml-to-postgres
can split out repeating elements or element sets into subtables linked back to the main table through the use of foreign keys. To make this work, xml-to-postgres
assumes that the first column you define for a table contains a unique key for the row. If such a key is not available or the row has a composite key, you can have xml-to-postgres
generate unique id numbers by defining a column with the 'seri' option (see configuration options).
Subtables are controlled by the 'file' and 'norm' options that can be set on columns, both of which should contain a filename where the data will be written to. The 'file' option produces a subtable with a one-to-many relation with its parent table, so the subtable will have as its first column the first column of the parent table for use as a foreign key. The 'norm' option (short for normalize) produces a subtable with a many-to-one relation to the parent table, so the parent table will have an auto-generated id for the subtable row acting as a foreign key. With 'norm', the subtable will be deduplicated, so even when an entry appears many times in the source XML, each will only appear once in the subtable data. The 'file' and 'norm' options can also be used together, producing a many-to-many relation. In this case the filename in the 'file' option will contain data for the crosslink table (the foreign keys for both the parent table and the subtable) and the filename in the 'norm' option will contain the deduplicated subtable.
All three possibilities above can optionally be combined with a 'cols' option to specify multiple columns to be included in the subtable. In this case, xml-to-postgres
once again expects the first defined column for each table to be a unique key. As a special case, you can have a many-to-many relation with additional attributes on the crosslink table by putting the 'file' option on one subtable and then the 'norm' option on another subtable below it (see below for an example of this).
These examples extend the basic example used elsewhere in this wiki. It defines a main table 'books' with a unique id as its first column (as required by many of the examples).
Here we've added the 'file' option to the authors column. This allows a book to have multiple authors, which will be written out to the separate file 'authors.dump'. This file will then contain two columns, first the 'id' column of the books table and then the column with the author name.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
file: authors.dump
Here we've added the 'norm' option to the authors column. With this we're back to a book having only a single author, but the author names will be deduplicated so that each author only appears once in 'authors.dump'. To make this work, xml-to-postgres
will generate a unique id for each author and store that as a foreign key in the 'author' column of the books table. The 'authors.dump' file will again contain two columns, this time the id column will be its own unique id and then again the column with the author name.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
norm: authors.dump
Here we combine these possibilities by adding both the 'file' and the 'norm' options to the author column. This produces a many-to-many relation between books and authors, as it is in reality. The 'book_author.dump' file will then contain two columns with foreign keys, the first column having the id of the book and the second column having the generated id for the author. The 'authors.dump' file will once again contain two columns with first its own unique id and then the author name.
name: books
file: books.dump
path: /catalog/book
cols:
- name: id
path: /
attr: id
- name: author
path: /author
file: book_author.dump
norm: authors.dump
TODO: create examples for the options combined with 'cols'