Enhancement list
The following is a list of my ideas for future work on the library, along with suggestions that were made to me on various occasions:
- Support the loading of arbitrary TSV/CSV files. The user needs to specify a tuple containing the column names in the right order.
- Support the loading of the Microsoft XLS format in a similar way to the TSV/CSV files.
- Maybe add an option similar to liftOver that would use a list of insertions and deletions downloaded from GenRep to convert a track from one assembly to another.
- Selections are always chromosome-specific. However, one might need to make the same selection on multiple chromosomes?
- The `remove()` method should also be able to drop specific entries in a given selection.
- When the `assembly` is set, automatically crop the features that extend beyond the chromosome length.
- The `ucsc_to_ensembl` function should probably also rename all the chromosomes to the new standard and not only add +1 to every start.
- Automatically perform an SQL vacuum at the end when closing the database?
- Maybe add an option `discard_tables=False` for the `t.assembly = 'hg19'` command. Or maybe transform the attribute into a function?
- Use SQL views to implement relational tracks where a field needs to be serialized?
- When passing only one feature to the `write` function instead of an iterable of features, should we auto-detect this?
- Add the ability to pass pipes/file-like objects instead of file paths as the input for the convert function?
- If a `convert()` fails, should we remove the `destination_path`, or maybe should we just overwrite by default?
- Add a unit test to check that missing fields in a `write` are filled with their default types.
- Make the extension `.sql` a constant in the package, so that switching to `.db` can be done in one place only.
- If you use the package offline, the error message when GenRep fails isn't descriptive enough.
- Add a mechanism that checks that files are indeed sorted by chromosome in `track.simple`?
- Why not add a warning/logging system that would inform the user of which tables were deleted and such things?
- Add a better `__repr__` method to the SuperRow object as well as a better `compare()` method.
- Replace the mechanism that reads all features at once with `t.read()` by an intelligent SQL select statement similar to the one found in `t.search()`.
- When converting from SQL to a text format, the assembly parameter should also be used.
- When loading a gzip-compressed track, the mechanism for rewriting the file after modification is missing.
- When doing a `read()` instead of `read('chr1')`, you can't use `feature['start']` anymore.
- The line `base_coverage = sum([gene[1] - gene[0] for gene in all_genes])` doesn't work with a GFF because the indexes become 3 and 4.
- Converting a signal track from the SQL format to the BED format fails in a subtle way involving a missing index in the field order.
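
For the `base_coverage` item in the list, one possible workaround is to look up the positions of `start` and `end` by field name instead of hard-coding indexes 0 and 1. The following is only a sketch: the helper `base_coverage()` and the two field lists are hypothetical and written for illustration; they are not part of the library's API.

```python
def base_coverage(features, fields):
    """Sum (end - start) over all features.

    `fields` is the ordered list of column names for the features
    (hypothetical argument; an assumption made for this sketch).
    """
    start_idx = fields.index('start')
    end_idx = fields.index('end')
    return sum(feature[end_idx] - feature[start_idx] for feature in features)

# BED-like field order: start and end are at indexes 0 and 1.
bed_fields = ['start', 'end', 'name', 'score', 'strand']
bed_genes = [(10, 20, 'geneA', 0.0, 1), (30, 45, 'geneB', 0.0, -1)]

# GFF-like field order: start and end are at indexes 3 and 4.
gff_fields = ['seqname', 'source', 'feature', 'start', 'end',
              'score', 'strand', 'frame', 'attribute']
gff_genes = [('chr1', 'src', 'gene', 10, 20, 0.0, '+', '.', ''),
             ('chr1', 'src', 'gene', 30, 45, 0.0, '-', '.', '')]

bed_total = base_coverage(bed_genes, bed_fields)
gff_total = base_coverage(gff_genes, gff_fields)
```

Both calls give the same total because the column positions are resolved from the field names, so the same line works whatever the field order of the source format.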