xapple edited this page Feb 27, 2012 · 14 revisions

Enhancement list

The following is a list of my ideas for future work on the library, along with suggestions made to me on various occasions:

  1. Support the loading of arbitrary TSV/CSV files. The user needs to specify a tuple containing the column names in the right order.

  2. Support the loading of Microsoft XLS files in a similar way to the TSV/CSV files.

  3. Maybe add an option similar to liftOver that would use a list of insertions and deletions downloaded from GenRep to convert a track from one assembly to another.

  4. Selections are always chromosome specific. However, one might need to make the same selection on multiple chromosomes?

  5. The remove() method should also be able to drop specific entries in a given selection.

  6. When the assembly is set, automatically crop the features that extend beyond the chromosome length.

  7. The ucsc_to_ensembl function should probably also rename all the chromosomes to the new standard and not only add +1 to every start.

  8. Automatically perform an SQL VACUUM when closing the database?

  9. Maybe add an option discard_tables=False for the t.assembly = 'hg19' command, or maybe turn the attribute into a function?

  10. Use SQL views to implement relational tracks where a field needs to be serialized?

  11. When only one feature is passed to the write function instead of an iterable of features, should we auto-detect this?

  12. Add the ability to pass pipes/file-like objects instead of file paths as the input for the convert function?

  13. If a convert() fails, should we remove the destination_path, or should we just overwrite by default?

  14. Add a unit test to check that missing fields in a write are filled with their default types.

  15. Make the .sql extension a constant in the package, so that switching to .db can be done in one place only.

  16. If you use the package offline, the error message when GenRep fails isn't descriptive enough.

  17. Add a mechanism that checks that files are indeed sorted by chromosome in track.simple?

  18. Why not add a warning/logging system that would inform the user of which tables were deleted and similar events?

  19. Add a better __repr__ method to the SuperRow object as well as a better compare() method.

  20. Replace the mechanism that reads all features at once with t.read() by an intelligent SQL select statement similar to the one found in t.search().

  21. When converting from SQL to a text format, the assembly parameter should also be used.

  22. When loading a gzip compressed track, the mechanism for rewriting the file after modification is missing.

  23. When doing a read() instead of read('chr1'), you can't use feature['start'] anymore.

  24. The line base_coverage = sum([gene[1] - gene[0] for gene in all_genes]) doesn't work with a GFF because the start and end indexes become 3 and 4.

  25. Converting a signal track from the SQL format to the BED format fails in a subtle way involving a missing index in the field order.
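
The positional-index problem described in item 24 could be avoided by looking up fields by name. The following is only a sketch: it assumes features are plain tuples and that the field order is available as a tuple of names. The helper `base_coverage` and both field tuples are hypothetical and not part of the library.

```python
def base_coverage(features, fields):
    """Sum feature lengths, locating start and end by name, not by position."""
    start_idx = fields.index('start')
    end_idx = fields.index('end')
    return sum(f[end_idx] - f[start_idx] for f in features)

# Hypothetical BED-like field order: start and end come first.
bed_fields = ('start', 'end', 'name', 'score', 'strand')
bed_genes = [(10, 20, 'a', 0.0, 1), (30, 45, 'b', 0.0, -1)]

# Hypothetical GFF-like field order: start and end sit at indexes 3 and 4.
gff_fields = ('seqname', 'source', 'feature', 'start', 'end')
gff_genes = [('chr1', 'src', 'gene', 10, 20), ('chr1', 'src', 'gene', 30, 45)]

bed_total = base_coverage(bed_genes, bed_fields)
gff_total = base_coverage(gff_genes, gff_fields)
```

Both calls give the same total because the computation no longer depends on where start and end sit in the tuple.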
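
For item 8, a vacuum-on-close could look roughly like the sketch below, written against Python's standard `sqlite3` module. The function `close_with_vacuum` and the demo table are hypothetical; the only firm constraint is that SQLite refuses to run VACUUM inside an open transaction, hence the commit beforehand.

```python
import os
import sqlite3
import tempfile

def close_with_vacuum(connection):
    """Commit pending changes, reclaim free pages, then close."""
    connection.commit()            # VACUUM cannot run inside a transaction
    connection.execute("VACUUM")
    connection.close()

def demo():
    """Return the file size before and after the vacuum-on-close."""
    handle, path = tempfile.mkstemp(suffix='.sql')
    os.close(handle)
    try:
        connection = sqlite3.connect(path)
        connection.execute("CREATE TABLE chr1 (start INTEGER, end INTEGER, name TEXT)")
        rows = [(i, i + 10, 'feature_%d' % i) for i in range(10000)]
        connection.executemany("INSERT INTO chr1 VALUES (?, ?, ?)", rows)
        connection.commit()
        # Deleting rows leaves free pages behind; the file does not shrink yet.
        connection.execute("DELETE FROM chr1")
        connection.commit()
        size_before = os.path.getsize(path)
        close_with_vacuum(connection)
        size_after = os.path.getsize(path)
        return size_before, size_after
    finally:
        os.remove(path)
```

Whether this is worth doing on every close is an open question, since VACUUM rewrites the whole file and can be slow on large tracks.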
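
One possible answer to item 11 is a small normalization step before writing. The sketch below assumes a feature is a tuple or list whose first element is a scalar, which is an assumption about the data and not the library's actual rule. The name `as_features` is hypothetical.

```python
def as_features(data):
    """Wrap a lone feature in a list; pass iterables of features through."""
    if isinstance(data, (tuple, list)) and data:
        # A single feature starts with a scalar such as an integer start,
        # whereas a collection of features starts with another tuple or list.
        if not isinstance(data[0], (tuple, list)):
            return [data]
    return data

single = as_features((10, 20, 'a'))
several = as_features([(10, 20, 'a'), (30, 40, 'b')])
lazy = list(as_features(iter([(10, 20, 'a')])))
```

Generators and other lazy iterables pass through untouched, so they are not consumed before the write itself.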
