Merge pull request #24 from h-2/cleanup
Cleanup & documentation
h-2 authored Feb 17, 2022
2 parents 1e3d708 + a7de322 commit 21b6a45
Showing 24 changed files with 909 additions and 1,307 deletions.
35 changes: 34 additions & 1 deletion README.md
@@ -1,3 +1,36 @@
-# B.I.O -- The Biological Input-Output library
+# B.I.O. – the Biological Input/Output library

B.I.O. is a C++ library for reading and writing files in the field of Bioinformatics and in particular sequence
analysis. It provides easy-to-use interfaces for the following formats:

* Plain I/O: plain-text, CSV, TSV, …
* Map I/O: SAM, BAM, …
* Seq I/O: FastA, FastQ, …
* Var I/O: VCF, BCF, …

The primary goal of this library is to offer higher-level abstractions than the C libraries typically used in this
domain (e.g. htslib) while at the same time offering excellent performance.
It aims to offer a modern, well-integrated design that covers most typical I/O use cases that bioinformaticians encounter.

The library relies strongly on *Modern C++* and plays well with other Modern C++ libraries.

Please see the [online documentation](TODO) for more details.

## Current state

The library is currently under heavy development. There is no release yet, and all interfaces are subject to change.

## Dependencies

| | requirement | version | comment |
|-------------------|-------------------------------------------|----------|---------------------------------------------|
|**compiler** | [GCC](https://gcc.gnu.org) | ≥ 10 | no other compiler is currently supported! |
|**required libs** | [SeqAn3](https://github.com/seqan/seqan3) | ≥ 3 | |
|**optional libs** | [zlib](https://github.com/madler/zlib) | ≥ 1.2 | required for `*.gz` and `*.bam` file support |
| | [bzip2](https://www.sourceware.org/bzip2) | ≥ 1.0 | required for `*.bz2` file support |

## Usage

* Using the library entails no build steps; it is header-only and can be used as-is.
* A single-header version is available (TODO).
* CMake files are provided for easy integration into applications (and automatic detection/inclusion of dependencies).
367 changes: 173 additions & 194 deletions include/bio/format/bcf_input_handler.hpp

Large diffs are not rendered by default.

23 changes: 8 additions & 15 deletions include/bio/format/bcf_output_handler.hpp
@@ -555,13 +555,7 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp
 }

 //!\brief Overload for n_fmt.
-void set_core_n_fmt(auto & field)
-{
-    if constexpr (detail::genotypes_vcf_style_writer_concept<decltype(field)>)
-        record_core.n_fmt = std::ranges::distance(detail::get_first(field));
-    else
-        record_core.n_fmt = detail::range_or_tuple_size(field);
-}
+void set_core_n_fmt(auto & field) { record_core.n_fmt = detail::range_or_tuple_size(field); }
 //!\}

 /*!\name Field writers
@@ -651,7 +645,7 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp
 // explicit integer width given in header
 if (hdr_entry.other_fields.find("IntegerBits") != hdr_entry.other_fields.end())
 {
-    desc = detail::dynamic_type_id_2_type_descriptor(hdr_entry.type);
+    desc = detail::value_type_id_2_type_descriptor(hdr_entry.type);
     if (!detail::type_descriptor_is_int(desc)) // ignore header value if it isn't intX
         desc = c_desc;
 }
@@ -665,7 +659,7 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp

 if (verify_header_types)
 {
-    detail::bcf_type_descriptor header_desc = detail::dynamic_type_id_2_type_descriptor(hdr_entry.type);
+    detail::bcf_type_descriptor header_desc = detail::value_type_id_2_type_descriptor(hdr_entry.type);
     if (desc != header_desc || !detail::type_descriptor_is_int(desc) ||
         !detail::type_descriptor_is_int(header_desc))
     {
@@ -707,7 +701,7 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp
 var_io::header::info_t const & info = header->infos.at(header->idx_to_info_pos().at(idx));

 /* VALUE */
-if constexpr (detail::is_dynamic_type<value_t>)
+if constexpr (detail::is_info_element_value_type<value_t>)
 {
     auto func = [&](auto & param) { write_typed_data(param, get_desc(param, info)); };
     std::visit(func, value);
@@ -950,15 +944,15 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp
         }
     };

-    if constexpr (detail::is_dynamic_vector_type<value_t>)
+    if constexpr (detail::is_genotype_element_value_type<value_t>)
         std::visit(func, value);
     else
         func(value);
 }

-//!\brief Overload for GENOTYPES; genotypes_bcf_style.
+//!\brief Overload for GENOTYPES.
 template <std::ranges::forward_range range_t>
-    requires(detail::genotype_bcf_style_writer_concept<std::ranges::range_reference_t<range_t>>)
+    requires(detail::genotype_writer_concept<std::ranges::range_reference_t<range_t>>)
 void write_field(vtag_t<field::genotypes> /**/, range_t && range)
 {
     for (auto && genotype : range)
@@ -967,13 +961,12 @@ class format_output_handler<bcf> : public format_output_handler_base<format_outp

 //!\brief Overload for GENOTYPES; tuple of pairs.
 template <typename... elem_ts>
-    requires(detail::genotype_bcf_style_writer_concept<elem_ts> &&...)
+    requires(detail::genotype_writer_concept<elem_ts> &&...)
 void write_field(vtag_t<field::genotypes> /**/, std::tuple<elem_ts...> & tup) // TODO add const version
 {
     auto func = [&](auto &... field) { (write_genotypes_element(field), ...); };
     std::apply(func, tup);
 }
-// TODO vcf-style
 //!\}

 //!\brief Write the header.
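
Review note on the tuple overload of `write_field` above: it expands a tuple into a parameter pack with `std::apply` and then calls the per-element writer once per element through a comma fold expression. The sketch below isolates that idiom. `write_element`, `write_all` and `usage_example` are hypothetical helpers that append to a string, assumed only for illustration; they are not the library's BCF writer.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical per-element writer: appends "ID=value;" to the output.
template <typename value_t>
void write_element(std::string & out, std::pair<std::string, value_t> const & field)
{
    out += field.first;
    out += '=';
    out += std::to_string(field.second);
    out += ';';
}

// Visits every tuple element in order via std::apply and a comma fold.
template <typename... elem_ts>
std::string write_all(std::tuple<elem_ts...> const & tup)
{
    std::string out;
    auto func = [&](auto const &... field) { (write_element(out, field), ...); };
    std::apply(func, tup);
    return out;
}

inline void usage_example()
{
    auto fields = std::make_tuple(std::pair<std::string, int>{"DP", 10},
                                  std::pair<std::string, long>{"GQ", 5L});
    assert(write_all(fields) == "DP=10;GQ=5;");
}
```

A fold over the comma operator evaluates its operands left to right, so the elements are written in tuple order. This sketch takes the tuple by `const &`, whereas the overload in the diff currently takes a non-const reference and carries a TODO for the const version.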