Fix: Allow non-PHRED quality scores e.g. Solexa
Solexa FASTQ reads are not PHRED scores, and as such can be negative.
Relax some of the checks regarding qualities.
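
To illustrate why the lower bound check had to go, here is a minimal sketch of offset-based decoding in plain Julia. The helper `decode_quality` is hypothetical and not part of FASTX; it only assumes the rule stated in the updated docstring, that a character with ASCII value `x` encodes `x - offset`.

```julia
# Hypothetical helper, not part of FASTX: decode one quality character
# by subtracting the ASCII offset from its code point.
decode_quality(c::Char, offset::Integer) = Int(c) - Int(offset)

# Solexa uses offset 64. ';' has ASCII value 59, so it decodes to -5.
solexa_scores = [decode_quality(c, 64) for c in "T@;"]

# Sanger (Phred+33) starts at '!' (ASCII 33), so its lowest score is 0.
sanger_lowest = decode_quality('!', 33)

println(solexa_scores)  # [20, 0, -5]
println(sanger_lowest)  # 0
```

Under this rule, any encoding whose lowest allowed character lies below the offset yields negative scores, which is the case the removed `low < offset` check rejected.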
jakobnissen committed Jul 4, 2023
1 parent ee0a192 commit 198d78a
Showing 5 changed files with 21 additions and 10 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,14 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

+## [2.1.2]
+### Bugfix
+* Allow non-PHRED quality scores, such as Solexa scores, which can be negative (#104)
+
+## [2.1.0]
+### Bugfix
+* Fix doc examples for writer with do-syntax (#100)
+
## [2.1.0]
### Additions
* Implement `Base.copy!` for `FASTQRecord` and `FASTARecord`
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "FASTX"
uuid = "c2308a5c-f048-11e8-3e8a-31650f418d12"
authors = ["Sabrina J. Ward <[email protected]>", "Jakob N. Nissen <[email protected]>"]
-version = "2.1.1"
+version = "2.1.2"

[weakdeps]
BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59"
11 changes: 5 additions & 6 deletions src/fastq/quality.jl
@@ -9,11 +9,12 @@
"""
QualityEncoding(range::StepRange{Char}, offset::Integer)
-FASTQ PHRED quality encoding scheme. `QualityEncoding` objects are used to
+FASTQ quality encoding scheme. `QualityEncoding` objects are used to
interpret the quality scores of FASTQ records.
`range` is a range of allowed ASCII chars in the encoding, e.g. `'!':'~'` for
the most common encoding scheme.
-The offset is the PHRED offset.
+The offset is the ASCII offset, i.e. a character with ASCII value `x` encodes
+the value `x - offset`.
See also: [`quality_scores`](@ref)
@@ -44,9 +45,7 @@ struct QualityEncoding
elseif high > 127
error("Quality encoding only works with ASCII charsets")
elseif offset < 0
-error("Quality offset must be non-negative")
-elseif low < offset
-error("Low end of in quality encoding range cannot be less than offset")
+error("Quality offset must be non-negative")
else
return new(low, high, off)
end
@@ -57,7 +56,7 @@ end
const SANGER_QUAL_ENCODING = QualityEncoding('!':'~', 33)

"Solexa (Solexa+64) quality score encoding"
-const SOLEXA_QUAL_ENCODING = QualityEncoding('@':'~', 64)
+const SOLEXA_QUAL_ENCODING = QualityEncoding(';':'~', 64)

"Illumina 1.3 (Phred+64) quality score encoding"
const ILLUMINA13_QUAL_ENCODING = QualityEncoding('@':'~', 64)
3 changes: 2 additions & 1 deletion test/fastq/TestFASTQ.jl
@@ -4,6 +4,7 @@ module TestFASTQ
const OFFSET = 33

using ReTest
+using FASTX: FASTQ
using FASTX.FASTQ: Record, Reader, Writer, identifier, description,
sequence, quality, quality_scores, QualityEncoding, quality_header!, validate_fastq
using BioSequences: LongDNA, LongRNA, LongAA, @dna_str, @rna_str, @aa_str
@@ -41,4 +42,4 @@ include("record.jl")
include("io.jl")
include("specimens.jl")

-end # module TestFASTQ
+end # module TestFASTQ
7 changes: 5 additions & 2 deletions test/fastq/record.jl
@@ -191,7 +191,6 @@ end

# QualityEncoding
@test_throws Exception QualityEncoding('B':'A', 10)
-@test_throws Exception QualityEncoding('A':'A', 90)
@test_throws Exception QualityEncoding('a':'A', 10)
@test_throws Exception QualityEncoding('Z':'Y', 10)
@test_throws Exception QualityEncoding('A':'B', -1)
@@ -208,6 +207,10 @@ end
@test_throws BoundsError quality_scores(records[2], 2:5)
@test_throws BoundsError quality_scores(records[2], 5:5)

+# Solexa encoding is weird in that it can be negative
+rec = Record("abc", "TAG", [20, 0, -5]; offset=64)
+@test collect(quality_scores(rec, FASTQ.SOLEXA_QUAL_ENCODING)) == [20, 0, -5]
+
# Custom quality encoding
CustomQE = QualityEncoding('A':'Z', 12)
good = parse(Record, "@a\naaaaaa\n+\nAKPZJO")
@@ -285,4 +288,4 @@ end
@test isequal(cp, records[2])
end

-end # testset Record
+end # testset Record

2 comments on commit 198d78a

@jakobnissen

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/86847

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v2.1.2 -m "<description of version>" 198d78ad96f3c8cca24d76b53cc39eeda70ed4d5
git push origin v2.1.2
