
Replace jpeg_decoder with zune-jpeg #1877

Closed
wants to merge 19 commits

Conversation

@Shnatsel (Contributor) commented Mar 10, 2023

Like #1876 but replaces jpeg_decoder with zune-jpeg outright without breaking the API.

TODO:

  • Wait for the zune-jpeg release with the Error impl (currently uses git)
  • Deactivate the failing hash-based tests (JPEG is lossy and the hash will change due to slight changes in the implementation)
  • Get zune-jpeg to support ICC profile extraction (etemesi254/zune-image#101)
  • Support lossless JPEG somehow for backwards compatibility

Newly added features in this PR:

  • Limits on maximum dimensions

Out of scope of this PR:

  • Support for memory limits (not supported even by jpeg_decoder; zune-jpeg does support it, so this can be added later)
  • Scaling during decoding (not supported by zune, supporting would require lots of unsafe code)
  • Anything else that's not already supported by jpeg_decoder

@fintelia (Contributor) commented Mar 10, 2023

This doesn't handle setting limits, and that may prove challenging with the current design. If I understand the draft code correctly, it is currently hard-coded to use zune-jpeg's default limits. The specific challenge in fixing that is that ImageDecoder::set_limits may be called after new, which means that any allocations in the constructor must be of bounded size, and any limits passed to zune-jpeg must be set only after that method is called.

@Shnatsel (Contributor, Author) commented Mar 10, 2023

jpeg_decoder doesn't support any kind of limits either, so I don't think it's an immediate concern.

There's a more general limit on the dimensions, but JPEGs can trivially lie about their dimensions. And when a JPEG declares its dimensions incorrectly, the jpeg_decoder adapter allocates a huge amount of memory for the decoded image and then panics when running copy_from_slice on it - so it fails to prevent large allocations and doesn't correctly decode the image either.

So it's not a regression, but I do agree that something should be done about that in the long term.

The thorny issue is buffering the entire input - the input alone could exceed the limit before we even try to decode it. I could make reading the input lazy, so that it's possible to first call set_limits() and only then call things like dimensions, with the max allocation limit honored while reading (a surprisingly thorny problem in itself - I don't think any decoder in image currently limits the size of the input). But this still relies on setting the limits before any parsing occurs, which feels fragile.

Alternatively, we could read the first N bytes (up to some low limit, unless overridden?) and parse the headers in that slice only. I don't know if there is any bound on the amount of data required to determine the dimensions and colorspace, but my guess is "no". I suppose we could hardcode some reasonable-sounding limit and ignore images that have the colorspace definition 100MB in?

Or we could combine the two approaches and only read the input up to half of the currently set total memory limit?

@fintelia (Contributor)

Hm, the longer term solution might be to do an API break to require R: Read + Seek and then have new scan through the image to find the needed metadata and then seek back to the start. At any one point only a constant amount of memory would need to be allocated for that.

Given that it isn't a regression, I'd be OK with merging this without limit support. However, given that zune-jpeg is outside of image-rs, we should make sure that they're amenable to adding the feature additions we'd need to implement our limits API.

@Shnatsel (Contributor, Author)

I think the only thing the decoder actually needs to support is capping the maximum amount of allocated memory. zune-jpeg has far simpler allocation patterns than jpeg_decoder (no multi-threading causing allocations everywhere) so I don't see this as an issue. Plus zune-png and zune-inflate support setting a memory limit already.

@etemesi254 what do you think? Would adding a memory limit (even one that's only loosely adhered to) to zune-jpeg be feasible?

@etemesi254

What should it be based on?

This is how much memory we allocate:

  • Memory to store the decoded image - by far the largest allocation. This can be controlled via set_max_{width, height}.

  • Temporary memory - support buffers needed to hold coefficients in the decoding stages: only one MCU and at most 2 width strides. The library is the most memory-efficient here compared to libjpeg-turbo and stb. We can't cap it any lower.

@Shnatsel (Contributor, Author)

Is there a set_max_output_bytes or something like that? For example, many web services need to make thumbnails of images, but not if the images are so big they'd exhaust the server's memory and get the process killed. This is less an issue of dimensions than of total size - both 100x10,000 and 10,000x100 images should be allowed, but not 10,000x10,000.

@fintelia (Contributor)

For our purposes, simply being able to query an estimate of how much memory is required to decode a specific file should be enough. Our limits are also best-effort: if a limit of 10 MB is requested, it is OK if the actual amount used is 11 MB. If a limit of 10 MB is requested but 500 MB is allocated, that's less OK.

The trickier part is that we will need a way of getting image metadata without buffering the entire image contents in memory. A user could hypothetically ask us to decode /dev/random or something as a JPEG, and we shouldn't OOM if that happens. I think that means zune-jpeg would need a streaming API, at least for getting metadata.
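The best-effort check described above - query an estimate of the output size, then compare it against the limit - can be sketched in plain Rust. The function name, signature, and error type here are illustrative, not part of the image or zune-jpeg APIs:

```rust
/// Best-effort estimate of the decoded output size, checked against a
/// caller-supplied byte limit. A sketch only; names are hypothetical.
fn check_output_limit(
    width: u64,
    height: u64,
    channels: u64,
    bytes_per_sample: u64,
    limit: u64,
) -> Result<u64, String> {
    // Overflow-checked multiplication: a header can declare dimensions
    // whose product overflows u64, so plain `*` would be a bug here.
    let needed = width
        .checked_mul(height)
        .and_then(|n| n.checked_mul(channels))
        .and_then(|n| n.checked_mul(bytes_per_sample))
        .ok_or_else(|| "declared dimensions overflow".to_string())?;
    if needed > limit {
        return Err(format!("output needs {needed} bytes, limit is {limit}"));
    }
    Ok(needed)
}
```

Note that with a check like this, a 100x10,000 RGB8 image (3 MB) passes a 10 MB limit while a 10,000x10,000 one (300 MB) does not, matching the thumbnailing scenario above.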

@fintelia (Contributor) commented Mar 10, 2023

@Shnatsel making fancier decisions based on image metadata is actually part of the intention behind the current ImageDecoder API. You can query the image dimensions or total output bytes before deciding whether to decode the image. You can even base the cap on memory allocations on those factors.

@HeroicKatora (Member)

What should it be based on?

Whatever buffers the library allocates. The image output buffer, as computed from the metadata, will already have been subtracted from the budget by the time the decoder implementation receives it. After that, the limit doesn't need to be too strict, and it isn't cumulative: a good estimate that restricts the maximum resident size during decoding is perfectly fine. Fuzzing, for instance, is configured to fail at twice the budget passed to the decoder, IIRC.

@etemesi254

This doesn't handle setting limits, and that may prove challenging with the current design. If I understand the draft code then it right now is hard-coded to use zune-jpeg's default limits. The specific challenge to fixing that is that ImageDecoder::set_limits may be called after new, which means that any allocations in the constructor must be of bounded size and any limits passed to zune-jpeg must be set after that method is called.

This can be added; all we need is the ability to expose the options after they have been set. A get_options_mut method may work.

Alternatively we could read the first N bytes (up to some low limit, unless overridden?) and parse the headers in that slice only

Reading only up to a certain number of bytes will probably blow up in your face. JPEG may have rules on how markers are arranged, but real-world usage means no one follows them, and if you enforce them strictly it will cause some image somewhere not to decode.

The tricker part is that we will need to have a way of getting image metadata without buffering the entire image contents in memory. A user could hypothetically ask us to decode /dev/random

What if you did reading+seeking until you hit the SOF marker? This is simplified by the fact that JPEG markers consist of 2 bytes of marker and 2 bytes of length; if a marker is not SOF, you seek past its payload to the next marker, until you hit EOF.

Also, in the case of /dev/random, what does stream_len() return? (Part of the Seek trait, currently nightly-only.)
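The marker walk described above can be sketched with only std::io, so just a few bytes are buffered at a time. This is a simplified sketch, not the zune-jpeg implementation: it ignores fill bytes (0xFF padding) and other real-world quirks mentioned earlier in the thread.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Scan a JPEG stream for the first SOF marker and return (height, width),
/// seeking past every other segment instead of buffering it.
fn scan_sof_dimensions<R: Read + Seek>(r: &mut R) -> io::Result<(u16, u16)> {
    let mut buf = [0u8; 2];
    r.read_exact(&mut buf)?;
    if buf != [0xFF, 0xD8] {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "missing SOI"));
    }
    loop {
        r.read_exact(&mut buf)?;
        if buf[0] != 0xFF {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "bad marker"));
        }
        let marker = buf[1];
        // Standalone markers (TEM, RSTn, SOI, EOI) carry no length field.
        if marker == 0x01 || (0xD0..=0xD9).contains(&marker) {
            continue;
        }
        r.read_exact(&mut buf)?;
        let len = u16::from_be_bytes(buf); // includes the 2 length bytes
        // SOF markers are 0xC0..=0xCF except DHT (C4), JPG (C8), DAC (CC).
        let is_sof = matches!(marker, 0xC0..=0xCF)
            && !matches!(marker, 0xC4 | 0xC8 | 0xCC);
        if is_sof {
            let mut sof = [0u8; 5]; // precision, height (2), width (2)
            r.read_exact(&mut sof)?;
            let height = u16::from_be_bytes([sof[1], sof[2]]);
            let width = u16::from_be_bytes([sof[3], sof[4]]);
            return Ok((height, width));
        }
        // Skip the rest of this segment without reading it.
        r.seek(SeekFrom::Current(i64::from(len) - 2))?;
    }
}
```

Only a constant amount of memory is alive at any point, which is exactly the property fintelia's Read + Seek suggestion is after.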

@HeroicKatora (Member)

What if you did reading+seeking until you hit the SOF marker? This is simplified by the fact that JPEG markers consist of 2 bytes of marker and 2 bytes of length; if a marker is not SOF, you seek past its payload to the next marker, until you hit EOF.

As a strategy for determining the length of the image to buffer, that sounds perfectly acceptable to me. Regarding stream length, it works out to Ok(0) for /dev/{null, random, zero} due to some magic of seeking not moving the stream position. Quite confusing, but it means that in general we shouldn't "trust" that length to be correct or sensible for preparing an allocation.
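The nightly stream_len() is essentially a seek-to-end-and-back dance, which can be reproduced on stable with only std::io. The surprising Ok(0) behaviour for device files is precisely why the result below should be treated as a hint, never as a bound to allocate by:

```rust
use std::io::{self, Seek, SeekFrom};

/// Probe a stream's length by seeking to the end and restoring the
/// original position. For device files like /dev/random the reported
/// length can be 0, so the result is only a hint.
fn probe_len<S: Seek>(s: &mut S) -> io::Result<u64> {
    let old = s.stream_position()?;
    let len = s.seek(SeekFrom::End(0))?;
    s.seek(SeekFrom::Start(old))?;
    Ok(len)
}
```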

Comment on lines +45 to +48
/// Some decoders support scaling the image during decoding,
/// but the current backend, `zune-jpeg`, doesn't,
/// so this function currently does nothing
/// and always returns the original dimensions.

Depending on the resolution of etemesi254/zune-image#103, it may make sense to just deprecate this method

@Shnatsel (Contributor, Author)

I've also implemented limits on dimensions because those were easy. Memory limits will come in a follow-up PR.

@Shnatsel (Contributor, Author)

This cuts peak memory consumption nearly in half - from 150MB to 84MB on this image

So the loading of the full compressed image into memory is probably not a big deal after all.

@HeroicKatora (Member) left a comment

From a code point of view, the new implementation seems alright: all reasonable limits, color space handling, even taking care of backwards compatibility of features. The only thing bugging me is the transition story for 16-bit images. What would we tell users who may rely on it? The basic options:

  • Make a promise to follow up quickly enough
  • Accept the breakage (not very desirable...)
  • Provide a migration path to using jpeg-decoder directly while support is being worked on

@Shnatsel (Contributor, Author)

@etemesi254 what do you think?

@Shnatsel (Contributor, Author)

16-bit JPEG seems to be a non-standard extension; neither Wikipedia nor a quick Google search mentions anything about 16-bit support in JPEG. I dug up the original PR adding 16-bit lossless JPEG support to jpeg_decoder, and it said:

As this is a special need mostly in medical imaging I tried to touch as little code as possible. ... If you think this is too special a use case I might also publish it in its own crate.

@etemesi254

image-rs/jpeg-decoder#223

There was never a decision on how to handle such images.

And here are some things that need clarification before we can see a way forward:

  • Endianness
  • API

On endianness: I'm more of a native-endian person.

On the API: I suggest another method, decode_{something}, that correctly handles both 8-bit and 16-bit images. decode should just call decode_{something}, and if the image is 16-bit, we convert to 8-bit in place.
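The "decode wide, then narrow in place" step can be sketched on a plain byte buffer. This function is hypothetical, not part of zune-jpeg or jpeg_decoder; it assumes native-endian 16-bit samples as suggested above and keeps the high byte of each sample:

```rust
/// Convert native-endian 16-bit samples stored in a byte buffer down to
/// 8-bit in place, then truncate to the new length. Hypothetical sketch.
fn narrow_u16_to_u8_in_place(buf: &mut Vec<u8>) {
    let samples = buf.len() / 2;
    for i in 0..samples {
        let sample = u16::from_ne_bytes([buf[2 * i], buf[2 * i + 1]]);
        buf[i] = (sample >> 8) as u8; // keep the high byte of each sample
    }
    buf.truncate(samples);
}
```

Because narrowing only shrinks the data, the conversion needs no extra allocation, which fits the memory-limit discussion earlier in the thread.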

@HeroicKatora (Member) commented Mar 16, 2023

On endianness: I'm more of a native-endian person.

Sounds fine. Which crate does the conversion isn't so critical in the cases where it needs to be done once, as long as it only occurs once. Late conversion sometimes allows eliding it altogether; early conversion allows the data reduction you mentioned. Ideally, image would have some code to handle any difference here, like it does for png right now (pending a similar replacement, of course) and also allow both, though I've yet to see a concrete API design for ImageDecoder that addresses both.¹

Anyway, on the question of API: returning data into a &mut [u8] just works in all cases where buffer layout information is transmitted ahead of time. (For which I'm indefinitely grateful that Rust has no type-based-alias restrictions.)

I don't want to start decode_u16 shenanigans because I don't see any real direction (i.e. goal) in encoding the sample type, of all things, into the method name. More complex things like tiff, subsampled YUV, compressed texture chunks … won't even conform to any notion of having a simple sample type. (And the strong type safety is in fact a huge roadblock for some changes that would effectively require specialization involving the Pixel trait or the generic associated type of GenericImage to solve; there are lots of hidden costs in too many types.) There's always a layout question, and the only way to handle it is dynamically, via a memory-based interface. It's not &mut [u8] as in u8 samples, it's &mut [u8] as in 'just pass me the data'. If either side provides type safety by wrapping the argument into a stronger buffer, sure! Maybe we can pull up the strongly-typed bytemuck views, similar to image-canvas, with a public constructor (and gracefully failing on or handling unaligned outputs).


¹ Maybe these should be approached as two separate uses, with two separate settings: one providing an upper bound on required data fidelity, and one for feedback from the decoder on how the data will actually be provided and in what layout. Trying to fit them into one design is hard; but that is probably inherent to the complex problem space.

@etemesi254

I don't want to start decode_u16 shenanigans because I don't see any real direction (i.e. goal) in encoding the sample type, of all things, into the method name. [...] It's not &mut [u8] as in u8 samples, it's &mut [u8] as in 'just pass me the data'.

Some other projects and how they do it:

  • FFmpeg uses pointers to C's equivalent of &mut [u8] and does casting and macro specialization when needed, depending on the filter; e.g. vf_lut, which computes a lookup table, differentiates between 8-bit and 16-bit.

  • OpenCV kind of does the same thing with matrix depth.

  • And so does vips.

  • And so do I (e.g. a Sobel filter is implemented as two stages: stage 1 is a generic impl, stage 2 is a specialization).

  • And so does image-rs, with the DynamicImage enum.

So I do agree with bytemuck/image-canvas, but would say it works well if it's weakly typed but enforced at runtime, i.e. let it be a bag of bytes, with runtime checks that make sure those invariants are upheld.

You can uphold the type invariants with some TypeId tricks and checks in the right places.

Just my two cents
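The runtime-checked "bag of bytes" idea can be sketched with std::any::TypeId. The type and method names below are illustrative, not from image-canvas or any existing crate:

```rust
use std::any::TypeId;

/// A bag of bytes tagged at runtime with the sample type it holds.
/// The invariant is enforced by a TypeId check instead of generics.
struct TypedBytes {
    bytes: Vec<u8>,
    sample: TypeId,
}

impl TypedBytes {
    /// Store the bytes, remembering which sample type they represent.
    fn new<T: 'static>(bytes: Vec<u8>) -> Self {
        TypedBytes { bytes, sample: TypeId::of::<T>() }
    }

    /// Borrow the bytes only if the caller asks for the stored sample
    /// type; a mismatch is caught at runtime rather than compile time.
    fn as_bytes_of<T: 'static>(&self) -> Option<&[u8]> {
        (self.sample == TypeId::of::<T>()).then_some(self.bytes.as_slice())
    }
}
```

The data stays a weakly-typed &[u8] "just pass me the data" buffer, while the TypeId check upholds the invariant at the API boundary.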

@HeroicKatora (Member)

You can uphold the type invariants with some TypeId tricks and checks in the right places.

Heh, that's quite neat. Not sure if you need to, but FYI Channel is a bit unsound; I'll open some issues accordingly. You may want to check image_texel for a robust treatment of enforcing the safety invariants on the container. (The aforementioned image-canvas is an attempt at building a buffer atop those abstractions.) Maybe we can consolidate the operations if you can give it a review.

@HeroicKatora (Member)

To summarize the interface question with regards to this PR: there's no need to change anything. 16-bit lossless is not standard (and certainly not baseline). As options, a fallback to the jpeg_decoder crate is still available, as is copying the existing bindings for ImageDecoder into your own crate.¹ There's no risk of breaking changes being necessary in image to add support via the zune-jpeg bindings, since most major libraries and interfaces go the byte-based route anyway.

¹ Though one thing that does arise as a nice-to-have would be a way to override the default ImageDecoders used for detected standard image formats in the image::io::* interfaces. This would probably be easier if the trait were object-safe.

@HeroicKatora mentioned this pull request Mar 23, 2023
@benjaminrwilson

Hi @Shnatsel,

Thanks for putting this together. Are there any blockers for merging this?

@Shnatsel (Contributor, Author)

Yes, something needs to be decided about lossless JPEG. It is a format used in medical imaging but rarely found outside it, which jpeg_decoder supports but zune-jpeg does not.

@benjaminrwilson

@vsaase, I see you authored image-rs/jpeg-decoder#193 -- do you have any thoughts on the best way to handle this?

@vsaase commented May 3, 2023

I am not involved with DICOM and Rust anymore, maybe @Enet4 can comment on how this is currently handled?

@Shnatsel (Contributor, Author) commented May 3, 2023

It would be valuable to know if Lossless JPEG is used through the jpeg_decoder crate directly, or by going through the image crate.

@Enet4 commented May 3, 2023

Thank you for mentioning me. Lossless JPEG and extended JPEG support is indeed relevant for the DICOM-rs project, as it's one common form of compression found in medical imaging. It is worth keeping in mind that these forms of JPEG are standardized per the DICOM standard, although most sources seem to either be oblivious of such capabilities in JPEG or just assume them not to be officially standard. This confusion around JPEG is also one of the reasons why I'm hoping for JPEG-XL to thrive, although Chrome kind of hurt the chances... But I digress. 😶

As far as the DICOM-rs project goes, decoding of encapsulated JPEG pixel data already depends on jpeg-decoder directly, which means that it would be unaffected by this PR. image is used as a vector for API integrations and to be able to easily turn the contents into an image file (like in dicom-toimage).

The project also has a secondary tool, dicom-fromimage, which does the opposite: it uses image directly to load a general-purpose image file and turn it into a DICOM file. In the event that someone would want to load a lossless JPEG file, that would no longer work with these changes. But I suspect that this would be a very unusual use case and not enough of a reason to merit blocking this. There may also be a feasible workaround here: detecting unsupported JPEG files and falling back to jpeg-decoder, assuming the error is informative enough for this. I see that the possibility of having a fallback impl has been mentioned, so I wonder if the choice of decoder could still eventually exist behind a Cargo feature.

I personally would be thrilled to be able to use zune-jpeg for its additional performance characteristics, but support coverage is more important.

@etemesi254

Hi, lossless JPEG is standardized by the JPEG spec but, as noted, is rarely used outside of medical applications. Since it is part of the spec, I am more than happy to accept a PR in case anyone has time to implement it (it is beyond my scope/interest).

It seems image supports lossless 16-bit images in grayscale (ColorType::L16):

```rust
fn from_jpeg(pixel_format: jpeg::PixelFormat) -> ColorType {
    use jpeg::PixelFormat::*;
    match pixel_format {
        L8 => ColorType::L8,
        L16 => ColorType::L16,
        RGB24 => ColorType::Rgb8,
        CMYK32 => panic!(),
    }
}
```

I'm not sure if that indicates lossless support in image; the only viable way to know would be to try to decode such an image.

@Shnatsel (Contributor, Author) commented May 3, 2023

If zune-jpeg added support for detecting lossless JPEG based on the header, without decoding it, that would be enough for image to decode such files via a fallback.

However, that fallback would be a DoS vector: jpeg_decoder does not support memory limits, while zune-jpeg does, so any fallback to jpeg-decoder would open the door to memory exhaustion.

Since removing lossless JPEG from image isn't going to affect DICOM-rs, its primary user, and the fallback path would be a DoS vector, I recommend merging this as-is and dropping lossless JPEG support. @HeroicKatora, yay or nay?

@Shnatsel (Contributor, Author)

For posterity, this has been rebased in #2141 and merged! 🎉

7 participants