ReadFile: semantics of bytesRead #352
Comments
Hi @lostmsu ,
Dokan does use |
The scenario I worked with is `dumpbin` from C++ build tools, and it failed with my FS until I did fill the buffer (but didn't fail on a raw file).
There might be a bug in .NET wrapper. In the decompilation it looked like
it copies the whole buffer regardless of `bytesRead` for some reason, which
is suspect. Perhaps some programs rely on only `bytesRead` to be overwritten. E.g. if they read into a circular buffer.
|
I assume that you have seen this line: And yes, it looks like it copies the whole buffer. But the value of |
Reasonably sure. This is the end of the method before the fix: `bytesRead = stream.Read(buffer, 0, buffer.Length); return DokanResult.Success;` |
Okay. I think it would be interesting in your case to just try and change the |
So I tried the suggested change, and This is what I put into if (rawReadLength < 0)
throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be greater than or equal to zero.");
if (rawReadLength > rawBufferLength)
throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be less than or equal to the length of the buffer.");
Marshal.Copy(buffer, 0, rawBuffer, rawReadLength); This is what I put into int read = Math.Min(buffer.Length, 128); I copied apphost.exe to the mirrored folder and tried to
When I replace with The sad part is that if Microsoft tool is misbehaving like that, how many other apps silently rely on a similar behavior? I guess I will have to ensure full reads now in all file system implementations for compatibility purposes. 😭 |
Thanks for testing! Based on your results, it seems like something we probably need to address somehow. I'll take a closer look at this next week when I am back in office. |
BTW, if you are considering forcing the compatibility thing at Dokan level, I think simply repeating reads until buffer is filled is going to break potential scenarios where FS is backed by something like sockets. |
I have confirmed that Procmon might also be useful to see what is going on with dumpbin. @lostmsu Are you able to reproduce the issue with the C# Mirror ? C Mirror / Memfs ? It would help find out where the issue is |
@Liryna please see above. You can repro the issue with Mirror if you limit the amount of bytes read to 128 in |
@Liryna also, see my suggested change instead of your fix, that guards against bad number returned by |
But that is not a correct implementation. If fewer than requested bytes are returned for a file by the file system, it means that the read reached end of file. It is not like reading from devices such as communication channels. So, effectively, dumpbin will obviously treat the file as just 128 bytes long if you do this (which means it will only get the MZ header and not the PE header and assume it is a DOS exe etc etc). |
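A minimal sketch of the consumer side of this argument, assuming a tool issues a single read and treats a short count as end of file. `ChunkedStream` and `ShortReadDemo` are hypothetical names used only for illustration and are not part of Dokan or DokanNet; the wrapper imitates a file system that answers every read with at most 128 bytes, like the `Math.Min(buffer.Length, 128)` experiment described earlier.

```csharp
using System;
using System.IO;

// Hypothetical wrapper (not a DokanNet type): never returns more than
// maxChunk bytes per Read call, even when more data is available.
public sealed class ChunkedStream : Stream
{
    private readonly Stream inner;
    private readonly int maxChunk;

    public ChunkedStream(Stream inner, int maxChunk)
    {
        this.inner = inner;
        this.maxChunk = maxChunk;
    }

    public override int Read(byte[] buffer, int offset, int count) =>
        inner.Read(buffer, offset, Math.Min(count, maxChunk));

    public override bool CanRead => true;
    public override bool CanSeek => inner.CanSeek;
    public override bool CanWrite => false;
    public override long Length => inner.Length;
    public override long Position { get => inner.Position; set => inner.Position = value; }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => inner.Seek(offset, origin);
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class ShortReadDemo
{
    // A consumer that assumes "short read means end of file":
    // it reads once and reports how long it believes the file is.
    public static int ApparentLength(Stream file, int request)
    {
        var buffer = new byte[request];
        return file.Read(buffer, 0, buffer.Length);
    }
}
```

With a 4096-byte `MemoryStream`, `ShortReadDemo.ApparentLength(stream, 4096)` reports the whole file, while the same call through `new ChunkedStream(stream, 128)` reports only 128 bytes, which is the situation a tool like `dumpbin` would be in if it relies on a single read.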
It is a bit of an issue when an implementation actually wants to return fewer than requested bytes. That should be perfectly fine to do, to indicate that only that much data was available towards the end of the file. |
@LTRData the guards are against negative values and values that are higher than requested. |
From MSDN it sounds like @LTRData is correct E.g. a file system that serves files should always read the requested amount when possible. IMHO, what should be done is by default Dokan should repeat |
Sorry but I do not agree. That would limit possibilities for implementations that actually want to return fewer bytes read to indicate reached end of file and to avoid further unnecessary reads. Dokan should not send another unnecessary read request to the implementation in that case. |
@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior. E.g. a property on |
Yes, but I think it should be the other way around in that case. There should be some kind of compatibility layer to implement if this behavior with subsequent reads is needed by the implementation. Because this is not how file system implementations normally handle requests. |
The problem with the "other way around" is that it just calls for bugs. BTW, the whole conversation should also apply to dokan native repository, because in C standard library For instance, I have not looked at SSHFS implementation, but there's a good chance that it exhibits the same issue because it reads from the network, and network does not have the semantics used in Windows
And to prove my theory: https://github.com/dokan-dev/dokan-sshfs/blob/3577c47f92e6f67be80554a7b1a91a922e627495/DokanSSHFS/DokanOperations.cs#L518 yes, it appears to have this bug. |
Agree there is no good reason for the library to retry reads. It is unable to know why the file system returned this value. Retry could be bad. Even if it is an option, maybe one read case returned fewer bytes for a good reason and a retry could make it worse. |
No. That is not an issue. For regular files, .NET Stream reads and C library reads always return the number of requested bytes except towards end of file. The concept of returning fewer bytes, with reads that should then be retried, only applies to things like communication devices, network sockets and similar. Edit: to be more specific, the read that you linked to in sshfs is not a read from a network socket in a way that would be an issue here. And if it was, it would have been really bad and practically would never have worked. |
This is never guaranteed, and maybe true right now on Windows, but actually may not be. For instance, from
|
It is never guaranteed as a general concept for everything that could be read from. But for regular files it definitely is, on all platforms and frameworks. There are a huge number of things in many places that would break otherwise. |
I'm sorry, but what you say just doesn't match the documentation that I quoted. Unlike the |
In fact, there's more down there:
|
Even Win32 But this does not happen for regular files. In any case, it is something that you need to take into account if you are reading from sources that could behave like this. But the code in sshfs that you linked to is not an example of such a scenario, because there they read from a stream buffer that is known to the implementation and the exact behavior is well defined. You can compare that to when they read from sockets and similar, which is entirely different in this aspect. |
Just spent a couple of hours trying to figure out an issue with my FS.
Turned out that Dokan expects the implementation of `ReadFile` to always fill the entire buffer when `offset + buffer.Length` is within the file length, and to read exactly the remaining bytes when `offset + buffer.Length` exceeds the file length.

I was calling .NET `Stream.Read` with the `buffer`, and my filesystem would return garbage to some programs. The reason is that `Stream.Read` by its contract does not have to fill your entire buffer. For example, when reading from a TCP connection, if you specify a 1 MiB buffer and a 1 KiB packet arrives, `NetworkStream` will fill 1 KiB and return 1024 even though the connection is still open and it will be possible to read more data in the future.

Now I don't know what the kernel APIs expect in this case, but IMHO one of the following should be done:
`bytesRead` is not used everywhere it should be), fixing that bug
`ReadFile` calls until either `buffer` is filled or end of file reached (this would remove the need for documenting the discrepancy)