Skip to content
This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

Fix/disallow filename characters #376

Merged
merged 10 commits into from
Apr 21, 2023
Merged

Conversation

aaperis
Copy link
Contributor

@aaperis aaperis commented Apr 17, 2023

This PR prevents potential filepath naming problems that may arise due to cross-platform incompatibilities.

The procedure implemented is as follows:

  • a filepath is first checked whether it contains both "/" and "\". If yes, the request is rejected. This prevents generating erroneous folder structure in the next step.
  • all file separators in the filepath are changed to the host's OS convention
  • the resulting filepath is checked against a regex set of characters that we wish to disallow and if a hit is found the request is rejected.
    In case of a rejection, proxy returns a 406 error response to the client and prints debug logs with details on the disallowed characters found in the filepath. Testsuite is updated, too.

The fix was tested for the case when proxy runs in linux and an s3 client (s3cmd, sda-cli) runs in either linux or windows. Not tested for mac but hopefully it behaves like linux ;-).

P.S. following standup discussion, spaces in filenames are allowed (along with the majority of unicode).

Closes #370

@aaperis aaperis self-assigned this Apr 17, 2023
@aaperis aaperis requested a review from a team April 18, 2023 07:00
@pontus
Copy link
Contributor

pontus commented Apr 18, 2023

Does the library unescape the URI before handing it to the handler functions?

@aaperis
Copy link
Contributor Author

aaperis commented Apr 18, 2023

Does the library unescape the URI before handing it to the handler functions?

Which library and URI are you referring to?

@pontus
Copy link
Contributor

pontus commented Apr 18, 2023

Does the library unescape the URI before handing it to the handler functions?

Which library and URI are you referring to?

The golang provided net/http and the incoming URI.

Another formulation of this is essentially - if the client sends a name such as something%5csomethingelse, what will happen?

@codecov-commenter
Copy link

codecov-commenter commented Apr 18, 2023

Codecov Report

Merging #376 (7146394) into master (c25f117) will increase coverage by 0.76%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master     #376      +/-   ##
==========================================
+ Coverage   55.49%   56.25%   +0.76%     
==========================================
  Files           8        8              
  Lines        1202     1223      +21     
==========================================
+ Hits          667      688      +21     
  Misses        482      482              
  Partials       53       53              
Flag Coverage Δ
unittests 56.25% <100.00%> (+0.76%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
helper/helper.go 36.96% <100.00%> (+4.46%) ⬆️
proxy.go 74.29% <100.00%> (+0.32%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@aaperis
Copy link
Contributor Author

aaperis commented Apr 18, 2023

Does the library unescape the URI before handing it to the handler functions?

Which library and URI are you referring to?

The golang provided net/http and the incoming URI.

Another formulation of this is essentially - if the client sends a name such as something%5csomethingelse, what will happen?

Great point. In short, a file with the name something%5csomethingelse will appear in the inbox.

As far as I see, the path from the URI is unescaped. So, if there is potential for a mixup, I do not see how this can be mitigated on the proxy side.
Anything that slips through should be caught eventually when validating the filepaths in the xmls against the files in the inbox before ingestion starts.

@aaperis aaperis requested review from jbygdell and pontus April 18, 2023 18:30
Copy link
Contributor

@norling norling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small stuff you might wanna look at, but looks good in general! :)

proxy.go Outdated
Comment on lines 91 to 89
func (p *Proxy) notAcceptableResponse(w http.ResponseWriter, _ *http.Request) {
log.Debug("not acceptable response")
w.WriteHeader(406)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (p *Proxy) notAcceptableResponse(w http.ResponseWriter, _ *http.Request) {
log.Debug("not acceptable response")
w.WriteHeader(406)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this and write the header directly otherwise you will get double debug messages

Copy link
Contributor Author

@aaperis aaperis Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to be a rather generic debug message like the other response functions do (e.g. see line 91 in proxy.go).
example of debug output:

{"level":"info","msg":"User: dummy, Request type PUT, Path: /test/dummy/csa/sq?/data_file","time":"2023-04-20T11:51:56Z"}
{"level":"debug","msg":"filepath contains disallowed characters: ?","time":"2023-04-20T11:51:56Z"}
{"level":"debug","msg":"not acceptable response","time":"2023-04-20T11:51:56Z"}

There are two debug messages but not identical, the first one gives the explicit problem and the one below the response status . I can combine them in to one but the way it is now the function above is reusable and in addition I think the debug messages are more informative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the function and moved the WriteHeader in the main body.

proxy.go Outdated
filepath, err := helper.FormatUploadFilePath(rawFilepath)
if err != nil {
log.Debugf(err.Error())
p.notAcceptableResponse(w, r)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
p.notAcceptableResponse(w, r)
w.WriteHeader(http.StatusNotAcceptable)

Copy link
Contributor Author

@aaperis aaperis Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@aaperis aaperis requested a review from jbygdell April 20, 2023 12:20
@aaperis aaperis force-pushed the fix/disallow-filename-characters branch from ec2d372 to 7146394 Compare April 21, 2023 05:42
Comment on lines +374 to +378
// Check for mixed "\" and "/" in filepath. Stop and throw an error if true so that
// we do not end up with unintended folder structure when applying ReplaceAll below
if strings.Contains(filePath, "\\") && strings.Contains(filePath, "/") {
return filePath, fmt.Errorf("filepath contains mixed '\\' and '/' characters")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be unnecessary in the end, but it will require manual testing to validate since / actually is a valid separator on windows.

@dbampalikis dbampalikis merged commit 23e3a06 into master Apr 21, 2023
@dbampalikis dbampalikis deleted the fix/disallow-filename-characters branch April 21, 2023 11:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disallow characters in filenames
6 participants