
Support to access raw HTTP headers #1307

Open
wants to merge 1 commit into
base: master


@jimdigriz jimdigriz commented Jul 15, 2018

I have a requirement to inspect the ordering and case sensitivity of the HTTP headers in the request for User-Agent identification. As cowboy only supports normalised header access, I wired in a headers_raw boolean option that makes the request object contain a headers_raw key:

#{bindings => #{},body_length => 0,cert => undefined,
             has_body => false,
             headers =>
                 #{<<"accept">> => <<"*/*">>,
                   <<"host">> => <<"localhost:8080">>,
                   <<"user-agent">> => <<"curl/7.52.1">>},
             headers_raw =>
                 <<"Host: localhost:8080\r\nUser-Agent: curl/7.52.1\r\nAccept: */*\r\n">>,
             host => <<"localhost">>,host_info => undefined, 
             method => <<"GET">>,path => <<"/">>,path_info => undefined,
             peer => {{127,0,0,1},47606}, 
             pid => <0.124.0>,port => 8080,qs => <<>>,ref => test,
             scheme => <<"http">>,
             sock => {{127,0,0,1},8080},
             streamid => 1,version => 'HTTP/1.1'}

My use case is to then access this from a middleware to do browser identification.
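A minimal sketch of how such a middleware might read the proposed key, assuming the `headers_raw` option from this PR is enabled and that the value is the HTTP/1.1 binary shown above. The module name, the `header_names/1` helper and the `header_order` key stored in `Env` are hypothetical, chosen only for illustration; `execute/2` follows the `cowboy_middleware` callback shape.

```erlang
-module(raw_header_order_middleware).
%% The -behaviour(cowboy_middleware) attribute is left out on purpose so the
%% sketch compiles without Cowboy on the code path.
-export([execute/2, header_names/1]).

%% Called once per request. headers_raw only exists when the option proposed
%% in this PR is enabled, so default to an empty binary.
execute(Req, Env) ->
    Raw = maps:get(headers_raw, Req, <<>>),
    Names = header_names(Raw),
    %% Hypothetical: pass the ordered, case-preserved names to later middlewares.
    {ok, Req, Env#{header_order => Names}}.

%% Return header names in wire order with their original case.
header_names(Raw) ->
    Lines = binary:split(Raw, <<"\r\n">>, [global, trim_all]),
    %% Split on the first colon only, so values containing ":" are untouched.
    [hd(binary:split(Line, <<":">>)) || Line <- Lines].
```

With the request shown above, `header_names/1` would yield the names in the order the client sent them, which is the signal a fingerprinting step would compare against known browser signatures.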

Before making this PR 'worthy', at this point I am really looking for feedback on your appetite for this sort of thing, whether this is the best way to plumb it in, and any suggestions or recommendations you may have.

@essen
Member

essen commented Jul 15, 2018

Not terribly fond of special features to help track users especially when they're not compatible with HTTP/2. But let's leave it open for now and see if it could benefit other use cases.

@jimdigriz
Author

Though everything can be used for Evil(tm), my personal use is for when the User-Agent claims to be Windows/Chrome but the client is actually Linux/PhantomJS. The idea is that, combined with other signals, if for example they are lying about the User-Agent and the IP is from a data center, then they are probably up to no good and should be blocked.

Thanks for the feedback.

@jimdigriz
Author

FYI, raw headers (capturing the header ordering and the case sensitivity of the names) provide no benefit for tracking users. There is an N:1 relationship of browsers (where N is a group of the same family and range of versions) to a particular signature.

This gleans you no more information about the user than you could obtain just from parsing the value of the User-Agent itself, which is more unique (I see user IDs in there...the internet is awful...).

@essen
Member

essen commented Aug 13, 2018

Sure, but that doesn't change things much: field order has no meaning in the protocol (except for fields of the same name), and you risk burning yourself if/when legitimate clients end up doing the same. In Cowboy I would prefer to stick as close as possible to the protocols.

Still, maybe someone has a need for this for other reasons so let's wait a few months and see if it can be useful.

@essen
Member

essen commented Oct 31, 2018

You're not alone in wanting this: golang/go#24375

Also https://www.sans.org/reading-room/whitepapers/detection/paper/34460

I like the principle behind the implementation of the PR though. So if there's more demand I'm not against including.

@jimdigriz
Author

jimdigriz commented Oct 31, 2018 via email

@essen
Member

essen commented Oct 31, 2018

Hmm are you going to keep raw data also, perhaps just the HEADERS block? Just curious.

@jimdigriz
Author

jimdigriz commented Oct 31, 2018 via email

@essen essen added the "Unsupported edge case" label ("Provide feedback if you think this should be supported somehow.") Nov 5, 2018
@essen
Member

essen commented Nov 11, 2018

At IETF103 it was mentioned it could become a requirement for intermediaries to send headers without concatenating them, so it's possible a more general solution will be required in the future. If that happens I'll favor a mode to select whether Cowboy should operate using maps or lists of headers, with the caveat of course that lists will be slower to manipulate.

@jimdigriz jimdigriz force-pushed the headers-raw branch 3 times, most recently from d2d8ad9 to 04ac41e Compare February 10, 2019 13:39
@jimdigriz jimdigriz force-pushed the headers-raw branch 2 times, most recently from 8995c89 to 235cf3e Compare January 18, 2020 15:44
@jimdigriz jimdigriz force-pushed the headers-raw branch 2 times, most recently from e1b1fdd to bc5d952 Compare May 30, 2020 09:15
@jimdigriz
Author

jimdigriz commented May 30, 2020

Attached is the updated implementation, now supporting HTTP/2, that I use in my passive browser fingerprinting library.

Thoughts?

I'll write some unit tests if this looks to be an acceptable approach.

@essen
Member

essen commented May 30, 2020

Hm, I would rather have a more consistent approach. From what I understand, HTTP/1.1 gives a binary while HTTP/2 gives the parsed list of headers. Is there any reason for HTTP/1.1 to be a binary?

@jimdigriz
Author

For my use case, no, as in my project I actually convert the binary to a KV list anyway. :)

As this is 'raw', I thought that for the generic case (as those users have particular needs and processing ideas) it was maybe best to expose whatever was closest to the wire representation for full headers (otherwise maybe I should just return the frame?). Mostly because some HTTP/1.1 users may want to use a binary match to search for leading whitespace in values without having to iterate over a list.
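To illustrate the binary-match idea, here is a rough sketch that scans the whole HTTP/1.1 header binary in one pass for a colon followed by two spaces or a tab. The module and function names are hypothetical, and the check is deliberately naive: a value that itself contains such a sequence would also match.

```erlang
-module(raw_header_ows).
-export([has_unusual_ows/1]).

%% True when some header line has more than the conventional single space
%% (or a tab) right after the colon. Works on the raw binary directly, so no
%% per-header list needs to be built first.
has_unusual_ows(Raw) when is_binary(Raw) ->
    binary:match(Raw, [<<":  ">>, <<":\t">>]) =/= nomatch.
```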

Happy to add something to the HTTP/1.x side to fix up the returned value into something more closely resembling cow_http:headers(); I suppose those who want the binary can just fold the list back up into one.
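The conversion in both directions could look roughly like the sketch below. It assumes the HTTP/1.1 value is the CRLF-separated binary shown at the top of this PR and produces an ordered list of `{Name, Value}` binary pairs, which is only assumed to be close to what `cow_http:headers()` describes. The module and function names are hypothetical, and obsolete line folding is not handled.

```erlang
-module(raw_headers_convert).
-export([to_list/1, to_binary/1]).

%% Raw header binary -> ordered [{Name, Value}] with the original name case.
to_list(Raw) ->
    Lines = binary:split(Raw, <<"\r\n">>, [global, trim_all]),
    [split_line(Line) || Line <- Lines].

%% Split on the first colon only and trim optional whitespace around the value.
split_line(Line) ->
    case binary:split(Line, <<":">>) of
        [Name, Value] -> {Name, string:trim(Value, both, " \t")};
        [Name] -> {Name, <<>>}
    end.

%% Fold the list back into a binary, using a single space after each colon.
to_binary(Headers) ->
    iolist_to_binary([[Name, <<": ">>, Value, <<"\r\n">>] || {Name, Value} <- Headers]).
```

Note that the round trip is lossy for anything other than "Name: value" lines, since `to_list/1` discards the exact whitespace; that is one argument for keeping the binary available to those who need the wire form.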
