-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DecodedURL #54
DecodedURL #54
Conversation
…der for any reserved characters, and used it in .child() and .sibling(), will add it in further methods shortly
…obably have the issue with not-yet-normalized query parameter names (mixed decoded and encoded query parameter names that overlap).
… overlapping due to not being normalized
…ation (.append, etc.). Also add and test __eq__, __ne__, and __hash__
…RLs of some complexity. Had to add userinfo to URL.normalize() to help with equality checks.
… and other Twisted codebases
Codecov Report
@@ Coverage Diff @@
## master #54 +/- ##
==========================================
+ Coverage 97.8% 97.94% +0.13%
==========================================
Files 6 8 +2
Lines 1137 1408 +271
Branches 137 164 +27
==========================================
+ Hits 1112 1379 +267
- Misses 13 14 +1
- Partials 12 15 +3
Continue to review full report at Codecov.
|
hyperlink/_url.py
Outdated
_UNRESERVED_DECODE_MAP = dict([(k, v) for k, v in _HEX_CHAR_MAP.items() | ||
if v.decode('ascii', 'replace') | ||
in _UNRESERVED_CHARS]) | ||
|
||
_ROOT_PATHS = frozenset(((), (u'',))) | ||
|
||
|
||
def _encode_reserved(text, maximal=True): | ||
"""A very comprehensive percent encoding for encoding all | ||
delimeters. Used for arguments to DecodedURL, where a % means a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "delimiters"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh, this kind of typo is so very unlike me, I assure you! ;) Thanks!
return not self.__eq__(other) | ||
|
||
def __hash__(self): | ||
return hash((self.__class__, self.scheme, self.userinfo, self.host, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: Since equality is delegated upwards to URL
, would it be better to do the same for hashing?
e.g.
def __hash__(self):
return hash(self.normalize().to_uri())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I may have subtly changed my mind midway through development, and now I lean more toward not delegating up to URL for equality. I'll get that fixed, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whenever you're delegating hash
like this, remember you should also include a tweak, so that the DecodedURL
and the URL
representing the same data don't hash the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glyph why is that? (correctness shouldn't be effected since dictionaries compare by equality, and I'm not entirely sure what the performance problem you're trying to forestall)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share @moshez's curiosity.
@mahmoud - will you follow @alexwlchan's advice about delegating __hash__
?
hyperlink/__init__.py
Outdated
|
||
__all__ = [ | ||
"URL", | ||
"DecodedURL", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to say that we should just avoid exposing this entirely, but it probably needs to be exposed for type annotations.
However, as per #44, could we have a "decoded" property on URL
that provides an interface to this, and an "encoded" property on DecodedURL
that maps back to a URL
?
(At this point I think I'm in favor of adding an EncodedURL
alias for URL
, then maybe adding a top-level entry point like hyperlink.parse()
which takes a decoded
kwarg flag which defaults to true, to make it easier to get started with DecodedURL
which is what I think we all want most of the time. That can definitely be deferred to a separate ticket though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I am pretty much in favor of all those conveniences :) And I also agree that this probably needs to be exposed.
…vel API. fix a bug with userinfo where double-escapes were possible because % wasn't marked as safe. add lazy option to DecodedURL, also exposed in parse and URL's new get_decoded_url() method. add a few more notes and comment tweaks
…verage-oriented tests. Remove the password attributes of DecodedURL pending future discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach seems good and it addresses my question:
>>> from hyperlink import DecodedURL, URL
>>> d = DecodedURL(URL())
>>> d.add(u'parameter', u'#value').to_text()
u'?parameter=%23value'
I think this means we should use DecodedURL
. in twisted.python.url
. That's great!
I've left comments about documentation improvements and lingering TODOs. I'd also like to see clarity around __hash__
. Please address these issues with changes or PR comments before merging.
@@ -1040,7 +1083,7 @@ def from_text(cls, text): | |||
rooted, userinfo, uses_netloc) | |||
|
|||
def normalize(self, scheme=True, host=True, path=True, query=True, | |||
fragment=True): | |||
fragment=True, userinfo=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring should mention userinfo
.
There's a TODO for userinfo. Should that go?
hyperlink/_url.py
Outdated
handled automatically. | ||
|
||
Where applicable, a UTF-8 encoding is presumed. Be advised that | ||
some interactions, can raise UnicodeEncodeErrors and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comma splice:
some interactions , can raise UnicodeEncodeErrors...
Also, maybe you want backticks around UnicodeEncodeError
s and UnicodeDecodeError
s.
hyperlink/_url.py
Outdated
UnicodeDecodeErrors, just like when working with | ||
bytestrings. | ||
|
||
Examples of such interactions include handling query strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be part of the previous paragraph?
hyperlink/_url.py
Outdated
encoding binary data, and paths containing segments with special | ||
characters encoded with codecs other than UTF-8. | ||
""" | ||
def __init__(self, url, lazy=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a public class, then its initializer should be documented. Please include an explanation of url
and lazy
in the docstring.
hyperlink/_url.py
Outdated
|
||
def click(self, href=u''): | ||
"Return a new DecodedURL wrapping the result of :meth:`~hyperlink.URL.click()`" | ||
return type(self)(self._url.click(href=href)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not self.__class__
?
durl = durl.set(' ', 'spa%ed') | ||
assert durl.get(' ') == ['spa%ed'] | ||
|
||
durl = DecodedURL(url=durl.to_uri()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this? Round tripping?
hyperlink/test/test_decoded_url.py
Outdated
|
||
assert durl.set('arg', 'd').get('arg') == ['d'] | ||
|
||
def test_equivalences(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really tests __eq__
and __hash__
, so maybe test_equality_and_hashability
?
|
||
durl_map = {} | ||
durl_map[durl] = durl | ||
durl_map[durl2] = durl2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also test that a URL
and a DecodedURL
that represent the same underlying URL don't overlap in a dict
(or set
.)
hyperlink/test/test_decoded_url.py
Outdated
|
||
assert len(durl_map) == 1 | ||
|
||
def test_replace_roundtrip(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good, but there should also be a roundtrip test between TOTAL_URL
and DecodedURL
.
|
||
BASIC_URL = 'http://example.com/#' | ||
TOTAL_URL = "https://%75%73%65%72:%00%00%00%[email protected]:8080/a/nice%20nice/./path/?zot=23%25&zut#frég" | ||
UNDECODABLE_FRAG_URL = TOTAL_URL + '%C3' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to note that this is undecodable because %C3
makes it invalid UTF-8.
This still needs some documentation, but to address #44 and the handful of issues surrounding the central problem, I present
DecodedURL
. It takes care of handling reserved characters so you don't have to.From the docstring:
It's tested, works, and seems practical, though, so take a look!