-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix a date-time bug; drop pointless dependencies; and deprecate utils. #85
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @dmyersturnbull, definitely appreciate you reviewing that codebase and identifying and proposing fixes to outdated code. I'm sure your changes are reliable and will try to review them soon, but because I'm out the rest of the week, I just want to give you a heads up that we will probably wait until next week to merge this, since I want to be extra careful testing this before pushing out to production for the weekend. Any changes with rcsb.db
are pretty impactful with the weekly-update, so I just tend to lean towards extra caution when it comes to pushing things out is all.
While I still need to go through each change, the one thing that stands out to me is the removal of the mock-data
submodule. I'm guessing that wasn't intentional...can you add it back in there?
Also FYI @brindakv
Sounds good. The SQL removal doesn't depend on this. By the way, do tests run for PRs?
Ah! Git doesn't normally track empty directories, and I had to |
Thanks! And be aware that Michael just merged a PR, so you'll probably want to rebase or merge.
Yes, they do. Every time you push a new commit a new pipeline will run. However, for some reason this repository doesn't show the results from the runs like other repos do, so you'll need to just go to Azure to check the run directly: https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_build?definitionId=33 When you push a new commit, a new run will show up on that link. |
� Conflicts: � rcsb/db/cockroach/CockroachDbUtil.py � rcsb/db/cockroach/Connection.py � rcsb/db/crate/CrateDbUtil.py � rcsb/db/mysql/MyDbUtil.py � requirements.txt � setup.py
e3bc77c
to
c6f228d
Compare
@piehld I rebased. I also adjusted the readme (no longer mentions MySQL) and cleaned up |
8e67cee
to
18dcff2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for bringing mock-data
back @dmyersturnbull, and the other changes! I have a couple general questions about the removal of TimeUtil.py
, which I left as comments for that file. I definitely agree with removing unnecessary dependencies from requirements.txt
, but do wonder if we need to remove that file altogether or instead just update it, since that class is used in code elsewhere that would need to be updated also if we remove it. So, personally I'd kind of prefer keeping the class but just updating it to use datetime
instead.
rcsb/db/utils/TimeUtil.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is used in rcsb.exdb
and rcsb.workflow
, so we would need appropriate updates to those repos too (and first) if we want to get rid of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, instead of getting rid of this file (since it is used in a handful of places), is it possible to just update this file to not use dateutil
or pytz
? That would be my preference, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is used in
rcsb.exdb
andrcsb.workflow
I don't think that's true. rcsb.exdb
is importing rcsb.utils.io.TimeUtil
(which, incidentally, is identical). The first commit in this PR refactored TimeUtil
, but I deleted it when I realized that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I noticed that difference too, but I also see imports of rcsb.db.utils.TimeUtil
, e.g.:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good catch. I added stubs to rcsb/db/utils/__init__.py
that provide those with deprecation warnings.
from datetime import datetime | ||
from typing import Optional | ||
from zoneinfo import ZoneInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, if we keep TimeUtil
, we don't have to worry about updating each of these files that currently imports it.
# TODO | ||
# What is this function doing???! | ||
# Its name was castDateTimeToIsoDate, suggesting it outputs YYYY-MM-DD. | ||
# Contradicting tht, the docstring said: | ||
# > Cast the input date (optional time) string (yyyy-mm-dd:hh::mm:ss) to a Python DateTime object | ||
# Instead, it did this: | ||
# > tS = dateutil.parser.parse(tv).replace(tzinfo=pytz.UTC).isoformat() | ||
# That's three contradictory claims. | ||
# What was this function supposed to do? | ||
# Why did it fill in 00:00:00, and why did it append +00:00? | ||
# Also, why did the format separate the date and time with `:`? | ||
# END TODO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol, these are all good questions. I imagine some of the contradictions probably came from John copying and pasting a docstring or changing the method name after already writing an [outdated] docstring. From experience, he wasn't always super careful with docstrings.
So, I think the best way to figure out what the intent of this was would be to run an actual load and add some logging to figure it out. I could look into doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply retained the current behavior and ignored the intention (save for those comments).
@piehld Another PR will remove MySQL. This PR fixes a bug and removes some unnecessary dependencies.
Summary: Fixes a date-time calculation bug; drops pointless date-time dependencies; and deprecates TimeUtil and TextUtil.
TimeUtil
incorrectly calleddatetime.replace(tzinfo=utc)
, which does doesn't actually move timezones. Code usingTimeUtil
anduseUtc
was affected.dateutil
,pytz
, andstrict-rfc3339
. Builtindatetime
works perfectly well. Removes these fromTimeUtil
and some other code.TimeUtil
. Its methods just delegate todatetime
in standard ways. [1]TextUtil
and refactored code to avoid using it. It only definedunescapeXmlCharRef
, which is justhtml.unescape
.unescape.py
, which was an unused duplicate ofTextUtil
. [2]scandir
andconfigparser
, which were marked; python<3.0
.six
andfuture
, removedfrom __future__ import generators
, and movedjsondiff
totests_require
. (2nd commit)[1] The exception is "week signature" because it's non-obvious/nonstandard (i.e. not ISO 8601 week number). However, it's just
strftime("%Y_%V")
.[2] This is technically a breaking change, but adding
import rcsb.db.utils.TextUtil as unescape
toutils.__init__.py
would make it non-breaking.