Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a date-time bug; drop pointless dependencies; and deprecate utils. #85

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

dmyersturnbull
Copy link
Contributor

@dmyersturnbull dmyersturnbull commented Feb 4, 2025

@piehld Another PR will remove MySQL. This PR fixes a bug and removes some unnecessary dependencies.

Summary: Fixes a date-time calculation bug; drops pointless date-time dependencies; and deprecates TimeUtil and TextUtil.

  1. TimeUtil incorrectly called datetime.replace(tzinfo=utc), which does doesn't actually move timezones. Code using TimeUtil and useUtc was affected.
  2. Drops the dependencies dateutil, pytz, and strict-rfc3339. Builtin datetime works perfectly well. Removes these from TimeUtil and some other code.
  3. Refactored code to not use TimeUtil. Its methods just delegate to datetime in standard ways. [1]
  4. Cleaned up TextUtil and refactored code to avoid using it. It only defined unescapeXmlCharRef, which is just html.unescape.
  5. Deleted unescape.py, which was an unused duplicate of TextUtil. [2]
  6. Dropped the dependencies scandir and configparser, which were marked ; python<3.0.
  7. Dropped the dependencies six and future, removed from __future__ import generators, and moved jsondiff to tests_require. (2nd commit)

[1] The exception is "week signature" because it's non-obvious/nonstandard (i.e. not ISO 8601 week number). However, it's just strftime("%Y_%V").
[2] This is technically a breaking change, but adding import rcsb.db.utils.TextUtil as unescape to utils.__init__.py would make it non-breaking.

@dmyersturnbull dmyersturnbull requested a review from piehld February 4, 2025 00:22
Copy link
Collaborator

@piehld piehld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank @dmyersturnbull, definitely appreciate you reviewing that codebase and identifying and proposing fixes to outdated code. I'm sure your changes are reliable and will try to review them soon, but because I'm out the rest of the week, I just want to give you a heads up that we will probably wait until next week to merge this, since I want to be extra careful testing this before pushing out to production for the weekend. Any changes with rcsb.db are pretty impactful with the weekly-update, so I just tend to lean towards extra caution when it comes to pushing things out is all.

While I still need to go through each change, the one thing that stands out to me is the removal of the mock-data submodule. I'm guessing that wasn't intentional...can you add it back in there?

Also FYI @brindakv

@dmyersturnbull
Copy link
Contributor Author

dmyersturnbull commented Feb 4, 2025

I want to be extra careful testing this before pushing out to production for the weekend.

Sounds good. The SQL removal doesn't depend on this.

By the way, do tests run for PRs?

While I still need to go through each change, the one thing that stands out to me is the removal of the mock-data submodule. I'm guessing that wasn't intentional...can you add it back in there?

Ah! Git doesn't normally track empty directories, and I had to rmdir to bypass an error. Apparently that deletes submodules. I'll fix it.

@piehld
Copy link
Collaborator

piehld commented Feb 4, 2025

Thanks! And be aware that Michael just merged a PR, so you'll probably want to rebase or merge.

By the way, do tests run for PRs?

Yes, they do. Every time you push a new commit a new pipeline will run. However, for some reason this repository doesn't show the results from the runs like other repos do, so you'll need to just go to Azure to check the run directly: https://dev.azure.com/rcsb/RCSB%20PDB%20Python%20Projects/_build?definitionId=33

When you push a new commit, a new run will show up on that link.

� Conflicts:
�	rcsb/db/cockroach/CockroachDbUtil.py
�	rcsb/db/cockroach/Connection.py
�	rcsb/db/crate/CrateDbUtil.py
�	rcsb/db/mysql/MyDbUtil.py
�	requirements.txt
�	setup.py
@dmyersturnbull
Copy link
Contributor Author

@piehld I rebased. I also adjusted the readme (no longer mentions MySQL) and cleaned up setup.py. I moved rcsb.utils.chemref to the test requirements.

Copy link
Collaborator

@piehld piehld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing mock-data back @dmyersturnbull, and the other changes! I have a couple general questions about the removal of TimeUtil.py, which I left as comments for that file. I definitely agree with removing unnecessary dependencies from requirements.txt, but do wonder if we need to remove that file altogether or instead just update it, since that class is used in code elsewhere that would need to be updated also if we remove it. So, personally I'd kind of prefer keeping the class but just updating it to use datetime instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is used in rcsb.exdb and rcsb.workflow, so we would need appropriate updates to those repos too (and first) if we want to get rid of this file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, instead of getting rid of this file (since it is used in a handful of places), is it possible to just update this file to not use dateutil or pytz? That would be my preference, at least.

Copy link
Contributor Author

@dmyersturnbull dmyersturnbull Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is used in rcsb.exdb and rcsb.workflow

I don't think that's true. rcsb.exdb is importing rcsb.utils.io.TimeUtil (which, incidentally, is identical). The first commit in this PR refactored TimeUtil, but I deleted it when I realized that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@dmyersturnbull dmyersturnbull Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch. I added stubs to rcsb/db/utils/__init__.py that provide those with deprecation warnings.

Comment on lines +18 to +20
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, if we keep TimeUtil, we don't have to worry about updating each of these files that currently imports it.

Comment on lines +398 to +409
# TODO
# What is this function doing???!
# Its name was castDateTimeToIsoDate, suggesting it outputs YYYY-MM-DD.
# Contradicting tht, the docstring said:
# > Cast the input date (optional time) string (yyyy-mm-dd:hh::mm:ss) to a Python DateTime object
# Instead, it did this:
# > tS = dateutil.parser.parse(tv).replace(tzinfo=pytz.UTC).isoformat()
# That's three contradictory claims.
# What was this function supposed to do?
# Why did it fill in 00:00:00, and why did it append +00:00?
# Also, why did the format separate the date and time with `:`?
# END TODO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, these are all good questions. I imagine some of the contradictions probably came from John copying and pasting a docstring or changing the method name after already writing an [outdated] docstring. From experience, he wasn't always super careful with docstrings.

So, I think the best way to figure out what the intent of this was would be to run an actual load and add some logging to figure it out. I could look into doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply retained the current behavior and ignored the intention (save for those comments).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants