Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove custom startup to fix dev reloading #36302

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ormsbee
Copy link
Contributor

@ormsbee ormsbee commented Feb 26, 2025

The cms/startup.py and lms/startup.py files were created to
allow us to do a lot of custom initialization around things
like the ModuleStore, monkey-patching, adding MIME types to
our process, etc. As far back as 2017, we recognized that
this was a bad thing, marked these modules as "deprecated",
and started removing things or putting them in the standard
Django locations for them (0279181).

In its current state, these startup modules no longer do any
custom work, and just invoke django.startup(). But this is
meant for running Django code in "standalone" usage, e.g. if
you have a script that isn't a management command but needs
some Django functionality.

The "runserver" command used during development normally
launches a child process to serve requests and knows how to
kill and respawn that process when files are modified, so
that changes are reflected. It can also normally handle the
case where there's a SyntaxError in the child process, and
fixing that error will reload the code again.

Something about running django.startup() manually interferes
with this functionality in "runserver". It still reloads the
code in response to changes, but if the code gets into a
broken state for any reason (like a syntax error), the master
process itself dies. That causes the container to restart,
only to die again shortly afterwards in a loop until the
error is fixed. The container restarts will break any shell
you had opened into the container, as well as any IDE
integrations that connected to that container to access the
files and Python instance.

Getting rid of the custom startup code fixes this and moves
us one small step closer to being a more normal Django
project.

@ormsbee ormsbee force-pushed the remove-custom-startup branch from 6eaf17a to 013e66c Compare February 26, 2025 00:34
@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 26, 2025

I think what it comes down to is that we're only supposed to invoke django.setup() once, and Django's various commands do that invocation. But if django.ready() is called by us, it's not re-initialized properly because Apps.ready will set set to True, and whatever extra initialization needs to happen, doesn't.

@ormsbee ormsbee force-pushed the remove-custom-startup branch from 013e66c to a719952 Compare February 26, 2025 02:24
@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 26, 2025

This will break this site-configuration script in Tutor, though I think we can just replace it with a django.setup() call.

@ormsbee ormsbee marked this pull request as ready for review February 26, 2025 02:55
@ormsbee ormsbee force-pushed the remove-custom-startup branch from a719952 to 99d0a35 Compare February 26, 2025 03:10
@kdmccormick kdmccormick self-requested a review February 26, 2025 03:31
@ormsbee ormsbee force-pushed the remove-custom-startup branch 2 times, most recently from 6dae997 to ed08013 Compare February 26, 2025 14:45
@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 26, 2025

FYI @blarghmatey and @pdpinch since this will affect the startup of LMS/Studio and I don't have a good way of testing that in non-Tutor environments. Please take a look to see if you have any concerns.

@blarghmatey
Copy link
Contributor

Thanks for the heads up! I don't foresee any problems on our end, but I will run a build in one of our test environments to be sure.

@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 27, 2025

@kdmccormick: I pretty much exclusively run tutor dev off of the main branch. What's the right way to test this in the production setup? Should I spin up the Sumac release, run tutor local with this commit cherry-picked? Just doing tutor local launch gave me a lot of errors related to missing packages.

@kdmccormick kdmccormick added the create-sandbox open-craft-grove should create a sandbox environment from this PR label Feb 27, 2025
@kdmccormick
Copy link
Member

tutor local launch should work fine on main. Have you rebuild the local prod image lately, tutor images build openedx?

@kdmccormick
Copy link
Member

(I've also added a label to create a PR sandbox on this.)

@open-craft-grove
Copy link

Sandbox deployment failed 💥
Please check the settings and requirements.
Retry deployment by pushing a new commit or updating the requirements/settings in the pull request's description.
📜 Failure Logs
ℹ️ Grove Config, Tutor Config, Tutor Requirements

@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 27, 2025

Have you rebuild the local prod image lately, tutor images build openedx?

Hah, yup, this was exactly it. Started it up and everything works as expected. Studio and LMS both came up normally and celery tasks ran. Thank you!

@blarghmatey
Copy link
Contributor

I was also able to deploy the changes in this branch to one of our QA environments without any issues.

@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 27, 2025

@robrap: I've tested in Tutor and @blarghmatey has tested on MIT's deployment. If I can get a reviewer to look through this, would you be comfortable merging this next week?

@robrap
Copy link
Contributor

robrap commented Feb 28, 2025

I’d like to try when it is ready. Anything you can share to help me reduce risk would be appreciated. What did you test in Tutor?

Also, is any of this of concern? https://github.com/search?q=%28org%3Aopenedx+OR+org%3Aedx%29+%22lms%2Fstartup%22++NOT+is%3Aarchived&type=code
I didn’t search for cms.

@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 28, 2025

I’d like to try when it is ready. Anything you can share to help me reduce risk would be appreciated. What did you test in Tutor?

I tested starting the LMS and Studio in both dev (so manage.py [lms/cms] runserver) and via uwsgi using lms/wsgi.py and cms/wsgi.py. Did sanity check that courseware rendered in LMS and Studio. Did a publish in Studio and verified that the celery task ran successfully.

Also, is any of this of concern? https://github.com/search?q=%28org%3Aopenedx+OR+org%3Aedx%29+%22lms%2Fstartup%22++NOT+is%3Aarchived&type=code

I'll remove the references to the files, but no, what I saw in that search and in the cms equivalent doesn't concern me. I think it's possible we might be able to remove some of the weirdness around catalog/models.py later on now that we won't have this bespoke startup sequence, but I think leaving it for now is fine.

@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 28, 2025

@kdmccormick: I looked through the error log for the sandbox build, but didn't see anything specific enough for me to understand what's going on there and why it's failing.

@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 28, 2025

Given that this PR has nothing but deletions, I plan to merge this even if codecov disapproves because I've written no test code for it.

@robrap
Copy link
Contributor

robrap commented Feb 28, 2025

@ormsbee

Given that this PR has nothing but deletions, I plan to merge this even if codecov disapproves because I've written no test code for it.

Does this mean once approved, I plan to have @robrap merge this early next week so he can test it?

@ormsbee
Copy link
Contributor Author

ormsbee commented Feb 28, 2025

@robrap: I was planning to wait until you were good with merging, but I didn't think you had the access to bypass the rules and merge even though the codecov check is technically failing. If you do have that ability, I'd be happy to have you merge instead.

@robrap
Copy link
Contributor

robrap commented Feb 28, 2025

@ormsbee: The only thing I see that is clearly blocking is that there still isn’t an approved review from someone with write access. Once you have that, we can see if I can merge.

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. I haven't tested, but I see that Dave and Tobias both have.

@ormsbee
Copy link
Contributor Author

ormsbee commented Mar 1, 2025

@robrap: Ah, great! I didn't realize it was non-blocking. I'll do minor comment cleanup tonight, and leave the merge timing next week to your discretion then. Thank you!

@robrap
Copy link
Contributor

robrap commented Mar 1, 2025

Ok. Let me know when it is ready to merge. Thanks.

The cms/startup.py and lms/startup.py files were created to
allow us to do a lot of custom initialization around things
like the ModuleStore, monkey-patching, adding MIME types to
our process, etc. As far back as 2017, we recognized that
this was a bad thing, marked these modules as "deprecated",
and started removing things or putting them in the standard
Django locations for them (0279181).

In its current state, these startup modules no longer do any
custom work, and just invoke django.startup(). But this is
meant for running Django code in "standalone" usage, e.g. if
you have a script that isn't a management command but needs
some Django functionality.

The "runserver" command used during development normally
launches a child process to serve requests and knows how to
kill and respawn that process when files are modified, so
that changes are reflected. It can also normally handle the
case where there's a SyntaxError in the child process, and
fixing that error will reload the code again.

Something about running django.startup() manually interferes
with this functionality in "runserver". It still reloads the
code in response to changes, but if the code gets into a
broken state for any reason (like a syntax error), the master
process itself dies. That causes the container to restart,
only to die again shortly afterwards in a loop until the
error is fixed. The container restarts will break any shell
you had opened into the container, as well as any IDE
integrations that connected to that container to access the
files and Python instance.

Getting rid of the custom startup code fixes this and moves
us one small step closer to being a more normal Django
project.
The get_service_user method used to do a local call to the
get_user_model function because it was not guaranteed to be properly
initialized at the time of import. This was partly due to how we did
custom initialization using lms/startup.py, but it was also because
when it was implemented (commit f318661), the platform was still
running on Django 1.8.18. At that time, get_user_model was guaranteed to
work only after Django has imported all models.

In Django 1.11, the behavior of get_user_model was changed:

  https://docs.djangoproject.com/en/1.11/releases/1.11/#django-contrib-auth

> get_user_model() can now be called at import time,
> even in modules that define models.

Now that lms/startup.py is gone and get_user_model is safe to call at
the module level, I'm refactoring the catalog app's models.py file to
follow the convention we use everywhere else in edx-platform with
respect to get_user_model.
@ormsbee ormsbee force-pushed the remove-custom-startup branch from ed08013 to d5aa834 Compare March 1, 2025 04:16
@ormsbee
Copy link
Contributor Author

ormsbee commented Mar 1, 2025

@robrap: It should be merge-ready now. I did add another commit to just remove the weird local import rather than updating the comment about why it was there (d5aa834).

@open-craft-grove
Copy link

Sandbox deployment failed 💥
Please check the settings and requirements.
Retry deployment by pushing a new commit or updating the requirements/settings in the pull request's description.
📜 Failure Logs
ℹ️ Grove Config, Tutor Config, Tutor Requirements

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
create-sandbox open-craft-grove should create a sandbox environment from this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants