Resolve request_fingerprint deprecation removal in Scrapy 2.12.0 #51

Closed
wants to merge 7 commits into from

Conversation

fsmeraldi

@fsmeraldi fsmeraldi commented Feb 24, 2025

Removal of scrapy.utils.request.request_fingerprint() breaks scrapy_deltafetch. I solved this by replacing the deprecated function with a RequestFingerprinter object according to the new specifications. Tests modified accordingly.

Thank you for a very useful package!

Closes #50.

@Gallaecio
Contributor

While this indeed removes the removed import, it does not switch to the new way to handle request fingerprinting introduced in Scrapy 2.7.0. And to be fair, I have just realized how the Scrapy docs focus on the user information about request fingerprinting and neglect the component-author information.

Do you think you could refactor this PR to instead rely on self.crawler.request_fingerprinter.fingerprint() for request fingerprinting?

You could use hasattr(self.crawler, "request_fingerprinter") in from_crawler or __init__ to determine whether or not the installed Scrapy version supports it, use it where available, and import the old function where not available.

Test expectations may need to change as well when running a version of Scrapy that supports the new approach. For one, the new fingerprints are bytes, not str.
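
For illustration, the version check suggested above could look roughly like the sketch below. The helper name `build_fingerprint_func` is hypothetical and not part of scrapy_deltafetch or Scrapy; the only Scrapy names it relies on are `crawler.request_fingerprinter.fingerprint()` and the old `scrapy.utils.request.request_fingerprint()` mentioned in this thread. The fake crawler at the end is a stand-in so the snippet runs without Scrapy installed.

```python
from types import SimpleNamespace


def build_fingerprint_func(crawler):
    """Return a callable that maps a request to its fingerprint.

    Hypothetical helper, for illustration only.
    """
    if hasattr(crawler, "request_fingerprinter"):
        # Newer Scrapy: use the crawler's fingerprinter, which returns bytes.
        return crawler.request_fingerprinter.fingerprint
    # Older Scrapy: fall back to the function that was later removed.
    # Imported lazily so that newer versions never hit the missing import.
    from scrapy.utils.request import request_fingerprint

    return request_fingerprint


# Stand-in for a crawler on a Scrapy version that has request_fingerprinter.
fake_crawler = SimpleNamespace(
    request_fingerprinter=SimpleNamespace(fingerprint=lambda request: b"\x01\x02")
)
fingerprint = build_fingerprint_func(fake_crawler)
example_fingerprint = fingerprint(object())  # bytes, not str
```

Importing the old function inside the fallback branch keeps the module importable on Scrapy versions where that function no longer exists.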

@fsmeraldi
Author

fsmeraldi commented Feb 25, 2025

> While this indeed removes the removed import, it does not switch to the new way to handle request fingerprinting introduced in Scrapy 2.7.0. And to be fair, I have just realized how the Scrapy docs focus on the user information about request fingerprinting and neglect the component-author information.

> Do you think you could refactor this PR to instead rely on self.crawler.request_fingerprinter.fingerprint() for request fingerprinting?

> You could use hasattr(self.crawler, "request_fingerprinter") in from_crawler or __init__ to determine whether or not the installed Scrapy version supports it, use it where available, and import the old function where not available.

I am actually passing the crawler to the constructor of RequestFingerprinter; my understanding is that the constructor does just that version check and handles REQUEST_FINGERPRINTER_CLASS? I also thought this was the current way of doing it. Sorry, I might have gotten out of my depth.

Sorry, I got that mixed up with REQUEST_FINGERPRINTER_IMPLEMENTATION. I think I can see what you mean; I will try to give it a go when I have time.

> Test expectations may need to change as well when running a version of Scrapy that supports the new approach. For one, the new fingerprints are bytes, not str.

I see that the current implementation of to_bytes returns its argument unchanged if it is already a bytes object. So although the code does not check the correctness of the fingerprinting function, it does seem to check correctly that deltafetch works.
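
To make the point about to_bytes concrete, here is a simplified stand-in for that behaviour. It is a sketch under the assumption that the helper passes bytes through untouched and encodes str as UTF-8 by default; `to_bytes_sketch` is a hypothetical name, not Scrapy's own code, and the key values are made up.

```python
def to_bytes_sketch(text, encoding="utf-8"):
    """Simplified stand-in for a to_bytes helper (illustration only)."""
    if isinstance(text, bytes):
        # Already bytes (new-style fingerprint): returned unchanged.
        return text
    if not isinstance(text, str):
        raise TypeError(f"expected str or bytes, got {type(text).__name__}")
    # str (old-style hex fingerprint or a custom key): encoded to bytes.
    return text.encode(encoding)


# Both fingerprint styles end up as bytes keys.
old_style_key = to_bytes_sketch("0a1b2c")        # made-up hex string
new_style_key = to_bytes_sketch(b"\x0a\x1b\x2c")  # made-up raw bytes
```

Because both branches yield bytes, code that stores keys after this conversion does not need to know which fingerprinting API produced them.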

@Gallaecio
Contributor

Thanks!

I have moved the code to from_crawler to maximize backward compatibility (so that subclasses that do not pass crawler to super().__init__() not only do not break, but also use the new method unless _get_key is overridden).
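
A rough sketch of why doing this in from_crawler helps subclasses, assuming a simplified middleware. `DeltaFetchSketch`, `LegacySubclass`, the `directory` argument, and the `fingerprint` attribute are hypothetical and do not reproduce the merged code; `_get_key`, `from_crawler`, and the `deltafetch_key` meta key follow the names used by the project.

```python
from types import SimpleNamespace


class DeltaFetchSketch:
    """Hypothetical, heavily simplified middleware (illustration only)."""

    def __init__(self, directory):
        self.directory = directory
        self.fingerprint = None  # filled in by from_crawler when possible

    @classmethod
    def from_crawler(cls, crawler):
        mw = cls(directory="deltafetch")
        # Set after construction, so __init__ needs no crawler argument.
        if hasattr(crawler, "request_fingerprinter"):
            mw.fingerprint = crawler.request_fingerprinter.fingerprint
        return mw

    def _get_key(self, request):
        fingerprint = self.fingerprint
        if fingerprint is None:
            # Old Scrapy only: fall back to the since-removed function.
            from scrapy.utils.request import request_fingerprint as fingerprint
        key = request.meta.get("deltafetch_key") or fingerprint(request)
        return key if isinstance(key, bytes) else key.encode("utf-8")


class LegacySubclass(DeltaFetchSketch):
    """A subclass written before any crawler-aware constructor existed."""

    def __init__(self, directory):
        super().__init__(directory)  # no crawler passed, still works


# Stand-ins so the sketch runs without Scrapy installed.
fake_crawler = SimpleNamespace(
    request_fingerprinter=SimpleNamespace(fingerprint=lambda request: b"\xaa")
)
mw = LegacySubclass.from_crawler(fake_crawler)
default_key = mw._get_key(SimpleNamespace(meta={}))
custom_key = mw._get_key(SimpleNamespace(meta={"deltafetch_key": "custom"}))
```

A subclass that overrides `_get_key` itself would bypass this lookup, which matches the caveat in the comment above.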

I have also upgraded the minimum required Python version to get the CI passing. I will open a separate PR to modernize the code base a bit in preparation for a release.

@Gallaecio Gallaecio requested review from kmike and wRAR February 25, 2025 20:03
@fsmeraldi
Author

That's neat, thank you very much!

@Gallaecio Gallaecio mentioned this pull request Feb 25, 2025