SAME_HOSTNAME not working on non www URLs #955

Closed
ROYOSTI opened this issue Feb 3, 2025 · 1 comment · Fixed by #956
Assignees
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team.

Comments

ROYOSTI commented Feb 3, 2025

When using EnqueueStrategy.SAME_HOSTNAME, I noticed it does not work properly on non-www URLs.

In the debugger I noticed that origin is passed to _check_enqueue_strategy, but it is taken from context.request.loaded_url when that is available.
So every URL that is checked mismatches because of the difference in hostname.
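
To make the mismatch concrete, here is a minimal sketch using only the standard library. It is not crawlee's actual code: `same_hostname` is a hypothetical stand-in for the strategy check, and the URLs (including the assumed redirect from the bare domain to the www host) are made up for illustration.

```python
from urllib.parse import urlparse


def same_hostname(origin_url: str, target_url: str) -> bool:
    """Hypothetical stand-in for a same-hostname check (not crawlee's code)."""
    return urlparse(origin_url).hostname == urlparse(target_url).hostname


# Assumed scenario: the bare domain is requested and the server redirects to www.
requested_url = "https://example.com/"
loaded_url = "https://www.example.com/"
link = "https://example.com/about"  # a link that uses the bare domain

# Comparing against the post-redirect URL rejects the link.
result_with_loaded = same_hostname(loaded_url, link)

# Comparing against the originally requested URL accepts it.
result_with_requested = same_hostname(requested_url, link)

print(result_with_loaded, result_with_requested)
```

An exact hostname comparison treats `example.com` and `www.example.com` as different hosts, which is why the choice of origin decides whether any link passes.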

Image

I tested this with multiple URLs, with and without the www prefix, and got the same behaviour.

Image

Changing the line to origin = context.request.url fixes this issue, but I have no idea what implications this would have on the rest of the code.
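
For anyone weighing that change, the two ways of picking the origin can be compared in isolation. This is only a sketch under stated assumptions: `FakeRequest`, `origin_reported` and `origin_proposed` are hypothetical names that mirror the behaviour described in this report, not the library's real classes or functions, and the URLs are invented.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class FakeRequest:
    """Hypothetical stand-in for the crawler's request object."""

    url: str                          # URL that was originally enqueued
    loaded_url: Optional[str] = None  # final URL after redirects, if any


def origin_reported(request: FakeRequest) -> str:
    # Behaviour described in the report: prefer loaded_url when it is set.
    return request.loaded_url or request.url


def origin_proposed(request: FakeRequest) -> str:
    # Change suggested in the report: always use the requested URL.
    return request.url


# Assumed redirect from the bare domain to the www host.
request = FakeRequest(
    url="https://example.com/",
    loaded_url="https://www.example.com/",
)
link_host = urlparse("https://example.com/pricing").hostname

reported_match = urlparse(origin_reported(request)).hostname == link_host
proposed_match = urlparse(origin_proposed(request)).hostname == link_host

print(reported_match, proposed_match)
```

One implication is visible from the sketch itself: with the proposed origin, links that point at the post-redirect host (`www.example.com` here) would be the ones rejected, so the change moves the mismatch rather than removing it in every case.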

I use PlaywrightCrawler in my code with context.enqueue_links.

@github-actions github-actions bot added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 3, 2025
@Mantisus Mantisus self-assigned this Feb 3, 2025
@Mantisus Mantisus added the bug Something isn't working. label Feb 3, 2025
Collaborator

vdusek commented Feb 3, 2025

Hi @ROYOSTI, thanks for reporting this. It does seem like a bug and we'll try to look into it soon.

@vdusek vdusek added this to the 107th sprint - Tooling team milestone Feb 3, 2025
vdusek pushed a commit that referenced this issue Feb 4, 2025
…ponse with redirect (#956)

### Description

- Fix `enqueue_links` for a response with a redirect.

### Issues

- Closes: #955
3 participants