community[minor]: 03 - Refactoring PyPDF parser #29330

Merged
merged 15 commits into from
Jan 31, 2025

pprados
Contributor

@pprados pprados commented Jan 21, 2025

This is one part of a larger Pull Request (PR) that is too large to be submitted all at once.
This specific part focuses on updating the PyPDF parser.

For more details, see PR 28970.

…3-pypdf

# Conflicts:
#	docs/docs/integrations/document_loaders/pymupdf.ipynb
#	docs/docs/integrations/document_loaders/pypdfloader.ipynb
#	libs/community/langchain_community/document_loaders/parsers/images.py
#	libs/community/langchain_community/document_loaders/parsers/pdf.py
#	libs/community/langchain_community/document_loaders/pdf.py
#	libs/community/tests/integration_tests/document_loaders/parsers/test_images.py
#	libs/community/tests/integration_tests/document_loaders/parsers/test_pdf_parsers.py
#	libs/community/tests/integration_tests/document_loaders/test_pdf.py
# Conflicts:
#	libs/community/langchain_community/document_loaders/parsers/images.py
#	libs/community/langchain_community/document_loaders/parsers/pdf.py
#	libs/community/langchain_community/document_loaders/pdf.py
#	libs/community/tests/integration_tests/document_loaders/parsers/test_pdf_parsers.py
#	libs/community/tests/integration_tests/document_loaders/test_pdf.py
#	libs/community/tests/unit_tests/document_loaders/parsers/test_pdf_parsers.py

@pprados pprados marked this pull request as ready for review January 21, 2025 09:08
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. community Related to langchain-community 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder labels Jan 21, 2025
@pprados pprados mentioned this pull request Jan 21, 2025
2 tasks
@pprados
Contributor Author

pprados commented Jan 23, 2025

@eyurtsev can you review this code? I think you'll understand it the fastest, since it's a continuation of the previous one.

I'd also like to thank you for our exchanges, which have produced a sound foundation for LangChain.

@pprados
Contributor Author

pprados commented Jan 27, 2025

@eyurtsev can you review this code?

@pprados pprados marked this pull request as draft January 29, 2025 10:23
@pprados pprados marked this pull request as ready for review January 29, 2025 10:23
@dosubot dosubot bot added the Ɑ: doc loader Related to document loader module (not documentation) label Jan 29, 2025
Collaborator

@eyurtsev eyurtsev left a comment

@pprados PR looks good.

The main issue is just clearing up documentation for plain vs. layout. Once that's updated, we can merge!

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jan 29, 2025
@pprados
Contributor Author

pprados commented Jan 30, 2025

@eyurtsev
Done.

@eyurtsev eyurtsev merged commit ceda8bc into langchain-ai:master Jan 31, 2025
21 checks passed
Labels
community Related to langchain-community Ɑ: doc loader Related to document loader module (not documentation) 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder lgtm PR looks good. Use to confirm that a PR is ready for merging. size:XXL This PR changes 1000+ lines, ignoring generated files.
Projects
Status: Done
2 participants