Converting Embedded image from Documents #158

Open
wants to merge 5 commits into
base: main

Conversation

FeuRicardo

@FeuRicardo FeuRicardo commented Dec 19, 2024

Pull Request

Description

This PR introduces the following changes:

  1. Initialization of New Attributes:

    • Added _mlm_client and _mlm_model attributes to the PptxConverter class, initialized using the kwargs dictionary.
  2. Handling of Image Shapes:

    • Integrated a new method _convert_image_to_markdown to handle the conversion of image shapes to markdown within the presentation slides processing loop.
  3. Handling of Images in Data URIs:

    • Added a check that identifies image-type data URIs and, if an LLM model has been configured, converts the image to markdown.
  4. Addition of _convert_image_to_markdown Method:

    • Added a new method _convert_image_to_markdown to the PptxConverter class to convert image shapes to markdown format.
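As a rough illustration of the data URI check described in item 3, here is a minimal sketch. The function names `parse_image_data_uri` and `image_to_markdown`, the `describe` callback, and the output format are all hypothetical and are not taken from this PR; the sketch only handles base64-encoded payloads.

```python
import base64
import binascii


def parse_image_data_uri(uri):
    """Return (mime_type, raw_bytes) for a base64 image data URI, else None.

    Hypothetical helper for illustration; the PR's actual check may differ.
    """
    if not uri.startswith("data:image/"):
        return None
    header, sep, payload = uri.partition(",")
    if not sep:
        return None  # no comma, so there is no payload
    parts = header[len("data:"):].split(";")
    mime_type = parts[0]
    if "base64" not in parts[1:]:
        return None  # this sketch ignores percent-encoded payloads
    try:
        return mime_type, base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return None


def image_to_markdown(src, alt, describe=None):
    """Describe an embedded image if a describer is configured.

    `describe` stands in for the LLM call (an assumption); without it, or
    for non-data-URI sources, a plain markdown image link is returned.
    """
    parsed = parse_image_data_uri(src)
    if parsed is None or describe is None:
        return f"![{alt}]({src})"
    mime_type, data = parsed
    # The output format below is arbitrary, chosen only for this sketch.
    return f"**Image ({alt}):** {describe(data, mime_type)}"
```

Falling back to the untouched image link when no describer is configured keeps the conversion usable without an LLM, which matches the "if the LLM model has been defined" condition above.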

Related Issue

Link to the related issue (if any).

Motivation and Context

  • The new attributes _mlm_client and _mlm_model are required for additional functionality.
  • The _convert_image_to_markdown method improves the handling of image shapes by converting them to markdown format, enhancing the overall functionality of the PptxConverter class.
  • Identifying and converting image-type data URIs improves the handling of documents (such as .docx) that contain embedded images, enhancing the overall functionality of the _CustomMarkdownify class and its dependents.

How Has This Been Tested?

  • Unit tests
  • Integration tests
  • [ X ] Manual testing

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.

Screenshots (if appropriate):

Types of changes

  • Bug fix
  • [ X ] New feature
  • Breaking change
  • Documentation update

Checklist:

  • [ X ] My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • [ X ] I have added tests to cover my changes.
  • [ X ] All new and existing tests passed.
  • [ X ] The title of my pull request is a short description of the requested changes.

Additional Notes

This new feature affects the conversion of .pptx, .docx and .html files (including classes that extend these converters).

@gagb
Contributor

gagb commented Dec 19, 2024

please expand the pr description.

@FeuRicardo
Author

FeuRicardo commented Dec 20, 2024

please expand the pr description.

Pull Request

Description

This PR introduces the following changes:

  1. Initialization of New Attributes:

    • Added _mlm_client and _mlm_model attributes to the PptxConverter class, initialized using the kwargs dictionary.
  2. Handling of Image Shapes:

    • Integrated a new method _convert_image_to_markdown to handle the conversion of image shapes to markdown within the presentation slides processing loop.
  3. Addition of _convert_image_to_markdown Method:

    • Added a new method _convert_image_to_markdown to the PptxConverter class to convert image shapes to markdown format.
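The three changes listed above could look roughly like the following sketch. Only the attribute names `_mlm_client` and `_mlm_model` and the method name `_convert_image_to_markdown` come from the description; the class `SketchPptxConverter`, the kwargs keys `mlm_client` and `mlm_model`, the `describe()` client interface, the stand-in shape type, and the output format are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeImageShape:
    """Stand-in for a slide picture shape (hypothetical, not python-pptx)."""
    name: str
    blob: bytes


class EchoClient:
    """Hypothetical multimodal client exposing a single describe() call."""

    def describe(self, model, image_bytes):
        return f"[{model}] image of {len(image_bytes)} bytes"


class SketchPptxConverter:
    """Illustrative only; not the PptxConverter class from this PR."""

    def __init__(self, **kwargs):
        # The kwargs key names are assumed; both default to None.
        self._mlm_client = kwargs.get("mlm_client")
        self._mlm_model = kwargs.get("mlm_model")

    def _convert_image_to_markdown(self, shape):
        # "(image)" is a placeholder target, not a real path.
        link = f"![{shape.name}](image)"
        if self._mlm_client is None or self._mlm_model is None:
            return link  # no model configured: emit the link only
        description = self._mlm_client.describe(self._mlm_model, shape.blob)
        return f"{link}\n\n{description}"

    def convert_shapes(self, shapes):
        # Mirrors the idea of calling the method inside the slide loop.
        return "\n\n".join(self._convert_image_to_markdown(s) for s in shapes)
```

Reading both values with `kwargs.get` means a converter built without them still works and simply skips the description step.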

Related Issue

Link to the related issue (if any).

Motivation and Context

  • The new attributes _mlm_client and _mlm_model are required for additional functionality.
  • The _convert_image_to_markdown method improves the handling of image shapes by converting them to markdown format, enhancing the overall functionality of the PptxConverter class.

How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Manual testing

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce.

Screenshots (if appropriate):

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • The title of my pull request is a short description of the requested changes.

Additional Notes

Add any additional information or context.

@FeuRicardo FeuRicardo changed the title from "Embedded image" to "Converting Embedded image from Documents" Dec 20, 2024