bug/Very High Memory Utilisation: 7mb Excel sheet taking more than 10gb memory space #3872

Open
Akashtyagi opened this issue Jan 17, 2025 · 1 comment
Labels
bug Something isn't working

Akashtyagi commented Jan 17, 2025

Describe the bug
When parsing and chunking a 7 MB .xls file, the Unstructured server's memory usage keeps growing until it exceeds 10 GB, at which point the server crashes.

To Reproduce
Input file -

xlsx4.xls

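import time

from unstructured_client.models import operations, shared
from unstructured_client.models.errors import SDKError

# file_path, unstructured_client (presumably an UnstructuredClient instance),
# ChunkingStrategy and PartitionStrategy are defined elsewhere in my script.

# Read the whole input file into memory and wrap it for upload.
with open(file_path, "rb") as f:
    files = shared.Files(
        content=f.read(),
        file_name=file_path,
    )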


req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=files,
        chunking_strategy=ChunkingStrategy.BY_TITLE,
        strategy=PartitionStrategy.HI_RES,
        multipage_sections=False,
    )
)


try:
    start = time.time()
    elements_by_page = {}
    print("File name: ", file_path)
    partitioned_data = unstructured_client.general.partition(req)
    end = time.time() - start
    print("Time taken in seconds: ", end)
    # print("Partitioned data:", partitioned_data)
    tables = 0
    for element in partitioned_data.elements:
        # text_as_html may be missing from metadata, so use .get() to avoid a KeyError.
        if element["type"] == "Table" and element.get("metadata", {}).get("text_as_html") is not None:
            tables += 1
    print("Total Table counts: ", tables)

except SDKError as sdk_error:
    raise sdk_error
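
To put a number on the memory growth when reproducing this in a local Python process, the call can be wrapped in a peak-memory measurement. Below is a minimal sketch using the standard library's `tracemalloc`. `measure_peak` is a hypothetical helper (not part of Unstructured or its SDK), and the workload shown is only a stand-in. Note that `tracemalloc` sees only Python-level allocations in the current process, so it will not capture memory used by a remote server pod or by native libraries.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def measure_peak(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Run func and return (result, peak bytes allocated by Python during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes since start().
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raises.
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # Stand-in workload; swap in the real local partition call when reproducing.
    data, peak = measure_peak(lambda: [bytes(1024) for _ in range(1000)])
    print(f"peak: {peak / 1_000_000:.1f} MB")
```

For the server-side numbers in this report, the pod's own memory metrics are still the relevant source; the helper above is only useful if the same file is partitioned in-process.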

Screenshots

Image

Additional context
Unstructured Pod Config
Resources:
CPU: 4
Memory: 10000 (unit not given; presumably MB, matching the 10 GB limit mentioned above)

Akashtyagi added the bug label Jan 17, 2025
Akashtyagi (Author) commented

Similar Issue - #2129
