No support for some Pandas Extension Dtypes #399

Open
Duncan-Hunter opened this issue Jun 19, 2024 · 4 comments
Labels: bug, stale

@Duncan-Hunter (Contributor)

Describe the bug
Pandas has extension dtypes. When you fit a Univariate calculator, or presumably anything else that checks dtypes using `_split_features_by_type`, columns with these dtypes are dropped because `Int64` is not in this list:

```python
[
    'int_',
    'int8',
    'int16',
    'int32',
    'int64',
    'uint8',
    'uint16',
    'uint32',
    'uint64',
    'float_',
    'float16',
    'float32',
    'float64',
]
```

To Reproduce
Using an environment with nannyml=0.10.7

```python
import numpy as np
import pandas as pd

# Numeric dtype names that nannyml 0.10.7 checks against (same list as above)
num_dtypes = [
    'int_',
    'int8',
    'int16',
    'int32',
    'int64',
    'uint8',
    'uint16',
    'uint32',
    'uint64',
    'float_',
    'float16',
    'float32',
    'float64',
]

# A pandas nullable integer (extension) dtype
test = pd.Series([1, 2, 3, 4, 5], dtype='Int64')

print("In num_dtypes: ", test.dtype in num_dtypes)
print("in ['Int64']: ", test.dtype in ['Int64'])
print("dtype: ", test.dtype)

# Cast to the underlying NumPy scalar type (np.int64 for Int64)
test = test.astype(test.dtype.type)

print("new dtype: ", test.dtype)
print("In num_dtypes: ", test.dtype in num_dtypes)
```

Output:

```
In num_dtypes:  False
in ['Int64']:  True
dtype:  Int64
new dtype:  int64
In num_dtypes:  True
```
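The `astype(test.dtype.type)` cast in the reproduction works because the series has no missing values; casting an `Int64` column that contains `pd.NA` to `int64` raises an error instead. Below is a minimal sketch of a workaround that tolerates missing values, assuming `float64` columns are acceptable downstream. The helper `to_numpy_backed` is hypothetical (not part of nannyml or pandas); it casts every nullable integer/float extension column to `float64`, so `pd.NA` becomes `NaN`.

```python
import numpy as np
import pandas as pd


def to_numpy_backed(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast pandas nullable numeric columns to float64.

    Casting Int64 -> int64 fails when the column holds pd.NA, so this casts
    to float64 instead, which turns pd.NA into NaN. Other columns are left as-is.
    """
    out = df.copy()
    for column in out.columns:
        dtype = out[column].dtype
        # Only pandas extension dtypes whose scalar type is a NumPy int/float
        # (e.g. Int64, UInt8, Float64); Categorical/String/Boolean are skipped.
        if isinstance(dtype, pd.api.extensions.ExtensionDtype) and issubclass(
            dtype.type, (np.integer, np.floating)
        ):
            out[column] = out[column].astype("float64")
    return out
```

The trade-off is that integer columns become floats; if exact integers matter, you would need to fill or drop the missing values before casting to `int64`.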

Expected behavior
There should be support for these dtypes, and columns shouldn't be dropped without the user knowing.

Additional context
I'm going to work around the issue by converting my columns to the underlying NumPy types using `pd.Series.dtype.type`. For a fix, though, I think you should use `np.issubdtype(dtype.type, np.number)`.
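The `np.issubdtype` suggestion can be sketched as a dtype check plus a split that warns instead of silently dropping columns. This is a minimal sketch, not nannyml's code: `is_numeric_feature` and `split_features_by_type` are hypothetical names, and the categorical rule (category, string, object, bool) is an assumption. It calls `issubclass` on `dtype.type` directly, which gives the same answer as `np.issubdtype(dtype.type, ...)` for NumPy scalar types while never trying to convert non-NumPy types such as `str` (the `.type` of `StringDtype`). It checks against `(np.integer, np.floating)` rather than `np.number` to mirror the original list, which has no complex types, and it excludes `np.timedelta64`, which NumPy classifies as a signed integer.

```python
import warnings

import numpy as np
import pandas as pd


def is_numeric_feature(dtype) -> bool:
    """True for NumPy and pandas nullable integer/float dtypes (e.g. int64, Int64, Float64)."""
    scalar_type = dtype.type
    return (
        issubclass(scalar_type, (np.integer, np.floating))
        # np.timedelta64 subclasses np.signedinteger; do not treat it as numeric.
        and not issubclass(scalar_type, np.timedelta64)
    )


def split_features_by_type(data: pd.DataFrame, feature_columns):
    """Hypothetical split into (numeric, categorical); warns about anything else."""
    numeric, categorical, unsupported = [], [], []
    for column in feature_columns:
        dtype = data[column].dtype
        if is_numeric_feature(dtype):
            numeric.append(column)
        elif (
            isinstance(dtype, (pd.CategoricalDtype, pd.StringDtype))
            or pd.api.types.is_object_dtype(dtype)
            or pd.api.types.is_bool_dtype(dtype)
        ):
            categorical.append(column)
        else:
            unsupported.append(column)
    if unsupported:
        # Surface dropped columns instead of ignoring them silently.
        warnings.warn(f"Ignoring columns with unsupported dtypes: {unsupported}")
    return numeric, categorical
```

With this check, an `Int64` column lands in the numeric group alongside plain `float64` columns, and any column that fits neither group produces a warning.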

Duncan-Hunter added the bug and triage labels on Jun 19, 2024
@nnansters (Contributor)

Hey @Duncan-Hunter,

Good catch, and good suggestion. I'll look into the `np.issubdtype` function for a cleaner solution.

Worst case, we can always add the extension dtypes to the list above.
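The fallback of extending the list could look like this sketch. `NUMERIC_DTYPE_NAMES` and `is_listed_numeric` are hypothetical names, not nannyml's. The sketch matches on `str(dtype)` so the pandas nullable names (`Int64`, `Float64`, ...) can sit beside the NumPy names, and it leaves out the `int_`/`float_` aliases because `str()` of those dtypes already yields one of the listed names.

```python
import numpy as np
import pandas as pd

# Hypothetical extended list: NumPy dtype names plus pandas nullable dtype names.
NUMERIC_DTYPE_NAMES = {
    # NumPy dtypes
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
    # pandas nullable (extension) dtypes
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64",
}


def is_listed_numeric(dtype) -> bool:
    # str(np.dtype("int64")) == "int64"; str(pd.Int64Dtype()) == "Int64"
    return str(dtype) in NUMERIC_DTYPE_NAMES
```

Every new nullable dtype has to be added by hand here, which is the main cost compared with a scalar-type check.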

nnansters removed the triage label on Jun 19, 2024

stale bot commented on Aug 18, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Aug 18, 2024
nnansters removed the stale label on Aug 23, 2024

stale bot commented on Oct 24, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Oct 24, 2024
stale bot closed this as completed on Nov 3, 2024
nnansters removed the stale label on Nov 4, 2024
nnansters reopened this on Nov 4, 2024

stale bot commented on Jan 3, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Jan 3, 2025