Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: add isin for elementwise set inclusion test #854

Open
lucascolley opened this issue Nov 21, 2024 · 4 comments
Open

RFC: add isin for elementwise set inclusion test #854

lucascolley opened this issue Nov 21, 2024 · 4 comments
Labels
API extension Adds new functions or objects to the API. RFC Request for comments. Feature requests and proposed changes.

Comments

@lucascolley
Copy link
Member

lucascolley commented Nov 21, 2024

Prior art

Motivation

This function is used in scikit-learn. They've implemented it in terms of the standard, and that implementation could find a home in array-api-extra: data-apis/array-api-extra#34. @asmeurer suggested there that we should also consider adding this to the standard.

@asmeurer
Copy link
Member

Another potential reason for adding it is that it uses a nontrivial implementation which depends on some heuristics based on the input size.

@rgommers
Copy link
Member

Thanks @lucascolley. I've added ndonnx (which has it) and MLX (which doesn't) to the issue description.

This seems like a very reasonable proposal to me. Implementing isin in terms of other primitives in the standard is a little complex indeed.

The return type should always be a boolean array. The NumPy docs say it can be a bool for a single input element, but that's actually a bug in the docs. I checked NumPy, JAX, and PyTorch and all return a 0-D array.

@rgommers rgommers added the API extension Adds new functions or objects to the API. label Nov 22, 2024
@kgryte kgryte added the RFC Request for comments. Feature requests and proposed changes. label Nov 22, 2024
@kgryte kgryte added this to Proposals Nov 22, 2024
@kgryte kgryte moved this to Stage 0 in Proposals Nov 22, 2024
@rgommers
Copy link
Member

The thing to discuss here is what keywords are desired I think. NumPy and Dask use:

def isin(element, test_elements, assume_unique=False, invert=False, *, kind=None)

The private scikit-learn implementation here is:

def isin(element, test_elements, xp, assume_unique=False, invert=False)

JAX:

def isin(element, test_elements, assume_unique=False, invert=False, *, method='auto')

PyTorch:

def isin(elements, test_elements, *, assume_unique=False, invert=False)

ndonnx:

def isin(x: Array, items: Sequence[Scalar]) -> Array

The assume_unique and invert keywords seem easy to support and useful. So this should probably work:

def isin(x: Array, test_elements: Array, /, *, assume_unique : bool = False, invert : bool =False) -> Array[bool]

The type of test_elements is a bit TBD, could be a union between arrays and sequences perhaps?

@rgommers
Copy link
Member

We discussed this in the community meeting today. A summary with a couple of points to follow up on:

  • In general, folks were 👍🏼 on adding isin
  • For the second argument, accepting arrays and scalars but not sequences seemed preferred.
  • Some discussion about promotion behavior, this needs to be specified. Since the semantics of isin are element-wise comparison like, it is probably a good idea to match what == does.
  • For the second argument: limit to 1-D, or accept any shape and then reshape to 1-D? NumPy does the latter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API extension Adds new functions or objects to the API. RFC Request for comments. Feature requests and proposed changes.
Projects
Status: Stage 0
Development

No branches or pull requests

4 participants