ValueError: high is out of bounds for int32 #66

XNarno · 2024-07-17T13:27:20Z

Bug description

The tutorial fails in the `initialize_active_learner` step of the "Setting up the Active Learner" section.

Steps to reproduce

Run the third notebook: https://github.com/webis-de/small-text/blob/main/examples/notebooks/03-active-learning-with-setfit.ipynb

Expected behavior

The notebook runs without raising this error.

Environment:

Python version: 3.12.4
small-text version: 1.4.0
small-text integrations (e.g., transformers): 4.42.3
PyTorch version (if applicable): /
OS: Windows 10 Enterprise

Installation (pip, conda, or from source): pip in a conda env
CUDA version (if applicable): /

Additional information

The error message:

```
ValueError Traceback (most recent call last)
Cell In[26], line 27
22 active_learner.initialize_data(x_indices_initial, y_initial)
24 return x_indices_initial
---> 27 initial_indices = initialize_active_learner(active_learner, train.y)
28 labeled_indices = initial_indices

Cell In[26], line 22
19 #x_indices_initial = x_indices_initial.astype(int)
20 y_initial = y_train_int[x_indices_initial]
---> 22 active_learner.initialize_data(x_indices_initial, y_initial)
24 return x_indices_initial

File c:\Users\XXX\AppData\Local\anaconda3\envs\APIClassifier\Lib\site-packages\small_text\active_learner.py:154, in PoolBasedActiveLearner.initialize_data(self, indices_initial, y_initial, indices_ignored, indices_validation, retrain)
151 self.indices_ignored = np.empty(shape=(0), dtype=int)
153 if retrain:
--> 154 self._retrain(indices_validation=indices_validation)

File c:\Users\XXX\AppData\Local\anaconda3\envs\APIClassifier\Lib\site-packages\small_text\active_learner.py:393, in PoolBasedActiveLearner._retrain(self, indices_validation)
390 dataset.y = self.y
392 if indices_validation is None:
--> 393 self._clf.fit(dataset, **self.fit_kwargs)
394 else:
395 indices = np.arange(self.indices_labeled.shape[0])
...
File numpy\random\mtrand.pyx:780, in numpy.random.mtrand.RandomState.randint()

File numpy\random\_bounded_integers.pyx:1423, in numpy.random._bounded_integers._rand_int32()

ValueError: high is out of bounds for int32
```
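The last two frames show `RandomState.randint` dispatching to `_rand_int32`, i.e. the call used a 32-bit integer dtype. With numpy < 2 on Windows, the default integer (C `long`) is 32 bits wide, so a `randint` call with the default dtype and an upper bound above 2**31 - 1 raises exactly this error; numpy 2.0 changed the default integer on 64-bit Windows to 64 bits. The following is a minimal, hypothetical check (not part of small-text or the notebook; the helper names are illustrative only) that reports the default width on the current machine and tries a bound of 2**32:

```python
import numpy as np


def default_randint_bits() -> int:
    """Bit width of the integers RandomState.randint returns with its default dtype."""
    rs = np.random.RandomState(0)
    # size=1 returns an ndarray, so its dtype shows the default integer type randint uses
    return rs.randint(10, size=1).dtype.itemsize * 8


def try_large_high(high: int = 2**32) -> str:
    """Call randint with the default dtype and the given exclusive upper bound."""
    rs = np.random.RandomState(0)
    try:
        rs.randint(high)
        return "ok"
    except ValueError as err:
        # With a 32-bit default and high > 2**31, numpy reports "high is out of bounds for int32"
        return f"ValueError: {err}"


if __name__ == "__main__":
    print(f"numpy {np.__version__}, default randint width: {default_randint_bits()} bits")
    print(try_large_high())
```

On a Linux or macOS machine this is expected to print a 64-bit width and `ok`; on a 32-bit default it should reproduce the `ValueError`.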

chschroeder · 2024-07-17T14:42:32Z

Hi @XNarno,

Thank you for reporting this! This is an error I haven't seen before. I suspect this is an issue that numpy has with Windows 10.

I will further investigate this. If you want to try the notebook, you could execute it in Google Colab for now.

chschroeder · 2024-07-30T20:15:18Z

This is (partly) a peculiarity of Windows 10 and numpy.

- For numpy>=2, the solution seems easy.
- For numpy<2, I am not yet sure what the solution should look like.

Either way, it will be fixed in small-text 1.4.1.
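The actual change is in commit 1193e54 and is not reproduced here, and the frame that calls `randint` is elided in the traceback. A common platform-independent pattern for this class of error is to pass an explicit `dtype=np.int64` to `randint`, so the bound check no longer depends on the platform's default integer. The sketch below assumes the large bound comes from drawing a random seed for `RandomState` (which accepts seeds in [0, 2**32 - 1]); `draw_seed` and `MAX_SEED` are hypothetical names, not small-text API:

```python
import numpy as np

# Largest seed accepted by numpy.random.RandomState
MAX_SEED = 2**32 - 1


def draw_seed(rs: np.random.RandomState) -> int:
    """Hypothetical helper: draw a seed in [0, MAX_SEED] on any platform.

    The explicit dtype=np.int64 avoids the int32 code path that the default
    dtype selects on Windows with numpy < 2, where high=2**32 would overflow.
    """
    return int(rs.randint(0, MAX_SEED + 1, dtype=np.int64))


if __name__ == "__main__":
    rs = np.random.RandomState(42)
    seed = draw_seed(rs)
    child = np.random.RandomState(seed)  # a valid seed regardless of platform
    print(seed, child.randint(10))
```

Converting the result with `int(...)` hands downstream code a plain Python integer, so it no longer carries a numpy dtype that could differ between platforms.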


chschroeder · 2024-08-02T22:42:41Z

I have a fix. Unfortunately, I don't have a system where I could test this. Whenever you have a moment, could you please let me know if the fix is working?

You can install from the v1.4.x branch directly with:

pip install git+https://github.com/webis-de/small-text.git@v1.4.x

chschroeder · 2025-01-20T23:03:10Z

See #72 for the continuation. Moreover, Windows 10 users are welcome to post full stack traces and more information about this error.
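When posting such a report, a small script can collect the relevant versions in one go. This is a hypothetical helper, not something shipped with small-text, and the package list is an assumption that can be adjusted. The `numpy default int bits` entry prints 32 on Windows with numpy < 2, the configuration in which `randint` takes the `_rand_int32` path from this issue's traceback by default:

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

import numpy as np


def environment_report(packages=("small-text", "numpy", "setfit", "transformers", "torch")) -> dict:
    """Collect interpreter, OS, and package versions for a bug report."""
    info = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
        # np.int_ is 32-bit on Windows with numpy < 2 and 64-bit on 64-bit numpy 2
        "numpy default int bits": np.dtype(np.int_).itemsize * 8,
    }
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```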

XNarno added the bug Something isn't working label Jul 17, 2024

chschroeder added this to the small-text-1.4.1 milestone Jul 22, 2024

chschroeder added a commit that referenced this issue Aug 2, 2024

Fix possible out of bounds error for SetFitClassification (#66)

1193e54

Signed-off-by: Christopher Schröder

chschroeder closed this as completed Aug 18, 2024

chschroeder mentioned this issue Jan 20, 2025

SetFit fails with Numpy >2.0.0 #72

Closed

chschroeder reopened this Jan 21, 2025
