Hello all,
I am trying to run the CSV example on a file of mine that has 850 records. I am also trying to find duplicates based on a custom function that uses Levenshtein distance, grouping all names under one entity_num when the names match by more than 80%.
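For context, here is a minimal sketch of the kind of custom comparator I mean (the field name "name", the name_similarity helper, and the dict-style variable definition are illustrative, and the exact variable syntax depends on the dedupe version):

import dedupe

def name_similarity(x, y):
    # Normalized Levenshtein similarity in [0, 1]; 1.0 means identical strings.
    if not x or not y:
        return 0.0
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        curr = [i]
        for j, cy in enumerate(y, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cx != cy)))   # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(x), len(y))

# The 'Custom' variable type wires the comparator into dedupe.
fields = [{"field": "name", "type": "Custom", "comparator": name_similarity}]
deduper = dedupe.Dedupe(fields)

The comparator returns a normalized similarity, so a value above 0.8 corresponds to the 80% name match I am after.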
While preparing the data, I changed the sample size to 50:
deduper.prepare_training(data_d, sample_size=50)
After I finish labeling, I get the following error:
Traceback (most recent call last):
  File "C:\Python_Projects\Python_extra_code\csv_example.py", line 132, in <module>
    deduper.train()
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\api.py", line 1215, in train
    self.predicates = self.active_learner.learn_predicates(recall, index_predicates)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 397, in learn_predicates
    return self.blocker.learn_predicates(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 136, in learn_predicates
    return self.block_learner.learn(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 72, in learn
    candidate_cover = self.random_forest_candidates(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 112, in random_forest_candidates
    sample_predicates = random.sample(predicates, pred_sample_size)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\random.py", line 453, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Process finished with exit code 1