Hello all,
I am trying to run the CSV example on a file of mine that has 850 records. I am also trying to find duplicates based on a custom function that uses Levenshtein distance, grouping all names under one entity_num when the names match by more than 80%.
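For context, here is a minimal sketch of the kind of custom comparator I mean (the field name "name", the name_similarity helper, and the dict-style variable definition are illustrative, and the exact variable syntax depends on the dedupe version):

import dedupe

def name_similarity(x, y):
    # Normalized Levenshtein similarity in [0, 1]; 1.0 means identical strings.
    if not x or not y:
        return 0.0
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, 1):
        curr = [i]
        for j, cy in enumerate(y, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cx != cy)))   # substitution
        prev = curr
    return 1.0 - prev[-1] / max(len(x), len(y))

# The 'Custom' variable type wires the comparator into dedupe.
fields = [{"field": "name", "type": "Custom", "comparator": name_similarity}]
deduper = dedupe.Dedupe(fields)

The comparator returns a normalized similarity, so a value above 0.8 corresponds to the 80% name match I am after.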
While preparing the data, I changed the sample size to 50:
deduper.prepare_training(data_d, sample_size=50)
After I finish labeling, I get the following error:
Traceback (most recent call last):
  File "C:\Python_Projects\Python_extra_code\csv_example.py", line 132, in <module>
    deduper.train()
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\api.py", line 1215, in train
    self.predicates = self.active_learner.learn_predicates(recall, index_predicates)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 397, in learn_predicates
    return self.blocker.learn_predicates(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\labeler.py", line 136, in learn_predicates
    return self.block_learner.learn(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 72, in learn
    candidate_cover = self.random_forest_candidates(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\site-packages\dedupe\training.py", line 112, in random_forest_candidates
    sample_predicates = random.sample(predicates, pred_sample_size)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Dev\Python3.11\Lib\random.py", line 453, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Process finished with exit code 1