I couldn't find 'csv_example_training.json' in the repo, so I used 'csv_input_with_true_ids.csv' instead. There was no settings file either, so I couldn't use one (that part is commented out in the code shared below).
I made sure to use consoleLabel() instead of console_label().
I followed the steps in csv_example.py. Active learning starts, but the program then terminates without any error message.
The code is below:
##################################################
import os
import csv
import re
import logging
import optparse
import dedupe
from unidecode import unidecode
def preProcess(column):
def readData(filename):
example
path = '/Users/asuri/Downloads/dedupe-examples-master/csv_example/'
filename = 'csv_example_messy_input.csv'
#######################################
if __name__ == '__main__':
    # As of 2.0 this method is called console_label(), but in 1.x it was called consoleLabel(); that difference may account for the error. Now updated to consoleLabel().
    dedupe.consoleLabel(deduper)
    deduper.train()
    with open(training_file, 'w') as tf:
        deduper.write_training(tf)
    print('clustering...')
    clustered_dupes = deduper.partition(data_d, 0.5)
    print('# duplicate sets', len(clustered_dupes))
    cluster_membership = {}
    for cluster_id, (records, scores) in enumerate(clustered_dupes):
        for record_id, score in zip(records, scores):
            cluster_membership[record_id] = {
                "Cluster ID": cluster_id,
                "confidence_score": score
            }
    with open(output_file, 'w') as f_output, open(input_file) as f_input:
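
To check the cluster-membership step separately from the rest of the script, here is a minimal sketch that runs the same loop on hand-made data. The `clustered_dupes` list below is a hypothetical stand-in that only assumes the shape the loop above expects (a list of `(records, scores)` pairs); the record IDs, the scores, and the helper name `build_cluster_membership` are invented for illustration and do not come from dedupe itself.

```python
# Hypothetical stand-in for the clustering result: each entry pairs a tuple
# of record IDs with a tuple of per-record confidence scores.
clustered_dupes = [
    (("rec_1", "rec_7"), (0.92, 0.88)),
    (("rec_3",), (1.0,)),
]


def build_cluster_membership(clusters):
    """Map each record ID to its cluster ID and confidence score."""
    membership = {}
    for cluster_id, (records, scores) in enumerate(clusters):
        for record_id, score in zip(records, scores):
            membership[record_id] = {
                "Cluster ID": cluster_id,
                "confidence_score": score,
            }
    return membership


cluster_membership = build_cluster_membership(clustered_dupes)
for record_id, info in cluster_membership.items():
    print(record_id, info)
```

If this runs as expected on its own, the silent exit is more likely to happen earlier in the script (labeling or training) than in this loop.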