
can you please add a script to test the bert trained model ? #4

Open
ankitkr3 opened this issue Aug 21, 2021 · 2 comments

Comments

@ankitkr3

Thanks for this amazing work. Can you please add a script for testing the saved LSTM model with the BERT featurizer? @Gaurav-Pande

@ankitkr3 ankitkr3 changed the title can you please add a script to set the bert trained model ? can you please add a script to test the bert trained model ? Aug 21, 2021
@ankitkr3
Author

Just for more clarity: for example, I have saved a set_count_1.h5 model for the first set, and I now want to score more data with that model for the same set, but I am not able to produce a script.
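
A minimal sketch of that step, assuming `set_count_1.h5` is a full model saved with Keras's `model.save()`; if the file only holds weights, the LSTM architecture has to be rebuilt first and `load_weights()` called instead, as in the script below:

```python
# Minimal sketch, assuming set_count_1.h5 is a full model saved via model.save();
# if it only contains weights, rebuild the LSTM first and use load_weights() instead.
import numpy as np
from tensorflow.keras.models import load_model

lstm_model = load_model("set_count_1.h5")  # file name taken from the comment above

# Stand-in input: in practice these are the BERT [CLS] features for the new
# essays of the same set, reshaped to (samples, 1, feature_dim); 768 is
# DistilBERT's hidden size.
new_vectors = np.random.rand(3, 1, 768).astype("float32")

preds = lstm_model.predict(new_vectors)
print(np.around(preds).astype(int).ravel())  # one rounded score per essay
```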

@ankitkr3
Author

ankitkr3 commented Aug 21, 2021

I have worked on a script, let me know if it makes sense:

```python
import time
import torch
import numpy as np
import transformers as ppb
import warnings
warnings.filterwarnings('ignore')

cuda = torch.device('cuda')

# For DistilBERT:
model_class, tokenizer_class, pretrained_weights = (
    ppb.DistilBertModel, ppb.DistilBertTokenizer, 'distilbert-base-uncased')
tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
model = model_class.from_pretrained(pretrained_weights)

# Note: torch.cuda.device only sets the default CUDA device; the model and
# tensors below stay on CPU unless explicitly moved with .to(cuda).
with torch.cuda.device(cuda):
    # demo_df is assumed to be a pandas DataFrame of essays to score
    test_essays = demo_df['essay']

    # Tokenize each essay, truncating to 200 tokens
    tokenized_test = test_essays.apply(
        lambda x: tokenizer.encode(x, add_special_tokens=True, max_length=200, truncation=True))

    # Pad every sequence to the length of the longest one
    max_len = 0
    for i in tokenized_test.values:
        if len(i) > max_len:
            max_len = len(i)
    padded_test = np.array([i + [0] * (max_len - len(i)) for i in tokenized_test.values])
    attention_mask_test = np.where(padded_test != 0, 1, 0)
    test_input_ids = torch.tensor(padded_test)
    test_attention_mask = torch.tensor(attention_mask_test)

    # Run the essays through DistilBERT without tracking gradients
    with torch.no_grad():
        last_hidden_states_test = model(test_input_ids, attention_mask=test_attention_mask)

    # Use the [CLS] token embedding as the feature vector for each essay
    test_features = last_hidden_states_test[0][:, 0, :].numpy()

test_x, test_y = test_features.shape

# Reshape to (samples, timesteps=1, features) for the LSTM
testDataVectors = np.reshape(test_features, (test_x, 1, test_y))

# lstm_model is assumed to be the same Keras LSTM architecture built during
# training; load_weights restores only the weights, so the model must already
# be defined before this call.
lstm_model.load_weights("./model_weights/final_lstm1.h5")
preds = lstm_model.predict(testDataVectors)
print(int(np.around(preds)))  # works for a single essay; round per row for a batch
```
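
One caveat: `int(np.around(preds))` only works when `demo_df` contains a single essay. For a batch, `preds` is a 2-D array, so round per row instead, for example:

```python
# Variant for scoring several essays at once (demo_df and preds as above)
for essay_id, score in zip(demo_df.index, np.around(preds).astype(int).ravel()):
    print(essay_id, score)
```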
