How to properly log in training_step or on_training_batch_end in DDP #20098
Unanswered
huangfu170 asked this question in DDP / multi-GPU / multi-node
I am training an NLP task on 1 node with 4 GPUs under DDP, and I want to log the training loss to TensorBoard at the step level. I use the code below, but it doesn't work: the loss only shows up at the end of each epoch.

```python
def training_step(self, batch, batch_idx):
    (input_ids, attention_mask, label, label_input_ids, label_attention_mask,
     edge_index, cp_input_ids, cp_attention_mask) = self.unzip_batch(batch)
    # the forward pass returns extra values that are not needed here
    sim, outputs, _, _, _, _ = self(input_ids, attention_mask, label_input_ids,
                                    label_attention_mask, edge_index,
                                    cp_input_ids, cp_attention_mask)
    loss_sim = loss_function(sim, label)
    loss_output = loss_function(outputs, label)
    loss = loss_sim + loss_output
    # binarize the logits (dropping column 0) at a 0.8 threshold
    sim = (torch.sigmoid(sim[:, 1:]) >= 0.8)
    outputs = (torch.sigmoid(outputs[:, 1:]) >= 0.8)
    return loss
```
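For reference, a minimal sketch of what per-step logging under DDP usually looks like, assuming the standard Lightning `self.log` API; the class name `StepLoggingModel` and the metric name `train_loss` are illustrative, not taken from the snippet above. `on_step=True` requests a logged point at every step, `sync_dist=True` averages the value across the DDP ranks, and the trainer's `log_every_n_steps` (default 50) gates how often logged values actually reach TensorBoard, so short epochs can make per-step logging look like epoch-only logging:

```python
import torch
import torch.nn.functional as F
import lightning.pytorch as pl


class StepLoggingModel(pl.LightningModule):
    """Toy stand-in for the model above; only the logging call matters."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.layer(x), y)
        # on_step=True writes train_loss at every logged step;
        # on_epoch=True also writes the epoch-level mean;
        # sync_dist=True averages the value across the DDP ranks.
        self.log("train_loss", loss,
                 on_step=True, on_epoch=True,
                 prog_bar=True, sync_dist=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


# log_every_n_steps defaults to 50, so with few batches per epoch the
# per-step points may never be flushed; lower it to see step-level curves.
trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp",
                     log_every_n_steps=10, max_epochs=1)
```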