Support output in batch level methods of callback #330
IMO, not really. Please note that I raised this ticket in general. Some may have a functional model (written for classification, object detection, etc.), and rewriting a subclass just for these looks inconvenient and not intuitive. Having ...
Hello @innat, it would require a larger-scope change on the Keras training/testing APIs before such support is available; in the meantime, overriding the training or testing step is still the best solution. Whether or not we'll make it a default support still needs to be assessed, based on how general such usage is.
Agree that it would require a larger change in the API. Please do the assessment as needed.
Circling back here as I heard back from the team - currently we don't have a plan to have the predicted output in the batch-level callbacks, since that may further complicate the training flow: the output may be on the remote workers and not available on the chief (and syncing may have a performance penalty). For now, a test_step override is still the best solution for the original request.
@rchao I understand that this may require a major change in the API. But rather than a complication, I think it would be a very useful feature to have, for example, when we need to unpack the dataloader (generator or tf.data API) to compute some tf or non-tf computation over the model's output. In the case of overriding test_step, I tried it once lightly and faced some issues with the eager-mode and graph-mode training approaches (will revisit with details).
@innat thanks for the info. If you're interested, can you show me a possible way of achieving this while not adding much complexity to the library and not breaking any existing tests?
@rchao Anyway, as suggested (by @haifeng-jin), the best option here could be overriding test_step:

```python
class CustomModel(keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # plain Python lists to accumulate per-batch targets / predictions
        self.val_gt = []
        self.val_pred = []

    def test_step(self, data):
        x, y = data
        y_pred = self(x, training=False)
        self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        self.compiled_metrics.update_state(y, y_pred)
        self.val_gt.append(y)
        self.val_pred.append(y_pred)
        return {m.name: m.result() for m in self.metrics}


class MyCallback(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        print('called on begin')
        # reset
        self.model.val_gt = []
        self.model.val_pred = []

    def on_epoch_end(self, epoch, logs=None):
        print('called on end')
        print(self.model.val_gt)
        print(self.model.val_pred)
        print()


inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Just use `fit` as usual
x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
model.fit(
    x, y, epochs=3, validation_split=0.1, verbose=2,
    callbacks=MyCallback()
)
```
Running this code in graph mode (not eager mode), I've faced an issue getting the values (...
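As a side note (my illustration, not code from the thread): a minimal sketch of why appending to plain Python lists only works eagerly. Inside a tf.function-compiled test_step, Python side effects run once at tracing time, so the list ends up holding symbolic tensors instead of per-batch values.

```python
import tensorflow as tf

collected = []

@tf.function
def fake_test_step(x):
    # The append is a Python side effect: it only runs while the function
    # is being traced, not on every call in graph mode.
    collected.append(x * 2.0)
    return x

for i in range(3):
    fake_test_step(tf.constant(float(i)))

print(len(collected))  # 1, not 3 -- appended only during tracing
print(collected[0])    # a symbolic graph tensor; .numpy() is not available on it
```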
With eager mode it works, but not in graph mode. cc. @rchao @haifeng-jin

```python
class MyCallback(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        print('called on begin ', epoch)
        self.model.val_gt = []
        self.model.val_pred = []

    def on_epoch_end(self, epoch, logs=None):
        print('called on end ', epoch)
        print(len(self.model.val_gt), len(self.model.val_pred))
        gt = [i.numpy()[-1] for a in self.model.val_gt for i in a]
        pred = [i.numpy()[-1] for a in self.model.val_pred for i in a]
        print(gt)
        print(pred)
        print()


inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(
    optimizer="adam", loss="mse", metrics=["mae"], run_eagerly=True
)

x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
x_test = np.random.random((10, 32))
y_test = np.random.random((10, 1))
model.fit(
    x, y,
    epochs=2,
    validation_data=(x_test, y_test),
    verbose=2,
    batch_size=4,
    callbacks=MyCallback()
)
```
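For completeness (a note of mine, not from the thread): eager execution of the train/test steps can also be switched on globally while debugging such callbacks, which is broadly similar in effect to passing run_eagerly=True to compile:

```python
import tensorflow as tf

# Global debug switch: tf.function-decorated code (including the Keras
# train/test functions) will execute eagerly.
tf.config.run_functions_eagerly(True)
```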
I don't understand what exactly the goal is here. Are you just looking to access the tf dataset in the callback, like in https://stackoverflow.com/questions/64128947/how-to-access-tf-data-dataset-within-a-keras-custom-callback ?
Not just the input (x, y) but also the prediction (y_pred) from the callback with a given dataset. (The design is mostly inspired by pytorch-lightning.) Something like:

```python
on_(train|test|predict)_batch_end(self, output, batch_id, logs=None)

outputs["image_path"],
outputs["image"],
outputs["gt_label"],
outputs["prediction"],
```

cc. @haifeng-jin
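Purely for illustration (this hook signature does not exist in Keras today, and the key names are hypothetical), a callback written against the proposed API might look like:

```python
import tensorflow as tf

class SampleWiseLogger(tf.keras.callbacks.Callback):
    # Hypothetical signature: assumes Keras passed a per-batch `output` dict.
    def on_test_batch_end(self, output, batch_id, logs=None):
        y_true = output["gt_label"]
        y_pred = output["prediction"]
        # e.g. accumulate per-sample results for post-epoch analysis
        self.rows = getattr(self, "rows", [])
        self.rows.append((batch_id, y_true, y_pred))
```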
For the output, are you re-proposing the old keras-team/keras#3469 again?
For that issue, I think using something like the following would do:

```python
on_(train|test|predict)_batch_end(self, output, batch_id, logs=None)

outputs["image_path"],
outputs["image_tensor"],
outputs["target_label"],
outputs["predicted_label"],
...
```

So, if we run ...
Do you have a minimal self-contained Colab or gist to reproduce the error?
Already mentioned in a previous message, here:

```python
model.compile(
    optimizer="adam", loss="mse", metrics=["mae"],
    run_eagerly=True  # False causes issue
)
```
Not really; in that callback the forward pass is repeated:

```python
model_train_step = model.train_step

def outer_train_step(data):
    # https://github.com/keras-team/keras/blob/v2.7.0/keras/engine/training.py
    x, y_true, w = keras.utils.unpack_x_y_sample_weight(data)
    self.x.assign(x)
    if w is not None:
        self.w.assign(w)
    self.y_true.assign(y_true)
    result = model_train_step(data)
    y_pred = model(x)  # second forward pass on the same batch
    self.y_pred.assign(y_pred)
    return result
```
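One way to avoid the duplicated forward pass (my sketch, not from the thread) is to stash the prediction from inside an overridden train_step itself, so the y_pred already computed for the loss is reused. As written, a callback only sees concrete values with run_eagerly=True; in graph mode the hand-off would need a tf.Variable, as in the later comments.

```python
import tensorflow as tf
from tensorflow import keras

class CaptureModel(keras.Model):
    def train_step(self, data):
        x, y, w = keras.utils.unpack_x_y_sample_weight(data)
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(
                y, y_pred, sample_weight=w, regularization_losses=self.losses
            )
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred, sample_weight=w)
        # Reuse the y_pred already computed for the loss; no second model(x) call.
        self.last_y_pred = y_pred
        return {m.name: m.result() for m in self.metrics}
```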
Instead of Python lists, using tf.Variable with a dynamic shape (run in graph mode):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.experimental import numpy as tnp


class CustomModel(keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # dynamically shaped (shape=[None, ...]) variables so batches can be concatenated
        self.val_x = tf.Variable((
            tnp.empty((0, 32), dtype=tf.float32)), shape=[None, 32]
        )
        self.val_gt = tf.Variable(
            tnp.empty((0, 1), dtype=tf.float32), shape=[None, 1]
        )
        self.val_pred = tf.Variable(
            tnp.empty((0, 1), dtype=tf.float32), shape=[None, 1]
        )

    def test_step(self, data):
        x, y = data
        y_pred = self(x, training=False)
        self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        self.compiled_metrics.update_state(y, y_pred)
        self.val_x.assign(
            tf.concat([self.val_x, x], axis=0)
        )
        self.val_gt.assign(
            tf.concat([self.val_gt, y], axis=0)
        )
        self.val_pred.assign(
            tf.concat([self.val_pred, y_pred], axis=0)
        )
        return {m.name: m.result() for m in self.metrics}


class MyCallback(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        print('called on begin ', epoch)
        self.model.val_x.assign(
            tf.Variable((
                tnp.empty((0, 32), dtype=tf.float32)), shape=[None, 32]
            )
        )
        self.model.val_gt.assign(
            tf.Variable(
                tnp.empty((0, 1), dtype=tf.float32), shape=[None, 1]
            )
        )
        self.model.val_pred.assign(
            tf.Variable(
                tnp.empty((0, 1), dtype=tf.float32), shape=[None, 1]
            )
        )

    def on_epoch_end(self, epoch, logs=None):
        print('called on end ', epoch)
        print(self.model.val_gt.numpy())
        print(self.model.val_pred.numpy())
        print(self.model.val_x.numpy().shape)


inputs = keras.Input(shape=(32,))
outputs = keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(
    optimizer="adam", loss="mse", metrics=["mae"], run_eagerly=0  # graph mode
)

x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
x_test = np.random.random((10, 32))
y_test = np.random.random((10, 1))
model.fit(
    x,
    y,
    epochs=5,
    validation_data=(x_test, y_test),
    verbose=2,
    batch_size=4,
    callbacks=[
        MyCallback(),
    ]
)
```

Output:

Epoch 1/5
called on begin 0
250/250 - 1s - loss: 0.3816 - mae: 0.4723 - val_loss: 0.0360 - val_mae: 0.1434
called on end 0
[0.82510984 0.8162336 0.94785255 0.06877889 0.05006607 0.4200096
0.8379941 0.23999517 0.32227454 0.12522219]
[ 0.81243783 0.96750426 0.9465156 0.5163584 -0.14448965 0.4210961
1.0099922 0.39414424 0.4558667 0.2907895 ]
(10, 32)
Epoch 2/5
called on begin 1
250/250 - 0s - loss: 0.1617 - mae: 0.3254 - val_loss: 0.0380 - val_mae: 0.1485
called on end 1
[0.82510984 0.8162336 0.94785255 0.06877889 0.05006607 0.4200096
0.8379941 0.23999517 0.32227454 0.12522219]
[ 0.81986165 0.94598633 0.9096545 0.5447165 -0.08035474 0.44689882
0.9836677 0.41099367 0.47807842 0.33092263]
(10, 32)
Epoch 3/5
called on begin 2
250/250 - 0s - loss: 0.1455 - mae: 0.3091 - val_loss: 0.0398 - val_mae: 0.1494
called on end 2
[0.82510984 0.8162336 0.94785255 0.06877889 0.05006607 0.4200096
0.8379941 0.23999517 0.32227454 0.12522219]
[ 0.8137208 0.91400844 0.8605845 0.564705 -0.01779287 0.46524122
0.94323516 0.41769716 0.48927492 0.36420074]
(10, 32)
Epoch 4/5
called on begin 3
250/250 - 0s - loss: 0.1296 - mae: 0.2932 - val_loss: 0.0290 - val_mae: 0.1234
called on end 3
[0.82510984 0.8162336 0.94785255 0.06877889 0.05006607 0.4200096
0.8379941 0.23999517 0.32227454 0.12522219]
[ 0.71483564 0.8042053 0.73826283 0.48975974 -0.0262439 0.39364785
0.8218028 0.34616536 0.39640903 0.30753985]
(10, 32)
Epoch 5/5
called on begin 4
250/250 - 0s - loss: 0.1177 - mae: 0.2815 - val_loss: 0.0360 - val_mae: 0.1352
called on end 4
[0.82510984 0.8162336 0.94785255 0.06877889 0.05006607 0.4200096
0.8379941 0.23999517 0.32227454 0.12522219]
[0.72407675 0.78460646 0.70396656 0.5215519 0.04386391 0.4255894
0.7968867 0.3694296 0.42922068 0.35909805]
(10, 32)
<keras.callbacks.History at 0x7f8ad85bad10>
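A minor side note on the reset (my suggestion, not from the thread): instead of constructing fresh tf.Variable objects inside on_epoch_begin, assigning empty tensors of the right rank to the existing [None, ...]-shaped variables should be enough:

```python
import tensorflow as tf

class MyCallback(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        # Empty the dynamically shaped accumulators without creating new Variables.
        self.model.val_x.assign(tf.zeros((0, 32)))
        self.model.val_gt.assign(tf.zeros((0, 1)))
        self.model.val_pred.assign(tf.zeros((0, 1)))
```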
I ran the above code on CPU and it gives the target and prediction as coded. But in GPU mode I set

```python
import tensorflow as tf
tf.config.optimizer.set_jit(True)
```

and the same code gives the following error.
In the above error message, the ...
Yes, the XLA bridge requires a constant shape: ...
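As an aside (a sketch of a possible workaround, assuming the validation-set size is known up front; not something proposed in the thread): the static-shape requirement can be sidestepped by pre-allocating a fixed-shape buffer and writing each batch into its slice, instead of growing a [None, ...]-shaped variable. Whether every op involved actually compiles under XLA would still need to be checked.

```python
import tensorflow as tf

NUM_VAL, BATCH, DIM = 12, 4, 1   # assumed sizes, for illustration only

val_pred_buf = tf.Variable(tf.zeros((NUM_VAL, DIM)), trainable=False)

@tf.function
def store_batch(batch_index, y_pred):
    # Write this batch's predictions into its fixed-size slot;
    # no tensor shape changes from call to call.
    start = batch_index * BATCH
    val_pred_buf[start:start + BATCH].assign(y_pred)

for i in range(NUM_VAL // BATCH):
    store_batch(tf.constant(i), tf.random.normal((BATCH, DIM)))
```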
Oh, error and error. 🤕
@bhack Is it a known issue or expected behaviour? Any workaround, for example using ...
Mhh.. I don't think so; you could try checking tensorflow/tensorflow#47170.
Have you tried with the new bridge https://www.tensorflow.org/api_docs/python/tf/config/experimental/enable_mlir_bridge?version=nightly ?
Just tried as follows:

```python
import tensorflow as tf
tf.config.optimizer.set_jit(True)
tf.config.experimental.enable_mlir_bridge()
....
```
Is it the same if you invert the order?
Tried both, but it only works with enable_mlir_bridge() added on its own:

```python
# using both (gives error)
tf.config.optimizer.set_jit(True)
tf.config.experimental.enable_mlir_bridge()

# only (works)
tf.config.experimental.enable_mlir_bridge()
```
Is it running with 2.11?
Colab crashed (no clue why). I'm using the Kaggle env, where it is 2.6. That's why I installed it in Colab too.
With nightly the compiler crashed:

```
---> 52 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     53                                     inputs, attrs, num_outputs)
     54 except core._NotOkStatusException as e:

InternalError: Graph execution error:
RET_CHECK failure (tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:626) dnn != nullptr
[[{{node cluster_1_1/xla_compile}}]] [Op:__inference_train_function_1394]
```
Probably for the same reason, this approach also doesn't work on TPU (it doesn't support dynamic shapes): tensorflow/tensorflow#59511. Kind of a misleading error message too. cc. @rchao @haifeng-jin
Update: Creating the variables on the CPU and outside the strategy scope works.

```python
with tf.device('/CPU:0'):
    val_gt = tf.Variable(
        tnp.empty((0, 1), dtype=tf.float32), shape=[None, 1], trainable=False
    )
    val_pred = tf.Variable(
        tnp.empty((0, 1), dtype=tf.float32), shape=[None, 1], trainable=False
    )


class CustomModel(keras.Model):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        ...
```
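A possible shape of the elided continuation (my sketch, mirroring the earlier graph-mode test_step; not the exact code from the comment):

```python
class CustomModel(keras.Model):
    def test_step(self, data):
        x, y = data
        y_pred = self(x, training=False)
        self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        self.compiled_metrics.update_state(y, y_pred)
        # Accumulate into the CPU-placed, dynamically shaped variables defined above.
        val_gt.assign(tf.concat([val_gt, y], axis=0))
        val_pred.assign(tf.concat([val_pred, y_pred], axis=0))
        return {m.name: m.result() for m in self.metrics}
```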
@rchao Hi,
System information.
TensorFlow version (you are using): 2.9
Are you willing to contribute it (Yes/No): No (it's not clear where to start; it may also require API-design-level discussion.)
Describe the feature and the current behavior/state
Currently, if we want to use a dataset in a callback, we usually pass it to the corresponding callback's constructor. That dataset can be the validation set or any special dataset that we want to `.evaluate` or `.predict`, etc. The features of the current callback methods are listed here. Now, I've encountered a case where I need the following from the validation set:

- `image_path`
- `image_tensor`
- `target`
- `predicted`
I've a `.fit` method as follows, and I need to use a callback on the validation (`on_test_batch_end`) data to further save the image tensor as an image file with the predicted value (here, a mask), and to do various types of metric calculation at the instance level (sample-wise). Passing `validation_data` to `.fit` and then again to the callback looks unnecessary. Also, there can be multiple callbacks which may require the same information (`image_paths`, `tensor`, `target`, and `prediction`, and so on) from the callback-level methods.

Will this change the current api? How?
Currently, the batch-level callback methods receive only the batch index and `logs`. Maybe something like `on_(train|test|predict)_batch_end(self, output, batch_id, logs=None)` could be supported instead. Here the term output (also in the post title) is inspired by pytorch-lightning and is used just for demonstration purposes.
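As a sketch of the difference (the proposed form is hypothetical; as defined below it would not be invoked correctly by today's Keras and is shown only to illustrate the idea):

```python
import tensorflow as tf

class CurrentStyle(tf.keras.callbacks.Callback):
    # Existing hook: only the batch index and the logs dict are available.
    def on_test_batch_end(self, batch, logs=None):
        pass

class ProposedStyle(tf.keras.callbacks.Callback):
    # Hypothetical hook: the per-batch output (inputs / targets / predictions)
    # would be exposed as well. Not supported by current Keras.
    def on_test_batch_end(self, output, batch_id, logs=None):
        pass
```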
Who will benefit from this feature?
Keras users.
Contributing