Hi,
After changing the scheduling policy to WFQ in the training data, the training
script fails with the following error when I try to train a model:
Node: 'gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack'
Operation expected a list with 1 elements but got a list with 2 elements.
[[{{node gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack}}]] [Op:__inference_train_function_32315]
I created the data using the quickstart notebook; the only change is replacing
this line in generate_topology():
G.nodes[i]["schedulingPolicy"] = "FIFO"
with these two lines:
G.nodes[i]["schedulingPolicy"] = "WFQ"
G.nodes[i]["schedulingWeights"] = '25,25,50'
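To make the change concrete, here is a minimal, self-contained sketch of it. The helper name apply_scheduling and the plain nodes dict are hypothetical stand-ins for the notebook's G.nodes (they are not part of the quickstart notebook); only the attribute keys and values come from the lines above.

```python
# Sketch only: `nodes` is a plain dict standing in for the per-node
# attribute dicts that G.nodes[i] exposes (an assumption, not notebook code).
def apply_scheduling(nodes, policy="FIFO", weights=None):
    """Set the scheduling policy (and optional weights) on every node."""
    for attrs in nodes.values():
        attrs["schedulingPolicy"] = policy
        if weights is not None:
            # Weights are stored as a comma-separated string, e.g. '25,25,50'.
            attrs["schedulingWeights"] = ",".join(str(w) for w in weights)
        else:
            # Drop any stale weights when the policy does not use them.
            attrs.pop("schedulingWeights", None)
    return nodes


nodes = {i: {} for i in range(3)}
apply_scheduling(nodes, policy="WFQ", weights=[25, 25, 50])
```

Calling apply_scheduling(nodes) with the defaults corresponds to the original FIFO setup, which trains without issues.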
The Docker container runs OK on this data, but I can't train a model on it
because of the above error.
If the policies are FIFO, training runs without issues.
Could you please let me know how to resolve this?
Thanks,
Yawi
----- Here is the full traceback for your reference:
INFO: Starting training from scratch...
Epoch 1/20
Traceback (most recent call last):
  File "/home/yawi/dev/GNNetworkingChallenge/api_test.py", line 23, in <module>
    main(data_dir, final_evaluation=True)
  File "/home/yawi/dev/GNNetworkingChallenge/RouteNet_Fermi/__init__.py", line 101, in main
    model.fit(ds_train,
  File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack' defined at (most recent call last):
    File "/home/yawi/dev/GNNetworkingChallenge/api_test.py", line 23, in <module>
      main(data_dir, final_evaluation=True)
    File "/home/yawi/dev/GNNetworkingChallenge/RouteNet_Fermi/__init__.py", line 101, in main
      model.fit(ds_train,
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 893, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 537, in minimize
      grads_and_vars = self._compute_gradients(
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 590, in _compute_gradients
      grads_and_vars = self._get_gradients(tape, loss, var_list, grad_loss)
    File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 471, in _get_gradients
      grads = tape.gradient(loss, var_list, grad_loss)
Node: 'gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack'
Operation expected a list with 1 elements but got a list with 2 elements.
[[{{node gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack}}]] [Op:__inference_train_function_32315]
Process finished with exit code 1