Hi,
After changing the scheduling policy to WFQ in the training data, the training script gives the following error when I try to train a model:
Node: 'gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack'
Operation expected a list with 1 elements but got a list with 2 elements.
[[{{node gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack}}]] [Op:__inference_train_function_32315]
I created the data using the quickstart notebook. The only change is replacing this line in generate_topology():
G.nodes[i]["schedulingPolicy"] = "FIFO"
with these two lines:
G.nodes[i]["schedulingPolicy"] = "WFQ"
G.nodes[i]["schedulingWeights"] = '25,25,50'
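For clarity, here is a minimal sketch of the same change applied to every node of a topology graph. It assumes G is a networkx graph, as the G.nodes[i][...] syntax in generate_topology() suggests; the helper name apply_wfq and the small example graph are hypothetical and not part of the quickstart notebook.

```python
import networkx as nx


def apply_wfq(G, weights="25,25,50"):
    """Hypothetical helper: set WFQ scheduling with the given weights on every node.

    Mirrors the two-line edit made inside generate_topology(); the helper name
    and the default weights string are illustrative only.
    """
    for i in G.nodes:
        G.nodes[i]["schedulingPolicy"] = "WFQ"
        G.nodes[i]["schedulingWeights"] = weights
    return G


if __name__ == "__main__":
    # Small stand-in graph; the real topology comes from generate_topology().
    G = nx.cycle_graph(4)
    apply_wfq(G)
    print(G.nodes[0])  # {'schedulingPolicy': 'WFQ', 'schedulingWeights': '25,25,50'}
```

Switching the two assignments back to "FIFO" (and dropping schedulingWeights) reproduces the dataset that trains without errors.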
The Docker image runs fine on this data, but I can't train a model on it because of the error above.
If the policies are FIFO, the training runs without issues.
Could you please let me know how to resolve this?
Thanks,
Yawi
----- Here is the full traceback for reference:
INFO: Starting training from scratch...
Epoch 1/20
Traceback (most recent call last):
File "/home/yawi/dev/GNNetworkingChallenge/api_test.py", line 23, in <module>
main(data_dir, final_evaluation=True)
File "/home/yawi/dev/GNNetworkingChallenge/RouteNet_Fermi/__init__.py", line 101, in main
model.fit(ds_train,
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack' defined at (most recent call last):
File "/home/yawi/dev/GNNetworkingChallenge/api_test.py", line 23, in <module>
main(data_dir, final_evaluation=True)
File "/home/yawi/dev/GNNetworkingChallenge/RouteNet_Fermi/__init__.py", line 101, in main
model.fit(ds_train,
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 1409, in fit
tmp_logs = self.train_function(iterator)
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 1051, in train_function
return step_function(self, iterator)
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 1040, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 1030, in run_step
outputs = model.train_step(data)
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/engine/training.py", line 893, in train_step
self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 537, in minimize
grads_and_vars = self._compute_gradients(
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 590, in _compute_gradients
grads_and_vars = self._get_gradients(tape, loss, var_list, grad_loss)
File "/home/yawi/.conda/envs/gnnch/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 471, in _get_gradients
grads = tape.gradient(loss, var_list, grad_loss)
Node: 'gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack'
Operation expected a list with 1 elements but got a list with 2 elements.
[[{{node gradients/rnn_13/TensorArrayUnstack/TensorListFromTensor_grad/TensorListStack}}]] [Op:__inference_train_function_32315]
Process finished with exit code 1