Re: [Challenge-kdn] Question about RouteNet code on GitHub

16 Jul 2020

Dear Minh,
Please find my answers inline:
...
  Dear GNNet Challenge 2020 organizing team,
 My name is Minh, and I am one of the participants in the challenge. I have
 some questions about the RouteNet code on GitHub:
 1) Could you please provide a requirements.txt file for your Python
 environment? I'm aware that the code depends on tensorflow==2.1.0,
 networkx>=2.4, and pandas>=0.24, but I often get different errors when
 running it on different machines, so I think a requirements.txt file would
 make things easier not only for me but also for all the other participants.
We are aware that installing all the libraries may be a bit confusing.
However, we decided not to provide a 'requirements.txt' file because we
think it could be more error-prone. The main problem arises when you
install TensorFlow on a machine with a CUDA-enabled GPU, since you need to
have the right CUDA and cuDNN versions pre-installed, and these depend on
your operating system and its version. Thus, the most laborious part is
installing CUDA and cuDNN correctly, but you can do it fairly
systematically by following this TensorFlow tutorial:
https://www.tensorflow.org/install/gpu
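Since there is deliberately no pinned requirements file, a quick self-check can still help spot version problems on each machine. The following is a minimal sketch, not part of the challenge code: the helper names are hypothetical, and the constraints are simply the versions quoted in the question. It uses only `importlib.metadata` (Python 3.8+) and, if TensorFlow imports, `tf.config.experimental.list_physical_devices` to show which GPUs TensorFlow can see. If you installed the GPU build under the separate `tensorflow-gpu` package name, adjust the key in `REQUIREMENTS`.

```python
"""Hypothetical helper: check the packages mentioned for the RouteNet baseline."""
import re
from importlib import metadata

# Assumed constraints, taken from the versions quoted in the question.
REQUIREMENTS = {
    "tensorflow": ("==", "2.1.0"),
    "networkx": (">=", "2.4"),
    "pandas": (">=", "0.24"),
}


def parse_version(text):
    """Turn '2.1.0' into (2, 1, 0); trailing tags like 'rc1' are dropped."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings with '==' or '>='."""
    have, need = parse_version(installed), parse_version(required)
    # Pad with zeros so that (2, 4) and (2, 4, 0) compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    if op == "==":
        return have == need
    if op == ">=":
        return have >= need
    raise ValueError(f"unsupported operator: {op}")


def check_environment(requirements=REQUIREMENTS):
    """Map each package to (installed version or None, constraint satisfied)."""
    report = {}
    for name, (op, required) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, satisfies(installed, op, required))
    return report


def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list usually means CUDA/cuDNN were not found or do not match.
    return tf.config.experimental.list_physical_devices("GPU")


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        print(f"{name:12s} {installed or 'not installed':15s} {'OK' if ok else 'MISMATCH'}")
    print("GPUs visible to TensorFlow:", visible_gpus())
```

The version parser is intentionally simple (pre-release tags are ignored); for strict comparisons the `packaging` library is the more robust choice.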
...
  2) What is the stopping criterion for RouteNet as a baseline for the
 challenge? From what I see, it only stops after 5 million steps, which
 takes an extremely long time. Should this be the stopping point of the
 algorithm? And are the results after 5 million steps the baseline
 results for the competition?
This is just an arbitrary upper limit we set in the implementation. The
idea is that you can stop the execution whenever you want based on the
training progress; the latest models (checkpoints) are automatically saved
in the "../logs/model_log" directory. Note that after 400k steps the model
has already iterated over the entire training dataset, so further
iterations can only slightly refine the model.
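Deciding when to stop is left to the participant. One common rule of thumb, shown here as a hypothetical sketch rather than anything the baseline implements, is patience-based early stopping: record an evaluation loss each time a checkpoint is written, and stop once the last `patience` evaluations have not beaten the earlier best by at least `min_delta`. The function names and default values are assumptions.

```python
def should_stop(eval_losses, patience=5, min_delta=1e-4):
    """Return True once the last `patience` evaluations failed to improve
    on the best earlier loss by more than `min_delta` (hypothetical rule)."""
    if len(eval_losses) <= patience:
        return False  # not enough history yet to judge a plateau
    best_before = min(eval_losses[:-patience])
    recent_best = min(eval_losses[-patience:])
    return recent_best > best_before - min_delta


def first_stop_index(eval_losses, patience=5, min_delta=1e-4):
    """Index of the evaluation at which the rule fires, or None if it never does."""
    for i in range(1, len(eval_losses) + 1):
        if should_stop(eval_losses[:i], patience, min_delta):
            return i - 1
    return None


# Usage with made-up evaluation losses, one per saved checkpoint:
losses = [1.0, 0.7, 0.6, 0.59995, 0.6, 0.61]
print(first_stop_index(losses, patience=3))
```

Once the rule fires, the most recent checkpoint in the log directory (or the one with the best evaluation loss, if you keep track of it) is the model to use.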
You can use the baseline as a reference implementation for developing your
own model. However, with this baseline you can expect a MAPE (Mean Absolute
Percentage Error) above 100%. One of the main reasons is that it does not
encode information about queue scheduling, which has a great impact on
network delay.
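For reference, MAPE is the average of the relative errors, expressed in percent: MAPE = (100 / n) · Σ |y_i − ŷ_i| / |y_i|, where y_i is a true delay and ŷ_i the predicted one. The sketch below is a plain-Python illustration of that standard definition; the challenge's own evaluation script may aggregate differently, so treat this as an assumption rather than the official metric code. A MAPE above 100% means that, on average, the absolute error is larger than the true delay itself; for instance, predicting 3 ms for a path whose real delay is 1 ms contributes a 200% error for that sample.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (standard definition)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("MAPE is undefined for empty inputs")
    total = 0.0
    for true, pred in zip(y_true, y_pred):
        if true == 0:
            # Relative error is undefined when the true value is zero.
            raise ValueError("MAPE is undefined for zero-valued targets")
        total += abs((true - pred) / true)
    return 100.0 * total / len(y_true)


# Usage with made-up per-path delays (arbitrary units):
print(mape([1.0, 2.0], [1.5, 1.0]))
```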
...
  I look forward to hearing from you soon. Thank you.
 Best regards,
 Minh Nguyen.
 King Abdullah University of Science and Technology (KAUST)
 Al-Khawarizmi Applied Math. Building (Bldg. #1) | Level 3, Table 3139-WS01
 Thuwal 23955-6900 | Makkah Province
 Kingdom of Saudi Arabia 
I hope I have answered all your questions. Please let me know if you have
any further questions or comments.
Regards,
José Suárez-Varela
Barcelona Neural Networking Center - UPC