Hi Nathan,
Normalizing the input and output values is a well-known technique for
stabilizing the training of deep learning models in general. Usually,
if the input and output variables of the model follow a standard normal
distribution [N(0,1)], it is easier for deep learning models to learn
from the data. This does not necessarily mean that the model won't work
if you don't normalize the data, but it often helps the model's loss
converge during training. You can see this as an optimization
technique. A common approach is simply to measure the mean and standard
deviation of each variable on your training dataset (or a representative
subset) and normalize each variable individually with a linear
transformation [i.e., (x - mean)/std. dev.]. This is what we did in the
ACM SOSR paper.
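As an illustration of that linear transformation (this is not the actual
RouteNet code; the values and variable names here are hypothetical):

```python
import numpy as np

# Hypothetical training data: the statistics are measured once,
# on the training set only, and then reused at evaluation time.
train_delays = np.array([0.1, 0.5, 0.9, 1.3, 1.7])
mean, std = train_delays.mean(), train_delays.std()

def normalize(x):
    # Linear (z-score) transformation: (x - mean) / std. dev.
    return (x - mean) / std

# The normalized training data has zero mean and unit standard deviation.
normalized = normalize(train_delays)
```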
However, another approach we tried experimentally is to select the
normalization function based on the value distribution of each variable.
In the case of RouteNet with nodes
(
https://github.com/knowledgedefinednetworking/network-modeling-GNN/blob/mas…)
we analyzed the distribution of delays in our training dataset (not
public yet). Given that the distribution had a lot of density at low
values and a long tail of large values, we thought that normalizing with
a logarithmic function could be more beneficial (and it eventually was),
since it makes the delay distribution seen by RouteNet closer to a
normal distribution.
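A minimal sketch of that idea (again, not the actual code; the synthetic
distribution below is a hypothetical stand-in for the real delay data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical long-tailed delay distribution: lots of density at low
# values and a long tail of large values (lognormal by construction).
delays = rng.lognormal(mean=-2.0, sigma=1.0, size=10_000)

# The log transform compresses the long tail; here, by construction,
# log_delays is approximately N(-2, 1), i.e., close to normal.
log_delays = np.log(delays)
```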
Note that the parse function in our code reads the data from the dataset
(TFRecords). When you normalize the delay labels (features['delay']),
you are taking the delay values from the ground truth (obtained by our
packet-level simulator) and normalizing them. To train RouteNet we need
supervised data (with labels), so we normalize the labels from the
ground truth (features['delay']) and use them to train the model.
Once RouteNet is trained you can use it to produce predictions. In this
case the model is fed only with inputs (i.e., traffic, topology,
routing) and it produces an output value. If it was trained with
normalized output labels, it will produce normalized outputs, and it is
necessary to apply denormalization to recover the real values. In
principle, during evaluation you can ignore the delay values
(features['delay']) in the parse function, given that they are the
labels used for training (i.e., the output of our simulator).
Regarding the code in the following link:
https://github.com/knowledgedefinednetworking/demo-routenet/blob/master/cod…
The parse function was used to train the RouteNet model. In this case we
normalized the data before feeding it to RouteNet. This means that,
once the model is trained, we need to denormalize the output delay
values of RouteNet to obtain the actual delay predictions. Then, in our
demo notebook
(
https://github.com/knowledgedefinednetworking/demo-routenet/blob/master/dem…)
we load an already-trained model. Note that to make predictions with
RouteNet, the training delay labels (features['delay']) are not needed
anymore. However, in this demo we wanted to compare the predictions made
by RouteNet (after denormalization) with the values produced by our
simulator on a separate dataset that wasn't used for training, so we
retrieve the actual delay values of the simulator (features['delay'])
without any denormalization. Also, since we normalized the output labels
of RouteNet during training (in the parse function in
"routenet_with_link_cap.py"), we need to apply the inverse function to
denormalize the delays:
_Training_ ("demo-routenet/code/routenet_with_link_cap.py"), parse
function:

    if k == 'delay':
        features[k] = (features[k] - 0.37) / 0.54
        # --> Delay labels normalized and used to train RouteNet

_Evaluation_ ("demo-routenet/demo_notebooks/demo.ipynb"):

    pred_Delay, label_Delay = sess.run([predictions, label])
    # "label" --> real values produced by the simulator
    # (label = features[target] = features['delay'])

    predictions = 0.54 * preds + 0.37
    # --> Denormalization applied to RouteNet's predictions during the
    # evaluation phase to obtain the real predicted delay values. Note
    # that this is the inverse of the function applied during training
    # (parse function in "demo-routenet/code/routenet_with_link_cap.py").
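As a quick sanity check (a minimal sketch using the constants above),
the evaluation-time transformation is the exact inverse of the
training-time one:

```python
MEAN, STD = 0.37, 0.54  # constants used in the parse function

def normalize(delay):
    # Training: transform the ground-truth delay labels
    return (delay - MEAN) / STD

def denormalize(pred):
    # Evaluation: map RouteNet's normalized outputs back to real delays
    return STD * pred + MEAN

# The round trip recovers the original value (up to floating-point error)
assert abs(denormalize(normalize(1.23)) - 1.23) < 1e-9
```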
I hope this clarifies all your doubts.
Regards,
José
El 6/10/19 a las 09:55, Nathan Sowatskey escribió:
Thank you José.
I am trying to make sense of this, so I appreciate you bearing with
me. The normalisation seems to be fundamental to how RouteNet is used,
but also seems not to be discussed anywhere, so I just have the clues
in the code to work with.
In my experiments, I have performed the training with the delay input
normalised, and not normalised.
Predictions with Delay Normalised as an Input Variable
-------------------------------------------------------------------------
If I normalise the delay input value, and (de)normalise the
predictions, I get results like this (this is with a checkpoint 51211).
Looking at these results, one could draw the conclusion that the
end-to-end process is working as expected. What is not explained,
though, is the reasoning behind the normalisation.
You do say in your response below, though, that “it is not necessary
to normalise the output parameters (i.e., delay) since we are not
using them for training”, which seems perfectly sensible. Given this
common-sense point, I don’t understand why the delay is normalised in
the parse function here:
https://github.com/knowledgedefinednetworking/demo-routenet/blob/master/cod…
That parse function is used when reading the TF records data and
passing it to the model during training. So, transforming the delay
value at that point doesn’t make sense given that we are not using it
for prediction.
Since the demo notebook does also (de)normalise the predictions, it
seems as though the trained model in the supplied checkpoint 260380
actually was trained with the delay value normalised, and so it had to
be (de)normalised in the notebook, which is the result I have
reproduced here.
There is no explicit rationale provided for normalising the target
variable, or the other input variables, though. If this is required
here, then it might be for scaling purposes. If that is the case, then
I would suppose that there is some analysis that you have done to
determine the appropriate scaling factors. If that analysis does
exist, it might have been published somewhere I have not found yet.
What might also make sense is if the delay were normalised when the
code is being used with jitter as the prediction target, but the code
seems to be all about delay at this stage.
As I note above, since the normalisation appears not to be discussed
anywhere, I only have the clues in the code to go on.
Predictions without Delay Normalised as an Input Variable
-----------------------------------------------------------------------------
For comparison, I have also trained without the delay value
normalised, but with the traffic and link_capacity normalised, as
below in the parse function:
    if feature == 'traffic':
        features[feature] = (features[feature] - 0.17) / 0.13
    if feature == 'link_capacity':
        features[feature] = (features[feature] - 25.0) / 40.0
I also do not normalise the predictions, and I get results like this
from checkpoint 5904 (which is to say, at an early stage of training):
Given these results, it looks like the model I am training, without
normalising the delay either as an input variable or denormalising the
delay predictions, is providing reasonable predictions (given that it
has only been trained for a relatively small number of training steps).
Conclusion
---------------
A possible conclusion is simply that I have broken something in the
way in which I have refactored your original code. You will see,
though, that I have been careful to also write extensive unit and
smoke tests, so I have confidence that my refactoring has not changed
the original functions of the code.
My code, though, seems to produce reasonable predictions for delay
without normalisation of the delay. Of course, I am still using your
original normalisation for the traffic and link capacity, without
understanding why yet.
What are your thoughts please? Has the normalisation that you are
employing been discussed anywhere?
Many thanks
Nathan
> On 30 Sep 2019, at 12:13, José Suárez-Varela <jsuarezv(a)ac.upc.edu
> <mailto:jsuarezv@ac.upc.edu>> wrote:
>
>
> Hi Nathan,
>
> The code in the demo notebook is used only for delay inference, not
> for training. In this code, we load a model that we trained using the
> RouteNet implementation in "routenet_with_link_cap.py". Then, we load
> samples from our datasets (generated with our packet-level
> simulator), make the inference with the RouteNet model and finally
> compare RouteNet's predictions with the values of our ground truth.
>
> In this case it is not necessary to normalize the output parameters
> (i.e., delay) since we are not using them for training. We only
> normalize the input parameters of RouteNet (traffic and link
> capacities). Note that we then denormalize RouteNet's predictions to
> compare them with the real (denormalized) delay values of the ground
> truth (variable "label_Delay"):
>
> predictions = 0.54*preds + 0.37
>
>
>
> Regards,
>
> José
>
> El 28/09/19 a las 17:28, Nathan Sowatskey escribió:
>> Hi
>>
>> I have noted that the normalisation applied in the demo notebook here:
>>
>>
https://github.com/knowledgedefinednetworking/demo-routenet/blob/master/dem…
>>
>> Does not apply the same normalisation as the code here:
>>
>>
https://github.com/knowledgedefinednetworking/demo-routenet/blob/master/cod…
>>
>> Specifically, delay is not normalised in the demo notebook.
>>
>> The demo notebook loads a checkpoint from here:
>>
>>
https://github.com/knowledgedefinednetworking/demo-routenet/tree/master/tra…
>>
>> This model, then, was created without normalising the delay also.
>> That implies that the code that was used to train that model is not
>> the same code that is in the routenet_with_link_cap.py code at
>> the link above.
>>
>> In simpler terms, the demo notebook prediction does not work if the
>> delay is normalised as at routenet_with_link_cap.py#L85. So, the
>> code for training given in this repository is not compatible with
>> the demo notebook and the trained model used as an example.
>>
>> Regards
>>
>> Nathan
>>
>>
>> _______________________________________________
>> Kdn-users mailing list
>>
>> Kdn-users(a)knowledgedefinednetworking.org
>>
https://mail.n3cat.upc.edu/cgi-bin/mailman/listinfo/kdn-users