This series of patches introduce a correction in tq metric and forwarding rules of ogm packets.
In the current configuration routing loops scenario can emerge due to some conceptual errors such the global tq window and the forwarding of ogm coming from sub-optimal paths substituting ogm tq with the average of current router. An example scenario has been realized also in simulation using qemu.
Can be demonstrated as already done for DSDV[1], that is a protocol very similar to batman, that a proper forwarding policy together with a strict control on metric monotonicity (as suggested in [2]) are sufficient to guarantee loop-freeness for the protocol.
I apologize in advance if the source style is not the best possible, but I hope that the ideas will applied after a phase of code revision.
Cheers, Daniele Furlan
[1] C. Perkins, P. Bhagwat, ”Highly-dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers”, Proc. SIGCOMM ’94, pp. 234-244 [2] Y. Yang, J. Wang, R. Kravets, ”Designing routing metrics for mesh networks”, In WiMesh, 2005.
On Thursday 17 November 2011 16:15:42 Daniele Furlan wrote:
This series of patches introduce a correction in tq metric and forwarding rules of ogm packets.
In the current configuration routing loops scenario can emerge due to some conceptual errors such the global tq window and the forwarding of ogm coming from sub-optimal paths substituting ogm tq with the average of current router. An example scenario has been realized also in simulation using qemu.
Interesting statement but maybe it would also be interesting for others to provide examples and explanations.
Can be demonstrated as already done for DSDV[1], that is a protocol very similar to batman, that a proper forwarding policy together with a strict control on metric monotonicity (as suggested in [2]) are sufficient to guarantee loop-freeness for the protocol.
I apologize in advance if the source style is not the best possible, but I hope that the ideas will applied after a phase of code revision.
Please read and understand http://www.open-mesh.org/wiki/open-mesh/Contribute#Submitting-patches
I refer here to checking of your patches using simple static analyzers and the documentation part.
Also things like
+ /* update last originator seqno received from this neighbour */ + if ((set_mark) && + (batman_ogm_packet->seqno > tmp_neigh_node->last_seqno)) + tmp_neigh_node->last_seqno = batman_ogm_packet->seqno; +
are simply wrong. Have you thought about overflow of sequence numbers (hint: seq_before/after)? Ok, forget about the code and look at the rest.
You don't give any hints about implications of your changes in the commit messages (no, something like "everything will be better because I say it" doesn't count).
Take for example the stuff prepared by Simon. He gathered information, talked to the people and prepared documentation. I can now use this stuff to discuss with him at a higher level than with the stuff you provided. And his stuff is only a "small" optional feature, but your patches changes things in the path selection and information propagation code.
Maybe you should contact everyone (marec, d0tslash, ordex, ...) on IRC and talk about the actual concepts (and also their concepts... ttvn *hinthinthint*) and prepare actual documentation based on that.
Kind regards, Sven
Hi,
2011/11/17 Sven Eckelmann sven@narfation.org:
On Thursday 17 November 2011 16:15:42 Daniele Furlan wrote:
This series of patches introduce a correction in tq metric and forwarding rules of ogm packets.
In the current configuration routing loops scenario can emerge due to some conceptual errors such the global tq window and the forwarding of ogm coming from sub-optimal paths substituting ogm tq with the average of current router. An example scenario has been realized also in simulation using qemu.
Interesting statement but maybe it would also be interesting for others to provide examples and explanations.
This argument will be a section of my master thesis. As soon as I have some more material I will forward it to the community together with the script used for testing with qemu.
Can be demonstrated as already done for DSDV[1], that is a protocol very similar to batman, that a proper forwarding policy together with a strict control on metric monotonicity (as suggested in [2]) are sufficient to guarantee loop-freeness for the protocol.
I apologize in advance if the source style is not the best possible, but I hope that the ideas will applied after a phase of code revision.
Please read and understand http://www.open-mesh.org/wiki/open-mesh/Contribute#Submitting-patches
I refer here to checking of your patches using simple static analyzers and the documentation part.
Also things like
- /* update last originator seqno received from this neighbour */
- if ((set_mark) &&
- (batman_ogm_packet->seqno > tmp_neigh_node->last_seqno))
- tmp_neigh_node->last_seqno = batman_ogm_packet->seqno;
are simply wrong. Have you thought about overflow of sequence numbers (hint: seq_before/after)? Ok, forget about the code and look at the rest.
I perfectly know that the code I provided is a mess. But it works and would solve the problems of loops and convergency, and this can be verified by anybody before concentrating on coding technicality.
You don't give any hints about implications of your changes in the commit messages (no, something like "everything will be better because I say it" doesn't count).
As already said I will provide soom formal proofs of NON loop-freeness of current BATMAN implementation together with a proposal of changes that guarantees FORMALLY loop-freeness.
Take for example the stuff prepared by Simon. He gathered information, talked to the people and prepared documentation. I can now use this stuff to discuss with him at a higher level than with the stuff you provided. And his stuff is only a "small" optional feature, but your patches changes things in the path selection and information propagation code.
My modifications are not as drastic as you are saying. They simply remove formally wrong mechanisms such global TQ window and the forwarding of ogm not coming from the actual router.
Maybe you should contact everyone (marec, d0tslash, ordex, ...) on IRC and talk about the actual concepts (and also their concepts... ttvn *hinthinthint*) and prepare actual documentation based on that.
As soon as I have some more material I will expose my concept clearly, meanwhile if someone want to test my patch is free to do that.
Kind regards, Sven
Thanks for your comments! Regards.
Daniele Furlan
Hi,
Interesting statement but maybe it would also be interesting for others to provide examples and explanations.
This argument will be a section of my master thesis. As soon as I have some more material I will forward it to the community together with the script used for testing with qemu.
should we wait with the discussion about your patches until then ? Keep in mind that nobody asked you to publish your thesis now or all your scripts. A few words explaining what you are trying to solve might be a good starting point.
are simply wrong. Have you thought about overflow of sequence numbers (hint: seq_before/after)? Ok, forget about the code and look at the rest.
I perfectly know that the code I provided is a mess. But it works and would solve the problems of loops and convergency, and this can be verified by anybody before concentrating on coding technicality.
No, it might solve something but at the same time introduces severe bugs which render the whole routing code useless! The integer overflow bug you are introducing in several places of the code breaks the whole algorithm for anyone that intends to use batman-adv for longer than a short test.
Take for example the stuff prepared by Simon. He gathered information, talked to the people and prepared documentation. I can now use this stuff to discuss with him at a higher level than with the stuff you provided. And his stuff is only a "small" optional feature, but your patches changes things in the path selection and information propagation code.
My modifications are not as drastic as you are saying. They simply remove formally wrong mechanisms such global TQ window and the forwarding of ogm not coming from the actual router.
I guess you are underestimating the impact of your own changes. Both of your proposed solutions have been discussed on this list before and have been rejected because of the side effects.
For instance, only accepting the highest OGM seqno makes the algorithm strongly prefer short / low delay routes, no matter the TQ value. Imagine a simple 3 node setup: A, B and C. The route from A to C can be a single hop route or a 2 hop route via B. The OGMs via B (taking the extra hop) will have a higher delay, thus will always be discarded as they arrive "late". Even if it is a much better path! Unless I overlooked something you reduced batman to simple "hop counting"-like algorithm ..
As soon as I have some more material I will expose my concept clearly, meanwhile if someone want to test my patch is free to do that.
I'd strongly advise against testing the patches! It is obivous that the code is broken, no matter what it tries to fix.
Regards, Marek
2011/11/18 Marek Lindner lindner_marek@yahoo.de:
Hi,
Interesting statement but maybe it would also be interesting for others to provide examples and explanations.
This argument will be a section of my master thesis. As soon as I have some more material I will forward it to the community together with the script used for testing with qemu.
should we wait with the discussion about your patches until then ? Keep in mind that nobody asked you to publish your thesis now or all your scripts. A few words explaining what you are trying to solve might be a good starting point.
Consider this simple example topology that can be easily realizid using qemu and wirefilter setting opportune packet loss ratio:
C / | / | A --- B \ | \ | D
Links AB, BD, BD have TQ=255 Links AD, AC have TQ=150
In this scenarion both C and D route through B to reach A.
Now suppose node B goes away. Nodes C and D keep on receiving ogm of node A that will be forwarded substituting the actual TQ (150) with the current router one (255*hop_penalty).......
After TQ_GLOBAL_WINDOW_SIZE received ogm from A, both C and D will assign a TQ of 0 to B and as a consequence C will route via D anc D via C creating a loop.
This loops is resolved as soon as averaging and hop_penalty reduce TQ under the TQ of direct connection to A.
are simply wrong. Have you thought about overflow of sequence numbers (hint: seq_before/after)? Ok, forget about the code and look at the rest.
I perfectly know that the code I provided is a mess. But it works and would solve the problems of loops and convergency, and this can be verified by anybody before concentrating on coding technicality.
No, it might solve something but at the same time introduces severe bugs which render the whole routing code useless! The integer overflow bug you are introducing in several places of the code breaks the whole algorithm for anyone that intends to use batman-adv for longer than a short test.
Yes there are errors, it is only for testing on virtual machines, I'm not suggesting to install this modification in "real" networks.
Take for example the stuff prepared by Simon. He gathered information, talked to the people and prepared documentation. I can now use this stuff to discuss with him at a higher level than with the stuff you provided. And his stuff is only a "small" optional feature, but your patches changes things in the path selection and information propagation code.
My modifications are not as drastic as you are saying. They simply remove formally wrong mechanisms such global TQ window and the forwarding of ogm not coming from the actual router.
I guess you are underestimating the impact of your own changes. Both of your proposed solutions have been discussed on this list before and have been rejected because of the side effects.
For instance, only accepting the highest OGM seqno makes the algorithm strongly prefer short / low delay routes, no matter the TQ value. Imagine a simple 3 node setup: A, B and C. The route from A to C can be a single hop route or a 2 hop route via B. The OGMs via B (taking the extra hop) will have a higher delay, thus will always be discarded as they arrive "late". Even if it is a much better path! Unless I overlooked something you reduced batman to simple "hop counting"-like algorithm ..
I see wiki pages on new ogm and elp. The concepts are very similar and I don't see any side effects.
Actually the example you have described cannot happen because router ogms are forwarded even if they are duplicate. So if the extra delay is less than ogm interval (which is a feasible hypotesis) there are no problems.
As soon as I have some more material I will expose my concept clearly, meanwhile if someone want to test my patch is free to do that.
I'd strongly advise against testing the patches! It is obivous that the code is broken, no matter what it tries to fix.
I agree, again it is only for testing on virtual machines, I'm not driving people into installing this modification in their networks.
Regards, Marek
b.a.t.m.a.n@lists.open-mesh.org