Hi,
I am here to think out loud about the batman-adv way of dealing with patches. I think everybody knows how it works right now [1]. Patches are accepted by Marek and applied on master (we ignore some exceptions for now). From time to time a new kernel is released; this is the time when next is used to create a new release, which is more or less the stuff also in the released^W"next to be released" kernel. Now all the stuff in master is merged into next, and Antonio will use the stuff in next to create pull requests for David S. Miller. In case he accepts it into net-next (this isn't as easy as it sounds), this stuff will be part of the next+1 kernel.
This concept of "let everything wait in master until it smells funny" worked perfectly in the past and helped a lot to organize some things. But it seems more and more problematic to continue working with it because David S. Miller isn't the easiest person to deal with, and therefore a lot of things got pushed back or were rejected.
So, what happens when something gets rejected or has to be reworked? Yes, some people have to run around and do a lot of magic. These things will end up in next (because this is where Antonio is working when he sends patches to David) and all the new stuff in master has to be rebased later on top of the new things in next [did I just lose most of my audience, or is this snoring coming out of my PC?].
To make it short: an extra release cycle of waiting time for feature patches is added, and it creates an extra burden for the involved maintainers because they have to rebase/revert/move/... patches when something unexpected happens (or we can say: "when David happens"). I will just remind everyone of the DAT fun.
The big question is: Is this extra waiting time for new feature patches really useful in the current situation and does batman-adv benefit from it in a special/irreplaceable way?
My personal observation is rather negative. Most problems are detected by people using the release and by people checking the patches (on the b.a.t.m.a.n. mailing list and on netdev... this includes "I am really really angry about what you did" David). But of course, my impressions could be slightly biased. I think Antonio can give more insights about it because he has to squash all the stuff together.
Here is just a calculation for a new feature patch:
* sent to b.a.t.m.a.n@lists.open-mesh.org at the beginning of the first month
* accepted by Marek in the middle of the first month (had to be resent or so)
* will be part of next in the middle of the third month
* will be sent to David in the middle of the third month (uh, fast Antonio)
* will be sent to Linus by David at the beginning of the fifth month
* will be released as a separate kernel module in the middle of the fifth month
* will be released in a kernel at the beginning of the seventh month
Or it could easily happen that the author of the patches gets a late "sry, it will not be accepted" from David in months three to five. Of course, it can also happen after month five, but this is rather unlikely because by then it has already passed the fuzzy Heimdall reincarnation David.
Antonio already showed that he can handle ad-hoc pull requests very well. Therefore, the idea would be to remove the two-month extra waiting time. Here are just two random branch names:
* next - new patches are accepted here - so it is something like net-next for batman-adv. Antonio will gather some patches and then send them to David. David's reaction will come heavy and hard... but the effort for a reaction is reduced.
* master - bugfix gathering place. So it is like net for batman-adv. Patches will end up really fast in Linus' tree.
But I would still say it is a good idea to create releases from next because more people will test it, and therefore it is easier to get bugfixes into Linus' tree directly instead of through the stable queue. .1 releases from master can still be created when necessary.
So, why bring this up now? We currently have 12 patches in master and Linux 3.7 will be released soon. So switching the patch strategy will not cause hell to break loose. And as a nice side effect: it is not too tragic if C.A.T.W.O.M.A.N. is not merged right now.
But what would it mean for the next release? Hm, the next release would be created from next, but then we probably have to use some tricks. Right now master is too far ahead because it has the role of next+1. We would have to move the patches to next (simple merge) and reset master to the release.
Further .0 releases would just be made on next and then master merges the release. Additional fixes in master can "simply" be merged in next.
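The branch dance above can be sketched with a throwaway repository. The branch names next and master come from this proposal; the tag name, files and commit messages are made up for illustration:

```shell
# Throwaway-repo sketch of the proposed flow: features land on next,
# .0 releases are tagged on next, master collects bugfixes on top of
# the release and is merged back into next.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email sketch@example.org
git config user.name "Sketch"

git checkout -q -b next
echo 'feature' > feature.c
git add feature.c
git commit -q -m "batman-adv: shiny new feature"

# Cut a .0 release directly from next (made-up version number)...
git tag v2013.0.0
# ...and let master track the release, taking only bugfixes.
git branch master v2013.0.0
git checkout -q master
echo 'fix' > fix.c
git add fix.c
git commit -q -m "batman-adv: bugfix for the release"

# Fixes in master are then "simply" merged into next.
git checkout -q next
git merge -q --no-edit master
git log --oneline
```

Since next still sits on the release tag in this sketch, the merge is a fast-forward; once next has moved ahead with new features, the same command produces a real merge commit.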
Let the fight begin...
Kind regards, Sven
[1] http://www.open-mesh.org/projects/batman-adv/wiki/Release-todo?version=133
Hello Sven, list,
Thank you for your effort in writing this email.
On Tue, Dec 04, 2012 at 10:50:08PM +0100, Sven Eckelmann wrote: [...]
So, what happens when something gets rejected or has to be reworked? Yes, some people have to run around and do a lot of magic. These things will end up in next (because this is where Antonio is working when he sends patches to David) and all the new stuff in master has to be rebased later on top of the new things in next [did I just lose most of my audience, or is this snoring coming out of my PC?].
no, not yet :) maybe your fan is running too much.
[...]
The big question is: Is this extra waiting time for new feature patches really useful in the current situation and does batman-adv benefit from it in a special/irreplaceable way?
the added value I see in having this extra time is that we can spend some days testing it, running it on nodes and trying to find bugs. But, to be honest, I don't know if we have people out there really doing this on a daily/weekly basis.
My personal observation is rather negative. Most problems are detected by people using the release and by people checking the patches (on the b.a.t.m.a.n. mailing list and on netdev... this includes "I am really really angry about what you did" David). But of course, my impressions could be slightly biased. I think Antonio can give more insights about it because he has to squash all the stuff together.
well, if I got you correctly, the problem here is "only" about time: with the current method we "wait" more time before sending patches upstream, while with your suggestion we would send them immediately (well, we could still wait some days in order to let the build tests do their optimum job and double-check the code - because yes, I re-read each and every patch before sending them to David.. strange, eh? :D).
So, you would like a mac80211-like process, I think.
Here just a calculation for a new feature patch:
- Send to b.a.t.m.a.n@lists.open-mesh.org at the beginning of the first month
- accepted by Marek in the middle of the first month (had to be resent or so)
- will be part of next in the middle of the third month
- will be sent to David in the middle of the third month (uh, fast Antonio)
- will be sent to Linus by David at the beginning of the fifth month
- will be released as a separate kernel module in the middle of the fifth month
- will be released in a kernel at the beginning of the seventh month
you meant weeks here? It sounds like a birth :)
Antonio already showed that he can handle adhoc pull requests very well. Therefore, the idea would be to remove the two month extra waiting time.
I hope you still meant weeks here..or am I missing something?
Here are just two random branch names:
- next - new patches are accepted here - so it is something like net-next for batman-adv. Antonio will gather some patches and then send them to David. David's reaction will come heavy and hard... but the effort for a reaction is reduced.
- master - bugfix gathering place. So it is like net for batman-adv. Patches will end up really fast in Linus' tree.
To be honest, I thought the same some time ago, but I think there was a good reason to have it like it is now. I exposed my thoughts to Marek (? I can't really remember) and he explained to me why it was nice to have a pre-incubation.
But I would still say it is a good idea to create releases from next because more people will test it, and therefore it is easier to get bugfixes into Linus' tree directly instead of through the stable queue. .1 releases from master can still be created when necessary.
So, why bring this up now? We currently have 12 patches in master and Linux 3.7 will be released soon. So switching the patch strategy will not cause hell to break loose. And as a nice side effect: it is not too tragic if C.A.T.W.O.M.A.N. is not merged right now.
But what would it mean for the next release? Hm, the next release would be created from next, but then we probably have to use some tricks. Right now master is too far ahead because it has the role of next+1. We would have to move the patches to next (simple merge) and reset master to the release.
Further .0 releases would just be made on next and then master merges the release. Additional fixes in master can "simply" be merged in next.
Let the fight begin...
Ok, as I stated before, this mac80211 approach is fine with me. I like the idea of a direct flow that goes upstream without a waiting time in our pre-incubation repository.
However, I'd like to hear why we have the current strategy from the people who decided on it (Sven, maybe you were involved in the initial decision too?), because if it is like this, I think there is an advantage somewhere. So before losing this advantage just because "now" we think way B is better, I'd rather prefer to hear all the other people's opinions.
The advantage I see is that, if we make a mistake and we send a fix, we can still merge the original patch and the fix together before sending it to David. I think this gives us a non-negligible margin for error. But maybe we became good enough to get rid of this margin? :)
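That margin can also be kept per-patch with git itself: record the fix with --fixup and fold it into the original commit before anything goes out in a pull request. A throwaway sketch (file contents and commit messages are made up):

```shell
# Throwaway-repo sketch: a follow-up fix is recorded with --fixup and
# squashed into the original patch before it is sent upstream.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email sketch@example.org
git config user.name "Sketch"

echo 'feature' > feature.c
git add feature.c
git commit -q -m "batman-adv: add feature"

echo 'feature, fixed' > feature.c
git add feature.c
# Mark the fix as belonging to the original commit...
git commit -q --fixup ':/add feature'
# ...and squash the two into one clean patch, non-interactively.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash --root
git log --oneline
```

After the rebase, the history contains a single commit carrying the fixed version of the change, which is what a reviewer upstream would see.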
Cheers,
On Wednesday 05 December 2012 00:01:26 Antonio Quartulli wrote: [...]
The big question is: Is this extra waiting time for new feature patches really useful in the current situation and does batman-adv benefit from it in a special/irreplaceable way?
the added value I see in having this extra time is that we can spend some days testing it, running it on nodes and trying to find bugs. But, to be honest, I don't know if we have people out there really doing this on a daily/weekly basis.
Yes, but the extra time itself isn't the added value. Testing and debugging would be an important part.. but is it done in a way that pays for the extra pain when David is doing his monkey dance?
My personal observation is rather negative. Most problems are detected by people using the release and by people checking the patches (on the b.a.t.m.a.n. mailing list and on netdev... this includes "I am really really angry about what you did" David). But of course, my impressions could be slightly biased. I think Antonio can give more insights about it because he has to squash all the stuff together.
well, if I got you correctly, the problem here is "only" about time: with the current method we "wait" more time before sending patches upstream, while with your suggestion we would send them immediately (well, we could still wait some days in order to let the build tests do their optimum job and double-check the code - because yes, I re-read each and every patch before sending them to David.. strange, eh? :D).
So, you would like a mac80211 like process, I think.
I am not 100% sure about the mac80211 process, but it at least looks like it. But "let them wait for some days" is still a good idea.
Here just a calculation for a new feature patch:
- Send to b.a.t.m.a.n@lists.open-mesh.org at the beginning of the first month
- accepted by Marek in the middle of the first month (had to be resent or so)
- will be part of next in the middle of the third month
- will be sent to David in the middle of the third month (uh, fast Antonio)
- will be sent to Linus by David at the beginning of the fifth month
- will be released as a separate kernel module in the middle of the fifth month
- will be released in a kernel at the beginning of the seventh month
you meant weeks here? It sounds like a birth :)
No, months, like in "30 Days of Night". Let's use an example. Please search for "Add the backbone gateway list to debugfs" on the mailing list. It should be from 2012-06-14. It will be included in the kernel released in ~1 week. My guess is 2012-12-10 (so it missed the "beginning of the seventh month" by only some days). It was sent by you to David on 2012-08-23, so that was the beginning/middle of the third month.
It was just a random patch. I think I would be able to find better matching ones, but it should be understandable that my guess is not so extremely wrong.
Antonio already showed that he can handle ad-hoc pull requests very well. Therefore, the idea would be to remove the two-month extra waiting time.
I hope you still meant weeks here..or am I missing something?
No, months, like in "twelve of them make a year".
Here are just two random branch names:
- next - new patches are accepted here - so it is something like net-next for batman-adv. Antonio will gather some patches and then send them to David. David's reaction will come heavy and hard... but the effort for a reaction is reduced.
- master - bugfix gathering place. So it is like net for batman-adv. Patches will end up really fast in Linus' tree.
To be honest, I thought the same some time ago, but I think there was a good reason to have it like it is now. I exposed my thoughts to Marek (? I can't really remember) and he explained to me why it was nice to have a pre-incubation.
And my anti-thesis is "we don't gain much in this incubation phase, but create more problems later on". It worked fine in the past because we didn't know what to send, picked special patches and moved stuff around (I still have bad dreams about the gateway stuff).
So, this anti-thesis should give us the opportunity to reassess the current strategy to work with patches.
The advantage I see is that, if we make a mistake and we send a fix, we can still merge the original patch and the fix together before sending it to David. I think this gives us a non-negligible margin for error. But maybe we became good enough to get rid of this margin? :)
Definitely, but I think you are the right person to answer whether this incubation time helped us, or whether it only accounts for a tiny amount of fixes and the rest of the fixes came from other things.
We have to keep in mind that we are not living in our own isolated universe, but that there are also other people involved who use the other way of getting stuff into the kernel. And from time to time they collide with our stuff in master. And don't forget our hellkeeper David, the incontrovertible opinion himself.
Kind regards, Sven
Hi Sven
I've been working on Marvell SoC chips for the last few months, mostly those used in NAS devices. Maybe a few comments from a different corner of the kernel may be useful. But this corner is also quite different, so not everything I say below may be relevant for BATMAN. We are about the same size in terms of number of active developers, but our methodology is quite different.
It seems like the biggest problem is the late feedback from David S. Miller, et al, about patches. Getting this feedback earlier in the life of a patchset would ease people's lives.
For Marvell work, we post all our patches to the linux arm kernel list, where the ARM maintainers will see the patches. All patches go there, in all stages of their life, from early RFCs, to patches we want the upstream maintainers to take in a following pull request. Thus there is the possibility to get early feedback from the upstream maintainers and avoid most last minutes surprises.
So maybe it would be good to stop using the BATMAN mailing list for patches and instead use netdev. Or at least CC: netdev.
We try, but often fail, to send pull requests early. The arm-soc maintainers will accept pull requests at any time and queue them up in their for-next tree. Sending a pull request during -rc2 or -rc3 is not a problem, and if the maintainer decides to reject it, you have a few weeks before -rc6/-rc7 and the impending opening of the merge window.
We don't have anything like a master tree. Each developer works on his own clone of Linus's tree, generally on the last -rc tag. We have a nominal lead maintainer, who builds trees for pull requests in the direction of the arm-soc maintainers. This maintainer takes either patches from the linux-arm-kernel mailing list or pull requests from other Marvell developers, shapes up the tree in the form the arm-soc maintainers like, and sends pull requests when ready. The tree is then thrown away.
The BATMAN master tree, if I understand correctly, is to allow releases for older kernels? Maybe turn the process around? When Linus makes a release, pull the mainline code into a branch, add in the compat stuff and release a tarball from that? If any stable patch touches the batman code, again, import it and make a new tarball.
We also make use of the -rc tags and the roughly 7 weeks before the final release for testing. Testing of these releases probably has more value than testing of BATMAN master or for-next, since any other changes in the stack are included and issues caused by changes outside of BATMAN will be found. I've never had problems getting fixes into -rc releases.
Just some ideas....
Andrew
Hi Andrew,
I have a few comments on what you wrote:
On Wed, Dec 05, 2012 at 11:35:27AM +0100, Andrew Lunn wrote:
Hi Sven
[...]
We don't have anything like a master tree.
Yeah, I think this is exactly Sven's point. In the end, the whole email from Sven can be condensed into the suggestion to think about this direction, imho.
The BATMAN master tree, if I understand correctly, is to allow releases for older kernels? Maybe turn the process around? When Linus makes a release, pull the mainline code into a branch, add in the compat stuff and release a tarball from that? If any stable patch touches the batman code, again, import it and make a new tarball.
The master branch is used to create the out-of-the-kernel-tree package that we release every now and then (actually, together with each kernel release).
I think the major advantage here is that, whenever a person sends a patch which is not going to work on older kernels, he must also send a patch for compat.h/c. This would be impossible if the patch were directly aimed at the kernel tree; or, to say it in other words, we would have to run behind the committer each and every time to ask him to send another patch for compat.
This is my feeling about the master branch/repository. I also proposed doing it the other way around, as you are suggesting, in order to also simplify the process of creating pull requests (and reduce the probability of making mistakes). But, in the end, we are simply moving complexity from one corner to the other.
However, I think this particular topic (patch against package vs kernel tree) is orthogonal to what Sven is proposing.
Thank you for throwing more ideas on the plate :-)
Cheers,
I think the major advantage here is that, whenever a person sends a patch which is not going to work on older kernels, he must also send a patch for compat.h/c.
Hi Antonio
This is where you are fighting against the kernel process. The kernel process does not care about older kernels, except for patches flowing into stable as bug fixes. So anybody from outside of BATMAN will not supply such compat.[ch] changes.
Maybe it's also time to evaluate the value of older kernels with the newest BATMAN? Is the pain worth the gain?
Andrew
On Wed, Dec 05, 2012 at 12:24:35PM +0100, Andrew Lunn wrote:
I think the major advantage here is that, whenever a person sends a patch which is not going to work on older kernels, he must also send a patch for compat.h/c.
Hi Antonio
This is where you are fighting against the kernel process. The kernel process does not care about older kernels, except for patches flowing into stable as bug fixes. So anybody from outside of BATMAN will not supply such compat.[ch] changes.
Maybe it's also time to evaluate the value of older kernels with the newest BATMAN? Is the pain worth the gain?
Well, at least we have to keep compatibility with some releases: most of our users are OpenWrt users, which means that we have to at least provide a package (with all the bugfixes) that runs on some of the OpenWrt distributions. Maybe, to alleviate this, we can think about dropping some kernel versions and supporting only the most recent.
However, as stated before, this is more or less orthogonal to what Sven is proposing and what we want to change now, but it is still a valuable suggestion that we may want to keep for the next step.
Cheers,
Andrew
Hi,
thanks a lot for this mail. I'll add some extra comments without any judgements. Your mail mostly talks about other things which are orthogonal to the "anti-thesis".
On Wednesday 05 December 2012 11:35:27 Andrew Lunn wrote:
I've been working on Marvell SoC chips for the last few months, mostly those used in NAS devices. Maybe a few comments from a different corner of the kernel may be useful. But this corner is also quite different, so not everything I say below may be relevant for BATMAN. We are about the same size in terms of number of active developers, but our methodology is quite different.
The biggest difference is the "let's install a whole kernel to test this change" methodology ;)
Usually (please correct me) batman-adv is developed outside the kernel because it is easier to test stuff, and it has worked till now. None of us wants to port the latest OpenWrt to the -rc kernel to test stuff ;)
It seems like the biggest problem is the late feedback from David S. Miller, et al, about patches. Getting this feedback earlier in the life of a patchset would ease people's lives.
Partly; David switches horses relatively often. So early feedback is not as valuable as it sounds.
For Marvell work, we post all our patches to the linux arm kernel list, where the ARM maintainers will see the patches. All patches go there, in all stages of their life, from early RFCs, to patches we want the upstream maintainers to take in a following pull request. Thus there is the possibility to get early feedback from the upstream maintainers and avoid most last minutes surprises.
So maybe it would be good to stop using BATMAN mailing list for patches and instead use netdev. Or at least CC: netdev.
I tried it with the netdev_alloc/standard interface patchset, but I only got a surprised "where is the pull request?" reply.
We try, but often fail, to send pull requests early. The arm-soc maintainers will accept pull requests at any time and queue them up in their for-next tree. Sending a pull request during -rc2 or -rc3 is not a problem, and if the maintainer decides to reject it, you have a few weeks before -rc6/-rc7 and the impending opening of the merge window.
We also don't have this problem of getting patches accepted in -rc2 and -rc3. But it is funny that David's net-next/net tree hasn't caught the fresh air of the last -rc1.
[...]
The BATMAN master tree, if I understand correctly, is to allow releases for older kernels? Maybe turn the process around? When Linus makes a release, pull the mainline code into a branch, add in the compat stuff and release a tarball from that? If any stable patch touches the batman code, again, import it and make a new tarball.
So the compat-driver style. I played around with the idea for a while but never came up with a working solution without a lot of extra hassle.
Kind regards, Sven
The biggest difference is the "let's install a whole kernel to test this change" methodology ;)

Yes, I generally do that: test a whole kernel, not a module. But...

Usually (please correct me) batman-adv is developed outside the kernel because it is easier to test stuff, and it has worked till now. None of us wants to port the latest OpenWrt to the -rc kernel to test stuff ;)
Do you actually need to port to OpenWrt?
How I work is: build a kernel with everything I need built in - no modules - then tftpboot the kernel and use the rootfs from the disk. Why not do the same with OpenWrt?
It seems like the biggest problem is the late feedback from David S. Miller, et al, about patches. Getting this feedback earlier in the life of a patchset would ease people's lives.

Partly; David switches horses relatively often. So early feedback is not as valuable as it sounds.
O.K. I've not paid enough attention to his comments to know this.
For Marvell work, we post all our patches to the linux arm kernel list, where the ARM maintainers will see the patches. All patches go there, in all stages of their life, from early RFCs, to patches we want the upstream maintainers to take in a following pull request. Thus there is the possibility to get early feedback from the upstream maintainers and avoid most last minutes surprises.
So maybe it would be good to stop using BATMAN mailing list for patches and instead use netdev. Or at least CC: netdev.
I tried it with the netdev_alloc/standard interface patchset, but I only got a surprised "where is the pull request?" reply.
Humm, interesting. Is that maybe because BATMAN only ever sends pull requests to the list?
Andrew
On Wed, Dec 05, 2012 at 12:39:27PM +0100, Andrew Lunn wrote:
The biggest difference is the "let's install a whole kernel to test this change" methodology ;)

Yes, I generally do that: test a whole kernel, not a module. But...

Usually (please correct me) batman-adv is developed outside the kernel because it is easier to test stuff, and it has worked till now. None of us wants to port the latest OpenWrt to the -rc kernel to test stuff ;)
Do you actually need to port to OpenWrt?
How I work is: build a kernel with everything I need built in - no modules - then tftpboot the kernel and use the rootfs from the disk. Why not do the same with OpenWrt?

It seems like the biggest problem is the late feedback from David S. Miller, et al, about patches. Getting this feedback earlier in the life of a patchset would ease people's lives.

Partly; David switches horses relatively often. So early feedback is not as valuable as it sounds.
O.K. I've not paid enough attention to his comments to know this.
For Marvell work, we post all our patches to the linux arm kernel list, where the ARM maintainers will see the patches. All patches go there, in all stages of their life, from early RFCs, to patches we want the upstream maintainers to take in a following pull request. Thus there is the possibility to get early feedback from the upstream maintainers and avoid most last minutes surprises.
So maybe it would be good to stop using BATMAN mailing list for patches and instead use netdev. Or at least CC: netdev.
I tried it with the netdev_alloc/standard interface patchset, but I only got a surprised "where is the pull request?" reply.
Humm, interesting. Is that maybe because BATMAN only ever sends pull requests to the list?
I think the main reason for this reply is that David was directly CCed. I think we got no feedback because we are not used to sending patches over netdev "asking" for feedback.
However, if you look at all the other network "subtrees", everybody just sends pull requests. I think patches against "subtrees" are rarely sent to netdev for feedback.
Cheers,
Hello Andrew,
just a few comments:
On Wed, Dec 05, 2012 at 12:39:27PM +0100, Andrew Lunn wrote:
The biggest difference is the "let's install a whole kernel to test this change" methodology ;)

Yes, I generally do that: test a whole kernel, not a module. But...
Usually (please correct me) batman-adv is developed outside the kernel because it is easier to test stuff, and it has worked till now. None of us wants to port the latest OpenWrt to the -rc kernel to test stuff ;)
Do you actually need to port to OpenWrt?
Most people I know using batman-adv use either OpenWRT-based routers or "normal" (or stripped-down versions of) distributions like Debian. So yes, this is probably the biggest user base (from my point of view, at least; no surveys done yet).
How I work is: build a kernel with everything I need built in - no modules - then tftpboot the kernel and use the rootfs from the disk. Why not do the same with OpenWrt?
OpenWRT carries many other non-batman-related patches (platform patches, wifi patches, ...), and upgrading the kernel is a major hassle - you'd need to adapt/port lots of these patches, and some of them are not as nice as we are used to from our Linux git repos. :)
We try to keep repositories up to date for our OpenWRT users, but this shouldn't make us or them update the kernel all the time. BTW, OpenWRT also uses compat-wireless for wifi with custom patches; this is how they stay at the "bleeding edge". If we changed to this style, we would probably have to follow the same pattern.
Cheers, Simon