[B.A.T.M.A.N.] [RFC] batman-adv: add generic netlink query API to replace debugfs files

Matthias Schiffer mschiffer at universe-factory.net
Fri Aug 7 19:37:23 CEST 2015

On 08/07/2015 06:16 PM, Linus Lüssing wrote:
> Hi Matthias,
> here at the Wireless BattleMesh we finally had the chance to get
> some initial discussions on your patch going. For the start a few
> comprehension questions came up:
> On Wed, Jun 24, 2015 at 08:34:28PM +0200, Matthias Schiffer wrote:
>> * As batman-adv uses single_open, the whole content of the originators/
>>   transglobal files must fit into a single buffer; in large batman-adv
>>   networks this often fails (as an order-5 allocation or even more would be
>>   necessary)
>> * When originators or transglobal aren't just used for debugging, they are
>>   first converted to text, and then parsed again in userspace by tools like
>>   alfred/batadv-vis. Sending MAC address lists from the kernel to userspace
>>   as text makes the buffer size issue even worse.
> These two points can be addressed through debugfs too, for
> instance using sequential debugfs writes, right? (In fact IIRC you
> had started with that approach until you got the feedback from
> Gregk, right?)
I've had a look at the different functions the seq_file API provides,
but I didn't write any code.

> Can you elaborate a little more on the "order-5 allocation"?
> What amount of free RAM did the machines have where we observed
> Out-of-Memory kernel panics upon debugfs access? Can you give some
> numbers / calculations why we ended up with several Megabytes
> memory allocations on debugfs access?
An order-5 allocation is 2^5 = 32 pages of memory, i.e. 128K of RAM. As
these are allocated by kmalloc, these 32 pages are allocated as one
piece of physical RAM. As the RAM gets fragmented more and more the
longer a system is running, 32 pages in one piece can be hard to find
even when there are still tens of MB of free RAM.
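
To make the arithmetic concrete, here is a small sketch. It assumes 4 KiB pages (which is what the 128K figure above implies); the helper names are made up for illustration and are not kernel functions.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def bytes_for_order(order, page_size=PAGE_SIZE):
    """Size of one physically contiguous block of the given order."""
    return (1 << order) * page_size


def order_for(size_bytes, page_size=PAGE_SIZE):
    """Smallest order whose block of 2**order pages holds size_bytes."""
    pages = max(1, -(-size_bytes // page_size))  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


if __name__ == "__main__":
    print(bytes_for_order(5))        # 32 pages of 4 KiB each
    print(order_for(100 * 1024))     # a 100 KiB table needs this order
```

Anything only slightly above 128K already needs an order-6 block, i.e. 64 contiguous pages.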

The fragmentation issue becomes a bit worse because the seq_file code
starts with a single page and loops until the buffer is big enough to
fit the whole output, always freeing the buffer in each loop iteration
and allocating a new buffer twice the size.
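
The retry behaviour described above can be modelled roughly like this. It is a simplified userspace model, not the actual seq_file code: it assumes 4 KiB pages and ignores the exact overflow condition the kernel uses.

```python
def seq_buffer_attempts(output_bytes, page_size=4096):
    """Return the buffer sizes tried until the whole output fits.

    Simplified model: start with one page; on overflow the buffer is
    freed and a new one of twice the size is allocated.
    """
    size = page_size
    attempts = [size]
    while size < output_bytes:
        size *= 2  # previous buffer is freed, a bigger one is allocated
        attempts.append(size)
    return attempts


if __name__ == "__main__":
    # Hypothetical 100000-byte originator table
    for size in seq_buffer_attempts(100_000):
        print(size // 4096, "pages")
```

In this model a table of about 100000 bytes takes six allocations, the last of them being the 32-page one.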

> The debugfs race conditions Gregk and you talked about are on
> adding/removing debugfs files, right? Are there any known race
> conditions on simple reads/writes in the absence of removing
> debugfs files?
The race conditions only occur when files are removed, but that alone is
bad enough - I'd really like to avoid enabling debugfs at all on
critical systems.

> Since you've had a look at both the netlink and sequential debugfs
> approach already, can you give some estimation about the
> complexity or rough number of lines of code to change for the
> sequential debugfs approach?
I guess that should be possible in 100~200 lines of code. Most of it
would be similar to the code I've implemented for the netlink API:
storing counters between the callback runs to keep track of the current
position in the data structures. Of course, it will have the same
drawbacks: when originator/tt entries are added or removed between the
calls, entries will be duplicated or missing from the output.
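
A toy illustration of that drawback, using a plain list in place of the originator/tt hash tables and an index as the stored counter. All names here are hypothetical; this is neither the netlink patch nor batman-adv code.

```python
def dump_chunk(table, cursor, limit):
    """Return up to `limit` entries starting at `cursor`, plus the new cursor."""
    chunk = table[cursor:cursor + limit]
    return chunk, cursor + len(chunk)


def dump_with_change(table, change, limit=2):
    """Dump the table in chunks, applying `change` after the first chunk.

    `change` stands in for an entry being added or removed while the
    reader is back in userspace between two callback runs.
    """
    out, cursor = dump_chunk(table, 0, limit)
    change(table)
    while True:
        chunk, cursor = dump_chunk(table, cursor, limit)
        if not chunk:
            return out
        out.extend(chunk)


if __name__ == "__main__":
    # Entry removed between calls: "C" never shows up in the output.
    print(dump_with_change(list("ABCD"), lambda t: t.remove("A")))
    # Entry added between calls: "B" shows up twice.
    print(dump_with_change(list("ABCD"), lambda t: t.insert(0, "X")))
```

The counter only remembers a position, not which entries were already sent, so any change before that position shifts what the next call sees.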

The main reason why I didn't consider fixing the debugfs code first was
that returning to userspace in the middle of the read will make the race
conditions much easier to hit: at the moment, the race can only occur
when the file is removed between open() and read(), which is usually
very short. The time between several read() calls can be much bigger,
especially when the files are read and parsed line by line.

> One thought that popped up here was, whether it'd make sense to
> first "fix" the debugfs approach to the extent possible with a
> couple of lines instead of 800+ lines to get rid of the issues
> we frequently observe. And then merge a complete fix but bigger
> patchset implementing netlink support with a more thorough review
> and discussions on what we'd need for its API now and upcoming
> features.
> Cheers, Linus
