Just to throw in a credible source.
https://static.lwn.net/images/pdf/LDD3/ch08.pdf:
"The main differences in passing from scull to scullc are a slight speed improvement and better memory use. Since quanta are allocated from a pool of memory fragments of exactly the right size, their placement in memory is as dense as possible, as opposed to scull quanta, which bring in an unpredictable memory fragmentation."
The paragraph on SLAB_HWCACHE_ALIGN also sounds interesting. (It might be worth omitting this flag for the sake of our beloved but perpetually memory-starved embedded systems. Most TT entries are rarely read anyway.)
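To put a rough number on the memory cost of that flag, here is a small userspace sketch, not kernel code. The helper names, the 64-byte cache line and the 8-byte minimum alignment are all assumptions for illustration. The halving rule for small objects reflects my reading of how the slab allocator picks its alignment, so treat it as an assumption too and check it against the target kernel.

```c
#include <stddef.h>

/* Round size up to the next multiple of align (align: nonzero power of two). */
static size_t round_up_pow2(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of the per-object footprint in a slab cache.
 * Assumption: when cache-line alignment is requested, the alignment starts
 * at the cache line size and is halved while the object still fits into
 * half of it. Without the flag only min_align applies.
 */
static size_t object_footprint(size_t size, size_t line, size_t min_align,
                               int hwcache_align)
{
    size_t align = min_align;

    if (hwcache_align) {
        size_t ralign = line;

        while (ralign > 1 && size <= ralign / 2)
            ralign /= 2;
        if (ralign > align)
            align = ralign;
    }
    return round_up_pow2(size, align);
}
```

Under these assumptions a made-up 72-byte entry occupies 128 bytes with the flag and 72 bytes without it, while a 24-byte entry occupies 32 versus 24 bytes. How much of that matters depends on the real entry size, which I have not measured.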
Should we just go for kmem_cache_alloc() or should someone ask on netdev@ first whether that chapter is still valid for current kernels?