This patchset increases the DAT DHT timeout to reduce the number of broadcast ARP Replies.
To increase the timeout only for DAT DHT entries added via DHT-PUT, but not for any other entry in the DAT cache, the DAT cache and DAT DHT concepts are split into two separate hash tables (PATCH 2/3).
PATCH 3/3 then increases the timeout for DAT DHT entries from 5 to 30 minutes.
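
To illustrate the idea of the split, here is a minimal userspace sketch, not the actual batman-adv code. All names in it (struct dat_table, dat_table_update() and so on) are hypothetical, the fixed-size table with a full scan stands in for the real hash tables, and time is a free-running seconds counter instead of jiffies. Only the 5 and 30 minute values come from this cover letter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Timeouts in seconds: 5 min for the DAT cache, 30 min for the DAT DHT. */
#define DAT_CACHE_TIMEOUT_S (5 * 60)
#define DAT_DHT_TIMEOUT_S (30 * 60)
#define DAT_TABLE_SIZE 64

/* Hypothetical entry layout, not the batman-adv struct. */
struct dat_entry {
	uint32_t ip;
	uint8_t mac[6];
	uint32_t last_update; /* free-running seconds counter */
	bool used;
};

/* One table per concept, each carrying its own purge timeout. */
struct dat_table {
	struct dat_entry slots[DAT_TABLE_SIZE];
	uint32_t timeout;
};

static void dat_table_init(struct dat_table *t, uint32_t timeout)
{
	memset(t, 0, sizeof(*t));
	t->timeout = timeout;
}

/* Return the slot holding ip, else a free slot, else NULL if full. */
static struct dat_entry *dat_table_find(struct dat_table *t, uint32_t ip)
{
	struct dat_entry *free_slot = NULL;

	for (size_t i = 0; i < DAT_TABLE_SIZE; i++) {
		struct dat_entry *e = &t->slots[(ip + i) % DAT_TABLE_SIZE];

		if (e->used && e->ip == ip)
			return e;
		if (!e->used && !free_slot)
			free_slot = e;
	}
	return free_slot;
}

/* Add or refresh an entry; returns false if the table is full. */
static bool dat_table_update(struct dat_table *t, uint32_t ip,
			     const uint8_t mac[6], uint32_t now)
{
	struct dat_entry *e = dat_table_find(t, ip);

	if (!e)
		return false;
	e->ip = ip;
	memcpy(e->mac, mac, 6);
	e->last_update = now;
	e->used = true;
	return true;
}

/* True if ip is present and not older than this table's timeout. */
static bool dat_table_lookup(struct dat_table *t, uint32_t ip, uint32_t now)
{
	struct dat_entry *e = dat_table_find(t, ip);

	/* Unsigned subtraction stays correct across counter wraparound. */
	return e && e->used && (uint32_t)(now - e->last_update) <= t->timeout;
}

/* Drop expired entries; returns how many were removed. */
static unsigned int dat_table_purge(struct dat_table *t, uint32_t now)
{
	unsigned int purged = 0;

	for (size_t i = 0; i < DAT_TABLE_SIZE; i++) {
		struct dat_entry *e = &t->slots[i];

		if (e->used && (uint32_t)(now - e->last_update) > t->timeout) {
			e->used = false;
			purged++;
		}
	}
	return purged;
}
```

In this sketch the same IP/MAC pair can be stored in a table initialized with DAT_CACHE_TIMEOUT_S and in one initialized with DAT_DHT_TIMEOUT_S. After ten minutes without a refresh, the first table purges its copy while the second still answers lookups, so no per-entry second timestamp is needed.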
The motivation for this patchset is based on the observations made here: https://www.open-mesh.org/projects/batman-adv/wiki/DAT_DHCP_Snooping
In tests this year at Freifunk Lübeck with ~180 mesh nodes and Gluon, this reduced the ARP broadcast overhead, measured over 7 days, as follows:

- Total:
    6677.66 bits/s -> 677.26 bits/s => -89.86%
    11.92 pkts/s -> 1.21 pkts/s => -89.85%
- from gateways:
    5618.02 bits/s -> 212.28 bits/s => -96.22%
    10.03 pkts/s -> 0.38 pkts/s => -96.21%
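
For reference, the percentages above are plain relative reductions. A small sketch of the arithmetic, where reduction_percent() is a hypothetical helper written only for this illustration:

```c
/*
 * Relative reduction from `before` to `after`, in percent.
 * Hypothetical helper, only to show how the figures are derived.
 */
static double reduction_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Calling it with the total bit rates, reduction_percent(6677.66, 677.26), gives roughly 89.86, and with the gateway bit rates, reduction_percent(5618.02, 212.28), roughly 96.22.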
Also see the graphs and a few more test details here:
https://www.open-mesh.org/projects/batman-adv/wiki/DAT_DHCP_Snooping#Result-...
These patches (v5) have been applied in this mesh network without issues for 3 months now.
Regards,
Linus
---
Changelog v9:
- PATCH 1/3:
  - fixed typo in a comment: ENOENT -> ENONET

Changelog v8:
- PATCH 1/3:
  - fixing / cleaning up includes
  - fixing function kernel doc titles
  - fixing bugs introduced in v7 in the error handling of
    batadv_orig_dump() and batadv_neigh_dump(), using a goto pattern
    with more explicit labels

Changelog v7:
- adding PATCH 1/3 to add the batadv_netlink_get_softif() wrapper to
  reduce the amount of duplicate code, both in the current code base
  and for the next PATCH 2/3

Changelog v6:
- removed renaming+deprecation of BATADV_P_DAT_CACHE_REPLY in PATCH 1/2
- small commit message rewording in PATCH 1/2

Changelog v5:
- rebased to current main branch
  -> removed now obsolete debugfs code

Changelog v4:
- rebased to:
  acfc9a214d01695 ("batman-adv: genetlink: make policy common to family")

Changelog v3:
formerly: "batman-adv: Increase purge timeout on DAT DHT candidates"
(https://patchwork.open-mesh.org/patch/17728/)
- fixed the potential jiffies overflow and jiffies initialization
  issues by replacing the last_dht_update timeout variable with a
  split of DAT cache and DAT DHT into two separate hash tables
  -> instead of maintaining two timeouts in one DAT entry, two DAT
     entries are created and maintained in their respective DAT cache
     and DAT DHT hash tables

Changelog v2:
formerly: "batman-adv: Increase DHCP snooped DAT entry purge timeout in DHT"
(https://patchwork.open-mesh.org/patch/17364/)
- removed the extended timeouts flag in the DHT-PUT messages
  introduced in v1 again
- removed DHCP dependency