I have been battling a weird problem recently. It occurs on two separate networks, one with 2 nodes and the other with 3 nodes. The network is fine, and then all of a sudden the clients cannot reach the Internet. I have observed this on both OpenWrt 19.07 and 18.07. A reboot of the gateway corrects the problem. This is what I see:
1. The gateway is up and running and able to reach the Internet.
2. batctl o shows the neighbor(s).
3. batctl ping [MAC] fails.
root@Main-GW:~# batctl o
[B.A.T.M.A.N. adv openwrt-2018.1-5, MainIF/MAC: mesh0/e8:5b:b7:00:10:73 (bat0/22:55:4d:3e:5f:8f BATMAN_IV)]
Originator last-seen (#/255) Nexthop [outgoingIF]
* e8:5b:b7:00:10:6b 0.880s (255) e8:5b:b7:00:10:6b [ mesh0]
root@Main-GW:~# batctl ping e8:5b:b7:00:10:6b
PING e8:5b:b7:00:10:6b (e8:5b:b7:00:10:6b) 20(48) bytes of data
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
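The observations above can be checked mechanically. Originator entries in batctl o are refreshed by broadcast OGMs, whereas batctl ping needs unicast delivery. A neighbor that stays listed while every ping times out therefore suggests that unicast traffic is being lost.

What follows is a minimal Python sketch. It assumes the output format shown above: the BATMAN_IV originator table and the "timed out" lines from batctl ping. It also assumes Python is available wherever you run it; on a small OpenWrt image you may have to capture the output there and parse it elsewhere. The helper names (parse_originators, count_timeouts, symptom_present, check_neighbor) are hypothetical and are not part of batctl.

```python
import re
import subprocess

# One originator row of `batctl o` (BATMAN_IV layout, as in the output above):
#   * e8:5b:b7:00:10:6b 0.880s (255) e8:5b:b7:00:10:6b [ mesh0]
# Groups: originator MAC, last-seen seconds, TQ value (0-255).
ORIG_ROW = re.compile(
    r"^[ \t]*\*?[ \t]*([0-9a-f]{2}(?::[0-9a-f]{2}){5})[ \t]+([\d.]+)s[ \t]+\([ \t]*(\d+)\)",
    re.IGNORECASE | re.MULTILINE,
)
# One lost probe in `batctl ping` output, as in the output above.
TIMEOUT_LINE = re.compile(r"^Reply from host \S+ timed out", re.MULTILINE)


def parse_originators(text):
    """Map each originator MAC found in `batctl o` output to its TQ value."""
    return {m.group(1).lower(): int(m.group(3)) for m in ORIG_ROW.finditer(text)}


def count_timeouts(text):
    """Count the probes reported as timed out in `batctl ping` output."""
    return len(TIMEOUT_LINE.findall(text))


def symptom_present(orig_text, ping_text, mac, sent):
    """True when `mac` is still listed as an originator (so OGMs arrive)
    but every one of the `sent` unicast batctl pings timed out."""
    listed = mac.lower() in parse_originators(orig_text)
    return listed and count_timeouts(ping_text) == sent


def check_neighbor(mac, count=4):
    """Run the two commands from the report on this node and evaluate them."""
    orig = subprocess.run(["batctl", "o"], capture_output=True, text=True).stdout
    ping = subprocess.run(["batctl", "ping", "-c", str(count), mac],
                          capture_output=True, text=True).stdout
    return symptom_present(orig, ping, mac, count)
```

Fed the captured outputs above, symptom_present(orig, ping, "e8:5b:b7:00:10:6b", 4) returns True, which matches the reported failure.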
On Monday, 25 May 2020 10:35:12 CEST smartwires@gmail.com wrote:
I have been battling a weird problem recently. It occurs on two separate networks, one with 2 nodes and the other with 3 nodes. The network is fine, and then all of a sudden the clients cannot reach the Internet. I have observed this on both OpenWrt 19.07 and 18.07. A reboot of the gateway corrects the problem.
[...]
root@Main-GW:~# batctl ping e8:5b:b7:00:10:6b
PING e8:5b:b7:00:10:6b (e8:5b:b7:00:10:6b) 20(48) bytes of data
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
My first guess is that the underlying interface (mesh0) stopped transporting unicast frames. Did you check this by setting an IP address on mesh0 and pinging between these devices with the normal IPv4 ping?
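Sven's test separates two layers:

- batctl ping exercises batman-adv unicast.
- A plain IPv4 ping between addresses set directly on mesh0 exercises only the underlying interface and driver.

Below is a minimal Python sketch of that reasoning. It assumes the iproute2-style `ip` command and a `ping` that accepts `-c` and `-I`. The addresses, the /24 prefix and the helper names (mesh0_ipv4_test_commands, run_mesh0_ping, diagnose) are hypothetical. Each node needs its own address in a subnet that is not used anywhere else on the network.

```python
import subprocess


def mesh0_ipv4_test_commands(local_ip, peer_ip, dev="mesh0", prefix=24, count=4):
    """Hypothetical helper: argv lists for the suggested test.
    Step 1 puts an IPv4 address directly on the batman-adv hard interface;
    step 2 pings the peer's address out of that interface, bypassing bat0."""
    return [
        ["ip", "addr", "add", f"{local_ip}/{prefix}", "dev", dev],
        ["ping", "-c", str(count), "-I", dev, peer_ip],
    ]


def run_mesh0_ping(local_ip, peer_ip, dev="mesh0"):
    """Run both steps on this node; exit status 0 normally means >= 1 reply."""
    add_addr, ping = mesh0_ipv4_test_commands(local_ip, peer_ip, dev)
    subprocess.run(add_addr)  # fails harmlessly if the address is already set
    return subprocess.run(ping).returncode == 0


def diagnose(batctl_ping_ok, mesh0_ping_ok):
    """Hypothetical helper mapping the two test results to the suspect layer."""
    if batctl_ping_ok:
        return "no unicast fault seen"
    if mesh0_ping_ok:
        return "batman-adv layer"  # the raw link still carries unicast
    return "underlying interface/driver"  # mesh0 itself drops unicast
```

If diagnose(False, False) is the result, the fault lies below batman-adv. That is where Sven's follow-up in this thread points.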
Kind regards, Sven
When the problem occurs I have no access to the non-gateway node; when it is working I can ping it.
On Thursday, 28 May 2020 03:05:07 CEST smartwires@gmail.com wrote:
When the problem occurs I have no access to the non-gateway node; when it is working I can ping it.
You said before that the ping on the underlying device (mesh0) is not working when this problem is observed. I would therefore suggest contacting the developers of the driver for the underlying device to figure out why it is no longer able to transport unicast frames.
Kind regards, Sven
I have exactly the same problem with the same symptoms. I'm running a fresh build of OpenWRT trunk. The problem is not new. On some days it happens several times. On other days it doesn't happen at all.
I'm curious to know what your hardware(s) and driver(s) are, Smartwires. Mine is TPLink Archer [AC]7 v[245]. I'm running the QCA 988x driver on the 5GHz radio. My solution is the same as yours: reboot the gateway. It's a terrible solution, having only one advantage, which is that it (sort of) works.
I have seen Sven's remark about unicast packets. I'm not sanguine about getting Qualcomm to fix a driver for an older product. The Candela Technologies driver refuses to function on the DFS channels (100, 116, 132), which in my large, populous US residential environment work far, far better than channels 36 or 149.
All ideas welcome.
On 5/25/20 4:35 AM, smartwires@gmail.com wrote:
I have been battling a weird problem recently. It occurs on two separate networks, one with 2 nodes and the other with 3 nodes. The network is fine, and then all of a sudden the clients cannot reach the Internet. I have observed this on both OpenWrt 19.07 and 18.07. A reboot of the gateway corrects the problem.
- The gateway is up and running and able to reach the Internet.
- batctl o shows the neighbor(s).
- batctl ping [MAC] fails.
root@Main-GW:~# batctl o
[B.A.T.M.A.N. adv openwrt-2018.1-5, MainIF/MAC: mesh0/e8:5b:b7:00:10:73 (bat0/22:55:4d:3e:5f:8f BATMAN_IV)]
Originator last-seen (#/255) Nexthop [outgoingIF]
* e8:5b:b7:00:10:6b 0.880s (255) e8:5b:b7:00:10:6b [ mesh0]
root@Main-GW:~# batctl ping e8:5b:b7:00:10:6b
PING e8:5b:b7:00:10:6b (e8:5b:b7:00:10:6b) 20(48) bytes of data
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
Reply from host e8:5b:b7:00:10:6b timed out
On Thursday, 28 May 2020 21:03:20 CEST Steve Newcomb wrote:
I have seen Sven's remark about unicast packets. I'm not sanguine about getting Qualcomm to fix a driver for an older product.
I am slightly confused now about the mention of the candelatech driver.
Just to sync both of you up:
- Are you using ath10k-ct with the ath10k-firmware*-ct or are you using ath10k with the ath10k-firmware*?
- And are you using IBSS or 802.11s (meshpoint with mesh_fwding=0)? Is this encrypted or not encrypted?
The Candela Technologies driver refuses to function on the DFS channels (100, 116, 132), which in my large, populous US residential environment work far, far better than channels 36 or 149.
Was this reported to Ben Greear?
Kind regards, Sven
On 05/28/2020 12:19 PM, Sven Eckelmann wrote:
On Thursday, 28 May 2020 21:03:20 CEST Steve Newcomb wrote:
I have seen Sven's remark about unicast packets. I'm not sanguine about getting Qualcomm to fix a driver for an older product.
I am slightly confused now about the mention of the candelatech driver.
Just to sync both of you up:
- Are you using ath10k-ct with the ath10k-firmware*-ct or are you using ath10k with the ath10k-firmware*?
- And are you using IBSS or 802.11s (meshpoint with mesh_fwding=0)? Is this encrypted or not encrypted?
The Candela Technologies driver refuses to function on the DFS channels (100, 116, 132), which in my large, populous US residential environment work far, far better than channels 36 or 149.
Was this reported to Ben Greear?
Kind regards, Sven
If you are using my firmware, what chipset are you using?
Thanks, Ben
On 5/28/20 3:19 PM, Sven Eckelmann wrote:
On Thursday, 28 May 2020 21:03:20 CEST Steve Newcomb wrote:
I have seen Sven's remark about unicast packets. I'm not sanguine about getting Qualcomm to fix a driver for an older product.
I am slightly confused now about the mention of the candelatech driver.
Just to sync both of you up:
- Are you using ath10k-ct with the ath10k-firmware*-ct or are you using ath10k with the ath10k-firmware*?
- And are you using IBSS or 802.11s (meshpoint with mesh_fwding=0)? Is this encrypted or not encrypted?
Speaking only for myself:
CONFIG_PACKAGE_ath10k-firmware-qca988x=y
CONFIG_PACKAGE_kmod-ath10k=y
# CONFIG_PACKAGE_ath10k-firmware-qca988x-ct is not set
# CONFIG_PACKAGE_kmod-ath10k-ct is not set
option mesh_fwding '0'
option encryption 'psk2+ccmp'
The Candela Technologies driver refuses to function on the DFS channels (100, 116, 132), which in my large, populous US residential environment work far, far better than channels 36 or 149.
Was this reported to Ben Greear?
Not yet, no. I am planning to do that when I can get serious about testing the adhoc alternative. I tried it just long enough to discover that DFS didn't work (the log message was something like "forbidden"; I can't remember exactly what it said right now, but that was the sense of it), although I had specified country 'US' and the driver seemed to be aware of the corresponding hex code. No such log message appeared when the channel was 36 or 149. I said to myself: hmmm, at least the QCA driver *sort-of* works in my environment, and returned to it.
On 05/28/2020 01:59 PM, Steve Newcomb wrote:
On 5/28/20 3:19 PM, Sven Eckelmann wrote:
On Thursday, 28 May 2020 21:03:20 CEST Steve Newcomb wrote:
I have seen Sven's remark about unicast packets. I'm not sanguine about getting Qualcomm to fix a driver for an older product.
I am slightly confused now about the mention of the candelatech driver.
Just to sync both of you up:
- Are you using ath10k-ct with the ath10k-firmware*-ct or are you using ath10k with the ath10k-firmware*?
- And are you using IBSS or 802.11s (meshpoint with mesh_fwding=0)? Is this encrypted or not encrypted?
Speaking only for myself:
CONFIG_PACKAGE_ath10k-firmware-qca988x=y
CONFIG_PACKAGE_kmod-ath10k=y
# CONFIG_PACKAGE_ath10k-firmware-qca988x-ct is not set
# CONFIG_PACKAGE_kmod-ath10k-ct is not set
option mesh_fwding '0'
option encryption 'psk2+ccmp'
wave-1 ath10k-ct does not support mesh, and while it supports ADHOC, it sometimes has issues, especially when using encryption, and I have not had the interest to debug it so far.
wave-2 firmware supports mesh, and I think adhoc is stable as well. I have not done any serious testing on either mesh or adhoc, though.
I've tested DFS in STA/AP mode and that works fine on my driver/firmware, possibly due to us setting the regdom as a fwcfg option, I suppose.
Thanks, Ben
On 5/28/20 5:28 PM, Ben Greear wrote:
I've tested DFS in STA/AP mode and that works fine on my driver/firmware, possibly due to us setting the regdom as a fwcfg option, I suppose.
I wonder how I can set the regdom as a fwcfg option. I don't know the procedure to try that. Do I need to cross-compile the firmware myself, rather than using the OpenWRT package?
Never mind. I shouldn't have asked, because openwrt/dl/ath10k-ct-2020-03-25-3d173a47.tar.xz!ath10k-ct-2020-03-25-3d173a47/README.txt clearly states:
This is a copy of the drivers/net/wireless/ath/ath10k tree from the Candela-Technologies (CT) 4.7, 4.9, and 4.13 kernels.
This package may be useful for people trying to use CT ath10k firmware on LEDE/OpenWRT, or other custom-built kernels.
The ath10k driver has a lot of patches, most of which are to enable it to work more effectively with the ath10k CT firmware:
http://www.candelatech.com/ath10k.php
To compile with some help: ./build_me.sh
To compile manually:
  cd ath10k
  cp make_all make_all.mine
  chmod a+x make_all.mine
  # Edit make_all.mine to point to your compiled kernel
  # Copy ath/*.h files into ../
  # This header file stuff is not obvious, sorry..but it helps us compile
  # properly on LEDE/OpenWRT backports infrastructure.
  ./make_all.mine
For full kernel source that these drivers came from, see:
http://dmz2.candelatech.com/?p=linux-4.7.dev.y/.git;a=summary
git clone git://dmz2.candelatech.com/linux-4.7.dev.y
http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=summary
git clone git://dmz2.candelatech.com/linux-4.4.dev.y
Please send bug reports to: greearb@candelatech.com
On 6/1/20 9:41 PM, Steve Newcomb wrote:
On 5/28/20 5:28 PM, Ben Greear wrote:
I've tested DFS in STA/AP mode and that works fine on my driver/firmware, possibly due to us setting the regdom as a fwcfg option, I suppose.
I wonder how I can set the regdom as a fwcfg option. I don't know the procedure to try that. Do I need to cross-compile the firmware myself, rather than using the OpenWRT package?
Steve, I am also using an AP with a QCA9558 SoC, and am also using ath10k-firmware-qca988x. I have also considered using adhoc.
On 5/28/20 8:13 PM, smartwires@gmail.com wrote:
Steve, I am also using an AP with a QCA9558 SoC, and am also using ath10k-firmware-qca988x. I have also considered using adhoc.
I think I discovered something yesterday that explains everything, and it's very reproducible. The mesh mode in the QCA firmware works reliably in the lab and in the field, but only when there are 3 or fewer nodes. If I add one more node, the mesh will completely fail, either immediately or within a few hours. If the nodes are strung out in a daisy chain, failure is usually, but not always, delayed for a while, and the links break in a piecemeal fashion, one at a time. If the nodes are close enough to each other, total failure occurs quite quickly. I surmise that the 802.11s implementation in the QCA driver was not tested with more than 3 nodes, or perhaps it wasn't designed to support more than 3 nodes. Sigh.
Sven, I think this epiphany obviates the need for your test (which I still haven't figured out how to execute in the field), but I'll return to that effort if you think I should.
So in the end, unless I replace the hardware throughout the neighborhood with far more expensive hardware, I must find a way to use Ben's driver, or to have no mesh network with more than 3 nodes in it.
On 06/01/2020 07:05 PM, Steve Newcomb wrote:
On 5/28/20 8:13 PM, smartwires@gmail.com wrote:
Steve, I am also using an AP with a QCA9558 SoC, and am also using ath10k-firmware-qca988x. I have also considered using adhoc.
I think I discovered something yesterday that explains everything, and it's very reproducible. The mesh mode in the QCA firmware works reliably in the lab and in the field, but only when there are 3 or fewer nodes. If I add one more node, the mesh will completely fail, either immediately or within a few hours. If the nodes are strung out in a daisy chain, failure is usually, but not always, delayed for a while, and the links break in a piecemeal fashion, one at a time. If the nodes are close enough to each other, total failure occurs quite quickly. I surmise that the 802.11s implementation in the QCA driver was not tested with more than 3 nodes, or perhaps it wasn't designed to support more than 3 nodes. Sigh.
Sven, I think this epiphany obviates the need for your test (which I still haven't figured out how to execute in the field), but I'll return to that effort if you think I should.
So in the end, unless I replace the hardware throughout the neighborhood with far more expensive hardware, I must find a way to use Ben's driver, or to have no mesh network with more than 3 nodes in it.
Have you tried using IPQ4019-based systems? They seem pretty affordable, and the 3-radio Linksys MR8300 and EA8300 have been pretty stable in my recent testing (in AP mode; I have not tested mesh).
Thanks, Ben
On 6/2/20 4:02 PM, Ben Greear wrote:
Have you tried using IPQ4019-based systems? They seem pretty affordable, and the 3-radio Linksys MR8300 and EA8300 have been pretty stable in my recent testing (in AP mode; I have not tested mesh).
No. Even used, on eBay, the cheaper of the two, the EA8300, is at least twice as expensive as what we're using now. Buying a dozen or so in the hope of getting them to work with Batman is out of the question, alas.
Ben, I fully understand your lack of incentive to spend your time on drivers for older hardware. This is not your problem, really. (Unless, like me, you are seeking ways to address the digital divide, where low entry cost is the key consideration. Every dollar cheaper means more people can connect, which was important even before the pandemic began. I selected these Archer [AC]7 v[245] units because there are so many of them for sale that they are kind of hard to sell. True, I didn't know the QCA driver would limit me to 3 nodes per mesh, nor did I know your driver couldn't support encryption or DFS, at least not out of the box. It looked like a reasonable bet at the time; with 2 drivers to choose from, what could go wrong? Oh, well, nothing worthwhile was ever easy.)
Steve
On 06/02/2020 07:06 PM, Steve Newcomb wrote:
On 6/2/20 4:02 PM, Ben Greear wrote:
Have you tried using IPQ4019-based systems? They seem pretty affordable, and the 3-radio Linksys MR8300 and EA8300 have been pretty stable in my recent testing (in AP mode; I have not tested mesh).
No. Even used, on eBay, the cheaper of the two, the EA8300, is at least twice as expensive as what we're using now. Buying a dozen or so in the hope of getting them to work with Batman is out of the question, alas.
Ben, I fully understand your lack of incentive to spend your time on drivers for older hardware. This is not your problem, really. (Unless, like me, you are seeking ways to address the digital divide, where low entry cost is the key consideration. Every dollar cheaper means more people can connect, which was important even before the pandemic began. I selected these Archer [AC]7 v[245] units because there are so many of them for sale that they are kind of hard to sell. True, I didn't know the QCA driver would limit me to 3 nodes per mesh, nor did I know your driver couldn't support encryption or DFS, at least not out of the box. It looked like a reasonable bet at the time; with 2 drivers to choose from, what could go wrong? Oh, well, nothing worthwhile was ever easy.)
Steve
I'm working with the TIP project, which aims to provide stable OpenWrt capable hardware, among lots of other things.
I know some others in that group are interested in low cost solutions, so curious to know what price you think is viable for your market...
https://telecominfraproject.com/wifi/
Thanks, Ben
On 6/3/20 8:48 AM, Ben Greear wrote:
I'm working with the TIP project, which aims to provide stable
OpenWrt capable hardware, among lots of other things.
I know some others in that group are interested in low cost
solutions, so curious to know what price you think is
viable for your market...
Ben, your question is appropriately industrial, but I've already given my answer, which is: "Less is more." The lower the price, the more human capital is protected from the abyss of the digital divide.
True, the primary problem is not so much the one-time cost of a cheap radio. However, one way to address the primary problem is to give public-spirited digital-haves a way to lift up their digital-have-not neighbors. With mesh computing, that means purchasing multiple routers and making gifts of at least one of them, along with the gift of access. What can the digital-haves afford to give, besides access? Typically, not much.
What follows is a rant. Advice: skip it.
---------------------------------------------
Let me explain my perspective, here.
Personally, I believe that there is no difference between the information highway and any other public highway. The digital divide is compelling evidence of oppression of the poor by the wealthy. The nature of the oppression is comparable to highway tolls that restrict the mobility of the poor. Since the capture of US federal regulatory bodies by the telecom industry, and in the absence of effective telecom regulation in the public interest, the only course available to people who want to relieve the damage caused by the exclusion of the dispossessed is to work around the edges, which is what I'm doing. While I would welcome industrial help, I expect none, because I'm only interested in the prosperity of the *entire* public. (In my retirement, I can just barely afford to be.)
I'm unfamiliar with the Telecom Infra Project (TIP). However, I spent more than 20 years voluntarily working on ANSI and ISO information interchange standards (ISO 10743, 10744, 13250), and I know exactly what I'm talking about when I say that the public interest is unlikely to be served by industrial consortia who say things like what the TIP website says:
"We believe that accelerating innovation coupled with new business approaches and cost efficiencies will help the industry build the networks of the future and create business opportunities for new and existing companies, alike."
Such information technology consortia are typically violators of the spirit, and generally the letter, of the antitrust legislation that has been on the books since the end of the gilded age that preceded the current gilded age, and which no recent U.S. administration has seen fit to enforce. They are simply dog fights in which the public interest has no dog. The reason for their existence is to form alliances between aggregations of capital as they conspire against the market-leading aggregations of capital.
ANSI (the American National Standards Institute) promulgates rules for such activities that keep all participants from violating antitrust law -- from being "conspiracies in restraint of trade" -- but you never hear about ANSI standards any more because nobody bothers to avoid antitrust prosecution. There isn't any, basically. Why put up with burdensome transparency rules? Why put up with the sandbagging machinations of representatives of the actual market leaders? Open societies are expensive, frustrating, and annoying.
The purpose of a business is to make a profit, and that incentive *does* serve the public interest, but only in the context of regulation that forces the public interest to be served by it. In the case of the US telecom industry, and basically since the Consent Decree of 1982, regulation has served the interests of its investors, but not the public interest. The history is appalling, really, and the story keeps getting worse.
That's just how things are these days, as we flush whole sections of each generation's human potential down the toilet. It bothers me a lot. For each succeeding generation, the cost of each generation's loss is exponentially increased. Try not to think about it.
On 06/03/2020 08:35 AM, Steve Newcomb wrote:
On 6/3/20 8:48 AM, Ben Greear wrote:
I'm working with the TIP project, which aims to provide stable OpenWrt capable hardware, among lots of other things.
I know some others in that group are interested in low cost solutions, so curious to know what price you think is viable for your market...
Ben, your question is appropriately industrial, but I've already given my answer, which is: "Less is more." The lower the price, the more human capital is protected from the abyss of the digital divide.
True, the primary problem is not so much the one-time cost of a cheap radio. However, one way to address the primary problem is to give public-spirited digital-haves a way to lift up their digital-have-not neighbors. With mesh computing, that means purchasing multiple routers and making gifts of at least one of them, along with the gift of access. What can the digital-haves afford to give, besides access? Typically, not much.
What follows is a rant. Advice: skip it.
If you are trying to lift up a broad swath of the world, then you need scale and vision, and part of that is how to make it self-sustaining. Giving a few crumbs to folks is less useful in my mind than helping give them the means to make their own bread. Think of someone starting a company that wants to deploy 10k hotspots with 40k satellite wifi mesh nodes....
Open source software (and maybe hardware) with high volume, affordable, and solid hardware is one of the core aims of TIP. Think of a price and minimum hardware that meet your goals; if I find someone who can make such a thing at such a price, I'll let you know.
If you want wave-1 ath10k to mesh, my advice is to use 7 virtual station vdevs and one AP on each radio. ath10k-ct firmware and software will support this nicely. Have those 7 stations connect to peers' AP vdevs. Do routing mesh magic through this topology. Then you don't care about anything other than STA + AP working. This might also scale to other platforms that don't support IBSS or MESH well.
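Ben's suggested topology can be sketched as a simple planning exercise. Every node runs one AP vdev plus up to 7 station vdevs per radio. Each station associates with a peer's AP, and the routing protocol (batman-adv in this thread) uses the resulting links.

The Python below is a minimal, hypothetical model of a single radio, not Candela or batman-adv code. The following are all assumptions made for illustration:

- the link-quality input;
- the "best-heard peers first" selection rule;
- treating each association as a two-way link;
- the helper names (plan_associations, is_connected).

```python
from collections import deque

MAX_STA_VDEVS = 7  # station vdevs per radio (7 STA vdevs + 1 AP vdev per radio)


def plan_associations(heard, max_sta=MAX_STA_VDEVS):
    """heard: {node: {peer: quality}} -- which peers' AP vdevs each node can
    hear and how well (any larger-is-better metric; hypothetical input).
    Each node aims up to max_sta station vdevs at its best-heard peers.
    Returns a set of (station_node, ap_node) associations."""
    assoc = set()
    for node, peers in heard.items():
        best = sorted(peers, key=lambda p: (-peers[p], p))[:max_sta]
        assoc.update((node, peer) for peer in best if peer != node)
    return assoc


def is_connected(heard, assoc):
    """True if every node can reach every other node over the associations,
    treating each STA-AP association as a two-way link for routing."""
    nodes = set(heard) | {p for peers in heard.values() for p in peers}
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for sta, ap in assoc:
        adj[sta].add(ap)
        adj[ap].add(sta)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search from an arbitrary node
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == nodes
```

With up to 8 nodes in mutual range, every node can associate with every other node. Beyond that, each node keeps only its 7 best-heard peers, so it is worth running is_connected() against the actual neighborhood layout. The attraction of the scheme is that the only radio modes involved are STA and AP.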
Thanks, Ben
On 6/3/20 12:42 PM, Ben Greear wrote:
If you are trying to lift up a broad swath of the world, then you need scale and vision, and part of that is how to make it self-sustaining. Giving a few crumbs to folks is less useful in my mind than helping give them the means to make their own bread. Think of someone starting a company that wants to deploy 10k hotspots with 40k satellite wifi mesh nodes....
Your vision is deeply correct, but so is mine. One difference between the two ideas is that yours is top-down, with the purpose of offering a service, while mine is bottom-up, with the purpose of developing community and neighborliness, regardless of service provider(s). (I'm an admirer of the late Fred Rogers, and I'm not ashamed to admit it.)
But you are right, too. I'm also an admirer of the Indian cataract surgery guy (can't remember his name) who insisted that all patients pay for their surgery "because charity doesn't scale". He ultimately built a mind-boggling practice that, among other things, presumably now manufactures most of the world's intraocular lenses.
Open source software (and maybe hardware) with high volume, affordable, and solid hardware is one of the core aims of TIP. Think of a price and minimum hardware that meet your goals; if I find someone who can make such a thing at such a price, I'll let you know.
Fair enough. How about a *delivered* price, i.e. an out-of-pocket cost, of $50? Obviously a minimum of 2 radios, 128 Mb, and reasonable CPU power. And, since we're forward-looking here, support for the new channels now presumably forthcoming from the FCC.
If you want wave-1 ath10k to mesh, my advice is to use 7 virtual station vdevs and one AP on each radio. ath10k-ct firmware and software will support this nicely. Have those 7 stations connect to peers' AP vdevs. Do routing mesh magic through this topology. Then you don't care about anything other than STA + AP working. This might also scale to other platforms that don't support IBSS or MESH well.
Many thanks, Ben. This is a helpful idea, and I daresay not many people could have come up with it. (Certainly not I!) You have just given me some homework to do, and I'm grateful for it. Bravo. If I get anywhere with it, you'll be the first to know.
Steve
b.a.t.m.a.n@lists.open-mesh.org