Hi Marek,
the important aspect is that the method of estimating the throughput is consistent across all radios on the same AP. This is necessary to make the estimated throughput values comparable.
Yes, I agree, and that is exactly my point. The current implementation and what is being proposed here both prefer sta_get_expected_throughput() when it is available, and otherwise fall back to examining the tx rate more directly. While both methods attempt to estimate throughput, one may consistently overestimate while the other consistently underestimates.
In my case, my 2.4 GHz radio driver uses minstrel for rate control, so its throughput estimates come from sta_get_expected_throughput(). For me, this estimate is chronically too high. The 5 GHz radio does rate control in hardware, so we cannot use sta_get_expected_throughput() for it. As such, we fall back to the less preferred method, which currently means tx rate / 3 (an underestimate).
This results in my network preferring 2.4 GHz paths when it should be preferring 5 GHz paths. The problem is that the throughput calculation method is not consistent across radios.
I know that both of these methods are trying to estimate the same thing, but they are implemented differently, and their implementation details can bias them toward over- or underestimation.
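To make the bias concrete, here is a minimal sketch of the per-radio selection described above: each radio independently prefers expected throughput and otherwise falls back to tx rate / 3. The struct and function names are hypothetical, chosen for illustration, and are not batman-adv's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-radio snapshot; names are illustrative only. */
struct radio_est {
	bool has_expected_tp;      /* rate control (e.g. minstrel) reports expected throughput */
	uint32_t expected_tp_kbps; /* value from the sta_get_expected_throughput() path */
	uint32_t tx_rate_kbps;     /* last reported tx bitrate */
};

/* Per-radio choice: prefer expected throughput, else fall back to tx rate / 3.
 * Each radio decides on its own, so two radios can end up on different methods. */
uint32_t per_radio_estimate(const struct radio_est *r)
{
	if (r->has_expected_tp)
		return r->expected_tp_kbps;
	return r->tx_rate_kbps / 3;
}
```

With the 2.4 GHz radio reporting 150 Mb/s of expected throughput and a 5 GHz radio at a hypothetical 400 Mb/s tx rate, the 5 GHz link scores about 133 Mb/s and loses the comparison, even though the 2.4 GHz link only achieves ~25 Mb/s in practice.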
I'm suggesting that we make an effort to make the throughput calculation method consistent across radios. More specifically, if one radio doesn't support sta_get_expected_throughput(), then we shouldn't use that method for any radio -- all radios should use the same fallback mechanism.
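A rough sketch of the proposal, assuming the AP can see all of its radios at once: decide once whether every radio supports expected throughput, then apply that single decision to every radio. The names below are hypothetical, and tx rate / 3 is used as the fallback only because that is the current one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-radio snapshot; names are illustrative only. */
struct radio_est {
	bool has_expected_tp;
	uint32_t expected_tp_kbps;
	uint32_t tx_rate_kbps;
};

/* Decide once per AP: expected throughput is usable only if every radio supports it. */
bool all_radios_support_expected_tp(const struct radio_est *radios, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!radios[i].has_expected_tp)
			return false;
	}
	return true;
}

/* Apply the AP-wide decision to one radio, so every radio uses the same method. */
uint32_t consistent_estimate(const struct radio_est *r, bool use_expected_tp)
{
	if (use_expected_tp)
		return r->expected_tp_kbps;
	return r->tx_rate_kbps / 3;
}
```

With a 2.4 GHz radio (expected throughput available, hypothetical 144 Mb/s tx rate) and a 5 GHz radio without it (hypothetical 400 Mb/s tx rate), the AP-wide check fails, both radios use the fallback, and the scores become 48 Mb/s versus about 133 Mb/s. Whatever bias the fallback has, it then applies equally to both radios.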
Does this make sense?
The more consistent the outcomes of the methods of throughput estimation are, the less problematic what I'm describing becomes.
After this patch, the throughput estimate for 5 GHz stas/neighbors in my network will be derived from an exponentially weighted average of the tx rate that takes tx failures into account. If this new fallback method produces results closer to those of sta_get_expected_throughput(), then my problem will be lessened, possibly to the point where my network prefers 5 GHz (as it should).
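I haven't checked the exact formula in the patch, so the following is only one plausible shape for "an exponentially weighted average of tx rate with consideration of tx failures". Scaling each sample by the tx success ratio and giving new samples a weight of 1/8 are my own assumptions, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical smoothed-throughput state; not the patch's actual data structure. */
struct tp_ewma {
	uint64_t avg_kbps; /* current smoothed estimate */
	bool initialized;  /* false until the first sample arrives */
};

/* Fold one sample into the average. Assumptions: the sample is the tx rate scaled
 * by the fraction of successful transmissions, and the new sample gets weight 1/8. */
void tp_ewma_add(struct tp_ewma *e, uint32_t tx_rate_kbps,
		 uint32_t tx_ok, uint32_t tx_failed)
{
	uint64_t attempts = (uint64_t)tx_ok + tx_failed;
	/* With no tx attempts in this interval, take the rate at face value. */
	uint64_t sample = attempts ?
		(uint64_t)tx_rate_kbps * tx_ok / attempts : tx_rate_kbps;

	if (!e->initialized) {
		e->avg_kbps = sample;
		e->initialized = true;
		return;
	}
	e->avg_kbps = (e->avg_kbps * 7 + sample) / 8;
}
```

Whether something like this lands closer to sta_get_expected_throughput() depends on the real weights and failure accounting in the patch.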
But as long as we keep an implementation where we have different throughput calculation methods for different radios, we will remain susceptible to what I'm describing.
FYI, expected throughput and also 802.11 throughput estimation are taking congestion into account. If you believe this isn't sufficient to get an accurate read of the situation, can you please expand on your findings?
OK, thanks. If you're confident that sta_get_expected_throughput() returns a result that reflects recent or likely external contention on the operating frequency, that's good to know. I was worried that my overestimated result reflected how fast we could transmit to a client once the opportunity presented itself. Given your remark, it sounds like that is not the case: the throughput estimate should reflect external congestion, such as transmissions from other BSSes on the same frequency.
As I noted in my original message, I was seeing an estimated throughput of 150 Mb/s from the sta_get_expected_throughput() method, while actually only being able to transmit at ~25 Mb/s. This problem might somehow be specific to my driver, even though it uses minstrel for rate control. I'll look into this more closely and report back what I find, and I'll try other chipsets (e.g. QCA) to see how they behave.
So in summary, I see one problem that results from different radios on the same router using different throughput estimation mechanisms. This problem may get better with this change, but the underlying issue of using different methods per radio remains. Separately, in my case sta_get_expected_throughput() delivers overestimates. In my original message, I considered that this might be because sta_get_expected_throughput() does not account for external congestion. Given your feedback, I'll now debug under the assumption that something else causes the overestimation in my case.
Thanks,
Andy