If rtnl_trylock() is used to prevent circular dependencies, rtnl_unlock() may destroy this attempt because it first unlocks rtnl_mutex but may lock it again later. The callgraph roughly looks like:
rtnl_unlock() netdev_run_todo() __rtnl_unlock() netdev_wait_allrefs() rtnl_lock() ... __rtnl_unlock()
There are two users which have possible deadlocks and are fixed in this patch: batman-adv and bridge. Their problematic access pattern is the same:
notifier_call handler: * holds rtnl lock when called * calls sysfs code at some point (acquiring sysfs locks)
sysfs code: * holds sysfs lock when called * calls rtnl_trylock() and rtnl_unlock(), rtnl_unlock() calls rtnl_lock implicitly
Fix this by exporting __rtnl_unlock() to just unlock the mutex without implicitly locking again.
Reported-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich siwu@hrz.tu-chemnitz.de
--- Of course, an alternative would be to not lock again after unlocking within rtnl_unlock(), or put the todo handling into the locked section. I'm not familiar enough with this code to know what would be best.
There are others using rtnl_trylock(), but I'm not sure if they are affected. --- net/batman-adv/sysfs.c | 2 +- net/bridge/br_sysfs_br.c | 2 +- net/bridge/br_sysfs_if.c | 2 +- net/core/rtnetlink.c | 1 + 4 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c index 66518c7..41b74aa 100644 --- a/net/batman-adv/sysfs.c +++ b/net/batman-adv/sysfs.c @@ -635,7 +635,7 @@ static ssize_t batadv_store_mesh_iface(struct kobject *kobj, ret = batadv_hardif_enable_interface(hard_iface, buff);
unlock: - rtnl_unlock(); + __rtnl_unlock(); out: batadv_hardif_free_ref(hard_iface); return ret; diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index c5c0593..c122782 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d, if (!rtnl_trylock()) return restart_syscall(); br_stp_set_enabled(br, val); - rtnl_unlock(); + __rtnl_unlock();
return len; } diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c index 13b36bd..d99f394 100644 --- a/net/bridge/br_sysfs_if.c +++ b/net/bridge/br_sysfs_if.c @@ -223,7 +223,7 @@ static ssize_t brport_store(struct kobject * kobj, if (ret == 0) ret = count; } - rtnl_unlock(); + __rtnl_unlock(); } return ret; } diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index fad649a..d95ba6f 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -72,6 +72,7 @@ void __rtnl_unlock(void) { mutex_unlock(&rtnl_mutex); } +EXPORT_SYMBOL(__rtnl_unlock);
void rtnl_unlock(void) {
On Saturday 01 December 2012 18:29:51 Simon Wunderlich wrote:
If rtnl_trylock() is used to prevent circular dependencies, rtnl_unlock() may destroy this attempt because it first unlocks rtnl_mutex but may lock it again later. The callgraph roughly looks like:
rtnl_unlock() netdev_run_todo() __rtnl_unlock() netdev_wait_allrefs() rtnl_lock() ... __rtnl_unlock()
There are two users which have possible deadlocks and are fixed in this patch: batman-adv and bridge. Their problematic access pattern is the same:
notifier_call handler:
- holds rtnl lock when called
- calls sysfs code at some point (acquiring sysfs locks)
sysfs code:
- holds sysfs lock when called
- calls rtnl_trylock() and rtnl_unlock(), rtnl_unlock() calls rtnl_lock implicitly
Fix this by exporting __rtnl_unlock() to just unlock the mutex without implicitly locking again.
Reported-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich siwu@hrz.tu-chemnitz.de
Of course, an alternative would be to not lock again after unlocking within rtnl_unlock(), or put the todo handling into the locked section. I'm not familiar enough with this code to know what would be best.
There are others using rtnl_trylock(), but I'm not sure if they are affected.
At least they look like they have a problem in a parallel user scenario involving another lock and locking order (most of them s_active or a device lock). So just to list the places and poke some other users. They can better decide for themself because they know the code.
drivers/infiniband/ulp/ipoib/ipoib_cm.c: if (!rtnl_trylock()) drivers/infiniband/ulp/ipoib/ipoib_vlan.c: if (!rtnl_trylock()) drivers/infiniband/ulp/ipoib/ipoib_vlan.c: if (!rtnl_trylock()) drivers/net/bonding/bond_alb.c: if (!rtnl_trylock()) { drivers/net/bonding/bond_main.c: if (!rtnl_trylock()) { drivers/net/bonding/bond_main.c: if (!rtnl_trylock()) { drivers/net/bonding/bond_main.c: if (!rtnl_trylock()) { drivers/net/bonding/bond_main.c: if (!rtnl_trylock()) { drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock()) drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock()) drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock()) drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock()) drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock()) drivers/net/bonding/bond_sysfs.c: if (!rtnl_trylock()) drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c: if (!rtnl_trylock()) /* synchronize with ifdown */ drivers/s390/net/qeth_l2_main.c: if (rtnl_trylock()) { drivers/s390/net/qeth_l3_main.c: if (rtnl_trylock()) { net/bridge/br_sysfs_br.c: if (!rtnl_trylock()) net/bridge/br_sysfs_if.c: if (!rtnl_trylock()) net/core/net-sysfs.c: if (!rtnl_trylock()) net/core/net-sysfs.c: if (!rtnl_trylock()) net/core/net-sysfs.c: if (!rtnl_trylock()) net/core/net-sysfs.c: if (!rtnl_trylock()) net/core/net-sysfs.c: if (!rtnl_trylock()) net/ipv4/devinet.c: if (!rtnl_trylock()) { net/ipv6/addrconf.c: if (!rtnl_trylock()) net/ipv6/addrconf.c: if (!rtnl_trylock())
Kind regards, Sven
On Sat, 2012-12-01 at 18:29 +0100, Simon Wunderlich wrote:
If rtnl_trylock() is used to prevent circular dependencies, rtnl_unlock() may destroy this attempt because it first unlocks rtnl_mutex but may lock it again later. The callgraph roughly looks like:
rtnl_unlock() netdev_run_todo() __rtnl_unlock() netdev_wait_allrefs() rtnl_lock() ... __rtnl_unlock()
There are two users which have possible deadlocks and are fixed in this patch: batman-adv and bridge. Their problematic access pattern is the same:
notifier_call handler:
- holds rtnl lock when called
- calls sysfs code at some point (acquiring sysfs locks)
sysfs code:
- holds sysfs lock when called
- calls rtnl_trylock() and rtnl_unlock(), rtnl_unlock() calls rtnl_lock implicitly
Fix this by exporting __rtnl_unlock() to just unlock the mutex without implicitly locking again.
Reported-by: Sven Eckelmann sven@narfation.org Signed-off-by: Simon Wunderlich siwu@hrz.tu-chemnitz.de
Of course, an alternative would be to not lock again after unlocking within rtnl_unlock(), or put the todo handling into the locked section. I'm not familiar enough with this code to know what would be best.
There are others using rtnl_trylock(), but I'm not sure if they are affected.
net/batman-adv/sysfs.c | 2 +- net/bridge/br_sysfs_br.c | 2 +- net/bridge/br_sysfs_if.c | 2 +- net/core/rtnetlink.c | 1 + 4 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c index 66518c7..41b74aa 100644 --- a/net/batman-adv/sysfs.c +++ b/net/batman-adv/sysfs.c @@ -635,7 +635,7 @@ static ssize_t batadv_store_mesh_iface(struct kobject *kobj, ret = batadv_hardif_enable_interface(hard_iface, buff);
unlock:
- rtnl_unlock();
- __rtnl_unlock();
out: batadv_hardif_free_ref(hard_iface); return ret; diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index c5c0593..c122782 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d, if (!rtnl_trylock()) return restart_syscall(); br_stp_set_enabled(br, val);
- rtnl_unlock();
__rtnl_unlock();
return len;
}
I have no idea of why you believe there is a problem here.
Could you explain how net_todo_list could be not empty ?
As long as no device is unregistered between rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
On Saturday 01 December 2012 10:36:12 Eric Dumazet wrote: [...]
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c index 66518c7..41b74aa 100644 --- a/net/batman-adv/sysfs.c +++ b/net/batman-adv/sysfs.c @@ -635,7 +635,7 @@ static ssize_t batadv_store_mesh_iface(struct kobject *kobj,> ret = batadv_hardif_enable_interface(hard_iface, buff);
unlock:
- rtnl_unlock();
- __rtnl_unlock();
out: batadv_hardif_free_ref(hard_iface); return ret;
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index c5c0593..c122782 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
if (!rtnl_trylock())
return restart_syscall();
br_stp_set_enabled(br, val);
- rtnl_unlock();
__rtnl_unlock();
return len;
}
I have no idea of why you believe there is a problem here.
Could you explain how net_todo_list could be not empty ?
As long as no device is unregistered between rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
I am not sure what "here" means for your. At least batman-adv tries to unregister a device -> problem [1]. I will not make any judgements about the other uses in the kernel/other parts patched by Simon.
Kind regards, Sven
On Sat, 2012-12-01 at 20:04 +0100, Sven Eckelmann wrote:
On Saturday 01 December 2012 10:36:12 Eric Dumazet wrote:
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index c5c0593..c122782 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
if (!rtnl_trylock())
return restart_syscall();
br_stp_set_enabled(br, val);
- rtnl_unlock();
__rtnl_unlock();
return len;
}
I have no idea of why you believe there is a problem here.
Could you explain how net_todo_list could be not empty ?
As long as no device is unregistered between rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
I am not sure what "here" means for your. At least batman-adv tries to unregister a device -> problem [1]. I will not make any judgements about the other uses in the kernel/other parts patched by Simon.
I was reacting to the change in net/bridge/br_sysfs_br.c
rtnl_trylock() could set a boolean flag to explicitly WARN_ON() in case we try to unregister a device.
On Saturday 01 December 2012 11:44:34 Eric Dumazet wrote: [...]
I have no idea of why you believe there is a problem here.
Could you explain how net_todo_list could be not empty ?
As long as no device is unregistered between rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
I am not sure what "here" means for your. At least batman-adv tries to unregister a device -> problem [1]. I will not make any judgements about the other uses in the kernel/other parts patched by Simon.
I was reacting to the change in net/bridge/br_sysfs_br.c
Yes, in this context it makes more sense.
rtnl_trylock() could set a boolean flag to explicitly WARN_ON() in case we try to unregister a device.
Sounds interesting.
Kind regards, Sven
On Sat, Dec 01, 2012 at 11:44:34AM -0800, Eric Dumazet wrote:
On Sat, 2012-12-01 at 20:04 +0100, Sven Eckelmann wrote:
On Saturday 01 December 2012 10:36:12 Eric Dumazet wrote:
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index c5c0593..c122782 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
if (!rtnl_trylock())
return restart_syscall();
br_stp_set_enabled(br, val);
- rtnl_unlock();
__rtnl_unlock();
return len;
}
I have no idea of why you believe there is a problem here.
Could you explain how net_todo_list could be not empty ?
As long as no device is unregistered between rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
I am not sure what "here" means for your. At least batman-adv tries to unregister a device -> problem [1]. I will not make any judgements about the other uses in the kernel/other parts patched by Simon.
I was reacting to the change in net/bridge/br_sysfs_br.c
rtnl_trylock() could set a boolean flag to explicitly WARN_ON() in case we try to unregister a device.
Well, I'm not sure if this can happen in the bridge code, but from looking at the code it doesn't appear to be impossible. It would be better to be sure that it can't deadlock IMHO.
(Although doing unlock/lock/unlock in an unlock function is also a little "uncommon").
Cheers, Simon
On Sat, Dec 01, 2012 at 09:01:53PM +0100, Simon Wunderlich wrote:
On Sat, Dec 01, 2012 at 11:44:34AM -0800, Eric Dumazet wrote:
On Sat, 2012-12-01 at 20:04 +0100, Sven Eckelmann wrote:
On Saturday 01 December 2012 10:36:12 Eric Dumazet wrote:
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c index c5c0593..c122782 100644 --- a/net/bridge/br_sysfs_br.c +++ b/net/bridge/br_sysfs_br.c @@ -142,7 +142,7 @@ static ssize_t store_stp_state(struct device *d,
if (!rtnl_trylock())
return restart_syscall();
br_stp_set_enabled(br, val);
- rtnl_unlock();
__rtnl_unlock();
return len;
}
I have no idea of why you believe there is a problem here.
Could you explain how net_todo_list could be not empty ?
As long as no device is unregistered between rtnl_trylock()/rtnl_unlock(), there is no possible deadlock.
I am not sure what "here" means for your. At least batman-adv tries to unregister a device -> problem [1]. I will not make any judgements about the other uses in the kernel/other parts patched by Simon.
I was reacting to the change in net/bridge/br_sysfs_br.c
rtnl_trylock() could set a boolean flag to explicitly WARN_ON() in case we try to unregister a device.
Well, I'm not sure if this can happen in the bridge code, but from looking at the code it doesn't appear to be impossible. It would be better to be sure that it can't deadlock IMHO.
(Although doing unlock/lock/unlock in an unlock function is also a little "uncommon").
But still we have the problem in batman-adv (as Sven pointed out in a previous email) that tries to unregister a device in that "critical window".
Exporting __rtnl_unlock() would solve the issue in this case.
If you think the bridge code should not end up in such situation, what if Simon resends the patch with only the __rtnl_unlock() exportation and the change in batman-adv?
Cheers,
On Monday 03 December 2012 21:09:06 Antonio Quartulli wrote:
But still we have the problem in batman-adv (as Sven pointed out in a previous email) that tries to unregister a device in that "critical window".
Exporting __rtnl_unlock() would solve the issue in this case.
If you think the bridge code should not end up in such situation, what if Simon resends the patch with only the __rtnl_unlock() exportation and the change in batman-adv?
I personally have big doubts about the removal of the second half of the unregister. Doesn't sound like the best idea. This would result in side effects... one of them for example would be the possible deadlock scenario moved to the other users of rtnl_trylock which don't unregister a device inside their critical section.... so either you do it always this way when using rtnl_trylock or it will break. I don't want to imagine other problems caused by this change.
Kind regards, Sven
b.a.t.m.a.n@lists.open-mesh.org