Thanks for your analysis!
On Mon, Sep 08, 2008 at 11:18:42PM +0200, Sven Eckelmann wrote:
Ok, I got the /proc/modules file now. The current situation is as follows: it
crashes inside the batman module at position 0x00000aa4:
a60: 3c020000 lui v0,0x0
a64: 8c500024 lw s0,36(v0)
a68: 24420024 addiu v0,v0,36
a6c: 12020014 beq s0,v0,ac0 <cleanup_module+0x610>
a70: 3c040000 lui a0,0x0
a74: 3c050000 lui a1,0x0
a78: 3c020000 lui v0,0x0
a7c: 24840000 addiu a0,a0,0
a80: 24a50088 addiu a1,a1,136
a84: 24420000 addiu v0,v0,0
a88: 0040f809 jalr v0
a8c: 24060283 li a2,643
a90: 8e040004 lw a0,4(s0)
a94: 8e030000 lw v1,0(s0)
a98: 3c020010 lui v0,0x10
a9c: 34420100 ori v0,v0,0x100
aa0: 8e110008 lw s1,8(s0)
aa4: ac830000 sw v1,0(a0)
aa8: ae020000 sw v0,0(s0)
aac: 3c020020 lui v0,0x20
ab0: 34420200 ori v0,v0,0x200
ab4: ac640004 sw a0,4(v1)
This is part of the compiled version of packet_recv_thread. Due to the
optimizations, I cannot say where exactly the problem lies.
I think the code of get_ip_addr() got inlined in packet_recv_thread and we
need to search for the crash inside of it at list_del(&entry->list);
I would also say that the real crash is inside __list_del, where prev and
next are set. To check it, look at LIST_POISON1 and LIST_POISON2 in
poison.h of the current Linux kernel. You will notice that the values are
0x00100100 and 0x00200200 == address of the failed paging request. The list
poisoning is done in list_del after calling __list_del (it is the
sequence lui, ori, sw in the asm snippet). So could it be that we have a
poisoned entry inside the list?
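To make the poison argument concrete, here is a small userspace sketch (not
the kernel code itself). It assumes the two poison values quoted above and
re-implements just enough of the list API to show what a deleted entry looks
like afterwards. list_unlink(), lui_ori(), list_entry_is_poisoned() and
demo_poison() are names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Poison values as quoted in this mail (assumed, not read from poison.h). */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

/* Value built by MIPS "lui reg,hi" followed by "ori reg,reg,lo". */
static uint32_t lui_ori(uint32_t hi, uint32_t lo)
{
    return (hi << 16) | (lo & 0xffffu);
}

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Mirrors __list_del: both stores go through the entry's neighbours,
 * so they fault if prev/next already hold poison values. */
static void list_unlink(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

/* Mirrors list_del: unlink first, then poison the removed entry. */
static void list_del(struct list_head *entry)
{
    list_unlink(entry->prev, entry->next);
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

/* Hypothetical helper: nonzero if entry already went through list_del(). */
static int list_entry_is_poisoned(const struct list_head *entry)
{
    return entry->next == LIST_POISON1 || entry->prev == LIST_POISON2;
}

/* Adds one entry, deletes it once, and reports whether it is poisoned. */
static int demo_poison(void)
{
    struct list_head head, a;

    init_list_head(&head);
    list_add(&a, &head);
    list_del(&a);

    /* The head must be an empty list again after the delete. */
    assert(head.next == &head && head.prev == &head);

    /* The constant from "lui v0,0x10; ori v0,v0,0x100" matches a.next. */
    return list_entry_is_poisoned(&a) &&
           (uintptr_t)a.next == lui_ori(0x10, 0x100);
}
```

A second list_del() on the same entry would call list_unlink() with the two
poison pointers and write through them, which would fit a failed paging
request at one of those addresses.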
This could for example happen when we get scheduled (please notice that the
optimizer reordered many instructions) while another part of the program is
deleting entries. I haven't checked whether the rest of the code really
allows that, but that is my current idea.
Mhm, as far as I have looked into the issue, these are the
points where free_client_list is accessed:
init_module() - INIT_LIST_HEAD()
* called on startup
get_ip_addr() - list_del()
* "secured" with the hash_lock spinlock
cleanup_module() - list_del()
* only called when unloading the module
batgat_ioctl() - list_del()
* from IOCREMDEV. This is called when batman shuts down.
packet_recv_thread() - list_add()
* also secured with the hash_lock spinlock
So it seems there should be no concurrency without user interaction
(module or batman shutdown).
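For illustration only, this is roughly what "every path takes the same lock"
would look like in a userspace sketch. It is not the batgat code: the toy
spinlock built on C11 atomic_flag stands in for the hash_lock spinlock, and
struct node, free_list, locked_add(), locked_pop() and demo_locked() are
hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next, *prev;
};

/* Toy spinlock standing in for the kernel spinlock (assumption). */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;

static void lock(void)
{
    while (atomic_flag_test_and_set(&list_lock)) {
        /* busy-wait until the holder clears the flag */
    }
}

static void unlock(void)
{
    atomic_flag_clear(&list_lock);
}

/* Empty circular list: the head points at itself. */
static struct node free_list = { &free_list, &free_list };

/* Insert at the front while holding the lock. */
static void locked_add(struct node *n)
{
    lock();
    n->next = free_list.next;
    n->prev = &free_list;
    free_list.next->prev = n;
    free_list.next = n;
    unlock();
}

/* Remove and return the first entry, or NULL if the list is empty.
 * The emptiness check and the unlink happen under the same lock, so two
 * callers can never unlink the same entry. */
static struct node *locked_pop(void)
{
    struct node *n = NULL;

    lock();
    if (free_list.next != &free_list) {
        n = free_list.next;
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->next = n->prev = NULL;
    }
    unlock();
    return n;
}

/* Adds two entries and pops three times; the third pop finds nothing. */
static int demo_locked(void)
{
    struct node a, b;

    locked_add(&a);
    locked_add(&b);
    return locked_pop() == &b && locked_pop() == &a && locked_pop() == NULL;
}
```

If the shutdown paths went through the same kind of helper as the locked
ones, a double delete of one entry could at least be ruled out there.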
But I don't have a good idea yet where the problem comes from ... :/