I'm developing a forwarding protocol using Contiki OS. The protocol runs on top of IEEE 802.15.4 TSCH mode. It needs to enqueue a number of packets within a short period of time, and very often I get the following error:
[RLL]:Send to Parent 0 base timeslot: 40, currentTimeslot: 1, send timeslot: 45 at: asn-0.46c41d
TSCH: send packet to 255 with seqno 0, queue 0 1, len 8 120
[RLL]:Send to CS base timeslot: 40, currentTimeslot: 2, send timeslot: 50 at: asn-0.46c41e
TSCH-queue:! add packet failed: 0 #0x20003004 8 #0x0 #0x0
TSCH:! can't send packet to 255 with seqno 0, queue 1 1
While it adds the first packet, it can't add the second one. The queue is not full; I checked that. The error simply says that it's not possible to allocate memory for another packet, although there should be more than enough space.
It's probably just a simple setting I overlooked, but I can't find it. If anyone has a suggestion, please let me know.
Conrad
Hi Conrad,
Did you try tuning QUEUEBUF_CONF_NUM?
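For readers following along, a minimal sketch of what such tuning could look like in a project configuration header. The file name project-conf.h and the values are assumptions for illustration, and the two TSCH_QUEUE_CONF_* options are named from memory of the Contiki TSCH sources, so check them against your tree before relying on them:

```c
/* project-conf.h (assumed file name): illustrative values only */

/* Number of queuebufs available system-wide */
#undef QUEUEBUF_CONF_NUM
#define QUEUEBUF_CONF_NUM 32

/* Packets per TSCH neighbor queue (assumption: must be a power of two) */
#undef TSCH_QUEUE_CONF_NUM_PER_NEIGHBOR
#define TSCH_QUEUE_CONF_NUM_PER_NEIGHBOR 16

/* Number of TSCH neighbor queues (assumption: illustrative value) */
#undef TSCH_QUEUE_CONF_MAX_NEIGHBOR_QUEUES
#define TSCH_QUEUE_CONF_MAX_NEIGHBOR_QUEUES 8
```

Larger values cost RAM, and they only help if the pool is really too small rather than leaking.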
Simon
Hi Simon,
Yeah, I increased it to 32, but I found a similar problem: https://github.com/contiki-os/contiki/issues/1766
It seems like that's the problem; I'm working on a workaround at the moment.
If I know more, I'll update this post.
OK. This sounds like a problem in tsch_queue_reset. No time to dig myself now but please share any findings on your side :)
Hi,
I think that's related to TSCH locking. tsch_queue_remove_nbr grabs a lock and calls tsch_queue_flush_nbr_queue. Then tsch_queue_flush_nbr_queue calls tsch_queue_remove_packet_from_queue for each packet, but this function does the job only if TSCH is not locked.
Now that I think of it, I suspect I've seen this as well.
Could be lock-related but not exactly what you describe: tsch_queue_remove_nbr releases the lock before calling tsch_queue_flush_nbr_queue.
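To make the suspected failure mode concrete, here is a small standalone sketch. It is not Contiki code: every name in it (pool_alloc, pool_free_if_unlocked, enqueue, flush_queue, leak_demo, POOL_SIZE) is hypothetical. It only shows how a free routine that silently does nothing while a lock is held can leak slots from a fixed-size pool, so that a later allocation fails even though the queue itself is not full.

```c
#define POOL_SIZE 4              /* hypothetical pool size */

static int pool_used[POOL_SIZE]; /* 1 = slot is allocated */
static int queue[POOL_SIZE];     /* slots currently referenced by the queue */
static int queue_len;
static int locked;               /* stands in for the lock state */

/* Return a free slot index, or -1 if the pool is exhausted. */
static int pool_alloc(void)
{
  for(int i = 0; i < POOL_SIZE; i++) {
    if(!pool_used[i]) {
      pool_used[i] = 1;
      return i;
    }
  }
  return -1;
}

/* Free a slot, but silently do nothing while the lock is held. */
static void pool_free_if_unlocked(int slot)
{
  if(locked) {
    return; /* skipped: the slot stays marked as allocated */
  }
  pool_used[slot] = 0;
}

int pool_free_count(void)
{
  int n = 0;
  for(int i = 0; i < POOL_SIZE; i++) {
    n += !pool_used[i];
  }
  return n;
}

/* Add one packet; returns 1 on success, 0 if no slot is available. */
int enqueue(void)
{
  int slot = pool_alloc();
  if(slot < 0) {
    return 0;
  }
  queue[queue_len++] = slot;
  return 1;
}

/* Empty the queue; it forgets its slots whether or not they were freed. */
void flush_queue(void)
{
  for(int i = 0; i < queue_len; i++) {
    pool_free_if_unlocked(queue[i]);
  }
  queue_len = 0;
}

/* Enqueue two packets, flush while locked, report free slots afterwards. */
int leak_demo(void)
{
  for(int i = 0; i < POOL_SIZE; i++) {
    pool_used[i] = 0;
  }
  queue_len = 0;
  locked = 0;
  enqueue();
  enqueue();
  locked = 1;
  flush_queue();
  locked = 0;
  return pool_free_count();
}
```

With POOL_SIZE set to 4, leak_demo() returns 2: the queue is empty after the flush, yet two slots remain marked as allocated. Only two more enqueue() calls succeed before the third fails, which resembles the "add packet failed" symptom above. Whether the real TSCH code hits exactly this path is what is being debated here.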
I couldn't find what caused the problem, but I found a workaround for the moment. I tested it overnight and didn't get a single error. I will test it further during my experiments, and if it's working fine I'll create a PR out of it.
For the moment, here is what I did to work around it:
void tsch_queue_reset(void)
{
  /* Deallocate unneeded neighbors */
  if(!tsch_is_locked()) {
    struct tsch_neighbor *n = list_head(neighbor_list);
    while(n != NULL) {
      struct tsch_neighbor *next_n = list_item_next(n);
      /* Flush queue */
      tsch_queue_flush_nbr_queue(n);
      /* Reset backoff exponent */
      tsch_queue_backoff_reset(n);
      n = next_n;
    }
    /* Workaround: re-initialise the buffers */
    memb_init(&packet_memb); /* <--- re-initialise packet buffer */
    queuebuf_init();         /* <--- re-initialise queue buffer */
  }
}
Right, but I'd rather fix the root cause than aggressively re-init the modules. We need to find the memory leak.
I'll dig deeper when I have more time, but it might take a while until I can do it...
I fully understand that!
@Conrad2210 The patches of https://github.com/contiki-os/contiki/pull/2046 could resolve the issue you are experiencing. See https://github.com/contiki-os/contiki/pull/2108 for more information.
@yatch thanks for this, I will test it as soon as possible. Anyways, what you describe sounds reasonable and could be the problem. As soon as I know more, I'll let you know.
I had time today to check whether the solution mentioned in #2108 and #2046 solves the problem.
After running a few experiments, the error didn't appear anymore. I will run my experiments today and tomorrow. As soon as I get the results, I will confirm my first observations.
I was running experiments all day yesterday and during the night, and the problem is gone.
The solution mentioned in #2108 and #2046 solves the problem.
Thanks @yatch for the help!!!
@Conrad2210 Thank you for the test and the report! I'm happy to hear that!