Contiki: TSCH queue add error

Created on 9 Feb 2017 · 14 Comments · Source: contiki-os/contiki

I'm developing a forwarding protocol using Contiki OS. The protocol runs on top of IEEE 802.15.4 TSCH mode. It needs to add a certain number of packets to the queue within a short period of time, and very often I get the following error:

[RLL]:Send to Parent 0 base timeslot: 40, currentTimeslot: 1, send timeslot: 45 at: asn-0.46c41d
TSCH: send packet to 255 with seqno 0, queue 0 1, len 8 120
[RLL]:Send to CS base timeslot: 40, currentTimeslot: 2, send timeslot: 50 at: asn-0.46c41e
TSCH-queue:! add packet failed: 0 #0x20003004 8 #0x0 #0x0
TSCH:! can't send packet to 255 with seqno 0, queue 1 1

While it adds the first packet, it can't add the second packet. The queue is not full; I checked that. The error simply says it's not possible to allocate memory for another packet, while there should be more than enough space.

It's probably just a simple setting I overlooked, but I can't find it. If anyone has a suggestion, please let me know.

Conrad

All 14 comments

Hi Conrad,
Did you try tuning QUEUEBUF_CONF_NUM?
Simon

Hi Simon,
Yeah, I increased it to 32, but I found a similar problem: https://github.com/contiki-os/contiki/issues/1766
It seems like that's the problem; I'm working on a workaround at the moment.
If I know more, I'll update this post.
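
For reference, the setting discussed above is normally overridden in the application's project configuration header. A minimal sketch, assuming the usual `project-conf.h` mechanism; the value 32 is the one reported in this thread, not a recommendation:

```c
/* project-conf.h (sketch): enlarge the pool of queue buffers.
 * 32 is the value tried in this thread; pick one that fits your RAM budget. */
#undef QUEUEBUF_CONF_NUM
#define QUEUEBUF_CONF_NUM 32
```

As the reply above notes, a larger pool did not make the error go away here, which suggests that buffers were being lost rather than that the pool was simply too small.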

OK. This sounds like a problem in tsch_queue_reset. No time to dig myself now but please share any findings on your side :)

Hi,
I think that's related to TSCH locking. tsch_queue_remove_nbr grabs a lock and calls tsch_queue_flush_nbr_queue. Then tsch_queue_flush_nbr_queue calls tsch_queue_remove_packet_from_queue for each packet, but this function does the job only if TSCH is not locked.
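
The hypothesis above can be illustrated with a toy model. This is only a sketch with hypothetical names (`POOL_SIZE`, `pool_alloc`, `remove_packet`, `flush_all`); it is not the Contiki code, and a later comment questions whether the lock is actually held at this point. It shows how a removal routine that silently does nothing while a lock is held leaves slots allocated in a fixed-size pool:

```c
#define POOL_SIZE 4          /* hypothetical pool size */

static int slot_used[POOL_SIZE];
static int locked;           /* stands in for the TSCH lock */

/* Take the first free slot, or return -1 when the pool is exhausted. */
static int pool_alloc(void)
{
  int i;
  for(i = 0; i < POOL_SIZE; i++) {
    if(!slot_used[i]) {
      slot_used[i] = 1;
      return i;
    }
  }
  return -1;
}

/* Free a slot, but only when the lock is not held. Returns 1 on success. */
static int remove_packet(int slot)
{
  if(locked) {
    return 0;                /* silently skipped: the slot stays allocated */
  }
  slot_used[slot] = 0;
  return 1;
}

/* Try to free every slot, optionally while holding the lock.
 * Returns the number of slots that are still allocated afterwards. */
static int flush_all(int hold_lock)
{
  int i, still_used = 0;
  locked = hold_lock;
  for(i = 0; i < POOL_SIZE; i++) {
    if(slot_used[i]) {
      remove_packet(i);
    }
  }
  locked = 0;
  for(i = 0; i < POOL_SIZE; i++) {
    still_used += slot_used[i];
  }
  return still_used;
}
```

In this model, a flush performed while the lock is held frees nothing, so repeated flush-and-refill cycles would eventually exhaust the pool even though every logical queue looks empty.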

Now that I think of it, I suspect I've seen this as well.

Could be lock-related but not exactly what you describe: tsch_queue_remove_nbr releases the lock before calling tsch_queue_flush_nbr_queue.

I couldn't find what caused the problem, but I found a workaround for the moment. I tested it overnight and didn't get a single error. I will test it further during my experiments, and if it works fine I'll create a PR out of it.

For the moment, here is what I did to work around it:

void tsch_queue_reset(void)
{
  /* Deallocate unneeded neighbors */
  if(!tsch_is_locked()) {
    struct tsch_neighbor *n = list_head(neighbor_list);
    while(n != NULL) {
      struct tsch_neighbor *next_n = list_item_next(n);
      /* Flush queue */
      tsch_queue_flush_nbr_queue(n);
      /* Reset backoff exponent */
      tsch_queue_backoff_reset(n);
      n = next_n;
    }

    /* Re-initialise the buffers */
    memb_init(&packet_memb); /* <--- re-initialise packet buffer */
    queuebuf_init();         /* <--- re-initialise queue buffer */
  }
}

Right, but I'd rather fix the root cause than aggressively re-init the modules. We need to find the memory leak.
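
One way to see why a blanket re-init is considered aggressive is the toy pool below. It is a sketch with hypothetical names (`TOY_SLOTS`, `toy_init`, `toy_alloc`), not Contiki's `memb` API, and it assumes that some owner may still hold a slot when the reset happens; whether that can occur inside `tsch_queue_reset` is not established in this thread. The reset does recover the leaked slot, but it also hands out a slot that was never released:

```c
#include <string.h>

#define TOY_SLOTS 2          /* hypothetical pool size */

static char toy_used[TOY_SLOTS];

/* Mark every slot as free, regardless of who still holds it. */
static void toy_init(void)
{
  memset(toy_used, 0, sizeof(toy_used));
}

/* Take the first free slot, or return -1 when the pool is exhausted. */
static int toy_alloc(void)
{
  int i;
  for(i = 0; i < TOY_SLOTS; i++) {
    if(!toy_used[i]) {
      toy_used[i] = 1;
      return i;
    }
  }
  return -1;
}

/* Returns 1 if, after a reset, a fresh allocation reuses a slot that
 * its previous owner never released. */
static int reinit_aliases_live_slot(void)
{
  int live, fresh;
  toy_init();
  live = toy_alloc();        /* still referenced by some owner */
  (void)toy_alloc();         /* leaked: nobody will ever free it */
  toy_init();                /* workaround-style reset of the whole pool */
  fresh = toy_alloc();       /* gets the first slot again */
  return fresh == live;
}
```

The reset hides the leak instead of removing it, which is why finding the path that fails to free the buffer is the preferred fix.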

I'll dig deeper when I have more time, but it might take a while until I can do it...

I fully understand that!

@Conrad2210 The patches of https://github.com/contiki-os/contiki/pull/2046 could resolve the issue you are experiencing. See https://github.com/contiki-os/contiki/pull/2108 for more information.

@yatch thanks for this, I will test it as soon as possible. Anyways, what you describe sounds reasonable and could be the problem. As soon as I know more, I'll let you know.

I had time today to check whether the solution mentioned in #2108 and #2046 solves the problem.
After running a few experiments, the error didn't appear anymore. I will run my experiments today and tomorrow. As soon as I get the results, I will confirm my first observations.

I was running experiments all day yesterday and during the night, and the problem is gone.
The solution mentioned in #2108 and #2046 solves the problem.

Thanks @yatch for the help!!!

@Conrad2210 Thank you for the test and the report! I'm happy to hear that!
