Mbed-os: Ethernet rx network issues on i.MX RT1050

Created on 18 Jan 2019 · 37 comments · Source: ARMmbed/mbed-os

Description

Tested with 5.11.0-rc4 and 5.11.1.

We have found that there are some issues with Ethernet networking on the NXP i.MX RT1050 target. It feels like incoming packets are not always processed as they arrive; instead they sit around until _another_ packet is received, at which point both packets are processed.

Perhaps it is best to show with an example using ping. Here I am using the default 1s interval between sending pings:

$ ping 192.168.1.193
PING 192.168.1.193 (192.168.1.193): 56 data bytes
64 bytes from 192.168.1.193: icmp_seq=0 ttl=255 time=0.484 ms
Request timeout for icmp_seq 1
64 bytes from 192.168.1.193: icmp_seq=1 ttl=255 time=1001.064 ms
64 bytes from 192.168.1.193: icmp_seq=2 ttl=255 time=0.469 ms
64 bytes from 192.168.1.193: icmp_seq=3 ttl=255 time=1000.526 ms
64 bytes from 192.168.1.193: icmp_seq=4 ttl=255 time=0.435 ms
64 bytes from 192.168.1.193: icmp_seq=5 ttl=255 time=1001.259 ms
64 bytes from 192.168.1.193: icmp_seq=6 ttl=255 time=0.352 ms

Likewise, we can see in Mbed code that recv() stays blocked; sending _another_ packet results in both packets being processed.

This can be easily reproduced with a minimal example:

#include "mbed.h"
#include "EthernetInterface.h"

EthernetInterface eth;
UDPSocket sock;
SocketAddress addr;

char buffer[8192];

int main() {
    wait(2);

    // Bring up the interface (DHCP by default)
    nsapi_error_t res = eth.connect();
    if (res < 0) { printf("connect = %d\n", res); }

    res = sock.open(&eth);
    if (res < 0) { printf("open = %d\n", res); }

    res = sock.bind(eth.get_ip_address(), 1234);
    if (res < 0) { printf("bind = %d\n", res); }

    printf("UDP relay -> bound to %s:1234\n", eth.get_ip_address());

    while (true) {
        // Blocks until a datagram arrives; the bug shows up as this call
        // not returning until a *subsequent* packet comes in
        auto result = sock.recvfrom(&addr, buffer, sizeof(buffer));
        printf("(@%dms) accepted %s:%d (size=%d)\n", (int)(us_ticker_read() / 1000), addr.get_ip_address(), addr.get_port(), result);

        if (result > 0) {
            // Echo the datagram back to the sender
            sock.sendto(addr, buffer, (unsigned) result);
        }
    }
    return 0;
}
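
One way to exercise this relay from a host on the same network (substitute the board's actual IP address) is:

echo hello | nc -u -w1 192.168.1.193 1234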

Running the same application on other targets does not show this behaviour. Tested on:

  • NXP K66F
  • ST NUCLEO-F767ZI

Issue request type


[ ] Question
[ ] Enhancement
[X] Bug

CLOSED nxp mirrored bug

Most helpful comment

I will take a look and let you know.

All 37 comments

I see this as well and have some further information:
Running ping with preload (-l) to send 8 packets without waiting for replies yields the following result:

sudo ping 10.10.101.3 -i1 -l8 -c14
PING 10.10.101.3 (10.10.101.3) 56(84) bytes of data.
64 bytes from 10.10.101.3: icmp_seq=1 ttl=255 time=0.267 ms
64 bytes from 10.10.101.3: icmp_seq=2 ttl=255 time=0.483 ms
64 bytes from 10.10.101.3: icmp_seq=3 ttl=255 time=0.228 ms
64 bytes from 10.10.101.3: icmp_seq=4 ttl=255 time=0.294 ms
64 bytes from 10.10.101.3: icmp_seq=5 ttl=255 time=0.239 ms
64 bytes from 10.10.101.3: icmp_seq=6 ttl=255 time=0.316 ms
64 bytes from 10.10.101.3: icmp_seq=7 ttl=255 time=0.187 ms
64 bytes from 10.10.101.3: icmp_seq=8 ttl=255 time=1001 ms
64 bytes from 10.10.101.3: icmp_seq=9 ttl=255 time=0.221 ms
64 bytes from 10.10.101.3: icmp_seq=10 ttl=255 time=1000 ms
64 bytes from 10.10.101.3: icmp_seq=11 ttl=255 time=0.208 ms
64 bytes from 10.10.101.3: icmp_seq=12 ttl=255 time=1000 ms
64 bytes from 10.10.101.3: icmp_seq=13 ttl=255 time=0.201 ms

--- 10.10.101.3 ping statistics ---
14 packets transmitted, 13 received, 7% packet loss, time 6002ms
rtt min/avg/max/mdev = 0.187/231.115/1001.300/421.475 ms, pipe 8

Note that the error occurs only after this burst, and from then on it affects every other packet.

Also note that changing the interval changes the delay of the stuck packet, which fits @unsignedint's description of packets being stuck in the queue and then handled when the next packet arrives.

sudo ping 10.10.101.3 -i.1 -l1 -c6
PING 10.10.101.3 (10.10.101.3) 56(84) bytes of data.
64 bytes from 10.10.101.3: icmp_seq=1 ttl=255 time=105 ms
64 bytes from 10.10.101.3: icmp_seq=2 ttl=255 time=0.210 ms
64 bytes from 10.10.101.3: icmp_seq=3 ttl=255 time=110 ms
64 bytes from 10.10.101.3: icmp_seq=4 ttl=255 time=0.211 ms
64 bytes from 10.10.101.3: icmp_seq=5 ttl=255 time=110 ms
64 bytes from 10.10.101.3: icmp_seq=6 ttl=255 time=0.197 ms

--- 10.10.101.3 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 545ms
rtt min/avg/max/mdev = 0.197/54.369/110.191/54.188 ms, pipe 2

@mmahadevan108 ping.. any ideas?

I will take a look and let you know.

@mmahadevan108 ping.. Did you manage to reproduce the error?

@unsignedint I am unable to replicate the issue. I ran a K66F and MXRT1050 at the same time and do not see any packets dropped when running the ping test.

I was running the mbed-os master branch at commit 1a8844e8ed9e6766bfa57c67d8147f9dfeb19135.

Could you provide exact details of what you were running: release version, target, HW?

GCC build?

ARM build. I can try a GCC_ARM build and let you know.

Thanks, that's what we are using...

We are using the following compiler:

arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]
Copyright (C) 2016 Free Software Foundation, Inc.

Below is what I am using
arm-none-eabi-gcc.exe (GNU Tools for Arm Embedded Processors 7-2018-q2-update) 7.3.1 20180622 (release)
[ARM/embedded-7-branch revision 261907]
Copyright (C) 2017 Free Software Foundation, Inc.

I am not able to reproduce the ping failure that @unsignedint mentioned using GCC_ARM. I am running the app on the K66F and MXRT1050.

On K66F:
64 bytes from 10.81.16.55: icmp_seq=314 ttl=255 time=0.195 ms
64 bytes from 10.81.16.55: icmp_seq=315 ttl=255 time=0.198 ms
64 bytes from 10.81.16.55: icmp_seq=316 ttl=255 time=0.207 ms
64 bytes from 10.81.16.55: icmp_seq=317 ttl=255 time=0.187 ms
64 bytes from 10.81.16.55: icmp_seq=318 ttl=255 time=0.180 ms

--- 10.81.16.55 ping statistics ---
318 packets transmitted, 318 received, 0% packet loss, time 324558ms
rtt min/avg/max/mdev = 0.170/0.190/0.394/0.016 ms

On MXRT1050:
64 bytes from 10.81.16.248: icmp_seq=309 ttl=255 time=0.259 ms
64 bytes from 10.81.16.248: icmp_seq=310 ttl=255 time=0.255 ms
64 bytes from 10.81.16.248: icmp_seq=311 ttl=255 time=0.268 ms
^C64 bytes from 10.81.16.248: icmp_seq=312 ttl=255 time=0.274 ms
64 bytes from 10.81.16.248: icmp_seq=313 ttl=255 time=0.259 ms
64 bytes from 10.81.16.248: icmp_seq=314 ttl=255 time=0.270 ms

--- 10.81.16.248 ping statistics ---
314 packets transmitted, 314 received, 0% packet loss, time 320478ms
rtt min/avg/max/mdev = 0.243/0.262/0.647/0.031 ms

Ok, could we get access to your exact app, so we can find out what the difference is?

For the record: @unsignedint and I are on opposite sides of the globe, but we work together.

For now the only difference I see is that we may have a different setup/app or compiler, so I would like to rule out as many differences as possible and see if I can solve this...

This zip file has binaries for MXRT1050 & K66F
Issue9420.zip

Source for this build is below
main_cpp.txt

For MXRT1050, I have an EVKB board.

Ok, so I will do mbed new, add your app, and change to your commit above in mbed-os.lib.
If I still see the error I guess I will build with your compiler on Windows... :)
Thanks for trying, me or @unsignedint will come back with our results..

Can you try the binary I sent as well? Thank you.

I will try right away to rule out any HW issues...

Is the binary for HyperFlash?
We run QSPI NOR, so I will have to do some soldering to try it if it is for HyperFlash...

Yes, this is for HyperFlash

I can confirm that your binary for MXRT1050 works. I will start from source now and test all known differences until I find the problem and report back here.

I found part of the problem: When we found this issue we switched from our HW to the EVK, but failed to remove one thing we had changed. In our product we don't have the SDRAM so we have changed the memory configuration.

That said, I still don't get it to work without SDRAM, maybe you know what the problem is?
My understanding is that the Ethernet descriptors (rx/tx_desc_start_addr) need to go in an uncacheable area.

In our application we use DTC_RAM (cached) for most data, but OC_RAM (uncached) for the rx/tx descriptors. To simplify things I have created a minimal patch which causes the problem. The error is not quite as frequent as in our application, but definitely not rare.

The following patch should cause the problem. We would appreciate if you could help us find out what is wrong with it.

This patch uses only OCRAM for data and it is all uncached for simplicity.

0001-Ethernet-fails-with-no-DDR.patch.txt

Have you added an entry to make OCRAM uncached inside BOARD_ConfigMPU(), i.e. for Region 6 change the IsCacheable bit to 1?
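
For reference, the region entries in BOARD_ConfigMPU() are built with the CMSIS-Core MPU macros. A minimal sketch, assuming the CMSIS v5 helpers and a typical OCRAM base address; the real board file may use different region numbers and attributes:

// ARM_MPU_RASR(DisableExec, AccessPermission, TypeExtField, IsShareable,
//              IsCacheable, IsBufferable, SubRegionDisable, Size)
MPU->RBAR = ARM_MPU_RBAR(6, 0x20200000U);   // region 6: OCRAM base (assumed)
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 1, 1, 0,
                         ARM_MPU_REGION_SIZE_256KB);   // C=1, B=1: cacheable write-back

The fifth argument is the IsCacheable bit being discussed here.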

Yes, but please tell me if I did it right. It is all in the patch in the previous message. It is very short

Hmm, I'm not sure I understand you... Should I set IsCacheable to 1? That means enabling the cache, right?

The line below places the buffer descriptors in a non-cacheable section that is defined in the linker script:
https://github.com/ARMmbed/mbed-os/blob/master/features/netsocket/emac-drivers/TARGET_NXP_EMAC/TARGET_IMX/imx_emac.cpp#L187

You may have to update this for your scenario.

That's what I have done. In our code I have all our data except those two buffers in DTC_RAM with cache, and those two in OCRAM without cache...

In order not to complicate things for you, I tried to create another scenario with the patch above, which puts all data, including the two descriptor buffers, into OCRAM without cache.

However, I can see now that I failed to do it correctly; it just happened to give the same error. I will clean it up...

I have cleaned up my error. Now all data is in OCRAM and it is uncached, but it still does not work.
I have copied the settings which were used for the ncache area of the SDRAM to region 6 and then changed the linker file to put all data in that region. See the attached patch:

0001-All-data-in-OCRAM.patch.txt

@jonaslindahl Do you think you could open a PR for the patch?

@cmonr the patch file is a minimal set of changes to the linker and Mbed initialisation for the RT1050 target with SDRAM disabled. This reproduces the Ethernet error discussed in this issue. Hoping to get some feedback from @mmahadevan108 ...

I'm not sure a PR is appropriate (at least until we have a solution; then we could possibly add an #ifdef for using SDRAM or not). Cheers.

Hi @cmonr, @unsignedint is exactly right: the patch is not supposed to be a PR, only a way for me to show how to trigger the error (bug?).

@mmahadevan108 Yesterday I tried another method where I left all regular data in SDRAM and only moved the descriptors (the ncache section) to OCRAM instead of SDRAM. Doing so, I used the exact same MPU setup for the OCRAM as was used for the ncache area of the SDRAM. Still this does not work. I suspect that either the MPU config or the implementation is at fault here, and that it simply does not show up in the SDRAM case due to timing.

I'm not quite experienced enough with Arm MPUs to be sure here, but I have a theory that this region should be shareable. In my understanding, if the area is not shareable, the code itself must handle coherency between multiple bus masters. I can't see this being done in the code, but I'm not sure...

Quote from: ARM® Cortex®-M7 Devices - Generic User Guide

If multiple bus masters can access a non-shareable memory region,
software must ensure data coherency between the bus masters.

I will test setting the area to shareable and non cached today and report back.
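
For what it's worth, the alternative the quote describes (software-managed coherency) would mean explicit cache maintenance around every descriptor handoff. A hypothetical sketch using the CMSIS-Core cache functions, with made-up ring names; this is not what the current driver does for the descriptors:

// Before handing a descriptor to the ENET DMA engine, flush the CPU's copy:
SCB_CleanDCache_by_Addr((uint32_t *)&tx_desc[i], sizeof(tx_desc[i]));
// After DMA signals reception, discard stale cache lines before the CPU reads:
SCB_InvalidateDCache_by_Addr((uint32_t *)&rx_desc[i], sizeof(rx_desc[i]));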

I have managed to get it to work in OCRAM, but I'm not sure it should be like this. Maybe you can enlighten me, @mmahadevan108.

To get it to work I had to do the following (see the sketch after this list):
1) Set the area used for all data (except the Ethernet descriptors) to shareable. (This was the case with the SDRAM setup and I had missed it; I guess it is needed since the data buffers are in this section.)
2) Set the area used for the Ethernet descriptors to "Strongly Ordered". It was the only combination I found that worked. The original SDRAM setup used not-shareable, not-cached, but that does not work in OCRAM.
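
A sketch of the two working region configurations, using the CMSIS-Core macros (region numbers, base addresses, and sizes are assumptions for illustration):

// General data in OCRAM: normal memory, non-cacheable, shareable (TEX=1, S=1, C=0, B=0)
MPU->RBAR = ARM_MPU_RBAR(6, 0x20200000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 1, 1, 0, 0, 0, ARM_MPU_REGION_SIZE_256KB);

// Ethernet descriptor area: Strongly Ordered (TEX=0, C=0, B=0; always treated as shareable)
MPU->RBAR = ARM_MPU_RBAR(7, 0x2023C000U);
MPU->RASR = ARM_MPU_RASR(0, ARM_MPU_AP_FULL, 0, 0, 0, 0, 0, ARM_MPU_REGION_SIZE_16KB);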

Please help me figure out why the same setup does not work for OCRAM as for SDRAM. To me it sounds like something is wrong.

My working solution for OCRAM can be found in the following patch:
0001-Works-with-OCRAM-no-SDRAM-used.patch.txt

@mmahadevan108 - can you comment on the status of this? It looks like the issue is resolved with the configuration changes mentioned in the last update. Have you reviewed it to see if you can make updates that allow this configuration to be supported as an option?

Fixed now.
