Mbed-os: nRF SDK15 and USB Support

Created on 2 Aug 2018 · 107 Comments · Source: ARMmbed/mbed-os

Description

Update - USB Implementation Done - Pull Request Merged

I have finished the implementation for the USBPhy on the nRF52840 platform. I have put in a pull request so hopefully it should be released in master soon.

You can check out my development branch here.

Please note the development branch has been updated to work with Nordic SDK version 15!

USB Basic Tests

| test case | status |
|--------------------------------------------|-------------------------|
| usb control basic test | :heavy_check_mark: PASSING |
| usb control stall test | :heavy_check_mark: PASSING |
| usb control sizes test | :heavy_check_mark: PASSING |
| usb control stress test | :heavy_check_mark: PASSING |
| usb device reset test | :heavy_check_mark: PASSING |
| usb soft reconnection test | :heavy_check_mark: PASSING |
| usb repeated construction destruction test | :heavy_check_mark: PASSING |
| endpoint test data correctness | :heavy_check_mark: PASSING |
| endpoint test halt | :heavy_check_mark: PASSING |
| endpoint test parallel transfers | :heavy_check_mark: PASSING |
| endpoint test parallel transfers ctrl | :heavy_check_mark: PASSING |
| endpoint test abort | :heavy_check_mark: PASSING |
| endpoint test data toggle reset | :heavy_check_mark: PASSING |

Serial USB Test

| test case | status |
|--------------------------------------------|-------------------------|
| cdc usb reconnect | :heavy_check_mark: PASSING |
| cdc rx single bytes | :heavy_check_mark: PASSING |
| cdc rx single bytes concurrent | :heavy_check_mark: PASSING |
| cdc rx multiple bytes | :heavy_check_mark: PASSING |
| cdc rx multiple bytes concurrent | :heavy_check_mark: PASSING |
| cdc loopback | :heavy_check_mark: PASSING |
| serial usb reconnect | :heavy_check_mark: PASSING |
| serial terminal reopen | :heavy_check_mark: PASSING |
| serial getc | :heavy_check_mark: PASSING |
| serial printf/scanf | :heavy_check_mark: PASSING |
| serial line coding change | :heavy_check_mark: PASSING |

Table last updated 3/15/2019

Original Description

Hi,
I'm interested in moving up to the nRF52840 for a future project. The current nRF SDK supported by mbed is 14.2. The update for this was completed fairly recently. It looks like mbed is planning renewed support for the USB (see feature branch here: https://github.com/ARMmbed/mbed-os/tree/feature-hal-spec-usb-device), which would be great to have on the nRF52840.

The nRF SDK14.2 documentation lists the USB driver (and support for the nRF52840 in general) as "experimental" and not "production quality". Is there any plan to start updating to nRF SDK15 for production quality support for the nRF52840 and USB library? Might be something I'd like to help with.

@marcuschangarm have you looked into this? I vaguely remember that quite a bit changed in the structure of some drivers from v14.2 to v15.

Issue request type

[ ] Question
[X] Enhancement
[ ] Bug

CLOSED nordic mirrored


AGlass0fMilk

👍2

Most helpful comment

Hi @Matheus-Garbelini, I couldn't get that code to compile but I think it's not working because of this:

void global_timer_IRQ(void)
{
  timer.attachInterrupt(&global_timer_IRQ, 1000); // microseconds
  ++milliseconds;
}

I'm not sure where timer is defined. I also can't find a method attachInterrupt in Mbed sources (at least with github search). I think this is crashing because you are causing a recursion loop of never ending interrupts essentially.

Pull my example repo again, I have updated it to do essentially what your code looked like it was intended to do. There should be no issue using USB with any of the other on-board peripherals.

Try some simpler examples to get started with Mbed's lower level drivers. I recommend using the RTOS and EventQueue for anything more complicated than a single-threaded, Arduino-style application.

AGlass0fMilk on 13 Jun 2019

😄1 👍1

All 107 comments

Any word @marcuschangarm? I've started to work a bit with SDK V15 and it seems easy to switch over, however it would be quite mundane...

I'd be willing to help :)
[Mirrored to Jira]

AGlass0fMilk on 18 Aug 2018

Also interested in SDK 15
[Mirrored to Jira]

DL6AKU on 26 Sep 2018

Internal Jira reference: https://jira.arm.com/browse/IOTDEV-1591

adbridge on 4 Oct 2018

I have started my own implementation of USBPhyHw for the nRF52840 target. You can check out my code here:
https://github.com/AGlass0fMilk/mbed-os/tree/nrf52840-usb-hal

AGlass0fMilk on 12 Nov 2018

The current implementation is not functioning yet. I am having issues with hard faults when connecting USB. I am also not sure if my implementation completely follows the HAL spec.

If anyone could help it would be appreciated. I would love to see USB support for the nRF52840 out of the box for Mbed.

AGlass0fMilk on 12 Nov 2018

🎉1

I added code from SDK 15.2 to fix the hang when enabling USB. The issue was related to errata 171 and 187 in the nRF52840 production IC.

I suspect the hard fault on reset when USB is already plugged in is caused by some USB interrupt happening before the rest of the USB stack has been initialized. There are a few other issues that are keeping it from working.

AGlass0fMilk on 12 Nov 2018

> I suspect the hard fault on reset when USB is already plugged in is caused by some USB interrupt happening before the rest of the USB stack has been initialized. There are a few other issues that are keeping it from working.

Do you keep track of issues you are having ?

cc @c1728p9 (might be interested in this)

0xc0170 on 12 Nov 2018

> > I suspect the hard fault on reset when USB is already plugged in is caused by some USB interrupt happening before the rest of the USB stack has been initialized. There are a few other issues that are keeping it from working.
>
> Do you keep track of issues you are having ?
>
> cc @c1728p9 (might be interested in this)

I have been in my head so far :) I just started this over the weekend so I haven't had much time to formally write up issues. I'll document them here shortly.

AGlass0fMilk on 12 Nov 2018

@TacoGrandeTX

marcuschangarm on 12 Nov 2018

I have been basing the USBPhyHw implementation for the nRF52840 off of this example from the Nordic SDK. It shows how to use the raw USBD HAL/Driver without Nordic's "class"-based USB profile system.

I'm in the middle of refreshing myself on USB in a NutShell and have pretty much only dealt with chip-level issues at this point so there's lots of room for improvement...

Here are the issues with the implementation that I have noticed:

First Issue

Status: Resolved

Description: When enabling the USBD peripheral by calling nrf_drv_usbd_enable the MCU would hang on the following loop:
https://github.com/ARMmbed/mbed-os/blob/0404701b5f5b9d2aff76432991c74a36bd80742c/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_14_2/drivers_nrf/usbd/nrf_drv_usbd.c#L1706-L1709

The READY event that was supposed to occur to break this loop would never happen. This was due to nRF52840 errata 171 (USB might not power up) and 187 (USB cannot be enabled).

Solution: The workarounds for these issues are included in nRF SDK 15.2 (the most recent release) so I added them in to the nrf_drv_usbd.c driver included with mbed.

Second Issue

Status: Resolved

Description: When USB is already plugged into the target and a reset occurs (I saw this while resetting via debugger), Mbed will die. This occurs very early in the startup process (before the stdout UART has time to be configured) so I think it is related to a USB or Power interrupt. I'm guessing the interrupt occurs before the USB stack has been fully initialized (or perhaps before the static instance has been populated) and probably calls a function on some NULL pointer or something. I still have to look into this one a bit more.

Possible Solutions: Currently, the USBPhyHw_Nordic implementation uses the same logic as this example from the Nordic SDK. This means the usbd peripheral is enabled and started, or stopped and disabled, when USB-related power events happen. There are no flags to make sure the USB stack is ready to accept events from the USBPhyHw at this point, which I think could help solve this problem.

Actual Solution: My assumption above was correct. I simply added a flag that conditionalized whether the stack had called connect yet and was therefore ready to accept events. This prevents the USB event handlers from calling functions on NULL pointers or uninitialized USB objects.
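The readiness flag described above can be sketched roughly as follows. This is a minimal host-side illustration, not the code from the branch: `GatedPhy`, `EventHandler`, and `on_hw_event` are hypothetical names, and the "interrupt" is simply a function call.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a PHY layer; not the real USBPhyHw interface.
class GatedPhy {
public:
    using EventHandler = std::function<void(int)>;

    // Store the upper-layer callback. Events are still dropped after this.
    void init(EventHandler handler) { handler_ = std::move(handler); }

    // The upper stack calls connect() once it is ready to receive events.
    void connect() {
        assert(handler_ && "init() must be called before connect()");
        ready_.store(true);
    }

    void disconnect() { ready_.store(false); }

    // Called from the (simulated) interrupt context.
    // Returns true if the event was forwarded, false if it was dropped.
    bool on_hw_event(int event_id) {
        if (!ready_.load() || !handler_) {
            return false;  // stack not ready: never touch uninitialized objects
        }
        handler_(event_id);
        return true;
    }

private:
    EventHandler handler_;
    std::atomic<bool> ready_{false};
};
```

An event that arrives before `connect()` is simply dropped, which is the behaviour the fix relies on to avoid calling into NULL or half-constructed objects.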

Third Issue

Status: Resolved-ish

Description: USB events aren't synchronized and cause failures in the Nordic driver during the USB setup stage.

I kind of just slapped nrf_drv_usbd calls inside their counterparts in the USBPhyHw HAL API and hoped it would somewhat work. After spending a few hours debugging the first issue up there, I finally got to the point where USB-related events were being sent to the upper layers of the stack. Where I'm at now is that the usbd peripheral is successfully enabled, receives setup events from the host and sends them to the upper level of the stack. I believe when I was debugging, the place where the failure occurred was here:
https://github.com/ARMmbed/mbed-os/blob/0404701b5f5b9d2aff76432991c74a36bd80742c/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_14_2/drivers_nrf/usbd/nrf_drv_usbd.c#L1954-L1962

So some synchronization needs to be introduced (should've listened to the HAL...) to prevent this check from failing.

Actual Solution: The nRF52840 USB hardware actually needs to know a lot about the control transfer stages the upper layers of the stack are currently in. This allows the hardware to automatically NAK packets that aren't in the correct sequence. It also complicates the HAL implementation a bit, but oh well. The fix was to keep track of the control transfer stages and trigger "hardware tasks" (you'll know what I mean if you're familiar with nRF chips) at the appropriate time, letting the hardware know when the OUT data stage is allowed, or the status stage is allowed, etc. Still a work in progress at this point.
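To make the stage tracking concrete, here is a small sketch of how the stage that follows a SETUP packet can be derived from the standard setup fields. The `SetupPacket` layout follows the USB 2.0 setup data format; `NextStage` and `stage_after_setup` are illustrative names, and the real implementation would trigger the matching hardware task at each point rather than return an enum.

```cpp
#include <cassert>
#include <cstdint>

// Which control-transfer stage the hardware should be told to expect next.
enum class NextStage { DataIn, DataOut, Status };

// The 8-byte USB setup packet fields (names as in the USB 2.0 spec).
struct SetupPacket {
    std::uint8_t  bmRequestType;
    std::uint8_t  bRequest;
    std::uint16_t wValue;
    std::uint16_t wIndex;
    std::uint16_t wLength;
};

inline NextStage stage_after_setup(const SetupPacket &setup) {
    if (setup.wLength == 0) {
        return NextStage::Status;  // no data stage at all
    }
    // Bit 7 of bmRequestType is the direction: 1 = device-to-host (IN).
    return (setup.bmRequestType & 0x80) ? NextStage::DataIn
                                        : NextStage::DataOut;
}
```

A GET_DESCRIPTOR request (bmRequestType 0x80, non-zero wLength) yields `DataIn`, while SET_CONFIGURATION (wLength 0) goes straight to `Status`.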

AGlass0fMilk on 13 Nov 2018

@AGlass0fMilk, @DL6AKU - We'll soon be releasing SDK15.0 peripheral driver support as a feature branch. This first release won't have USB support but that is on our backlog.

TacoGrandeTX on 15 Nov 2018

@TacoGrandeTX I will see what I can do in terms of getting a reliable USB HAL implementation working then. I don't think much changed in terms of functionality between SDK14.2 and SDK15 so my code should at least give you guys a place to start.

Hopefully I can take USB off your backlog :smiley:

AGlass0fMilk on 15 Nov 2018

🎉1

@AGlass0fMilk Thank you so much - that will be great! We also look forward to your comments on the feature release.

TacoGrandeTX on 15 Nov 2018

I have been working on the nRF52840 USBDevice HAL in my spare time for a bit now.

I ended up finding a good deal on a Total Phase Beagle 12 USB hardware analyzer so I picked that up and it's been a great help in debugging. It's very difficult to debug USB without knowing what's really happening on the bus... so that took a few days to come in.

The stage I'm at is this:

The USBPhyHw implementation passes setup packets properly to the upper layers of the stack. Enumeration goes fine, at least for getting the device descriptor and then it hangs at the next control transfer.

I have an STM32F407 Discovery board on hand so I compiled my code for that target (which already has USB Device support working). This let me see what the full transaction was supposed to look like.

My laptop's USB host asks for the first 9 bytes of the configuration descriptor. The USB device responds, letting the host know the full configuration descriptor size is 75 bytes. Then the host requests the full 75-byte configuration descriptor. The device sends the first 64 bytes (the max packet size) and then the host asks for the rest of the descriptor with another IN packet.

The trouble is this:
Mbed's USB implementation already splits transactions up into smaller transfers that do not exceed the max packet size (64 bytes). I found that Nordic's underlying call to nrf_drv_usbd_ep_transfer also seems to do this and expects the full packet length (75 bytes) provided as an input. When the host asks again for the last 11 bytes of the descriptor, the Nordic HAL driver is confused since it thinks it has already sent all data for this setup transaction. EP0 then stalls...
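The splitting described above can be illustrated with a short sketch. `split_into_packets` is a hypothetical helper, not a function from Mbed or the Nordic SDK; it only shows how a 75-byte descriptor turns into a 64-byte packet followed by an 11-byte packet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the sizes of the packets needed to send `total` bytes
// over an endpoint whose max packet size is `max_packet`.
std::vector<std::size_t> split_into_packets(std::size_t total,
                                            std::size_t max_packet) {
    assert(max_packet > 0);
    std::vector<std::size_t> sizes;
    while (total > 0) {
        const std::size_t n = std::min(total, max_packet);
        sizes.push_back(n);
        total -= n;
    }
    return sizes;
}
```

If both layers perform this split, the lower layer only ever sees the first 64-byte piece and has no way to know that 11 more bytes belong to the same transfer, which matches the stall seen on EP0.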

It seems I must implement my own nrf_drv_usbd_feeder_t to implement transfers that are facilitated by mbed. Not sure if I will have to do the same for OUT transfers at this point.

AGlass0fMilk on 29 Nov 2018

Here's a screenshot of my USB analyzer's output showing enumeration with the STM32 (left, functioning correctly) and the nRF52840 (right, issue with configuration descriptor transfer):
[Image: stm32-vs-nrf]

AGlass0fMilk on 29 Nov 2018

After looking through both the Mbed stack and the nRF USBD driver, it seems they aren't easily compatible.

The Nordic driver uses the concept of "feeders" where every time a DMA transfer is finished, a feeder function is called to set up the next DMA transaction. During this call, the feeder returns true or false, representing if this is the last transfer or not. This allows the DMA to be released.
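As a rough illustration of the feeder idea, the sketch below hands out one endpoint-sized chunk per call. It is deliberately not the Nordic API: `Chunk` and `BufferFeeder` are made-up names, and the convention that `feed` returns true while more data remains is an assumption for this example, so check `nrf_drv_usbd_feeder_t` in the SDK for the real signature and semantics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One piece of data handed to the (simulated) DMA engine.
struct Chunk {
    const std::uint8_t *data;
    std::size_t size;
};

// Hypothetical feeder over a flat buffer.
class BufferFeeder {
public:
    BufferFeeder(const std::uint8_t *buf, std::size_t len)
        : ptr_(buf), left_(len) {}

    // Fills `next` with the next chunk (at most ep_size bytes).
    // Returns true if more data remains after this chunk (assumed convention).
    bool feed(Chunk &next, std::size_t ep_size) {
        assert(ep_size > 0);
        const std::size_t n = std::min(left_, ep_size);
        next = Chunk{ptr_, n};
        ptr_ += n;
        left_ -= n;
        return left_ > 0;
    }

private:
    const std::uint8_t *ptr_;
    std::size_t left_;
};
```

The key point is that the feeder owns the knowledge of the total transfer length, which is exactly the information the Mbed stack does not pass down to the PHY layer.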

Mbed's USB stack does not expose the total size of the transaction to the USBPhyHw layer, nor does it let the layer know whether this is the last transfer.

I'm playing with a few ideas on how to hook these two systems up. I'm not really liking any of them.

Let me know what you think:

Idea 1.) This idea involves changing the USBPhyHw interface to add a bool flag argument to ep0_write and endpoint_write to tell the hardware layer this is the last transfer in a series of transfers, and that it may release the DMA afterwards. This would obviously make my code a lot simpler but I don't like the idea of changing an upper layer due to the needs of a lower layer. Though it kind of does make sense to let the hardware know when it may release DMA resources... let me know if this would be possible to add

Idea 2.) Collect the full packet in ep0_write and endpoint_write. This would involve somewhat "tricking" the upper layer into giving the USBPhyHw the full-length packet by simulating IN tokens from the host. Kind of hacky, I don't really like this idea. Not even sure if it's possible. Also would have to allocate more RAM, which isn't very great.

Idea 3.) Modify the Nordic driver to allow the DMA to be released externally (essentially set a flag telling the DMA code that the endpoint no longer needs it). I think the USBPhyHw would still need to know when the last packet was being sent to do this, so it may not be possible without idea 1. The caveat: If the device wants to transfer exactly what the host requested and it's a multiple of the endpoint size, there's no way to know what the last packet is based on chunk size. That's why zero-length packets exist.

Any ideas? I think I like idea 1 out of all three. It seems to be the most straightforward for me and doesn't really impact the other supported implementations. Other targets that don't need this information can ignore the flag. Upper layers of the stack know when they are sending the last packet as well, so it doesn't add much complexity there either.
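The caveat about full-size final packets can be made concrete with the usual control IN rule: a zero-length packet terminates the data stage when the device returns less than the host asked for and the data ends exactly on a packet boundary. The helper below is a sketch of that rule only; `needs_zero_length_packet` is a hypothetical name and is not part of Mbed or the Nordic driver.

```cpp
#include <cassert>
#include <cstddef>

// data_len:      bytes the device actually has to send
// requested_len: wLength from the setup packet
// max_packet:    max packet size of the control endpoint
inline bool needs_zero_length_packet(std::size_t data_len,
                                     std::size_t requested_len,
                                     std::size_t max_packet) {
    assert(max_packet > 0);
    // A short (or zero-length) packet tells the host the data stage is over.
    // If the data is shorter than requested but ends on a full packet,
    // nothing short has been sent yet, so an explicit ZLP is required.
    return data_len < requested_len && (data_len % max_packet) == 0;
}
```

This is why packet size alone cannot tell a PHY layer whether a 64-byte packet is the last one: with `data_len == requested_len == 128` the transfer ends after the second full packet, while with `data_len == 128` and `requested_len == 255` a zero-length packet still follows.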

Let me know your thoughts @TacoGrandeTX

Edit:

Another solution is to just roll my own simplified USB driver based on the Nordic HAL that is compatible with the Mbed stack. Mbed seems to expect the USBPhyHw implementation to be a set of "dumb pipes", and Nordic's driver is a bit more than that. Based on the complexity of the Nordic driver (and the numerous, unintelligible errata workarounds) I'm not really liking this idea either.

AGlass0fMilk on 29 Nov 2018

After some closer reading, it seems the main issue is the hardware must be notified when a status Stage is allowed during a control transfer. So it should be possible to let the Nordic driver know when all the control transfer IN data has been sent. I don’t think it will be a problem for other endpoints 🤞

AGlass0fMilk on 30 Nov 2018

The Nordic hardware needs more synchronization to the control transfer stages than I first thought. So my PhyHw implementation had to keep track of control transfer sizes and trigger the appropriate synchronization events in the hardware (telling the HW when control transfer Data OUT stages were allowed, when the status stage was allowed, etc).
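A minimal sketch of the size tracking mentioned above, assuming the PHY layer records the expected data-stage length when the setup packet arrives. `ControlDataTracker` is a hypothetical name; in a real port the point where `packet_done` returns true is where the hardware would be told that the status stage is allowed.

```cpp
#include <cassert>
#include <cstddef>

// Tracks how much of a control transfer's data stage is still outstanding.
class ControlDataTracker {
public:
    // Call when a setup packet with a data stage of `total` bytes arrives.
    void start(std::size_t total) { remaining_ = total; }

    // Call after each data packet of `n` bytes completes.
    // Returns true once the whole data stage has been transferred.
    bool packet_done(std::size_t n) {
        assert(n <= remaining_);
        remaining_ -= n;
        return remaining_ == 0;
    }

    std::size_t remaining() const { return remaining_; }

private:
    std::size_t remaining_ = 0;
};
```

For the 75-byte configuration descriptor, the first 64-byte packet leaves 11 bytes outstanding and only the second packet completes the data stage.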

After using my nifty Beagle 12 USB analyzer, I worked out the bugs enough to get a basic USB serial example working!
[Image: screenshot from 2018-12-05 21-00-38]

I just got this working, so by no means is it tested or production ready. But if anyone would like to start using USB on the nRF52840 this is a starting point. Please report any bugs you encounter back to this issue.

I'm sure there are better ways to connect the Mbed HAL layer with the Nordic driver, so please give me feedback on my implementation (see it here).

Gonna go test some of the other built-in device types :grin:

AGlass0fMilk on 6 Dec 2018

🎉2

@TacoGrandeTX should I submit a pull request to the feature branch at this time?

AGlass0fMilk on 6 Dec 2018

Impressive. @c1728p9, how does this analysis compare with the changes/updates/fixes you made recently?

cmonr on 6 Dec 2018

Hi @AGlass0fMilk awesome work so far! I would definitely recommend submitting a PR. The implementation will need to pass the mbed-os USB tests before it can be accepted though. You can run the tests locally with the command:
mbed test -t GCC_ARM -m NRF52840_DK -n mbed-os-tests-usb_device-*

c1728p9 on 6 Dec 2018

@c1728p9 What kind of setup is required to run the USB device tests on Ubuntu? Are there any drivers needed? None of the basic USB device tests passed.

AGlass0fMilk on 7 Dec 2018

Hi @AGlass0fMilk if you are getting a permission error you may need to add the account running the test to the plugdev and dialout groups, or alternatively run the command with sudo. Aside from that there shouldn't be any special setup required on Ubuntu. Just connect both USB ports and run the mbed test. As a stable reference you could try running the tests on any of the other supported targets on the feature branch.

c1728p9 on 7 Dec 2018

My user is in both dialout and plugdev groups. No permissions issues I can see.

Unfortunately the STM32-based target I have on hand right now is the DISCO-F407VG, which I flashed with JLINK software (converts STLINK to JLINK, super great). It isn't an "officially" supported mbed board either, but the target is there. This means it does not have a target id on-board and mbedls doesn't find it... it is surprisingly hard to figure out how to test a new board that does not have an mbed ID.

Anyway, are there any extra flags required to run a driver script on the host side? From just poking around I've found scripts that seem to be custom python USB drivers for running these tests.

To me it seems like something times out so the tests fail. I also don't see any traffic on the bus beyond basic enumeration and configuration...

[Image: screenshot from 2018-12-08 23-28-09]

AGlass0fMilk on 9 Dec 2018

I tried the nRF52840 tests again with the -vv flag. It appears the host test is running but it fails to find the device by serial number. See output in this gist:

https://gist.github.com/AGlass0fMilk/c382d00ba81f3a539ec41683d6bfb7a2

AGlass0fMilk on 9 Dec 2018

adding a board to mbed-ls isn't that hard. Initially, I just edited my local copy of platform_database.py. mbed-ls will show the detected id, even if it won't use it properly for the mbed command.

jrobeson on 9 Dec 2018

I have an STM32F401 Nucleo board that is supported by mbed, I don't have it on hand at the moment though. I'll hopefully be able to test with that later.

I think I sorted out the permissions issues by adding a udev rule (in /etc/udev/rules.d/99-mbed-test-usb.rules) as below:
SUBSYSTEM=="usb", GROUP="dialout", MODE="0660"
It could be more specific but this worked for me, please note that it affects permissions for ALL USB devices plugged into your machine. For anyone referencing this in the future, you also need to reload the udev rules: sudo udevadm control --reload and make sure your user is in the dialout group (check the output of groups). If you need to add yourself to that group, sudo adduser <your_username> dialout and then logout/login for it to take effect.

I also changed the way find_device finds the test device through pyusb. I changed it to look for devices with the vendor id 0x0D28... not sure if this was a different issue or also related to permissions. Pyusb had trouble reading string descriptors because the Mbed test device had no "langid".

Anyway, my implementation appears to fail the first test when configured->deconfigured->reconfigured. So I will start tracking down bugs now.

AGlass0fMilk on 10 Dec 2018

So the device is failing to reconfigure. I have narrowed it down to a call into the Nordic driver that returns NRF_ERROR_BUSY. You can see the device reports the failed setting of the configuration in the screenshot below:
[Image: screenshot from 2018-12-10 21-19-59]

It seems to be caused by USBTester::setup_iterface:
[Image: screenshot from 2018-12-10 21-17-07]

The consecutive read after having just added the endpoint causes the Nordic driver to return that it's busy. If I remove the immediate read the configuration gets set successfully.

Why is the immediate read necessary?

AGlass0fMilk on 11 Dec 2018

Hi @AGlass0fMilk some devices nak out packets until a read is started on the endpoint. On these devices read_start must be called to start receiving data. Often an interface is ready to receive data as soon as its endpoints are added so you'll often see a read_start after adding an out endpoint. For example the serial port USB class, USBCDC, does that here.

c1728p9 on 19 Dec 2018

Slowly working through test failures when I get the time... the USB test suite seems pretty thorough!

Currently passing the first two tests. Working on getting it to pass the third test (usb control sizes test).

AGlass0fMilk on 23 Dec 2018

@c1728p9 I'm hitting an assert in the Nordic driver during the endpoint test data correctness test. See the assert below (line 1921 of the Nordic driver):

https://github.com/AGlass0fMilk/mbed-os/blob/ceb2bc064ee74e1345fc090b426293705419f423/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_14_2/drivers_nrf/usbd/nrf_drv_usbd.c#L1918-L1921

After some debugging, it seems the endpoint tester class is set up to create ISO endpoints with a max packet size of 0 bytes. This doesn't seem like it follows the USB specification. If this is not a mistake, what is the intended behavior of the USB hardware implementation in this case? (see snippet below)

https://github.com/ARMmbed/mbed-os/blob/0e5dd39264e391be045e87ad505a86fd84bcd91b/TESTS/usb_device/basic/USBEndpointTester.cpp#L61-L85

AGlass0fMilk on 27 Dec 2018

@c1728p9 @fkjagodzinski Would y'all happen to know an answer to the above question?

cmonr on 27 Dec 2018

A few questions:

1.) Is there a place where general Mbed USB stack development conversation should go? Like a forum post for feature development or something?

2.) Immediate Read During Configuration

@c1728p9 concerning the immediate read a few comments ago: I was looking into how to solve this compatibility issue, thinking I may be able to use the configure method of USBPhyHw. I read through the code flow a bit more and the read_start call happens during the user/application set_configuration callback, before the USB stack has had a chance to call USBPhyHw::configure().

So when the USBPhyHw receives an endpoint read request on a just-added endpoint, it has not yet had configure called on it telling it the new configuration is valid and that it should be ready to respond to endpoint events appropriately.

This probably would complicate a lot of the existing device classes if they aren't allowed to queue read/write requests during configuration. However, I don't think it follows logically to execute requests on endpoints before the hardware driver has been fully configured.

Thoughts on this?

I'm about ready to scrap the nordic transfer logic and write my own that is more compatible with the mbed structure.

3.) Flow Diagram Request

An execution flow diagram would be _extremely_ handy during the porting process. There are so many nested function calls it becomes hard to follow exactly when and where USBPhyHw, USBDevice, and user implementation code is called.

I'm going to start working on one on draw.io here: https://drive.google.com/file/d/1-4Hu2yBPtXBtGy_E2vj5e6xscTX4AmO_/view?usp=sharing

Send me an edit request if you want to add to it.

AGlass0fMilk on 28 Dec 2018

> After some debugging, it seems the endpoint tester class is set up to create ISO endpoints with a max packet size of 0 bytes. This doesn't seem like it follows the USB specification. If this is not a mistake, what is the intended behavior of the USB hardware implementation in this case? (see snippet below)

Hi @AGlass0fMilk, this is not a mistake. In fact, the USB spec prohibits non-zero wMaxPacketSize values for iso endpoints in the default interface altsetting. Regarding all other altsettings, 0 is a placeholder value until the tests are updated.

A quote from USB 2.0 spec, paragraph 5.6.3:

> All device default interface settings must not include any isochronous endpoints with non-zero data payload sizes (specified via wMaxPacketSize in the endpoint descriptor). Alternate interface settings may specify non-zero data payload sizes for isochronous endpoints.

fkjagodzinski on 28 Dec 2018

> Hi @AGlass0fMilk, this is not a mistake. In fact, the USB spec prohibits non-zero wMaxPacketSize values for iso endpoints in the default interface altsetting. Regarding all other altsettings, 0 is a placeholder value until the tests are updated.
>
> A quote from USB 2.0 spec, paragraph 5.6.3:
>
> > All device default interface settings must not include any isochronous endpoints with non-zero data payload sizes (specified via wMaxPacketSize in the endpoint descriptor). Alternate interface settings may specify non-zero data payload sizes for isochronous endpoints.

My mistake, I'll have to read through the USB spec a bit more thoroughly.

AGlass0fMilk on 28 Dec 2018

Thanks for the work on the NRF52 USB stack @AGlass0fMilk. As for your questions:

> 1.) Is there a place where general Mbed USB stack development conversation should go? Like a forum post for feature development or something?

Github is probably the best place for this. Mention me with @c1728p9 to ensure I see the messages.

2.) Immediate Read During Configuration

Does the NRF52 require an explicit call to enter the configured state before reads and writes are allowed? If so, one possible solution is to save off the arguments and set a read pending flag if read_start is called before the device is in the configured state. Then once USBPhyHw::configure() is called go through all the added endpoints with the pending flag and start the reads for real.

Is the NRF52 returning NRF_ERROR_BUSY on read_start due to being outside the configured state or because it is in the middle of a control transfer? If it is because it is in the middle of a control transfer, then the solution I mentioned above won't work. In addition to adding endpoints and calling read_start due to a Set Configuration request, endpoints are also added and started in the Set Interface request. This takes place while the device is in the configured state, and a USBPhy callback isn't called after the request completes.
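The pending-flag approach suggested above might look roughly like this. It is a host-side sketch under stated assumptions: `DeferredReads`, `kMaxEndpoints`, and `start_hw_read` are hypothetical, and the hardware read is replaced by a counter so the logic can be exercised off-target. As noted, it only helps if the busy error comes from being outside the configured state.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

class DeferredReads {
public:
    static constexpr std::size_t kMaxEndpoints = 8;  // assumed endpoint count

    // Returns true if the read started immediately, false if it was deferred.
    bool read_start(std::size_t ep, std::uint8_t *buf, std::size_t size) {
        assert(ep < kMaxEndpoints);
        if (!configured_) {
            // Save the arguments and mark the read as pending.
            pending_[ep] = Pending{buf, size, true};
            return false;
        }
        start_hw_read(ep, buf, size);
        return true;
    }

    // Called when the stack signals that the configuration is valid.
    void configure() {
        configured_ = true;
        for (std::size_t ep = 0; ep < kMaxEndpoints; ++ep) {
            if (pending_[ep].valid) {
                pending_[ep].valid = false;
                start_hw_read(ep, pending_[ep].buf, pending_[ep].size);
            }
        }
    }

    std::size_t reads_started() const { return reads_started_; }

private:
    struct Pending {
        std::uint8_t *buf;
        std::size_t size;
        bool valid;
    };

    // Stand-in for the real driver call; it only counts invocations.
    void start_hw_read(std::size_t, std::uint8_t *, std::size_t) {
        ++reads_started_;
    }

    std::array<Pending, kMaxEndpoints> pending_{};
    bool configured_ = false;
    std::size_t reads_started_ = 0;
};
```

A read requested from the set_configuration callback is parked, then replayed once `configure()` runs; later reads go straight through.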

3.) Flow Diagram Request

The current USB technology page documentation can be found at https://github.com/ARMmbed/mbed-os-5-docs/blob/development/docs/reference/technology/USB.md. The Control request state machine section has a flow chart showing how control requests are handled. Is this the kind of execution flow diagram you are looking for? If there is any information that can be added to this page to make USB easier to understand, I would be happy to see it added.

c1728p9 on 2 Jan 2019

@c1728p9
1.) All right, GitHub it is.

2.)
Thanks for the idea, I'll look into that. The main problem is that, yes, the Nordic driver is returning NRF_ERROR_BUSY during read_start.

In the current hardware implementation, the USBD peripheral is normally in the disabled state until connect is called. Even then, the USBD peripheral won't be enabled until NRF_DRV_POWER_USB_EVT_DETECTED occurs (ie: VBUS is connected). This is in an attempt to minimize power consumption caused by powered and idle USB hardware.

So when a read is issued immediately after calling connect, the USB hardware is not yet ready to function. The USB peripheral has an internal regulator that has a relatively lengthy stabilization time based on the datasheet (see section 6.35.4 in nRF52840 datasheet) . There are a few interrupts that need to occur for the driver to report that it is ready to handle any USB calls.

An obvious workaround is to enable and ramp up the USB hardware as soon as init is called. But then users would have to know to delete any instances of a USB device to enter the lower-power state (right? is deinit called outside destruction?). Also, without a USB device instance it wouldn't be possible for the user, through the API, to tell when the VBUS is present.

I like your idea because it would let me retain the low-power state. I'll see what I can do.

3.)
That page has a TON of useful information. Thanks for pointing me to it... I didn't even know it existed. I'll look it over in more detail and let you know if I still have any questions or suggestions.

AGlass0fMilk on 3 Jan 2019

Oops, accidentally submitted that last comment too early and closed the issue... I blame arcane GitHub keyboard shortcuts...

AGlass0fMilk on 3 Jan 2019

😄1

@AGlass0fMilk Been there, but with a PR instead...

cmonr on 3 Jan 2019

I have started to meticulously go through the USB stack execution flow. I think there are some corner cases or race conditions causing my implementation to fail some of the test cases in the usb basic test.

I will keep an updated table of tests that are passing/failing in the original issue description as I continue development. As of right now (1/10/2019) the testing status of my implementation is as shown below:

See original issue comment for up to date test table

I think the intermittent failure of the usb control stall test is caused by a race condition of some sort.

My guess is that when a setup packet comes in, my USBPhy implementation sets up the hardware to enter either the data stage or the status stage based on the setup packet content. Depending on the latency of the USB host/bus, there may be enough time for the Mbed USB stack to stall the control endpoint (as expected) before the host tries the next transaction. If not, the hardware has already transitioned to the data/status stage and will automatically ACK transactions from the host... and then the host fails the test.

I have to spend a bit more time researching the other test cases.

As I said, I'm going through the USB stack execution flow and charting it out. As I go along, I'm trying to identify places where there should be a critical section/synchronization to prevent race conditions. @c1728p9 @cmonr Any advice on this? I'm not 100% clear on how to properly identify where critical sections should go. I've read it's wherever "shared state" variables are accessed, but that seems pretty vague. Please point me to any good resources on synchronization/critical sections in embedded/interrupt-driven systems if you can.

The flowcharts will also help me ensure I am interacting with the Nordic hardware as it expects. You can check out what I've mapped out so far here.

USB is a lot to fit all in your head at once :confounded:

AGlass0fMilk on 11 Jan 2019

🎉1

> My guess is that when a setup packet comes in, my USBPhy implementation sets up the hardware to enter either the data stage or the status stage based on the setup packet content.

After the setup packet has been received the device should be NAKing, giving the software as much time as it needs to respond with either an ACK or a STALL. One way you can make this error more reproducible is to intentionally introduce a delay before sending the stall. USB hardware should be able to handle a delay of 10 ms or more between receiving a setup packet and responding with a stall. If the failures always occur with this delay, then there may be code that is setting ACK too early that needs to be removed.
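To illustrate the delay experiment, here is a minimal host-side model in plain C++ (not the Mbed or Nordic API). `ControlEndpointModel`, `arm_in_setup_handler`, and `host_saw_ack_before_stall` are hypothetical names, and the model assumes the hardware answers every host transaction with whatever handshake is currently armed.

```cpp
#include <cassert>

// Handshakes a device can return to the host for a transaction.
enum class Handshake { Nak, Ack, Stall };

// Hypothetical model of endpoint 0 (not the Mbed or Nordic API).
struct ControlEndpointModel {
    Handshake armed = Handshake::Nak;  // what the hardware answers right now
    bool arm_in_setup_handler;         // true models the suspected bug

    explicit ControlEndpointModel(bool early_arm)
        : arm_in_setup_handler(early_arm) {}

    // Called when a SETUP packet arrives.
    void on_setup() {
        armed = arm_in_setup_handler ? Handshake::Ack : Handshake::Nak;
    }

    // Called later, once the stack has decided to reject the request.
    void stall() { armed = Handshake::Stall; }

    // What the host sees if it starts the next transaction right now.
    Handshake host_transaction() const { return armed; }
};

// Simulates the host polling `polls_before_stall` times while the software
// response is artificially delayed. Returns true if any poll was ACKed.
bool host_saw_ack_before_stall(bool early_arm, int polls_before_stall) {
    ControlEndpointModel ep(early_arm);
    ep.on_setup();
    bool saw_ack = false;
    for (int i = 0; i < polls_before_stall; ++i) {
        if (ep.host_transaction() == Handshake::Ack) {
            saw_ack = true;
        }
    }
    ep.stall();
    assert(ep.host_transaction() == Handshake::Stall);
    return saw_ack;
}
```

With `early_arm` set, any poll during the delay is ACKed, which corresponds to the "failures always occur" case. With it cleared, the host only sees NAK followed by STALL, however long the delay.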

> I'm trying to identify places where there should be a critical section/synchronization to prevent race conditions.

USBDevice serializes all access to the USBPhy with a critical section so you shouldn't need to provide any protection at the USBPhy layer.

> Please point me to any good resources on synchronization/critical sections in embedded/interrupt-driven systems if you can.

There is a general architecture page on synchronization at https://os.mbed.com/docs/mbed-os/v5.11/reference/thread-safety.html.

> The flowcharts will also help me ensure I am interacting with the Nordic hardware as it expects.

I'm having a hard time seeing things when I zoom in on this image. Could you upload it as a .svg image or the raw draw.io xml?

c1728p9 on 14 Jan 2019

@c1728p9 Thanks for the tips, I'll look into that soon.

I shared the raw draw.io diagram but I guess it makes it a really tiny picture if you share it as read only.

Here's a read/write link. Hopefully that makes it more legible.

You should be able to click "open in draw.io". If not, you should be able to save it and import it yourself.

AGlass0fMilk on 14 Jan 2019

@c1728p9 The reason I included critical sections in my USB implementation in the first place is because I started with the USBPhy_Template implementation. In the example _usb_isr and process implementations there are calls to disable/enable USB interrupts in the NVIC:

https://github.com/ARMmbed/mbed-os/blob/8b6fffb8a869c5451adfe24a168386a3b9426806/usb/device/targets/TARGET_Template/USBPhy_template.cpp#L300-L309

Does your comment above mean this isn't necessary anymore in the USBPhyHw implementation?

Edit:
Can confirm if I take out the disable/enable of USB interrupts it breaks my code.

AGlass0fMilk on 15 Jan 2019

Made some changes and I think I've fixed the premature status ACK bug that was causing the usb control stall test to intermittently fail.

I moved the status stage triggering from the setup packet handler to ep0_write, where I check if the control transfer size is 0 (meaning no data stage, so go straight to status stage).
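A minimal sketch of that change, assuming a hypothetical `FakeUsbHw` in place of the Nordic driver and a simplified `ep0_write` signature (the real code is in the commit linked below):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the hardware; it only counts what was started.
struct FakeUsbHw {
    int status_stage_triggers = 0;
    int data_stage_starts = 0;
    void trigger_status_stage() { ++status_stage_triggers; }
    void start_in_data(const uint8_t*, size_t) { ++data_stage_starts; }
};

struct PhySketch {
    FakeUsbHw hw;

    // SETUP handler: nothing is armed here anymore, so endpoint 0 keeps
    // NAKing until the stack decides how to respond.
    void on_setup() {}

    // The status stage is only triggered once the stack actually writes.
    void ep0_write(const uint8_t* buf, size_t size) {
        assert(size == 0 || buf != nullptr);
        if (size == 0) {
            hw.trigger_status_stage();  // no data stage, go straight to status
        } else {
            hw.start_in_data(buf, size);
        }
    }
};
```

The point of the design is that a stall issued between `on_setup` and `ep0_write` can no longer race with an already-armed status stage.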

Check out the commit here:

https://github.com/AGlass0fMilk/mbed-os/commit/41874b1f75027fc1f61c5e7760980a63412be1a1

AGlass0fMilk on 15 Jan 2019

@fkjagodzinski concerning the ISO endpoint 0 size issue you pointed out, I have posted a parallel ticket on Nordic's Devzone website: https://devzone.nordicsemi.com/f/nordic-q-a/42572/usb-iso-endpoint-max-packet-size-of-0

AGlass0fMilk on 15 Jan 2019


> Does your comment above mean this isn't necessary anymore in the USBPhyHw implementation?

The NVIC calls in _usb_isr and process are there to disable further USB interrupts while processing the current one. This is still good to have, so I wouldn't recommend removing it.

When I say that USBDevice serializes all access to the USBPhy I mean that all calls it makes are protected by a lock. For example, this call to phy->process() is protected by the lock() and unlock(). In USBDevice every call to USBPhy either calls lock() and unlock() explicitly or asserts that a lock is already held with a call to assert_locked(). In the case of _usb_isr this is a function getting called by an interrupt rather than by USBDevice, so you'll want to disable the USB interrupt and call start_process so USBDevice can continue processing it in a serialized manner.

> Can confirm if I take out the disable/enable of USB interrupts it breaks my code.

I recommend you keep the disable/enable of USB interrupts for the reason mentioned above. That said, I'm a bit surprised that not having this causes problems. The function start_process() first calls lock(), which uses a critical section to prevent all interrupts, and then calls process(). Since process() is called from a critical section, a USB interrupt shouldn't be able to interfere.
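The hand-off described above can be sketched as follows. This is a simplified model, not the template code itself: `FakeNvic` and `PhyIsrSketch` are hypothetical, and the `start_process` callback stands in for the stack taking its lock and then calling `process()`.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the NVIC enable bit of the USB interrupt.
struct FakeNvic {
    bool usb_irq_enabled = true;
    void disable_usb_irq() { usb_irq_enabled = false; }
    void enable_usb_irq() { usb_irq_enabled = true; }
};

struct PhyIsrSketch {
    FakeNvic nvic;
    bool event_pending = false;
    int events_processed = 0;
    // Stands in for the stack's start_process(): lock, then call process().
    std::function<void()> start_process;

    // Interrupt context: mask further USB interrupts, then hand off.
    void usb_isr() {
        assert(nvic.usb_irq_enabled);  // the ISR can only fire while unmasked
        nvic.disable_usb_irq();
        event_pending = true;
        if (start_process) {
            start_process();
        }
    }

    // Called by the stack with its lock held.
    void process() {
        if (event_pending) {
            event_pending = false;
            ++events_processed;
        }
        nvic.enable_usb_irq();  // re-arm only after the event is handled
    }
};
```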

c1728p9 on 16 Jan 2019

> You should be able to click "open in draw.io". If not, you should be able to save it and import it yourself.

The open in draw.io link didn't work for me, but I was able to save it and import it manually. I can see everything in detail now.

Looking up close at this diagram I see several different interrupts can occur such as USBPWR Detected and USBPWRRDY Event. This may explain why removing the NVIC calls in _usb_isr and process breaks the code. Are you calling into USBPhyEvents directly from these interrupts?

c1728p9 on 16 Jan 2019

@c1728p9 All interrupts, whether from the USBD or POWER peripheral, are handled in USBPhyHw::process.

That being said, there are some member variable buffers that get modified during those interrupts to pass the type of interrupt on. So maybe that is causing the crash when I remove the NVIC calls.

I could implement a queue or something that serializes events but it seems to be working fine when disabling/enabling USBD and USB-related POWER interrupts.
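For what it's worth, the lightest form of such serialization would be a pending-event bitmask rather than a full queue. This is only a sketch: `PendingEvents` and the `kEvt*` flag names are made up for illustration and are not part of my implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical event flags, one bit per interrupt source.
constexpr uint32_t kEvtUsbd = 1u << 0;
constexpr uint32_t kEvtPowerDetected = 1u << 1;
constexpr uint32_t kEvtPowerReady = 1u << 2;

class PendingEvents {
    std::atomic<uint32_t> flags_{0};

public:
    // Called from interrupt context: record the event without losing
    // flags that were posted earlier and not yet handled.
    void post(uint32_t event) {
        assert(event != 0);
        flags_.fetch_or(event);
    }

    // Called from process(): atomically fetch and clear everything pending.
    uint32_t take_all() { return flags_.exchange(0); }
};
```

Posting the same event twice before it is handled collapses into one flag, so this only fits events where "it happened at least once" is enough.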

On another note (@fkjagodzinski), Nordic responded to my related ticket on their DevZone concerning the 0-byte size ISO endpoint assert in their USBD driver:

> Hi,
> Clear bug from our side in the case its an ISO EP. I will report this internally.
> Thanks for letting us know, and I apologize for the inconvenience!
> Kind regards,
> Håkon

So it should be fixed in a future release of the nRF SDK. I'll see if it would be simple to patch for our purposes so I can start running the endpoint tests.

AGlass0fMilk on 19 Jan 2019


I removed the size == 0 assert and now I'm hitting a different error in the Nordic stack. I don't think SDK14.2 supported the ISO endpoints very well. :man_shrugging:

The culprit appears to occur in the setup process for the USBEndpointTester used for all the endpoint test ... test cases. At some point, the host requests the device to set up a new interface. During removal of the old endpoints, a call is made to nrf_drv_usbd_ep_disable with endpoint 8 (EPOUT8), which for the nRF52840 is actually the ISO OUT endpoint.

Inside the Nordic driver, eventually a call is made to nrf_usbd_epout_clear... this is where the ASSERT happens. The Nordic driver checks to see if the endpoint index is within the bounds of the EPOUT register array... which it isn't because the hardware handles the ISO OUT and ISO IN endpoints slightly differently:

https://github.com/ARMmbed/mbed-os/blob/8f48104842a8932382fd18f952e994facb1513e2/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_14_2/drivers_nrf/hal/nrf_usbd.h#L1202-L1208

I'm not sure how Nordic intended the user to remove ISO endpoints; I haven't seen any ISO-specific remove API call. Next step is to check if SDK15.2 has a patch that addresses this issue and try to merge it in. If not, then I guess I have another bug to report to Nordic :bug:

EDIT:
Does not seem to be fixed in SDK15.2... It shouldn't ASSERT when EP 8 is passed into nrf_usbd_epout_clear, it should instead access NRF_USBD->SIZE.ISOOUT rather than just access the endpoints in NRF_USBD->SIZE.EPOUT (0-7).
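A sketch of the behavior I'd expect, using a mock register block instead of the real `NRF_USBD` peripheral. `FakeUsbdSize` and `epout_size_clear` are hypothetical names, and treating "clear" as writing zero to the SIZE register is my assumption about what the HAL function does.

```cpp
#include <cassert>
#include <cstdint>

// Mock of the SIZE register group: eight regular OUT endpoints plus ISO OUT.
struct FakeUsbdSize {
    uint32_t EPOUT[8];
    uint32_t ISOOUT;
};

constexpr unsigned kIsoEndpointIndex = 8;

// Returns false only for indexes that do not exist at all, instead of
// asserting on the ISO endpoint.
bool epout_size_clear(FakeUsbdSize& size, unsigned ep_index) {
    if (ep_index < kIsoEndpointIndex) {
        size.EPOUT[ep_index] = 0;  // regular endpoints 0-7
        return true;
    }
    if (ep_index == kIsoEndpointIndex) {
        size.ISOOUT = 0;  // ISO OUT has its own register
        return true;
    }
    return false;
}
```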

I'll make a parallel bug report on Nordic's DevZone again.

AGlass0fMilk on 19 Jan 2019

So I've temporarily patched the ISO endpoint issues by removing the 0-size assert and making sure not to call usbd_ep_abort on ISO endpoints (normally called as part of nrf_drv_usbd_ep_disable but certainly causes an ASSERT if used w/ an ISO endpoint).

Now I'm experiencing an assert in the Mbed stack during the endpoint test data correctness. During transfers, usually OUT transfers right after an interface change, the ASSERT here on line 939 is hit:

https://github.com/ARMmbed/mbed-os/blob/15f93890d3f4418c31c12fa993e418744950212d/usb/device/USBDevice/USBDevice.cpp#L928-L944

Not sure what's causing it. I'm going to start reviewing the code flow of normal endpoints now (most of what I was looking into was control transfer-related up until now).

EDIT:
I used a conditional breakpoint to stop execution when the ASSERT occurs. The call stack indicates USBDevice::out is being called on endpoint 8 (the ISOOUT endpoint on the nRF52840). I'm assuming since no transfer is pending for this endpoint the driver asserts.

I'll start looking through the code but @c1728p9 do you know when info->pending is incremented for ISO endpoints? My initial guess at what's happening is that a SOF event occurs during processing of another USBD interrupt. I assume transfers for ISO endpoints are set up in the SOF handler. Since USBD interrupts are disabled, the SOF event is missed. Then the host initiates an OUT transfer on the ISO endpoint and this causes the assert.

AGlass0fMilk on 19 Jan 2019

Really confused as to how this is happening... the nRF52840 hardware is triggering a call to nrf_usbd_epoutiso_dma_handler indicating that a DMA transfer was completed for the ISOOUT endpoint. When I check the SIZE.ISOOUT register (0x400274C0), the ZERO bit (bit 16) is set, indicating Zero-length data received, ignore value in SIZE.
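For anyone following along, this is roughly how I'm reading that register value. The ZERO bit position (bit 16) is from the register description quoted above; the 10-bit size mask and the names `IsoOutSize` and `decode_isoout_size` are my own assumptions for this sketch.

```cpp
#include <cassert>
#include <cstdint>

struct IsoOutSize {
    bool zero_length;  // ZERO bit set: zero-length data received
    uint32_t size;     // number of bytes, forced to 0 when ZERO is set
};

constexpr uint32_t kZeroBit = 1u << 16;
constexpr uint32_t kSizeMask = 0x3FFu;  // assumption: size is in the low 10 bits

IsoOutSize decode_isoout_size(uint32_t reg) {
    IsoOutSize result;
    result.zero_length = (reg & kZeroBit) != 0;
    // When ZERO is set, the value in the size field must be ignored.
    result.size = result.zero_length ? 0u : (reg & kSizeMask);
    return result;
}
```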

[Image: screenshot from 2019-01-19 14-12-39]

What's weird is that I don't see any OUT transfer initiated by the host for the ISO endpoint (ep 8). So a DMA interrupt shouldn't occur, even for a ZLP.

However, there _is_ a zero-length OUT transfer that occurs on endpoint 1 after a SOF event on the bus.

[Image: screenshot from 2019-01-19 14-13-13]

Perhaps this is a hardware erratum where an endpoint transfer after a SOF event may cause an interrupt on the ISO endpoint? I'll see if Nordic can check this out.

EDIT:
Something weird is happening. I'm going to go through my transfer code a bit more...

AGlass0fMilk on 19 Jan 2019

Just merged in recent updates to the official feature branch.

The first test (control_basic_test) is now failing again. It looks like an issue I had previously so I have an idea of how to solve it. The good news is that now usb soft reconnection test has started passing. @fkjagodzinski was anything changed in the test code related to that?

AGlass0fMilk on 22 Jan 2019

It looks like what's happening is during interface reconfiguration, in the function call USBEndpointTester::_setup_non_zero_endpoints, a call is made to read_start on the ISO endpoint. When _phy->endpoint_read is called, the Nordic driver returns that it was busy and the read start failed. This prevents the info->pending counter from being incremented, but does not cause a direct test failure.

So when the next transaction comes on the bus, the driver gets stuck... I'm still working on investigating this further.
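Here is a stripped-down model of that bookkeeping as I understand it. `EndpointInfoSketch`, `BusyPhy`, `read_start`, and `out_event` are simplified stand-ins, not the actual USBDevice code; `out_event` returns false where the real stack would hit its assert.

```cpp
#include <cassert>

// Simplified per-endpoint bookkeeping.
struct EndpointInfoSketch {
    int pending = 0;
};

// Stand-in for the PHY; `busy` models the Nordic driver refusing the read.
struct BusyPhy {
    bool busy = false;
    bool endpoint_read() const { return !busy; }
};

// Only counts the transfer as pending if the PHY accepted it.
bool read_start(EndpointInfoSketch& info, const BusyPhy& phy) {
    if (!phy.endpoint_read()) {
        return false;  // silent failure: pending stays untouched
    }
    ++info.pending;
    return true;
}

// Called when the PHY reports an OUT event on the endpoint.
bool out_event(EndpointInfoSketch& info) {
    if (info.pending < 1) {
        return false;  // no transfer was pending for this event
    }
    --info.pending;
    return true;
}
```

A failed `read_start` followed by an OUT event on the same endpoint is exactly the mismatch described above.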

AGlass0fMilk on 22 Jan 2019


Thanks for the updates @AGlass0fMilk!

c1728p9 on 23 Jan 2019

Yeah, I'm kinda just using this issue thread to jot down my ideas/thoughts during debugging. So some stuff may not be fully thought through.

I'm going to see why the Nordic driver returns that it is busy. My first idea for how to mitigate this is to just always keep _all_ endpoints enabled and mask interrupts that the upper layers of the stack haven't enabled. This way there should be no delay in when the driver is ready to start transfers on any endpoint after being added. Since the device is connected over USB then the potential slight increase in power consumption should be negligible anyway...

AGlass0fMilk on 23 Jan 2019

@AGlass0fMilk Please let us know if there are any individual efforts within this that you could use assistance on.
We have a resource here who will need this working at some point, and they may have time to spare.

loverdeg on 23 Jan 2019

@loverdeg-ep Any help would be great! I'm doing this as a side project and I don't get as much time to work on it as I'd like.

For reference, I am using a Rigado BMD340 development kit as a target and a TotalPhase Beagle USB12 protocol analyzer to trace issues on the bus.

I am hoping my implementation is getting very close... control transfers seem to be pretty solid. If you guys want to reach me via email we can collaborate in a bit more detail.

AGlass0fMilk on 23 Jan 2019

> The good news is that now usb soft reconnection test has started passing. @fkjagodzinski was anything changed in the test code related to that?

Hi @AGlass0fMilk, the test case you mentioned hasn't been updated recently.

fkjagodzinski on 28 Jan 2019

After wrestling with strange issues that are hard to debug, I'm starting to think the Nordic driver is not directly compatible with the Mbed USB HAL specification. The Nordic driver operates on entire transfer objects and expects to get the entire payload all at once. As a result, Nordic paces itself on the DMA events (which is nice for speed considerations).

Mbed handles breaking up the packets into multiples of max_packet_size and hands off these chunks to the USB HAL. Mbed is expecting to only get notified of a completed transfer (USBPhyEvents::in/out()) after a data transfer has been ACK'd by either side.
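To make the difference concrete, this is the kind of chunking the upper layer does before anything reaches the HAL. It is a generic sketch, not the Mbed code; `split_into_packets` is a hypothetical helper and it ignores the zero-length packet that can terminate a transfer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Splits a transfer of `total` bytes into per-packet sizes, each at most
// `max_packet_size`. Only the last packet may be short.
std::vector<std::size_t> split_into_packets(std::size_t total,
                                            std::size_t max_packet_size) {
    std::vector<std::size_t> packets;
    if (max_packet_size == 0) {
        return packets;  // nothing sensible to do with a 0-byte endpoint
    }
    while (total > 0) {
        const std::size_t n = std::min(total, max_packet_size);
        packets.push_back(n);
        total -= n;
    }
    return packets;
}
```

For example, a 150-byte transfer on a 64-byte endpoint becomes packets of 64, 64, and 22 bytes, and each of those is handed to the HAL separately with its own completion event.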

My theory is that the Nordic driver is notifying the upper layers of the stack that the transfer has been completed when in reality only the DMA transfer has completed. Most of the time this is okay because it is closely coupled with when a transfer _is_ completed. But if something happens that messes up this timing, bad things happen (eternally waiting IN/OUT endpoints, too many IN/OUT events, etc).

Mbed requires less intelligence in its USB HAL layer. So I think it would be easier to implement a custom USB HAL rather than continue trying to make the Nordic driver fit... At least I learned a bunch about USB and how the Nordic driver works.

I'm working out a program flow chart on draw.io again. Comment back if anyone wants to see/help with it.

AGlass0fMilk on 31 Jan 2019

Good work.
So summary of your efforts thus far is:

NRF SDK15's USB driver for NRF52 looks incompatible with Mbed USB HAL specification
Quickest path forward might be developing a basic USB driver in place of Nordic's (I have maintenance fears but I suppose it would be better than nothing)

Our schedule hasn't worked out such that we've been able to apply our resource to this yet.

loverdeg on 1 Feb 2019

Well, I guess it would be possible to rework the Nordic driver some.

I think I've narrowed down the problem to this: the Nordic driver is currently notifying the upper layer that a transfer is "finished" when just the DMA transfer has finished. In reality, the host has not yet received/ACK'd the packet.

In most cases this is okay, but sometimes (like with 0-length or short transfers) the DMA transfer is done almost immediately and can cause the upper layer to take premature actions and become desynchronized.

It should be possible to modify the Nordic driver to notify the upper layers when a transfer is actually finished (when EPDATA or EP0DATADONE events happen).

Right now I'm getting intermittent failures during the regular endpoint tests. Usually what happens is I hit an assert in USBDevice.cpp at line 947 or 965. This assert makes sure there was a pending transfer when the USB HAL sends an in or out event.

AGlass0fMilk on 2 Feb 2019

@c1728p9 I've also gotten intermittent [Errno 32] Pipe errors and when I googled it I found libusb/libusb#241 :laughing:

AGlass0fMilk on 2 Feb 2019

I worked on the Nordic-based implementation a bit more. I found a major bug that would cause the control basic test to fail. I never implemented USBPhyHw::unconfigure because I thought I didn't need it. Turns out the Mbed USB stack doesn't call endpoint_remove when going to the unconfigured/configuration 0 state. So the Nordic driver would never abort the last queued transfers on an endpoint and return NRF_DRV_BUSY when the next configuration change attempted to queue transfers.

I implemented USBPhyHw::unconfigure now and it properly removes all endpoints aside from control.
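The idea, reduced to a toy model: `DriverSketch` is hypothetical and tracks only which endpoint addresses have a queued transfer, with `queue_transfer` returning false where the Nordic driver would report busy.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

class DriverSketch {
    std::set<uint8_t> queued_;  // endpoint addresses with a queued transfer

public:
    // Returns false if a transfer is already queued on this endpoint.
    bool queue_transfer(uint8_t ep_address) {
        return queued_.insert(ep_address).second;
    }

    // Drops queued transfers on every endpoint except control
    // (endpoint number 0, i.e. addresses 0x00 and 0x80).
    void unconfigure() {
        for (auto it = queued_.begin(); it != queued_.end();) {
            if ((*it & 0x0F) != 0) {
                it = queued_.erase(it);
            } else {
                ++it;
            }
        }
    }
};
```

Without the `unconfigure` step, the second `queue_transfer` on the same endpoint after a configuration change fails, which matches the busy error I was seeing.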

I have updated the original comment test table... 4 tests are failing still. Still looking into them.

AGlass0fMilk on 2 Feb 2019


I realized I was compiling with an older version of USBEndpointTester.cpp and running a newer version of the host test python script. There was a change in the format of a vendor-specific request and it was causing the endpoint test data toggle reset test to stall on the control endpoint. That test is now consistently passing.

This was also happening during the endpoint test halt test. This test is still failing for another reason though...

AGlass0fMilk on 2 Feb 2019

My understanding of the Nordic driver has changed a bit after reviewing the datasheet more closely. The driver notifies the user code when a DMA transfer is finished for OUT transfers, which is correct because then the data is ready to be processed. For IN transfers, the user code is notified when the host ACKs the packet.

I have a few things on my todo list to hopefully iron out the rest of the bugs:
1.) Add asserts to the USBPhyHw implementation to catch the Nordic driver returning anything but NRF_DRV_SUCCESS
2.) Rework the control transfer code a bit. Right now I think it not only triggers the hardware-handled status stage but also triggers a zero-length transfer that may be causing issues.
3.) Run through the tests a few times and capture bus traffic during failures. I can post screenshots and the data files up here so maybe someone can suggest what may be causing a failure.

AGlass0fMilk on 4 Feb 2019


I added a "virtual status stage" that is triggered when the opposite control transfer is initiated by the Mbed stack (signifying the status stage). There were issues with how status stage ep0_in and ep0_out callbacks were being sent since the Nordic driver doesn't actually notify the user when the status stage ACK has completed.

I have run the automated tests a few times. The first two times were very promising -- all tests passed!

But after running it a few more times there are intermittent failures in the endpoint data transfer code that have to be debugged.

The control transfer code seems to be really fast and rock solid. Perhaps the endpoint data transfer intermittent failures are related to control transfers still.

I will look into this later on.

AGlass0fMilk on 22 Feb 2019

So I looked at the Nordic documentation website today and noticed that as of 1/31/2019 they have added a few errata pertaining to essentially all IC revisions (including IC rev 1 and rev 2).

One in particular caught my eye:
[199] USBD: USBD cannot receive tasks during DMA

GAHHH! I hope this can explain some of the weird behavior I've been seeing, including random hangs (where an OUT event doesn't happen, so the driver fails to queue a subsequent IN transfer) and random data correctness failures where the Nordic writes back incorrect data (rare).

Luckily there is a workaround mentioned. I'll try adding it into the Nordic driver when triggering DMA transfers and see if the tests fail less frequently.

AGlass0fMilk on 25 Feb 2019

All tests _can_ pass. The control transfer tests always pass, I have not seen a failure there with the updated code yet.

The intermittent failures I am seeing have the following characteristics:

Intermittent Failure - EP Halt Test Incorrect Payload

My understanding of the endpoint test halt test is:

OUT xfer - write a random payload to device endpoint
IN xfer - read back payload
Make sure data written and data read are the same
...continue to loop...
OUT xfer - write a random payload to device endpoint
HALT IN endpoint
IN xfer - endpoint should reply STALL
CLEAR HALT on IN endpoint
OUT xfer - write a new random payload to device endpoint
IN xfer - read back payload
Make sure the new random payload and the data read are the same

My theory:
The failure occurs because the Mbed stack immediately loads the payload received during the first OUT stage into EasyDMA and begins the transfer for the corresponding IN endpoint. Then the endpoint is halted, it replies STALL as expected. But when the HALT is cleared, the hardware is still prepared to automatically ACK the next IN transfer that begins on that endpoint. It will then respond with the old payload -- causing the test to fail.

See screenshot of captured USB traffic below:
[Image: screenshot from 2019-02-25 15-52-48]

Sometimes the test passes, probably when the new data transfer is fast enough to overwrite the old payload in the DMA buffer before the IN transfer comes in.
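My theory can be modeled in a few lines. `InEndpointModel` is a hypothetical simplification in which a loaded payload stays armed across a stall/unstall cycle, which is what I suspect the hardware does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one IN endpoint with a preloaded payload.
struct InEndpointModel {
    std::vector<uint8_t> buffer;  // data already moved in by EasyDMA
    bool armed = false;           // hardware will auto-ACK the next IN
    bool halted = false;

    void write_start(const std::vector<uint8_t>& payload) {
        buffer = payload;
        armed = true;
    }
    void stall() { halted = true; }
    void unstall() { halted = false; }  // note: `armed` is left untouched

    // Host issues an IN token. Returns true and fills `out` if data was sent.
    bool host_in(std::vector<uint8_t>& out) {
        if (halted || !armed) {
            return false;  // STALL or NAK
        }
        out = buffer;
        armed = false;
        return true;
    }
};
```

If the host's IN token arrives after `unstall()` but before the next `write_start()`, it reads the old payload. If the new write lands first, the buffer is overwritten and the test passes.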

This may be a point of discrepancy between the Mbed API spec and the Nordic USBD hardware capabilities -- Nordic does not present a clear way of canceling a "loaded" auto-ACK once the DMA transfer has been initiated. I am trying some hacky methods (disabling/reenabling an endpoint during a stall) to accomplish this, but I'm not sure if there's a solution right now.

Any ideas?

See my cross-post on Nordic Devzone.

Intermittent Failure - EP Parallel Ctrl Timeout

Sometimes during parallel transfers endpoint 1 IN will time out. I've seen it with other endpoints but I haven't collected enough definitive data.

It may be related to a hardware issue... there was an erratum that was "fixed" where EPDATA events would not be generated. Maybe it still happens occasionally.

My other theory is that somehow the DMA scheduling algorithm (which essentially just loops through pending endpoints starting in a certain order) never reaches the EP1 IN DMA transfer since it is so busy with EP0 and EP2 IN/OUT transfers.

See screenshot below:
[Image: screenshot from 2019-02-26 10-57-01-small]

Real Testing

These issues seem to be corner cases that affect a small percentage of transfers/transfer patterns: one out of every 150,000 transfers (< 0.001%).

I've done a little real testing using currently implemented classes (USBCDC, USBMSD) and haven't seen any apparent issues. I haven't done too much testing admittedly.

It was cool being able to hook up a bunch of Mbed APIs and have a working nRF52840-DK "flash drive" in under an hour. (QSPIFBlockDevice + USBMSD + FATFileSystem = 8.2MB flash drive!)

I'm not sure where to go from here. I've spent a lot of time wrestling with the Nordic hardware, and I think I need some outside input.

AGlass0fMilk on 26 Feb 2019

@AGlass0fMilk are you using production boards, not the ones labeled preview? We had to work through spurious issues like you are seeing with QSPI and eDMA, some of which turned out to be errata related to the preview boards.

@c1728p9 solved one of these issues where the UART was overflowing by turning down the DMA.
https://github.com/ARMmbed/mbed-os/pull/8784

dlfryar-zz on 26 Feb 2019

@dlfryar The board I'm working on is actually the Rigado BMD-340-EVAL kit for the nRF52840-based BMD-340 module they sell. I bought it back in August 2018 -- most of the major distributors were out of stock of the nRF52840-DK at that time.

I'm fairly certain it contains a "production-ready" IC revision (at least IC revision 1). I'm not sure if there's a place to verify this in the FICR or something... the chip is under an RF shield so I can't look at its markings.

I'll look at the UART fix, can you point me to a specific commit? I'm not sure what you mean by "turning down the DMA".

AGlass0fMilk on 26 Feb 2019

Turning down the DMA meaning xfer 1 byte versus waiting on up to 32 bytes.

Maybe you can verify with the stamp on the chip and the errata?

[Image: nrf-versions]

dlfryar-zz on 26 Feb 2019

@dlfryar As I'm using a module dev board the actual nRF52840 IC is hidden under an RF shield. I'm almost certain it is a production rev IC.

AGlass0fMilk on 27 Feb 2019

Nordic engineers replied to me and pointed out a fix that exists in SDK15.2 for truly aborting/disarming auto-ACK on IN transfers that have already been loaded into the USBD internal buffers via EasyDMA. I figured it would involve modifying undocumented registers.

In light of this, I am going to go through SDK15.2 drivers for USBD and pull in any changes/fixes I haven't already until Mbed upgrades to nRFSDK15+.

When I pulled in the changes before I wasn't as familiar with the internal Nordic driver operation... it should be easier now. :sweat_smile:

See convo here: https://devzone.nordicsemi.com/f/nordic-q-a/44117/nrf52840-usb-cancel-in-transfer

AGlass0fMilk on 27 Feb 2019


I pulled in changes from SDK 15.3 and the halt test has been consistently passing now!

Still seeing a timeout failure on the parallel transfer tests, but the implementation is almost ready to merge. My theory for this is:

Since the "DMA scheduler algorithm" implemented by Nordic in this driver version is essentially just checking the endpoints in order of 0 to 8 (starting with IN endpoints), the driver is so busy during transfers of some endpoints that it never gets to process other endpoints -- resulting in a timeout eventually.

It's probably due to the IN transfers being processed before the OUT transfers. So when an OUT transfer is received, the hardware ACKs the packet but then has to wait for EasyDMA to transfer the data to RAM from the internal buffers. This never happens in some cases. Another endpoint has pending IN transfers and creates a race condition where only the highest index endpoint gets attention from the DMA for both IN and OUT transactions.

I am going to change the scheduler algorithm slightly to see if the failure characteristics change. If it ends up being the issue, a simple FIFO queue could solve the problem. This way all endpoints have an equal chance to be processed by the DMA. Shouldn't take too much RAM since there are only 18 endpoints to worry about.
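A rough sketch of the two policies, assuming the 18 endpoints are mapped to slot numbers 0 to 17. The slot mapping and the names `pick_fixed_order` and `FifoDmaQueue` are hypothetical and not taken from the Nordic driver.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

constexpr int kSlots = 18;  // 18 endpoints, one slot each

// Fixed-order policy: always serve the lowest pending slot first.
int pick_fixed_order(uint32_t pending_mask) {
    for (int i = 0; i < kSlots; ++i) {
        if (pending_mask & (1u << i)) {
            return i;
        }
    }
    return -1;  // nothing pending
}

// FIFO policy: serve slots in the order they asked for DMA.
class FifoDmaQueue {
    std::deque<int> order_;
    uint32_t queued_ = 0;  // bit set while a slot is waiting in the queue

public:
    // Returns false for an invalid slot or one that is already queued.
    bool request(int slot) {
        if (slot < 0 || slot >= kSlots || (queued_ & (1u << slot))) {
            return false;
        }
        queued_ |= (1u << slot);
        order_.push_back(slot);
        return true;
    }

    // Returns the next slot to serve, or -1 if the queue is empty.
    int next() {
        if (order_.empty()) {
            return -1;
        }
        const int slot = order_.front();
        order_.pop_front();
        queued_ &= ~(1u << slot);
        return slot;
    }
};
```

With the fixed order, a slot that keeps becoming pending again can starve every slot behind it. With the FIFO, a re-request goes to the back of the queue, and the memory cost is at most 18 queued entries.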

It may also be caused by endpoints returning NRF_ERROR_BUSY in response to read/write request sporadically, which pretty much crashes the stack during testing.

I'll check back when I have time to try this fix. See attached screenshot of bus traffic during one failure event (note the timestamps, I filtered out some control transfers):
[Image: screenshot from 2019-03-01 08-59-50]

AGlass0fMilk on 1 Mar 2019

I have been collecting statistical data on what tests are failing on the current USB Phy implementation and it seems (based on bus complexity/latency) that only the ep_halt_test is still failing. I spoke too soon in my previous comment.

The results from ~280 USB basic test runs with various bus configurations are shown in the ASCII histograms below:

Iterative Testing - First Run

Configuration:

Laptop (Thunderbolt 3) => Dell TB16 Dock => Anker 4-port USB hub => Beagle12 USB analyzer => DUT

##################################################################################################
Iterative test results:
Tests failed 27 times out of 100 iterations (27% failure rate)
Test failure histogram
##################################################################################################
                                                     0  usb control basic test                    
                                                     0  usb control stall test                    
                                                     0  usb control sizes test                    
                                                     0  usb control stress test                   
                                                     0  usb device reset test                     
███████████                                          5  usb soft reconnection test                
                                                     0  usb repeated construction destruction test
                                                     0  endpoint test data correctness            
██████████████████████████████████████████████████  22  endpoint test halt                        
                                                     0  endpoint test parallel transfers          
                                                     0  endpoint test parallel transfers ctrl     
                                                     0  endpoint test abort                       
                                                     0  endpoint test data toggle reset

Iterative Testing - Second Run (w/ Round Robin DMA scheduler)

Configuration:

Laptop (USB Port) => DUT (direct connection)

##################################################################################################
Iterative test results:
Tests failed 63 times out of 100 iterations (63% failure rate)
Test failure histogram
##################################################################################################
                                                     0  usb control basic test                    
                                                     0  usb control stall test                    
                                                     0  usb control sizes test                    
                                                     0  usb control stress test                   
                                                     0  usb device reset test                     
                                                     0  usb soft reconnection test                
                                                     0  usb repeated construction destruction test
                                                     0  endpoint test data correctness            
██████████████████████████████████████████████████  63  endpoint test halt                        
                                                     0  endpoint test parallel transfers          
                                                     0  endpoint test parallel transfers ctrl     
                                                     0  endpoint test abort                       
                                                     0  endpoint test data toggle reset

Iterative Testing - Third Run (w/ Nordic original DMA scheduling algo)

Configuration:

Laptop (USB Port) => DUT (direct connection)

##################################################################################################
Iterative test results:
Tests failed 45 times out of 83 iterations (54% failure rate)
Test failure histogram
##################################################################################################
                                                     0  usb control basic test                    
                                                     0  usb control stall test                    
                                                     0  usb control sizes test                    
                                                     0  usb control stress test                   
                                                     0  usb device reset test                     
                                                     0  usb soft reconnection test                
                                                     0  usb repeated construction destruction test
                                                     0  endpoint test data correctness            
██████████████████████████████████████████████████  45  endpoint test halt                        
                                                     0  endpoint test parallel transfers          
                                                     0  endpoint test parallel transfers ctrl     
                                                     0  endpoint test abort                       
                                                     0  endpoint test data toggle reset

Conclusions:

I think the usb soft reconnection test failures in the first iterative test are flukes caused by the complex USB bus configuration. There are also fewer endpoint test halt failures than in later tests that used a direct connection between the laptop and DUT. This is likely due to the increased latency of transfers, allowing a better probability the old data loaded into the IN endpoint is overwritten by the new data before being read back.
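For reference, the percentages and bar lengths in the three runs above follow from simple integer arithmetic. The sketch below is a reconstruction, not the script that produced the output: it assumes the rate is rounded to the nearest percent, that bars are scaled so the largest count fills 50 columns, and it uses `#` instead of block characters. All function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Failure rate rounded to the nearest whole percent.
int failure_rate_percent(int failures, int iterations) {
    if (iterations <= 0) {
        return 0;
    }
    return (failures * 100 + iterations / 2) / iterations;
}

// Bar length scaled so that `max_count` fills `width` columns.
int bar_length(int count, int max_count, int width) {
    if (count <= 0 || max_count <= 0 || width <= 0) {
        return 0;
    }
    return count * width / max_count;
}

// One histogram row: padded bar, failure count, then the test name.
std::string histogram_row(const std::string& name, int count,
                          int max_count, int width) {
    const int len = bar_length(count, max_count, width);
    std::string bar(static_cast<std::string::size_type>(len), '#');
    bar.resize(static_cast<std::string::size_type>(width > 0 ? width : 0), ' ');
    return bar + "  " + std::to_string(count) + "  " + name;
}
```

Under these assumptions, 45 failures in 83 iterations rounds to 54%, and in the first run the 5-failure row gets an 11-column bar next to the 50-column bar for 22 failures.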

I have reached out to Nordic again to see if there is actually a fix for "unloading" the DMA... fingers crossed.

I am going to run iterative tests and modify the host test script. I will insert a small delay in the endpoint loopback test during the endpoint test halt and see if allowing the DUT more time reduces test failures.

@TacoGrandeTX Ultimately, if the Nordic hardware is simply incapable of always passing the halt test at full speed, how do you suggest we proceed?

AGlass0fMilk on 13 Mar 2019

@AGlass0fMilk Understood. Do you think the updates from SDK v15.3 made any difference? I'm going to have to defer to @c1728p9 for the way forward as he has review authority in this area.

TacoGrandeTX on 13 Mar 2019

Well, I think what is actually happening in this failure scenario _is_ a corner case.

The ability to stall/halt an endpoint (what this test is evaluating) is not what fails. What fails is the first endpoint loopback transfer of _the next_ halt test iteration with the same endpoint since there’s old data loaded in there.

I’m trying to find a way to suppress the first loopback transfer failure in a given halt_ep_test run.

AGlass0fMilk on 13 Mar 2019

I may have found a somewhat hacky workaround for fixing the halt issue.

I tried disabling and then re-enabling the given endpoint during a call to USBPhyHw::endpoint_unstall and it seems to have remedied the intermittent halt failure. It does not affect the other tests.
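To illustrate the idea only, here is a minimal host-side sketch of why a disable/re-enable cycle could clear stale IN data. Everything in it is an assumption made for illustration: `MockInEndpoint`, `unstall_plain` and `unstall_with_reenable` are hypothetical names, the model assumes that disabling an endpoint discards whatever was already loaded, and none of it is the actual `USBPhyHw` or Nordic driver code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one IN endpoint. This is NOT the Nordic driver.
struct MockInEndpoint {
    bool enabled = true;
    bool stalled = false;
    std::vector<uint8_t> loaded; // data already handed to the (modeled) DMA

    // Loading only has an effect while the endpoint is enabled.
    void load(const std::vector<uint8_t>& data) {
        if (enabled) {
            loaded = data;
        }
    }

    // Assumption: disabling the endpoint drops any pending data.
    void disable() {
        enabled = false;
        loaded.clear();
    }

    void enable() { enabled = true; }
};

// Unstall without the workaround: previously loaded data survives.
void unstall_plain(MockInEndpoint& ep) {
    ep.stalled = false;
}

// Unstall with the workaround: cycle the endpoint, then clear the stall.
void unstall_with_reenable(MockInEndpoint& ep) {
    ep.disable();
    ep.enable();
    ep.stalled = false;
    assert(ep.loaded.empty()); // holds under the assumption stated above
}
```

In this model, an endpoint unstalled with `unstall_plain` still holds its old payload, while one unstalled with `unstall_with_reenable` starts the next loopback transfer empty.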

I'm doing a bit more iterative testing and will report back with results. Maybe we'll see a pull request today :raised_hands:

AGlass0fMilk on 14 Mar 2019

Iterative test results:
Tests failed 0 times out of 100 iterations (0% failure rate)

:grin:

I am going to clean up the code a bit and submit a pull request later. Hopefully it will pass automated testing on Mbed's side.

AGlass0fMilk on 14 Mar 2019

@AGlass0fMilk Very good! Are you still using your nrf52840-usb-hal branch? I'm not seeing the Nordic SDK 15 files there although your last commit was about 2 weeks ago. Mbed OS v5.12 is bringing USB support into master (if you weren't aware). I have rebased the feature-nrf52-sdk15 branch in preparation for v5.12 and that is on my denmark branch: https://github.com/TacoGrandeTX/mbed-os/tree/denmark. I think you need to target this, but I'm a bit concerned about not seeing a TARGET_SDK_15_0 folder.

TacoGrandeTX on 14 Mar 2019

@TacoGrandeTX

I didn’t use the SDK 15 files for this build. I can rebase with your Mbed 15.2 update and make sure it works.

I just did a diff on the sdk14.2 usbd driver and added in relevant changes from the 15.3 driver.

AGlass0fMilk on 14 Mar 2019

@AGlass0fMilk Understood. We are still on SDK 15.0. In the 15.0 release Nordic packaged the drivers differently and USB fell under \nRF5_SDK_15.0.0_a53641a\components\drivers_nrf\usbd. This wasn't carried forward in Mbed since it wasn't supported. The USB drivers appear to be "old-style" format so I think the most suitable location for them is \targets\TARGET_NORDIC\TARGET_NRF5x\TARGET_SDK_15_0\integration\nrfx\legacy. I'm open to an alternative location if you have a better idea.

TacoGrandeTX on 14 Mar 2019

@TacoGrandeTX So you're suggesting I take the modified driver code and associated headers and just put them in the SDK15 update in an appropriate location?

If we eventually switch to the separate "nrfx" driver library the porting process won't be too bad in this case.

AGlass0fMilk on 14 Mar 2019

> So you're suggesting I take the modified driver code and associated headers and just put them in the SDK15 update in an appropriate location?

Yes - I'm suggesting the best place for the new nrf_drv_ files (usbd.c, usbd.h and usbd_errata.h) is
\TARGET_NORDIC\TARGET_NRF5x\TARGET_SDK_15_0\integration\nrfx\legacy.

We do have the HAL header (nrf_usbd.h) at
TARGET_NORDIC\TARGET_NRF5x\TARGET_SDK_15_0\modules\nrfx\hal so you will need to manually merge your changes into this file.

The TARGET_SDK_14_2 tree is removed from denmark so please don't add it back.

> If we eventually switch to the separate "nrfx" driver library the porting process won't be too bad in this case.

We have done just that but a few drivers are still left with the old-style format like usbd. So you will have some changes to make. For instance on line 1936:

https://github.com/AGlass0fMilk/mbed-os/blob/nrf52840-usb-hal/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_14_2/drivers_nrf/usbd/nrf_drv_usbd.c#L1936

That will have to become NRFX_IRQ_DISABLE(USBD_IRQn);

I suspect my PR will soon be merged as it has been reviewed and I will be out next week. Could you submit your PR to denmark after it has been force-pushed to master?

I invited you as a contributor to my fork.

TacoGrandeTX on 14 Mar 2019

@TacoGrandeTX Yes I can submit a PR to denmark. I'm trying to get my driver to pass the usb serial tests now but it's having a problem with the "RX multiple bytes" tests. My guess is that the Nordic driver doesn't like it when you queue writes in the same interrupt context that processed the "TX done" event.

I'll make sure to just add it in to the SDK15 tree and find equivalents for some of the deprecated functions.

AGlass0fMilk on 14 Mar 2019

Actually I didn't read the README in the usb serial test directory. I added the udev rule it mentioned and now it's passing all tests for that as well!

I am working on merging in my USBPhy implementation to your denmark branch @TacoGrandeTX.

I will have to rewrite some parts to use the nrfx power driver. Maybe down the line I can also rewrite the driver to use the nrfx usbd driver when that is merged in (the nrfx_usbd.c driver is included in SDK version 15.3).

AGlass0fMilk on 14 Mar 2019

@AGlass0fMilk That's excellent. I just made a final commit (https://github.com/TacoGrandeTX/mbed-os/commit/833ed6392892b924ec9d3d6b8f7aae0a43746b56) to remove the last unneeded legacy files that @paul-szczepanek-arm noticed.

TacoGrandeTX on 14 Mar 2019

I reworked the code a bit and ported the nrf_drv_usbd.* files over to nrfx for common functions. It's working and building with the SDK15 update.

I have put in a pull request to your fork: TacoGrandeTx/mbed-os#10

Let me know on that PR what needs to change or if I didn't set up the PR correctly...

AGlass0fMilk on 15 Mar 2019

@AGlass0fMilk Thank you for the PR! I have had a quick look and the location of files looks fine. We had a CI failure on the PR to update feature-nrf52-sdk15 (https://github.com/ARMmbed/mbed-os/pull/9999), but a correction was made and that has been kicked off again. As I mentioned I will be out next week, but @c1728p9 may have time to review it next week. He agrees that the correct approach is to first force push denmark to feature-nrf52-sdk15 and then bring in your PR.

I pulled down your PR but testing failed - so I will put my log in the PR for you to comment.

TacoGrandeTX on 15 Mar 2019

@AGlass0fMilk We're planning to release SDK15 in 5.13. Assuming that occurs as planned, is there anything specific left to address this issue?

linlingao on 30 May 2019

@linlingao

I'd prefer this stay open until @AGlass0fMilk's #10689 merges.

loverdeg on 30 May 2019

Hello @AGlass0fMilk, I'm a little confused about which repository to use for testing USB CDC support on the nRF52840 right now. Is https://github.com/AGlass0fMilk/mbed-os/tree/nrf52840-usbphy-implementation the correct one?

Matheus-Garbelini on 13 Jun 2019

@Matheus-Garbelini yes, that’s the branch I currently have a PR open for.

It got a little confusing because I kept having to make a new branch and rebase against master. I should delete the other branches.

Please test with the branch you mentioned!

AGlass0fMilk on 13 Jun 2019

@AGlass0fMilk Thank you very much. I'm trying to test an example with the nRF52840 dongle.
Sorry to ask, but do you have a direct link to an Mbed USB CDC code example? As I'm new to Mbed, I'm a little lost as to where to find things. Also, I have to apply the udev rules as mentioned here, right? (https://github.com/ARMmbed/mbed-os/tree/master/TESTS/usb_device/serial)

I'm using NRF52840_DK as the target. Let me know if this is the correct target for compiling the project with the command line 'mbed compile -t gcc_arm -m nrf52840_dk --clean -v'. It just gives me errors such as:

./mbed-os/targets/TARGET_NORDIC/TARGET_NRF5x/TARGET_SDK_15_0/modules/nrfx/mdk/nrf52840.h:2252:3: note: previous declaration as 'typedef struct NRF_GPIO_Type NRF_GPIO_Type'
 } NRF_GPIO_Type;
   ^~~~~~~~~~~~~

[mbed] ERROR: "/usr/bin/python" returned error.
       Code: 1
       Path: "/home/matheus/SUTD/projeto/proj"
       Command: "/usr/bin/python -u /home/matheus/SUTD/projeto/proj/mbed-os/tools/make.py -t gcc_arm -m nrf52840_dk --source . --build ./BUILD/NRF52840_DK/GCC_ARM -c -v"
       Tip: You could retry the last command with "-v" flag for verbose output

Thanks.

Matheus-Garbelini on 13 Jun 2019

@Matheus-Garbelini You shouldn't have to apply those udev rules; they only make the tests run more consistently. Without them, Linux might interact with the serial device and cause spurious failures.

There are usually examples in the headers for each class. Check out the USBSerial example here:
https://github.com/ARMmbed/mbed-os/blob/3e6f5eba6c7da02f65220f6e9bc33804445c613b/usb/device/USBSerial/USBSerial.h#L25-L44

I'm not sure you'll be able to do USB on the nRF52840 dongle. The nRF52840_DK has two USB ports -- one for the programmer and one for the nRF52840. The dongle's USB plug may only be connected to the programmer (unless there are hardware jumpers to set it up otherwise).

EDIT: I read the dongle spec sheet and it says there's a direct connection to the nRF52840. I haven't tested it with this target however.

AGlass0fMilk on 13 Jun 2019

Please note that the dongle seems to have a special bootloader that lets you program it over USB without a debugger. It is not currently supported by Mbed. I would suggest getting an nRF52840_DK to test with.

https://os.mbed.com/questions/85218/Will-the-nrf52840-USB-Dongle-be-supporte/

AGlass0fMilk on 13 Jun 2019

@AGlass0fMilk Thanks for the response.

However, the bootloader is not a problem; a custom script can handle the .hex file.
I've already successfully flashed the dongle with the Adafruit Arduino board support package, and its serial CDC implementation works really badly. I've also used PlatformIO to compile other Mbed examples for this board and they work fine. However, I can't compile any example using your repository no matter what I do. Possible reasons:

- It is not clear which target to use. I've tried NRF52840_DK, the same one I used before, and it still gives me compile errors.
- I don't know whether more defines need to be included in the Mbed configuration JSON file. A lot of projects use mbed_app.json and I don't have one for the NRF52840_DK target.
- Maybe my arm-gcc toolchain is interfering with something.

Let me know if you can share the command line you use to compile one project using your repository and the version of your toolchain so I could start making some tests.

PS: I just use the following commands to flash the device:

nrfutil pkg generate --hw-version 52 --application-version 1 --application .pioenvs/adafruit_feather_nrf52840/firmware.hex --sd-req 0xB6 app_dfu_package.zip
sudo nrfutil dfu usb-serial -p /dev/ttyACM0 -pkg app_dfu_package.zip

So as long as mbed supports NRF52840_DK the dongle will work.

Matheus-Garbelini on 13 Jun 2019

@Matheus-Garbelini I made a repository with the example in it. I built and tested it a few minutes ago with my nRF52840_DK. Let me know if you have any other problems.

https://github.com/AGlass0fMilk/mbed-usb-cdc-example

Try using Python 2 with Mbed's tools. I've had issues with Python 3.

Perhaps try cloning Mbed OS fresh from my fork and checking out the feature branch with USB.

AGlass0fMilk on 13 Jun 2019

Wow, thank you very much for your fast response and upload. It finally worked.

Thanks for all your effort. Just from quickly opening the serial monitor I can already see it's much more reliable than the mainstream Arduino BSP. I'm not sure what you did differently, but it compiled without any errors now.

Update: I'm not able to enable timers; the application hangs when the timer0-4 interrupts are enabled. But I need to test more to be sure.

Matheus-Garbelini on 13 Jun 2019

> Wow, thank you very much for your fast response and upload. It finally worked.

No problem! Glad you could test it out.

> Thanks for all your effort. Just from quickly opening the serial monitor I can already see it's much more reliable than the mainstream Arduino BSP. I'm not sure what you did differently, but it compiled without any errors now.

That's awesome. Arduino definitely has some reliability issues vs. Mbed...

As for the build issue, my guess is that you had some leftover files after changing branches. My PR was written for nRF SDK version 15, while previous versions of Mbed used SDK 14.2, so some stale files may have been left behind and caused conflicts.

If you're sure you won't lose any untracked work, you can try git reset --hard in the mbed-os repository you had issues with. Then delete the BUILD folder and do a fresh build of the project. That might help solve your issue.

> I've removed around 100kB of unused code by creating "mbed_app.json" in root path with the following:
>
> {
>     "target_overrides": {
>         "NRF52840_DK": {
>             "target.extra_labels_add": ["SOFTDEVICE_NONE"],
>             "target.extra_labels_remove": ["BLE", "SOFTDEVICE_COMMON", "SOFTDEVICE_S140_FULL", "NORDIC_SOFTDEVICE"]
>         }
>     }
> }

Yes, if you don't need BLE you can exclude merging the soft device. Mbed is removing support for the Nordic softdevice soon and it will be replaced by the open-source ARM Cordio BLE stack.

Have fun with USB! 🥂

AGlass0fMilk on 13 Jun 2019

Thanks. I've just noticed that the Ticker class and timers hang the application for some reason. Now I'm worried :-X.

Matheus-Garbelini on 13 Jun 2019

> Thanks. I've just noticed that the Ticker class and timers hang the application for some reason. Now I'm worried :-X.

Can you share the code?

AGlass0fMilk on 13 Jun 2019

yes, here it is:

#include <mbed.h>
#include <USBSerial.h>

USBSerial serial;
DigitalOut LedB(P0_12, 1);
volatile uint32_t milliseconds;
Ticker ticker;

void global_timer_IRQ(void)
{
  ++milliseconds;
}

int main(void)
{  
  ticker.attach_us(global_timer_IRQ, 1000); // Comment this line to unfreeze
  while (1)
  {
    LedB.write(0);
    wait(1);
    LedB.write(1);

    serial.printf("%lu\n", (unsigned long)us_ticker_read()); // us_ticker_read() returns uint32_t

    wait(1);
  }
}

Matheus-Garbelini on 13 Jun 2019

Hi @Matheus-Garbelini, I couldn't get that code to compile but I think it's not working because of this:

void global_timer_IRQ(void)
{
  timer.attachInterrupt(&global_timer_IRQ, 1000); // microseconds
  ++milliseconds;
}

Pull my example repo again, I have updated it to do essentially what your code looked like it was intended to do. There should be no issue using USB with any of the other on-board peripherals.

Try some simpler examples to get started with Mbed's lower level drivers. I recommend using the RTOS and EventQueue for anything more complicated than a single-threaded, Arduino-style application.
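As a rough illustration of the deferred-work pattern behind that recommendation, here is a small host-side C++ model: the interrupt-style callback only counts and posts a job, and the main loop runs the posted jobs later. `DeferredQueue`, `TickCounter` and `on_tick` are hypothetical names invented for this sketch. It is not Mbed's EventQueue API, and the `std::function`-based queue is not interrupt-safe, so treat it as a sketch of the idea rather than code to run on the target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical stand-in for a deferred-work queue (host model only).
class DeferredQueue {
public:
    // Called from the simulated interrupt: only records the work.
    void post(std::function<void()> job) {
        assert(job); // an empty job would throw when dispatched
        jobs_.push(std::move(job));
    }

    // Called from the main loop: runs every posted job in order.
    // Returns the number of jobs executed.
    std::size_t dispatch_pending() {
        std::size_t ran = 0;
        while (!jobs_.empty()) {
            std::function<void()> job = std::move(jobs_.front());
            jobs_.pop();
            job();
            ++ran;
        }
        return ran;
    }

private:
    std::queue<std::function<void()>> jobs_;
};

struct TickCounter {
    uint32_t milliseconds = 0; // updated in the simulated interrupt
    uint32_t reports = 0;      // updated by deferred jobs in the main loop
};

// Simulated 1 ms tick handler: keep it short, defer the slow part.
void on_tick(TickCounter& counter, DeferredQueue& queue) {
    ++counter.milliseconds;
    if (counter.milliseconds % 1000 == 0) {
        queue.post([&counter] { ++counter.reports; });
    }
}
```

The point of the split is that slow work such as printing over USB serial never runs inside the tick handler; it only runs when the main loop calls `dispatch_pending()`.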

AGlass0fMilk on 13 Jun 2019

@AGlass0fMilk Sorry, that line shouldn't be there. I've removed it.

Update: Yes, I had confused myself with the timers; it was nothing related to USB. Ticker is working fine. Sorry for the silly questions.

Matheus-Garbelini on 13 Jun 2019

@AGlass0fMilk Sorry that I checked out of this thread. You've done some amazing work here, good job! 🎉

marcuschangarm on 14 Jun 2019

@marcuschangarm Thanks!

PR has been merged into master. Closing this issue.

AGlass0fMilk on 18 Jun 2019
