Marlin: [2.0.x, HAL_DUE] "sd_mmc_spi_mem.cpp" incompatible with SPI buses.

Created on 11 Jul 2018 · 27Comments · Source: MarlinFirmware/Marlin

The routines in "sd_mmc_spi_mem.cpp" appear to be called as an interrupt-like fashion, meaning they are called even when the main code thread is doing something else. Although this does not present a problem when hardware SPI is used only for the SD card, it does present a problem when the hardware SPI lines are made into a bus with several devices.

In our next gen printers, we have the three devices on a SPI bus:

An on-board SPI flash chip on the Archim 2.0 board.
A color LCD display
A MAX3421e USB host

Access to these devices happens in the idle loop, so there is no contention for the SPI bus, but the routines in "sd_mmc_spi_mem.cpp" appear to be called out-of-context, like an interrupt. When these routines attempt to access the SD card, there is a possibility that this may happen when the main thread was in the middle of talking to another device on the bus. This leads to two devices being chip selected, and thus, garbled data and a likely hung system.

Currently I am working around the problem by making it so all routines in "sd_mmc_spi_mem.cpp" return CTRL_NO_PRESENT, which effectively disables the troublesome code.

In the future I expect people to want to use more SPI devices with their printers and making them into a bus is far preferable than allocating different pins to different devices and using software SPI for them. I think a rule needs to be put in place to not allow interrupt-like routines to access the hardware SPI, or if that is inevitable, there needs to be access control or a way to shut off such routines when code needs to access SPI.

Source

marcio-ao

Most helpful comment

After sound and disco lights now videos.
Are we still building 3D-printers or did we move to entertainment systems.

AnHardt on 12 Jul 2018

❤1 😄1

All 27 comments

This ticket is related to #10755, which I will close.

marcio-ao on 11 Jul 2018

I'll make the suggestion that this feature should be controllable via a "Configuration_adv.h" option. That way if people want to use a SPI bus, they can easily disable this feature.

marcio-ao on 11 Jul 2018

I've recently been looking at the spi code used for the MKS SBASE board and trying to understand the issues for that system. With that the spi bus is used to access the "on board" sd card (not the one on any LCD display) from the USB code at interrupt time, in effect allowing an attached PC access to the sd card (at least that is my understanding of what is going on). I couldn't see any other cases of using spi from an interrupt context, but I may have missed something. Is that also the case here or will Marlin try to access the sd card from an interrupt in other situations?

I'm not really that familiar with spi as used by Marlin, how long is a typical spi transaction (if there is such a thing)? Would it be reasonable to "hold off" interrupt based access for that amount of time?

gloomyandy on 11 Jul 2018

@gloomyandy: I suspect the "sd_mmc_spi_mem.cpp" code is the only one accessing devices via SPI, because once I remove those functions, I no longer see any issues. If the routines in "sd_mmc_spi_mem" are indeed interrupt driven (I have not confirmed this, I only suspect it to be the case due to the issues I am seeing), then I could attempt to bracket my calls to SPI with "noInterrupts()" and "interrupts()". I'm not sure if that is the preferred way of doing things or if there is a danger of messing up printing by doing that.

marcio-ao on 11 Jul 2018

I suspect there should be a better way than that. I'm not that familiar with the DUE but a quick look at that HAL file and it seems very similar to the ones on the SBASE which support PC access to the SD card via USB. So I suspect it provides the same capability.

As to how to disable the access, if the access is from the USB side of things then there are almost certainly better ways of doing it than disabling all interrupts. It may be possible to simply disable the USB interrupts (assuming that USB access can handle the max delay?), or perhaps use some sort of critical section? I'm sure someone more familiar with Marlin will comment. I hope so as I'm interested in this area of the code and would like to understand the options better.

gloomyandy on 11 Jul 2018

@gloomyandy : Me too.

marcio-ao on 11 Jul 2018

@marcio-ao ; You are wrong: The main problem with sharing SPI with an MMC/SD is that the card can´t be arbitrarily deselected by just toggling the CS pin. This is a limitation of the SD specification
The SDCS line state is only considered at very specific points in the SD protocol, for example, you cannot interrupt a sector transfer in the middle, you cannot interrupt a SD command in the middle, and so on...
As said there are specific points were a card can be deselected (or selected), and sometimes, even it is required to send extra clocks to make sure the card actually sees the deselection
The sd_mmc_spi_mem.cpp functions are not called from an ISR context. In fact, they are called from the idle loop.
The call chain goes ... HAL_idletask() -> usb_task_idle() -> udi_msc_process_trans(). That last function uses flags to process pending requests that were received from the USB stack isr, and eventually calls the sd_mmc_spi_usb_read_10/ sd_mmc_spi_usb_write_10 and all the other functions present on the sd_mmc_spi_mem.cpp file.
As explained previously, probably all what is needed is to be sure the SPI transfer to the SD is never interrupted, and also, after all transfers, send 8 clocks to make sure the SD card is actually deselected
Yes, i have done designs where SPI lines are shared between Sd and other devices. The only requirement was to send those extra 8 clocks once the transaction was done, to be absolutely sure the card stops driving the data lines...

ejtagle on 11 Jul 2018

I'm in the process of replacing the USB stack on the LPC176x platform, (while pulling out the framework from Marlin) and this is one of the issues I will be paying attention to, When I designed the current implementation (LPC176x not DUE) I made assumptions based on the Re-ARM that the onboard sd-cards would have a dedicated SPI connection, so I just plugged in the USB library, merged it with chanfs (fixed all the issues), and made the composite device and all was good in the land (it was supposed to be temporary), that is until people realised that the LPC176x framework worked on other boards.. boards where that assumption did not hold. As @ejtagle states the DUE works, it shouldn't be a problem to defer the transfer from the interrupt, and handle other devices on the SPI port.

p3p on 11 Jul 2018

👍1

@ejtagle: That's interesting. If it's not an ISR, then I am wondering why I am seeing calls to routines in sd_mmc_spi_mem.cpp in the middle of a loop I setup to play a startup animation file from the SPI flash. The loop that does not return to the idle function until the entire video is played... but, I am calling watchdog_reset() and thermalManager.manage_heater(); and safe_delay(ms);

Is it possible that any of those functions are calling usb_task_idle() indirectly? If so, I can see that causing the problem.

marcio-ao on 11 Jul 2018

The main problem with sharing SPI with an MMC/SD is that the card can´t be arbitrarily deselected by just toggling the CS pin... you cannot interrupt a sector transfer in the middle, you cannot interrupt a SD command in the middle, and so on...

This is not the issue. My code isn't even using the SD card at all. What I am finding instead is that while my loop does not access the SD card at all, hooking up a digital analyzer shows that the SDSS line is being pulled low. This only happens when a USB cable is plugged in.

It is not my code accessing the SD card, as far as I can tell, it is sd_mmc_spi_mem.cpp accessing it on behalf of the connection from a PC. If that code is not called from an ISR, then some other function that I am calling in my loop is calling it indirectly.

marcio-ao on 11 Jul 2018

be carefull @marcio-ao . Any call to safedelay() WILL end up calling the idle() loop. That was and is the purpose of the safedelay() function (that is being used everywhere!): To ensure the idle() calls happen even when a blocking delay is executed -- You simply can´t trust the SPI bus to be in the same state between safe_delay() or any other Marlin implemented calls

ejtagle on 11 Jul 2018

@p3p : The exact situation that you describe is where a proper RTOS starts to shine :) @Roxy-3D 👍

ejtagle on 11 Jul 2018

@ejtagle: Are you sure about that? It looks like safe_delay() only calls thermalManager.manage_heater(), and that in turn does not appear to call idle()

Anyhow, it's very good to know that functions in sd_mmc_spi_mem.cpp are not called from an ISR. That means something else is causing the video to hang and with some additional investigation, I should be able to narrow it down (hopefully!).

marcio-ao on 11 Jul 2018

@marcio-ao : Very hard to tell if there is a call to idle(). The easiest way would be to use a JTAG debugger to trace... or use the trace function call facility that is already integrated into marlin...

ejtagle on 11 Jul 2018

After sound and disco lights now videos.
Are we still building 3D-printers or did we move to entertainment systems.

AnHardt on 12 Jul 2018

❤1 😄1

@AnHardt: Rather than a static logo at startup, I'm thinking of something animated. But mostly it came about as curiosity as to what the graphics controller could do. The board has a 2MB flash chip on it, so it was also an excuse to use that storage for something.

marcio-ao on 12 Jul 2018

@AnHardt: On a more practical level, video could also be used as a startup wizard to help people set up their printers. With inexpensive color displays now available, there really isn't a need for 3D printers to be stuck in the 80s with text-only prompts or pixelated B&W bitmaps.

marcio-ao on 12 Jul 2018

@marcio-ao : I´ve been playing with that same idea for a very different project. Using an ILI93xx controller on a Color LCD (320x240) . As soon as you start adding graphics, you will run out of (flash) resources.
I have written several image compressors to be able to avoid that situation, but it is not easy ... It works, yes. A windowed GUI is not hard to implement over it.
But to make it "flow" a 32bit processor is required...

ejtagle on 12 Jul 2018

@ejtagle: Yes, there isn't very much space for graphics on the Atmel chip's Flash itself. But we are using an Archim 2.0 board, which has an on-board 16Mbit external SPI Flash chip. In my UI, I made a hidden developer menu where you can go to instruct it to copy media files off the SD card or USB flash-drive and write them to the SPI Flash, so that it will subsequently be used at startup to display the logo animation. This allows the media files to be pre-loaded by us and it won't consume any space on the Atmel chip itself.

marcio-ao on 12 Jul 2018

@ejtagle: I think the issues I am seeing are caused by an SD card not being a well-behaved SPI device, as you noted above (I was not aware of this, but after reading up more on it, I see why it's quite a different beast!). It turns out it all works fine when I substitute the SD card with an USB host.

I'm going to close this ticket, since this doesn't appear to be a issue in Marlin, but rather the fact that SD cards don't play nice on a SPI bus. I guess the recommendation will be if you are using SD cards, you should use a dedicated interface for it.

marcio-ao on 12 Jul 2018

👍1

I believe the issue is fundamentally that the SD card code assumes nothing else will alter the SPI bus enable state while the SD card is using it. If a different device was enabled on the same bus, I can see how weird things could happen. There's no central dispatcher to ensure that only one enable pin is ever set on a shared SPI bus. Marlin could use a helper class for this purpose to help mitigate internal collisions. Not sure if it would help with libraries that do their own SPI internally.

thinkyhead on 13 Jul 2018

@thinkyhead : You are probably right.

ejtagle on 13 Jul 2018

@thinkyhead : When I write my own code I keep the CS assert/de-assert close to each other, even if it means asserting/de-asserting more than necessary (for example, if a chip supports reading multiple registers while holding the CS asserted, I will generally have the function that reads one register at a time that asserts and de-assert the CS for each access). While less efficient, I prefer that approach over premature optimizations that may invite errors. I would also prefer that over a Marlin helper class. Having a specific helper class would make it harder to integrate external libraries that may already work. Most libraries, if correctly written, should de-assert the CS before returning (if it does not do so, it should be fixed!).

The following two commandments should take care of the issue:

All methods (with the possible exception of private functions) will de-assert CS before returning to the caller.
Interrupt code shall not use SPI.

I tried to put a "if(!READ(SDSS))" before and after the "HAL_idle_task" to see if the SDSS was being left asserted, and it did not seem to be the case. I don't think the problems I am seeing are related to chip select. An SD card requires a special initialization before going into SPI mode and my code was running before the SD card was initialized, so possibly that was the problem. My particular SD card could also have a bad implementation of the standard. This is yet another reason SD cards are a poor choice for an SD bus... you can't really known in advance whether all cards do things correctly.

I've decided to quit investigating this particular issue. The problem goes away entirely when I use a USB host that respects the SPI protocol. So anyone using a SPI display and an SPI USB host should be good.

SD card support is an outlier right now, but if someone is using a SD card, they probably are using a RepRapDiscount style display with it anyway and not a SPI bus.

marcio-ao on 13 Jul 2018

@marcio-ao , @thinkyhead ... Both are right. If we want to share an SPI bus with an SD, the FIRST THING that MUST BE DONE is to deassert all CS lines, SWITCH the SD to SPI mode, deassert the SD CS line and send an extra 8 clock cycles to make sure the SD ends any pending operation and releases the SPI bus.

I have designed, several years ago, a commercial product were i had just one hi speed SPI bus (40mhz capable) and i had absolutely no problems sharing that SPI bus with an SD and several other "normal" SPI devices -- But the card MUST be initialized as the first thing, because otherwise, the card will operate in SD/MMC mode and will interfer with the other SPI devices...

ejtagle on 13 Jul 2018

@ejtagle I assume you will have to do the above when a card is newly inserted? Can this cause any problems? Is it safe to wait for other devices to complete operations? With the MKS SBase my understanding is that by default the onboard SD card and the SD card in the display will share an SPI bus. At the moment Marlin requires that we split the two and use different pins (by wiring the display SD card to other pins). Should it be possible to have both SD cards share the same hardware SPI bus (assuming issues like contention are sorted out)?

gloomyandy on 13 Jul 2018

@gloomyandy : It is perfectly possible for 2 cards to share the SPI bus. Each one should have its own CS line, that´s the only requirement. The limitation right now is that the Arduino SD card class only can handle one card and assumes total control.
I think that that class could be replaced with chanfs and a proper implementation would allow to access the 2 cards with no issues at all

Read here: http://elm-chan.org/docs/mmc/mmc_e.html#spibus , as explained, sending 8 clocks after CS deassertion is all that is required

ejtagle on 14 Jul 2018

👍1

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.