Zephyr: samples/subsys/shell/fs/ fails on native_posix board

Created on 21 Apr 2020 · 16 Comments · Source: zephyrproject-rtos/zephyr

I have tried this:

sudo apt install libfuse-dev
mkdir flash
west build -b native_posix_64 samples/subsys/shell/fs/
west build -t run

I didn't export PKG_CONFIG_PATH in this case.
I see this output:

Mounting flash at flash/
UART_0 connected to pseudotty: /dev/pts/2
*** Booting Zephyr OS build zephyr-v2.2.0-1666-gf3b4d8a86f78  ***

But after running screen /dev/pts/2, no shell prompt appears: nothing is displayed on the screen and it just hangs.
I also tried native_posix (without _64) and the result was the same.

BTW, in #24484 @pabigot confirmed the same issue.

Labels: native port, bug, medium

All 16 comments

I had the exact same thing happen in tests/bluetooth/shell on native_posix (cc @joerchan).
I think this might not be related to this particular sample but to the mapping between /dev/pts/ and the console code in native_posix.

I found that samples/subsys/shell/shell_module/ has the same issue, but samples/net/eth_native_posix/ works fine.
It looks like CONFIG_NET_SHELL works but CONFIG_FILE_SYSTEM_SHELL does not.
After adding all of the config options from samples/subsys/shell/fs/ to samples/net/eth_native_posix/, the fs shell works in the eth_native_posix sample.

After running git bisect, it looks like the following commit introduced the issue:
kernel/timeout: Make timeout arguments an opaque type: 7832738ae985a63febb8f82e7c4e34824f48486e

The commit message mentions that not all subsystems are converted and that the legacy behaviour can be restored by setting CONFIG_LEGACY_TIMEOUT_API. Setting this config manually resolves the issue.

@andyross What do you think of the patch below? I'm not 100% sure it should be applied at the top level of the subsystem.

diff --git a/subsys/shell/Kconfig b/subsys/shell/Kconfig
index af8496f961..9589c22199 100644
--- a/subsys/shell/Kconfig
+++ b/subsys/shell/Kconfig
@@ -8,6 +8,7 @@
 menuconfig SHELL
        bool "Shell"
        imply LOG_RUNTIME_FILTERING
+       select LEGACY_TIMEOUT_API
        select POLL

 if SHELL

Legacy mode is supposed to be for build failures only. If we have a runtime failure then there's a bug that's been exposed somewhere.

FWIW: I can't reproduce that workaround. Seems like I don't get the shell prompt ever, regardless of whether CONFIG_LEGACY_TIMEOUT_API is set for the app or not.

When @carlescufi ran into this issue, he was using tests/bluetooth/shell, which has CONFIG_LEGACY_TIMEOUT_API set (Bluetooth currently sets it), so that seems like a coincidence.

Came back to look at this. Was about to report that commit 99a815591 (the one right before the timeout series) was hanging too. But then I started a git bisect, did a full clean of the tree, and it worked. Then I reverted to origin/master ... and it worked too!

I'm guessing that there's something wonky with this test. Is there a fuse mount that doesn't get reset/cleaned/initialized correctly in all cases?

I actually did the bisect with samples/subsys/shell/shell_module, so this more or less rules out FUSE, and it looks like I have no stale mount points on my machine.

@vanwinkeljan Are you still reproducing this issue? Can you check if deleting flash.bin (in the directory you start the shell from) has any effect on the issue?

@joerchan No, I'm not trying to reproduce it any further, as for me it is resolved by setting CONFIG_LEGACY_TIMEOUT_API.

Note that in my reproduction scenario (samples/subsys/shell/shell_module) the flash file system is not in place, so I can't remove the file flash.bin.

@joerchan I tried it with samples/subsys/shell/fs. Removing flash.bin had no effect. CONFIG_LEGACY_TIMEOUT_API=y made it work.

My bisect reproduced the results by @vanwinkeljan in https://github.com/zephyrproject-rtos/zephyr/issues/24553#issuecomment-617950667. I'd approve a PR that made the fix suggested in that comment.

I believe legacy timeout workarounds are supposed to be fixed in this release so it would be good to get this marked as a subsystem that needs attention.

No. We need to fix the bug or identify the usage that breaks with the new timeouts. The legacy setting changes the API, it shouldn't ever change behavior for conforming code.

Also, I just tried this again and despite the fact that it sometimes works and sometimes doesn't, I can't currently reproduce the workaround. Putting CONFIG_LEGACY_TIMEOUT_API=y into prj.conf on that test still hangs on start.

@andyross Maybe a stupid question, but are you aware that the shell is spawned on a dedicated pseudo-terminal?

In fact I'm just discovering that while reading the native_posix UART code. Let's just say I'm very much not amused with the choice of defaults here...

Got it. Fix up at #24648

The root cause was that the UART polling loop was asking for a k_timer period of exactly one tick, and adjusting that produced a K_NO_WAIT internally that the code was treating as an externally-provided "no period needed" argument. It wasn't anything to do with native_posix per se, just the particular interaction of polling period and tick rate.
