Zephyr: Problem with PPP + GSM MUX with SIMCOM7600E

Created on 20 Jul 2020  路  23Comments  路  Source: zephyrproject-rtos/zephyr

Describe the bug
I am trying out the GSM PPP driver and GSM MUX with two SIMCOM modems:
SIM868 and SIM7600E.
It works as a charm on SIM868 but I am having problems with SIM7600E.

When just using MODEM_GSM_PPP=y and GSM_MUX=n it works fine

When enabling GSM_MUX, AT commands works fine and I can run ping in parallel, so network connectivity works to some extent...

My problem is when I start something more demanding on PPP (such as a lwm2m client, or a dns-lookup) the modem halts and then reboots. The same thing happens even if I do not use any AT commands in parallel with the client.

When dumping the logs I do not see anything that is obviously wrong.
Is there anyone here who have experience with the GSM MUX and can advise on how to debug this issue?

I have been in contact with the modem vendor and they claim muxing should work in general.

To Reproduce
Steps to reproduce the behavior:

Use the LWM2M client sample code

This is the prj.conf I used:

CONFIG_NET_SHELL=y
CONFIG_LOG=y
CONFIG_CONSOLE=y
CONFIG_LOG_DEFAULT_LEVEL=4
CONFIG_LWM2M_LOG_LEVEL_DBG=y
CONFIG_NET_LOG=y

CONFIG_NET_IPV6=n
CONFIG_NET_CONFIG_NEED_IPV4=y
CONFIG_NET_CONNECTION_MANAGER=y
CONFIG_NETWORKING=y

CONFIG_LWM2M=y

CONFIG_MODEM=y
CONFIG_NET_PPP=y
CONFIG_NET_L2_PPP=y
CONFIG_NET_L2_PPP_OPTION_MRU_NEG=y
CONFIG_MODEM_GSM_PPP=y
CONFIG_MODEM_GSM_UART_NAME="UART_1"
CONFIG_MODEM_GSM_APN="m2m.tele2.com"
CONFIG_DEVICE_POWER_MANAGEMENT=y

CONFIG_GSM_MUX=y
CONFIG_MODEM_GSM_SIMCOM=y
CONFIG_MAIN_STACK_SIZE=2048

Impact
This is a showstopper for my client

Logs and console output
I do not see anything in the logs that looks out of the ordinary. Would appreciate tips on what data to look for.

Environment (please complete the following information):

  • OS: Linux, MacOS
  • Toolchain: arm-none-eabi
  • Commits:
    ```

    • name: zephyr

      remote: upstream

      path: zephyr

      revision: 98bcc51b0925a92db28f127d08b842b1d609bce8

      clone-depth: 10

      west-commands: scripts/west-commands.yml

- name: hal_nordic
  revision: 00580f43b9d48f2079d611ce12f0fbc78d3040f1
  path: modules/hal/nordic

- name: cmsis
  revision: 542b2296e6d515b265e25c6b7208e8fea3014f90
  path: modules/hal/cmsis

- name: segger
  revision: 6fcf61606d6012d2c44129edc033f59331e268bc
  path: modules/debug/segger

```

bug low

All 23 comments

Did you try #26671? It might help.

Yes I did try your PR @mniestroj sadly it did not help.

This can be very difficult to debug as typically the modems are just black boxes that do not tell why they fail. I spent countless hours debugging the muxing with simcom808 modem.

Do you have any suggestions as to how I can verify that the Zephyr code is working as it should?
I have been trying to sniff the UART and analyze it with Wireshark but I only get garbish (even with the SIM868 which works fine)

One thing to check is to make sure that the muxing channel is correctly set. Different modems might use different channel number for PPP and AT channels. See CONFIG_GSM_MUX_DLCI_AT and CONFIG_GSM_MUX_DLCI_PPP options.

I have been trying to sniff the UART and analyze it with Wireshark but I only get garbish (even with the SIM868 which works fine)

It is a bit surprising that Wireshark is not able to decode the muxing, usually it is handling all the protocols that I have thrown to it.

You can enable more debugging, for example CONFIG_GSM_MUX_LOG_LEVEL_DBG=y might give usable information,

My problem is when I start something more demanding on PPP (such as a lwm2m client, or a dns-lookup) the modem halts and then reboots. The same thing happens even if I do not use any AT commands in parallel with the client.

This sounds like modem is running out of resources etc. One thing to check is to compare the working modem SIM868 manual to non working SIM7600E if there are differences with configuration, like does the modem require some AT command to be sent before it starts to work correctly.

You probably know this but just in case, the https://simcom.ee/documents/ contains documentation for various SIMCOM modems.

Yeah I have tried to compare the AT-command listings.

There does not seem to be any difference in the AT-command listing although the AT+CMUX description for SIM7600E is very brief and states "Currently only basic operation mode is supported." which I guess refer to the "basic operation" in the GSM 07.10 protocol?

One thing to check is to make sure that the muxing channel is correctly set. Different modems might use different channel number for PPP and AT channels. See CONFIG_GSM_MUX_DLCI_AT and CONFIG_GSM_MUX_DLCI_PPP options.

good tip will try that :)

The channels seem fine. Although I did some manual interpretation of the data captured on the UART all of the muxed packets sent only seem to be 7 byte. I guess there could be an issue with flooding the modem?

Do you know where I can find settings for the packet size?

There is no such thing as tweaking the packet size. It sounds like there is some handshake error here. You could enable more modem debugging in order to see more information

CONFIG_UART_MUX_LOG_LEVEL_DBG=y
CONFIG_GSM_MUX_LOG_LEVEL_DBG=y
CONFIG_MODEM_GSM_MUX_LOG_LEVEL_DBG=y
CONFIG_MODEM_GSM_MUX_VERBOSE_DEBUG=y
CONFIG_NET_PPP_LOG_LEVEL_DBG=y

This will print lot of data, so I had to increase the logging sizes too

CONFIG_LOG_BUFFER_SIZE=65536
CONFIG_LOG_STRDUP_BUF_COUNT=256

I had some experimental stuff for SIMCOM LTE modems at drivers/modem/gsm_ppp.c:mux_enable() function. If you set SIMCOM_LTE the mux enabling is done slightly differently. I was not able to test this as SIM808 did not work with it. This is just for experimenting if you want to try it.

There is no such thing as tweaking the packet size. It sounds like there is some handshake error here. You could enable more modem debugging in order to see more information

Sorry I meant that size in static int gsm_mux_send_data_msg(struct gsm_mux *mux, bool cmd, struct gsm_dlci *dlci, uint8_t frame_type, const uint8_t *buf, size_t size
always seems to be 1.

So when I send "AT" on the AT DLCI (2) it is sent as two frames:

0000   f9 07 ef 03 41 d4 f9                              ....A..
0000   f9 07 ef 03 54 d4 f9                              ....T..

Pretty sure I am flooding the modem with tiny frames

I realized this when I spent the morning manually dissecting and formatting the sniffed UART data ... then Wireshark behaved much nicer

Pretty sure I am flooding the modem with tiny frames

There are lot of small frames but I doubt it is the issue here.

The uart_poll_out API sends data one byte at a time so what you are seeing here is unavoidable (unless new API is implemented). See drivers/console/uart_mux.c:uart_mux_poll_out() for details.

I guess it is fine for the AT commands but it seems to be the case for all of the IP packets being sent on the PPP line as well...
Is this really how it is intended to work?

Is there a way I could use uart_mux_fifo_fill instead?

Sorry if I am missing the point

Currently the modem code in drivers/modem/modem_iface_uart.c:modem_iface_uart_write() is using polling interface. Perhaps it might be possible to use the interrupt driver API instead, would require some changes to modem code and more testing. Patches are welcome if you want to try that route.

I will try to verify that this is my issue first. If it is the case I might attempt the rewrite.
Thank you for the help :)

It seems like flooding is the issue.

I was able to verify this by hacking around uart_poll_out in modem_iface_uart_write and in a similar way in ppp_send_flush

if (strncmp(iface->dev->name, CONFIG_UART_MUX_DEVICE_NAME, sizeof CONFIG_UART_MUX_DEVICE_NAME - 1) == 0) {
    uint8_t dlci_address = iface->dev->name[4] - '0';

    // in this hacked version the first argument of gsm_dlci_get is ignored
    (void)gsm_dlci_send(gsm_dlci_get(NULL, dlci_address), buf, size);
} else {                                                                 
    do {                                                             
         uart_poll_out(iface->dev, *buf++);                       
    } while (--size);                                                
}    

With this everything works as expected. I realize that this is not a nice solution but at least I have been able to find a cause of the issue.

Do you think this kind of workaround could be cleaned up to some acceptable state or how would you recommend I do this in a proper way? I am happy to look into doing it :)

So there is a bug in the modem firmware if it cannot accept things one byte at a time. Would you be able to report this to the vendor, perhaps they are able to fix it in next firmware release.

What you are proposing will not work in all cases as the device instance id found in the device name, is not really related to DLCI number. If you want to experiment a bit, you could change the uart_poll_out() call to uart_fifo_fill() and try if it works better. The documentation says that function is only to be called from ISR handler if uart_irq_tx_ready() returns true. Result of calling this function not from an ISR is undefined (hardware-dependent). So it is not a proper solution but worth experimenting.

So there is a bug in the modem firmware if it cannot accept things one byte at a time. Would you be able to report this to the vendor, perhaps they are able to fix it in next firmware release.

I agree. I have been in contact with the vendor and I have a hard time understanding if they are going to fix this or if they are considering that one should not send only one byte / frame. Lets see where that discussion leads :)

What you are proposing will not work in all cases as the device instance id found in the device name, is not really related to DLCI number.

The main idea with the solution is to decorate the data with muxing frames before even sending it to the UART. Basically I am asking if it is possible to move the UART MUX API one layer up somehow... I get that this is not a trivial change. And ofc needs to be in a much neater and thought through way than my hack :)

If you want to experiment a bit, you could change the uart_poll_out() call to uart_fifo_fill() and try if it works better. The documentation says that function is only to be called from ISR handler if uart_irq_tx_ready() returns true. Result of calling this function not from an ISR is undefined (hardware-dependent). So it is not a proper solution but worth experimenting.

I did try a naive swap which did not work but will for sure do some more exploration in the future :)

So my main issue is solved and you can close this issue. It will be a while until I get some time to do something better than my hack workaround :)

Thank you for your time :)

For a proper solution to this issue, I would recommend that the modem API is converted from polled UART API to interrupt based API. Then the uart_fifo_fill() could be used instead. Anyway, this is a bigger change.

Was this page helpful?
0 / 5 - 0 ratings