Using the old toolchain the latest ESP-IDF master, I'm able to get Sample1 working... when the log level is Debug or Verbose. When the log level is Info, the custom characteristic doesn't actually appear through the nRF Connect app, but the device is there and named correctly.
For obvious reasons, I'm having trouble debugging this, since the issue appears to be timing dependent, and the log statements slow the code down just enough to avoid the issue.
Possibly related to https://github.com/nkolban/esp32-snippets/issues/82 or https://github.com/espressif/esp-idf/issues/857
One thing we could try is to introduce "slowness" at various points in our code. We can selectively enable/disable logging dynamically in the apps. For example, enable logging at the start and disable logging 1/2 way through. Using this "divide by 2" approach, we might be able to quickly locate the "area" where the timing conditions are becoming problematic.
I am not saying by any stretch of the imagination that there isn't an error in my usage of the ESP-IDF low level BLE APIs. The only clue I have is that all worked at one build of the ESP-IDF and stopped working at another ... but if I was doing something wrong in the older build, it might have been tolerated where now it isn't. But wrong is wrong ... and I need to snipe hunt it down.
Well hmm ... I tried to recreate the issue ... and in the end did ... however I'm troubled.
First, I built the source with 100% debugging enabled, ran the app and used nRF Connect and saw the exposed service as expected.
Next, I compiled the code but switched off debugging within the application using
esp_log_level_set("*", ESP_LOG_ERROR);
I re-ran the test and even though no log output was appearing in the console, nRF Connect continued to see the exposed service.
Finally, I switched off debugging inside make menuconfig, re-compiled and re-ran and NOW we don't see the exposed service through nRF Connect as you described in the original report.
Ouch!!! This is going to be a royal pain to diagnose. If I switch on debugging (even with no-output), we work but if I switch off the code generation for debugging, we fail. This is (to me) a timing issue ... the only way we are going to solve this one is by testing with manual injection of delays until it works ... this is going to be slow. Oh well ...
As an added note, with Log Level set to Verbose, I'm trying to add 5 characteristics, each with their own unique UUID. Each characteristic is initialized to an 8-byte NULL sequence with:
BLECharacteristic *characteristic = service->createCharacteristic(
UUID, perms
);
char value[8] = {0};
characteristic->setValue((uint8_t*)&value, 8);
Interestingly, only 4 characteristics show up at any given time. I can identify which characteristic is missing using nRF Connect and disable one of the ones that is currently working, and then the missing one will appear. I haven't been able to identify a pattern regarding how the missing characteristic is selected to be missing.
I think this is probably related to the timing issue noted above, I don't _think_ this is a new issue. So, for now, I don't want to create a new issue. When the current issue gets resolved, I'll have to see whether this other issue is resolved as well.
Perhaps even more interestingly, I just tested with Log Level set to No Output, and I still get 4 out of the 5 characteristics... hmm.
How did you set log level to no Output? At compilation time or at run time?
at compilation time in "make menuconfig"
When you run your tests, do you see the following error?
https://github.com/espressif/esp-idf/issues/1077
Can you also confirm that you are running the latest ESP-IDF version (master from Github?). I want to verify that we are running the same development base.
For our records, here is a pastebin of a trace with logging enable that works ...
My purpose in capturing this is to replace the logs, one at a time, with delays to see what I can find.
Some important progress ... I resorted to a horrible test scenario ... one I was hoping to avoid. We know that the apps work with Logging enabled and don't work with logging compiled out. So with logging enabled, I went through each log output in turn generated by the app as shown in the output and commented it out. I then re-built the app and re-tested.
I must have gone through about 20 iterations before I found a big-ole-clue.
There is a function called:
BLEService::executeCreate()
Its purpose is to create a GATT server advertised service. At the beginning of the function there is a log statement. When I commented it out, we fail (i.e. no characteristics shown) but if we leave it present, the characteristics remain. I demonstrated this to myself a few times.
Now what remains is to focus attention there and see what the story is. I need to look at the immediate preceding calls and subsequent calls. However, that will be tomorrow night's task as it is end of day here.
Updated: 2017-10-05:
After the work last night, I thought the resolution for this was going to be easy. Here is the start of the function called "executeCreate" in BLEService:
void BLEService::executeCreate(BLEServer *pServer) {
ESP_LOGD(LOG_TAG, ">> executeCreate() - Creating service (esp_ble_gatts_create_service) service uuid: %s", getUUID().toString().c_str());
m_pServer = pServer;
...
Pretty standard stuff. I found that if I removed the debug statement, no service would be found while if I left the debug statement in, the service would be found. Ok ... sounds like a race condition. So I commented out the debug statement and replaced with a sleep for 1000msecs ... just as a test. When I ran it, we delayed for a second as expected ... but there was no service found. I put the debug statement back in, tested again, and the service was found.
This made me think that what we were hunting here may not be a race condition.
My next test was to leave the debug statement in but remove the parameters to it. I recompiled and re-ran, and again no service was found. Aha!!! This would seem to say that something in the evaluation of the parameters is causing us problems. Ok ... so now let us focus on the parameters. It is a string print of:
getUUID().toString().c_str()
With the debug commented out but the above as a source line, I recompiled again. This time ... with no debug but this line ... the service was present.
I changed the statement to:
getUUID().toString();
retested and still the service was present.
Finally I ended up with just:
getUUID();
and still the service was present. I removed that line ... and no service!!!
OOooh!! Ok ... so what have we here?
the getUUID() returns an instance of BLEUUID.... and that's it.
Its late and I'm done again for this evening ... but I'm not understanding what is going on yet. What ever is happening is not a race condition.
Offtopic, sorry Neil, but it is funny. I thought it may be
getUUID().toString().c_str()
but i didnt expected it can be only getUUID().
I'm truly perplexed on this one. Its causing me a lot of head scratching. I've never seen a bug quite like this one. Obviously, it isn't local environment specific (i.e. my system isn't the only one that is broke) because it is being reported by mr @coder543 . Because it didn't manifest with debugging enabled, it was pure brute force to get to this point where we can enable/disable the bug with a one line change (and a lot of luck). I'm not anywhere close to being satisfied with what would appear to be a workaround (i.e. coding a do-nothing call to getUUID()). Friday is a working day for me on my job but Sat/Sun are coming soon and I'll be able to devote contiguous hours to the study. However I have no "science" to yet apply ... I feel like an alchemist who will try and produce gold by mixing different chemicals together "just to see what will happen". My hope is that as I poke it more with a stick, something will show up ... but I'm not confident. I'll be looking for colleagues like yourself who would be willing to invest a little time to hop on Skype with me just so you can watch for 10 minutes while I recreate the problem and show you the symptoms in the hope that:
1) You can see something I'm missing?
2) You can think up new explanations and tests?
3) You can recreate on your system and we can parallelize tests and theories?
Im short of time this days, a lot work at my job, but if you have test code i have spare couple hours and i can do some tests.
Next session I get on this puzzle I'll put together a "goody package". Simple instructions, a sample app and a description of how to reproduce. I'll then post a link to that in this issue. The only think I won't supply is the other end of the BLE story ... I'll assume that a tester has access to an Android phone running "nRF Connect". But that of itself might be a good test ... for example using a different peer tool to see if that plays a part ... for example "bluetoothctl" or "gatttool" on Linux.
I will try with SampleServer from BLEtests with custom service/characteristic UUIDs. The rest i need is described here. (I think)
Its problem with adding 5th and next characteristics to service:
D (1467) BLEDevice: gattServerEventHandler [esp_gatt_if: 3] ... ESP_GATTS_ADD_CHAR_EVT
D (1475) BLEUtils: GATT ServerEvent: ESP_GATTS_ADD_CHAR_EVT
D (1481) BLEUtils: [status: ESP_GATT_ERROR, attr_handle: 0 0x00, service_handle: 40 0x28, char_uuid: 00005234-0000-1000-8000-00805f9b34fb]
D (1493) BLEServer: >> handleGATTServerEvent: ESP_GATTS_ADD_CHAR_EVT
D (1500) BLECharacteristic: >> setHandle: handle=0x00, characteristic uuid=00005234-0000-1000-8000-00805f9b34fb
D (1510) BLECharacteristic: << setHandle
D (1514) BLEServer: << handleGATTServerEvent
D (1518) BLECharacteristic: << executeCreate
D (1522) BLECharacteristic: >> executeCreate()
D (1527) BLECharacteristic: Registering characteristic (esp_ble_gatts_add_char): uuid: 00006234-0000-1000-8000-00805f9b34fb, service: UUID: 4fafc201-1fb5-459e-8fcc-c5c9c331914b, handle: 0x28
D (1544) FreeRTOS: Semaphore taking: name: CreateEvt (0x3ffe3ca8), owner:for executeCreate
D (1553) FreeRTOS: Semaphore taken: name: CreateEvt (0x3ffe3ca8), owner: executeCreate
E (1561) BT: btc_gatts_arg_deep_copy 8 no mem
D (1566) BLEDevice: gattServerEventHandler [esp_gatt_if: 3] ... ESP_GATTS_ADD_CHAR_EVT
D (1574) BLEUtils: GATT ServerEvent: ESP_GATTS_ADD_CHAR_EVT
D (1580) BLEUtils: [status: ESP_GATT_ERROR, attr_handle: 0 0x00, service_handle: 40 0x28, char_uuid: 00006234-0000-1000-8000-00805f9b34fb]
D (1592) BLEServer: >> handleGATTServerEvent: ESP_GATTS_ADD_CHAR_EVT
D (1599) BLECharacteristic: >> setHandle: handle=0x00, characteristic uuid=00006234-0000-1000-8000-00805f9b34fb
D (1609) BLECharacteristic: << setHandle
D (1613) BLEServer: << handleGATTServerEvent
D (1618) BLECharacteristic: << executeCreate
D (1622) FreeRTOS: Semaphore taking: name: StartEvt (0x3ffd7fe8), owner:for start
D (1630) FreeRTOS: Semaphore taken: name: StartEvt (0x3ffd7fe8), owner: start
D (1638) BLEDevice: gattServerEventHandler [esp_gatt_if: 3] ... ESP_GATTS_START_EVT
D (1645) BLEUtils: GATT ServerEvent: ESP_GATTS_START_EVT
D (1650) BLEUtils: [status: ESP_GATT_OK, service_handle: 0x28]
I dont know if something been changed in BLE stack, but i can see characteristics (still max 4) with log level ERROR setup in menuconfig.
I just checked the logs over here, and I'm seeing the same error message @chegewara is seeing on my 5th characteristic. Somehow I didn't notice this before.
So, I guess my issue of having more than 4 characteristics is a separate bug from the one this issue is tracking. I'll go ahead and open another ticket for it.
Yes indeed ... please do separate the concept of > 4 characteristics causing an issue from the notion of having to have debug logging enabled in order to see any services/characteristics.
ATM i cant recreate issue with log level causing not showing characteristics. I can see all 4 characteristics with log level VERBOSE and ERROR.
@chegewara have you tried using Sample1.cpp with different log levels? I can consistently reproduce it over here.
I still owe you a full written story to recreate plus sample artifacts. That is number one on my task list when my day job ends in about 4 hours and my weekend starts. Because this story is such a mysterious puzzle, we will need to spell out all the environmental components. In summary, and before I have written up the full story, the salient points are:
gcc version 5.2.0 (crosstool-NG crosstool-ng-1.22.0-61-gab8375a)
Output from:
$ xtensa-esp32-elf-gcc -v
Ok then, we are in different environment.
SampleServer just a little bit modified is my test app. I will try with Sample1.
Got it last night, thanks.
Ive found something, maybe this can help you. Still dont work like expected but:
I dont know if this is related somehow but with 2 and more services ping is over 1200 ms.
Got back to studying this tonight ... found that the following doesn't work:
void BLEService::executeCreate(BLEServer *pServer) {
m_pServer = pServer;
m_semaphoreCreateEvt.take("executeCreate");
But the following does:
void BLEService::executeCreate(BLEServer *pServer) {
char x[1000];
memcpy(x, &m_uuid, sizeof(m_uuid));
m_pServer = pServer;
m_semaphoreCreateEvt.take("executeCreate");
The pattern I seem to be seeing is that "accessing" the m_uuid data is causing "something" to happen.... but what!!!
Digging further, one more clue ... when changing the "make menuconfig" option: Compiler options -> Optimization level from "-Og" to "-Os" the problem seems to go away completely. There was nothing "thoughtful" in my discovery of this ... purely a guess.
The underlying SDKCONFIG flag appears to be CONFIG_OPTIMIZATION_LEVEL_RELEASE=y
At cpp_utils/BLEService.cpp line 75, you didn't set srvc_id.is_primary. So the services you created may be Second Service and will not show.
In release mode, srvc_id.is_primary will be initialed to a rand value, mostly not 0, so it works well. But in debug mode, it will be initialed to 0, causing the service not showing.
Please add this line and test again, thanks!
srvc_id.is_primary = true;
Looks like its it. Many thanks.