Nodemcu-firmware: onewire used to be more solid

Created on 30 May 2017  ·  12 Comments  ·  Source: nodemcu/nodemcu-firmware

background information

I have a product (a thermometer) with 1-wire network of DS18B20 sensors, based on NodeMCU firmware that was originally built on 2016-05-28 from the dev branch. It worked well for several customers.

In February I updated the firmware in my thermometer to a version built on 2017-02-02 from the master branch. Suddenly the 1-wire communication in one customer's setup started to misbehave. The higher the temperature, the more communication errors appeared. Somewhere above 80 ℃ the communication broke down completely: onewire search didn't find any connected sensors.

I admit he's got a large network of sensors (10+) with complicated wiring but his rather logical argument was that "it worked before you flashed new firmware".

commit

It took me a lot of time and effort (shipping devices back and forth with the customer, with various changed Lua files first and then NodeMCU firmware versions) to trace it down to commit 04b86b80f6abac3d6e0bcc6ee99c4795407dd07e, which changes the onewire.c implementation in two crucial respects: how the bus is driven high, and timing.

Unfortunately I don't know which of these two changes breaks the network in the customer's case, but I'd bet it's the driving of the bus high and not the timing (judging from the temperature-dependent misbehavior).

diff

The two onewire.c changes are (if I read the git diff correctly) as follows:

Originally (=before 2016-06-28) writing '1' to the 1-wire bus was implemented by DIRECT_WRITE_HIGH, which had the CPU actively pull the pin high. Nowadays (=since 2016-06-28) the CPU just leaves the pin floating (using DIRECT_MODE_INPUT). I believe this is too weak even though I use a pull-up resistor of 1800 ohms. I think the original strong pull-up by the CPU made the 1-wire communication much more solid and stable in the case of large networks with complicated wiring.
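The difference between the two ways of finishing a write-1 slot can be modelled without hardware. The following is a minimal sketch, assuming a simulated pin; `sim_pin_t`, `bus_state`, `slot_start`, `release_strong` and `release_weak` are hypothetical names used for illustration and are not the macros from onewire.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one GPIO pin; not the real ESP8266 register layout. */
typedef struct {
    bool output_enabled; /* true: the MCU drives the line; false: the pin floats (input) */
    bool level;          /* level the MCU drives while the output is enabled */
} sim_pin_t;

typedef enum { BUS_LOW, BUS_HIGH_WEAK, BUS_HIGH_STRONG } bus_state_t;

/* What the bus sees from the master, assuming no slave is pulling it low. */
static bus_state_t bus_state(const sim_pin_t *p) {
    assert(p != NULL);
    if (!p->output_enabled)
        return BUS_HIGH_WEAK;            /* only the pull-up resistor holds the line high */
    return p->level ? BUS_HIGH_STRONG : BUS_LOW;
}

/* Every write slot starts with the master pulling the bus low. */
static void slot_start(sim_pin_t *p) {
    assert(p != NULL);
    p->output_enabled = true;
    p->level = false;
}

/* Older behaviour as described above: the CPU drives the pin high itself. */
static void release_strong(sim_pin_t *p) {
    assert(p != NULL);
    p->output_enabled = true;
    p->level = true;
}

/* Newer behaviour as described above: the pin becomes an input and floats. */
static void release_weak(sim_pin_t *p) {
    assert(p != NULL);
    p->output_enabled = false;
}
```

In this model the only difference is whether the rising edge is produced by the CPU or by the pull-up resistor charging the cable capacitance, which is where a large network would be expected to suffer.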

not sure

As I mentioned above, the timing of 1-wire reading/writing has been changed as well, by several microseconds. I don't think this is the source of the problem mentioned above but I am not 100% sure.
The customer is not excited to test any more firmware updates so I cannot rule the timing change out completely but I believe it's OK.

Naturally the only way to be sure is to bring an oscilloscope and analyze the bus errors in real time, but that's not easy to arrange, so I thought I'd just document here that onewire is less stable than it used to be.

proposal

My proposal is to return the strong CPU pull-up to onewire_write_bit, at least optionally (similarly to the power flag).

All 12 comments

Thanks for the heads-up!

Some items are unclear to me:

  • Is your observation related to sensors in parasite power mode or are they all powered from an external supply at the VDD pin?
  • Are only the sensors exposed to high temperatures ("above 80 °C") or also the ESP chip?
    I'd guess that the temperature correlation is caused by increased leakage currents - excluding the ESP would narrow-down the potential spots.

In summary I agree with your proposal that driving strong high levels (where possible) should increase overall stability.

For the record: The root cause of the issue fixed with https://github.com/nodemcu/nodemcu-firmware/commit/04b86b80f6abac3d6e0bcc6ee99c4795407dd07e is the implicit output enabling performed by GPIO_OUTPUT_SET() (which is used to implement DIRECT_WRITE_*). This contradicts the intended logic of the original onewire library code: output enabling/disabling is exclusively controlled by DIRECT_MODE_* macros, not by DIRECT_WRITE_*.

All sensors are powered from an external supply 3.3 V.

The ESP chip is not exposed to high temperatures (ESP stays in cellar while DS18B20 are on the roof in solar system and at many other places measuring mostly water pipes temperature).

EDIT: I have just realized that if he had an error in his long and complicated wiring (or inside one of the DS18B20 waterproof sensors - a cold junction?) that would cause issue when delivering the external supply to some sensors while they are exposed to a higher temperature (say that something moves by a tiny bit by temperature expansion) then it perhaps could cause similar effects? Just thinking out loud. It's hard to debug the wiring, I am afraid.

Anyway, my proposal stays: 1-Wire is not a short bus like say I2C that could rely on a pull-up resistor. The CPU should drive the 1-Wire pin HIGH actively.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

FYI, I've got several more reports of OneWire issues that started after upgrading from old NodeMCU firmware to a new one. Some of them are really critical: the network either does not work at all, or sensors start disappearing after a few hours and the only fix is to TURN POWER OFF, wait a bit and then turn it back on. No idea what's going on there electrically; it's too far from my place so I cannot inspect it using a scope. The only thing I know is that the network of sensors worked perfectly until the NodeMCU firmware got updated.

It would be great if there was an option in the OneWire driver of NodeMCU to have the CPU drive the pin HIGH, rather than relying on the pull-up resistor alone (it would be off by default so people with wrong bus wiring don't blow up their ESP8266, but desperate users could enable it and save themselves).

BTW, one would think that making the pull-up resistor small enough would help. Unfortunately, while it can fix the logical High, it creates issues with the logical Low: when the sensors have to pull the bus down, they sink a lot of current through the small pull-up, which makes them hot (so e.g. temperature readings are way off). So it is much better to keep the pull-up resistor's value sane and just use the MCU to drive the High bits strongly.

AFAICT, there is already some support for the NodeMCU driving the bus high; see app/modules/ow.c's ow_write's power parameter, which is passed to app/driver/onewire.c's onewire_write_bit via onewire_write. Is that not sufficient?

onewire_write_bit() is implemented properly (i.e. if the power flag is set then it uses active driving of the bus).
All functions that call onewire_write() do the right thing - pass the owDefaultPower into the called function.
But onewire_write() ignores the input parameter power for all bits but the last one: for those it calls onewire_write_bit() with power = 0 and thus disables the active driving of the bus.

This is actually correct because it serves as a power source for parasitically powered devices that sink the energy after the data transfer from master finished. But it could be extended easily to restore the optional active driving of the bus like it used to be before 2016/06/28.

My proposal is to:
1) fix the onewire_write() to pass the owDefaultPower for all bits but the last one to the onewire_write_bit()
2) make the owDefaultPower a variable (currently it's a macro constant) that could be set by a call similar to depower() (perhaps empower ? just kidding :-))
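
The two points can be sketched as follows. This is only an illustration under stated assumptions, not the code from app/driver/onewire.c: `mock_write_bit`, `write_byte_current`, `write_byte_proposed`, `ow_default_power`, `ow_set_default_power` and `power_log` are hypothetical names, and the mock records nothing but the power flag it receives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only. */
static bool    ow_default_power = false; /* point 2: a variable instead of a macro constant */
static uint8_t power_log;                /* bit i set => bit i was written with power = 1 */

/* Mock of the bit-level write: records only the power flag it was given. */
static void mock_write_bit(uint8_t bit_index, uint8_t value, bool power) {
    assert(bit_index < 8);
    (void)value;
    if (power)
        power_log |= (uint8_t)(1u << bit_index);
}

/* Behaviour described in this thread: `power` reaches only the last bit. */
static void write_byte_current(uint8_t v, bool power) {
    power_log = 0;
    for (uint8_t i = 0; i < 8; i++)      /* 1-Wire sends the least significant bit first */
        mock_write_bit(i, (v >> i) & 1u, (i == 7) ? power : false);
}

/* Point 1: earlier bits use the default; the last bit still honours `power`. */
static void write_byte_proposed(uint8_t v, bool power) {
    power_log = 0;
    for (uint8_t i = 0; i < 8; i++)
        mock_write_bit(i, (v >> i) & 1u, (i == 7) ? power : ow_default_power);
}

/* Point 2: hypothetical setter, the counterpart of a depower()-style call. */
static void ow_set_default_power(bool on) {
    ow_default_power = on;
}
```

With the default left at false the proposed loop behaves exactly like the current one, so existing users would see no change unless they opt in.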

I can prepare a patch for the onewire.c and perhaps also for the ow.c if you give me some guidance on what the API extension for setting/getting the owDefaultPower state (it's a boolean basically) should look like.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

I apologize if my not offering API guidance has caused this to fall on the floor. I don't believe we've changed ow meaningfully in the interim, and it sounds like this is still a problem?

What do you have hanging off your 1W bus that's causing problems? My understanding was that most (high-current) devices had an optional power pin; could that be used rather than sourcing current from an ESP8266 GPIO pin? If there's truly a need to drive the 1W bus high, tho', let's just do it.

Please note this has never been about devices missing power. This was always about the bus not working correctly because without the strong pull up by the MCU (ESP8266 in this case) the signals on large and split networks with many devices got distorted so much that the communication was falling apart.

In the meantime I learnt how to live even with the weak bus: I just had to use a pull-up resistor of much lower value; currently I use 1000 ohms (1k). This in turn affects all measurements, because when a slave device (say a DS18B20) needs to pull the bus down (to signal a zero bit) it has to sink the current, 3.3 mA in this case, and that causes it to warm up. That's why it's not a good solution; a strong pull-up by the MCU/ESP would be better.
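
The 3.3 mA figure is just Ohm's law applied to the pull-up resistor. A minimal sketch of that arithmetic, assuming the slave pulls the line all the way to 0 V (a simplification); `sink_current_ma` is a hypothetical helper name:

```c
#include <assert.h>

/* Worst-case current a slave must sink while it holds the bus low,
 * assuming the line is pulled all the way down to 0 V. */
static double sink_current_ma(double supply_volts, double pullup_ohms) {
    assert(pullup_ohms > 0.0);
    return supply_volts / pullup_ohms * 1000.0; /* amperes to milliamperes */
}
```

Calling `sink_current_ma(3.3, 1000.0)` gives 3.3 mA, while the 1800 ohm resistor mentioned at the top of this issue would give roughly 1.8 mA.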

On the other hand, a strong MCU/ESP pull-up is dangerous when there's a short to ground on the bus: that can kill the ESP instantly. So I'd leave the current implementation as-is and let this issue be closed by the stale bot.

I got curious, so I did some reading. Forgive me if this is all old hat to you. https://dutta.csc.ncsu.edu/csc453_spring16/wrap/1-Wire-Design%20Guide%20v1.0.pdf strongly advises against using anything but a linear topology without DS2409 hubs (but the DS2409 is no more; https://www.maximintegrated.com/en/design/technical-documents/app-notes/4/4930.html?sisint suggests replacement designs). Moreover, it suggests that pull-ups are typically between 1 kΩ and 4.7 kΩ.

But at the end of the day it comes down to achieving sufficiently high highs with good slew rates, and so is a matter of analog magic that's somewhat beyond me... and so, in my ignorance, I wonder if it would be possible to safely drive even a shorted 1W bus if we had an appropriately sized resistor between the ESP8266 GPIO and the bus itself?

For hairy 1-Wire buses like you've got, should we be suggesting that people don't drive it directly off the ESP but use specialized drivers like the DS2482 (https://datasheets.maximintegrated.com/en/ds/DS2482-100.pdf) which have dedicated circuitry for actively pulling up the bus?

I'm afraid that a series resistor together with the capacitance of the bus itself forms an RC filter that could make things even worse. Just my feeling, haven't spent any time with an oscilloscope on it. But it would certainly make the strong pull-up by the MCU safe against shorts to ground. The resistor would have to have a value of at least 270 ohms, ideally not more, so let's say exactly 270 ohms for ESP8266.
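
Both sides of this trade-off can be put into numbers. The sketch below is an illustration only: the helper names are hypothetical, and the 12 mA per-pin limit and the bus capacitance used in the note afterwards are assumptions that do not come from this thread.

```c
#include <assert.h>

/* Current out of the GPIO if the bus is shorted to ground while the pin drives high. */
static double short_circuit_ma(double supply_volts, double series_ohms) {
    assert(series_ohms > 0.0);
    return supply_volts / series_ohms * 1000.0;
}

/* Smallest series resistor that keeps a shorted bus within the pin's current limit. */
static double min_series_ohms(double supply_volts, double pin_limit_ma) {
    assert(pin_limit_ma > 0.0);
    return supply_volts / (pin_limit_ma / 1000.0);
}

/* RC time constant in microseconds: ohms times nanofarads gives nanoseconds. */
static double rc_time_constant_us(double series_ohms, double bus_capacitance_nf) {
    assert(series_ohms >= 0.0 && bus_capacitance_nf >= 0.0);
    return series_ohms * bus_capacitance_nf / 1000.0;
}
```

At 3.3 V, 270 ohms limits a dead short to about 12.2 mA, and an assumed 12 mA pin limit works out to 275 ohms. With a hypothetical 10 nF of cable capacitance the same resistor gives a time constant of 2.7 µs, so whether the RC filter matters depends entirely on how much capacitance the wiring really has.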

Suggesting using specialized/HW drivers is always a good idea but I am happier with the software implementation of the OneWire protocol because it allows me to add even my own devices (I have developed a number of OW devices for measuring all sorts of things, not just temperature). As for the dedicated circuitry for active pulling - the exact same thing could be implemented in NodeMCU as well - the ow driver would just take an additional parameter for the pin number where users could attach the external MOSFET for driving the bus.
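
A rough sketch of what such an extra parameter might look like, purely as a simulation. `ow_bus_t`, `ow_bus_init`, `ow_release_bus`, `ow_pull_low` and `OW_NO_PULLUP_PIN` are hypothetical names, nothing here exists in the NodeMCU ow module, and the MOSFET is represented only by a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OW_NO_PULLUP_PIN (-1) /* hypothetical sentinel: no external driver fitted */

typedef struct {
    int  data_pin;      /* data line; the MCU itself never drives it high */
    int  pullup_pin;    /* hypothetical extra parameter: controls the external MOSFET */
    bool pullup_active; /* simulated MOSFET state (true = bus tied hard to the supply) */
} ow_bus_t;

static void ow_bus_init(ow_bus_t *bus, int data_pin, int pullup_pin) {
    assert(bus != NULL);
    bus->data_pin = data_pin;
    bus->pullup_pin = pullup_pin;
    bus->pullup_active = false;
}

/* Master releases the bus. The strong pull-up is engaged only when a driver pin
 * is configured and no slave may answer in this slot; otherwise the MOSFET
 * would fight a slave that is pulling the line low. Returns the new state. */
static bool ow_release_bus(ow_bus_t *bus, bool slave_may_respond) {
    assert(bus != NULL);
    bus->pullup_active = (bus->pullup_pin != OW_NO_PULLUP_PIN) && !slave_may_respond;
    return bus->pullup_active;
}

/* The MOSFET must be switched off before the master pulls the bus low. */
static void ow_pull_low(ow_bus_t *bus) {
    assert(bus != NULL);
    bus->pullup_active = false;
}
```

Passing `OW_NO_PULLUP_PIN` keeps today's behaviour, so only users who actually wire up an external driver would be affected.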

This could be the best "fix" for this issue: instead of returning to error-prone direct bus driving, the NodeMCU could offer optional support for an external MOSFET driver for advanced users with hairy buses :-)

Patches definitely considered! :)
