I will move a series of posts by @pnkowl from #3078 (which is discussion about the error handling architecture in the Lua runtime) here. These posts relate to documentation issues.
@pnkowl, thanks for your contributions, but can you please follow our guidelines, and avoid hijacking an issue or PR with off-topic discussions. New topic = new issue. Thank-you. We would normally delete such offf-topic posts, but in this case since this is a lot of content, I've moved them here.
3 days ago @pnkowl wrote:
node.startupcommand() added
several spelling fixes
Given the above discussion of error handling, would it not be appropriate to add additional clarity to the module reference documentation? When I read this,
Redirects the Lua interpreter output to a callback function. Optionally also prints it to the serial
console.should I not believe what is says? There is no fine print, no mention or reference is made to what really happens.
I did search for "telnet" in issues.
I believe I had already "_read_" this post (#3078), and it did not seem to apply -- that is to say, how could a module like telnet be offered without the caveat stated boldly that, in too many cases, errors will not be show?
How could node.output() be offered without some referece to this issue?
Now I certainly understand that fixing an issue is way more satisfying than documenting where things fall short of our expectations, but I believe that the documentation should accurately reflect real life behavior, and not what we wish or for what we have plans. How to avoid the tl;dr? Perhaps have the module/method documentation page contain "td;lr" link(s) to separate pages which contain the caveats?
BTW, I do not feel that pointing telnet and node.output() to this page satisfies the need. Telnet, for example, might benefit, by having a description of each class of errors and whether they are printed or not, and an example of each class. Alternatively, a companion pcall wrapper (executes the pcall, then manages the error handling/printing) could be referenced. I could take a crack at all of these in isolation, but the current style of docs does not cater to this kind of thing... so more experienced folks should guide...
This all being said, the difference between having telnet in its present form versus not having it at all, is day and night.
@TerryE wrote in reply
@pnkowl, time to do an ow-oooom or whatever. If you search the Issue and PR history then you will find 9 open and 57 closed threads which raise and discuss various aspects of this. Sorting out this sort of architectural feature is hard. Doing excellent documentation is hard. We (and this means mostly me in this case) have been sorting these step by step, and that involves at times up to maybe 8 hrs development effort a day unpaid and pro bono.
An open issue is where we collaborate and get feedback when identifying and fixing issues. The PR is where we post the change, and the "The code changes are reflected in the documentation at
docs/*" checkbox is there for a reason.In this case, the early implementations of output redirection were a bit of a botch. The runtime core handled redirection of
stdoutbutstderrwas just sent straight to the UART. Remember that using the net interface relies on heavy use of callbacks, so a lot of other features had to be put in place so thatstderrcontent could also be reliably sent to the redirector.This is community project, so all I can suggest here is that once the PR has been integrated into
dev, then review the documentation and if you feel that you could improve this then raise the issue / PR and do this yourself.
2 days ago @pnkowl wrote:
@pnkowl, time to do an ow-oooom or whatever. If you search the Issue and PR history then you will find 9 open and 57 closed threads which raise and discuss various aspects of this. Sorting out this sort of architectural feature is hard. Doing excellent documentation is hard. We (and this means mostly me in this case) have been sorting these step by step, and that involves at times up to maybe 8 hrs development effort a day unpaid and pro bono.
An open issue is where we collaborate and get feedback when identifying and fixing issues. The PR is where we post the change, and the "The code changes are reflected in the documentation at
docs/*" checkbox is there for a reason.In this case, the early implementations of output redirection were a bit of a botch. The runtime core handled redirection of
stdoutbutstderrwas just sent straight to the UART. Remember that using the net interface relies on heavy use of callbacks, so a lot of other features had to be put in place so thatstderrcontent could also be reliably sent to the redirector.This is community project, so all I can suggest here is that once the PR has been integrated into
dev, then review the documentation and if you feel that you could improve this then raise the issue / PR and do this yourself.
then review the documentation and if you feel that you could improve this then raise the issue / PR and do this yourself.
Summary
Whether it be dedicated volunteers or folks getting paid, rarely are things as we would like. I am okay with "a bit of a botch" as long as the core reference docs reflect the actual state of the system.
in the dev branch docs, the node.output() "caution" not to print within the callback, is gone.
in the master branch docs, the caution is present
should it be expected that print is allowed in the callback now? Based on what I see above, that seems unlikely (but what do I know)
when doing some pcall investigation, I had the urge to add a print within the callback, but being forwarned, I did not (thanks)
why take time to remove a caveat? Why remove a helpful thought that may lead to a useful connection?
if I were to be making PRs that have meaning to me, they would be to add more clarity, more hyperlinks, more examples for those that come next and have no pretentions of becoming experts.
before investing much time, it would be comforting to know that at least my sensibilities are aligned with the overall docs and development strategy. I do not want to be in conflict over this. At present, I do not get warm fuzzies that this type of PR would be kindly received.
Details follow (tl;dr)
So this is what I have proactively done as a result of the offered suggestion.
I decided that the PR should contain
an additional "caution" regarding stderr in node.output of the docs and perhaps a helper function utilizing pcall that would surface the errors with the least extra typing and complication.
a hyperlink to #3078 as the connection to more information
I then headed out to the docs (and started in master), did the fork, found the location to anchor the changes then realized dev is the place for PRs
So off to dev branch docs
Guess what?
- the "caution" block is gone
- the telnet example is gone
- reference to the telnet.lua exists with no hyperlink
With my limited skills I wade through History, Blame and Raw and can find no explanation as to why the "caution" block is gone (but my skills are limited).
This is not the only occurance of this type of removal in the node module.
Perhaps a print in the callback is allowed now. So
- I try "print in the callback" in master and it panics. So far, so good.
- I setup to try dev branch and telnet is broken (without LFS too), so I update issue #3108
But the basic question still remains. How do new folks (or any folks for that matter) attach the tips, tricks, wisdom, and clarity where it is most likley to create meaningful connections from problems to solutions, especially for those who are, in fact, not experts, when it appears that at least some of these very connections are being systematically removed?
@marcelstoer replied
You can never have too much documentation. So, any correct and meaningful additions are welcome.
it appears that at least some of these very connections are being systematically removed?
Thanks for bringing it to our attention but what makes you think they are _systematically_ removed? How about (documentation) bugs that just went unnoticed so far?
Git blame on
node.md(fornode.output()) leads straight to Terry's "Lua 5.1 to 5.3 realignement" PR #2836. It changed over a thousand lines of code and documentation.14c1b8f#diff-45fdccb9fe7b1f8ae504702b0a034d6e is the specific diff for the dropped admonition block. Whether the removal was intentional or not only Terry can explain.
@TerryE replied:
In previous versions of NodeMCU,
node.output()was calling was calling the Lua core recursively and there are internal health warnings not to do this, so it was a miracle that it worked at all. Now output spools to a pipe which is emptied on a subsequent task which is posted to empty the pipe. Actually printing inside a output CB does work, though this can easily cause a recursion loop and this will exhaust memory, so yes we probably should have kept the warning. Our challenge is that we have a team of 1 doing all of this work.
BTW I was referring to Tibetan OM chanting -- a chill-out joke.
2 days ago @pnkowl wrote:
@ marcelstoer,
what makes you think they are systematically removed?
Okay, so maybe those could be interpreted as "fighting words". Let me try again.
I see the core documentation as having 3 basic functions (1) to tell it like it is (2) to bridge from the past (3) to foster the interconnection of all the disparate pieces, all using the best technology available.
It seemed to me, that the dev branch documentation had greater brevity than master. This is not a bad thing in itself, but it seemed to me that important connections had been lost. It also seems that there were places where notes and cautions disappeared. Notes and cautions are the evidence that blood has been spilled and IMO should not be discarded without cause or more blood will be spilled in the future.
So to be objective about this, let me take a brief walk down through the node module
https://nodemcu.readthedocs.io/en/master/modules/node/#nodeoutput
https://nodemcu.readthedocs.io/en/dev/modules/node/#nodeoutput
Differences
node.input(). removed: Attention, This function only has an effect when invoked from a callback. Using it directly on the console does not work.
node.output() removed this Caution. Do not attempt to print() or otherwise induce the Lua interpreter to produce output from within the callback function. Doing so results in infinite recursion, and leads to a watchdog-triggered restart.
node.output() materially changed in a non backward compatible fashion
telnet example removed and a non linked reference added.
node.startupcommand() added
several spelling fixes
Please understand that I can be very directed in my activities. Although it seems obvious now that node.output is significantly changed, it was not obvious hours ago. It really should be made obvious.
Additionally, it would be helpful to have a set of actions required by end users when the new master appears, and for those attempting dev now. It is not the same old thing but will, if all is done right, possibly break your application. I have not looked very hard, but does this list exist?
Verdict: "systematic" now seems over the top given the evidence presented.
If requested, I could review all the docs, however, copy/paste/diff is likely not the most efficient way to accomplish this. That being said, this seemed to yield a much smaller set than I expected
It changed over a thousand lines of code and documentation.
Blame seems to have many more differences (as seen by the untrained), but that may be the nature of the beast (not side by side, but more a classic unix diff)
plus an addendum:
@TerryE
Our challenge is that we have a team of 1 doing all of this work.
So how do we change that?
Most of us have neither the skill, breadth of knowledge, nor the temperament to do the high value tasks. That being said, I believe there must be a path to a system state where a non trivial number of time consuming and necessary tasks can be carved out that mortals can do, without taking more of your time, hearding all the new volunteer cats trying to help, than is saved,
Where does a volunteer start? For example, I believe I know how to author a PR to put the cautions back and find the new telnet sample and make the link.
Going through this content and distilling it to the essence:
lua_examples/telnet have got out of sync and telnet.lua is called telnet_pipe.lua. This breaks the documentation links to lua_examples/telnet/telnet.lua and this needs fixing.telnet.lua _is_ documented as per our conventions in the lua_examples/telnet\README.md. Maybe we should move this to lua_modules.telnet.lua and the node.ouput() was changed to use the new Pipe interface, as was documented in the 2.2.1-master_20190405 release notes. Maybe we should add a "compatibility break" caveat to the node documentation.telnet.lua everything here is consistent. I have no control over use of obsolete versions outside this project repository. node.input() caution was removed is that the limitation was removed. Try node.input('return "This works"\n') and it works fine. (Though there is an issue that the line must be CR terminated.)node.output() was removed because this is no longer handled as a C stack iteration, but rather by adding to the stdout pipe, though yes this can out-of-memory so maybe we should keep it.So AFAIK, this thread points to a couple of minor documentation changes and one example file rename.
See closure of #3018
telnet.lua and the node.ouput() was changed to use the new Pipe interface, as was documented in the 2.2.1-master_20190405 release notes. Maybe we should add a "compatibility break" caveat to the node documentation.
A github search for "2.2.1-master_20190405" and did not find anything. I then did a google search and found this link
https://github.com/nodemcu/nodemcu-firmware/tree/2.2.1-master_20190405
Is this the page you were referenching?
I can't find any references to the obsolete version of the telnet code and as far as I can see apart from the name mix-up for telnet.lua everything here is consistent
telnet_fifosock.lua still in
https://github.com/nodemcu/nodemcu-firmware/tree/dev/lua_examples/telnet
I would give all the particulars, but should the example be in dev branch at all? So no reason to investigate the E:M below
`
print(wifi.sta.getip())
192.168.1.17 255.255.255.0 192.168.1.1
pnksrv=dofile("telnet_fifosock.lua").open()
Telnet server started (40712 mem free, 192.168.1.17)
=pnksrv
nil
after close of putty
=node.heap()
E:M 24592
`
I ask @nwf to have a look at whether we should remove the telnet_fifosock.lua as this still assumes that the node.output() CB takes a string and not a pipe.
Yea, given the existence of pipe, telnet_fifosock.lua and possibly fifosock itself should be removed. fifo is more generically useful, tho'.
I agree that fifo is generally useful and the module is also a great implementation example, but the pipe implementation has moved a subset of this functionality into C and its runtime resource use is less -- if your needs fit within its limitations, which they do in the telnet case. Let use proceed on this basis.
May I bring to your attention that https://github.com/nodemcu/nodemcu-firmware/blob/master/lua_modules/http/httpserver.lua requires fifosock.
Ok, maybe we need to check whether it still works and raise a separate issue if necessary. I also note that the post handler doesn't use hold/unhold to large posts will overrun the net stack.
PS. Forget the last sentence TLS will shag is first. 🤣