Riot: RFC: Use RobotFramework to write RIOT tests

Created on 24 Oct 2018  ·  42 Comments  ·  Source: RIOT-OS/RIOT

overview

At the RIOT summit 2018 @MrKevinWeiss and I hosted a breakout session on hardware-in-the-loop (HIL) testing for RIOT. Apart from the hardware and tools Kevin developed, we also introduced a new way of writing tests for RIOT using the RobotFramework, and showed some examples: existing tests adapted to RF, but also a completely new test for the I2C peripheral bus.

However, there was no consensus on how to actually move forward with using the RobotFramework to write tests for RIOT. We agreed on opening the discussion to a wider audience by providing initial PRs as proof of concept. These PRs are #10095, the initial introduction of RF-based tests to RIOT, and #10147, which provides the I2C tests. The current implementation neither replaces any existing tests nor changes the test behaviour; it introduces a new make target, robot-test, to run RF-based tests, so both PRs are non-intrusive.

The purpose of this issue is to move the general discussion around using RobotFramework to write and run RIOT tests from the mentioned PRs to here and to get more people involved. It should also collect all the different arguments in one place, because some people commented on the PRs while others voiced opinions in emails, telcos and f2f meetings.

more details

As mentioned above, currently the integration of RobotFramework is non-intrusive and parallel to all testrunner scripts. Also RF-based tests are not run for every PR via Murdock, but we already hooked it into the nightlies - currently the test report only contains I2C tests.

But why consider using RobotFramework anyway? And what about testrunner?

RobotFramework is an active open source test framework that has been around for several years and is widely used by other projects. Under the hood it is based on Python and can be extended using Python or Java.

Tests are written in a RobotFramework-specific format and syntax - that is obviously different from our current way of writing tests with testrunner, which is basically Python+pexpect, right?
Certainly, if someone is used to (and loves) writing Python code, RF tests look and are (very) different. While plain Python allows for a larger degree of freedom to implement tests, it does not provide or enforce any structure to adhere to. In RobotFramework there is a limited number of central building blocks and concepts that are part of every test and are used repeatedly.

For instance, each test script in RF defines a test suite which contains one or more test cases, and each test case is a collection of keywords processed and evaluated sequentially. The test file itself typically contains several sections, like settings, variables, and most importantly test cases. Test suites and cases can have documentation and tags, the latter allowing for test selection during execution but also for evaluation. As an example, look at a simple test such as xtimer_hang.robot, which does the same as the corresponding Python script for testrunner.

In the aforementioned PRs I already started to create custom RIOT specific keywords (see riot_util.keywords.txt) that combine several existing keywords. Also there is a Python wrapper around the I2C HIL testing API which easily integrates with RobotFramework. This can and should be extended in the future to ease writing new tests and also keep them readable.
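
To illustrate how a Python wrapper can plug into RobotFramework: RF exposes every public method of a library class as a keyword, so `count_lines_matching` becomes `Count Lines Matching`, and a keyword fails when the method raises. The following is a minimal sketch under that assumption; the class `RiotUtil` and its methods are hypothetical and are not the keywords from riot_util.keywords.txt.

```python
import fnmatch


class RiotUtil:
    """Hypothetical keyword library; every public method becomes a keyword."""

    # Standard Robot Framework attribute: one instance per test suite.
    ROBOT_LIBRARY_SCOPE = "TEST SUITE"

    def count_lines_matching(self, output, pattern):
        """Return how many lines of `output` match the glob `pattern`."""
        return sum(
            1
            for line in output.splitlines()
            if fnmatch.fnmatchcase(line, pattern)
        )

    def lines_matching_should_be_at_least(self, output, pattern, minimum):
        """Raise (and thereby fail the test) unless enough lines match."""
        count = self.count_lines_matching(output, pattern)
        # Arguments written in a test script arrive as strings, hence int().
        if count < int(minimum):
            raise AssertionError(
                f"expected at least {minimum} matching lines, got {count}"
            )
        return count


# Plain-Python usage, independent of any test framework.
sample_output = "\n".join(
    ["[START]"] + [f"Testing ({i:3d}%)" for i in range(95)] + ["[SUCCESS]"]
)
matched = RiotUtil().lines_matching_should_be_at_least(
    sample_output, "*Testing (???%)", "90"
)
```

Because the class does not import anything from RobotFramework, the same code stays usable from a testrunner or pytest script.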

RobotFramework generates XML output for every test script, which can be stored and processed further. It provides extensive tooling around this output, such as generating HTML from one XML file, combining XML output from many tests into a single HTML report, or selecting test results (from multiple XMLs), e.g. by tags, to generate a smaller test report. Further, it can process the test output from a previous run and rerun all failed tests.
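
As a rough illustration of what "processed further" can mean, the sketch below lists failed tests, optionally filtered by tag, using only Python's standard library. The embedded XML is a heavily simplified stand-in written for this example; a real RF output file carries many more elements and attributes, and its exact layout differs between RF versions.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written stand-in for a Robot Framework output file.
SAMPLE = """\
<robot>
  <suite name="I2C">
    <test name="periph_base">
      <tags><tag>i2c</tag></tags>
      <status status="PASS"/>
    </test>
    <test name="write_register">
      <tags><tag>i2c</tag><tag>write</tag></tags>
      <status status="FAIL"/>
    </test>
  </suite>
</robot>
"""


def summarize(xml_text):
    """Return a list of (test name, status, sorted tags) tuples."""
    root = ET.fromstring(xml_text)
    results = []
    for test in root.iter("test"):
        # find() only looks at direct children: the test's own verdict.
        status = test.find("status").get("status")
        tags = sorted(tag.text for tag in test.iter("tag"))
        results.append((test.get("name"), status, tags))
    return results


def failed_tests(xml_text, tag=None):
    """Names of failed tests, optionally restricted to one tag."""
    return [
        name
        for name, status, tags in summarize(xml_text)
        if status == "FAIL" and (tag is None or tag in tags)
    ]


failures = failed_tests(SAMPLE)
```

For everyday use the bundled tooling (for example the `rebot` command) already covers report generation, so a script like this is only needed for custom post-processing.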

discussion

Unsorted, incomplete list of issues raised so far:

  • From what was discussed and commented so far via different channels, one major issue with RF-style tests is the format and syntax of writing actual tests (see details above). But then, is "I like Python code better" a good and valid argument in this discussion?

  • Another issue is that the output format of RobotFramework is not that special; there are other standards too, and such output could easily be added to testrunner. That may be true, but in RF it's already there, no work needed.

Comparison with PyTest (NEW)

Test | Robot Framework | Pytest
-----|-----------------|-------
CoAP | registration, conf. retry | registration, conf. retry
. | report, log, xml | report, xml
I2C | periph_base, write_register | periph_base, write_register
. | report, log, xml | report, xml

links

finally

please comment, ask questions ...


All 42 comments

This sounds like a step in the right direction. Not reinventing the wheel should be a priority, as long as there is a good open source alternative to creating our own tools.

question by @kaspar030, copied from #10095

it ships with some adapted tests.

Did you try modeling our release specs with the robot framework?

I'm working on it ...

I'm curious what compilation/flashing/handling of multiple nodes would look like. I remember finding it not trivial to implement a generic way to e.g., ping the link-local address of the 802154 interface of node A from node B without knowing it beforehand, or testing node A with examples/gnrc_networking compiled with "GNRC_SETTING_FOO=whiskybar", node B with "GNRC_SETTING_FOO=milkybar".

The problem with that is proper separation of functionality, meaning: what should be done/handled where. For instance for a simple test one would invoke (robot) tests by running

BOARD=<name> make -C tests/<name> all flash robot-test

In this case, compiling and flashing are handled by the build system and thus outside of the robot test scripts. That makes sense to me, because if any of the targets before robot-test fails (i.e. compile or flash), make will abort with an error. No need to (re)implement error handling in some test script.

This also allows for running the test without compiling or flashing, if the binary is already there and the board was flashed before. For instance you run I2C tests multiple times just because you messed up the wiring and to get it right, still no need to reflash every time. Also note: for the I2C tests (#10147) I added a check in the test script that verifies the correct binary is present on the board, otherwise it would fail right at the beginning.
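
The division of labour described above can be sketched from the caller's side. The helper below only assembles and runs the make call shown earlier; the function names are hypothetical and the board name in the usage note is just an example value. Passing only `robot-test` as target corresponds to rerunning the tests without compiling or flashing again.

```python
import os
import subprocess


def make_invocation(board, test, targets=("all", "flash", "robot-test")):
    """Return (argv, env) mirroring `BOARD=<name> make -C tests/<name> ...`."""
    env = dict(os.environ, BOARD=board)
    argv = ["make", "-C", f"tests/{test}", *targets]
    return argv, env


def run_tests(board, test, targets=("all", "flash", "robot-test")):
    """Run the make targets in order; make aborts on the first failure."""
    argv, env = make_invocation(board, test, targets)
    # check=True turns a non-zero exit status into CalledProcessError, so a
    # failed compile or flash is reported without any extra error handling.
    subprocess.run(argv, env=env, check=True)
```

For example, `run_tests("samr21-xpro", "xtimer_hang", targets=("robot-test",))` would rerun only the tests on an already flashed board.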


But your question is still valid: how to run tests with multiple nodes?

Again, this needs a proper definition of what we want to test and achieve, because there are many different aspects to this which involve hardware setup, too. For instance, (IMHO) one could easily test with two RIOT native instances on a single Linux host running ping between one another, or send pings from RIOT-native to the Linux host. The latter could also be done on a real board connected to a Raspberry Pi with the OpenLabs transceiver to run the test.

Where it gets difficult is to run such tests between two boards, which might also be connected to different Linux hosts. But even with a single host, this already raises problems when trying to flash the boards, i.e. discover which boards are present and also which device-serial matches which tty-serial.

So there are parts that could be done via make or test-scripts, and others that need new/extra tooling - or fixes/enhancements to existing stuff.

But your question is still valid: how to run tests with multiple nodes?

Not sure if it's that related, but I've been doing that for Release Specs tests. Basically I have a python script that aggregates the information of all nodes.

Check the branch here (I plan to push the reduced version for this Release so at least it's easier to run these tests :)

I see some reasons for taking this direction:

  • It's standard and well known (we don't reinvent the wheel as mentioned by @gebart )
  • Different test approaches out of the box (Behaviour, Data and Keyword driven)
  • Excellent Reports generated by Robot Framework

Also, it's easy to learn how to write tests and run them in RF.

Also, it's easy to learn how to write tests and run them in RF.

others tend to disagree, I heard 😉

others tend to disagree, I heard 😉

Actually it's one of their selling points. E.g here or here

Maybe it is because I'm coming from an imperative programming world, but I find it quite clunky to write tests with RobotFramework. Take for example the respective test scripts for tests/xtimer_hang:

import sys
from testrunner import run


def testfunc(child):
    child.expect_exact("[START]")
    # due to timer inaccuracies, boards might not display exactly 100 steps, so
    # we accept 10% deviation
    for i in range(90):
        child.expect(r"Testing \( +\d+%\)")
    child.expect_exact("[SUCCESS]")

if __name__ == "__main__":
    sys.exit(run(testfunc))

vs.

*** Settings ***
Documentation       runtime test to check if xtimer runs into a deadlock.
...                 It is not about clock stability nor timing accuracy.
Test Setup          Reset Application
Test Teardown       Terminate All Processes    kill=True
Resource            riot_base.keywords.txt
Resource            riot_util.keywords.txt

*** Test Cases ***
xtimer_hang
    [Tags]  xtimer
    Run Application     timeout=15 secs
    # keyword                               # pattern               # > number
    Verify Lines Matching Pattern Greater   *Testing (???%)         90
    Result Should Contain   SUCCESS

Sure, we put a lot of things away into a separate python library, but for me the python test is much clearer and I know what to do. For most things in the robot file I don't even know what they do or if they are required (though I have to admit that it is nicer to read since it is closer to natural language). Sure, I can just copy-and-paste from other tests, but if I want to do more involved stuff I might not be able to do that without learning a whole new language first. With pure python this is easier, as it is close to "normal" programming, and when it comes to more complex functionality my usual line of thought is "someone must already have done that, google, ah cool, there is a lib, yoink, write test, done". One example for this are tests for the network stack(s) which I plan to provide very soon, using scapy as my reference implementation, as it allows for easy construction and manipulation of network packets and thus makes it really easy (and human readable!) to construct interesting corner cases and cause potentially weird behavior.

(There seems to be an extension for robot framework with scapy, but I don't know the state of it and I don't know how hard it would be for me to integrate that)

The current make system just executes any executable in an application's tests/ folder.

Currently, there's python pexpect scripts. "testrunner" is just the cut out boilerplate code that is shared between python pexpect scripts.

IMO it would be nice to have those test scripts that are easier to implement using robot in those folders alongside the pexpect scripts we already have.

We don't really care about how tests work (as much as we care how RIOT code works), as long as there's an understanding that tests test what they advertise. If dev A prefers plain shell & grep and dev B prefers RobotFramework with a python backend lib, that should both be alright.

We don't really care about how tests work (as much as we care how RIOT code works), as long as there's an understanding that tests test what they advertise. If dev A prefers plain shell & grep and dev B prefers RobotFramework with a python backend lib, that should both be alright.

Yes, I guess I went a little too far ahead in that discussion and assumed that we want to replace all tests with RobotFramework tests. I don't mind if e.g. the HiL tests are written in RobotFramework, another test is written with pexpect and yet another is using nodejs, as long as it scales with our CI and as long as enough contributors and maintainers (apart from the original contributor) understand the test well enough to check its correctness. But we shouldn't enforce a certain style of tests if not all contributors are comfortable with it or don't have the time or leisure to learn the respective language to write a sufficiently good test.

But we shouldn't enforce a certain style of tests if not all _contributors_ are comfortable with it or don't have the time or leisure to learn the respective language to write a sufficiently good test.

That's pretty much what I meant. :)

Thanks to @smlng and @MrKevinWeiss for this work. In the interest of finding a solution, I posted a WIP PR #86 in Release-Specs to use pytest to run the CoAP specs. I think pytest would provide an evolutionary path forward to help with test infrastructure, reuse, and automation while also meeting some of the objections raised in this issue.

[Moved here from #10095, didn't belong there]

There are many strong arguments against running multiple test frameworks and test technologies in parallel. Agreeing on one framework and its technologies requires compromises. A careful selection of the tools might minimize pain, but in a community of a hundred people we cannot expect to find one test framework that is the favorite of all.

In an offline discussion today, we agreed that @smlng and @MrKevinWeiss will produce a side-by-side equivalent of the current Robot HIL testing using Pytest. This should make a comparison easy and qualified - and we can initiate a consensus call thereafter.

Thanks for the consideration, @tcschmidt and everyone. I agree that the comparison will be worthwhile to inform the decision.

Hi all, I took up the challenge and converted the pytest based CoAP tests by @kb2ma in the Release-Specs #86 to robot tests, reusing as much as possible of the code already present to make a fair comparison possible. The robot test code is here.

I reuse con_ignore_server as is and also the existing ExpectHost class ... with only slight renaming. The actual tests are put in two files as with the pytest, compare test_01.py and 01_cord.robot as well as test_02.py and 02_coap.robot.

I also added the generated html and xml output of both frameworks in my branch, so you can check it out and view, see pytest-output and robot-output. First you'll notice all tests succeed for both frameworks, but secondly you'll also find that the robot output is much more detailed showing all the test steps even though they were successful.

If you don't want to checkout the branch locally to view the xml files (because on GitHub you only get the raw content), you can compare them here:

The latter is the default output format of RobotFramework; this XML is also used to generate HTML reports and logs. However, for comparison, robot additionally supports xunit (similar to pytest's junit).

I have to admit, that the more detailed output format is a big plus for RobotFramework (especially, since it hides all the verbose information in the generated report page and only gives you more information if you click on the respective links). @kb2ma do you know if there is an option to pytest that @smlng might have missed to get a similar result?

On the flip side I'm a little bit confused when looking at the robot files: I'm not really sure where it takes the definitions from. E.g. How do they know where to take the definition for ExpectHost from (from what I read it rather should be included with ExpectHost.py first?). They just appear, but no hint where they are from. I have the hint of the idea, that it just implicitly imports <classname>.<classname>, but I'm not sure. Also if that is the case, this might lead to a lot of code duplication, but I'm not sure here either.

The imports in RobotFramework are handled similarly to standard python, i.e. where you would use import <lib/package> in python you say Library <lib/package> in RF. The package needs to be either in the same directory as the test script or reachable via PYTHONPATH. As far as I know, there is no equivalent of the from <lib> import <that> syntax in RF.

Further, if the package includes a class of the same name, you can instantiate an object by passing the required arguments when importing the library in the test script; that's what's done in the test scripts. You can also create several instances of that class if needed and give them names to access the proper handle later on. There are more ways to utilise python packages in RF, but in the end RF tries to discover keywords in these packages, which are then accessible to write tests: more info on that can be found in the RF manual here.
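
A minimal sketch of that import mechanism, assuming a hypothetical module `DemoHost.py` containing a class `DemoHost` (this is not the ExpectHost class from the comparison branch): values written after the library name in the Settings section are passed to the constructor, and methods with a leading underscore are not exposed as keywords.

```python
# DemoHost.py (hypothetical): module and class share one name, so the bare
# library name is enough for Robot Framework to locate the class.
class DemoHost:
    def __init__(self, command, timeout="10"):
        # Values from the Settings section arrive as strings.
        self.command = command
        self.timeout = float(timeout)
        self.sent = []

    def send_line(self, line):
        """Exposed to test scripts as the keyword `Send Line`."""
        self.sent.append(line)
        return len(self.sent)

    def _reset(self):
        """Leading underscore: not exposed as a keyword."""
        self.sent.clear()


# In a test script the class could be imported twice under different names:
#   Library    DemoHost    ./node_a.elf    5    WITH NAME    NodeA
#   Library    DemoHost    ./node_b.elf    WITH NAME    NodeB
# and the keywords addressed as `NodeA.Send Line` and `NodeB.Send Line`.
node_a = DemoHost("./node_a.elf", "5")
node_b = DemoHost("./node_b.elf")
```

The two instances keep separate state, which is what makes named imports useful for tests that talk to more than one node.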

What I don't understand is why and how this might lead to code duplication. In many cases it's fairly easy to integrate an existing lib into RF; maybe a small wrapper is required, but code duplication should be avoidable in most cases.

What I don't understand is why and how this might lead to code duplication. In many cases it's fairly easy to integrate an existing lib into RF; maybe a small wrapper is required, but code duplication should be avoidable in most cases.

I wasn't aware that they can be in PYTHONPATH. I assumed them to be required in the same directory as the *.robot file.

For python packages they need to be in PYTHONPATH, but you can also specify a RF specific resource path where to look for e.g. common keyword files. That's what I also do in my RF PRs for RIOT and the make integration, see here, see -P <path> parameter. It adds several directories under dist/robotframework where robot should look for additional python, keyword, and other robot files that might be imported either using Library or Resource in the Settings section of a test script.
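
To make the search path idea concrete, here is a small sketch that assembles such a robot command line. `--pythonpath` is the long form of `-P` and `--include` selects tests by tag; the function name and the directory names used in the usage note are made up for illustration and are not taken from the PRs.

```python
def robot_command(test_script, search_dirs, include_tags=()):
    """Build an argv list for the `robot` runner (nothing is executed here)."""
    argv = ["robot"]
    for directory in search_dirs:
        # One --pythonpath (-P) entry per directory that robot should search
        # for Python libraries and resource files.
        argv += ["--pythonpath", directory]
    for tag in include_tags:
        argv += ["--include", tag]
    argv.append(test_script)
    return argv


command = robot_command(
    "tests/xtimer_hang/tests/xtimer_hang.robot",
    ["dist/robotframework/lib", "dist/robotframework/res"],
    include_tags=["xtimer"],
)
```

The resulting list can be handed to `subprocess.run`; keeping the shared directories in one place is what lets every test script import the same keyword files.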

That should avoid most of the code duplication and allow for reuse of custom RIOT-specific keywords which are introduced when writing tests.

@smlng, thanks for this comparison! It really helps to look at the test definitions side by side. I am not an expert at pytest, so would appreciate any ideas from others with more experience. My reactions below.

01-cord.robot

Straightforward to map to the pytest version.

02-cord.robot

  1. Is it possible to move the use of the gcoap example CLI into a library or reusable file? In pytest this is in the common conftest.py, so the gcoap_example CLI fixture can be reused. This way we would not need to repeat Test Setup and Test Teardown in each test.

  2. I find use of a separate test_vars.py more difficult to follow for the definition of the regular expression.

  3. The format for "Run Keyword and Expect Error" would take some getting used to. I like the pytest.raises() implementation.

  4. What would it take to change the timeout value on a per-test basis? Currently defined as 100 for both tests.

Reports

Yes, the detailed Test Execution Log in Robot is nice. I don't know of a built-in way to add more detail in pytest-html. We could explicitly measure elapsed time and use regular Python logging statements in ExpectHost.send_recv(), with a result like the linked image. Definitely clunkier and more work than having the ability built in, but it provides a hide/show click for details.

I did find the pytest-json plugin, which provides more timing information for setup and teardown, but not individual test steps. See the example, report-formatted-json.txt, which I pretty printed with Python's json.tool.
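
The idea of measuring elapsed time and logging it from ExpectHost.send_recv() could look roughly like the decorator below. This is only a sketch: `FakeHost` is a stand-in written for this example, and the `send_recv(out_text, in_text)` signature is an assumption, not the signature of the real class.

```python
import functools
import logging
import time

log = logging.getLogger("expect_host")


def log_step(func):
    """Log each call of a method together with its elapsed time."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            # args[1:] skips `self`, so only the step's own arguments show up.
            log.info("%s%r took %.3f s", func.__name__, args[1:], elapsed)

    return wrapper


class FakeHost:
    """Stand-in for ExpectHost; the real class talks to a spawned process."""

    @log_step
    def send_recv(self, out_text, in_text):
        return f"{out_text} -> {in_text}"
```

Every test step then leaves one log record, whether it passed or not, which is the kind of per-step detail the Robot log provides out of the box.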

Pytest test parameterization

It took me some time to begin to build a mental model for this, to know where and how to add parameters. I'm still not sure about the best approach here. It is a nice feature though.

@smlng, thanks for this comparison! It really helps to look at the test definitions side by side. I am not an expert at pytest, so would appreciate any ideas from others with more experience. My reactions below.

01-cord.robot

Straightforward to map to the pytest version.

02-cord.robot

  1. Is it possible to move the use of the gcoap example CLI into a library or reusable file? In pytest this is in the common conftest.py, so the gcoap_example CLI fixture can be reused. This way we would not need to repeat Test Setup and Test Teardown in each test.

Yes, we could wrap that in a custom keyword and then put it into Test Setup.

  2. I find use of a separate test_vars.py more difficult to follow for the definition of the regular expression.

You're right, but that's the only way to directly pass python regex strings, i.e. r'<regex>', to RF keywords. I did it that way to ease comparison, i.e. using the same regex as your test. There are builtin patterns in RF which might be more handy and not that ugly to use, but again: for comparison I didn't want to change too much of your code.

  3. The format for "Run Keyword and Expect Error" would take some getting used to. I like the pytest.raises() implementation.

Well, that's the way of RF to handle expected errors - actually I find this with pytest.raises ... not really more intuitive, but that's personal opinion and should [not] be a blocker. There are always those not so nice things.

  4. What would it take to change the timeout value on a per-test basis? Currently defined as 100 for both tests.

Yes, easily. But for that I would have to adapt your con_ignore_server.py slightly, and I didn't want to fiddle with that right now, again: to ease comparison.

Reports

Yes, the detailed Test Execution Log in Robot is nice. I don't know of a built-in way to add more detail in pytest-html. We could explicitly measure elapsed time and use regular Python logging statements in ExpectHost.send_recv(), with a result like the linked image. Definitely clunkier and more work than having the ability built in, but it provides a hide/show click for details.

I did find the pytest-json plugin, which provides more timing information for setup and teardown, but not individual test steps. See the example, report-formatted-json.txt, which I pretty printed with Python's json.tool.

Pytest test parameterization

It took me some time to begin to build a mental model for this, to know where and how to add parameters. I'm still not sure about the best approach here. It is a nice feature though.

In RobotFramework you have so-called data-driven tests, where you define a test template and then just pass in values to test and verify - this could be used for parameterisation but also fuzzing and so on.
For an example see my second I2C robot tests here.
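
The data-driven idea itself is independent of the framework. The sketch below shows it with Python's standard unittest module and `subTest` instead of an RF test template or pytest parameterization: one template method, one table of values. `FakeI2CDevice` and the register values are invented for this example and do not model the HIL testing API.

```python
import unittest


class FakeI2CDevice:
    """Hypothetical stand-in for a device with byte-wide registers."""

    def __init__(self):
        self.registers = {}

    def write_register(self, reg, value):
        if not 0 <= value <= 0xFF:
            raise ValueError("register values are one byte wide")
        self.registers[reg] = value

    def read_register(self, reg):
        return self.registers.get(reg, 0)


# The data table: each row is one (register, value) combination to verify.
ROWS = [(0x00, 0x00), (0x01, 0xFF), (0x10, 0x5A)]


class WriteRegisterTest(unittest.TestCase):
    def check_roundtrip(self, reg, value):
        """The template: the same steps are applied to every data row."""
        dev = FakeI2CDevice()
        dev.write_register(reg, value)
        self.assertEqual(dev.read_register(reg), value)

    def test_roundtrip_rows(self):
        for reg, value in ROWS:
            # subTest reports each failing row separately and keeps going.
            with self.subTest(reg=reg, value=value):
                self.check_roundtrip(reg, value)
```

Saved to a file, this runs with `python -m unittest <file>`; adding a test case means adding a row, not writing new test code.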

I just posted a similar comparison for the i2c test with pytest #10448. I had to go through some trouble to make each step obvious. I think there is still a lot to learn and it can be done better (feedback appreciated).

My impression of pytest is that it does allow for a lot of customization but it will take longer to implement (meaning more code to maintain). I couldn't find a clean way to document things when everything passes. It would be nice if asserts could be resolved if they pass, but neither RF nor pytest seems to do that.

The question that I would ask myself when deciding this would be "Does Robot Framework support the features we need or will we need customization that cannot be done easily within Robot Framework?"
One example is that the css of the html report in robot is included in the html, and I am not sure if there is a setting to separate it. With pytest there is a separate css file unless specified otherwise.

Another nice thing is the pytest-regtest plugin for regression testing, I couldn't find a similar thing for robot but I didn't look very hard (and it isn't that difficult to implement).

@MrKevinWeiss why in a separate PR? Let's keep the discussion here in this issue.
It would be also nice to show inputs (Robot : Pytest), XML outputs (rendered), and HTML outputs (rendered) side by side to ease comparison.

The question that I would ask myself when deciding this would be "Does Robot Framework support the features we need or will we need customization that cannot be done easily within Robot Framework?"
One example is that the css of the html report in robot is included in the html, and I am not sure if there is a setting to separate it. With pytest there is a separate css file unless specified otherwise.

Well, separating out the css is a trivial task and can easily be done on the side. Similarly, the fine-grained structure of the XML output of Robot allows for further, easy post-processing by XSL transforms - even to trigger further actions of the testing. This is all work on the surface and does not interfere with the framework.

Another nice thing is the pytest-regtest plugin for regression testing, I couldn't find a similar thing for robot but I didn't look very hard (and it isn't that difficult to implement).

I guess this needs to be worked out - nobody has looked at regression testing yet. Maybe this helps:
https://www.researchgate.net/publication/268185107_Usage_of_Robot_Framework_in_Automation_of_Functional_Test_Regression

@MrKevinWeiss why in a separate PR? Let's keep the discussion here in this issue.
It would be also nice to show inputs (Robot : Pytest), XML outputs (rendered), and HTML outputs (rendered) side by side to ease comparison.

You can't make issues into PRs...

@MrKevinWeiss it would be good to have the same comparative display for #10448 .

Comparison

Test | Robot Framework | Pytest
-----|-----------------|-------
CoAP | registration, conf. retry | registration, conf. retry
. | report, log, xml | report, xml
I2C | periph_base, write_register | periph_base, write_register
. | report, log, xml | report, xml

NOTE: I suggest to delete all other comments that just give pointers to the inputs and outputs of the test frameworks. Also, I think this table should be added to the first entry in this issue, and this message should be deleted then.

As one of the strongest voices against RF so far, I have to admit that seeing them both side-by-side in input and output I prefer RF over pytest.

The test description language is far more concise (but as extendible as normal python... even more as it also allows for extensibility using other languages). It allows for more focussed test description and configuration and when taking comments like https://github.com/RIOT-OS/RIOT/pull/10431#issuecomment-440311508 into account we won't get around something like that in the end. pytest on the other hand is just python code and if the test implementor is not careful it can (easier) get cluttered with (more or less) unrelated stuff or duplicated code. I would however suggest, that we do not re-implement test features that some python libraries already implement quite well as done in https://github.com/RIOT-OS/RIOT/pull/10095 (many of the keywords there for instance should have been done with a pexpect wrapper IMHO as pexpect provides some more functionality for output checking than just regex parsing).

Moreover, as far as I can see RobotFramework has a more native integration for running tests in other languages (see https://github.com/RIOT-OS/RIOT/issues/10241#issuecomment-433104183), while pytest only supports python.

The output of RobotFramework is more detailed (and, to my current knowledge, we didn't find any way to get pytest to do the same) while at the same time not cluttering the rendered output with information unnecessary for the average contributor.

But why consider using RobotFramework anyway?

I must say that I'm still missing some context. What exactly are we trying to improve/optimize?

@kaspar030 well, as you may have heard, there is a larger ongoing initiative to put RIOT testing onto some systematic, professional ground. Part of this initiative is the work on HiL-testing, which was discussed at the RIOT summit and its progress was presented on mailing lists and online: https://ci.riot-os.org/nightlies.html. Another part of this work is to introduce a good, solid, standard test framework.

As you may have further taken notice, this already led to a series of critical bug discoveries and repairs for the I2C rework and the latest release. Does this explain the context?

Another part of this work is to introduce a good, solid, standard test framework.

We don't introduce a framework in order to have a framework.

As you may have further taken notice, this already led to a series of critical bug discoveries and repairs for the I2C rework and the latest release. Does this explain the context?

The I2C testing is awesome, but it doesn't at all require a "testing framework". After all the python abstraction that Robot Framework requires, the I2C tests could've been expressed as pure-python scripts in half the lines, with the tests producing the same useful results.

I don't want to argue against introducing a testing framework. I'd just like to see proper reasoning.
We've already established that writing the test cases themselves is highly subjective in terms of preferred languages. Currently we're converging towards "framework X has nicer output", which sounds somewhat irrelevant to me.

You're claiming that by using a "good, solid, standard test framework" we're putting testing on "systematic, professional ground". How? Why is it currently unprofessional or unsystematic, and how is a framework changing that?

What are the functional requirements of a testing framework for RIOT? How is what we have not fulfilling those?

For me the question is not about "do we need a test framework" but "how do we do nightly tests properly". The current approach of just running it on Murdock has a lot of drawbacks -- not to mention that it is somewhat of a breach of separation of concerns -- and going for an industry standard solution (whatever it may be) would not be a bad approach.

We've already established that writing the test cases themselves is highly subjective in terms of preferred languages. […]

I mentioned that in my assessment: A solid test framework should be able to handle such a case. In the case of robot framework I see a possibility for that. On the flip side (the current approach): just executing anything executable in a certain directory seems somewhat dangerous to me (I'm already facing problems on my local machine with that approach whenever Vim or Git create a temporary copy of files in tests/*/tests/).
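
One way to make "run every executable in the directory" less fragile is to filter out the usual editor and merge leftovers first. The following is only a sketch of that idea; the function name and the list of suffixes are assumptions and do not describe how the RIOT build system currently selects tests.

```python
import os

# Typical leftovers: editor backups, Vim swap files, merge/patch residue.
IGNORED_SUFFIXES = ("~", ".swp", ".swo", ".orig", ".rej")


def runnable_tests(directory):
    """Return names of executable regular files that look like real tests."""
    found = []
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        name = entry.name
        if name.startswith(".") or name.endswith(IGNORED_SUFFIXES):
            continue  # hidden files and known temporary copies
        if not entry.is_file():
            continue  # directories and other non-regular entries
        if not os.access(entry.path, os.X_OK):
            continue  # not marked executable
        found.append(name)
    return found
```

A deny-list like this can never be complete, which is one argument for a framework that selects tests by an explicit naming rule or file type instead.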

I don't want to argue against introducing a testing framework. I'd just like to see proper reasoning.

You present these elementary questions very late in the discussion, @kaspar030.
In short, the reasons for selecting a common, professional open source testing framework are:

  1. Have a well structured, well tailored layer to define and parametrize tests.
  2. Have a well structured result format that opens the ground for further processing.
  3. Have a decent, informative presentation layer of test results.
  4. Have a tool that performs these tests automatically in a structured way.
  5. Rely on an informed community that advances this tool.

With the introduction of HiL testing, we gave a first example by using Robot.
Prior to that, none of the above had been around.

TL;DR: My advice if you do not want to read everything:

Whichever direction you go for the test framework, please please please try to provide the interface with a node in a python only library. Then use/wrap it in Robot Framework or use it directly in Python tests. This means that a future change in one direction or another does not require re-implementing everything. Also, it will be usable outside of a test context.


One thing to consider about the implementation approach: would any of you use RobotFramework to write the benchmarking automation for a research paper?

Why am I asking this? In many of my test implementations, part of the layer between the tests' assert this == expected and my system under test ended up not in the tests directory but, after some refinement, in the actual code directory, where it was directly usable.

serial_aggregator was taken out of test scripts. The notification in your experiment log that an A8 node has booted came out of my tests directory. A client for a server I was implementing for the armour test project was the result of test implementation. The IoT-LAB gateway reboot system for releasing a new distribution for testing ended up as an administration tool to deploy updates on the production system.
The IOTLAB_NODE wrapping was the result of running tests, and adding a file that you can configure to modify the build system configuration was the result of letting Kevin test on his HIL nodes from his machine.
The compile_and_test.py script is currently outside and has not yet really changed anything in RIOT, but addressing that is one of my short-term goals.

One of the issues we have right now is that the abstraction for a node is a serial expect stream. Only.
This makes test writing quite verbose and not reusable. And even right now, tests keep waiting for SUCCESS until the timeout when FAILED was already printed.
For me, the interface is part of the issue, not the language we are writing it in.
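
The SUCCESS/FAILED problem mentioned above can be sketched in a few lines. This is only an illustration, not existing RIOT test code: the function name wait_for_verdict and the sample output lines are hypothetical. The idea is to watch for both verdicts at once so that a printed FAILED ends the wait immediately instead of running into the timeout.

```python
import re


def wait_for_verdict(lines, success=r"SUCCESS", failure=r"FAILED"):
    """Scan node output and stop at the first verdict, whichever it is.

    `lines` is any iterable of output lines (hypothetical stand-in for a
    serial stream). Returns (True, line) on success, (False, line) on
    failure, and (None, None) if the stream ends without a verdict.
    """
    ok = re.compile(success)
    bad = re.compile(failure)
    for line in lines:
        # Check the failure pattern first so a failing line is never missed.
        if bad.search(line):
            return False, line
        if ok.search(line):
            return True, line
    return None, None


# Hypothetical output of a failing test application.
verdict, line = wait_for_verdict(["main(): booting", "FAILED: i2c_read_reg", "done"])
```

With pexpect, the same effect should be achievable by passing a list of patterns to expect(), which returns the index of the pattern that matched.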

Whichever direction you go for the test framework, please please please try to provide the interface with a node in a python only library. Then use/wrap it in Robot Framework or use it directly in Python tests. This means that a future change in one direction or another does not require re-implementing everything. Also, it will be usable outside of a test context.
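
A minimal sketch of what such a Python-only node interface could look like. Everything here is an assumption for illustration: the class names Node and FakeTransport, the method i2c_read_reg, and the shell command string are hypothetical and are not the riot_pal API.

```python
class FakeTransport:
    """Hypothetical stand-in for a serial connection, with canned replies."""

    def __init__(self, replies):
        self.replies = dict(replies)
        self.sent = []

    def query(self, command):
        # Record the command and return the canned reply for it.
        self.sent.append(command)
        return self.replies.get(command, "error: unknown command")


class Node:
    """Framework-independent interface to a node (illustrative only)."""

    def __init__(self, transport):
        self._transport = transport

    def i2c_read_reg(self, addr, reg):
        # The command format is an assumption, not a real RIOT shell command.
        reply = self._transport.query(f"i2c_read_reg {addr} {reg}")
        if reply.startswith("error"):
            raise RuntimeError(reply)
        return int(reply, 0)  # base 0 accepts "0x2a" as well as "42"


# Direct use from a plain Python test:
node = Node(FakeTransport({"i2c_read_reg 85 0": "0x2a"}))
value = node.i2c_read_reg(85, 0)
```

Because Robot Framework can expose the public methods of a Python class as keywords, a class like this could be used from an RF suite, a pytest test, or a benchmarking script without rewriting the node logic.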

When comparing test frameworks, please do not compare the HTML result pages…
I ran tests with only the Jenkins JUnit output for 5 years, which is not shiny, but you only need to list what failed. And I almost never looked at it. As long as the test result format is standard, you can render it in whatever format you want.
Also, we can already have XML output without changing any test implementation in RIOT, so it is not a selling point.
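
To illustrate why a standard result format decouples reporting from the tests themselves, here is a minimal sketch that turns a list of results into JUnit-style XML. The function name to_junit_xml and the sample results are hypothetical, and this is not how RIOT produces its XML output; the testsuite/testcase/failure element names follow the convention commonly consumed by CI tools such as Jenkins.

```python
import xml.etree.ElementTree as ET


def to_junit_xml(suite_name, results):
    """Render results as JUnit-style XML.

    `results` is a list of (test_name, failure_message_or_None) tuples.
    """
    failures = sum(1 for _, message in results if message is not None)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
    )
    for name, message in results:
        case = ET.SubElement(suite, "testcase", name=name)
        if message is not None:
            # A failed test carries a <failure> child with the reason.
            ET.SubElement(case, "failure", message=message)
    return ET.tostring(suite, encoding="unicode")


# Hypothetical results from one test application.
report = to_junit_xml("periph_i2c", [("read_reg", None), ("write_reg", "timeout")])
```

Any presentation layer, shiny or not, can then be generated from the same file.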

A question about the comparison, too. In which testing context did you find that the syntax for C/python unit tests was not good enough and RF would have been better?

At least for me, whenever I needed to code tests around error cases or for real unit tests (not integration), I was really, really happy to have the full power of the language for mocking or introducing quick hacks without needing a high-level interface.
This also made it hard for me to migrate from JUnit-style setup/teardown wrapping to fixtures: I had started with the former, and it was hard to adapt to the overhead of declaring a fixture for each different test environment.
It was mostly the "normal flow" tests that resulted in reusable code for me, but having the possibility of doing both was great.

BTW, did no one find it surprising that our embunit tests are wrapped as TestFixture?

please try to provide the interface with a node in a python only library

:+1:
This is why we have the riot_pal (available as pip package). It is a python interface that is used by the current tests/periph_i2c/tests/test.py (terrible naming I know) and RF and in the pytest PR.

_I have an ongoing but seemingly low priority effort to create instructions on how to utilize riot_pal and give guidelines on how to write tests for RIOT_

In which testing context did you find that the syntax for C/python unit tests was not good enough and RF would have been better?

From my experience, I liked how verbose Robot Framework was: it enforced a certain testing structure and it is self-documenting (assuming the keywords make sense). Everything could probably be implemented in Python, but that implementation would then require maintenance and have its own issues.

I understood that RF wasn't going to replace everything or be used for everything. I thought it would be used for the things that it was designed for (more for integration testing than, say, unit testing).

please try to provide the interface with a node in a python only library

+1
This is why we have the riot_pal (available as pip package). It is a python interface that is used by the current tests/periph_i2c/tests/test.py (terrible naming I know) and RF and in the pytest PR.

+1

We used RF based HIL testing extensively in a couple of HW startups I was previously involved with, with good results.

Some lessons learned:

  1. Designing and keeping in order the underlying Python library that implements the RF keywords is (was?) more an art than science. You may want to have a dedicated person for this, and make sure that you don't get multiple Python scripts / RF keywords that implement the same thing. For us, it turned out more useful to keep almost all of the logic in Python, and to expose only "high level" keywords to RF. Trying to have RF libraries that compose high level keywords from lower level ones was most of the time quite cumbersome. Sometimes it made sense, especially for HW engineers.

  2. It is quite easy to extend RF to do HIL testing beyond basic digital testing. In particular, we interfaced an oscilloscope with RF and ran a number of tests monitoring the voltages and currents of the ToS while running specific tests. If there is any interest, I can try to dig (parts of) this code out of the dust.

  3. When working with HW engineers or managers who don't speak Python, thinking about the expressiveness of the RF libraries is very important. It is possible, but not trivial, to reach a level where the actual RF test descriptions are very descriptive and easy to understand for HW engineers and/or managers.

When doing this, it was much better to have declarative tests (value tables) rather than trying to have sequential test cases described in RF.

  4. We considered having a "standardised" serial line interface between the RF test and the ToS, so that the test can tell the ToS to run different tests at different points in time, but never got to the level of actually having a "standard." For us, the interface grew in an ad hoc manner and gradually started to become a little bit messy.

  5. For some test cases, keeping track of the real time between the RF and the ToS may be needed. Our solution for this was a "heartbeat" that the ToS transmitted over a pin to the oscilloscope, which then used that for triggering/synchronising the measurements. As a result, we were able to correlate the data from the oscilloscope with the phases of the test case.
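
The remark above about declarative tests (value tables) versus sequential test cases can be sketched as follows. The table contents, FakeDevice, and run_table are hypothetical; the point is only that the test data lives in a table a non-programmer can read and extend, while the logic stays in one small Python function.

```python
# Each row: (register, value written, value expected on read-back).
# The rows are made-up examples, not taken from a real test plan.
WRITE_READ_TABLE = [
    (0x00, 0x00, 0x00),
    (0x01, 0xFF, 0xFF),
    (0x02, 0xA5, 0xA5),
]


class FakeDevice:
    """Hypothetical 8-bit register device standing in for real hardware."""

    def __init__(self):
        self.regs = {}

    def write_reg(self, reg, value):
        self.regs[reg] = value & 0xFF

    def read_reg(self, reg):
        return self.regs.get(reg, 0)


def run_table(device, table):
    """Apply every row of the table and collect the mismatches."""
    failures = []
    for reg, written, expected in table:
        device.write_reg(reg, written)
        actual = device.read_reg(reg)
        if actual != expected:
            failures.append((reg, expected, actual))
    return failures


failures = run_table(FakeDevice(), WRITE_READ_TABLE)
```

Robot Framework offers a comparable data-driven style through test templates, where each row of a table supplies the arguments to a single keyword.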

Did we receive all feedback here? We should then proceed with #10095 and include the feedback where it belongs.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

Since we somewhat have a resolution (i.e., the RobotFW repo), should we close it, or leave it open since it is not merged into RIOT and this would be more visible if left open?

We can close, I believe. It's not an open issue anymore.
