It seems other test module does not support parallelism, which seems the reason why it was so slow even on CircleCI.
I have two questions here:
Consider using parallel_test_core.py for faster iteration times but we don't really use it, right? I think runner.py already automatically supports parallelism for core tests.It seems test_emcc_multiprocess_cache_access is the only blocker on other?
PS: No, it anyway uses its own temporary cache directory so it shouldn't matter in my opinion.
Yes, that one should be ok.
It's the tests using EMCC_DEBUG that write to /tmp/emscripten_temp, the canonical temp dir. We could either not run those in parallel, or maybe let EMCC_DEBUG get a directory as the parameter.
It seems test_openjpeg and test_source_map in core tests are also manipulating EMCC_DEBUG, are they safe?
True, they are not. I don't think I've ever seen a failure from them running in parallel, though, so the statistical risk is very low. But if we had a way to move them to be non-parallel, we should do that.
Can we instead set EMCC_TEMP_DIR for those tests?
parallel_test_core.py is something that should now be now obsolete, @jgravelle-google wrote the new parallel harness for the core suite, which supercedes the earlier one.
If a tests needs redirecting the temp directory, using EMCC_TEMP_DIR sounds fine. It should be safe for multiple concurrent Emscripten executables to reuse the same temp directory, so this would only be for tests that for some reason need an exclusive access to the whole cache directory?
Several other tests call clean_write_access_to_canonical_temp_dir to clean temp directory. Some of them then checks the result by os.listdir(CANONICAL_TEMP_DIR) so they probably need an exclusive access, but others like the following probably shouldn't clean the directory?
I think that test uses clean_write_access because we have leak checks for files left around (I think it's optional and in runner.py in the tearDown) and EMCC_DEBUG would leave files that the check would hit.
Hard to understand why there are two ways to get temp directory. emcc.py uses EMCC_TEMP_DIR environment variable, or it uses shared.get_emscripten_temp_dir() which depends on TEMP_DIR variable from .emscripten configuration file. Is this on purpose?
https://github.com/kripken/emscripten/blob/fe3510ca5ebe65b388c4c85c2fc15f8995f7c8b1/emcc.py#L85
That looks silly, yeah... I can't think of a reason for us to have two of those.
We should probably use TEMP_DIR from the config file, and ignore the EMCC_TEMP_DIR env var. But we should probably check if tests use it. Also we should warn or maybe even error if it is set.
Nothing else uses EMCC_TEMP_DIR, fortunately, so the use can just be deleted. But code uses TEMP_DIR in a peculiar fashion later:
https://github.com/kripken/emscripten/blob/6dc4ac5f9e4d8484e273e4dcc554f809738cedd6/emcc.py#L370-L379
I think the right thing to do here is to just temp_dir = get_emscripten_temp_dir(), yes?
I think we should use get_emscripten_temp_dir everywhere and then make EMCC_DEBUG follow EMCC_TEMP_DIR so that we can run those tests in parallel.
I think the only constraint we should keep here is that EMCC_DEBUG writes by default to the canonical location of /tmp/emscripten_temp, since that is useful for debugging (otherwise, we can change pretty much everything in the temp file code). @saschanaz, if I understand you correctly, I think that is possible in what you propose, by using EMCC_TEMP_DIR to override that default and let each test write to a personal temp dir? If so, that sounds good.
Emscripten test runner has a feature when environment variable EM_TESTRUNNER_DETECT_TEMPFILE_LEAKS=1 is specified that before the test starts, the set of files that exists in the temp directory is snapshotted, then the test is run, and after the test a second snapshot is taken, and these two snapshots are compared to ensure that no new files had been left around.
This kind of file leak checking has an issue that if any external process would be writing to the same system-global temp directory (e.g. /tmp/ or C:\temp) while the test was running, then this would conflict with the leak checking feature. Because of this, when running in this mode, one needs to ensure that EMCC_TEMP_DIR points to a directory that is exclusive to Emscripten's own operation so that other processes on the system can't create spurious temp files in between.
However if there are multiple Emscripten compiler instances running, then these too can mess the temp directories if they use the same one. That is, if two tests are running in parallel processes, one might create temp files while the other was running a test, causing the leak checker to find new files in the temp directory. Because of this, one can't run multiple Emscripten compiler instances (with the same temp directory) when EM_TESTRUNNER_DETECT_TEMPFILE_LEAKS=1 is enabled.
The Emscripten Buildbot-based test runners run the other suite with this environment variable enabled. Historically since the core suite is parallelized, EM_TESTRUNNER_DETECT_TEMPFILE_LEAKS=1 is not enabled for it, but only other suite is run through this leak checker. If the other suite is parallelized, then this will mean that EM_TESTRUNNER_DETECT_TEMPFILE_LEAKS=1 will start generating false positives in that suite as well.
I think the solution forward here might be to assign unique/dedicated temp directories to each parallel test runner process in core and other? That is, for each parallel test runner thread, append a unique subdirectory path to EMCC_TEMP_DIR for each of them, so that they all operate in unique directories? Then the leak checker could also be enabled for the core suite.
I have a question, by default the test runner creates a new random directory e.g. emscripten_test_default_tewe12rk so that there would be no conflicts with other processes. Can we check this directory instead of global temp directory?
If I recall correctly, the intent was to proof against "what if some files were generated to a directory outside this? (a bug)". Also, if running with EM_SAVE_DIR or EMCC_DEBUG, there's the canonical temp directory that's being sometimes used, which iirc other suite does use in some tests, so those can theoretically conflict.
Most helpful comment
parallel_test_core.pyis something that should now be now obsolete, @jgravelle-google wrote the new parallel harness for the core suite, which supercedes the earlier one.If a tests needs redirecting the temp directory, using
EMCC_TEMP_DIRsounds fine. It should be safe for multiple concurrent Emscripten executables to reuse the same temp directory, so this would only be for tests that for some reason need an exclusive access to the whole cache directory?