Original issue on the watcher
package https://github.com/dart-lang/watcher/issues/52, related issue in build_runner
https://github.com/dart-lang/webdev/issues/436.
I created a simple repro below that uses vanilla Directory.watch
. You can run each of these scripts at the same time to repro, which exposes a few issues:
Stream closed!!
message on stdout, but the process does exit. There is no unhandled exception that surfaces either.File modification script, edits a single file in a loop:
import 'dart:io';
void main() async {
var i = 0;
var file = File('out.txt');
await file.create();
while (true) {
i++;
await file.writeAsString('$i');
}
}
Watcher script, logs watch events:
import 'dart:io';
main() async {
var i = 0;
await for (var event in Directory.current.watch(recursive: true)) {
print('$i: $event');
i++;
}
print('Stream closed!!');
}
As a result of this we have to use a polling watcher, which doesn't perform well.
cc @kevmoo @vsmenon
Just to clarify the impact here, even a basic hello world web project (no angular) will currently consume an entire core indefinitely as long as webdev is running due to using the polling watcher.
How critical is this bug? I probably won't have time to address this quarter meaning it probably won't get fixed until after the summer vacation.
@sortie Can you take a look at the https://github.com/dart-lang/webdev/issues/436 to see how critical it is?
Do we have an idea when this was introduced?
Afaik it has been this way forever.
The problem has been around for a very long time. The impact has gone up since the new build system relies on caching to disk.
Any update on a fix for this?
related comment: https://github.com/dart-lang/webdev/issues/436#issuecomment-507679238
CC @a-siva @dgrove 鈥撀爐his is really gnarly issue for anyone doing (Flutter) web on Windows!
This issue is blocking our company from upgrading to Dart 2
FYI @aadilmaan
This is because of native ReadDirectoryChangesW
calls. Once many changes are made within a short period of time, it didn't notify with events. Did some tests here.
void main() async {
var i = 0;
var file = File('out.txt');
await file.create();
while (true) {
i++;
await Future.delayed(new Duration(milliseconds : 1000));
await file.writeAsString('$i');
}
}
With around 100 milliseconds sleeping, it works well. Once the sleeping time is down to 10 milliseconds, watcher can't perform well. Not sure how to deal with it.
The other issue on silently shut down. I have no idea. I can't reproduce it locally.
This is because of native ReadDirectoryChangesW calls
Is that the most efficient way to be notified of file changes for Windows?
I'm no expert, but what about this: https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstchangenotificationw
@bkonyi Any idea? Do we have a way to improve/fix it?
@bkonyi Any idea? Do we have a way to improve/fix it?
Hm... nope, I'm not terribly familiar with the file system watcher code. I'm wondering if maybe we're seeing a race that's causing events to be dropped or if that's just a limitation of the API itself. I'm not sure why the isolate would be exiting though... it's almost like it things it's done with everything on the event queue and there's nothing left to do so it shuts down. Unfortunately, it's hard to say if you can't reproduce it though.
I hope you don't mind, but I seem to recall reading on Bruce Dawson's blog him running into this sort of thing, so I asked him for his advice:
They should use the directory monitoring APIs. Copying VsChromium's code would be a good place to start. It does monitoring-and-scanning with throttling to avoid pathologies:
(link: https://chromium.github.io/vs-chromium/) chromium.github.io/vs-chromium/
If you don't know him, he's an expert on Windows performance: https://randomascii.wordpress.com/
(He also works at Google)
I was looking here: https://github.com/chromium/vs-chromium/blob/master/src/Core/Files/DirectoryChangeWatcher.State.cs
MSDN says the following:
When you first call
ReadDirectoryChangesW
, the system allocates a buffer to store change information. This buffer is associated with the directory handle until it is closed and its size does not change during its lifetime. Directory changes that occur between calls to this function are added to the buffer and then returned with the next call. If the buffer overflows,ReadDirectoryChangesW
will still return true, but the entire contents of the buffer are discarded and thelpBytesReturned
parameter will be zero, which indicates that your buffer was too small to hold all of the changes that occurred.
There is obviously an inherent reliability problem with the API - no matter what we do it always can loose events if somebody is doing a lot of file system changes. We can try to fight this issue, but I also think that build_runner
should change the way it uses watchers to prevent events from build/
from overflowing the watch buffer. @jakemac53 could build_runner avoid watching directory that contains output of the build?
I think the only simple way we could try to cope with this is to try bumping kBufferSize
we use for DirectoryWatchHandle
higher (e.g. to several megabytes) - because there is an indication that the size of the internal buffer is controlled by that. Another possibility is to try issuing subsequentReadDirectoryChangesW
as fast as possible to drain internal OS buffer - but I think that would require some major refactoring of the implementation.
Just a simple thought while you lookging for a definitive solution: webdev serve
and build_runner serve
could have a --without-watcher
parameter, for example. We would not have a new build when there were changes. But at least this way we would have a _development server_ running without a high CPU usage.
could build_runner avoid watching directory that contains output of the build?
Here is at least one issue where we have discusses changing what we are watching to avoid seeing the updates we make in .dart_tool
- https://github.com/dart-lang/build/issues/1579
I think there may have been some other places where it came up.
There are some tricky things to work out if we try to do this - for instance if we start up watchers in the directories we care about initially, and a new directory we _would_ care about gets added, we won't have a watcher in it.
I think we could solve this if there were some SDK and OS supported way of either:
The other thing to consider would be to move the output somewhere other than .dart_tool
- and that is a tricky thing to solve...
@jakemac53 Sorry for delayed reply. For all the issue you posted,
If you start the watch script after the file modifying script is already running, then it won't see any events at all.
I managed to find the root cause for this. The reason is that ReadDirectoryChangesW
will be started immediately after it is set up. program has not come back to dart side to set up the stream, since modification happens in a very high frequency. Then the event sent by VM will not be received by dart side.
Can you try this cl locally to see whether it solve your problem? https://dart-review.googlesource.com/c/sdk/+/111342
After a while (few thousand events, but not consistent), the directory watcher script simply exits.
The watcher script terminates in some weird way, I never see the Stream closed!! message on stdout, but the process does exit. There is no unhandled exception that surfaces either.
I have never triggered any termination when I worked on the issue. Do you know how to reproduce it consistently?
I have never triggered any termination when I worked on the issue. Do you know how to reproduce it consistently?
I don't seem to be able to reproduce this any more... :man_shrugging:.
Can you try this cl locally to see whether it solve your problem? https://dart-review.googlesource.com/c/sdk/+/111342
I don't have the sdk set up to build on my windows box (its my home machine) so I can't easily test, trying to figure out how to just download the version the try bots built though. Or you could land the cl and I could get it from the build bots after they run.
The fixing cl has been landed. Since termination issue cannot be reproduced, I'll close this.
@jakemac53 @ZichangG I believe one of core reasons this issue was opened was extremely poor performance seen here: https://github.com/dart-lang/webdev/issues/436
A lot of comments in this issue pertain to the performance aspect, but I don't think this fixes those issues?
If not, should the above issue be moved? Or a new issue be opened?
@mnordine the performance is poor because we have to use a polling watcher, this was the issue preventing us from migrating away from that and to the native windows directory watcher.
Great, is there another issue I can follow for that, so I know when I can test it out?
We can use https://github.com/dart-lang/webdev/issues/436 to track that, I added an update there. Will need to wait for the next dev release though.
The cl has introduced flakiness on bots. I'll reopen this issue and investigate on crashes. #37909
https://github.com/dart-lang/sdk/commit/95db62d6a2ecd1b8bb6eeec36fd58950855c1ab9 has landed and fix the problem!
@ZichangG I can now consistently repro the issue where the entire program simply exits - in 3 runs this happened between 100 and 200 events in so it doesn't take long (this is using the two original programs linked at the top). I saw the same behavior regardless of whether I started the watcher script before or after starting the file editing script.
The program exits without any sort of stack trace or anything so I unfortunatley can't help much with debugging here, I don't know if there are additional vm flags I could provide to get more logging?
@jakemac53 I failed to reproduce it. No matter using powershell, cmd or git bash. Do you have a detailed reproduction? I'm testing with
Dart VM version: 2.6.0-edge.46f2603781be3fae545cdd9f9f2f1a07f0fde0f5 (Mon Sep 16 19:36:40 2019 +0000) on "windows_x64"
I don't know if there are additional vm flags I could provide to get more logging?
I don't think there is specific flags for file watchers. Maybe use observatory and set --pause-isolates-on-exit.
@jakemac53 You can probably pass --trace-isolates to VM to print more info.
Does the "program exiting" behavior reproduces more easily if you pause watcher(by clicking somewhere in powershell console which has watcher running) for 4-5 seconds, then resume it(hit Esc, for example).
I will try this again tomorrow with --trace-isolates
. Fwiw I was using whatever the default shell in vscode is.
@jakemac53 did you have a chance to try it out? I would like to make sure that this issue is not falling through.
Thanks for the ping I had indeed forgotten to check this, I can check it when I get home tonight on my windows machine there.
I tried with dart --trace-isolates watch.dart
but it says it is an unrecognized flag:
Setting VM flags failed: Unrecognized flags: trace_isolates
I did find --trace_shutdown
and got this (but doesn't seem that useful):
PS C:\Users\Jake\Desktop\git\playground> dart --trace_shutdown watch.dart
0: FileSystemModifyEvent('C:\Users\Jake\Desktop\git\playground\ooouut.txt', contentChanged=true)
...
725: FileSystemModifyEvent('C:\Users\Jake\Desktop\git\playground\ooouut.txt', contentChanged=true)
[+2823ms] SHUTDOWN: Starting shutdown
[+2825ms] SHUTDOWN: Shutting down profiling
[+2845ms] SHUTDOWN: Disabling isolate creation
[+2847ms] SHUTDOWN: Killing all app isolates
[+2849ms] SHUTDOWN: Shutting down app isolates
[+2851ms] SHUTDOWN: Shutting down kernel isolate
[+2857ms] SHUTDOWN: Shutting down service isolate
[+2872ms] SHUTDOWN: Waiting for isolate shutdown
[+2880ms] SHUTDOWN: Entering vm isolate
[+2881ms] SHUTDOWN: Deleting thread pool
[+2883ms] SHUTDOWN: Disabling OS Thread creation
[+2885ms] SHUTDOWN: Cleaning up vm isolate
[+2887ms] SHUTDOWN: Deleted os_thread
[+2888ms] SHUTDOWN: Deleting code observers
[+2890ms] SHUTDOWN: Shutting down timeline
[+2893ms] SHUTDOWN: Done
@jakemac53 Just wanna double check. Is your reproduction the same as what @aam described?
you pause watcher(by clicking somewhere in powershell console which has watcher running) for 4-5 seconds, then resume it(hit Esc, for example).
I don't need to pause it to get the reproduction - I run the watch script and then run the file modification script. The watch script dies typically within a couple seconds (after starting the file modification one).
@ZichangG and @jakemac53. A solution for this issue is very important to my company team to start a project with Dart on Windows enviromnent. Is there a deadline to close this issue?
PS: Just one thought, no thorough investigation: analyzer
package in Dart I think it uses some kind of watcher to detect code changes, and although a slow one when booting the IDE or code change, apparently it works on Windows. Maybe the same package could be used in webdev
.
The analyzer uses the same watcher - but there is a specific bad interaction with the watcher and package:build due to the high number of file events we cause when doing builds.
(also anecdotally I have many times seen analyzer stop working on windows and had to restart vscode to get it back, I would be willing to bet this is the reason)
@jakemac53 can you try the following variant of watcher please?
import 'dart:io';
import 'dart:async';
main() async {
var i = 0;
Directory.current.watch(recursive: true).listen((event) {
i++;
});
Timer.periodic(Duration(seconds:1), (Timer t) {
print('at ${DateTime.now()} saw $i notifications');
});
}
Does it still exits for you?
The explanation I have for why your version exits is that the only active receive port in that watcher is directory changes listener, but it just stops(perhaps due to internal constraints of buffer overflow handling), which leads to that single receive port being closed, leading to dart program exit.
If you add another active port - timer - then dart program doesn't exit even when watcher port is closed.
perhaps due to internal constraints of buffer overflow handling
I think that's root cause. I tested different buffer size with aam's reproduction. With extremely small buffer size, program will exit automatically. With 64k bytes buffer, pausing watcher's powershell for several seconds can reproduce the problem.
ReadDirectoryChangesW will still return true, but the entire contents of the buffer are discarded and the lpBytesReturned parameter will be zero, which indicates that your buffer was too small to hold all of the changes that occurred.
For buffer size,
ReadDirectoryChangesW fails with ERROR_INVALID_PARAMETER when the buffer length is greater than 64 KB and the application is monitoring a directory over the network. This is due to a packet size limitation with the underlying file sharing protocols.
64kb is what we are using now. I don't think we have anything to do here.
OK - if this is an issue with the underlying OS then we can fall back to trying to mitigate the problem instead.
Can we get some sort of specific exception thrown when this happens? Then we can at least message to the user what has happened.
That would give us a relatively sane path towards enabling this behind a flag.
Yeah, something needs to get called when watcher exits(currently there is no indication that watcher stopped watching).
The patch below allows to get onDone
callback of Directory.watch(...).listen(..., onDone: ...)
called.
diff --git a/sdk/lib/_internal/vm/bin/file_patch.dart b/sdk/lib/_internal/vm/bin/file_patch.dart
index 338a260002..90fcf26b3a 100644
--- a/sdk/lib/_internal/vm/bin/file_patch.dart
+++ b/sdk/lib/_internal/vm/bin/file_patch.dart
@@ -171,7 +171,9 @@ class _FileSystemWatcher {
assert(_watcherPath.count > 0);
_watcherPath.count--;
if (_watcherPath.count == 0) {
- _unwatchPath(_id, _watcherPath.pathId);
+ if (_idMap.containsKey(_watcherPath.pathId)) {
+ _unwatchPath(_id, _watcherPath.pathId);
+ }
_pathWatchedEnd();
_idMap.remove(_watcherPath.pathId);
}
@@ -293,6 +295,8 @@ class _FileSystemWatcher {
}
}
} else if (event == RawSocketEvent.closed) {
+ _idMap.remove(socketId);
+ stops.add([socketId, null]);
} else if (event == RawSocketEvent.readClosed) {
} else {
assert(false);
Folks, I'm sorry for the insistence. But would we now have a "light at the end of the tunnel" for this issue?
If this really has _low priority_, I would like to know for me to try to look for other workarounds.
This is not a low priority.
There is https://dart-review.googlesource.com/c/sdk/+/119524 with the fix that hopefully will be landed soon. The fix being exception thrown by the watcher when OS closes it. That should give an app an opportunity to handle this in some reasonable fashion for example by restarting the listener.
https://dart-review.googlesource.com/c/sdk/+/119524 landed, it has an example of how one can catch those exceptions(watcher method used testWatchOverflow test https://dart-review.googlesource.com/c/sdk/+/119524/16/tests/standalone_2/io/file_system_watcher_test.dart#451). Please see if it works for you.
To handle the exception, simply surround stream.listen() with runZoned() and provide onError() callback which will be invoked if exception occurs.
runZoned(() {
watcher.listen((data) async {
// handle event
});
}, onError: (error) {
// handle error
});