Typescript: Change only updated files when using watch option.

Created on 11 May 2015 · 27Comments · Source: microsoft/TypeScript

Updating all files slows down debugging.

Compile on save would also do but I can't expect it would soon be there in Visual Studio Code, even it would be there, it might not be stable enough since the very beginning. It would be great if tsc updates changed files only when doing incremental compilation.

Fixed Suggestion help wanted

Source

vilic

Most helpful comment

One thing this would really help with for us is preventing our nodemon apps from all going nuts when re-running watchers.

Right now it causes tons of restarts, cpu goes to 100% for quite a while. So we have to kill all our processes and completely restart all the apps whenever we make a change that watch doesnt pick up, which seems fairly often (sometimes even adding a new file seems to not get picked up).

Edit: I see this claims to be fixed, but I still see a lot of restarts. Perhaps there is something else going on...

natew on 13 Jun 2018

👍3

All 27 comments

We already cache the bound ASTs during --watch to speed things up, but the problem with holding off on emit is that it's difficult to see what files might change in the process. Imagine two files, a.ts and b.ts.

a.ts

const enum E {
    Hello, World
}

b.ts

var x = [E.Hello, E.World]

When compiled, b.js will be

var x = [0, 1]

If we change a.ts to the following

const enum E {
    Hello, Beautiful, World
}

b.js is also affected and it's difficult to track that:

var x = [0, 2]

I could see us caching a hash of the emit result to avoid an IO penalty but I'd have to see some numbers indicating whether this becomes a real win.

Edit: I should clarify - when I say that it's difficult, I mean that it would introduce complexity in the compiler which I don't think is warranted.

DanielRosenwasser on 11 May 2015

Are you trying to save the cost of compiling the file? Or just the cost of writing the file to disk?

We cannot avoid the former. We possibly could avoid the latter. Though we'd need numbers to know what sort of benefit there would be.

CyrusNajmabadi on 12 May 2015

Thanks for your replies.

IO seems to be an issue on (my) Windows, though I can't figure out why (adding exception to Windows Defender makes it better, but still slow comparing to Mac). http://stackoverflow.com/questions/30046811/why-is-compiling-typescript-on-mac-much-faster-than-on-pc

On the other hand, many tools have "watch" functionality to detect file changes and do some re-releasing or restarting. I managed to make the part I can control smoother by determine compiling status, but I can't change third-party tools we use. So they are purely file changes + timeout based. Which means one more js file the compiler updated, one more js file that tool will need to re-releasing. Also, if two files are emitted not within time that's short enough, that tool might run more than once.

These kind of slow down our debugging process.

vilic on 12 May 2015

Sounds like a reasonable suggestion. We could definitely attempt to avoid writes to files if we've emitted the exact same bytes as what exists on disk. We'd likely need to still read the files in to verify if they changed or not. But at least we wouldn't be writing (and updating timestamps) for anything unchanged.

CyrusNajmabadi on 12 May 2015

👍1

That would definitely help, looking forward to it.

vilic on 12 May 2015

I noticed this as well recently, and ended up using grunt-ts, which will only pass changed files to the compiler and reduces output significantly

scriby on 13 May 2015

@scriby FYI, I made some changes to sys.ts, and replaced tsc.js file of my team as a temporary solution.

function compareFileWithData(fileName: string, data: string, writeByteOrderMark: boolean): boolean {
    if (!_fs.existsSync(fileName)) {
        return false;
    }
    var buffer = _fs.readFileSync(fileName);
    var len = buffer.length;
    if (len >= 2 && buffer[0] === 0xFE && buffer[1] === 0xFF) {
        // Big endian UTF-16 byte order mark detected. Since big endian is not supported by node.js,
        // flip all byte pairs and treat as little endian.
        return false;
    }
    if (len >= 2 && buffer[0] === 0xFF && buffer[1] === 0xFE) {
        // Little endian UTF-16 byte order mark detected
        return false;
    }
    if (len >= 3 && buffer[0] === 0xEF && buffer[1] === 0xBB && buffer[2] === 0xBF) {
        // UTF-8 byte order mark detected
        return writeByteOrderMark && buffer.toString("utf8", 3) == data;
    }
    // Default is UTF-8 with no byte order mark
    return !writeByteOrderMark && buffer.toString("utf8") == data;
}

function writeFile(fileName: string, data: string, writeByteOrderMark?: boolean): void {
    // If the file is exactly the same as what will be written, skip it.
    if (compareFileWithData(fileName, data, writeByteOrderMark)) {
        return;
    }

    // If a BOM is required, emit one
    if (writeByteOrderMark) {
        data = '\uFEFF' + data;
    }

    _fs.writeFileSync(fileName, data, "utf8");
}

vilic on 13 May 2015

@vilic can you describe the type of speedup you're getting from this? Is it substantial? What is the size of the codebase you are --watching?

DanielRosenwasser on 14 May 2015

Some numbers:

Modified

Initial compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   124673k
i/o read:        0.10s
i/o write:       0.12s
parse time:      0.67s
bind time:       0.36s
check time:      1.26s
emit time:       0.54s

Incremental compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   166245k
i/o read:        0.02s
i/o write:       0.14s
parse time:      0.06s
bind time:       0.01s
check time:      1.65s
emit time:       0.47s
total time:      2.19s

Unmodified

Initial compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   120182k
i/o read:        0.08s
i/o write:       0.49s
parse time:      0.64s
bind time:       0.35s
check time:      1.22s
emit time:       0.97s

Incremental compilation.

files:             329
lines:           64319
nodes:          219998
identifiers:     74564
symbols:         55576
types:           21180
memory used:   154028k
i/o read:        0.01s
i/o write:       0.64s
parse time:      0.05s
bind time:       0.00s
check time:      1.82s
emit time:       1.19s

These are numbers when third party tools are not watching, it seems that keep them watching will dramatically increase i/o write time (up to about 2 seconds, on (my) Windows).

And updating only necessary files speeds up re-releasing process quite a lot.

vilic on 14 May 2015

So IO-write goes from around 0.6s to around 0.1 seconds. A reasonable savings, though not huge.

I'm curious what third party tools you're using that are watching this?

CyrusNajmabadi on 14 May 2015

We are using yog, which is an integrated solution for fis on NodeJS. We are moving our project from PHP to NodeJS, before _yog_ we were using another integrated solution for _fis_ named fisp.

The integrated solution (yog/fisp) divided a project to several apps, and we need to watch and release for apps we are working on, separately.

vilic on 14 May 2015

We would consider a PR for this. things to watch for are the size of the change (a non-invasive/destabilizing would be ideal), also a quick way to verify that the file has not changed would be ideal.

mhegazy on 9 Dec 2015

@mhegazy I would like to give it try this weekends. But I am not very sure about what you mean by "size of the change"? Do you want the implementation to be limited to file system utils or with the help of something else?

vilic on 10 Dec 2015

@vilic, we will need an implementation proposal, outlining the approach of detecting if a file changed or not, and what implications if any on the performance. I would not expect this change to be large, if that is not the case let us know.

mhegazy on 4 Jan 2016

@mhegazy I am thinking about memorizing the last modified time and output after writing every file. And determine whether a file has external changes by querying and comparing its last modified time. If the last modified time changed or the output changed, the file will get updated. (I guess storing output in memory should have minor impact comparing to things like AST?)

vilic on 29 Jan 2016

i would be concerned about performance, mainly GC load if we keep two copies of the project in memory at a time.

mhegazy on 30 Jan 2016

Then what I can think of are:

Read previous output from file system every time, just like the very first implementation. (But maybe compare by buffers in stream.)
Store hashes of previous output files and last modified time, and comparing these factors to determine whether actual outputs are needed.

vilic on 30 Jan 2016

Storing a hash of the output would be the most memory efficient way, I would suspect.

kitsonk on 31 Jan 2016

@kitsonk I think so. But as I am not familiar with general GC implementation, I don't know which would actually have more performance impact. I think the output (which are strings) should be considered ignorable comparing to the AST stored in memory.

vilic on 31 Jan 2016

Comparing AST is likely to be CPU intensive when you have low frequency of change. The only way you know that nothing has changed is by walking the whole AST tree. If the change level is high, then you can obviously benefit from short circuit logic. Where as a hash of the output would have the same CPU profile and most engine string comparison is highly efficient and also benefits from short circuiting, but on a much more smaller data set, with no tree walking logic. I suspect it would be several orders of magnitude more CPU efficient irrespective of likelihood of change. As far as GC/memory utilisation in general, I can't see where it wouldn't be more efficient unless every output file was really small. Less memory used, less GC incurred. A map of output hashes seems ideal. I would even consider abandoning time stamps of the files, as validating this is likely wasted cycles. If the last output hash is different, output, if not, don't, if there is no output hash in memory, output.

kitsonk on 31 Jan 2016

@kitsonk Oops, there seems to be some misunderstanding. I meant string output won't take much memory comparing TO AST. But it helps if hashing in Node implementation is very efficient. (But I guess that also means we would probably give up other hosts on this.)

vilic on 1 Feb 2016

Well, even still, big strings are more efficient than AST. AST would have a pointer for each node and those pointers are unlikely to be shared. A big string would have one or two pointers (I am not 100% sure) irrespective of size. Also, JVMs take advantage of string interning (e.g. a = 'abc' and b='abc' share the same memory space) and are immutable (you mutate a string it reallocated and the previous instance is potentially freed for GC) and so tend to be more memory efficient than any other sort of complex multi-nodal representation, where the analysis of GC alone could be quite a lot of overhead. I guess comparing output files, without hashing their value, likely would be a similar CPU profile, though obviously more working memory that is less likely to get GC'ed.

While NodeJS comes with some hashing functions, it would be trivial to fallback to a JS hash function on other hosts.

kitsonk on 1 Feb 2016

@kitsonk I see, thanks a lot!

vilic on 1 Feb 2016

What we could do is use node hashing functions and if not available fall back to the current behavior.

mhegazy on 6 Feb 2016

thanks @vilic !

mhegazy on 5 Apr 2016

One thing this would really help with for us is preventing our nodemon apps from all going nuts when re-running watchers.

Edit: I see this claims to be fixed, but I still see a lot of restarts. Perhaps there is something else going on...

natew on 13 Jun 2018

👍3

There seems to have been a regression. I noticed it while using nodemon, but it's reproducible without it:

$ tsc --version
Version 3.3.1
$ tsc --init
$ tsc --watch

$ echo '//a' > a.ts
$ stat a.js
16777220 21539803 -rw-r--r-- 1 laptop staff 0 18 "Feb  6 10:42:56 2019" "Feb  6 10:45:59 2019" "Feb  6 10:45:59 2019" "Feb  6 10:42:56 2019" 4096 8 0 a.js

$ echo '//a' > a.ts
$ stat a.js
16777220 21539803 -rw-r--r-- 1 rabbit staff 0 18 "Feb  6 10:42:56 2019" "Feb  6 10:47:21 2019" "Feb  6 10:47:21 2019" "Feb  6 10:42:56 2019" 4096 8 0 a.js

I've also confirmed the above triggers file change events for nodemon.