After several minutes of working on any file from the FSharp.LanguageService.Compiler project, devenv.exe occupies ~2.5GB of memory and VS becomes practically unusable:

@vasily-kirichenko We really just have to take this stuff out-of-process (and into a 64-bit process), don't we...
Yes, we do. But it's a huge amount of work.
"just" ;-)
Have you tried running a memory profiler on devenv to find what's retaining memory and/or causing the GCs / GC pressure?
@jack-pappas about millions of times :) It's FSC caches.
@jack-pappas look at this:

So name resolution cache Map occupies 100MB of memory _for map nodes themselves_.
Your HashMap has even larger size per node :( I was thinking about using Dictionary here, but it seems to be used for layering names in nested scopes (I may be wrong though) and immutable map fits this perfectly.
All of these objects sit in Gen 2, of course; that's the problem.
What if you sort by the rightmost column there ('Minimum retained bytes')? E.g., if you look at the second line in your screenshot for NameResolution+CapturedNameResolution, those objects themselves only account for ~38MB, but if you include the transitive closure of the objects they're referencing, it's around ~130MB.
I guess what I'm wondering is -- are all of these cached objects necessary / useful? Maybe it'd be better to have some bounded-size data structures for the caches, or do something like define a record to hold all of the per-compilation caches and hold it in a ConditionalWeakTable<_,_> with some key that'll allow the cache records to be cleaned up / freed once they're no longer needed.
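To illustrate the ConditionalWeakTable idea, here is a minimal sketch. All names here (Compilation, PerCompilationCaches, getCaches) are illustrative stand-ins, not actual FCS types:

```fsharp
open System.Runtime.CompilerServices

// Stand-in for whatever keys a compilation/project in FCS
type Compilation() = class end

// Hypothetical record of all per-compilation caches
type PerCompilationCaches() =
    member val NameResolutions = ResizeArray<string>()

// One cache record per compilation; an entry becomes collectible as soon as
// the owning Compilation is no longer referenced anywhere else.
let caches = ConditionalWeakTable<Compilation, PerCompilationCaches>()

let getCaches (comp: Compilation) =
    // GetValue creates the record on first use and returns the same one afterwards
    caches.GetValue(comp, fun _ -> PerCompilationCaches())
```

The point is that nothing needs to explicitly evict the caches: once the key dies, the table entry does too.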
I could implement a snapshottable dictionary type that could be used for the name resolution map; however, some of the size of the Map nodes is the data held by the node, so I wonder how much that'd really gain us in terms of memory usage reduction (maybe 100MB -> 60MB)?
Is there a way to run the code used here in a more standalone way (e.g., have some F# script which loads the FCS assembly and uses it to analyze the FCS code itself)? I'm happy to dig deeper on this to try to improve the performance, and it'd be easiest if I could run the code both inside and outside VS.
@jack-pappas I think running FCS in a script is a fine way to start. Maybe also add the code of a few analyzers - those would depend on the Roslyn assemblies but would still be possible without VS. Just FCS should get you there in terms of objects in the cache and memory size compared to the code which has been loaded/queried.
@jack-pappas @vasily-kirichenko It would, I think, be very productive to go through those objects and work out what isn't necessary. That would easily reduce the cached information by 50%
We do a little of this with the keepAllBackgroundResolutions flag, which is set to false, see
https://github.com/Microsoft/visualfsharp/blob/ee84eb77bcd62379d8178e3167a17d12d204de30/src/fsharp/vs/IncrementalBuild.fs#L1481
It's tricky because one single captured handle somewhere in one of the cached data structures can keep hold of vast amounts of information, most of which will be unnecessary when processing later requests. We really need an automated test in the FCS tests which cleans everything, does some processing, then measures the size of the captured information graph. I think a small amount of reflection code walking the captured object graph could achieve this.
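Such a reflection-based measurement could be sketched roughly as follows. This is an assumption-laden sketch: it counts reachable objects rather than bytes, follows only instance fields (not array elements), and a real test would compare the count against a budget after clearing the caches:

```fsharp
open System.Collections.Generic
open System.Reflection

// Count objects reachable from a root by following instance fields, public
// and private. A crude proxy for the size of the captured information graph.
let countReachableObjects (root: obj) =
    let visited = HashSet<obj>(HashIdentity.Reference)
    let pending = Stack<obj>()
    pending.Push root
    while pending.Count > 0 do
        let o = pending.Pop()
        if not (isNull o) && visited.Add o then
            let t = o.GetType()
            if not t.IsPrimitive && t <> typeof<string> then
                for f in t.GetFields(BindingFlags.Instance ||| BindingFlags.Public ||| BindingFlags.NonPublic) do
                    // skip primitive-typed fields; they cannot reference other objects
                    if not f.FieldType.IsPrimitive then
                        pending.Push(f.GetValue o)
    visited.Count
```

A test could run a check, force a GC, and assert that `countReachableObjects` over the retained caches stays below a threshold.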
Cheers
don
@jack-pappas

@vasily-kirichenko Is that switching to try to use ImmutableDictionary? I would expect that not to be better as you will lose sharing between the different immutable maps (many of which have much in common)
@vasily-kirichenko If you could show some more dumps of the current master sorted by both bytes and minimum retained bytes I think I could come up with a decent strategy for trimming a good chunk of stuff, or at least making it more weakly held
Some initial ideas
Reduce incrementalTypeCheckCacheSize to 2 or 3. It is currently 5. This is the number of files whose ParseAndCheckFileInProject results are kept strongly. This certainly needs to be at least 1, so the current file of focus is kept strongly. It benefits from being 2 if switching between files. Switching between 3 files is also common; switching between 5 is less common (or at least not much more common than 6 or 7).
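For reference, the strong-holding policy that incrementalTypeCheckCacheSize controls amounts to a small bounded MRU cache. A hypothetical sketch (the type and member names here are made up for illustration):

```fsharp
// Keep only the N most recently set entries strongly referenced;
// older entries are evicted (in FCS they would fall back to weak references).
type MruCache<'Key, 'Value when 'Key : equality>(size: int) =
    let mutable entries : ('Key * 'Value) list = []

    member _.Set(key, value) =
        entries <-
            // move-to-front, then trim to the strong-holding budget
            (key, value) :: (entries |> List.filter (fun (k, _) -> k <> key))
            |> List.truncate size

    member _.TryGet key =
        entries |> List.tryFind (fun (k, _) -> k = key) |> Option.map snd
```

With size = 2 or 3, the current file and the one or two most recently visited files stay strongly held, matching the switching patterns described above.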
The big swag of Map nodes is CapturedEnvs in TcResolutions. This contains a lot of information in various tables. However, in practice I believe only a subset of the information is needed. In particular, after selecting an environment in GetBestEnvForPos we pass it to either ResolvePartialLongIdent or ResolvePartialLongIdentToClassOrRecdFields (the only two places these are called). These only look at a few of the tables in the NameResolutionEnv. So instead of storing the whole NameResolutionEnv we could just store those tables, and pass those back in to those two routines.
There may also be a way of avoiding storing so many NameResolutionEnv in the first place. They are needed relatively rarely - for example when completing a partial identifier like Syst - but they are still needed.
I'm a bit shocked at the number and cost of SynExpr.Sequential, SynExpr.Const and SynExpr.UInt16 nodes. The latter must be the parser or lexer tables - I assume you never opened pars.fs or lex.fs. But why are we storing all these untyped ASTs? I think these ASTs must be being held by the background builder, in the let parseTreesNode = Vector.Map "ParseTrees" ParseTask stampedFileNamesNode output node of the background build graph. This is used by GetParseResultsForFile (and also consumed by the typechecking nodes in the build graph), in turn to provide the FCS API entry point GetBackgroundParseResultsForFileInProject. However, I don't think GetBackgroundParseResultsForFileInProject is used anywhere in the Visual F# Tools. I believe it would be better not to store these parse trees in the result nodes of the background builder at all (e.g. it would be quicker and better to just reparse the background files when requested).
Looking at the results, I think these could together reduce FCS memory usage by up to 30-40%, though I might be wrong.
> So instead of storing the whole NameResolutionEnv we could just store those tables, and pass those back in to those two routines
The TypeChecker and the service both call functions in NameResolution.fs passing a NameResolutionEnv, so we cannot substitute it with a lighter type. What about adding an interface which exposes only those maps that are needed by the service? Then make NameResolutionEnv and a new ServiceNameResolutionEnv implement it, and make all the functions used by the service accept that interface instead of NameResolutionEnv. I'm afraid, however, that we'd end up with an interface as fat as the current nenv :(
> so we cannot substitute it with a lighter type....
@vasily-kirichenko Yeah, I took a look and you are right - the Partial* routines end up using pretty much everything in the NameResolutionEnv. That's a shame - there's no direct saving here.
However there must still be a way to do better here - all those NameResolutionEnv's are taking up vast amounts of memory, for a relatively small part of the autocomplete implementation.
@vasily-kirichenko One idea: instead of storing the final computed NameResolutionEnv at each point (there are a huge number of these and they each have quite a few unique allocations), what if we store the sequence of operations required to compute each NameResolutionEnv, e.g. a list of NameResolutionEnv -> NameResolutionEnv operations logically corresponding to "open XYZ", "add a value" etc.
My intuition is that the total storage for these lists of closures may be much, much smaller than the total storage for all the computed lookup tables, e.g. the closure for an open may be just 5-10 words, where the computed NameResolutionEnv may have allocated many more nodes in the lookup tables. Effectively I'm saying that in this case
[f1; f2; f3; f4]
is much smaller than
[f1(); f2(f1()); f3(f2(f1())); f4(f3(f2(f1())))]
We then replay the derivation (list of computations) for a selected NameResolutionEnv when that specific NameResolutionEnv is needed for autocomplete (relatively rare), but we never store that environment in any persistent way.
See this draft for example: https://github.com/fsharp/FSharp.Compiler.Service/compare/master...dsyme:draft-mem-red?expand=1 (note that draft is against the FCS repo).
There is of course a risk that recomputing the environments would take significant time, but I suspect not. Also the operations are somewhat tunable - one can make operations like "open" empty the derivation list, replacing it by a closure that returns the computed nenv.
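The derivation idea can be sketched with stand-in types (NameEnv here is a toy Map, not the real NameResolutionEnv; record/materialize/checkpoint are illustrative names):

```fsharp
// Stand-in for NameResolutionEnv
type NameEnv = Map<string, int>

type EnvDerivation =
    { Base: NameEnv                      // last materialized environment
      Ops: (NameEnv -> NameEnv) list }   // pending operations, oldest first

let emptyDerivation = { Base = Map.empty; Ops = [] }

// Recording an operation costs a few words per closure,
// instead of allocating whole new lookup tables.
let record op deriv = { deriv with Ops = deriv.Ops @ [op] }

// Replay the derivation only when the environment is actually requested
// (e.g. for an autocomplete at a specific position).
let materialize deriv = List.fold (fun env op -> op env) deriv.Base deriv.Ops

// The tunable part: collapse the list at expensive points such as "open".
let checkpoint deriv = { Base = materialize deriv; Ops = [] }
```

This captures the [f1; f2; f3; f4] vs [f1(); f2(f1()); ...] distinction above: the list of closures is stored, the composed results are not.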
If this worked
@dsyme hmm. If I understand your code properly, what we'll end up with is basically a function (closure) that builds all the maps in NameResolutionEnv _every time from scratch, adding one element at a time to a map_, then throws it away immediately after one-shot use (in the code completion scenario at least; in my "Simplify Name" analyzer I think it'd not work at all because it's already slow). It would be very interesting to see what this gives us in memory consumption (why are you doing it in the FCS repo BTW? I'd love to test it right in VFT).
> Is there a way to run the code used here in a more standalone way (e.g., have some F# script which loads the FCS assembly and uses it to analyze the FCS code itself)? I'm happy to dig deeper on this to try to improve the performance, and it'd be easiest if I could run the code both inside and outside VS.
@jack-pappas
Here is the process I used to run dottrace on a compile of FCS with latest compiler:
Compiling is not interesting, because the caches are not populated. More interesting would be:

- create an FSharpChecker instance
- checker.ParseAndCheckProject, passing a proper FSharpProjectOptions
- checker.CheckFileInProject for a large file
- FSharpFileCheckResults.GetAllUsesOfAllSymbols

I've tried to do just this, after which the VSIX refused to build, complaining about an unsigned assembly.
If an assembly is strong-named, then all the assemblies it grants InternalsVisibleTo also need to be strong-named, and the InternalsVisibleTo attribute needs to contain their public key (see http://stackoverflow.com/questions/30943342/how-to-use-internalsvisibleto-attribute-with-strongly-named-assembly).
So you would need to
1) Strong-name the console project
2) Extract the full public key (sn -t yourSnk.snk)
3) add the public key to the attribute string in FSharp.LanguageService.Compiler
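Step 3 would look something like the following in FSharp.LanguageService.Compiler's assembly attributes. The console assembly name and the truncated key are placeholders, not the real values:

```fsharp
module AssemblyInfo

open System.Runtime.CompilerServices

// Grant the (now strong-named) console project access to internals.
// "LanguageServiceProfiling" and the key below are illustrative placeholders;
// substitute the full public key printed by sn for your .snk.
[<assembly: InternalsVisibleTo("LanguageServiceProfiling, PublicKey=0024000004800000...")>]
do ()
```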
> @dsyme hmm. If I understand your code properly, what we'll end up with is basically a function (closure) that builds all the maps in NameResolutionEnv every time from scratch, adding one element at a time to a map, then throws it away immediately after one-shot use (in the code completion scenario at least; in my "Simplify Name" analyzer I think it'd not work at all because it's already slow).
Currently we store something like ~40,000 fully computed, massive NameResolutionEnv for a large file like TypeChecker.fs. Basically none of these data structures are ever accessed by the core feature set; it's just a vast amount of dead-weight memory. One or two may be needed to support various GetDeclarations and GetToolTip calls as the user hovers the mouse, but even in those cases the NameResolutionEnv is not always needed. As soon as the user types a character, the whole lot is recomputed.
Recomputing the NameResolutionEnv on demand (e.g. on hover, in the cases it is actually needed) feels like it is going to be fast enough. We can amortize those recomputations, or amortize the processing of "open" declarations. Everything else along a particular path to a point in the file will, I think, be fast.
This analysis is based on the assumption that GetDeclarations and GetToolTip calls are only made for a few locations in the file. I need to understand what calls the Simplify Name analyzer is using. If the memory savings are large, we might choose to turn off the Simplify Name analyzer or make it opt-in. I don't see that analyzer as critical to the feature set, whereas decreasing memory usage by 2x or 5x is critical.
In any case, certainly we are storing far, far too much information for those zillions of unused NameResolutionEnv. We need to understand if we can stop doing that, and with what effect, or else we are forever doomed to have excessive memory usage.
(Notes to self)
The following FCS entry points may ultimately cause NameResolutionEnv objects to be needed (or, in the draft above, reconstituted):
- GetDeclarationListInfo
- GetDeclarationListSymbols
- GetStructuredToolTipTextAlternate
- GetToolTipTextAlternate
- GetF1KeywordAlternate
- GetMethods
- GetMethodsAsSymbols
- GetDeclarationLocationAlternate
- GetSymbolUseAtLocation

As expected, these are all the "get me some specific information at this specific location" calls.
[ Aside: not all the above calls always result in reconstitution - indeed far from it - and not all need the entire reconstituted NameResolutionEnv. For example, the majority of consumption points immediately call ResolveCompletionsInType. This in turn only uses two small tables from the reconstituted NameResolutionEnv to look up extension members. Two consumption points call ResolvePartialLongIdent and ResolvePartialLongIdentToClassOrRecdFields respectively which do require the whole information. ]
The questions are
To answer (1) I went looking at calls to the above in the VFPT and VFT feature set. Here's what I see in VFPT:
- GetOpenDeclarationTooltip calling GetToolTipTextAlternate
- GetDeclarationLocation using GetDeclarationLocationAlternate, used as part of the Peek Definition implementation and the GoToDefinition filter
- GetSymbolUseAtLocation is used in quite a lot of places

Here's what I see in VFT today:
- GetDeclarationListInfo is used by the auto-complete implementation
- GetDeclarationListSymbols is not used, but may be used by the auto-complete implementation in the future
- GetStructuredToolTipTextAlternate is used by the quick info provider
- GetF1KeywordAlternate is used by the help context service
- GetMethodsAlternative is used by the signature help implementation
- GetMethodsAsSymbols is not used
- GetDeclarationLocationAlternate is used by the GoToDefinition and FindReferences services
- GetSymbolUseAtLocation is used by the QuickInfo provider, the document highlighting service, the ImplementInterface quick fix, the changeAllSymbols helper and the Inline Rename Service

To answer (2) - in both the VFT and VFPT feature sets, it is the uses of GetSymbolUseAtLocation that look most challenging (i.e. most likely to require the en-masse reconstitution of NameResolutionEnv across the whole file).
To answer (3) - my gut feeling is that GetSymbolUseAtLocation should never really need these NameResolutionEnv objects at all. The whole point of that method is to consult only the table of precise name resolutions, not to perform auto-complete.
In short, I believe the following
Overall it's a non-trivial piece of work, but I feel we should take a look at rooting out this memory-hogging evil in the Update 1 or Update 2 timeframe.
Scenario:
Open TypeChecker.fs, wait for syntax highlighting.
Now tell me what the hell has eaten the memory? 4 copies of type-checked info for this file and 2 copies of the whole project's info? 6 copies? Why?
@vasily-kirichenko I plan to work on this today. I think I will first try to adjust your memory-performance app to get a set of reliable memory statistics on each compiler performance run, then I will try some of the techniques above.
@dsyme thank you. My VS 2017 crashes at work, so I opened VFT solution in 2015 and edited TypeChecker.fs. It stood around 800MB. Does the old compiler service cache less info? Or maybe it uses better caches?
It cached less info
@vasily-kirichenko See https://github.com/Microsoft/visualfsharp/pull/2377 for a 25% reduction in background memory usage. Will now try the parse trees
@vasily-kirichenko See https://github.com/Microsoft/visualfsharp/pull/2378 for another reduction of about the same amount.
It would be good to get new memory traces after this - if there is some way to capture them automatically, that would be even better. We can probably find the next low-hanging fruit pretty easily now that the big noise is gone. (I don't have the JetBrains memory profiling tools installed, only the VS tools - I've tried, but always had issues keeping them installed.)
OK, I worked out how to drive the VS tools to get a basic analysis. I also added a "P for pause" option to the memory perf tool. Here are inclusive and exclusive snapshots for "P" in
.\Release\net40\bin\LanguageServiceProfiling.exe ..\FSharp.Compiler.Service CXGSCGPS
after the two changes above
OK, sorting by exclusive size reveals this easy reduction, chopping off another 2.2%: https://github.com/Microsoft/visualfsharp/pull/2378/commits/235a9552dc8659d68b6171419eca1522966f99e1
Two new snapshots after applying #2377, #2378 and https://github.com/Microsoft/visualfsharp/pull/2384 (but not #2381 or #2380)
Exclusive:
Inclusive:
About 7% less memory. Do you have a snapshot from before all your optimizations were applied?
@vasily-kirichenko No though it's roughly 318MB --> 139MB (that's the change in the additional memory for checking the project)
wow :) So you've reduced the amount of data we keep for a project by about 2.3x :)
Is the next step investigating the caches' behavior?
@vasily-kirichenko I'll probably leave it at that for the moment. The caches definitely need work - I'm still concerned by the 200 - but I don't have a specific angle I want to follow just yet
@vasily-kirichenko Looks like we can knock another 20MB off that: https://github.com/Microsoft/visualfsharp/pull/2388
coool :)
Tried the current master on VFT solution. Opened TypeChecker and some other files, edited here and there a bit, executed Find all usages. devenv.exe was staying at 1.8GB. However, after not touching it for about 2 minutes, it's already 1.9GB and CPU at 20% all the time, see https://github.com/Microsoft/visualfsharp/issues/2389
It seems to oscillate between 1.5 and 1.9GB quite quickly.
@vasily-kirichenko May be best to try on FSharp.Compiler.Service.sln, and then on VisualFSharpPowerTools.sln (if it loads correctly).
OK with both #2384 and #2388 we get down to this
statistics: TimeDelta: 27.45 MemDelta: 118 G0: 898 G1: 697 G2: 26
I think that's the end of this round of work once those are in. I believe we've reduced 318MB --> 118MB, a 62% reduction (or, equivalently, you can load and work with solutions roughly 2-3x as big). We can make more savings bit by bit.
@vasily-kirichenko The LanguageServiceProfiling.exe tool has been a godsend, thank you. It makes iterative analysis, baseline comparisons and object-heap snapshots so much easier, because you get to concentrate on just the compiler data structures and FCS caches/objects, absolutely nothing else.
After applying both #2384 and #2388:
Next steps would be
~5MB: Use a compact representation for Val in the most common case (it's way too big today). Reduces the Tast+Val 10MB to, say, 5MB, but is quite hard work
~4MB: Use a compact representation for Typar in the most common case (it's way too big today). Reduces the Tast+Typar 7MB to, say, 4MB, but is quite hard work
5MB: Use a more compact representation for Item, e.g. one that doesn't allocate a new object for ValRef (thus eliminating the NameResolution+Item_Value 3MB), e.g. a private struct wrapper over an obj that is just the ValRef in the common case. Annoying, but gives some value. Also for Item+Types 1.5MB and Item+UnionCase 0.8MB
Look at the ValRef nodes. There seem to be too many of them; maybe they are also too fat in the common case of local values
Look at the number of Typar and TType_var nodes. There seem to be a lot of them. Are they all needed? How many represent solved type variables?
When those changes land in FCS, I think it will be worth having another service release of VFPT for sustained usage with VS2015.
@smoothdeveloper yes, it will be easy.
@smoothdeveloper @vasily-kirichenko that would be great
@smoothdeveloper @vasily-kirichenko FSharp.Compiler.Service 10.0.1 is now available with all this work integrated. Add an issue on the FCS repo if you have problems, thanks
Thanks! I'll give it a try and get this integrated into VFPT.
@smoothdeveloper call for help if needed.
I've investigated memory usage at a moment when VS was at 1.8GB (it shrank to 1.2GB some time later :) ). It looks like ILBinaryReader keeps about 20% of the memory:


@dsyme Am I right that these are the referenced assemblies read into memory? Why is it necessary to keep them in caches? I think you said, however, that they are accessed via memory-mapped files.
@vasily-kirichenko I'll open a PR (so CI results and a vsix will be available); the changes are minimal, apparently.
Regarding memory and ILBinary I've opened this discussion: #2345
I remember there are two types, memory-mapped and plain byte array, with the latter being used in my case when I took the snapshot.
Since https://github.com/Microsoft/visualfsharp/pull/2395 was merged, I've found working on the VFT solution comfortable. I'm closing this issue.
@vasily-kirichenko That's awesome. It's one of the key tests of all the work that has been done since moving to Roslyn workspaces and merging with VFPT - that we can dogfood our own tools happily on the VFT codebase itself. Achieving that result has been a long haul - thank you (and everyone else!) so much for what you've done to get us there.
Let's keep using the tools with the VFT solution as much as possible as it really is one key way to keep them at a base level of quality.
I can reproduce. Using a self-built vsix from master is really, really nice. I wish that would be the RTM thing. But if we manage to make it super easy for everyone to update directly after the RTM release then it is probably OK.