After several minutes of working on any file from the FSharp.LanguageService.Compiler project, devenv.exe occupies ~2.5GB of memory and VS becomes practically unusable:

@vasily-kirichenko We really just have to take this stuff out-of-process (and into a 64-bit process), don't we...
Yes, we do. But it's a huge amount of work.
"just" ;-)
Have you tried running a memory profiler on devenv to find what's retaining memory and/or causing the GCs / GC pressure?
@jack-pappas about millions of times :) It's FSC caches.
@jack-pappas look at this:

So name resolution cache Map occupies 100MB of memory _for map nodes themselves_.
Your HashMap has even larger size per node :( I was thinking about using Dictionary here, but it seems to be used for layering names in nested scopes (I may be wrong though) and immutable map fits this perfectly.
All of these objects sit in Gen 2, of course; that's the problem.
What if you sort by the rightmost column there ('Minimum retained bytes')? E.g., if you look at the second line in your screenshot for NameResolution+CapturedNameResolution, those objects themselves only account for ~38MB, but if you include the transitive closure of the objects they're referencing, it's around ~130MB.
I guess what I'm wondering is -- are all of these cached objects necessary / useful? Maybe it'd be better to have some bounded-size data structures for the caches, or do something like define a record to hold all of the per-compilation caches and hold it in a ConditionalWeakTable<_,_> with some key that'll allow the cache records to be cleaned up / freed once they're no longer needed.
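To illustrate the ConditionalWeakTable idea, here is a minimal sketch. All names here (Compilation, PerCompilationCaches, getCaches) are illustrative stand-ins, not actual FCS types:

```fsharp
open System.Runtime.CompilerServices

// Stand-in for whatever keys a compilation/project in FCS
type Compilation() = class end

// Hypothetical record of all per-compilation caches
type PerCompilationCaches() =
    member val NameResolutions = ResizeArray<string>()

// One cache record per compilation; an entry becomes collectible as soon as
// the owning Compilation is no longer referenced anywhere else.
let caches = ConditionalWeakTable<Compilation, PerCompilationCaches>()

let getCaches (comp: Compilation) =
    // GetValue creates the record on first use and returns the same one afterwards
    caches.GetValue(comp, fun _ -> PerCompilationCaches())
```

The point is that nothing needs to explicitly evict the caches: once the key dies, the table entry does too.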
I could implement a snapshottable dictionary type that could be used for the name resolution map; however, some of the size of the Map nodes is the data held by the node, so I wonder how much that'd really gain us in terms of memory usage reduction (maybe 100MB -> 60MB)?
Is there a way to run the code used here in a more standalone way (e.g., have some F# script which loads the FCS assembly and uses it to analyze the FCS code itself)? I'm happy to dig deeper on this to try to improve the performance, and it'd be easiest if I could run the code both inside and outside VS.
@jack-pappas I think running FCS in a script is a fine way to start. Maybe also add the code of a few analyzers - those would depend on the Roslyn assemblies but would still be possible without VS. Just FCS should get you there in terms of objects in the cache and memory size compared to the code which has been loaded/queried.
@jack-pappas @vasily-kirichenko It would, I think, be very productive to go through those objects and work out what isn't necessary. That would easily reduce the cached information by 50%
We do a little of this with the keepAllBackgroundResolutions flag, which is set to false, see
https://github.com/Microsoft/visualfsharp/blob/ee84eb77bcd62379d8178e3167a17d12d204de30/src/fsharp/vs/IncrementalBuild.fs#L1481
It's tricky because one single captured handle somewhere in one of the cached data structures can keep hold of vast amounts of information, most of which will be unnecessary when processing later requests. We really need an automated test in the FCS tests which cleans everything, does some processing, then measures the size of the captured information graph. I think a small amount of reflection code walking the captured object graph could achieve this.
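Such a reflection-based measurement could be sketched roughly as follows. This is an assumption-laden sketch: it counts reachable objects rather than bytes, follows only instance fields (not array elements), and a real test would compare the count against a budget after clearing the caches:

```fsharp
open System.Collections.Generic
open System.Reflection

// Count objects reachable from a root by following instance fields, public
// and private. A crude proxy for the size of the captured information graph.
let countReachableObjects (root: obj) =
    let visited = HashSet<obj>(HashIdentity.Reference)
    let pending = Stack<obj>()
    pending.Push root
    while pending.Count > 0 do
        let o = pending.Pop()
        if not (isNull o) && visited.Add o then
            let t = o.GetType()
            if not t.IsPrimitive && t <> typeof<string> then
                for f in t.GetFields(BindingFlags.Instance ||| BindingFlags.Public ||| BindingFlags.NonPublic) do
                    // skip primitive-typed fields; they cannot reference other objects
                    if not f.FieldType.IsPrimitive then
                        pending.Push(f.GetValue o)
    visited.Count
```

A test could run a check, force a GC, and assert that `countReachableObjects` over the retained caches stays below a threshold.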
Cheers
don
@jack-pappas

@vasily-kirichenko Is that switching to try to use ImmutableDictionary? I would expect that not to be better as you will lose sharing between the different immutable maps (many of which have much in common)
@vasily-kirichenko If you could show some more dumps of the current master sorted by both bytes and minimum retained bytes I think I could come up with a decent strategy for trimming a good chunk of stuff, or at least making it more weakly held
Some initial ideas
Reduce incrementalTypeCheckCacheSize to 2 or 3. It is currently 5. This is the number of files whose ParseAndCheckFileInProject results are kept strongly. This certainly needs to be at least 1, so the current file of focus is kept strongly. It benefits from being 2 if switching between files. Switching between 3 files is also common; switching between 5 is less common (or at least not much more common than 6 or 7).
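For reference, the strong-holding policy that incrementalTypeCheckCacheSize controls amounts to a small bounded MRU cache. A hypothetical sketch (the type and member names here are made up for illustration):

```fsharp
// Keep only the N most recently set entries strongly referenced;
// older entries are evicted (in FCS they would fall back to weak references).
type MruCache<'Key, 'Value when 'Key : equality>(size: int) =
    let mutable entries : ('Key * 'Value) list = []

    member _.Set(key, value) =
        entries <-
            // move-to-front, then trim to the strong-holding budget
            (key, value) :: (entries |> List.filter (fun (k, _) -> k <> key))
            |> List.truncate size

    member _.TryGet key =
        entries |> List.tryFind (fun (k, _) -> k = key) |> Option.map snd
```

With size = 2 or 3, the current file and the one or two most recently visited files stay strongly held, matching the switching patterns described above.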
The big swag of Map nodes is CapturedEnvs in TcResolutions. This contains a lot of information in various tables. However, in practice I believe only a subset of the information is needed. In particular, after selecting an environment in GetBestEnvForPos we pass it to either ResolvePartialLongIdent or ResolvePartialLongIdentToClassOrRecdFields (the only two places these are called). These only look at a few of the tables in the NameResolutionEnv. So instead of storing the whole NameResolutionEnv we could just store those tables, and pass those back in to those two routines.
There may also be a way of avoiding storing so many NameResolutionEnv in the first place. They are needed relatively rarely - for example when completing a partial identifier like Syst - but they are still needed.
I'm a bit shocked at the number and cost of SynExpr.Sequential, SynExpr.Const and SynExpr.UInt16 nodes. The latter must be the parser or lexer tables - I assume you never opened pars.fs or lex.fs. But why are we storing all these untyped ASTs? I think these ASTs must be being held by the background builder, in the let parseTreesNode = Vector.Map "ParseTrees" ParseTask stampedFileNamesNode output node of the background build graph. This is used by GetParseResultsForFile (and also consumed by the typechecking nodes in the build graph), in turn to provide the FCS API entry point GetBackgroundParseResultsForFileInProject. However, I don't think GetBackgroundParseResultsForFileInProject is used anywhere in the Visual F# Tools. I believe it would be better not to store these parse trees in the result nodes of the background builder at all (e.g. it would be quicker and better to just reparse the background files when requested).
Looking at the results, I think these could together reduce FCS memory usage by up to 30-40%, though I might be wrong.
> So instead of storing the whole NameResolutionEnv we could just store those tables, and pass those back in to those two routines
The TypeChecker and the service both call functions in NameResolution.fs passing a NameResolutionEnv, so we cannot substitute it with a lighter type. What about adding an interface which exposes only those maps that are needed by the service? Then make NameResolutionEnv and a new ServiceNameResolutionEnv implement it, and make all the functions used by the service accept that interface instead of NameResolutionEnv. I'm afraid, however, that we'd end up with an interface as fat as the current nenv :(
> so we cannot substitute it with a lighter type....
@vasily-kirichenko Yeah, I took a look and you are right - the Partial* routines end up using pretty much everything in the NameResolutionEnv. That's a shame - there's no direct saving here.
However there must still be a way to do better here - all those NameResolutionEnv's are taking up vast amounts of memory, for a relatively small part of the autocomplete implementation.
@vasily-kirichenko One idea: instead of storing the final computed NameResolutionEnv at each point (there are a huge number of these and they each have quite a few unique allocations), what if we store the sequence of operations required to compute each NameResolutionEnv, e.g. a list of NameResolutionEnv -> NameResolutionEnv operations logically corresponding to "open XYZ", "add a value" etc.
My intuition is that the total storage for these lists of closures may be much, much smaller than the total storage for all the computed lookup tables, e.g. the closure for an open may be just 5-10 words, where the computed NameResolutionEnv may have allocated many more nodes in the lookup tables. Effectively I'm saying that in this case
[f1; f2; f3; f4]
is much smaller than
[f1(); f2(f1()); f3(f2(f1())); f4(f3(f2(f1())))]
We then replay the derivation (list of computations) for a selected NameResolutionEnv when that specific NameResolutionEnv is needed for autocomplete (relatively rare), but we never store that environment in any persistent way.
See this draft for example: https://github.com/fsharp/FSharp.Compiler.Service/compare/master...dsyme:draft-mem-red?expand=1 (note that draft is against the FCS repo).
There is of course a risk that recomputing the environments would take significant time, but I suspect not. Also the operations are somewhat tunable - one can make operations like "open" empty the derivation list, replacing it by a closure that returns the computed nenv.
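The derivation idea can be sketched with stand-in types (NameEnv here is a toy Map, not the real NameResolutionEnv; record/materialize/checkpoint are illustrative names):

```fsharp
// Stand-in for NameResolutionEnv
type NameEnv = Map<string, int>

type EnvDerivation =
    { Base: NameEnv                      // last materialized environment
      Ops: (NameEnv -> NameEnv) list }   // pending operations, oldest first

let emptyDerivation = { Base = Map.empty; Ops = [] }

// Recording an operation costs a few words per closure,
// instead of allocating whole new lookup tables.
let record op deriv = { deriv with Ops = deriv.Ops @ [op] }

// Replay the derivation only when the environment is actually requested
// (e.g. for an autocomplete at a specific position).
let materialize deriv = List.fold (fun env op -> op env) deriv.Base deriv.Ops

// The tunable part: collapse the list at expensive points such as "open".
let checkpoint deriv = { Base = materialize deriv; Ops = [] }
```

This captures the [f1; f2; f3; f4] vs [f1(); f2(f1()); ...] distinction above: the list of closures is stored, the composed results are not.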
If this worked
@dsyme hmm. If I understand your code properly, what we'll end up with is basically a function (closure) that builds all the maps in NameResolutionEnv _every time from scratch, adding one element at a time to a map_, then throws it away immediately after one-shot use (in the code completion scenario at least; in my "Simplify Name" analyzer I think it'd not work at all because it's already slow). It would be very interesting to see what this gives us in memory consumption (why are you doing it in the FCS repo BTW? I'd love to test it right in VFT).
> Is there a way to run the code used here in a more standalone way (e.g., have some F# script which loads the FCS assembly and uses it to analyze the FCS code itself)? I'm happy to dig deeper on this to try to improve the performance, and it'd be easiest if I could run the code both inside and outside VS.
@jack-pappas
Here is the process I used to run dottrace on a compile of FCS with latest compiler:
Compiling is not interesting, because the caches are not populated. More interesting would be:

- create an FSharpChecker instance
- checker.ParseAndCheckProject, passing a proper FSharpProjectOptions
- checker.CheckFileInProject for a large file
- FSharpFileCheckResults.GetAllUsesOfAllSymbols

I've tried to do just this, after which the VSIX refused to build, complaining about an unsigned assembly.
If an assembly is strong-named, then all the assemblies it grants InternalsVisibleTo also need to be strong-named, and the InternalsVisibleTo attribute needs to contain their public key (see http://stackoverflow.com/questions/30943342/how-to-use-internalsvisibleto-attribute-with-strongly-named-assembly).
So you would need to
1) Strong-name the console project
2) Extract the full public key (sn -t yourSnk.snk)
3) add the public key to the attribute string in FSharp.LanguageService.Compiler
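Step 3 would look something like the following in FSharp.LanguageService.Compiler's assembly attributes. The console assembly name and the truncated key are placeholders, not the real values:

```fsharp
module AssemblyInfo

open System.Runtime.CompilerServices

// Grant the (now strong-named) console project access to internals.
// "LanguageServiceProfiling" and the key below are illustrative placeholders;
// substitute the full public key printed by sn for your .snk.
[<assembly: InternalsVisibleTo("LanguageServiceProfiling, PublicKey=0024000004800000...")>]
do ()
```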
> @dsyme hmm. If I understand your code properly, what we'll end up with is basically a function (closure) that builds all the maps in NameResolutionEnv every time from scratch, adding one element at a time to a map, then throws it away immediately after one-shot use (in the code completion scenario at least; in my "Simplify Name" analyzer I think it'd not work at all because it's already slow).
Currently we store something like ~40,000 fully computed, massive NameResolutionEnv for a large file like TypeChecker.fs. Basically none of these data structures are ever accessed by the core feature set; it's just a vast amount of dead-weight memory. One or two may be needed to support various GetDeclarations and GetToolTip calls as the user hovers the mouse, but even in those cases the NameResolutionEnv is not always needed. As soon as the user types a character, the whole lot is recomputed.
Recomputing the NameResolutionEnv on demand (e.g. on hover, in the cases it is actually needed) feels like it is going to be fast enough. We can amortize those recomputations, or amortize the processing of "open" declarations. Everything else along a particular path to a point in the file will, I think, be fast.
This analysis is based on the assumption that GetDeclarations and GetToolTip calls are only made for a few locations in the file. I need to understand what calls the Simplify Name analyzer is using. If the memory savings are large, we might choose to turn off the Simplify Name analyzer or make it opt-in. I don't see that analyzer as critical to the feature set, whereas decreasing memory usage by 2x or 5x is critical.
In any case, certainly we are storing far, far too much information for those zillions of unused NameResolutionEnv. We need to understand if we can stop doing that, and with what effect, or else we are forever doomed to have excessive memory usage.
(Notes to self)
The following FCS entry points may ultimately cause NameResolutionEnv objects to be needed (or, in the draft above, reconstituted):
- GetDeclarationListInfo
- GetDeclarationListSymbols
- GetStructuredToolTipTextAlternate
- GetToolTipTextAlternate
- GetF1KeywordAlternate
- GetMethods
- GetMethodsAsSymbols
- GetDeclarationLocationAlternate
- GetSymbolUseAtLocation

As expected, these are all the "get me some specific information at this specific location" calls.
[ Aside: not all the above calls always result in reconstitution - indeed far from it - and not all need the entire reconstituted NameResolutionEnv. For example, the majority of consumption points immediately call ResolveCompletionsInType. This in turn only uses two small tables from the reconstituted NameResolutionEnv to look up extension members. Two consumption points call ResolvePartialLongIdent and ResolvePartialLongIdentToClassOrRecdFields respectively which do require the whole information. ]
The questions are
To answer (1) I went looking at calls to the above in the VFPT and VFT feature set. Here's what I see in VFPT:
- GetOpenDeclarationTooltip calling GetToolTipTextAlternate
- GetDeclarationLocation using GetDeclarationLocationAlternate, used as part of the Peek Definition implementation and the GoToDefinition filter
- GetSymbolUseAtLocation is used in quite a lot of places

Here's what I see in VFT today:
- GetDeclarationListInfo is used by the auto-complete implementation
- GetDeclarationListSymbols is not used, but may be used by the auto-complete implementation in the future
- GetStructuredToolTipTextAlternate is used by the quick info provider
- GetF1KeywordAlternate is used by the help context service
- GetMethodsAlternative is used by the signature help implementation
- GetMethodsAsSymbols is not used
- GetDeclarationLocationAlternate is used by the GoToDefinition and FindReferences services
- GetSymbolUseAtLocation is used by the QuickInfo provider, the document highlighting service, the ImplementInterface quick fix, the changeAllSymbols helper and the Inline Rename Service

To answer (2) - in both the VFT and VFPT feature sets, it is the uses of GetSymbolUseAtLocation that look most challenging (i.e. most likely to require the en-masse reconstitution of NameResolutionEnv across the whole file).
To answer (3) - my gut feeling is that GetSymbolUseAtLocation should never really need these NameResolutionEnv objects at all. The whole point of that method is to consult only the table of precise name resolutions, not to perform auto-complete.
In short, I believe the following
Overall it's a non-trivial piece of work, but I feel we should take a look at rooting out this memory-hogging evil in the Update 1 or Update 2 timeframe.
Scenario:
Open TypeChecker.fs, wait for syntax highlighting.
Now tell me what the hell has eaten the memory? 4 copies of type-checked info for this file and 2 copies of the whole project's info? 6 copies? Why?
@vasily-kirichenko I plan to work on this today. I think I will first try to adjust your memory-performance app to get a set of reliable memory statistics on each compiler performance run, then I will try some of the techniques above.
@dsyme thank you. My VS 2017 crashes at work, so I opened VFT solution in 2015 and edited TypeChecker.fs. It stood around 800MB. Does the old compiler service cache less info? Or maybe it uses better caches?
It cached less info
@vasily-kirichenko See https://github.com/Microsoft/visualfsharp/pull/2377 for a 25% reduction in background memory usage. Will now try the parse trees
@vasily-kirichenko See https://github.com/Microsoft/visualfsharp/pull/2378 for another reduction of about the same amount.
It would be good to get new memory traces after this - if there is some way to capture them automatically, that would be even better. We can probably find the next low-hanging fruit pretty easily now that the big noise is gone. (I don't have the JetBrains memory profiling tools installed, only the VS tools - I've tried, but always had issues keeping them installed.)
OK, I worked out how to drive the VS tools to get a basic analysis. I also added a "P for pause" option to the memory perf tool. Here are inclusive and exclusive snapshots for "P" in
.\Release\net40\bin\LanguageServiceProfiling.exe ..\FSharp.Compiler.Service CXGSCGPS
after the two changes above
OK, sorting by exclusive size reveals this easy reduction, chopping off another 2.2%: https://github.com/Microsoft/visualfsharp/pull/2378/commits/235a9552dc8659d68b6171419eca1522966f99e1
Two new snapshots after applying #2377, #2378 and https://github.com/Microsoft/visualfsharp/pull/2384 (but not #2381 or #2380)
Exclusive:
Inclusive:
About 7% less memory. Do you have a snapshot from before all your optimizations were applied?
@vasily-kirichenko No though it's roughly 318MB --> 139MB (that's the change in the additional memory for checking the project)
wow :) So you've reduced the amount of data we keep for a project by about 2.3x :)
Is the next step investigating the caches' behavior?
@vasily-kirichenko I'll probably leave it at that for the moment. The caches definitely need work - I'm still concerned by the 200 - but I don't have a specific angle I want to follow just yet
@vasily-kirichenko Looks like we can knock another 20MB off that: https://github.com/Microsoft/visualfsharp/pull/2388
coool :)
Tried the current master on VFT solution. Opened TypeChecker and some other files, edited here and there a bit, executed Find all usages. devenv.exe was staying at 1.8GB. However, after not touching it for about 2 minutes, it's already 1.9GB and CPU at 20% all the time, see https://github.com/Microsoft/visualfsharp/issues/2389
It seems to oscillate between 1.5 and 1.9GB quite quickly.
@vasily-kirichenko May be best to try on FSharp.Compiler.Service.sln, and then on VisualFSharpPowerTools.sln (if it loads correctly).
OK with both #2384 and #2388 we get down to this
statistics: TimeDelta: 27.45 MemDelta: 118 G0: 898 G1: 697 G2: 26
I think that's the end of this round of work once those are in. I believe we've reduced 318MB --> 118MB, a 62% reduction (or, equivalently, you can load and work with solutions roughly 2-3x as big). We can make more savings bit by bit.
@vasily-kirichenko The LanguageServiceProfiling.exe tool has been a godsend, thank you. It makes iterative analysis, baseline comparisons and object-heap snapshots so much easier, because you get to concentrate on just the compiler data structures and FCS caches/objects, absolutely nothing else.
After applying both #2384 and #2388:
Next steps would be
~5MB: Use a compact representation for Val in the most common case (it's way too big today). Reduces the Tast+Val 10MB to, say, 5MB, but is quite hard work
~4MB: Use a compact representation for Typar in the most common case (it's way too big today). Reduces the Tast+Typar 7MB to, say, 4MB, but is quite hard work
5MB: Use a more compact representation for Item, e.g. one that doesn't allocate a new object for ValRef (thus eliminating the NameResolution+Item_Value 3MB), e.g. a private struct wrapper over an obj that is just the ValRef in the common case. Annoying, but gives some value. Also for Item+Types 1.5MB and Item+UnionCase 0.8MB
Look at the ValRef nodes. There seem to be too many of them; maybe they are also too fat in the common case of local values
Look at the number of Typar and TType_var nodes. There seem to be a lot of them. Are they all needed? How many represent solved type variables?
When those changes land in FCS, I think it will be worth having another service release of VFPT for sustained usage with VS2015.
@smoothdeveloper yes, it will be easy.
@smoothdeveloper @vasily-kirichenko that would be great
@smoothdeveloper @vasily-kirichenko FSharp.Compiler.Service 10.0.1 is now available with all this work integrated. Add an issue on the FCS repo if you have problems, thanks
Thanks! I'll give it a try and get this integrated into VFPT.
@smoothdeveloper call for help if needed.
I've investigated memory usage at a moment when VS was at 1.8GB (it shrank to 1.2GB some time later :) ). It looks like ILBinaryReader keeps about 20% of the memory:


@dsyme Am I right that these are the referenced assemblies read into memory? Why is it necessary to keep them in caches? I think you said, however, that they are accessed via memory-mapped files.
@vasily-kirichenko I'll open a PR (so CI results and a vsix will be available); the changes are minimal, apparently.
Regarding memory and ILBinary I've opened this discussion: #2345
I remember there are two types, memory-mapped and plain byte array, with the latter being used in my case when I took the snapshot.
Since https://github.com/Microsoft/visualfsharp/pull/2395 was merged, I've found working on the VFT solution comfortable. I'm closing this issue.
@vasily-kirichenko That's awesome. It's one of the key tests of all the work that has been done since moving to Roslyn workspaces and merging with VFPT - that we can dogfood our own tools happily on the VFT codebase itself. Achieving that result has been a long haul - thank you (and everyone else!) so much for what you've done to get us there.
Let's keep using the tools with the VFT solution as much as possible as it really is one key way to keep them at a base level of quality.
I can reproduce. Using a self-built vsix from master is really, really nice. I wish that would be the RTM thing. But if we manage to make it super easy for everyone to update directly after the RTM release then it is probably OK.