Trace: https://developercommunity.visualstudio.com/content/problem/379694/another-repro-trace-for-github-bug-4718.html
Trace: https://developercommunity.visualstudio.com/content/problem/245786/slow-f-editing-experience-up-to-a-minute-until-typ.html
Gen | Count | MaxPause | MaxPeak MB | Max AllocMB/sec | TotalPause | TotalAlloc MB | Alloc MB/MSec GC | Survived MB/MSec GC | MeanPause | Induced
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
ALL | 506 | 953.4 | 2,638.9 | 116,179.562 | 18,629.5 | 231.4 | 0.0 | 34.716 | 36.8 | 0
0 | 443 | 12.7 | 2,638.9 | 116,179.562 | 2,303.8 | 93.8 | 10.0 | 0.037 | 5.2 | 0
1 | 43 | 13.2 | 2,636.8 | 96.406 | 265.3 | -1.6 | 1.1 | 0.228 | 6.2 | 0
2 | 20 | 953.4 | 2,638.9 | 1.985 | 16,060.4 | 139.1 | 69.4 | 677.189 | 803.0 | 0
In the above trace, we noticed that we had a tonne of Gen2 collections due to LOH allocations. The first one we noticed that parseFormatStringInternal splits (what appears to be) the source file many times over. For large source files these all end up on the LOH heap:

It also does a Replace that is causing a bunch too:

mmm that's interesting. I just looked briefly at that file/function and there is no direct invoction of Split nor Replace. Do you have more details about that stack trace?
@forki look at the code in branch dev15.9 https://github.com/Microsoft/visualfsharp/blob/dev15.9/src/fsharp/CheckFormatStrings.fs
@AviAvni ? so you are saying it's already fixed? or what?
Possibly in https://github.com/Microsoft/visualfsharp/pull/5121 - but that was 5 months ago, as to why that hasn't made it in 15.9...
@forki not really the code moved to somewhere else in https://github.com/Microsoft/visualfsharp/pull/5121
to https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/NameResolution.fs#L1538-L1548
but i have a code that can fixed that
seq {
let mutable pos = 0
yield pos
for c in source do
if c = '\r' then ()
if c = '\n' then yield pos
else pos <- pos + 1
}
|> Seq.toArray
I'll send PR soon
Ouch. that part looks indeed dangerous

@forki when checking the difference with my code
open System.IO
// Learn more about F# at http://fsharp.org
// See the 'F# Tutorial' project for more help.
let positions1 (source:string) =
let source = source.Replace("\r\n", "\n").Replace("\r", "\n")
source.Split('\n')
|> Seq.map (fun s -> String.length s + 1)
|> Seq.scan (+) 0
|> Seq.toArray
let positions2 (source:string) =
seq {
let mutable pos = 0
yield pos
for c in source do
if c = '\r' then ()
if c = '\n' then yield pos
else pos <- pos + 1
}
|> Seq.toArray
let str = File.ReadAllText(@"C:\israel-public-transportation\trips.txt")
#time
let v1 = positions1 str
let v2 = positions2 str
you can see the result that no gc on really big strings the file is 28MB
>
Real: 00:00:00.238, CPU: 00:00:00.265, GC gen0: 11, gen1: 5, gen2: 1
val v1 : int [] =
[|0; 64; 113; 162; 211; 260; 309; 358; 407; 456; 505; 554; 603; 652; 701;
750; 799; 848; 897; 946; 995; 1044; 1093; 1142; 1191; 1240; 1289; 1338;
1387; 1436; 1485; 1534; 1583; 1632; 1681; 1730; 1779; 1828; 1877; 1926;
1975; 2024; 2073; 2122; 2171; 2220; 2271; 2322; 2373; 2424; 2475; 2526;
2577; 2628; 2679; 2730; 2781; 2832; 2883; 2934; 2985; 3036; 3087; 3138;
3189; 3240; 3291; 3342; 3393; 3444; 3495; 3546; 3597; 3648; 3699; 3750;
3801; 3850; 3899; 3948; 3997; 4046; 4095; 4144; 4193; 4242; 4291; 4340;
4389; 4438; 4487; 4536; 4585; 4634; 4683; 4732; 4781; 4830; 4879; 4928;
...|]
>
Real: 00:00:00.155, CPU: 00:00:00.156, GC gen0: 0, gen1: 0, gen2: 0
val v2 : int [] =
[|0; 64; 113; 162; 211; 260; 309; 358; 407; 456; 505; 554; 603; 652; 701;
750; 799; 848; 897; 946; 995; 1044; 1093; 1142; 1191; 1240; 1289; 1338;
1387; 1436; 1485; 1534; 1583; 1632; 1681; 1730; 1779; 1828; 1877; 1926;
1975; 2024; 2073; 2122; 2171; 2220; 2271; 2322; 2373; 2424; 2475; 2526;
2577; 2628; 2679; 2730; 2781; 2832; 2883; 2934; 2985; 3036; 3087; 3138;
3189; 3240; 3291; 3342; 3393; 3444; 3495; 3546; 3597; 3648; 3699; 3750;
3801; 3850; 3899; 3948; 3997; 4046; 4095; 4144; 4193; 4242; 4291; 4340;
4389; 4438; 4487; 4536; 4585; 4634; 4683; 4732; 4781; 4830; 4879; 4928;
...|]
iz's still creating a full copy. Do we really need that?
What full copy?
a full copy of the source file in memory.
In parseFormatStringInternal we are just iterating over that thing anyway can't we just somehow handle the line breaks on the fly while iterating over the string?
OK I debugged it. I only able to reproduce calling to this code when visual studio F# tools(so able to do it only from the dev15.9 branch)
the using of the source is because of this call
https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/service/service.fs#L1734
that specify TcResultsSink.WithSink which I don't understand what is this beside this parsing
anyway after this this call this method called again without the source
So I can send a PR that remove all of this allocations but don't know why this source is used so don't know what to to test if I change it
I dug into the trace a little more, this trace is almost entirely caused by Gen2 collections, caused by the above two LOH allocations. I filed a couple of internal bugs to track the other LOH allocations, but they are not the central cause. I closed the above linked developer community bug in lieu of this one.
@aviavni your proposed change is almost certainly already a big improvement. So please send that PR. But my feeling here is that we can do even more.
@forki PR sent
@AviAvni can you do another comparison with the full changes from #5940?
When #5943 is merged we should do another comparison. @blumu if you subscript to the F# nightly feed then you can see if the changes make a difference. We should still measure this a bit more though.
@cartermp We should grab @blumu's solution and do up-front analysis to confirm this actually fixes the problem. We should be using data to make sure we're fixing the right things.
Yep, we've got it and we're currently validating another fix. We can do that with this, too.
@cartermp I've subscribed to F# nightly. Latest release is from Monday 19th before Avi's PR was merged. I'll let you know how it goes once I get the version with the fix.
@davkean @cartermp @TIHan I was able to repro typing slow down using nightly build of F# compiler on VS 15.9.3. This time, VS did not yet reach the state where it hangs completely, nor did it crash yet. The ETL trace seems to reveal many Gen2 garbage collections, but I'll let the expert take a look. I will share the trace privately.
Thanks, send it on and we'll take a look.
@davkean Trace shared privately. (My VS session is now hanging pretty badly and is unusable, I have to kill it before it dies on its own.)
Memory usage:


Am I reading that right? Only 700MB managed heap? We must have a native leak somewhere. I'm downloading trace now.
@davkean you are reading the plot correctly. Most of it is private data (in yellow). The trace, though, seems to indicate GC issues. (VS has crashed on its own btw.) Here is the GC Rollup By Generation:
| ALL | 1749 | 5,331.4 | 2,600.6 | 1,678,811.997 | 26,931.8 | 97,893.3 | 3.6 | 0.499 | 15.4 | 11|
|-----|----|----|----|----|----|----|----|----|----|----|
| 0 | 269 | 51.0 | 2,492.5 | 1,621,967.774 | 2,250.2 | 41,995.2 | 0.0 | 0.042 | 8.4 | 0 |
| 1 | 1460 | 46.1 | 2,600.6 | 1,678,811.997 | 18,482.3 | 54,527.7 | 0.2 | 0.092 | 12.7 | 0 |
| 2 | 20 | 5,331.4 | 2,595.3 | 347.249 | 6,199.2 | 1,370.3 | 0.1 | 0.636 | 310.0 | 11 |
Yeah but the trace shows the peak GC heap size was 2.6 GB.
Given the VM reserved for image loads, etc, that is pretty high. Also, could some of the native VM be the memory mapped files for metadata reading?
Looking at the GC stats trace (unfortunately, the trace is missing allocation data), the GC heap hovers from about 2.1 -> 2.6 GB, not sure how that maps to that VMMap above.
Which name/color are you referring for the native VM?
Sorry, I was just thinking without really looking closely at the VM Map. It doesn't look like Memory mapped files are a huge problem, as they are around 135MB. My point however was that a 2.6GB Managed heap when added to the 750MB of images above is really quite a lot of address space. Agreed that that doesn't seem to map to the VMMap snapshot above though, unless "Private" is somehow part of the managed memory? Maybe the fact that LOH allocations go directly to VirtualAlloc means that VMMap shows them as Private?
750MB of images seems a lot but I don't have a reference point. I'm thinking the same thing on the GC heap.
It is a lot, but it's in line with my expectations for VS unfortunately.
Just for summary of the thread, the trace was missing allocation data so it didn't give insight into the issue. GC time was pretty much all CPU, with a splattering of other stacks.
Will close this as #5940 does remove the source file allocations from this function.
Just wanted to mention that I grabbed a trace from a customer report today where this was causing ~130 MB to be allocated on the LOH. Thanks for making this better @AviAvni 馃檪
@cartermp any time just ping me if I didn't noticed opportunity to help
Most helpful comment
Just wanted to mention that I grabbed a trace from a customer report today where this was causing ~130 MB to be allocated on the LOH. Thanks for making this better @AviAvni 馃檪