sudo apt install chafa
curl https://media.giphy.com/media/12UwsVgQCYL3H2/giphy.gif --output winanim.gif
chafa winanim.gif --font-ratio 1/3
Edit: On Ubuntu 18.04, follow directions here to get chafa sources and build 'em:
https://hpjansson.org/chafa/
Edit2: expect a loop of ./configure and sudo apt install as you realize stuff is missing. Once you get all the way through, it will whine about not being able to find the lib. Run sudo ldconfig and it will shut up.
Edit3:
ubuntu
cd ~
mkdir chafa
cd chafa
curl https://hpjansson.org/chafa/releases/chafa-1.4.0.tar.xz --output chafa.tar.xz
tar xf chafa.tar.xz chafa-1.4.0/
cd chafa-1.4.0/
sudo apt install gcc pkg-config libgtk2.0-dev libmagickwand-dev
./configure
make
sudo make install
sudo ldconfig
WPR analysis:
For conhost.exe...
On the I/O thread, hot areas include:
- 2007ms spent notifying accessibility eventing
1. Most of this (1405ms) spent inside user32.dll!NotifyWinEvent
1. This call causes a syscall/kernel transition which is SLOW
1. The best thing to do here is probably detect that no one needs the event and not transmit it OR
1. Transmit it less often by coalescing the accessibility events into frames much like the renderer
- 1196ms spent adjusting the cursor position
1. This is mostly attributable (823ms) to figuring out whether the cursor is sitting on top of a 2-column-wide character (so it can move 2 spaces right instead of 1). It looks like we're doing this the wrong way and wasting time here: _lookupIsWide is called below for another purpose, retrieves the same information, and is also taking a lot of time.
1. 236ms also spent in kernelbase.dll!SetEvent to trigger the render thread (probably can't avoid, need kernel object to notify a potentially sleeping thread...)
- 877ms spent run-length-encoding colors
1. 634ms spent in vector reallocation for holding the run-length-encoded colors. This could maybe be re-strategized to keep a bit of excess capacity around in exchange for reallocating less often.
- 608ms spent looking up the narrow/wideness of characters (_lookupIsWide) during insertion
1. This actually falls back to gdi32full.dll!GetCharABCWidthsW against the current font to figure out how wide the character is going to be when we don't otherwise know, because it's an ambiguous-width character.
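To make the "coalesce the accessibility events into frames" idea from the first bullet concrete, here is a rough Python sketch. It is not conhost code: `EventCoalescer`, `record`, `flush`, `notify` and `has_listeners` are all hypothetical names, and `notify` merely stands in for the expensive NotifyWinEvent call. The assumption is that the I/O thread records dirty regions cheaply and something frame-driven calls `flush` once per frame.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# (left, top, right, bottom), inclusive. Hypothetical layout for this sketch.
Region = Tuple[int, int, int, int]


def union(a: Region, b: Region) -> Region:
    """Smallest rectangle covering both regions."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class EventCoalescer:
    """Hypothetical helper: merge many updates, notify at most once per frame."""
    notify: Callable[[Region], None]    # stand-in for the expensive broadcast
    has_listeners: Callable[[], bool]   # stand-in for "does anyone need the event?"
    pending: Optional[Region] = None

    def record(self, region: Region) -> None:
        # Cheap path on the I/O thread: merge rectangles, no kernel transition.
        self.pending = region if self.pending is None else union(self.pending, region)

    def flush(self) -> None:
        # Called once per frame. Sends a single merged event, or nothing at all.
        if self.pending is None:
            return
        region, self.pending = self.pending, None
        if self.has_listeners():
            self.notify(region)


# Usage: 1000 single-row updates collapse into one notification.
sent = []
coalescer = EventCoalescer(notify=sent.append, has_listeners=lambda: True)
for row in range(1000):
    coalescer.record((0, row, 79, row))
coalescer.flush()
```

The trade-off is that listeners see one merged region per frame rather than every individual update, which is the same compromise the renderer already makes.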
On the render thread, hot areas include:
- 4386ms in gdi32full.dll!PolyTextOutW
1. I don't think there's anything to be done here. I think this is just the consequence of trying to emit a ton of text really fast
I didn't do a wait chain analysis yet to see if the locking/threading was slowing things down because at this point, we have a few areas with obvious routes to improvement that might alleviate the whole deal:
Therefore, my conclusion is:
And I have now filed MSFT: 21167256 to do these things at some point and hopefully we'll have fixed the performance issue.
Thanks Michael, that's a nice analysis. So, this is probably a moot question, but I'm assuming PolyTextOutW is called _once_ per render frame and only emits text for invalidated regions?
Correct.
Okay, excuse the debugging by proxy ;) I'll get back to bundling up a PR for you... :)
No problem. It was a fair question to ask. We make dumb mistakes all the time.
@oising, somehow this morning when I'm looking at this, it's not as slow on my machine as it was in one of your videos. I implemented the 3rd thing above (GDI size measurement caching) quickly as I thought it was the best cost-benefit ratio and it improved things by 20-30%, but I want to make sure that I'm actually fixing your problem.
Can you possibly send me a WPR trace of your specific repro? (First Level Triage + CPU Usage Profiles? Let me know if you need help on how to do this.)
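For readers wondering what the measurement caching amounts to, here is a minimal Python sketch of the general idea, not the actual conhost change. `WidthCache`, `is_wide` and `font_changed` are hypothetical names, and `unicodedata.east_asian_width` is only a stand-in for the expensive font measurement (the real lookup goes through GDI against the current font). The assumption is that the answer for a given character stays the same until the font changes.

```python
import unicodedata
from typing import Callable, Dict


class WidthCache:
    """Hypothetical helper: memoize an expensive 'is this character wide?' lookup."""

    def __init__(self, measure: Callable[[str], bool]) -> None:
        self._measure = measure            # expensive call, made only on a miss
        self._cache: Dict[str, bool] = {}
        self.misses = 0                    # counts how often we paid full price

    def is_wide(self, ch: str) -> bool:
        try:
            return self._cache[ch]
        except KeyError:
            self.misses += 1
            result = self._cache[ch] = self._measure(ch)
            return result

    def font_changed(self) -> None:
        # Cached answers depend on the font, so drop them when it changes.
        self._cache.clear()


def slow_measure(ch: str) -> bool:
    # Stand-in for the font measurement; assumption for this sketch only.
    return unicodedata.east_asian_width(ch) in ("W", "F")


cache = WidthCache(slow_measure)
for _ in range(10_000):
    cache.is_wide("a")
    cache.is_wide("\u3042")  # HIRAGANA LETTER A
```

After the loop, `cache.misses` is 2 even though 20,000 lookups were made, which is the effect the fix is after when the same handful of characters is inserted over and over.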
@miniksa Sure, I've got the WPR trace now. Where shall I send it?
Email me the attachment or a link to a share at Microsoft.com. My GitHub alias is unoriginal and is my work address too. Just don't sign me up for spam please.
I'm sorry, I don't understand -- your github alias is unoriginal? I don't know what you mean.
My e-mail is my Github alias @microsoft.com. Sorry for being obtuse, I'm trying to avoid spam bots picking it up if I write the real mailto:
The major time spent on the WinEvent turns out to occur only if Node.js is running on your system.
If Node.js is running, it registers for the WinEvent notifications for EVENT_CONSOLE_LAYOUT to know when the window size has changed. Given WinEvents require kernel work to broadcast and tend to be registered globally, this causes a system-wide slowdown of all of your consoles when it is listening here.
https://github.com/nodejs/node/blob/0109e121d3a2f87c4bad75ac05436b56c9fd3407/deps/uv/src/win/tty.c line 2294
If you kill all node.js runtimes (including the one that Visual Studio 2017 launches), that performance drag goes away.
I need to:
I checked ReadConsoleInput queue is filled with a WINDOW_BUFFER_SIZE_RECORD at more or less the same time and circumstances as when EVENT_CONSOLE_LAYOUT is dispatched over NotifyWinEvent.
Given Node.js in the tty file is already reading through the queue with ReadConsoleInput and discarding all non-KEY_EVENT records... they could probably drop the whole MSAA hookup and just get the events there in a performant manner instead of hijacking an accessibility feature.
Of course, this also screams of #281 needing to implement a better way overall of receiving these sorts of events, but it's going to be a bit before we get to that.
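The suggestion above (pick up resize notifications from the input queue that is already being drained) could look roughly like this. It is a Python simulation, not libuv code: `InputRecord` and `drain` are hypothetical, and the list of records stands in for what ReadConsoleInput would return. Only the two event type values are taken from the Windows console headers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Event type values from the Windows console API.
KEY_EVENT = 0x0001
WINDOW_BUFFER_SIZE_EVENT = 0x0004


@dataclass
class InputRecord:
    """Hypothetical, simplified stand-in for an INPUT_RECORD."""
    event_type: int
    payload: object  # a key for KEY_EVENT, (columns, rows) for a size record


def drain(records: Iterable[InputRecord]) -> Tuple[List[str], Optional[Tuple[int, int]]]:
    """Split one batch of input records into keys and the latest reported size."""
    keys: List[str] = []
    latest_size: Optional[Tuple[int, int]] = None
    for rec in records:
        if rec.event_type == KEY_EVENT:
            keys.append(rec.payload)
        elif rec.event_type == WINDOW_BUFFER_SIZE_EVENT:
            # Keep only the newest size instead of discarding the record.
            latest_size = rec.payload
        # Everything else is still ignored, as before.
    return keys, latest_size


batch = [
    InputRecord(KEY_EVENT, "l"),
    InputRecord(WINDOW_BUFFER_SIZE_EVENT, (100, 30)),
    InputRecord(KEY_EVENT, "s"),
    InputRecord(WINDOW_BUFFER_SIZE_EVENT, (120, 40)),
]
keys, size = drain(batch)
```

Here the resize arrives on the same path as the keystrokes, so no global WinEvent hook is needed to learn about it.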
That rabbit hole is getting deep :D Very interesting to read.
Yes... It is...
@DHowett-MSFT promised me he'd drive resolution with Node (or libuv upstream).
Follow up with the WinEvent team to see if they can tell me Node.js is only listening for EVENT_CONSOLE_LAYOUT and not all the messages (because the expensive ones aren't the layout messages, but because MSAA/WinEvent infrastructure is very old... registering for any one registers you for all of them.)
The answer to this is "no". We'd have to make categories for each event that we wanted to register separately and I'm not sure there are enough category flags left. Also, no one wants to touch this given it's legacy tech. We will need to drive improvement of this through the other options.
So, the upshot of this is that any time Node.js is running (e.g. under Visual Studio), all console windows suffer an approximate 20% slowdown (or worse) due to accessibility eventing broadcasts. Urgh.
The first quick fix for this (GDI measurement caching) just went out with insider build 18932!