Would like to see Sixel support in the Terminal. It is the standard used to show graphics in the console.
Sixel is part of the original DEC specification for doing graphics in terminals and has been re-popularized in recent years for doing graphics on the command line, in particular by Pythonistas doing data science.
The libsixel library provides an encoder but is also a great introduction to the subject (better than the Wikipedia page):
While implementing Sixel, it is important to test with images that contain transparency.
Transparency can be achieved by drawing pixels of different colors but not drawing some pixels in any of the Sixel colors, leaving the background color as it is.
I believe this is the only way to properly draw non-rectangular Sixels, and would be especially nice with the background acrylic transparency in the new Windows Terminal.
Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.
OOh. Sixel is very cool stuff.
I've decided that I need that. NEED.
I'll happily review a PR :)
Caught the Build 2019 interview today that mentioned this request. I still maintain that Xorg on sixel is just wrong. So _very very wrong_.
The ffmpeg-sixel "Steve Ballmer Sells CS50" demo never gets tired tho. Gotta say, it is a little disappointing the video lacks sound (sound really makes the video). Consoles already have sound, naturally. They totally beep. Precedent set. What we really _need_ is a new CSI sequence for the opus clips interleaved with the frames, amirite?
Ken, I truly deserve this for mentioning Sixels ;)
Related: #120
Need.
LOL I was watching the stream and I just thought to myself "here's my boss assigning me work live in front of a studio audience".
Please make this a priority for v1.0!
3d animations can be v1.5 😛
OMG
Upvoting this request, Sixels would be such an amazing thing to have in the Terminal.
This weekend I finished implementing sixel read support for my MIT-licensed Java-based TUI library, and it was surprisingly straightforward. The code to convert a string of sixel data to a bitmap image is here, and the client code for the Sixel class is here.
I have done very little for performance on the decoder. But when using the Swing backend, performance is still OK, as seen here. (The snake image looks bad only because byzanz used a poor palette creating the demo gif.) I was a bit taken aback how quickly it came together. It's very fair to say that the "decode sixel into bitmap" part is the easy bit, the hard bit is the "stick image data into a text cell, and when that is present blit the image to screen rather than the character".
Just want to mention it to other folks interested in terminal support for sixel, and hoping it could help you out.
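For anyone who wants to see how small the "decode sixel into bitmap" part can be, here is a minimal Python sketch. It is not the Java code linked above; `decode_sixel` is a hypothetical name, and the sketch makes simplifying assumptions: it only tracks color indices (palette RGB values after `#` are ignored), it skips raster attributes, and it expects just the data between the `q` and the terminating ST.

```python
def decode_sixel(data):
    """Decode a sixel body into a dict mapping (x, y) to a color index.

    Pixels that are never drawn are simply absent from the dict.
    """
    pixels = {}
    x = y = 0
    color = 0
    repeat = 1
    i = 0
    while i < len(data):
        ch = data[i]
        if ch in '#!"':
            # Collect the numeric parameters that follow the introducer.
            j = i + 1
            while j < len(data) and (data[j].isdigit() or data[j] == ";"):
                j += 1
            params = [int(p) for p in data[i + 1:j].split(";") if p]
            if ch == "#" and params:
                color = params[0]      # select (or define) a color register
            elif ch == "!" and params:
                repeat = params[0]     # repeat count for the next sixel
            i = j                      # '"' raster attributes are skipped
            continue
        if ch == "$":                  # carriage return within the band
            x = 0
        elif ch == "-":                # move down to the next six-pixel band
            x = 0
            y += 6
        elif "?" <= ch <= "~":         # one column of six vertical pixels
            bits = ord(ch) - 63
            for _ in range(repeat):
                for dy in range(6):
                    if bits & (1 << dy):
                        pixels[(x, y + dy)] = color
                x += 1
            repeat = 1
        i += 1
    return pixels
```

Calling `decode_sixel("#1!3~$#2@-#1A")` fills a 3x6 block with color 1, overdraws the top-left pixel with color 2, and sets one pixel in the second band. As noted above, the hard part is what the terminal does with the bitmap afterwards.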
I'll upvote if someone else writes a Jupyter notebook client ;)
We already have an example of Sixel support in mintty which is written in C (vice java). Only thing needed is a refactor to C++ (at least for initial support). Still always good to see how it's been implemented in other projects.
> We already have an example of Sixel support in mintty which is written in C (vice java). Only thing needed is a refactor to C++ (at least for initial support). Still always good to see how it's been implemented in other projects.
Any issues with mintty's license (GPLv3 or later)?
From that link:
> Sixel code (sixel.c) is relicensed under GPL like mintty with the permission of its author (kmiya@culti)
If you transliterate that exact code to C++, the derivative work would need to be licensed GPLv3 or later, as per its terms, or not distributed at all. (One could also ask kmiya@culti if they are willing to offer sixel.c under a different license, or if it was once available under something else find a copy from that source.)
I don't know what is acceptable or not for inclusion in Windows Terminal -- my quick glance at Windows Terminal says it is MIT licensed, so depending on how it is linked/loaded using a direct descendant of mintty's GPLv3+ sixel.c could lead to a license issue.
Anyway, sorry to be bugging someone else's project here, heading back to the cave now...
There is a sixel capable, humble terminal emulator widget written in C/C++ for Windows/Linux, and it has a SixelRenderer class which you can use, (though it needs some optimization), and it has a BSD-3 license. Arguably its biggest downside is that it is written for a specific C++ framework. Still, IMO the SixelRenderer's code is translatable with little effort. (I know this because I am its author. :) )
https://github.com/ismail-yilmaz/upp-components/tree/master/CtrlLib/Terminal
> While implementing Sixel, it is important to test with images that contain transparency.
> Transparency can be achieved by drawing pixels of different colors but not drawing some pixels in any of the Sixel colors, leaving the background color as it is.
> I believe this is the only way to properly draw non-rectangular Sixels, and would be especially nice with the background acrylic transparency in the new Windows Terminal.
> Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.
hmm. the VT340 i'm in front of honors the P2 parameter in the DCS P1 ; P2 ; P3 ; q sequence that initiates the SIXEL sequence. Xterm, on the other hand, seems to ignore it. But if you use the raster attributes sequence ( " Pan ; Pad ; Ph ; Pv ) and give it a height and width, it will clear the background so you get a black pixel.
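To make the P2 behaviour easier to compare across terminals, here is a minimal Python sketch that builds a sixel sequence in which some pixels are never drawn. It assumes the DEC convention that P2=1 leaves undrawn pixels at their current color, while 0 or 2 sets them to the background color. `encode_sixel` and its arguments are hypothetical names for illustration, and the sketch deliberately omits the raster attributes sequence, since that is reported above to change how xterm clears the background.

```python
def encode_sixel(pixels, palette, p2=1):
    """Build a sixel sequence from rows of palette indices.

    pixels:  list of rows; each entry is a palette index or None (undrawn).
    palette: dict mapping index to an (r, g, b) tuple in percent (0-100).
    p2:      the P2 parameter of DCS P1 ; P2 ; P3 q.
    """
    height = len(pixels)
    width = len(pixels[0])
    out = [f"\x1bP0;{p2};0q"]                 # DCS introducer with parameters
    for idx, (r, g, b) in palette.items():
        out.append(f"#{idx};2;{r};{g};{b}")   # define color register (RGB)
    for top in range(0, height, 6):           # one band is six pixels high
        band = pixels[top:top + 6]
        passes = []
        for idx in palette:                   # one pass per color per band
            chars = []
            for x in range(width):
                bits = 0
                for dy, row in enumerate(band):
                    if row[x] == idx:
                        bits |= 1 << dy       # bit 0 is the top pixel
                chars.append(chr(63 + bits))
            line = "".join(chars)
            if line.strip("?"):               # skip passes that draw nothing
                passes.append(f"#{idx}{line}")
        out.append("$".join(passes))          # '$' returns to the band start
        out.append("-")                       # '-' moves to the next band
    out.append("\x1b\\")                      # ST terminates the sequence
    return "".join(out)
```

Writing `encode_sixel([[0, None], [None, 0]], {0: (100, 0, 0)})` to the terminal with `p2=1` and then again with `p2=0` should show whether a given emulator honors the parameter for the two undrawn pixels.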
i was thinking about getting the free trial of the ttwin emulator and checking out how its behaviour differs from the VT340 and the Xterm acting as a VT340.
But... +1 on the idea of supporting SIXEL in general and +10 for the idea of coming up with compatibility tests.
We could add support for iTerm2 Inline Images Protocol once we are there... At least it should be easier to implement: it only needs a path to the image and does everything on its own.
One doubt I have with both systems is, what happens with alignment? If an image's width and height are multiples of the character width and height, everything is OK. But if not, should padding be added only on the lower and right sides, or should the image be centered, with padding on all sides?
Hey here are some relevant links for research:
> We could add support for iTerm2 Inline Images Protocol once we are there... At least it should be easier to implement: it only needs a path to the image and does everything on its own.
That probably should be a different task. Sixel and ReGIS are explicitly for in-band graphical or character data. I'm not saying it's a bad idea, I'm just saying it should be treated as a different feature.
> One doubt I have with both systems is, what happens with alignment? If an image's width and height are multiples of the character width and height, everything is OK. But if not, should padding be added only on the lower and right sides, or should the image be centered, with padding on all sides?
Alignment of Sixel and ReGIS graphical data is described (poorly) in various manuals. Sixel images are aligned on character cell boundaries. If you want a black border around an image, you have to add those black pixels yourself; there's no concept of anything like HTML's margin or padding. Each line of sixel data describes a stripe six pixels high. If you're trying to align sixel image data with text characters on a terminal emulator, this can be frustrating as the software generating the sixel data may not know how many pixels high each character glyph is. If you have an old-school xterm handy, you can see this by starting it up in vt340 mode, specifying different font sizes (to give you different character cell sizes) and then printing out some sixel data that tries to align image data with text data. (Here's a simple test file that looks correct when I tell the font server to use 96DPI and I specify a 15 point font. Modifying the font size causes images to increasingly come out of alignment with the text. https://gist.github.com/OhMeadhbh/3d63f8b8aa4080d4de40586ffff819de )
The original vt340s didn't have this problem because (of course) you didn't get to specify a font size when turning the terminal on.
The other thing you can see from that image, that isn't well described in the sixel documentation is that printing a line of sixel data establishes a "virtual left margin" for the image data. If you do the moral equivalent of a CR or CRLF using the '$' or '-' characters, the next line is printed relative to this virtual left margin, not the real left margin at the left side of the terminal.
Hope this helps.
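To put numbers on the alignment problem described above, here is a small Python sketch. `rows_covered` is a hypothetical helper, and it assumes the simplest model: the image starts at the top of a character cell and the cell height in pixels is known.

```python
import math

def rows_covered(image_height_px, cell_height_px):
    """Return (sixel bands, text rows touched, leftover pixels in last row)."""
    bands = math.ceil(image_height_px / 6)             # each band is 6 px high
    rows = math.ceil(image_height_px / cell_height_px)
    slack = rows * cell_height_px - image_height_px    # unfilled part of last row
    return bands, rows, slack
```

With a 20-pixel cell, a 60-pixel image fills exactly three rows (`rows_covered(60, 20)`), but with a 17-pixel cell the same image spills into a fourth row and leaves 8 pixels uncovered, which is the drift you see when changing the font size.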
Finally scrolling back to read this. Sorry for the tardy reply.
> Testing using WSL with Ubuntu for example, in mlterm such images are properly rendered as having a transparency mask and the background color is kept, while in xterm -ti vt340, untouched pixels are drawn black, even though the background is white, which seems to imply they render sixels on a memory bitmap initialized as black without transparency mask or alpha before blitting them into the terminal window.
It shouldn't be too hard to support transparency in xterm. I've been digging around in the code for other reasons. I fear that someone, somewhere is depending on this behaviour of Xterm so would recommend putting it behind a compatibility flag, which also should be straight-forward. But then there's the question of the default value. What should be the default? Black or transparent.
Do we know what the original VT240, 241, 330 and 340's did? Could I suggest trying to faithfully represent the experience of an actual VT
I don't know that I care too much what the default is for the msft terminal as long as there's the capability of behaving like Xterm emulating a VT340. The code I've written to do loglines over ssh in the terminal sort of assumes the "unspecified pixels are black" behaviour described above. I'd have to rewrite that code if we make this change.
> If you're trying to align sixel image data with text characters on a terminal emulator, this can be frustrating as the software generating the sixel data may not know how many pixels high each character glyph is.
> The original vt340s didn't have this problem because (of course) you didn't get to specify a font size when turning the terminal on.
Is there any reason why a terminal emulator couldn't just scale the image to exactly match the behaviour of the original DEC terminals? So if the line height on a VT340 was 20 pixels, then a image that is 200px in height should cover exactly 10 lines, regardless of the font size. That seems to me the only way you could remain reasonably compatible with legacy software, which is kind of the point of a terminal emulator.
I can understand wanting to extend that behaviour to render images at a higher resolution, but that should be an optional extension I think (or just use one of the existing proprietary formats). So ideally I'd like the default for Sixel to be as close as possible to what you would have gotten on an actual DEC terminal.
> Hey here are some relevant links for research:
> "Basics for a Good Image Protocol" on terminal-wg
Sixel is broken because it cannot be supported by tmux with side-by-side panes.
It took some work (actually a lot of work), but with sixel one can perform nearly all of the "images in a terminal" tricks one can imagine:
Layered per-cell-masked images in a terminal: https://jexer.sourceforge.io/images/sixel_many_images.png
A floating (multiplexed) terminal window in a terminal that is using sixel for VT100-style double-width support: https://jexer.sourceforge.io/screenshots/jexer_sixel_in_sixel.png
"tmux-style" tiled terminals with images: https://gitlab.com/klamonte/jexer/-/wikis/uploads/7603381f82414ef9ae214bfcf759c064/example_tilingwm2_1.png
Multi-headed shared terminal session with differing text cell sizes showing the same plot: https://jexer.sourceforge.io/screenshots/multiscreen_2b.png
The use of sixel to render CJK and emoji that are not present in the main terminal's font: https://jexer.sourceforge.io/screenshots/xterm_sixel_cjk.png
I have included some other remarks at the referenced "good" protocol thread that might be of interest.
If nothing else, sixel is a good stepping stone to working out the terminal side infrastructure of mixed pictures-and-text. Speaking from direct experience, the terminal side (storing/displaying images) is about 1/4 as hard as the multiplexer/application side (tmux/mc et al).
sixels are indeed the ideal solution for in-band graphics (for example over ssh): as they are supported by many existing tools, they are ready to use for practical purposes like plotting timestamp sync issues on the go.
As illustrated by therealkenc and further explained by klamonte in 640292222 everything can be handled with sixels, even side-by-side images, but it requires some work.
A while ago I was working with a few other people on a fallback mode for tmux, using advanced unicode graphics to represent sixel images in terminals that do not support sixel.
It is a bit like automated ANSI art, taking advantage of special block characters that are present in most fonts: this equivalent color unicode representation could be substituted for the sixels, then later overwritten by the actual sixel image (or not!). It would also solve the problem of keeping all the sixel pictures for scrolling back, by substituting them with low fidelity unicode placeholders (for example to save memory), and having placeholders for sixel images when they can't be displayed for whatever reason.
The code was public domain. It could be usable immediately as a first step towards sixel support:
1. detect when sixel sequences are transmitted, then compute the unicode text replacement
2. display this unicode sequence, which is already supported by Windows Terminal
3. later, when sixels are implemented, render the sixel sequence on top.
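The text replacement step above could look roughly like the following Python sketch. It is not the public domain code mentioned in this comment; `to_half_blocks` is a hypothetical name, and it assumes the terminal accepts 24-bit SGR colors and has the upper half block character (U+2580) in its font. Each character cell carries two vertically stacked pixels: the foreground color paints the top one and the background color the bottom one.

```python
def to_half_blocks(pixels):
    """Render rows of (r, g, b) pixels as lines of colored half blocks."""
    lines = []
    for y in range(0, len(pixels), 2):
        top = pixels[y]
        # Pad an odd final row with black so every cell has two pixels.
        bottom = pixels[y + 1] if y + 1 < len(pixels) else [(0, 0, 0)] * len(top)
        cells = []
        for (tr, tg, tb), (br, bg, bb) in zip(top, bottom):
            cells.append(
                f"\x1b[38;2;{tr};{tg};{tb}m"   # foreground = upper pixel
                f"\x1b[48;2;{br};{bg};{bb}m"   # background = lower pixel
                "\u2580"                       # upper half block
            )
        lines.append("".join(cells) + "\x1b[0m")  # reset colors at line end
    return lines
```

A real fallback would first scale the decoded sixel bitmap down so that one pixel pair maps to one character cell. This sketch only covers the final pixel-to-text step.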
Would you be interested?
BTW I recognize here my familiar gnuplot x^2 sin and 10 sin(x) plots I'm happy it provided some inspiration 😄
Please.
@DHowett Is acac350 a first step toward actually rendering sixel graphics? I'm getting requests for sixel support in Microsoft Terminal from folks using ssh and wanting to view directories of images using my lsix program.
Sorta. We now have the ability to handle incoming DCS sequences. We haven't hooked up any handlers yet, but having the infrastructure to do so was pretty important. :smile:
Here's some updates. I have a working branch here. An early screenshot looks like this:
Contrary to what I originally thought, the most difficult part of rendering sixel images is actually the conpty layer. Sixel images are supposed to be inline objects. The rendering of sixel images depends on the rendering size of a character. However, due to the extra conpty layer, we cannot actually get the rendering size of a character when processing sixel sequences. This sounds very abstract and vague. Anyone who's interested in this can check out my branch and see how it's done.
Overall, the conpty layer makes it very difficult to handle scrolling and resizing of sixel images. In my branch it works if you only need to display it. But both scrolling and resizing are completely broken.
Didn't look yet but can you use pass-through mode to implement in Terminal itself? I would still add it in OpenConsole but sounds like sharing code isn't possible. Since Windows Terminal needs to be decoupled from OpenConsole at some point, you're best off simply duplicating the code for both. Also are you basing it on yours and j4james PRs for parameters? That would likely help as well.
@WSLUser Thanks for the attention. This screenshot is actually from about a month ago, when the fantastic parameters PR from j4james did not even exist. My work is entirely inside Windows Terminal, not conhost. I showed this PR to the Console team internally and made some progress since then. But I'm stuck because of the conpty problem.
Yeah I'd rebase off of master and add https://github.com/microsoft/terminal/pull/7578 and https://github.com/microsoft/terminal/pull/7799. From there, maybe see what's missing in ConPTY for pass-through mode. I wonder if Mintty is using pass-through for ConPTY mode.
> I wonder if Mintty is using pass-through for ConPTY mode.
Pretty sure mintty isn't using conpty at all 😜
The trick here with conpty is that the console (conpty) will need to know about the cells that are filled with sixel contents, so as not to accidentally clear that content out from the connected Terminal. Maybe conpty could be enlightened to ignore painting cells with sixel graphics, and just assume that the connected Terminal will leave those cells alone.
That might mess up some of our optimizations (like we can't EraseLine rows that have sixel data), but it might be a good enough start
> Maybe conpty could be enlightened to ignore painting cells with sixel graphics, and just assume that the connected Terminal will leave those cells alone.
This had been my original plan as well, and it may well be the best solution with the current conpty architecture, but there are a number of complications.
The second issue @j4james brought up becomes even more complicated with the consideration of different font, different font size and font resizing. So generally I think there's 3 aspects of the issue:
> The second issue @j4james brought up becomes even more complicated with the consideration of different font, different font size and font resizing. So generally I think there's 3 aspects of the issue:
Just to be clear, my point was that none of that would be a problem if we exactly matched the behaviour of a VT340, so a 10x20 pixel image would occupy exactly one character cell, regardless of font size. It's only an issue if we want to match the behaviour of other terminal emulators, and that could always be an option that is left for later. There would still be complications with this approach, but I personally think they're less of a concern.
My bigger concern is that you seem to be ignoring the DCS streaming issue, which I expect could fundamentally change the architecture of the solution. The steps I would like to have seen are: 1. Resolve #7316; 2. Agree on a solution for cell pixel size; 3. Get something working in conhost; 4. Once all the complications are worked out in conhost, only then consider how we make it work over conpty.
Sorry for leaving out the DCS streaming issue. In my current implementation I just store the entire string and pass it to the engine. This introduces performance issues when the sequence is large. But at least it works. So my comments above are largely based on it.
But you are right. The DCS streaming issue is actually the top priority if someone else wants to get their hands dirty on this.
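For contrast with the "store the entire string" approach, here is a minimal Python sketch of what streaming could mean: a small state machine that forwards sixel data to a consumer as each chunk arrives, so nothing has to be buffered until the terminator. This is only an illustration and not how Windows Terminal or conhost handle DCS. `SixelStream` and `on_data` are hypothetical names, only the 7-bit `ESC P` ... `ESC \` framing is recognized, and intermediate bytes in the DCS header are not handled.

```python
class SixelStream:
    """Incremental scanner that forwards sixel data as chunks arrive."""

    GROUND, ESC, PARAMS, DATA, DATA_ESC = range(5)

    def __init__(self, on_data):
        self.state = self.GROUND
        self.on_data = on_data   # called with each piece of sixel data
        self.finished = 0        # number of complete sequences seen

    def feed(self, chunk):
        pending = []

        def flush():
            if pending:
                self.on_data("".join(pending))
                pending.clear()

        for ch in chunk:
            if self.state == self.GROUND:
                if ch == "\x1b":
                    self.state = self.ESC
            elif self.state == self.ESC:
                self.state = self.PARAMS if ch == "P" else self.GROUND
            elif self.state == self.PARAMS:
                if ch == "q":
                    self.state = self.DATA      # sixel data starts here
                elif not (ch.isdigit() or ch == ";"):
                    self.state = self.GROUND    # some other DCS; ignored
            elif self.state == self.DATA:
                if ch == "\x1b":
                    self.state = self.DATA_ESC  # possible start of ST
                else:
                    pending.append(ch)
            else:  # DATA_ESC
                flush()
                if ch == "\\":
                    self.finished += 1          # ESC \ completes the sequence
                self.state = self.GROUND
        flush()  # hand over whatever this chunk contributed
```

The state survives between `feed` calls, so a sequence split at arbitrary points (even between the ESC and the backslash of the terminator) is still recognized, and a decoder attached as `on_data` can start drawing before the whole image has been received.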
Per discussion in https://github.com/microsoft/terminal/issues/57, I thought conpty doesn't care about fonts at all?
wrt resizing I think the most natural way to do it is to "anchor down" the image into character cells once the image arrives, and re-calculate image size based on the anchor geometry. Anything else will cause inconsistency in image vs. character cells.
@yatli Yes. That's also what makes the issue tricky.
> 10x20 pixel image would occupy exactly one character cell
This is unfortunately wrong, at least for my current font setting.
Correct me if I'm wrong, but for pixel perfect image display, I think we do need to care about fonts.
@skyline75489 pls see my updated comment about the "anchor"
The cell data structure needs to be updated as char | sixel anchor
The sixel anchor should contain information about:
It's a good idea but the implementation details were killing me, due to the extra translation in conpty layer. To avoid spamming people with email, feel free to reach me on Teams @yatli if you're interested.
> > 10x20 pixel image would occupy exactly one character cell
> This is unfortunately wrong, at least for my current font setting.
What I'm suggesting is that you should make that the case. If you create a 10x20 pixel image and output it on a real DEC VT340 terminal, it's going to take exactly one character (at least in 80 column mode). So if we're trying to emulate that terminal, then we should be doing the same thing. If your current font happens to be 30x60, then you need to scale the image up. If your font is smaller, then you scale the image down.
This guarantees that you can output a Sixel image at any font size and always get the same layout. If you want it to cover a certain area of the screen, or you want to draw a border around it with text characters, you know exactly how much space the image will occupy.
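As a rough illustration of the arithmetic, here is a Python sketch of the scaling described above. `vt340_layout` is a hypothetical name, and the 10x20 reference cell is the VT340 80-column cell size assumed throughout this discussion.

```python
def vt340_layout(img_w, img_h, cell_w, cell_h, ref_w=10, ref_h=20):
    """Return (columns, rows, scaled_width, scaled_height).

    The number of cells depends only on the image and the reference cell,
    never on the user's font; only the rendered pixel size changes.
    """
    cols = -(-img_w // ref_w)                  # ceiling division
    rows = -(-img_h // ref_h)
    scaled_w = round(img_w * cell_w / ref_w)   # stretch to the real cell width
    scaled_h = round(img_h * cell_h / ref_h)   # stretch to the real cell height
    return cols, rows, scaled_w, scaled_h
```

With a 30x60 font, `vt340_layout(10, 20, 30, 60)` still reports one column and one row, and the image is drawn at 30x60 pixels. A 200x200 image always covers 20 columns by 10 rows, whatever the font.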
> Correct me if I'm wrong, but for pixel perfect image display, I think we do need to care about fonts.
It's true that you're not going to get "pixel perfect" images this way, but I don't think that should be the primary goal. Many modern computers have high dpi displays where it's routine for images to be scaled up, so it's not like this is a strange concept. And if we want to keep the layout consistent when the user changes their font size, we're going to have to scale the image at some point anyway, so you might as well do it from the start and get all the benefits of a predictable size.
And of course the other benefit of doing things this way is that it could feasibly be implemented over conpty. I don't see how you can make conpty work if the area occupied by the image is dependent on the font size, which you can't possibly know.
I'm not going to pretend this approach won't have any downsides, but I think the positives outweigh the negatives.
What if the font has a different aspect ratio than 10:20?
> What if the font has a different aspect ratio than 10:20?
May I suggest reading this long, and somewhat "brutal", discussion about the general problems regarding inline images in terminal emulators.
It can give you the general idea.
Best regards
> What if the font has a different aspect ratio than 10:20?
The image may be a bit stretched or squished, but I don't think that's the end of the world.
Let me demonstrate with a real world example. Imagine I'm a Bond villain, and I've got an old security system using a VT340 as the frontend. Now because of the coronavirus, I'm in lockdown and working from home, so I'm logging into the system remotely with Windows Terminal. If we exactly match the VT340 this is no problem - the terminal looks like this:
But maybe I prefer fonts with a weird aspect ratio. So let's see what it would look like with _Miriam Fixed_, which is wider than most. The image of Bond now looks a bit squished, but he is still easily recognisable.
The alternative would be to go with a pixel perfect image (not currently feasible with conpty, but let's pretend for a second). Bond no longer looks squished, but now the image is only a fraction of the size it was expected to be. And the higher the resolution of your monitor, the worse this is going to look.
Maybe this is a matter of personal preference, but I know I'd definitely choose option 1 over option 2.
Also note that there is no reason we couldn't have options to tweak the exact behaviour when the font aspect ratio isn't 1:2. One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries. Any of these choices would be better than an exact pixel rendering in my opinion.
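The three behaviours mentioned here (stretch to the cells, center within them, or expand and clip) differ only in how the scale factor is chosen. Here is a Python sketch under that assumption; `fit_in_box` and the mode names are hypothetical and not an existing Windows Terminal setting.

```python
def fit_in_box(img_w, img_h, box_w, box_h, mode="center"):
    """Return (x_offset, y_offset, width, height) of the drawn image.

    box_w and box_h are the pixel size of the cells the image was expected
    to occupy. Negative offsets mean the overflowing edges get clipped.
    """
    if mode == "stretch":
        return 0, 0, box_w, box_h                       # ignore aspect ratio
    if mode == "center":
        scale = min(box_w / img_w, box_h / img_h)       # fit entirely inside
    elif mode == "cover":
        scale = max(box_w / img_w, box_h / img_h)       # fill, then clip
    else:
        raise ValueError(f"unknown mode: {mode}")
    w = round(img_w * scale)
    h = round(img_h * scale)
    return (box_w - w) // 2, (box_h - h) // 2, w, h
```

For a 100x200 image that is expected to cover 10x10 cells, a wide 16x20 font gives a 160x200 box: `"center"` draws it unscaled with 30 pixels of padding on each side, while `"cover"` scales it to 160x320 and clips 60 pixels from the top and bottom.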
> Maybe this is a matter of personal preference, but I know I'd definitely choose option 1 over option 2.
Me too, except that it would be better to know when the font has a different aspect ratio, so the image can adjust itself and keep the correct one.
> One option could be to center the image within the cells it was expected to occupy. Or we could expand the image so it covers the full area, but clip the edges that overflow the boundaries.
I think it's better to center them.
Maybe I'm misreading this thread. Are we actually talking about the terminal faking 10:20 characters for sixel image? I think that will cause many problems like the Bond distortion. Doing it the right way may be more difficult, but, in my humble opinion, a modern terminal should be font agnostic and leave it up to application programmers to deal with sixels and character cells.
Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application. The image viewing program I use works exactly like that. As I change font family or size, the displayed thumbnail updates to always be precisely five text lines high. The width is scaled proportionally for the image, unless it would be larger than a certain (in this case, rather large) maximum. By basing the image size on the character cell, it works automatically on high-DPI screens.
While the VT340 is a noble goal to emulate, fixing character cell resolution at 10:20 (and thus limiting resolution for the entire screen) is a mistake. The VT340 was only one of several sixel implementations, so its font size isn't necessarily more correct.
Forcing 10:20 will also lead to ugly kludges. (E.g., how to respond to a request for the size of the terminal window in pixels. Tell the truth, presuming they'll be positioning windows on the screen? Or, always return 800x480, presuming the user is scaling images for sixel output?)
> Are we actually talking about the terminal faking 10:20 characters for sixel image?
Yes.
> a modern terminal should be font agnostic
This proposal is font agnostic. The application doesn't need to know anything about the font. That's the whole point.
> Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.
I'm not exactly sure what method you're using, but the way I've seen this done before is with a proprietary XTerm query to get the window pixel size, and another query to get the window cell size, and then using that data to calculate the actual cell pixel size. The downsides of such an approach are:
That said, I understand that this is a mode that some people may wish to use, and I think we should at least have an option to support it one day (for reasons discussed above, this just isn't possible at the moment). But in my opinion, this is not the best approach for Sixel.
I have 300+ VT340's in nuclear power plants that I would like to eventually replace.
There are commercial terminal emulation packages we could use, but I think all but one have been EoL'd.
We have replaced some of them with Linux PCs running XTerm (or less frequently, Win10 + Hummingbird + WSL running XTerm), because it has a half-way decent open source sixel implementation and a sort of bad, but open sourced ReGIS implementation.
The likelihood that we will be writing new software for the part of this system that generates the sixel octet stream is NIL.
If your objective is to send graphics over an inline octet stream, there are other options. But if you want to support sixel graphics, you should support sixel graphics in a way that is halfway similar to previous implementations. This, unfortunately, means you should emulate the behaviour of exemplar systems (i.e. VT240, VT241, VT330 and VT340 terminals) even when it comes to integrating graphics with text.
This is a mock-up of the kind of thing I'm talking about. It would be very nice if any new Sixel implementation maintains compatibility with existing implementations so images do not run off the edge of the screen or only fill half the screen.
> > a modern terminal should be font agnostic
> This proposal is font agnostic. The application doesn't need to know anything about the font. That's the whole point.
I meant the _terminal_ should be font agnostic instead of imposing 10:20 on every font. The application should be able to know the actual font size, if it wishes, since it's the application that knows the domain of what it is trying to show and can figure out the best way to present text and graphics together.
Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.
I'm not exactly sure what method you're using, but the way I've seen this done before is with a proprietary XTerm query to get the window pixel size, and another query to get the window cell size, and then using that data to calculate the actual cell pixel size.
Yup, that's about right. There's also a query to directly get the character cell size, but I don't think that's as widely supported as just getting the screen size and dividing by ROWS and COLUMNS.
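For anyone following along, here is a minimal sketch of the arithmetic being described, assuming the terminal answers the XTWINOPS queries `CSI 14 t` (text area size in pixels) and `CSI 18 t` (text area size in characters) in the reply formats documented in the xterm control-sequence reference. The function names `parse_reply` and `cell_size` are made up for illustration, and the sketch only parses replies that have already been read from the terminal; it does not do the terminal I/O itself.

```python
import re

# Requests documented under XTWINOPS in the xterm control-sequence reference.
QUERY_WINDOW_PIXELS = "\x1b[14t"  # reply: ESC [ 4 ; height ; width t
QUERY_CELL_PIXELS = "\x1b[16t"    # reply: ESC [ 6 ; height ; width t
QUERY_TEXT_AREA = "\x1b[18t"      # reply: ESC [ 8 ; rows ; columns t

_REPLY = re.compile(r"\x1b\[(\d+);(\d+);(\d+)t")


def parse_reply(reply):
    """Return (code, first, second) from an XTWINOPS report, or None."""
    match = _REPLY.fullmatch(reply)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def cell_size(pixels_reply, text_area_reply):
    """Derive (cell_width, cell_height) in pixels from the two reports."""
    pixels = parse_reply(pixels_reply)
    area = parse_reply(text_area_reply)
    if pixels is None or area is None or pixels[0] != 4 or area[0] != 8:
        return None  # terminal did not answer, or answered something else
    _, height_px, width_px = pixels
    _, rows, cols = area
    if rows == 0 or cols == 0:
        return None
    # Integer division: cells are a whole number of pixels.
    return width_px // cols, height_px // rows
```

A terminal that supports the direct cell-size query (`QUERY_CELL_PIXELS`) would let a program skip the division; `parse_reply` handles that report too, returning the code 6 followed by height and width.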
The downsides of such an approach are:
1. It's proprietary, so wouldn't work on a real terminal, or any terminal emulator that exactly matched a real terminal.
That's not a downside. It only means the program has to fall back on doing what it would have done anyway: presume $TERM=="VT340" means character cells are 10:20, "VT240" means 10:10, "mskermit" means 8:8, and so on.
Also, it's not an xterm proprietary sequence. Getting the screen size is called a "dtterm" escape sequence, but it was actually first implemented in SunView (SunOS, 1986). I believe it was later documented in the PHIGS Programming Manual (1992). Try sending "\e[14t" to a few terminal emulators and you'll see it is widely implemented.
2. If the user changes their font size while your application is running, then your calculations will no longer be correct, and images will be rendered at the wrong size (unless you're continuously recalculating the font size which seems impractical).
This is not a problem. The program simply traps SIGWINCH and only recalculates if the window has actually changed.
3. If the user has a high resolution display, and/or large font size, you're forced to send through a massive image to try and match that resolution. Considering how inefficient Sixel is to start with, that can amount to a lot of bandwidth.
Yes, sixel is extremely inefficient. But on modern computers, sending full screen images is quite usable, even over ssh. Does the Microsoft Terminal have some sort of baudrate limitation?
By the way, I believe sixel does have a "high DPI" mode where every dot is doubled in width and height. I've never used it and I don't think xterm even implements it, but perhaps that would alleviate concerns about bandwidth.
That said, I understand that this is a mode that some people may wish to use, and I think we should at least have an option to support it one day (for reasons discussed above, this just isn't possible at the moment).
This "mode" is simply having characters and graphics aligned just like the various historical sixel terminals did and current emulators do. I admit, I don't understand why it is not possible to do the same in Microsoft Terminal. If you say this 10:20 kludge is the best that can be done, I will trust that you are correct and thank you for doing it. A distorted picture is much better than nothing.
Using escape sequences a user run program can determine the character cell size in pixels and decide how to intelligently deal with distortion for that application.
@hackerb9, what's the actual escape sequence to get the font dimensions?
The relevant XTerm sequences can be found here: https://invisible-island.net/xterm/ctlseqs/ctlseqs.html -- look for XTWINOPS.
Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.
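As a rough sketch of the ioctl route, combined with the SIGWINCH suggestion from earlier in the thread: the code below reads `struct winsize` with `TIOCGWINSZ`, derives the cell size, and recomputes only when the reported size actually changes. It assumes a Unix system and Python's standard `fcntl`, `termios`, `signal` and `struct` modules; `CellSizeTracker` and `watch` are hypothetical names used only for this illustration.

```python
import fcntl
import signal
import struct
import sys
import termios


def unpack_winsize(raw):
    """Decode struct winsize: rows, cols, xpixel, ypixel (unsigned shorts)."""
    rows, cols, xpixel, ypixel = struct.unpack("HHHH", raw)
    return rows, cols, xpixel, ypixel


def cell_size_from_winsize(rows, cols, xpixel, ypixel):
    """Return (cell_width, cell_height), or None if any field is zero."""
    if 0 in (rows, cols, xpixel, ypixel):
        return None  # some terminals leave the pixel fields unset
    return xpixel // cols, ypixel // rows


def read_winsize(fd):
    """Ask the tty driver for the current window size."""
    raw = fcntl.ioctl(fd, termios.TIOCGWINSZ, bytes(8))
    return unpack_winsize(raw)


class CellSizeTracker:
    """Remembers the last window size and recomputes only on real changes."""

    def __init__(self):
        self.winsize = None
        self.cell = None

    def update(self, winsize):
        if winsize == self.winsize:
            return False  # nothing changed, keep the cached cell size
        self.winsize = winsize
        self.cell = cell_size_from_winsize(*winsize)
        return True


def watch(fd=None):
    """Print the cell size now and again after each effective resize."""
    fd = sys.stdout.fileno() if fd is None else fd
    tracker = CellSizeTracker()
    tracker.update(read_winsize(fd))
    print("cell size:", tracker.cell)

    def on_winch(signum, frame):
        if tracker.update(read_winsize(fd)):
            print("cell size now:", tracker.cell)

    signal.signal(signal.SIGWINCH, on_winch)
    while True:
        signal.pause()  # sleep until the next signal; stop with Ctrl-C
```

Calling `watch()` from an interactive terminal prints the cell size and keeps printing as the window changes. When the pixel fields come back as zero, `tracker.cell` is `None`, and a program would have to fall back on the `$TERM`-based guesses mentioned above.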
Just as a data point, the sixel branch for libvte is taking the cell size-agnostic route @hackerb9 is talking about. It treats incoming sixel data as "pixel perfect" and rescales previously received images across zoom levels and font sizes to cover a consistent cell extent. When merged, this implementation will be available to a large share of Linux terminal emulators, including GNOME Terminal, the XFCE Terminal, Terminator, etc. Superficially this seems to be interoperable with at least XTerm and mlterm.
Since libvte records a per-image virtual cell size, it'd be trivial to make this work with a fixed virtual 10x20 cell size too for interoperation. However, we'd need a way for programs to communicate their expected pixel:cell ratios to the terminal (e.g. by extending the DCS parameters). That could be very useful in general, since it'd also provide a form of pixel density control in bandwidth-constrained environments, as you touched on above.
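To make the virtual cell idea concrete, here is a small worked sketch of the arithmetic. This is not libvte's code: the helper names `cell_extent` and `displayed_size` are invented for illustration, and the assumption is simply that an image is mapped onto cells using a virtual cell size (10x20 by default) and then scaled to whatever the real cell size happens to be.

```python
def cell_extent(image_w, image_h, virtual_cell=(10, 20)):
    """Number of (columns, rows) the image covers, rounding up partial cells."""
    cell_w, cell_h = virtual_cell
    cols = -(-image_w // cell_w)  # ceiling division without floats
    rows = -(-image_h // cell_h)
    return cols, rows


def displayed_size(image_w, image_h, actual_cell, virtual_cell=(10, 20)):
    """On-screen pixel size once virtual cells are mapped to real cells."""
    actual_w, actual_h = actual_cell
    virtual_w, virtual_h = virtual_cell
    return (image_w * actual_w) // virtual_w, (image_h * actual_h) // virtual_h


# An 800x480 image on a virtual 10x20 grid covers 80 columns by 24 rows.
print(cell_extent(800, 480))
# With a real 9x18 font the same image is drawn at 720x432 pixels,
# so it still covers exactly 80x24 cells.
print(displayed_size(800, 480, actual_cell=(9, 18)))
```

The point of the fixed ratio is that the cell extent stays the same across font sizes; only the scaled pixel size changes.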
Additionally, on Unix you can typically get the terminal's internal pixel size along with the cell size using the TIOCGWINSZ ioctl. With openssh this works remotely too.
The Linux console always returns 0... they should fix that, but they don't seem willing to :-/