Katex: Create an intermediate representation that can be used to render to multiple targets

Created on 30 Oct 2015  路  30Comments  路  Source: KaTeX/KaTeX

Current efforts such as https://github.com/Khan/KaTeX/pull/251 use the output of buildHTML which contains some HTML specific data which is necessary for canvas rendering. It would be nice if we had a more minimal intermediate representation that could be use for rendering HTML, MathML, Canvas, and SVG.

enhancement help wanted

All 30 comments

It probably makes sense to base this representation on what can come in TeX's math mode, which can be found on page 290 of the TeXbook. We can choose a useful subset of them, (like the ones that we currently support) and figure out all the parameters necessary to render them.

We also might want to think about how we should handle state going forwards. Right now, the only state modification we really allow are style changes (like \displaystyle) which lets us basically sidestep the problem entirely using the funky "implicit groups" but we might want to come up with a representation that more correctly models how TeX works for the future (in case we want to start allowing other kinds of state changes, like variables or macros).

I could write up a doc detailing such a representation, or we could work on this together some time?

We had a chat and decided to do a couple of things:

  • simplify ParseNode/ParseResult into POJOs w/ functions that return those POJOs (this is so that we maintain output that can be JSON serialized)
  • audit existing parse node types to see where we deviate from commands being used in chapters 24 and 25.

My vision of this issue as far as I currently understand it:

  • KaTeX parser generates CST which is potentially less readable than AST. It contains more data than external user (author of an editor for example) absolutely needs to provide.
    Trivial example: \\frac node has type, numer and denom fields which are semantic and are needed, but also hasBarLine, leftDelim, rightDelim and size whose necessity may not be obvious to external user.
  • Both buildHtml and buildMathML accept CST.
  • An editor (#809) most likely would want to get rendered HTML from AST.
  • There are use cases for bare parser. I guess, this cases would want AST instead of CST too.

So the task is to provide an AST and a way to get CST from AST while keeping changes minimal and backward-compatible. An explicit list of all AST nodes should be provided to simplify tree building from outside of KaTeX for external user.

Also I'd like to avoid changes in buildHtml and buildMathML if possible.

Thus I propose a plan:

  • Make a set of classes representing AST nodes. Filed type of ParseNode and file functions.js are sources of information what classes are needed. Example for op (\sin, \max, \int, \mathop, etc.):
    javascript class Operator extends Node { constructor(command) { this.command = command; } }
  • Make a AST->CST converter. Here I see some options:

    • Make an explicit converter which basically would be functions.js rewritten. Typical code in such converter:

      ```javascript

      function convertOperator(op) {

      switch (op.command) {

      case "\arcsin":

      return {

      type: "op",

      limits: false,

      symbol: false,

      alwaysHandleSupSub: false,

      body: op.command,

      };

        case "\\det":
            return {
                type: "op",
                limits: true,
                symbol: false,
                alwaysHandleSupSub: false,
                body: op.command,
            };
    }
}
```
Any drawbacks? Personally I like this way the most because it provides clear point where AST is converted to CST.

  • Same as explicit converter, but return instances of AST classes themselves. Then build*.js could use some classes power to be more readable.
    I'd like to avoid such intrusion into builders code, at least on first step, but I think I could do it too.
  • Add all variables from CST currently sitting in ParseNodes into local variables of build* functions. Like this (buildHTML.js):
    ```javascript
    groupTypes.op = function(group, options) {
    const symbol = ["\int", "\iint", "\iiint", "\oint", ...]
    .indexOf(group.value.body) >= 0;
  const limits = ["\\det", "\\gcd", "\\inf", "\\lim", ...]
      .indexOf(group.value.body) >= 0;

  // rest of the code

}
```

  • Cheat. Move all CST variables into functions of objects returned in function.js:
    ```javascript
    defineFunction([
    "\det"
    ], {
    numArgs: 0,
    }, function(context) {
    const op = new Operator(context.funcName);
    op.limits = function() {
        return ["\\int", "\\iint", "\\iiint", "\\oint", ...]
            .indexOf(group.value.body) >= 0;
    };

    op.symbol = function() {
        return ["\\int", "\\iint", "\\iiint", "\\oint", ...]
            .indexOf(group.value.body) >= 0;
    };
});
```
Then modify build*.js to use that functions. Strange option just for variety.

  • Finally, modify parseTree usages to include AST->CST conversion.

I see subsups are special. Should it be treated in a special way? Any more similar cases?

I suppose that for any symbol or funciton all CST-exclusive data desired for build*.js could be computed from AST data. Is it correct? In other words, is there any parser state which is added to ParseNodes outside of functions.js and is used in rendering?

Regarding errors reporting: as far as I see, explained plan do not interfers with it at all.

For \frac commands which generate genfrac nodes, it appears that leftDelim and rightDelim as always null, at least in my testing. For \left( \frac{x}{y} \right) we wrap genfrac inside a leftright node. I hope that we can iterate on the parser so that it returns a minimal (or almost minimal) AST. I'll respond in more detail to the rest of the post later.

I thought about it a bit more, and I think my statement about computing CST from AST is false. Or, at least, I should suppose it as being false.

So, I think I should start with explicit list of tree node types and AST->CST converter. parseTree usages should not be altered. Instead, a new way to render should be added so external user would be able to render AST without evaluating all attributes used by renderers.

I'll do some code soon and I'll show what I'm trying to achieve.

I did some experimenting with a possible IR a while back, but had some troubles with the vertical HTML layout b/c I was trying to simplify that. I think if we use our existing HTML vlist builder it should be fine. Here's a link to that IR: https://github.com/khan/katex-ir. I feel like we should clean up our parse tree and HTML build tree (using HAST) first before trying to inject a layer between them.

For \frac commands which generate genfrac nodes, it appears that leftDelim and rightDelim as always null, at least in my testing.

A random interjection: Something like {a \choose b} would return something with a leftDelim and rightDelim, right? In general though, there might be cases that KaTeX doesn't currently handle that we would want the AST to be able to handle.

I'm trying to define CST right now here. It's supposed to be used in our editor prototype as an interface between editor syntax tree and Katex parsing tree because the latter is filled with parsing logic which we're not interested in.
Firstly I've tried to extend ParseNode but stuck in Parser.js. So now I'm doing it on the side - CST is built manually by editor, ParseNodes are built by CST's toParseNode, ParseNodes are fed to Katex to render. If you find this version of CST useful, I would like to try to use it everywhere instead of ParseNode.

@spontaliku-softaria you may find the conversation in #892 interesting.

I think I understand better now what you're going for. I don't think the term CST really applies to KaTeX's AST, the reason being is that it doesn't encode all information necessary to recover the original input string. I think what we're talking about is actually two ASTs. They just describe different things. KaTeX's AST describes how various primitive commands (encoded as groupTypes in buildHTML.js and buildMathML.js) are nested.

This may or may not be a useful view of the world depending on how you want your editor to behavior. In the case of genfrac, if a user decides they want a \choose b or a \over b, KaTeX will generate a genfrac node. In either case the user might want to navigate to edit a or b or delete the whole thing. If there are left/right delimiters a user could modify the delimiters but they wouldn't be able to delete just one. How these restrictions are communicated to the user and enforced on their input will be up to use as creator of the editor.

Think about this example, I'm starting to question the need for leftDelim and rightDelim on genfrac nodes. There's already a leftright node for delimiters. It would make the AST more regular if we wrapped fractions with a leftright node. @xymostech @sophiebits what was the rational for treating the left/right delimiters for fractions differently from other uses of left/right delimiters?

@kevinbarabash anyway, ParseNode containts more than external user would want to provide. Lexer for example. So yes, looks like two ASTs.

I was going to evaluate how an AST for external user might look like after I finish current AST and get used with all variants of ParseNodes. But the one I'm doing now should be useful, just not the most convenient.

Also I'm not sure yet how to avoid passing math/text mode from external user and evaluate it from the tree - I guess this logic is not simple.

Looking at #892, wouldn't it be easier to fix this task before? Because the one working on #892 would need to do IR implicitly anyway to be able to reason about ParseNodes.

Oh sorry I misread #892 - by some reason I've decided it's about migrating to HAST.

I believe TeX itself does the genfrac-with-delimiters thing. IIRC the delimiters might end up a little too large if you treat them independently.

I love the idea of having a "minimal intermediate representation". What's the status on this? What can I do to help?

@rrandallcainc my rough roadmap for an IR would be:

  • finish adding flow types
  • finish moving functions in functions/ folder
  • simplify the parse tree
  • add intermediate representation

While not all of the steps are strictly necessary I think they will improve our chances of success. The first two items are in progress changes that are modifying lots of files/code so it would be nice to finish those off before introducing other big changes. Flow types will also add some safety as we continue to make big changes.

Simplifying the parse tree I think will making writing transforms from the parse tree to the intermediate representation simpler.

We'll also want to figure out what we want the IR to look like. To start with I think we can build on top of the existing VList by introducing an HList. We'll also need Rule and Glyph/Atom. One thing that I'm uncertain of is how to deal with horizontal centering of things, e.g. fractions. In TeX there's equal glue on either side of the numerator and denominator which results in the numerator and denominator being centered. If we can figure out a way to use flex-box to emulate this behavior then I think we should be able to move forward with this. I've spent some time prototyping an intermediate representation in https://github.com/Khan/katex-ir, but I got stuck with how to render glue to HTML.

@kevinbarabash Just curious, why does mathml not count as an intermediate representation?

@rrandallcainc I hadn't thought about it, but I suppose it could. As long as it has all the information needed to render to HTML and Canvas it should work.

@rrandallcainc I did some more thinking about this and one of the things we'd like to encode in the IR is visual layout of things, e.g. vertical spacing for fractions. That information would then be used by different renderers for HTML, SVG, and Canvas to produce renderings that look more or less the same. In this respect, MathML isn't a good fit.

I was thinking about this a bit more. I think the following data flow might make sense:

TeX -> parseTree,
parseTree -> MathML
parseTree -> IR
IR -> HTML
IR -> SVG
IR -> Canvas

To actually make this happen the IR should have the following objects:

  • VLists
  • HLists
  • Rules
  • Glyphs
  • Glue
  • Paths (TeX doesn't have this, but based on how we render stretchy accents, square roots, etc. we need something to model those)

To actually start moving towards having an IR, we should look and updating various parts of buildHTML.js and the htmlBuilders in function/*.js to generate IR and then immediately convert it to HTML. A good starting place might be Glyphs because they're relatively simply.

Glyphs should track the following data:

  • char index (the unicode character code)
  • fontFamily (roman, sans-serif, typewriter, etc.)
  • fontWeight (normal, bold)
  • fontStyle (normal, italics)

What I'm a little less uncertain on is whether to include class (ord, punct, op, etc.) in each Glyph. Currently the HTML layout makes use of this information, but for SVG and Canvas layouts we only care about the space between atoms in a horizontal layout. Moreover, if we want to support setting the value of \thinmuskip, \medmuskip, and \thickmuskip in the future, we won't be able to rely on CSS for setting space between atoms anyways. My preference here is move away from CSS solutions to more programmatic solutions. To save space in the markup we can have some general styles for spacing that match the defaults.

After Glyphs we'd probably want to work on Glue next. This would be used to implement \kern et al. To start with, a simplified Glue that only has the desired width and doesn't support fil et al.

The whole implementation sequence might look something like this:

Glyph, Glue, Rule, Path, HList, Vlist

Before starting on Glyph though it might be good to work on replacing our current CSS for atoms with more general \thinmuskip, \medmuskip, \thickmuskip styles.

My preference here is move away from CSS solutions to more programmatic solutions.

I believe synchronous rendering is the feature which makes Katex possible to use in an editor. If you want to make some parts of rendering programmatic, please don't lose that.

@spontaliku-softaria rendering is already programmatic. The only thing that would change is which styles are being applied to atoms. Also, the renderer would have to look at adjacent atoms to decide which styles to apply. We definitely want to maintain KaTeX's ability to do server side rendering and to renderToString().

This could be implemented by adding a step to buildExpression() in buildHTML.js which inserts Glues of appropriate sizes as needed between atoms as needed. These Glues would then be immediately converted to <span>s with styles for \thinmuskip, \medmuskip, or \thickmuskip.

I've definitely found the CSS solutions a bit hacky. It's impressive what they've been able to do, but e.g. it led to the bug #995 (now fixed by a workaround). Adding styles to the elements would feel closer to LaTeX, and it's still server-side/synchronous. And it would enable the user to override things like \thinmuskip, which is possible in TeX but impossible in KaTeX (you can't override the CSS). (Not that this particular functionality is important, but it feels a bit weird to bake so many lengths into CSS.)

I like @kevinbarabash's approach. Incidentally, what KaTeX calls VLists (and HLists?) are what TeX calls vboxes and hboxes, I believe. So again this feels nicely close to TeX (and we might consider renaming to match TeX).

I think the main worry here is that, by adding another layer of indirection, we might slow down KaTeX rendering somewhat. We'll probably want to do some speed tests before spending a lot of time in this direction. But overall it seems like the natural approach to enabling SVG etc. output.

I have one other vague concern. I'm not a fan of the current implementation of tables (array), in particular because it makes \multicolumn impossible (see #269). LaTeX finds a way to implement them with hbox/vboxes, so presumably it's possible... but if we end up using CSS tables or flexbox or something to fix this, the IR might also need some kind of Table structure.

I did some experimentation on this a while back. See https://github.com/KaTeX/katex-ir. It was based on the approach presented in https://people.eecs.berkeley.edu/~fateman/temp/neuform.pdf which also includes a Box object whose type can be either HBox or VBox. It wraps HList and VList respectively along with adding additional information about the height, depth and width of the box. We may end up introducing Boxes as well here. A scala implementation of the paper can be found at https://github.com/pkamenarsky/formulae.

@edemaine thanks for raising the concern about slowing down rendering. I think the slowdown will be negligible at least for render as DOM updates will probably still occupy most of the time. As for renderToString, my hope is that people would be caching the output for static pages if performance is an issue. That being said, it's good to know how our performance is being affected by any change we make. See #1053.

Yeah, multicolumn seems difficult. It might be easier to do if we use horizontal metrics for everything (measuring the width of things we don't have metrics for as a fallback).

Curious, why not a simpler "IR". Something that contains simple concepts like:

x/y
width/height
type: (i.e. text, shape, etc.)
...

I only ask because as I am working on a Canvas implementation, that's basically what I boiled it down to. I turned the buildTree -> simple POJOS -> Canvas

While x/y, width/height, and type is sufficient for rendering to canvas (or SVG), it has a couple short-comings:

  • it would require knowing the width of all glyphs when rendering to HTML, currently we only use glyph width in only a couple of place iirc
  • it doesn't reflect how TeX does things which means when implementing low-level TeX commands we need to more work to translate between the x/y, width/height model

@rrandallcainc translating from the TeX box model to model you suggesting shouldn't be too difficult.

I'd be curious if there is some general update on this issue? Totally understood if no one has time for it, I'd just be interested where people see that in terms of a roadmap right now.

Bump, any updates on this? I'd just be curios where this stands in terms of project priorities :)

No update. This is something that I support, but I won't have have time to pursue personally. Definitely open to PRs for this although I think a more detail proposal of what the IR should look like. I prototyped a possible ir a while back. One of the downsides of https://github.com/KaTeX/katex-ir is that it expects to know the width of every glyph which we currently don't have.

All of this would be made much easier if browser vendors would spend some time on the Font Metrics API but that's been in limbo for years.

Thanks for the update!

No update. This is something that I support, but I won't have have time to pursue personally. Definitely open to PRs for this although I think a more detail proposal of what the IR should look like. I prototyped a possible ir a while back. One of the downsides of https://github.com/KaTeX/katex-ir is that it expects to know the width of every glyph which we currently don't have.

@kevinbarabash You _should_ be able to leverage canvas' measureText method to get the width of each glyph. As long as the rendered "state" is currently updated on the context (i.e. font, size, etc.) it'll give you an accurate measurement that would apply to both canvas and dom (same size). It's a fairly quick measurement, and should lend itself well to synchronous rendering.

Edit: Realizing you probably meant you need this earlier (i.e. at build time). But leaving this up here just in case..

Was this page helpful?
0 / 5 - 0 ratings

Related issues

oddhack picture oddhack  路  3Comments

OisinMoran picture OisinMoran  路  4Comments

sophiebits picture sophiebits  路  3Comments

yawnoc picture yawnoc  路  3Comments

q2apro picture q2apro  路  3Comments