Jsdom: Implement DOMParser and XMLSerializer classes

Created on 27 Jan 2016 · 43Comments · Source: jsdom/jsdom

All modern browsers expose these two useful classes with both a very simple API:

https://developer.mozilla.org/en/docs/Web/API/DOMParser
https://developer.mozilla.org/en/docs/XMLSerializer

Without knowing the details, I believe it shouldn't be too hard to implement these as thin shim to what jsdom already has available.

I am currently using quickly hacked together shims in paper.js that look like the code below.

I am happy to work on a better version that can find its way into jsdom, if you think this would be useful and could give me some pointers as to where to start looking.

function XMLSerializer() {
}

XMLSerializer.prototype.serializeToString = function(node) {
    var text = jsdom.serializeDocument(node);
    // Fix a jsdom issue where all SVG tagNames are lowercased:
    // https://github.com/tmpvar/jsdom/issues/620
    var tagNames = ['linearGradient', 'radialGradient', 'clipPath', 'textPath'];
    for (var i = 0, l = tagNames.length; i < l; i++) {
        var tagName = tagNames[i];
        text = text.replace(
            new RegExp('(<|</)' + tagName.toLowerCase() + '\\b', 'g'),
            function(all, start) {
                return start + tagName;
            });
    }
    return text;
};

function DOMParser() {
}

DOMParser.prototype.parseFromString = function(string, contenType) {
    var div = document.createElement('div');
    div.innerHTML = string;
    return div.firstChild;
};

feature

Source

lehni

👍14

Most helpful comment

@domenic, @tommedema, @ianks, @lehni. There is a complete javascript xml-serializer that i created and is avaliable as an npm package. It follows the w3c spec and even added some improvements. its serialization is neat and accurate with 99% test coverage.

teclone on 12 Jul 2018

❤5

All 43 comments

Sure, it would be great to have help implementing this!

Let's do this one at a time: one PR for DOMParser, one for XMLSerializer.

For DOMParser, the spec is at https://w3c.github.io/DOM-Parsing/#the-domparser-interface. You'd need to introduce new IDL (probably a new folder, domparsing, next to window/ and nodes/ and friends). Then write the impl class. You can peruse recent commits that introduced Location and History in the same fashion to see how that's done.

The actual implementation does indeed look not too complicated. We'd ideally want to do things a bit more in depth than your version though. I think the method will end up looking similar to jsdom.jsdom, actually! in lib/jsdom.js.

domenic on 28 Jan 2016

👍1

Thanks for the pointers! For XMLSerializer the specs are also there: https://w3c.github.io/DOM-Parsing/#the-xmlserializer-interface

For the DOMParser, I am not sure I understand your last comment. With _jsdom.jsdom_, did you mean exports.jsdom = ... inside jsdom.js?

lehni on 28 Jan 2016

Yep! The behavior is actually quite similar: take a string, and depending on another passed argument, create a document parsed as XML or a document parsed as HTML.

domenic on 28 Jan 2016

Alright, sounds good! BTW, I just realized that this is more according to specs:

DOMParser.prototype.parseFromString = function(string, contenType) {
    // Create a new document, since we're supposed to always return one.
    var doc = document.implementation.createHTMLDocument(''),
        body = doc.body,
        last;
    // Set the body's HTML, then change the DOM according the specs.
    body.innerHTML = string;
    // Remove all top-level children (<html><head/><body/></html>)
    while (last = doc.lastChild)
        doc.removeChild(last);
    // Insert the first child of the body at the top.
    doc.appendChild(body.firstChild);
    return doc;
};

lehni on 28 Jan 2016

This may be helpful: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser

It's under the section "DOMParser HTML extension for other browsers".

Tell me how I can help on this. I'm trying to write some test cases that involve DOMParser.

kahwee on 3 Feb 2016

@kahwee I tried to outline how to do a pull request for DOMParser in https://github.com/tmpvar/jsdom/issues/1368#issuecomment-175907738. You might want to coordinate with @lehni to make sure you two don't duplicate work.

domenic on 4 Feb 2016

@domenic I haven't started work on this. I find the harder piece is XMLSerializer which xmldom sort of covers: https://github.com/jindw/xmldom/blob/master/dom.js

It is better to divide the task into 2 pull request.

@lehni How's progress on this?

Thank you both.

kahwee on 4 Feb 2016

So DOMParser is now implemented and will be in the next release.

I've looked at a bunch of XML serializers on npm and nobody seems to have put together one that matches the spec. In the meantime https://github.com/cburgmer/xmlserializer seems closest, so we could use that I guess. I opened https://github.com/cburgmer/xmlserializer/issues/8 to get more spec compliance.

domenic on 2 Jul 2016

🎉4

@domenic many thanks for implementing this! And apologies for dropping the ball on this one. Been swamped with work here...

lehni on 6 Jul 2016

@domenic Does DOMParser in jsdom support parsing XML yet?

ianks on 13 Aug 2016

It supports it as well as jsdom does, which is pretty OKish

domenic on 13 Aug 2016

I attemped to parse a SOAP response with it and it seemed to choke a bit, giving back undefined as a few of the properties on the Document object. I assume this probably has to do with namespaces?

ianks on 13 Aug 2016

Hmm, possibly. Could you give a small-ish testcase?

domenic on 13 Aug 2016

@domenic I'm not sure how to write a full test case here, but this example highlights the undefined keys issue that exists.

var data =
  `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Header>
          <ns2:ResponseHeader xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509" xmlns="https://adwords.google.com/api/adwords/cm/v201509">
              <requestId>1234554321</requestId>
              <serviceName>ManagedCustomerService</serviceName>
              <methodName>get</methodName>
              <operations>1</operations>
              <responseTime>115</responseTime>
          </ns2:ResponseHeader>
      </soap:Header>
      <soap:Body>
          <ns2:getResponse xmlns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509">
              <ns2:rval>
                  <totalNumEntries>2</totalNumEntries>
                  <Page.Type>ManagedCustomerPage</Page.Type>
                  <ns2:entries>
                      <ns2:name>Test1</ns2:name>
                      <ns2:customerId>1234566789</ns2:customerId>
                  </ns2:entries>
                  <ns2:entries>
                      <ns2:name>Test2</ns2:name>
                      <ns2:customerId>987654321</ns2:customerId>
                  </ns2:entries>
              </ns2:rval>
          </ns2:getResponse>
      </soap:Body>
  </soap:Envelope>`;

var document = (new window.DOMParser()).parseFromString(data, "text/xml")

ianks on 15 Aug 2016

And what document object properties are undefined when you do that?

domenic on 15 Aug 2016

The two types which are have undefined keys are: SymbolTreeNode and Document, which I believe correspond to the documentElement and... something else in chrome.

It should be noted that the way I required the files looks like this:

  // is there a better way?
  global.DOMParser = window.DOMParser = require('jsdom/lib/jsdom/living/domparsing/DOMParser-impl').implementation;

ianks on 15 Aug 2016

Oh, yeah, that won't work. You need to use an actual jsdom, e.g.

const DOMParser = jsdom.jsdom().defaultView.DOMParser;

Anyway, I still don't understand what properties you're talking about, exactly. document.SymbolTreeNode and document.Document are undefined in real browsers too, not just jsdom.

domenic on 15 Aug 2016

Thank you for the info. Here is what I meant by SymbolTreeNode and Document types:

domparser

ianks on 15 Aug 2016

Weird. But I'd again really appreciate some code. Are you saying that document.undefined === SymbolTreeNode? Or what?

domenic on 15 Aug 2016

Yes exactly!

ianks on 15 Aug 2016

Great. So then it can't be true that document.undefined === Document, so I guess that part of your comment was just a red herring.

domenic on 15 Aug 2016

Here's a test case showing that this is indeed happening: https://tonicdev.com/57b2082567f249120021c588/57b2082567f249120021c589/branches/master I'll open a separate issue.

domenic on 15 Aug 2016

Oh, no, that's not what's happening: https://tonicdev.com/57b2082567f249120021c588/57b2082567f249120021c589/branches/master

The issue is that both document.undefined and window.SymbolTreeNode are undefined. So I don't think there is any bug here. Maybe there is a bug in whatever software you are using to generate those tree views.

domenic on 15 Aug 2016

Yes, I think you are right. Here is a list of all the properties I have when looping through document: https://gist.github.com/ianks/c935930573921025c28c460019ec85ab

I think my original diagnoses was just wrong :cry:

Here is my super simple test case:

var data =
  `<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Header>
          <ns2:ResponseHeader xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509" xmlns="https://adwords.google.com/api/adwords/cm/v201509">
              <requestId>1234554321</requestId>
              <serviceName>ManagedCustomerService</serviceName>
              <methodName>get</methodName>
              <operations>1</operations>
              <responseTime>115</responseTime>
          </ns2:ResponseHeader>
      </soap:Header>
      <soap:Body>
          <ns2:getResponse xmlns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509">
              <ns2:rval>
                  <totalNumEntries>2</totalNumEntries>
                  <Page.Type>ManagedCustomerPage</Page.Type>
                  <ns2:entries>
                      <ns2:name>Test1</ns2:name>
                      <ns2:customerId>1234566789</ns2:customerId>
                  </ns2:entries>
                  <ns2:entries>
                      <ns2:name>Test2</ns2:name>
                      <ns2:customerId>987654321</ns2:customerId>
                  </ns2:entries>
              </ns2:rval>
          </ns2:getResponse>
      </soap:Body>
  </soap:Envelope>`;

var doc = (new window.DOMParser()).parseFromString(data, "text/xml")
var entries = doc.querySelectorAll('rval > entries');

if (entries.length !== 2) {
  console.log('Invalid number of entries, should be 2, got ' + entries.length);
} else {
  console.log('Passed!');
}

ianks on 15 Aug 2016

That test case "fails" in Chrome and Firefox for me: http://jsbin.com/timizagego/edit?html,console

domenic on 15 Aug 2016

Yeah, Monday got the best of me. Should be 2 entries, not one. I updated the example:

http://jsbin.com/duvavigasi/edit?html,console

ianks on 15 Aug 2016

Ah great! I'll file a separate issue.

domenic on 15 Aug 2016

@domenic Thanks for holding my hand through that one. You are an excellent OSS contributor!

ianks on 15 Aug 2016

❤1

Thank you for the info. Here is what I meant by SymbolTreeNode and Document types:

I am guessing that your tool is showing symbol keys as "undefined":

const mySymbol = Symbol();
// try inspecting this object in your tool:
const obj = {};
obj[mySymbol] = 123; // your tool would probably display this as {undefined: 123}
// you can also play around with:
Object.getOwnPropertySymbols(obj);

Joris-van-der-Wel on 16 Aug 2016

@domenic - great work getting DOMParser included. I was using it though and realized that it doesn't handle syntax errors the same way. You can check out what it should be like here.

Is this possible to add?

DylanPiercey on 22 Oct 2016

@DylanPiercey please file a separate issue with a test case. We already attempt to produce parsererror elements when possible, but I think there are some edge cases where it's not working.

domenic on 22 Oct 2016

@domenic I was mostly just messing arround with stuff like

<div>
    <a></>
</div>

Which didn't throw a parseerror.

I can open a new issue, but if it's already implemented then thats good enough for me.
Do you think you could point me to an actual parse error that is handled by jsdom so that I can use it for some unit testing?

DylanPiercey on 22 Oct 2016

@domenic is there any way to avoid the lower casing of tag names when using the workaround for XMLSerializer.serializeToString proposed by the original poster?
Started having a look at parse5, and got lost in the api docs...

jonnermut on 3 Nov 2016

I'm not aware of any, no. We need to use an actual XML serializer, not a HTML one like parse5, to get XML-correct casing.

domenic on 3 Nov 2016

I managed to get mostly want using jsdom's DOMParser and https://github.com/cburgmer/xmlserializer

One thorny one was that xmlserializer lower cases tags with the namespaceURI set to 'http://www.w3.org/1999/xhtml', which it seems that jsdom sets on all tags created via Document.createElement.

Here's some rough code illustrating this:

#!/usr/local/bin/node
var jsdom = require("jsdom");
var xmlserializer = require('xmlserializer');

function XMLSerializer() {
}

XMLSerializer.prototype.serializeToString = function(node) {
    return xmlserializer.serializeToString(node);
};

jsdom.env({
  html: `<html><body></body></html>`,
   virtualConsole: jsdom.createVirtualConsole().sendTo(console),
  done: function (err, window) {
     var DOMParser = window.DOMParser
     var xml = '<myTag myAttr="hello"></myTag>'
     var parser = new window.DOMParser();
      var doc = parser.parseFromString(xml, "application/xml");

      // swizzle the doc.createElement func, otherwise <otherElement> comes out as <otherelement>
      doc.createElement = function(localName) { return doc.createElementNS('', localName) }

      var el2 = doc.createElement('otherElement')
      doc.firstChild.appendChild(el2)

      var xmlSerializer = new XMLSerializer()
      console.log(xmlSerializer.serializeToString(doc))

  }
});

Outputs:

<myTag xmlns="null" myAttr="hello"><otherElement/></myTag>

Which is mostly want I want - case preserved - except for the xmlns="null" which I think is a recent bug in xmlserialiser

jonnermut on 5 Nov 2016

👍1

I tried to polyfill XMLParser as suggested:

function XMLSerializer() {
}

XMLSerializer.prototype.serializeToString = function(node) {
    var text = jsdom.serializeDocument(node);
    return text;
};

Unfortunately serializeDocument is no longer part of the API. serialize() only works on the whole document; and it is not possible to create a new document from a HTML node (only from a string). It's not possible to turn a node into a string without a XML parser. Especially if you are trying to serialize a doctype.

My solution was to use xmldom:

import { XMLSerializer } from 'xmldom';
global.XMLSerializer = XMLSerializer;

tommedema on 15 Mar 2018

My current version looks like this:

function XMLSerializer() {
};

XMLSerializer.prototype.serializeToString = function(node) {
  if (!node) {
    return '';
  }
  // Fix a jsdom issue where all SVG tagNames are lowercased:
  // https://github.com/tmpvar/jsdom/issues/620
  var text = node.outerHTML;
  var tagNames = ['linearGradient', 'radialGradient', 'clipPath', 'textPath'];
  for (var i = 0, l = tagNames.length; i < l; i++) {
    var tagName = tagNames[i];
    text = text.replace(
      new RegExp('(<|</)' + tagName.toLowerCase() + '\\b', 'g'),
      function(match, start) {
        return start + tagName;
      }
    );
  }
  return text;
};

lehni on 16 Mar 2018

@lehni that doesn't return doctypes if you pass in document.documentElement as node, even though browsers do

tommedema on 16 Mar 2018

I'm not saying it's perfect, but it works for my use-case

lehni on 17 Mar 2018

teclone on 12 Jul 2018

❤5

Oh dear, I wish we'd known about that a month ago! @Sebmaster created our own package, w3c-xmlserializer, and #2282 has the code to integrate it all ready to go... I just need to find some time to work on jsdom -_-.

domenic on 12 Jul 2018

😄1

Alright @domenic. sorry it came late.

teclone on 12 Jul 2018

I think this is it! Closing due to the release of v13.