All modern browsers expose these two useful classes with both a very simple API:
https://developer.mozilla.org/en/docs/Web/API/DOMParser
https://developer.mozilla.org/en/docs/XMLSerializer
Without knowing the details, I believe it shouldn't be too hard to implement these as thin shim to what jsdom already has available.
I am currently using quickly hacked together shims in paper.js that look like the code below.
I am happy to work on a better version that can find its way into jsdom, if you think this would be useful and could give me some pointers as to where to start looking.
function XMLSerializer() {
}
XMLSerializer.prototype.serializeToString = function(node) {
var text = jsdom.serializeDocument(node);
// Fix a jsdom issue where all SVG tagNames are lowercased:
// https://github.com/tmpvar/jsdom/issues/620
var tagNames = ['linearGradient', 'radialGradient', 'clipPath', 'textPath'];
for (var i = 0, l = tagNames.length; i < l; i++) {
var tagName = tagNames[i];
text = text.replace(
new RegExp('(<|</)' + tagName.toLowerCase() + '\\b', 'g'),
function(all, start) {
return start + tagName;
});
}
return text;
};
function DOMParser() {
}
DOMParser.prototype.parseFromString = function(string, contenType) {
var div = document.createElement('div');
div.innerHTML = string;
return div.firstChild;
};
Sure, it would be great to have help implementing this!
Let's do this one at a time: one PR for DOMParser, one for XMLSerializer.
For DOMParser, the spec is at https://w3c.github.io/DOM-Parsing/#the-domparser-interface. You'd need to introduce new IDL (probably a new folder, domparsing, next to window/ and nodes/ and friends). Then write the impl class. You can peruse recent commits that introduced Location and History in the same fashion to see how that's done.
The actual implementation does indeed look not too complicated. We'd ideally want to do things a bit more in depth than your version though. I think the method will end up looking similar to jsdom.jsdom, actually! in lib/jsdom.js.
Thanks for the pointers! For XMLSerializer the specs are also there: https://w3c.github.io/DOM-Parsing/#the-xmlserializer-interface
For the DOMParser, I am not sure I understand your last comment. With _jsdom.jsdom_, did you mean exports.jsdom = ...
inside jsdom.js?
Yep! The behavior is actually quite similar: take a string, and depending on another passed argument, create a document parsed as XML or a document parsed as HTML.
Alright, sounds good! BTW, I just realized that this is more according to specs:
DOMParser.prototype.parseFromString = function(string, contenType) {
// Create a new document, since we're supposed to always return one.
var doc = document.implementation.createHTMLDocument(''),
body = doc.body,
last;
// Set the body's HTML, then change the DOM according the specs.
body.innerHTML = string;
// Remove all top-level children (<html><head/><body/></html>)
while (last = doc.lastChild)
doc.removeChild(last);
// Insert the first child of the body at the top.
doc.appendChild(body.firstChild);
return doc;
};
This may be helpful: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser
It's under the section "DOMParser HTML extension for other browsers".
Tell me how I can help on this. I'm trying to write some test cases that involve DOMParser.
@kahwee I tried to outline how to do a pull request for DOMParser in https://github.com/tmpvar/jsdom/issues/1368#issuecomment-175907738. You might want to coordinate with @lehni to make sure you two don't duplicate work.
@domenic I haven't started work on this. I find the harder piece is XMLSerializer which xmldom
sort of covers: https://github.com/jindw/xmldom/blob/master/dom.js
It is better to divide the task into 2 pull request.
@lehni How's progress on this?
Thank you both.
So DOMParser is now implemented and will be in the next release.
I've looked at a bunch of XML serializers on npm and nobody seems to have put together one that matches the spec. In the meantime https://github.com/cburgmer/xmlserializer seems closest, so we could use that I guess. I opened https://github.com/cburgmer/xmlserializer/issues/8 to get more spec compliance.
@domenic many thanks for implementing this! And apologies for dropping the ball on this one. Been swamped with work here...
@domenic Does DOMParser in jsdom support parsing XML yet?
It supports it as well as jsdom does, which is pretty OKish
I attemped to parse a SOAP response with it and it seemed to choke a bit, giving back undefined as a few of the properties on the Document object. I assume this probably has to do with namespaces?
Hmm, possibly. Could you give a small-ish testcase?
@domenic I'm not sure how to write a full test case here, but this example highlights the undefined
keys issue that exists.
var data =
`<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Header>
<ns2:ResponseHeader xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509" xmlns="https://adwords.google.com/api/adwords/cm/v201509">
<requestId>1234554321</requestId>
<serviceName>ManagedCustomerService</serviceName>
<methodName>get</methodName>
<operations>1</operations>
<responseTime>115</responseTime>
</ns2:ResponseHeader>
</soap:Header>
<soap:Body>
<ns2:getResponse xmlns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509">
<ns2:rval>
<totalNumEntries>2</totalNumEntries>
<Page.Type>ManagedCustomerPage</Page.Type>
<ns2:entries>
<ns2:name>Test1</ns2:name>
<ns2:customerId>1234566789</ns2:customerId>
</ns2:entries>
<ns2:entries>
<ns2:name>Test2</ns2:name>
<ns2:customerId>987654321</ns2:customerId>
</ns2:entries>
</ns2:rval>
</ns2:getResponse>
</soap:Body>
</soap:Envelope>`;
var document = (new window.DOMParser()).parseFromString(data, "text/xml")
And what document object properties are undefined when you do that?
The two types which are have undefined keys are: SymbolTreeNode
and Document
, which I believe correspond to the documentElement
and... something else in chrome.
It should be noted that the way I required the files looks like this:
// is there a better way?
global.DOMParser = window.DOMParser = require('jsdom/lib/jsdom/living/domparsing/DOMParser-impl').implementation;
Oh, yeah, that won't work. You need to use an actual jsdom, e.g.
const DOMParser = jsdom.jsdom().defaultView.DOMParser;
Anyway, I still don't understand what properties you're talking about, exactly. document.SymbolTreeNode
and document.Document
are undefined in real browsers too, not just jsdom.
Thank you for the info. Here is what I meant by SymbolTreeNode
and Document
types:
Weird. But I'd again really appreciate some code. Are you saying that document.undefined === SymbolTreeNode
? Or what?
Yes exactly!
Great. So then it can't be true that document.undefined === Document
, so I guess that part of your comment was just a red herring.
Here's a test case showing that this is indeed happening: https://tonicdev.com/57b2082567f249120021c588/57b2082567f249120021c589/branches/master I'll open a separate issue.
Oh, no, that's not what's happening: https://tonicdev.com/57b2082567f249120021c588/57b2082567f249120021c589/branches/master
The issue is that both document.undefined
and window.SymbolTreeNode
are undefined. So I don't think there is any bug here. Maybe there is a bug in whatever software you are using to generate those tree views.
Yes, I think you are right. Here is a list of all the properties I have when looping through document
: https://gist.github.com/ianks/c935930573921025c28c460019ec85ab
I think my original diagnoses was just wrong :cry:
Here is my super simple test case:
var data =
`<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Header>
<ns2:ResponseHeader xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509" xmlns="https://adwords.google.com/api/adwords/cm/v201509">
<requestId>1234554321</requestId>
<serviceName>ManagedCustomerService</serviceName>
<methodName>get</methodName>
<operations>1</operations>
<responseTime>115</responseTime>
</ns2:ResponseHeader>
</soap:Header>
<soap:Body>
<ns2:getResponse xmlns="https://adwords.google.com/api/adwords/cm/v201509" xmlns:ns2="https://adwords.google.com/api/adwords/mcm/v201509">
<ns2:rval>
<totalNumEntries>2</totalNumEntries>
<Page.Type>ManagedCustomerPage</Page.Type>
<ns2:entries>
<ns2:name>Test1</ns2:name>
<ns2:customerId>1234566789</ns2:customerId>
</ns2:entries>
<ns2:entries>
<ns2:name>Test2</ns2:name>
<ns2:customerId>987654321</ns2:customerId>
</ns2:entries>
</ns2:rval>
</ns2:getResponse>
</soap:Body>
</soap:Envelope>`;
var doc = (new window.DOMParser()).parseFromString(data, "text/xml")
var entries = doc.querySelectorAll('rval > entries');
if (entries.length !== 2) {
console.log('Invalid number of entries, should be 2, got ' + entries.length);
} else {
console.log('Passed!');
}
That test case "fails" in Chrome and Firefox for me: http://jsbin.com/timizagego/edit?html,console
Yeah, Monday got the best of me. Should be 2 entries, not one. I updated the example:
Ah great! I'll file a separate issue.
@domenic Thanks for holding my hand through that one. You are an excellent OSS contributor!
Thank you for the info. Here is what I meant by SymbolTreeNode and Document types:
I am guessing that your tool is showing symbol keys as "undefined":
const mySymbol = Symbol();
// try inspecting this object in your tool:
const obj = {};
obj[mySymbol] = 123; // your tool would probably display this as {undefined: 123}
// you can also play around with:
Object.getOwnPropertySymbols(obj);
@domenic - great work getting DOMParser included. I was using it though and realized that it doesn't handle syntax errors the same way. You can check out what it should be like here.
Is this possible to add?
@DylanPiercey please file a separate issue with a test case. We already attempt to produce parsererror elements when possible, but I think there are some edge cases where it's not working.
@domenic I was mostly just messing arround with stuff like
<div>
<a></>
</div>
Which didn't throw a parseerror.
I can open a new issue, but if it's already implemented then thats good enough for me.
Do you think you could point me to an actual parse error that is handled by jsdom so that I can use it for some unit testing?
@domenic is there any way to avoid the lower casing of tag names when using the workaround for XMLSerializer.serializeToString proposed by the original poster?
Started having a look at parse5, and got lost in the api docs...
I'm not aware of any, no. We need to use an actual XML serializer, not a HTML one like parse5, to get XML-correct casing.
I managed to get mostly want using jsdom's DOMParser and https://github.com/cburgmer/xmlserializer
One thorny one was that xmlserializer lower cases tags with the namespaceURI set to 'http://www.w3.org/1999/xhtml', which it seems that jsdom sets on all tags created via Document.createElement.
Here's some rough code illustrating this:
#!/usr/local/bin/node
var jsdom = require("jsdom");
var xmlserializer = require('xmlserializer');
function XMLSerializer() {
}
XMLSerializer.prototype.serializeToString = function(node) {
return xmlserializer.serializeToString(node);
};
jsdom.env({
html: `<html><body></body></html>`,
virtualConsole: jsdom.createVirtualConsole().sendTo(console),
done: function (err, window) {
var DOMParser = window.DOMParser
var xml = '<myTag myAttr="hello"></myTag>'
var parser = new window.DOMParser();
var doc = parser.parseFromString(xml, "application/xml");
// swizzle the doc.createElement func, otherwise <otherElement> comes out as <otherelement>
doc.createElement = function(localName) { return doc.createElementNS('', localName) }
var el2 = doc.createElement('otherElement')
doc.firstChild.appendChild(el2)
var xmlSerializer = new XMLSerializer()
console.log(xmlSerializer.serializeToString(doc))
}
});
Outputs:
<myTag xmlns="null" myAttr="hello"><otherElement/></myTag>
Which is mostly want I want - case preserved - except for the xmlns="null" which I think is a recent bug in xmlserialiser
I tried to polyfill XMLParser as suggested:
function XMLSerializer() {
}
XMLSerializer.prototype.serializeToString = function(node) {
var text = jsdom.serializeDocument(node);
return text;
};
Unfortunately serializeDocument
is no longer part of the API. serialize()
only works on the whole document; and it is not possible to create a new document from a HTML node (only from a string). It's not possible to turn a node into a string without a XML parser. Especially if you are trying to serialize a doctype.
My solution was to use xmldom
:
import { XMLSerializer } from 'xmldom';
global.XMLSerializer = XMLSerializer;
My current version looks like this:
function XMLSerializer() {
};
XMLSerializer.prototype.serializeToString = function(node) {
if (!node) {
return '';
}
// Fix a jsdom issue where all SVG tagNames are lowercased:
// https://github.com/tmpvar/jsdom/issues/620
var text = node.outerHTML;
var tagNames = ['linearGradient', 'radialGradient', 'clipPath', 'textPath'];
for (var i = 0, l = tagNames.length; i < l; i++) {
var tagName = tagNames[i];
text = text.replace(
new RegExp('(<|</)' + tagName.toLowerCase() + '\\b', 'g'),
function(match, start) {
return start + tagName;
}
);
}
return text;
};
@lehni that doesn't return doctypes if you pass in document.documentElement as node, even though browsers do
I'm not saying it's perfect, but it works for my use-case
@domenic, @tommedema, @ianks, @lehni. There is a complete javascript xml-serializer that i created and is avaliable as an npm package. It follows the w3c spec and even added some improvements. its serialization is neat and accurate with 99% test coverage.
Oh dear, I wish we'd known about that a month ago! @Sebmaster created our own package, w3c-xmlserializer, and #2282 has the code to integrate it all ready to go... I just need to find some time to work on jsdom -_-.
Alright @domenic. sorry it came late.
I think this is it! Closing due to the release of v13.
Most helpful comment
@domenic, @tommedema, @ianks, @lehni. There is a complete javascript xml-serializer that i created and is avaliable as an npm package. It follows the w3c spec and even added some improvements. its serialization is neat and accurate with 99% test coverage.