Ghost Blog fails RSS validation

Created on 6 May 2017 · 16 Comments · Source: TryGhost/Ghost

Issue Summary

My blog, running Ghost 0.11.8 and hosted at blog.alexellis.io, fails RSS validation and cannot be parsed by some utilities and syndication services.

Steps to Reproduce

Embedded YouTube videos

The primary issue appears to be that the embed code from a YouTube video is copied into the feed, and that code uses an iframe, which is not allowed in RSS feeds.

Embedded Tweets

This fails for encoding reasons.

See the output of the following for more details:

https://validator.w3.org/feed/check.cgi?url=blog.alexellis.io

Technical details:

  • Ghost Version: 0.11.8
  • Node Version: 6.9.4
  • Browser/OS: n/a - Linux
  • Database: sqlite

Ideally there should be a setting to opt out of copying Tweets and iframes into the RSS feed summary. I've run Ghost for 18+ months and was not aware of this problem before.

Labels: bug · help wanted · models / data · server / core

All 16 comments

One of the other issues raised by a Golang RSS parser is:

WARN[0001] http://blog.alexellis.io/rss/ impossible to read. I jump it please verify error=XML syntax error on line 1323: illegal character code U+0010

Here's the line:

1323 </ul>]]></content:encoded></item><item><title><![CDATA[Docker's 4th Birthday in London]]></title><description><![CDATA[The Docker London meet-up group celebrated Docker's 4th birthday in style at HPEs head office in the City. Get my take as a mentor and Captain.]]></description><link>http://blog.alexellis.io/dockers-4th-birthday/</link><guid isPermaLink="false">ff0a6f62-7b9d-4bf3-b269-df4182ebd084</guid><category><![CDATA[meetup]]></category><category><![CDATA[docker]]></category><category><![CDATA[birthday event]]></category><dc:creator><![CDATA[Alex Ellis]]></dc:creator><pubDate>Wed, 22 Mar 2017 08:06:00 GMT</pubDate><content:encoded><![CDATA[<p>Docker has been <a href="https://blog.docker.com/2017/03/thank-you-docker-community-2/">celebrating its 4th birthday</a> all over the world with meet-up groups having parties, birthday cakes, stickers and open events for learning about containers.</p>

I have no idea how such a character could have appeared in the Ghost output, or how to remove it (if that is possible at all): http://graphemica.com/0010
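One way to find such characters before they reach the feed is to scan each post for code points that XML 1.0 does not allow. XML 1.0's Char production permits only tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF. The Python sketch below is not part of Ghost. `find_illegal_xml_chars` is a hypothetical helper, and it assumes the post body is available as a plain string.

```python
import re

# Code points outside the XML 1.0 Char production: C0 controls other than
# tab/LF/CR, UTF-16 surrogates, and the noncharacters U+FFFE and U+FFFF.
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def find_illegal_xml_chars(text):
    """Return (line, column, code point) for every character XML 1.0 forbids.

    Lines and columns are 1-based. The text is split on LF only, because
    str.splitlines() would also treat some illegal characters (for example
    U+000B, U+000C, U+001C-U+001E) as line breaks and so hide them.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in ILLEGAL_XML_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, f"U+{ord(match.group()):04X}"))
    return hits


post = "Docker has been celebrating\nits 4th\x10 birthday"
print(find_illegal_xml_chars(post))  # [(2, 8, 'U+0010')]
```

Run against each post in turn, this would have pointed straight at the paragraphs containing U+0010, rather than needing the validator to find them one at a time.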

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">At the <a href="https://twitter.com/docker">@docker</a> birthday meet-up in London with loads of keen Dockers!! Let&#39;s get started <a href="https://twitter.com/hashtag/DockerLondon?src=hash">#DockerLondon</a> <a href="https://t.co/DaqT4zpI0T">pic.twitter.com/DaqT4zpI0T</a></p>&mdash; Alex Ellis (@alexellisuk) <a href="https://twitter.com/alexellisuk/status/843896691971514368">March 20, 2017</a></blockquote> <script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>

Hey @alexellis, thanks for your report, and sorry to hear you are having trouble with feed validation.

I've raised an issue in one of our dependencies to request that script tags be removed from feeds at the library level rather than in Ghost.
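Below is a rough sketch of that kind of filtering, written in Python rather than in the dependency's own language. It also covers iframes, which is the YouTube case from the original report. `strip_embeds` is a hypothetical helper. The regular expression is deliberately naive: it assumes non-nested `<script>` and `<iframe>` pairs and knows nothing about HTML comments or CDATA. A production version would be better built on a real HTML parser.

```python
import re

# Naive pattern for <script>...</script> and <iframe>...</iframe> pairs.
# IGNORECASE also catches <SCRIPT>; DOTALL lets ".*?" span newlines inside an element.
EMBED_ELEMENTS = re.compile(
    r"<script\b[^>]*>.*?</script\s*>|<iframe\b[^>]*>.*?</iframe\s*>",
    re.IGNORECASE | re.DOTALL,
)


def strip_embeds(html):
    """Remove script and iframe elements (tags and their contents) from an HTML string."""
    return EMBED_ELEMENTS.sub("", html)


tweet = ('<blockquote class="twitter-tweet">At the meet-up</blockquote> '
         '<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
print(strip_embeds(tweet))
```

The `<blockquote>` that Twitter's embed code includes is left in place, so feed readers would still show the tweet text as a plain quote.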

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

Your feed is still valid and the validator shows only recommendations, which is why I would rate this issue as low priority.

We would be thankful if someone could take over this issue 🙃

Hi @kirrg001, the feed is now valid thanks to some changes I made. I found several invisible Unicode characters hidden within 2-3 of the blog posts. I located each affected paragraph with the validator, rewrote it, and deleted the old one. After that the feed passed XML validation. The characters were U+0010.

This should be something Ghost can check for and warn about. It caused a silent failure of my blog's RSS feed, and the blog had 70k page views last month. I'd like to see some kind of info message or warning.

Yeah agree, that is annoying 😶

There is an open issue for the Unicode character problem you experienced, see https://github.com/dylang/node-rss/issues/49.
There was even an implementation approach, see https://github.com/ErisDS/Ghost/commit/7acb3f9df3e7f2cec54eae8173de6a3947bfaaf8.
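A sanitisation step along these lines could remove the characters XML 1.0 forbids and also report how many it dropped. That way the blog owner can be warned instead of the feed failing silently. This is a Python sketch under that assumption, not the code from the linked commit. `strip_illegal_xml_chars` is a hypothetical name.

```python
import re

# Code points outside the XML 1.0 Char production (same set the validator rejects).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]")


def strip_illegal_xml_chars(text):
    """Return (cleaned_text, removed_count) with XML 1.0-illegal characters removed."""
    # re.subn returns the new string together with the number of substitutions made.
    return ILLEGAL_XML_CHARS.subn("", text)


body, removed = strip_illegal_xml_chars("in style\x10 at the head office")
if removed:
    print(f"warning: removed {removed} character(s) that are not allowed in XML")
```

Stripping is lossy in principle, but control characters such as U+0010 have no visible rendering, so nothing a reader would notice is lost. An alternative design is to reject the post at save time and show the positions to the author.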

I'm getting the same error, and it's rather annoying because it's so vague:

XML parsing error: <unknown>:304:0: not well-formed (invalid token)

</p>]]></content:encoded></item><item><title><![CDATA[my boring title]]></title><description><![CDATA[<p>
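The `<unknown>:304:0:` prefix looks like the format Python's xml.sax uses for Expat errors (system ID, line, column). The two numbers therefore say where parsing stopped, even though the message itself is vague. Below is a hedged Python sketch for pinpointing that position locally. It assumes the feed has already been downloaded as bytes (the fetch is left out), and `first_xml_error` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def first_xml_error(feed_bytes):
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(feed_bytes)
    except ET.ParseError as err:
        line, column = err.position  # ParseError records where the parser stopped
        return line, column, str(err)
    return None


# Hypothetical feed fragment with a U+0010 control character at the start of line 2.
sample = b"<rss><channel>\n\x10<title>t</title></channel></rss>"
print(first_xml_error(sample))
```

With the line number in hand, the surrounding text identifies which post to inspect, which is faster than bisecting the feed by hand.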

Still no fix?

I fixed it by removing some trailing whitespace from the very end of a single article.

Not ideal, but I used these to help diagnose the problem:
https://validator.w3.org
https://validator.w3.org/feed/check.cgi

Validation of each article by Ghost would be 🎉

Can this be closed now?

yes

(Email reply to Miguel Piedrafita's comment of Oct 28, 2018, 14:22: https://github.com/TryGhost/Ghost/issues/8442#issuecomment-433742641)

The feed generator is allowing illegal characters to enter the feed. Requiring a human to remove invisible characters sounds like a workaround rather than a fix.

Sounds like no, it can't be closed. Issue is still prevalent and super hard to fix manually.

@m1guelpf There is no need for you to comment "can this be closed now" on every open issue, please stop doing it.

Sorry :(

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

We received a couple of bug reports for this issue. Not stale.

The feeds are valid. Removing unsupported iframes is an interoperability recommendation, not an error.

Nobody has worked on this in the last 2 years and the issue has no traction. If someone wants to work on it later then it can always be re-opened.

If there's any work needed around Unicode character sanitisation, that should be specced in a separate issue so that someone is able to pick it up. Currently there's not even enough information for anyone to work on it, so it's a dead end.



Related issues

rishabhgrg · 3 Comments

kirrg001 · 3 Comments

krokofant · 3 Comments

mattferderer · 4 Comments

hjzheng · 4 Comments