October: Mail HTML content gets trimmed when top-level text-node is found

Created on 22 Jan 2021  路  5Comments  路  Source: octobercms/october

  • OctoberCMS Build: 470
  • PHP Version: 7.4, 7.3

Description:

If you have an HTML mail template that contains a top-level text-node, the HTML content after (and including) the text-node are stripped from the rendered email template.

For example, the following mail template content would be stripped:

subject = "Test email"
==
<p>Should I stay?</p> or <p>Should I go now?</p>

Instead of <p>Should I stay?</p> or <p>Should I go now?</p> it would be stripped to <p>Should I stay?</p>.

Workaround

The issue seems to be easily worked around by performing one of the following actions:

Remove text-node or wrap it in an element, such as <p>

subject = "Test email"
==
<p>Should I stay?</p> <p>or</p> <p>Should I go now?</p>

Wrap all top-level nodes in parent element

subject = "Test email"
==
<div>
    <p>Should I stay?</p> or <p>Should I go now?</p>
</div>

Steps To Reproduce:

  1. Create new mail template with 2x top-level elements and a top-level text-node in between (e.g. example above)
  2. Register this mail template (e.g. acme.test::mail.test)
  3. Send an email using this template (e.g. Mail::send('acme.test::mail.test'))

You should then receive the email containing only the first HTML element.

Deeper look

The message goes through various classes with the correct template then it reaches the Parsedown class:

MailManager.php#143: $html = $this->renderTemplate($template, $data);
MailManager.php#189: $html = $this->render($template->content_html, $data);
MailManager.php#180: $html = Markdown::parseSafe($html);
Markdown.php#74: $result = $this->parse($text);
Markdown.php#46: return $this->parseInternal($text);
Markdown.php#103: $result = $this->getParser()->$method($result);
ParsedownExtra.php#46: $markup = parent::text($text);
Parsedown.php#39: $markup = $this->lines($lines);

It appears that Parsedown->lines() is modifying the HTML content and stripping the text-node and content after.

I don't imagine this would be intentional as it's still technically valid HTML. A quick fix (although a bit hacky) would be to wrap the content in a <div> element before passing it down for parsing, then unwrap the div on the way back up.

Low Accepted Bug

Most helpful comment

Oh ho, an undercover agent eh? 馃槀 As long as we include some unit tests for it I'm fine with having that fix made into a PR @bradietilley

All 5 comments

I don't imagine this would be intentional as it's still technically valid HTML

Are unwrapped text nodes actually valid HTML? Just because a browser will render them doesn't necessarily mean they're valid HTML.

I understand what you mean, although I would say the nodes inside these partials should [almost] never be treated as top level nodes as they're always nested within some other form of HTML. If these partials were dispatched on their own (i.e. no layout) then I would for sure understand them being stripped, but all mail partials have a parent layout (and layouts are [probably] never empty).

The issue at hand is LIKE the following example: You have a top level <td> element in a mail template that is a child partial to a layout that contains: <tr>{partial-here}</tr>. Yes, the <td> element(s) on their own are definitely invalid, but not in the context used.

Another example would be an alert component/partial that has the following content: <div class="alert alert-{{type}}">{% partial 'alert-content' message=message %}</div>, and then the 'alert-content.htm' partial contains the following: Your account has been suspended. The HTML content inside alert-content.htm is invalid, but it's not valid within the parent div.

While I understand this is definitely low priority, it's likely people run into this issue every now and then. For me I came across this because there was a stray semi-colon that I accidentally had floating around in my mail template, and half of the email was missing. Thankfully the project is still in development and the email had nothing important (codes, reset links, etc) so no harm no foul.

In my opinion, provided it doesn't affect others negatively, the following should be changed:

$html = Markdown::parseSafe($html);

to

$html = substr(Markdown::parseSafe('<div>' . $html . '</div>'), 5, -6); // or something a little more stable

Just a thought anyway :)

Your suggested fix seems pretty decent to me, any thoughts on the above @bennothommo?

@LukeTowers I agree (fair disclosure: @bradietilley is my colleague, and we've been discussing this in-house) :P

Oh ho, an undercover agent eh? 馃槀 As long as we include some unit tests for it I'm fine with having that fix made into a PR @bradietilley

Was this page helpful?
0 / 5 - 0 ratings

Related issues

sozonovalexey picture sozonovalexey  路  3Comments

mittultechnobrave picture mittultechnobrave  路  3Comments

EbashuOnHolidays picture EbashuOnHolidays  路  3Comments

oppin picture oppin  路  3Comments

m49n picture m49n  路  3Comments