php:7.1-fpm-alpine - unexpected script behavior

Created on 2 Apr 2018 · 6 Comments · Source: docker-library/php

Hello!
This script behaves differently under the PHP build you distribute than under every other interpreter I tried (even online ones):

<?php
$elementMarkup = '
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8"/>
        <title>Title</title>
    </head>
    <body>

    </body>
    </html>
';

$DOMDocument = new DOMDocument();

$elementMarkup = mb_convert_encoding($elementMarkup, 'HTML-ENTITIES', 'UTF-8');

$DOMDocument->loadHTML($elementMarkup);

$DOMDocument->removeChild($DOMDocument->doctype);

var_dump($DOMDocument->firstChild->firstChild->firstChild);

What is expected

object(DOMElement)#2 (18) {
  ["tagName"]=>
  string(4) "meta"
  ["schemaTypeInfo"]=>
  NULL
  ["nodeName"]=>
  string(4) "meta"
  ["nodeValue"]=>
  string(0) ""
  ["nodeType"]=>
  int(1)
  ["parentNode"]=>
  string(22) "(object value omitted)"
  ["childNodes"]=>
  string(22) "(object value omitted)"
  ["firstChild"]=>
  NULL
  ["lastChild"]=>
  NULL
  ["previousSibling"]=>
  NULL
  ["nextSibling"]=>
  string(22) "(object value omitted)"
  ["attributes"]=>
  string(22) "(object value omitted)"
  ["ownerDocument"]=>
  string(22) "(object value omitted)"
  ["namespaceURI"]=>
  NULL
  ["prefix"]=>
  string(0) ""
  ["localName"]=>
  string(4) "meta"
  ["baseURI"]=>
  NULL
  ["textContent"]=>
  string(0) ""
}

What happens when the script is executed in a container


Dockerfile

FROM php:7.1-fpm-alpine

RUN apk upgrade --update && apk --no-cache add \
    bash nodejs autoconf file g++ gcc binutils isl libatomic libc-dev musl-dev make re2c libstdc++ libgcc libcurl \
    curl-dev binutils-libs mpc1 mpfr3 gmp libgomp coreutils freetype-dev libjpeg-turbo-dev libltdl libmcrypt-dev \
    libpng-dev openssl-dev libxml2-dev expat-dev icu-dev libxslt libxslt-dev

RUN docker-php-ext-install -j$(nproc) iconv mysqli pdo pdo_mysql curl bcmath mcrypt mbstring json xml zip opcache intl xsl \
    && docker-php-ext-configure gd --with-freetype-dir=/usr/include/ --with-jpeg-dir=/usr/include/ \
    && docker-php-ext-install -j$(nproc) gd

RUN apk --no-cache add libintl gettext-dev

RUN docker-php-ext-install -j$(nproc) gettext

RUN apk add --no-cache $PHPIZE_DEPS \
    && echo no | pecl install redis \
    && docker-php-ext-enable redis

RUN apk add --no-cache $PHPIZE_DEPS \
    && pecl install xdebug \
    && docker-php-ext-enable xdebug

RUN apk add postfix

RUN postfix start

ADD launch.sh /launch

CMD ["/launch"]

launch.sh

#!/bin/bash -e

postfix start
php-fpm

php.ini

I tried to find the cause by comparing libxml versions, but could not. I hope you can clear things up.

This snippet was taken from a library that October CMS depends on, and the issue makes it impossible to run the whole project in a container.
I also tried downgrading PHP to 7.0, but it didn't help.
Thank you.

All 6 comments

Looks like this is a quirk of something in Alpine (either Alpine's libc implementation, musl, or Alpine's version of libxml); I can reproduce with Alpine's php7 package as well (so there's probably not anything we can do here):

$ cat test.php
<?php
$elementMarkup = '
    <!DOCTYPE html>
...

$ docker run -it --rm -v "$PWD:/foo" -w /foo alpine:3.7
/foo # apk add --no-cache php7 php7-xsl php7-xml php7-mbstring
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/15) Installing php7-common (7.1.17-r0)
(2/15) Installing ncurses-terminfo-base (6.0_p20171125-r0)
(3/15) Installing ncurses-terminfo (6.0_p20171125-r0)
(4/15) Installing ncurses-libs (6.0_p20171125-r0)
(5/15) Installing libedit (20170329.3.1-r3)
(6/15) Installing pcre (8.41-r1)
(7/15) Installing libxml2 (2.9.7-r0)
(8/15) Installing php7 (7.1.17-r0)
(9/15) Installing php7-mbstring (7.1.17-r0)
(10/15) Installing php7-xml (7.1.17-r0)
(11/15) Installing php7-dom (7.1.17-r0)
(12/15) Installing libgpg-error (1.27-r1)
(13/15) Installing libgcrypt (1.8.1-r0)
(14/15) Installing libxslt (1.1.31-r0)
(15/15) Installing php7-xsl (7.1.17-r0)
Executing busybox-1.27.2-r7.trigger
OK: 21 MiB in 26 packages
/foo # php test.php
NULL

(If I switch from the Alpine-based images to the Debian-based images, the provided script works properly as-is in both PHP 7.1 _and_ 7.2.)
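
One way to test the libxml hypothesis is to print the libxml2 version each PHP build is linked against and compare the Alpine and Debian images. The sketch below is only an illustration: `libxmlAtLeast` is a hypothetical helper, not something from this thread, and the 2.9.5 threshold is taken from the explanation given further down.

```php
<?php
// Hypothetical helper (not from the thread): reports whether the libxml2
// that this PHP build is linked against is at or above $minimum.
function libxmlAtLeast(string $minimum): bool
{
    // LIBXML_DOTTED_VERSION is a built-in constant such as "2.9.7".
    return version_compare(LIBXML_DOTTED_VERSION, $minimum, '>=');
}

echo 'PHP ', PHP_VERSION, ', libxml2 ', LIBXML_DOTTED_VERSION, PHP_EOL;

// Assumption: 2.9.5 is the release named later in this thread as the
// point where the HTML parser's whitespace handling changed.
echo libxmlAtLeast('2.9.5')
    ? 'libxml2 is 2.9.5 or newer'
    : 'libxml2 is older than 2.9.5',
    PHP_EOL;
```

Running the same file in both images (for example with `docker run --rm -v "$PWD:/foo" -w /foo <image> php check.php`, where `check.php` is whatever name the file was saved under) shows whether the two builds differ in libxml2 rather than in PHP itself.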

@tianon yes. Because the Alpine image ships with a 'broken' PHP, we switched to the Debian Jessie image.

Closing, since the issue is inherent to Alpine and not something we would change in the image.

Is there any plan to adjust the PHP sources so that the issue is fixed? There have been a lot of problems with musl and a lot of software (e.g. MySQL), but time moves on: Alpine-based images are getting more and more popular, and more and more software is being adjusted so that it compiles and runs fine in Alpine-based environments...

This actually has nothing to do with Alpine, musl or PHP...

What you're actually seeing is the effect of https://github.com/GNOME/libxml2/commit/0b2d5c48e3e0c16e434450057927ad4aa52f9f5c .
By default, xmlKeepBlanksDefaultValue is true (resulting in var_dump -> NULL), but libxml2 before v2.9.5 did not initialize the parser context's keepBlanks field from it in htmlInitParserCtxt, so the field stayed false (resulting in var_dump -> object(DOMElement)...). The Alpine image you were testing against already had a newer libxml2 than "all other interpreters".

The discrepancy disappears as soon as htmlCtxtUseOptions is called, but PHP does not call it for loadHTML(..., $options=0). To get the pre-v2.9.5 behavior, you might want to use loadHTML(..., LIBXML_NOBLANKS).
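
The suggested workaround could look roughly like the sketch below, which parses the markup from the original report once with no options and once with LIBXML_NOBLANKS, then prints which node the original `var_dump` expression reaches. `firstNestedNode` and `describeNode` are hypothetical helpers introduced for illustration. The `mb_convert_encoding` step is left out on the assumption that it does not matter for this pure-ASCII sample.

```php
<?php
$markup = '
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8"/>
        <title>Title</title>
    </head>
    <body>

    </body>
    </html>
';

// Hypothetical helper (not from the thread): parse $markup with the given
// libxml options and return the node the original script dumps, i.e.
// $doc->firstChild->firstChild->firstChild after removing the doctype.
function firstNestedNode(string $markup, int $options): ?DOMNode
{
    // Collect parser warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($markup, $options);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop the doctype, as the original script does, if one was parsed.
    if ($doc->doctype !== null) {
        $doc->removeChild($doc->doctype);
    }

    // Walk three firstChild steps, stopping early if a step yields null.
    $node = $doc;
    for ($i = 0; $i < 3 && $node !== null; $i++) {
        $node = $node->firstChild;
    }
    return $node;
}

// Hypothetical helper: short, human-readable summary of a node.
function describeNode(?DOMNode $node): string
{
    return $node === null ? 'NULL' : get_class($node) . ' ' . $node->nodeName;
}

echo 'libxml2 ', LIBXML_DOTTED_VERSION, PHP_EOL;
echo 'options = 0:     ', describeNode(firstNestedNode($markup, 0)), PHP_EOL;
echo 'LIBXML_NOBLANKS: ', describeNode(firstNestedNode($markup, LIBXML_NOBLANKS)), PHP_EOL;
```

If the explanation above holds, the `options = 0` line should vary with the linked libxml2 version, while the LIBXML_NOBLANKS line should not. Passing the flag explicitly keeps the script from depending on a library default.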
