I'm trying to debug a segfault while using Gatsby that is potentially in the realm of using sharp.
(I'm a performance eng for Gatsby)
It seems that sites with lots of images can trigger a segfault and we're not sure when exactly this happens.
I get this fairly frequently with a simple site that just has many images (~1500 for my machine) on the example site posted (in http://github.com/gatsbyjs/gatsby/pull/6291) as a repro;
Clone https://github.com/jp887/gatsby-issue6291-repro
Update package.json to set all gatsby deps to *
Clear the npm cache (~/.npm)
Clear node_modules, in case that's relevant
npm install
yarn gen-images 1500 (It triggers with 1500 images for me, this will try to process 9000 images in total)
gatsby build
For me, often it segfaults roughly between 25% and 75%.
I'm on ubuntu (xfce) 19.10
Node v8.16.2 (through nvm)
I went through some of the older Gatsby issues relating to this bug. I am able to reproduce it with .simd(true) and .simd(false), and same for .cache(true) and .cache(false).
Reducing the available memory to node does not seem to make a difference (I set it to 500mb, half of the default, but saw no increase in crash rate). Was also able to repro it running single core.
Please let me know how I can help :)
Hi Peter, I previously helped one of the organisations experiencing the problems reported in https://github.com/gatsbyjs/gatsby/issues/6291 and there's a summary of the findings in https://github.com/gatsbyjs/gatsby/issues/6291#issuecomment-505097465
Thanks for bringing it to my attention again. By stressing my machine with other CPU-bound processes at the same time until it overheated and throttled the clock rate, I finally managed to reproduce a crash and get a backtrace using the latest sharp v0.23.3, its vendored copy of libvips v8.8.1 and the latest Node.js LTS v12.13.1.
(sharp:16801): GLib-GObject-WARNING **: 16:50:33.759: gtype.c:4265: type id '0' is invalid
(sharp:16801): GLib-GObject-WARNING **: 16:50:33.759: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:16801): GLib-GObject-WARNING **: 16:50:33.762: gvalue.c:188: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
[Thread 0x7fff9affd700 (LWP 7136) exited]
Thread 10 "node" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd6ffc700 (LWP 17345)]
0x00007fffd52c27d7 in g_type_check_value () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libgobject-2.0.so.0
(gdb) bt
#0 0x00007fffd52c27d7 in g_type_check_value () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libgobject-2.0.so.0
#1 0x00007fffd52c54e3 in g_value_transform () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libgobject-2.0.so.0
#2 0x00007fffd5a3fe97 in ?? () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#3 0x00007fffd5a3ff07 in ?? () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#4 0x00007fffd5a50485 in vips_slist_map2 () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#5 0x00007fffd5a405a4 in vips.image_copy_fields_array () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#6 0x00007fffd5a3b262 in vips_image_pipeline_array () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#7 0x00007fffd5a3b363 in vips_image_pipelinev () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#8 0x00007fffd5939a16 in ?? () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#9 0x00007fffd5a31b29 in vips_object_build () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#10 0x00007fffd5a3cf98 in vips_cache_operation_buildp () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#11 0x00007fffd5a4339d in vips_call_required_optional () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#12 0x00007fffd5a43eaf in vips_call_split () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#13 0x00007fffd5939d89 in vips_copy () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#14 0x00007fffd5a383dd in vips_image_decode () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#15 0x00007fffd589e075 in ?? () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#16 0x00007fffd5a31b29 in vips_object_build () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#17 0x00007fffd5a3cf98 in vips_cache_operation_buildp () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#18 0x00007fffd5a4339d in vips_call_required_optional () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#19 0x00007fffd5a43eaf in vips_call_split () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#20 0x00007fffd589e9f7 in vips_reducev () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#21 0x00007fffd58954e5 in ?? () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#22 0x00007fffd5a31b29 in vips_object_build () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#23 0x00007fffd5a3cf98 in vips_cache_operation_buildp () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#24 0x00007ffff41fe905 in vips::VImage::call_option_string(char const*, char const*, vips::VOption*) ()
from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips-cpp.so.42
#25 0x00007ffff4210ee3 in vips::VImage::resize(double, vips::VOption*) const () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips-cpp.so.42
#26 0x00007ffff444d7b1 in PipelineWorker::Execute() () from /gatsby-issue6291-repro/node_modules/sharp/build/Release/sharp.node
#27 0x00000000012d1c0e in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#28 0x00007ffff7c42182 in start_thread (arg=<optimised out>) at pthread_create.c:486
#29 0x00007ffff7b6bb1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
From what I can tell, the Gatsby build process is queuing filesystem-based input images one-at-a-time then attempting to concurrently generate 6 output images to memory, resized in various dimensions. All of these images are JPEGs.
sharp uses libuv-managed threads, which by default will limit this to 4 concurrently (and queue the other 2). It looks like Gatsby calls sharp.concurrency() and therefore vips_concurrency_set with the physical core count, so I think that's 12 concurrent threads accessing the same input image on a typical "4 core" CPU with hyper-threading.
If I set the GATSBY_CPU_COUNT=1 environment variable, which Gatsby uses to call sharp.concurrency(1) and therefore vips_concurrency_set(1) I cannot get it to crash, which suggests a possible threading problem. Setting GATSBY_CPU_COUNT=1 was part of the summary of https://github.com/gatsbyjs/gatsby/issues/6291#issuecomment-505097465 although in that context was due to CircleCI incorrectly reporting physical CPU core counts.
@jcupitt Can you see any possible thread-safety issues in this backtrace? Perhaps something in vips__image_copy_fields_array and vips__gslist_gvalue_merge?
I'll attempt to get it to crash with a debug build of libvips to try to discover which functions the ?? in the backtrace refer to.
Hi. I'm really happy to hear you were able to get a repro AND trace, as I was unable to get the trace.
I will try to go through the code in Gatsby tomorrow and see how we can improve that situation. I don't think it should be hammering the process like that in the first place as that has different perf implications as well.
By stressing my machine with other CPU-bound processes at the same time until it overheated and throttled the clock rate, I finally managed to reproduce a crash
That's interesting because on my machine, this happens all the time (whenever I trigger any computation) and have these messages in my logs every time. So perhaps the problem is related to clock throttling and stressing.
Anyways. Happy to hear you have a repro and hoping it'll improve the situation :)
(Perhaps my test with single cpu was botched, I can't say for sure right now, so take that with a grain of salt)
Here's a more detailed backtrace using the latest libvips master branch compiled from source:
#0 0x00007ffff405f047 in g_type_check_value () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#1 0x00007ffff4061d53 in g_value_transform () at /usr/lib/x86_64-linux-gnu/libgobject-2.0.so.0
#2 0x00007ffff420a9d7 in meta_new (image=0x7fffc8068640, name=0x7fffb40008d0 "\020\"", value=0x7fffd002de70) at header.c:325
#3 0x00007ffff420aa47 in meta_cp_field (meta=<optimised out>, dst=<optimised out>) at header.c:916
#4 0x00007ffff42144f5 in vips_slist_map2 (list=<optimised out>, fn=fn@entry=0x7ffff420aa30 <meta_cp_field>, a=a@entry=0x7fffc8068640, b=b@entry=0x0) at util.c:129
#5 0x00007ffff420b114 in meta_cp (src=0x7fffc00aa4d0, dst=0x7fffc8068640) at header.c:934
#6 0x00007ffff420b114 in vips__image_copy_fields_array (out=out@entry=0x7fffc8068640, in=in@entry=0x7fffd7ffae00) at header.c:981
#7 0x00007ffff4205e92 in vips_image_pipeline_array (image=image@entry=0x7fffc8068640, hint=hint@entry=VIPS_DEMAND_STYLE_THINSTRIP, in=in@entry=0x7fffd7ffae00) at generate.c:395
#8 0x00007ffff4205f9f in vips_image_pipelinev (image=0x7fffc8068640, hint=hint@entry=VIPS_DEMAND_STYLE_THINSTRIP) at generate.c:435
#9 0x00007ffff4153459 in vips_copy_build (object=0x7fff60070380) at copy.c:183
#10 0x00007ffff41fbdd9 in vips_object_build (object=0x7fff60070380) at object.c:367
#11 0x00007ffff4207ce8 in vips_cache_operation_buildp (operation=0x7fffd7ffd160) at cache.c:866
#12 0x00007ffff4207ce8 in vips_cache_operation_buildp (operation=operation@entry=0x7fffd7ffd160) at cache.c:842
#13 0x00007ffff420dd5d in vips_call_required_optional (operation=operation@entry=0x7fffd7ffd160, required=required@entry=0x7fffd7ffd190, optional=optional@entry=0x7fffd7ffd270) at operation.c:880
#14 0x00007ffff420e466 in vips_call_by_name (operation_name=<optimised out>, option_string=option_string@entry=0x0, required=required@entry=0x7fffd7ffd190, optional=optional@entry=0x7fffd7ffd270)
at operation.c:920
#15 0x00007ffff420e8b9 in vips_call_split (operation_name=operation_name@entry=0x7ffff4245ff6 "copy", optional=optional@entry=0x7fffd7ffd270) at operation.c:1024
#16 0x00007ffff41537c6 in vips_copy (in=in@entry=0x7fffc00aa4d0, out=out@entry=0x7fffd00d8e30) at copy.c:399
#17 0x00007ffff4202f6d in vips_image_decode (in=in@entry=0x7fffc00aa4d0, out=out@entry=0x7fffd00d8e30) at image.c:2910
#18 0x00007ffff40edf45 in vips_reducev_build(VipsObject*) (object=0x7fffd00d84c0) at reducev.cpp:860
#19 0x00007ffff41fbdd9 in vips_object_build (object=0x7fffd00d84c0) at object.c:367
#20 0x00007ffff4207ce8 in vips_cache_operation_buildp (operation=0x7fffd7ffd550) at cache.c:866
#21 0x00007ffff4207ce8 in vips_cache_operation_buildp (operation=operation@entry=0x7fffd7ffd550) at cache.c:842
#22 0x00007ffff420dd5d in vips_call_required_optional (operation=operation@entry=0x7fffd7ffd550, required=required@entry=0x7fffd7ffd580, optional=optional@entry=0x7fffd7ffd660) at operation.c:880
#23 0x00007ffff420e466 in vips_call_by_name (operation_name=<optimised out>, option_string=option_string@entry=0x0, required=required@entry=0x7fffd7ffd580, optional=optional@entry=0x7fffd7ffd660)
at operation.c:920
#24 0x00007ffff420e8b9 in vips_call_split (operation_name=operation_name@entry=0x7ffff423f420 "reducev", optional=optional@entry=0x7fffd7ffd660) at operation.c:1024
#25 0x00007ffff40ee574 in vips_reducev(VipsImage*, VipsImage**, double, ...) (in=in@entry=0x7fffc00aa4d0, out=out@entry=0x7fffd0045b60, vshrink=vshrink@entry=1.0666666666666667) at reducev.cpp:976
#26 0x00007ffff40e64d5 in vips_resize_build (object=0x7fff6c0c08b0) at resize.c:227
#27 0x00007ffff41fbdd9 in vips_object_build (object=0x7fff6c0c08b0) at object.c:367
#28 0x00007ffff4207ce8 in vips_cache_operation_buildp (operation=0x7fffd7ffd810) at cache.c:866
#29 0x00007ffff4207ce8 in vips_cache_operation_buildp (operation=operation@entry=0x7fffd7ffd810) at cache.c:842
#30 0x00007ffff4581b50 in vips::VImage::call_option_string(char const*, char const*, vips::VOption*) (operation_name=<optimised out>, option_string=0x0, options=0x7fffd002d8f0) at VImage.cpp:513
#31 0x00007ffff4597e43 in vips::VImage::resize(double, vips::VOption*) const (this=0x7fffd7ffd908, scale=0.9375, options=<optimised out>) at ../cplusplus/include/vips/VImage8.h:108
#32 0x00007ffff45fab5c in PipelineWorker::Execute() () at /gatsby-issue6291-repro/node_modules/sharp/build/Release/sharp.node
#33 0x00000000012d1c0e in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#34 0x00007ffff7c42182 in start_thread (arg=<optimised out>) at pthread_create.c:486
#35 0x00007ffff7b6bb1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
That's very interesting. I'll have a look today.
I think I found what's making those message before the crash:
https://github.com/libvips/libvips/blob/master/libvips/iofuncs/header.c#L311
If you change that to be:
/*
if( G_VALUE_TYPE( value ) == G_TYPE_STRING )
g_value_init( &meta->value, VIPS_TYPE_REF_STRING );
else
g_value_init( &meta->value, G_VALUE_TYPE( value ) );
*/
g_value_init( &meta->value, 0 );
And try vips copy x.jpg y.jpg you see:
$ vips copy k2.jpg x.v
(vips:14583): GLib-GObject-WARNING **: 16:33:11.184: ../../../gobject/gtype.c:4268: type id '0' is invalid
(vips:14583): GLib-GObject-WARNING **: 16:33:11.184: can't peek value table for type '<invalid>' which is not currently referenced
(vips:14583): GLib-GObject-WARNING **: 16:33:11.184: ../../../gobject/gvalue.c:185: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
The same set of errors, so probably the value param to meta_new() is pointing at a lot of zeros.
meta_cp takes a shallow copy of the list of metadata items on each input image, then calls meta_new() for each item. A metadata item is disappearing (probably) some time between the shallow copy of the list and calling meta_new().
This could be caused by an image being unreffed during the copy, but that seems unlikely to me, the chaos would be even worse.
It's more likely that some other thread is changing the metadata on the input image during the copy. Could this be happening?
libvips images have a list of arbitrary name/value pairs attached to them, and these are inherited as images are processed. They are used for things like ICC profiles and EXIF data, but user code can add extra metadata too if it's convenient. Image metadata is supposed to be immutable: after the image has been constructed, it's all frozen and must not be changed or you get bad behaviour like this.
However, at the moment, there's no lock mechanism to prevent you changing image metadata. How about adding something to enforce this? It could be very simple, eg:
void
vips_image_set( VipsImage *image, const char *name, GValue *value )
{
g_assert( name );
g_assert( value );
/* If this image is shared, block metadata changes.
*/
if( G_OBJECT( image )->ref_count > 1 ) {
g_info( "can't set metadata \"%s\" on shared image", name );
return;
}
meta_init( image );
(void) meta_new( image, name, value );
... etc.
Thanks John, amazing debugging skills there as always.
If I add your suggestion to libvips and run the sharp test suite against it I see things like:
can't set metadata "exif-data" on shared image
can't set metadata "hide-progress" on shared image
can't set metadata "offset" on shared image
which led me to a few possible sources within libvips:
https://github.com/libvips/libvips/blob/master/libvips/foreign/jpeg2vips.c#L613
https://github.com/libvips/libvips/blob/master/libvips/foreign/dzsave.c#L599
https://github.com/libvips/libvips/blob/master/libvips/convolution/convi.c#L841
Next step (for me) is to try to reproduce the original crash with this extra debugging in place.
Yes, bits of the test suite fail too, I'll fix them.
Thanks John, I also added this debug to vips_image_remove() and found the following occurrence:
https://github.com/libvips/libvips/blob/master/libvips/colour/profile_load.c#L208
Commit bb15cd9 ensures sharp always takes a copy before updating/removing metadata.
I made a PR to gather up any changes libvips needs for this. Please push any related patches there. I put your nice _remove one in.
I found a couple of places in libvips which were breaking the "metadata is frozen" rule.
The big one was vips__exif_update(): this merges other metadata fields into the exif block and is called before a save to formats that support exif. I think this is a plausible cause of these crashes: if one thread updates the exif block while another thread is trying to take a copy of the image metadata you'd see something like this.
I'll backport https://github.com/libvips/libvips/pull/1483 (but not the locks on metadata changes) to current stable 8.8 and it should be easy to test.
OK, the head of 8.8 (it's 8.8.4 at the moment) includes this change to metadata handling and (I hope!) should be easy to test.
Brilliant, thanks John, I've been unable to reproduce a segfault using the check-metadata-changes branch of libvips with the latest master branch of sharp.
That's great!
@pvdz would you be able to test 8.8.4? It should be a drop-in replacement for your current libvips.
(or are there related sharp changes that would be needed too, Lovell?)
I found a couple of places in sharp that could potentially cause this problem, addressed in https://github.com/lovell/sharp/commit/bb15cd906717ba054d9c23db2971e94d2e250969, which will be in sharp v0.23.4.
I'll wait for that release and check it. Sounds like great progress! Thanks :purple_heart:
I've also fixed a bug in Gatsby which has a side effect that, if we were hammering the binary before, we're certainly going to be smashing it now :p (The fix removes a perf problem relating to artificial delays caused by the cli output, so there's even way less delay now between scheduling image jobs)
Should I push out 8.8.4? Or should we test a bit first?
@kleisauke net-vips might need testing for this change too.
I found a few places in NetVips' test suite that changes metadata without first doing a copy(), which are resolved with https://github.com/kleisauke/net-vips/commit/3cd1bdf55119c9107c2de38d726cbcd5d3a6e436.
Our service also seems to have a few potential problems with modifying metadata, which are resolved with https://github.com/weserv/images/commit/e72b8fde66558f502437c016bab7e15c0117a49b. The segfault does not (yet) seem to have taken place in our production environment.
Note that I still see some GLib warnings while running libvips' test suite, see comment https://github.com/libvips/libvips/pull/1483#issuecomment-559860643.
Thanks for pointing those out, Kleis. I'll fix them over the weekend.
sharp v0.23.4 now available with the changes in commit bb15cd9. I suspect the changes to libvips discussed here will also be required to fully deal with this.
(I expect the future sharp v0.24.0 to provide a prebuilt version of a future libvips with these fixes, either v8.8.4 or v8.9.0.)
There's a pre-release of 8.8.4 here:
https://github.com/libvips/libvips/releases/tag/v8.8.4
I'll add some win binaries and test on mac, but it should be good to go.
Digging into it again today.
For each step I will;
(I'm not gatsby-dev for the sharp plugin because I'm not actually making changes to it. I'm starting on node 8 because that's where we found it the first time)
Baseline:
--> repro confirmed
[======================= ] 66.982 s 7445/9005 83% Generating image thumbnails
(sharp:24488): GLib-GObject-CRITICAL **: 14:30:50.635: g_value_transform: assertion 'G_IS_VALUE (src_value)' failed
Segmentation fault (core dumped)
After yarn, ran yarn upgrade gatsby-plugin-sharp and yarn upgrade gatsby-transfomer-sharp (not sure why that didn't work immediately when version was specified to *, but ok)
--> repro confirmed (second attempt)
[=== ] 31.495 s 1157/9005 13% Generating image thumbnails
(sharp:8265): GLib-GObject-WARNING **: 15:08:15.352: gtype.c:4265: type id '0' is invalid
(sharp:8265): GLib-GObject-WARNING **: 15:08:15.352: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:8265): GLib-GObject-WARNING **: 15:08:15.352: gvalue.c:188: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
Segmentation fault (core dumped)
How can I test with the fixed libvips version? Even if I npm install sharp --build-from-source I see
info sharp Using cached ~/.npm/_libvips/libvips-8.8.1-linux-x64.tar.gz
@pvdz Please can you try the latest code on the wit branch, which will fetch a prebuilt libvips v8.9.0 for you. Set the version in your package.json file to "sharp": "lovell/sharp#wit".
(Repros after 3x in node 10 with [email protected], [email protected], [email protected], and vips 8.8.1)
Ok I'll try that
How can I tell it's using the proper version? Looks like other things in the sub-dependencies are downloading 8.8.1 while I also see 8.9.0 being downloaded. Any way to confirm which is used at runtime?
Regardless, updated the sharp version in the site's package.json to that
rm -rf node_modules ~/.npm
npm install
yarn
gatsby-dev
gatsby clean
gatsby build
Repro confirmed (node 10):
[========== ] 36.417 s 3551/9005 39% Generating image thumbnails
(sharp:13356): GLib-GObject-WARNING **: 15:59:19.457: gtype.c:4265: type id '0' is invalid
(sharp:13356): GLib-GObject-WARNING **: 15:59:19.457: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:13356): GLib-GObject-WARNING **: 15:59:19.457: gvalue.c:188: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
Segmentation fault (core dumped)
I've tried to bump the dep versions of the gatsby plugin/transformer as well but not convinced that plays very well with gatsby-dev (it bumps the version but doesn't seem to install again). It repros that way too, but perhaps that's still leaning on 8.8.1, I can't tell.
Is there a way to tell sharp to use a particular version of the lib? Or to print which version it's using, at least?
v0.24.0 is now available with a prebuilt libvips v8.9.0.
Thank you! I'll try to repro (my) tomorrow
I think it still repros :(
_I tested this under node 8 (in case it matters, I see you bumped that). So if this is expected for node 8 but not 10 please let me know. It worked fine for three times before the repro happened, which is why I suspect the minimal bump was only related to the node 8 EOL. (Gatsby will have to wait for about a month to transition)._
Here's the output:
😶 11:14:37 (master) ~/gatsby/sites/issue6291-generated-images $ gatsby clean; gatsby build
info Deleting .cache, public
info Successfully deleted directories
success open and validate gatsby-configs - 0.036s
success load plugins - 0.575s
success onPreInit - 0.003s
success delete html and css files from previous builds - 0.152s
success initialize cache - 0.008s
success copy gatsby files - 0.021s
success onPreBootstrap - 0.010s
success createSchemaCustomization - 0.007s
success source and transform nodes - 4.358s
warn Plugin `gatsby-source-filesystem` tried to define the GraphQL type `File`, which has already been defined by the plugin `null`.
warn Plugin `gatsby-source-filesystem` tried to define the GraphQL type `File`, which has already been defined by the plugin `null`.
success building schema - 0.517s
success createPages - 0.025s
success createPagesStatefully - 0.126s
success onPreExtractQueries - 0.004s
success update schema - 0.065s
success extract queries from components - 0.312s
success write out requires - 0.007s
success write out redirect data - 0.003s
success Build manifest and related icons - 0.260s
success onPostBootstrap - 0.300s
â €
info bootstrap finished - 10.095 s
â €
success Generating image thumbnails - 2.303s - 6/6 2.60/s
warn Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`
success Building production JavaScript and CSS bundles - 8.882s
success Rewriting compilation hashes - 0.009s
success run queries - 28.054s - 7/7 0.25/s
[======= ] 35.662 s 2255/9005 25% Generating image thumbnails
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.735: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.735: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.735: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
(sharp:11615): GLib-GObject-CRITICAL **: 11:15:48.735: g_value_type_compatible: assertion 'dest_type' failed
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.735: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.735: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.735: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
(sharp:11615): GLib-GObject-CRITICAL **: 11:15:48.735: g_value_type_compatible: assertion 'src_type' failed
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.735: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.735: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:11615): GLib-GObject-WARNING **: 11:15:48.736: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
[======== ] 38.818 s 2675/9005 30% Generating image thumbnails
Mind you, there was no core dump. Nothing in dmesg/syslog. I guess it would be nice if this could somehow print the version of sharp / libvips, just to confirm, but other than that I don't really know.
Same repro steps as before.
For this one I hard bumped all packages and subdeps that were depending on sharp to 0.24.0 (ignored the minimal node requirement) and confirmed that while installing only the 8.9.0 version of libvips got installed. As such I'm fairly certain this reprod on 0.24.0 / 8.9.0
Here's another case. It does complete (I don't know how valid the output is, though), so there's that.
😶 11:23:05 (master) ~/gatsby/sites/issue6291-generated-images $ gatsby clean; gatsby build
info Deleting .cache, public
info Successfully deleted directories
success open and validate gatsby-configs - 0.035s
success load plugins - 0.496s
success onPreInit - 0.003s
success delete html and css files from previous builds - 0.123s
success initialize cache - 0.011s
success copy gatsby files - 0.029s
success onPreBootstrap - 0.010s
success createSchemaCustomization - 0.006s
success source and transform nodes - 4.355s
warn Plugin `gatsby-source-filesystem` tried to define the GraphQL type `File`, which has already been defined by the plugin `null`.
warn Plugin `gatsby-source-filesystem` tried to define the GraphQL type `File`, which has already been defined by the plugin `null`.
success building schema - 0.516s
success createPages - 0.018s
success createPagesStatefully - 0.078s
success onPreExtractQueries - 0.002s
success update schema - 0.037s
success extract queries from components - 0.313s
success write out requires - 0.007s
success write out redirect data - 0.005s
success Build manifest and related icons - 0.153s
success onPostBootstrap - 0.179s
â €
info bootstrap finished - 9.377 s
â €
success Generating image thumbnails - 3.703s - 6/6 1.62/s
warn Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`
success Building production JavaScript and CSS bundles - 8.154s
success Rewriting compilation hashes - 0.021s
success run queries - 27.861s - 7/7 0.25/s
[====== ] 36.867 s 2087/9005 23% Generating image thumbnails
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.837: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.837: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.837: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
(sharp:18786): GLib-GObject-CRITICAL **: 11:24:04.837: g_value_type_compatible: assertion 'dest_type' failed
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.838: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.838: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.838: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
(sharp:18786): GLib-GObject-CRITICAL **: 11:24:04.838: g_value_type_compatible: assertion 'src_type' failed
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.838: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.838: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:18786): GLib-GObject-WARNING **: 11:24:04.838: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
[=============== ] 61.293 s 5021/9005 56% Generating image thumbnails
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.273: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.273: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.273: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
(sharp:18786): GLib-GObject-CRITICAL **: 11:24:29.273: g_value_type_compatible: assertion 'dest_type' failed
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.274: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.274: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.274: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
(sharp:18786): GLib-GObject-CRITICAL **: 11:24:29.274: g_value_type_compatible: assertion 'src_type' failed
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.274: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.274: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:18786): GLib-GObject-WARNING **: 11:24:29.274: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
success Generating image thumbnails - 95.105s - 9005/9005 94.68/s
success Building static HTML for pages - 0.878s - 5/5 5.70/s
info Generated public/sw.js, which will precache 6 files, totaling 245132 bytes.
The following pages will be precached:
/offline-plugin-app-shell-fallback/index.html
info Done building in 114.279 sec
Nevermind. It core dumped
😶 11:27:45 (master) ~/gatsby/sites/issue6291-generated-images $ gatsby clean; gatsby build
info Deleting .cache, public
info Successfully deleted directories
success open and validate gatsby-configs - 0.044s
success load plugins - 0.572s
success onPreInit - 0.003s
success delete html and css files from previous builds - 0.135s
success initialize cache - 0.008s
success copy gatsby files - 0.027s
success onPreBootstrap - 0.013s
success createSchemaCustomization - 0.007s
success source and transform nodes - 4.404s
warn Plugin `gatsby-source-filesystem` tried to define the GraphQL type `File`, which has already been defined by the plugin `null`.
warn Plugin `gatsby-source-filesystem` tried to define the GraphQL type `File`, which has already been defined by the plugin `null`.
success building schema - 0.453s
success createPages - 0.019s
success createPagesStatefully - 0.098s
success onPreExtractQueries - 0.003s
success update schema - 0.044s
success extract queries from components - 0.296s
success write out requires - 0.005s
success write out redirect data - 0.002s
success Build manifest and related icons - 0.178s
success onPostBootstrap - 0.205s
â €
info bootstrap finished - 9.670 s
â €
success Generating image thumbnails - 3.573s - 6/6 1.68/s
warn Browserslist: caniuse-lite is outdated. Please run next command `yarn upgrade`
success Building production JavaScript and CSS bundles - 9.059s
success Rewriting compilation hashes - 0.015s
success run queries - 36.429s - 7/7 0.19/s
[==== ] 38.540 s 1433/9005 16% Generating image thumbnails
(sharp:18556): GLib-GObject-WARNING **: 11:31:34.757: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:18556): GLib-GObject-WARNING **: 11:31:34.758: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:18556): GLib-GObject-WARNING **: 11:31:34.758: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
(sharp:18556): GLib-GObject-CRITICAL **: 11:31:34.758: g_value_type_compatible: assertion 'dest_type' failed
Segmentation fault (core dumped)
dmesg:
[1850720.417552] node[18577]: segfault at 7463656a62bc ip 00007f0f6411c26d sp 00007f0f64dbf5a8 error 4 in libgobject-2.0.so.0.6303.0[7f0f640e4000+5a000]
[1850720.417563] Code: 00 00 e8 b6 56 fd ff 31 c0 48 83 c4 58 c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 81 ff fc 03 00 00 76 0f 48 83 e7 fc <48> 8b 47 50 c3 66 0f 1f 44 00 00 48 8d 05 41 44 22 00 48 c1 ef 02
Sadly I've been able to get https://github.com/jp887/gatsby-issue6291-repro to segfault with the latest sharp v0.24.0 (libvips v8.9.0) too:
(sharp:28295): GLib-GObject-WARNING **: 13:22:38.300: ../gobject/gtype.c:4268: type id '0' is invalid
(sharp:28295): GLib-GObject-WARNING **: 13:22:38.301: can't peek value table for type '<invalid>' which is not currently referenced
(sharp:28295): GLib-GObject-WARNING **: 13:22:38.301: ../gobject/gvalue.c:187: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
(sharp:28295): GLib-GObject-CRITICAL **: 13:22:38.301: g_value_type_compatible: assertion 'dest_type' failed
#0 0x00007ffff407d26d in g_type_parent () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libgobject-2.0.so.0
#1 0x00007ffff40822bb in ?? () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libgobject-2.0.so.0
#2 0x00007ffff4082fac in g_value_transform () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libgobject-2.0.so.0
#3 0x00007fffd5a88917 in ?? () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#4 0x00007fffd5a88987 in ?? () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#5 0x00007fffd5a99045 in vips_slist_map2 () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#6 0x00007fffd5a89064 in vips.image_copy_fields_array () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#7 0x00007fffd5a83bf2 in vips_image_pipeline_array () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#8 0x00007fffd5a83cf3 in vips_image_pipelinev () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#9 0x00007fffd5978636 in ?? () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#10 0x00007fffd5a79c09 in vips_object_build () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#11 0x00007fffd5a85998 in vips_cache_operation_buildp () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#12 0x00007fffd5a8c0dd in vips_call_required_optional () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#13 0x00007fffd5a8cbbf in vips_call_split () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#14 0x00007fffd5978979 in vips_copy () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#15 0x00007fffd5a80c8d in vips_image_decode () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#16 0x00007fffd58d04f8 in ?? () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#17 0x00007fffd5a79c09 in vips_object_build () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#18 0x00007fffd5a85998 in vips_cache_operation_buildp () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips.so.42
#19 0x00007ffff42b6ad5 in vips::VImage::call_option_string(char const*, char const*, vips::VOption*) () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips-cpp.so.42
#20 0x00007ffff42c9893 in vips::VImage::resize(double, vips::VOption*) const () from gatsby-issue6291-repro/node_modules/sharp/build/Release/../../vendor/lib/libvips-cpp.so.42
#21 0x00007ffff4507f15 in PipelineWorker::Execute() () from gatsby-issue6291-repro/node_modules/sharp/build/Release/sharp.node
#22 0x00000000012d1c0e in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#23 0x00007ffff7c42182 in start_thread (arg=<optimised out>) at pthread_create.c:486
#24 0x00007ffff7b6bb1f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
I added the following to its package.json to achieve this:
+ "resolutions": {
+ "sharp": "v0.24.0"
+ },
How frustrating!
vips.image_copy_fields_array in the stack trace makes it look like this is still the metadata issue biting us in the butt.
Since the warning about shared metadata is not being printed, perhaps the sharing is happening after metadata copy starts? Something like:
vips_image_set() to set an item of metadata.How about adding another check to vips_image_set(), but at the end. It should check to make sure than the image refcount is still 1, ie. no one has taken a copy while it was working.
This won't fix the problem, of course, but it might help confirm that this is what is happening.
void
vips_image_set( VipsImage *image, const char *name, GValue *value )
{
g_assert( name );
g_assert( value );
/* If this image is shared, block metadata changes.
*/
if( G_OBJECT( image )->ref_count > 1 ) {
g_warning( "can't set metadata \"%s\" on shared image", name );
return;
}
meta_init( image );
(void) meta_new( image, name, value );
/* If we're setting an EXIF data block, we need to automatically expand
* out all the tags. This will set things like xres/yres too.
*
* We do this here rather than in meta_new() since we don't want to
* trigger on copy_fields.
*/
if( strcmp( name, VIPS_META_EXIF_NAME ) == 0 )
if( vips__exif_parse( image ) )
g_warning( "image_set: bad exif data" );
+ if( G_OBJECT( image )->ref_count > 1 ) {
+ g_warning( "image was copied during set of metadata \"%s\"", name );
+ return;
+ }
#ifdef DEBUG
meta_sanity( image );
#endif /*DEBUG*/
}
I had another idea: there's a left-over int field in VipsImage called Length (don't ask).
How about setting this to 42 in copy(), then in vips_image_set() verifying that it has the value 42. That should find all cases where we've forgotten to call copy before modifying metadata.
I've made a branch with this change and quite a few tests now fail. I'll investigate this weekend.
PS. bumping the minimal node version would warrant major version bump. The minimal node requirement bump was a bit of a surprise to me. It would be super awesome if, once the fix is there, it could be released with the node version bound to the previous value (8), before bumping that. I don't know how long Gatsby needs to stay 8, hopefully not long, but certainly for a few more weeks.
@jcupitt Brilliant, thanks John.
@pvdz In terms of semver, sharp is at major version zero, which means breaking changes such as removal of support for Node.js 8 arrive in minor increments and remain compliant with e.g. use of the ^ range selector.
The removal of support for Node.js 8 unblocks #1282, which I believe is of interest to Gatsby amongst many others. The ABI stability that N-API will (hopefully) bring is pretty much the last remaining task before we organise the "sharp version 1" release party.
I was thinking about this again.
Even if we find all of the modifications to metadata in potentially shared
images in sharp and libvips, other libvips users could still be in error
and get bitten by these races.
The problem is that the API allows potentially racey behaviour and has no
way of reliably checking or enforcing the conventions that user code is
supposed to follow to be safe.
How about adding some mutexes to make the current API safe, and designing a
new, non-racey API for the next version. I think we’d just need to add a
global lock for setting image metadata and for metadata copy.
It’s a bit ugly, but the performance hit should be minor.
What do you think?
On Sat, 18 Jan 2020 at 10:18, Lovell Fuller notifications@github.com
wrote:
@jcupitt https://github.com/jcupitt Brilliant, thanks John.
@pvdz https://github.com/pvdz In terms of semver, sharp is at major
version zero, which means breaking changes such as removal of support for
Node.js 8 arrive in minor increments and remain compliant with e.g. use of
the ^ range selector.The removal of support for Node.js 8 unblocks #1282
https://github.com/lovell/sharp/issues/1282, which I believe is of
interest to Gatsby amongst many others. The ABI stability that N-API will
(hopefully) bring is pretty much the last remaining task before we organise
the "sharp version 1" release party.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/lovell/sharp/issues/1986?email_source=notifications&email_token=AAENZ26PUK2JBJBMO3RZJH3Q6LJQTA5CNFSM4JRY4XJ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJJVCNI#issuecomment-575885621,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AAENZ22RBBK7RIMDCAYARC3Q6LJQTANCNFSM4JRY4XJQ
.
Does it have to be a global mutex though @jcupitt ?
Can't it possibly be one mutex per object at which level metadata are abstracted in libvips?
Unless the metadata lib is not thread-safe itself?
It could be a lock on the image, but there isn’t a convenient one we can
use, so we’d need to add another member. I think I’d be nervous about doing
that in a minor release.
The libvips global lock is only used very lightly, so there should be
little contention. It ought to be a safe and simple fix for now.
We can try something better for 8.10.
I made a branch with locks for set / remove / copy. Hopefully this will resolve the crash, or at least change the stack trace.
Let's remove this hack and add some new API for 8.10.
I can no longer reproduce this problem when using libvips compiled from the lock-for-metadata-changes branch.
That's great!
@pvdz if you give a thumbs-up as well, let's push out 8.9.1 with this fix.
I'm a little busy to dig deep I'm afraid.
Would suggest to push this as is. Sounds some things have been fixed and that's an improvement regardless. If I can repro it later I'll report back.
Thanks!
OK, let's merge and make 8.9.1.
@lovell with libvips v8.9.1 available now, can we get a sharp v0.24.1 containing the updated dependency? i'd love to see these segfaults gone ;)
@oezi sharp will always look for and use a globally-installed libvips in preference to its prebuilt copy to allow you to use a newer libvips with an older sharp.
If you could also release 8.9.1 in https://github.com/lovell/sharp-libvips/releases I can test it. Or another easy way to temporarily get a contained build (on linux, ubuntu) to test drive. I'm fine with inline replacing the cached 8.9.0 file with a new version as a workaround.
Not sure what your release schedule is/was here? :)
@lovell can you give us any update on this, please?
The yield branch that will become the forthcoming sharp v0.25.0 depends on and provides libvips v8.9.1.
npm i lovell/sharp#yield
sharp v0.25.0 will also include the N-API migration (see #1282) as well as the return of support for 32-bit node.exe on Windows to allow for easier use with Azure.
It seems I can no longer trigger the problem on this branch. 30 or 40 runs on high system load (aka a Zoom call, lol) without signs of trouble. Thank you!
(While the repro above was for about 9k thumbs, I was also able to successfully generate 30k thumbnails, back to back, without signs of the problem)
Can confirm, no more segfaults with the yield branch.
This is great news, thank you very much for testing!
sharp v0.25.0 with libvips v8.9.1 is now available; enjoy!
Awesome. Thanks for the effort on your end(s)! :D We expect to upgrade soon and ship this as well.
We've now bumped Sharp in Gatsby, affecting the following versions. Thanks again!
Most helpful comment
That's very interesting. I'll have a look today.