Darktable: a8f47e409 continuously hanging

Created on 30 Sep 2020 · 29 comments · Source: darktable-org/darktable

a8f47e409 continuously hanging

running under gdb, outputs:
http://wahoo.no-ip.org/~paka/dt.a8f47e409.gdb.txt
http://wahoo.no-ip.org/~paka/dt.a8f47e409.1.gdb.txt
http://wahoo.no-ip.org/~paka/dt.a8f47e409.2.gdb.txt
http://wahoo.no-ip.org/~paka/dt.a8f47e409.3.gdb.txt

openSUSE Tumbleweed 20200920
NVIDIA GF106 [GeForce GTS 450], 390.138
darktable-3.3.0~git877.f8b51737c
OpenCL loaded but not available
i7 12-core 36GB

pending average lua high

All 29 comments

w/o gdb

seg fault
double free or corruption (fasttop)
Aborted (core dumped)

coredump @ http://wahoo.no-ip.org/~paka/dt.coredump.a8f47e409.lz4

There is a Lua path in the backtrace. Can you remove all your Lua scripts and test again?

certainly, in fact, I was already testing that as that is the only dt change I have made aside from updating master when available.

I moved luarc out of path.

will report. tks

@wpferguson ping

It appears to be a Lua problem

Now what to do to determine the actual problem?

my luarc reads:

require "tools/script_manager"
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"
require "contrib/ext_editor"
require "contrib/gimp"

Okay, I'm assuming that by pinging me everything ran ok without the luarc file. :-)

First the bad news. I just compiled and ran a8f47e409, with multiple lua scripts running and had no issues.

Now the good(?) news. I have had this same problem happen to me several times with different builds. When I've had it happen, I was usually scrolling through the lighttable. I'm not sure if it really is hung or it just takes a long time to produce a core dump. I've waited it out a couple of times and darktable finally closed.

So, since you have the problem occurring for you let's try a few tests running darktable from a terminal.

  1. Start darktable with an image file on the command line (darktable followed by the path to an image) to make it open in darkroom instead of lighttable. If it's working, try scrolling the filmstrip and see if it hangs.

  2. Rename your configuration directory by appending .old, then start darktable and let it create a new configuration directory. Add lua scripts and your luarc. Import some files and see what happens. For what it's worth, this is how I've been getting around the problem.

  3. If 2 works, copy the darktablerc file from .old and see if it still works. Then try data.db, and finally library.db.
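
Steps 2 and 3 above can be sketched as two shell helpers. This is only a sketch: it assumes the default Linux config location ~/.config/darktable, and the function names dt_set_aside_config and dt_restore_file are hypothetical, not part of darktable. Close darktable before running either one.

```shell
# Step 2: rename the config directory to <dir>.old so darktable
# creates a fresh one on its next start.
# Assumption: the default location is ~/.config/darktable.
dt_set_aside_config() {
    config_dir="${1:-$HOME/.config/darktable}"
    backup_dir="${config_dir}.old"
    if [ ! -d "$config_dir" ]; then
        echo "no config directory at $config_dir" >&2
        return 1
    fi
    if [ -e "$backup_dir" ]; then
        echo "refusing to overwrite existing $backup_dir" >&2
        return 1
    fi
    mv "$config_dir" "$backup_dir"
}

# Step 3: copy a single file (darktablerc, data.db, library.db)
# back from the .old directory into the fresh one.
dt_restore_file() {
    config_dir="$1"
    file="$2"
    if [ ! -f "${config_dir}.old/$file" ]; then
        echo "no $file in ${config_dir}.old" >&2
        return 1
    fi
    cp -p "${config_dir}.old/$file" "$config_dir/$file"
}
```

A possible sequence: run dt_set_aside_config, start darktable once so it creates a new directory, then restore darktablerc, data.db, and library.db one at a time with dt_restore_file "$HOME/.config/darktable" darktablerc (and so on), testing darktable between copies.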

I've tried troubleshooting this several times. The culprit appears to be a double free on a widget, but I can never determine the widget, what created it, and what freed it. I'm mystified by this bug because nothing has changed with the lua code that keeps showing up as the problem. Possibly it's a latent bug that some other piece of code has caused to become exposed. The only somewhat common thing I've noticed is that it tends to happen after I've used the liquefy module on an image. Not right away, and not always, but I had never seen this bug until I started playing with liquefy.

when it appears to happen to me is switching forward/backward thru a collection in darkroom mode. Do not think I ever noticed in lighttable

  1. explain how to ensure it opens in darkroom mode. Only time I notice this is opening a single image from command-line or another app.

  2. I just did the lua dance, git,clone/... But that was before the hangs...

I don't believe I have ever utilized the "liquefy" module ...

There will be some delay as I need to finish my current set to publish for the High School parents.

fwiw: before doing the lua dance/git clone/... I hand edited luarc and didn't have this problem.
My hand edited luarc:

require "official/yield"
local darktable = require "darktable"
print("Hello World ! darktable LUA speaking :^)")
require "lib/dtutils"
require "lib/dtutils/file"
require "lib/dtutils/log"
require "lib/dtutils/string"
require "lib/dtutils/debug"
require "lib/dtutils/system"
require "official/copy_paste_metadata"
require "official/delete_long_tags"
require "official/delete_unused_tags"
require "official/enfuse"
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"
dump = darktable.debug.dump
print(dump(darktable.register_import_filter))
require "contrib/LabelsToTags"
require "contrib/copy_attach_detach_tags"
require "contrib/ext_editor"
require "contrib/gimp"
require "contrib/hugin"
require "contrib/passport_guide"
require "contrib/rename-tags"
require "contrib/select_untagged"

I did notice problems with:
"contrib/autostyle"
"contrib/enfuseAdvanced"
require "contrib/image_time.lua"
require "tools/script_manager"

maybe this will help

"explain how to ensure it opens in darkroom mode. Only time I notice this is opening a single image from command-line"

That's what I wanted you to do.

I thought of something else. Empty your luarc file, so that no script is running, but leave the luarc file there. Then see if you can make it crash.
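
Emptying luarc while leaving the file in place can be done from a terminal. A minimal sketch, assuming the usual Linux location ~/.config/darktable/luarc; the function name dt_empty_luarc is hypothetical:

```shell
# Keep a copy of the current luarc, then truncate the original so that
# darktable still finds the file but no script is loaded.
# Assumption: luarc lives at ~/.config/darktable/luarc unless a path is given.
dt_empty_luarc() {
    luarc="${1:-$HOME/.config/darktable/luarc}"
    if [ ! -f "$luarc" ]; then
        echo "no luarc at $luarc" >&2
        return 1
    fi
    cp -p "$luarc" "$luarc.bak"   # keep the original for later
    : > "$luarc"                  # truncate to zero bytes; the file still exists
}
```

The saved luarc.bak can be copied back once the test is done.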

I saved my last crashing darktable config directory. I changed back to it, ran darktable, scrolled up and down through lighttable and got a crash. Emptied my luarc file, but left it there, and tried again - no crash. Added just script_manager with no scripts enabled - no crash. Started darktable, scrolled back and forth with no crash, enabled enfuseAdvanced, scrolled some more and got a crash. Disabled enfuseAdvanced - no crash. Enabled postsharpen (since it also adds gui elements to the exporter, just to see if it had something to do with the exporter) - no crash. So it seems, at least on my system, that enfuseAdvanced is the culprit, though I can see no reason why that should be.

@ptilopteri can you try with your original luarc and enfuseAdvanced disabled?

"There will be some delay as I need to finish my current set to publish for the High School parents."

I understand, I shoot high school sports...

@wpferguson My original luarc that "worked for me" had enfuseAdvanced disabled:

--require "contrib/enfuseAdvanced"

And I had no problem with it
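
Disabling a single script the same way (prefixing its require line with the Lua comment marker --) can be scripted when testing scripts one at a time. A minimal sketch; the helper name dt_disable_script is hypothetical, and it only handles lines that start exactly with require "<script>":

```shell
# Comment out one require line in a luarc file, e.g.
#   require "contrib/enfuseAdvanced"  ->  --require "contrib/enfuseAdvanced"
# Usage: dt_disable_script <path-to-luarc> <script-name>
dt_disable_script() {
    luarc="$1"
    script="$2"
    tmp="$luarc.tmp.$$"
    # '|' is the sed delimiter because script names contain '/'
    sed "s|^require \"$script\"|--require \"$script\"|" "$luarc" > "$tmp" || return 1
    cat "$tmp" > "$luarc"   # rewrite in place, keeping the file's permissions
    rm -f "$tmp"
}
```

For example, dt_disable_script "$HOME/.config/darktable/luarc" "contrib/enfuseAdvanced" leaves every other line untouched.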

More explanation. Yesterday I read more about your automagic lua handling and decided
to try it rather than continually having to check scripts for updates, ... And I followed your
posted instructions to the letter except for adding:
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"

and had no problems.

Would it make sense to restore luarc as your script makes it and disable
"contrib/enfuseAdvanced" to see if the crashes return? I can do that and continue to
process the current set.

Sounds good, let's try that.

quick answer and this time it dumped in lighttable, just scrolling thru the collection

corrupted double-linked list
Aborted (core dumped)

luarc:

require "tools/script_manager"
--require "contrib/enfuseAdvanced"
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"
require "contrib/ext_editor"
require "contrib/gimp"

commenting out all entries in luarc

Okay, I can make it crash anytime with just enfuseAdvanced. I guess now we need to see what other scripts cause it. script_manager worked for me with no problems. I'll try the rest of your list and see if there are others that cause the problem.

does not appear to dump with all entries in luarc commented out.

will re-enable my previous "hand edited" luarc quoted several comments ago.

note: when I was crashing earlier before determined lua problem, I was running your
script as it comes with three added lines:
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"

and I had not enabled anything. The only enabled scripts were from your
script_manager. And I was crashing.

Which scripts did you enable from script_manager?

none, the only ones enabled were the ones your script enabled
automagically.

The only ones it starts are the ones that were started by it and not disabled when it shut down. It saves its state in the darktablerc file with entries like lua/script_manager//
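
To see which scripts script_manager has recorded, the darktablerc entries can be listed from a terminal. A minimal sketch: the rest of the key after lua/script_manager/ is elided in this thread, so only that prefix is matched; the helper name and the default path ~/.config/darktable/darktablerc are assumptions.

```shell
# Print every darktablerc line whose key starts with lua/script_manager/.
# Returns non-zero (grep's status) when no such entry exists.
dt_list_script_manager_state() {
    rc="${1:-$HOME/.config/darktable/darktablerc}"
    if [ ! -f "$rc" ]; then
        echo "no darktablerc at $rc" >&2
        return 1
    fi
    grep '^lua/script_manager/' "$rc"
}
```

Run it while darktable is closed, since darktable rewrites darktablerc on exit.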

I'll also test these against 3.2.1, 3.0, and 2.6.2 and see if they cause crashes there too.

@wpferguson thanks for the help. Late here, back in the morning if you need anything else. tks

Enabled enfuseAdvanced, image_stack, and geo_toolbox and ran them against 3.2.1 with no crashes. So, it appears that it's something introduced to the API, or that affects the API, since 3.2.1.

@AlicVB

git bisect returns

commit 83322221c64cb1b6653944aa2cfce15f285c70b1
Author: AlicVB dev@lnaa.fr
Date: Sun Aug 30 15:42:45 2020 +0200

thumbtable : ensure mouseover is updated after scroll

The most common error I see is "corrupted double-linked list", though there are others, including segfaults.

To create the error

  1. install the lua scripts
  2. in the luarc file put the lines
require "contrib/enfuseAdvanced"
require "contrib/geoToolbox"
require "contrib/image_stack"
  3. Open darktable in lighttable mode with a collection of 300+ images. Scroll from one end of the collection to the other and back until it crashes. On my 337 play raw image collection I go from the top to the bottom back to the top and start down again and then it hangs or crashes. You can also make this happen in darkroom mode scrolling the filmstrip.
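
The luarc for the reproduction steps above can be written from a terminal. A minimal sketch, assuming the lua scripts are already installed under the config directory and that luarc lives at ~/.config/darktable/luarc; the function name dt_write_repro_luarc is hypothetical:

```shell
# Replace luarc with just the three scripts named in the steps above.
# Any existing luarc is kept as luarc.bak.
dt_write_repro_luarc() {
    luarc="${1:-$HOME/.config/darktable/luarc}"
    if [ -f "$luarc" ]; then
        cp -p "$luarc" "$luarc.bak"
    fi
    cat > "$luarc" <<'EOF'
require "contrib/enfuseAdvanced"
require "contrib/geoToolbox"
require "contrib/image_stack"
EOF
}
```

After running it, start darktable from the same terminal so that the abort message (for example, corrupted double-linked list) is visible when the crash occurs.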

@ptilopteri thanks for your help.

@AlicVB I will look at these 3 scripts and see if there is something special about them that triggers this

As a further test, I commented out the 2 lines of code that were added in the bisected commit, compiled and tested, and no crash. Uncommented them, recompiled and crashed.

I've spent the last couple of days crashing darktable hundreds of times trying to understand what's going on. Here's what I found:

First, several corrections. This problem exists in 3.2.1, and probably before. It's just harder to create the conditions to trigger it. @AlicVB's lighttable speed-ups have just made it much easier to trigger.

The problem is that a widget's garbage collector gets triggered in some manner that I've not been able to determine. The widget hasn't been destroyed, so it shouldn't be garbage collected. The garbage collector adds a task to destroy the widget. When it runs, it frees the widget memory, but it also triggers the destroy signal, which causes the on_destroy function to run and try to free the same widget memory. Hence the "double free or corruption (fasttop)" and other errors. In the normal sequence of widget operations, the widget gets a destroy signal, is destroyed, and is then garbage collected. The part of the garbage collector that's causing this crash is never executed during a normal widget life cycle, as far as I can determine (a LOT of testing with lots of fprintf's).

@ptilopteri pointed me in the right direction to determine which scripts were likely to cause/experience the crash. Interestingly in each script it was the same widget that "caused" the crash time after time. In enfuseAdvanced it was a stack widget and it was a section label in geoToolbox and image_stack. Other than that I haven't been able to determine any relationship between the 3 scripts that cause them to crash.

Proposed solution: The best solution would be fixing whatever is triggering the widget's garbage collector. However, since I haven't been able to figure that out yet, the next best solution is to fix the crash. I propose commenting out the code that is causing the crash and adding an error print statement about trying to garbage collect a widget that hasn't been destroyed. I'll also open an issue about widget garbage collection problems and keep working on this (that's why I want to comment out the code for now instead of removing it).

Thoughts?

I think I've figured out why some scripts don't trigger the widget garbage collector, and thus the crash. The scripts that seem immune have all of their widgets contained in a table. The ones that suffer from the problem declare their widgets as local variables. I think the garbage collector gets confused when the widgets are incorporated into the gui, and the reference may not get recorded. The script finishes, leaving a callback or a gui which may or may not reference the widgets, so the garbage collector thinks they aren't referenced because the script is finished and tries to reap them. At this time there isn't a way to destroy a widget other than when darktable exits, so the widget garbage collector really doesn't have much to do (but what it did when it tried to reap a widget was wrong and caused a crash).

However, now that I know more about how widgets are destroyed, I will look into this. Right now in script_manager, when a script is disabled I just mark it not to start the next time darktable starts, because I don't have a way to destroy the gui elements. Now that I have some ideas about how to go about it, I'll pursue it so that script_manager can turn off a script and remove it from the gui.

@ptilopteri fixed in master

@wpferguson and I am now using your script again. tks
