a8f47e409 continuously hanging
running under gdb, outputs:
http://wahoo.no-ip.org/~paka/dt.a8f47e409.gdb.txt
http://wahoo.no-ip.org/~paka/dt.a8f47e409.1.gdb.txt
http://wahoo.no-ip.org/~paka/dt.a8f47e409.2.gdb.txt
http://wahoo.no-ip.org/~paka/dt.a8f47e409.3.gdb.txt
openSUSE Tumbleweed 20200920
NVIDIA GF106 [GeForce GTS 450], 390.138
darktable-3.3.0~git877.f8b51737c
OpenCL loaded but not available
i7 12-core 36GB
w/o gdb
seg fault
double free or corruption (fasttop)
Aborted (core dumped)
coredump @ http://wahoo.no-ip.org/~paka/dt.coredump.a8f47e409.lz4
There is a Lua path in the backtrace. Can you remove all your Lua scripts and test again?
certainly, in fact, I was already testing that as that is the only dt change I have made aside from updating master when available.
I moved luarc out of path.
will report. tks
@wpferguson ping
It appears to be a Lua problem
Now what to do to determine the actual problem?
my luarc reads:
require "tools/script_manager"
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"
require "contrib/ext_editor"
require "contrib/gimp"
Okay, I'm assuming that by pinging me everything ran ok without the luarc file. :-)
First the bad news. I just compiled and ran a8f47e409, with multiple lua scripts running and had no issues.
Now the good(?) news. I have had this same problem happen to me several times with different builds. When I've had it happen, I was usually scrolling through the lighttable. I'm not sure if it really is hung or it just takes a long time to produce a core dump. I've waited it out a couple of times and darktable finally closed.
So, since you have the problem occurring for you let's try a few tests running darktable from a terminal.
1. start darktable with darktable
2. Move your configuration directory to
3. If 2 works, copy the darktablerc file from .old and see if it still works. Then try data.db, and finally library.db.
I've tried troubleshooting this several times. The culprit appears to be a double free on a widget, but I can never determine the widget, what created it, and what freed it. I'm mystified by this bug because nothing has changed with the lua code that keeps showing up as the problem. Possibly it's a latent bug that some other piece of code has caused to become exposed. The only somewhat common thing I've noticed is that it tends to happen after I've used the liquefy module on an image. Not right away, and not always, but I had never seen this bug until I started playing with liquefy.
For me it appears to happen when switching forward/backward through a collection in darkroom mode. I do not think I ever noticed it in lighttable.
explain how to ensure it opens in darkroom mode. Only time I notice this is opening a single image from command-line or another app.
I just did the lua dance, git clone/... But that was before the hangs...
I don't believe I have ever utilized the "liquefy" module ...
There will be some delay as I need to finish my current set to publish for the High School parents.
fwiw: before doing the lua dance/git clone/... I hand edited luarc and didn't have this problem.
My hand edited luarc:
require "official/yield"
local darktable = require "darktable"
print("Hello World ! darktable LUA speaking :^)")
require "lib/dtutils"
require "lib/dtutils/file"
require "lib/dtutils/log"
require "lib/dtutils/string"
require "lib/dtutils/debug"
require "lib/dtutils/system"
require "official/copy_paste_metadata"
require "official/delete_long_tags"
require "official/delete_unused_tags"
require "official/enfuse"
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"
dump = darktable.debug.dump
print(dump(darktable.register_import_filter))
require "contrib/LabelsToTags"
require "contrib/copy_attach_detach_tags"
require "contrib/ext_editor"
require "contrib/gimp"
require "contrib/hugin"
require "contrib/passport_guide"
require "contrib/rename-tags"
require "contrib/select_untagged"
I did notice problems with:
"contrib/autostyle"
"contrib/enfuseAdvanced"
require "contrib/image_time.lua"
require "tools/script_manager"
maybe this will help
explain how to ensure it opens in darkroom mode. Only time I notice this is opening a single image from command-line
that's what I wanted you to do
I thought of something else. Empty your luarc file, so that no script is running, but leave the luarc file there. Then see if you can make it crash.
I saved my last crashing darktable config directory. I changed back to it, ran darktable, scrolled up and down through lighttable and got a crash. Emptied my luarc file, but left it there and tried again - no crash. Added just script_manager with no scripts enabled - no crash. Started darktable, scrolled back and forth with no crash, enabled enfuseAdvanced, scrolled some more and got a crash. Disabled enfuseAdvanced - no crash. Enabled postsharpen (since it also adds gui elements to the exporter, just to see if it had something to do with the exporter) - no crash. So it seems, at least on my system, that enfuseAdvanced is the culprit, though I can see no reason why that should be.
@ptilopteri can you try with your original luarc and enfuseAdvanced disabled?
There will be some delay as I need to finish my current set to publish for the High School parents.
I understand, I shoot high school sports...
@wpferguson My original luarc that "worked for me" had enfuseAdvanced disabled:
--require "contrib/enfuseAdvanced"
And I had no problem with it
More explanation. Yesterday I read more about your automagic lua handling and decided
to try it rather than continually having to check scripts for updates, ... And I followed your
posted instructions to the letter except for adding:
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"
and had no problems.
Would it make sense to return luarc to the way your script makes it and disable
"contrib/enfuseAdvanced" to see if the crashes return? I can do that and continue to
process the current set.
Sounds good, let's try that.
quick answer and this time it dumped in lighttable, just scrolling thru the collection
corrupted double-linked list
Aborted (core dumped)
luarc:
require "tools/script_manager"
--require "contrib/enfuseAdvanced"
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"
require "contrib/ext_editor"
require "contrib/gimp"
commenting out all entries in luarc
Okay, I can make it crash anytime with just enfuseAdvanced. I guess now we need to see what other scripts cause it. script_manager worked for me with no problems. I'll try the rest of your list and see if there are others that cause the problem.
does not appear to dump with all entries in luarc commented out.
will re-enable my previous "hand edited" luarc quoted several comments ago.
note: when I was crashing earlier before determined lua problem, I was running your
script as it comes with three added lines:
require "official/generate_image_txt"
require "official/image_path_in_ui"
require "official/import_filter_manager"
and I had not enabled anything. The only enabled scripts were from your
script_manager. And I was crashing.
Which scripts did you enable from script_manager?
none, the only ones enabled were the ones your script enabled
automagically.
The only ones it starts are the ones that were started by it and not disabled when it shut down. It saves its state in the darktablerc file with entries like lua/script_manager/
I'll also test these against 3.2.1, 3.0, and 2.6.2 and see if they cause crashes there too.
@wpferguson thanks for the help. Late here, back in the morning if you need anything else. tks
Enabled enfuseAdvanced, image_stack, and geo_toolbox and ran them against 3.2.1 with no crashes. So, it appears that it's something introduced to the API, or that affects the API, since 3.2.1.
@AlicVB
git bisect returns
commit 83322221c64cb1b6653944aa2cfce15f285c70b1
Author: AlicVB dev@lnaa.fr
Date: Sun Aug 30 15:42:45 2020 +0200
thumbtable : ensure mouseover is updated after scroll
The most common error I see is corrupted double-linked list, though there are others including segfaults.
To create the error
require "contrib/enfuseAdvanced"
require "contrib/geoToolbox"
require "contrib/image_stack"
@ptilopteri thanks for your help.
@AlicVB I will look at these 3 scripts and see if there is something special about them that triggers this
As a further test, I commented out the 2 lines of code that were added in the bisected commit, compiled and tested, and no crash. Uncommented them, recompiled and crashed.
I've spent the last couple of days crashing darktable hundreds of times trying to understand what's going on. Here's what I found:
First, several corrections. This problem exists in 3.2.1, and probably before. It's just harder to create the conditions to trigger it. @AlicVB's lighttable speed-ups have just made it much easier to trigger.
The problem is that a widget's garbage collector gets triggered in some manner that I've not been able to determine. The widget hasn't been destroyed, so it shouldn't be garbage collected. The garbage collector adds a task to destroy the widget. When it runs, it frees the widget memory but it triggers the destroy signal, which causes the on_destroy function to run and try to free the same widget memory. Hence the double free or corruption (fasttop) and other errors. In the normal sequence of widget operations, the widget gets a destroy signal, is destroyed, then it's garbage collected. The part of the garbage collector that's causing this crash is never executed during a normal widget life cycle, as far as I can determine (a LOT of testing with lots of fprintfs).
@ptilopteri pointed me in the right direction to determine which scripts were likely to cause/experience the crash. Interestingly in each script it was the same widget that "caused" the crash time after time. In enfuseAdvanced it was a stack widget and it was a section label in geoToolbox and image_stack. Other than that I haven't been able to determine any relationship between the 3 scripts that cause them to crash.
Proposed solution: The best solution would be fixing whatever is triggering the widget's garbage collector. However, since I haven't been able to figure that out yet, the next best solution is to fix the crash. I propose commenting out the code that is causing the crash and adding an error print statement about trying to garbage collect a widget that hasn't been destroyed. I'll also open an issue about widget garbage collection problems and keep working on this (that's why I want to comment out the code for now instead of removing it).
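The sequence described above, and the proposed guard, can be sketched roughly as follows. This is a simplified model and not darktable's actual code: every name here (`sketch_widget`, `sketch_on_destroy`, `sketch_gc_unguarded`, `sketch_gc_guarded`) is hypothetical, and the real double free is replaced by a counter of release requests so the sketch is safe to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Lua-owned widget; not darktable's real struct. */
typedef struct {
    char *payload;     /* memory that must be released exactly once */
    bool destroyed;    /* set once the destroy signal has been handled */
    int release_calls; /* how many times a release was requested */
} sketch_widget;

sketch_widget *sketch_widget_new(void) {
    sketch_widget *w = calloc(1, sizeof *w);
    if (w != NULL) w->payload = malloc(16);
    return w;
}

/* Counts every release request; in the real bug the second request
   is a second free() on the same memory. */
static void sketch_release(sketch_widget *w) {
    assert(w != NULL);
    w->release_calls++;
    free(w->payload); /* free(NULL) is a no-op, so the model stays safe */
    w->payload = NULL;
}

/* Destroy-signal handler: releases the widget memory. */
void sketch_on_destroy(sketch_widget *w) {
    sketch_release(w);
    w->destroyed = true;
}

/* Problem path: the collector runs on a widget that was never destroyed,
   releases it, and the destroy signal then releases it again. */
void sketch_gc_unguarded(sketch_widget *w) {
    if (!w->destroyed) {
        sketch_release(w);
        sketch_on_destroy(w);
    }
}

/* Proposed workaround: report the unexpected state and skip the teardown.
   Returns true when the widget had already been destroyed. */
bool sketch_gc_guarded(sketch_widget *w) {
    if (!w->destroyed) {
        fprintf(stderr, "gc called on a widget that has not been destroyed\n");
        return false;
    }
    return true;
}

/* Final cleanup for the sketch itself. */
void sketch_widget_free(sketch_widget *w) {
    if (w == NULL) return;
    free(w->payload);
    free(w);
}
```

In this model the normal life cycle (destroy signal, then collection) requests one release, the unguarded collector requests two, and the guarded collector requests none and leaves the widget alive, which is the trade-off of the workaround: no crash, but the underlying trigger is still unexplained.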
Thoughts?
I think I've figured out why some scripts don't trigger the widget garbage collector, and thus the crash. The scripts that seem immune have all of their widgets contained in a table. The ones that suffer from the problem declare their widgets as local variables. I think the garbage collector gets confused when the widgets are incorporated into the gui, and that may not get recorded as a reference. The script finishes, leaving a callback or a gui which may or may not reference the widgets, so the garbage collector thinks they aren't referenced because the script is finished and tries to reap them. At this time there isn't a way to destroy a widget other than when darktable exits, so the widget garbage collector really doesn't have much to do (but what it did when it tried to reap a widget was wrong and caused a crash).
However, now that I know more about how widgets are destroyed, I will look into this. Right now in script_manager, when a script is disabled I just mark it not to start the next time darktable starts, because I don't have a way to destroy the gui elements. Now that I have some ideas about how to go about it, I'll pursue it so that script_manager can turn off a script and remove it from the gui.
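A toy model of this hypothesis, for illustration only. It does not reflect how Lua or darktable actually track references. All names (`toy_widget`, `toy_create`, `toy_script_finished`, `toy_collect`) are made up, and the one assumption being modelled is that being shown in the gui is not counted as a reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: tracks who still refers to a widget. */
typedef struct {
    int script_refs; /* references from script variables or tables */
    bool in_gui;     /* displayed in the gui, but NOT counted as a reference */
    bool reaped;     /* set when the collector decides to reap the widget */
} toy_widget;

/* A script creates a widget and adds it to the gui. */
toy_widget toy_create(void) {
    toy_widget w = { 1, true, false };
    return w;
}

/* The script finishes: a local variable goes out of scope,
   while a long-lived table keeps its entry. */
void toy_script_finished(toy_widget *w, bool held_in_table) {
    assert(w != NULL);
    if (!held_in_table && w->script_refs > 0) w->script_refs--;
}

/* Collector pass: reaps anything with no recorded reference,
   even if it is still in the gui. Returns true if the widget was reaped. */
bool toy_collect(toy_widget *w) {
    assert(w != NULL);
    if (w->script_refs == 0) w->reaped = true;
    return w->reaped;
}
```

Under that assumption, a widget held only by a local is reaped while still displayed, and a widget kept in a table survives, which matches the pattern seen across the scripts.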
@ptilopteri fixed in master
@wpferguson and I am now using your script again. tks