While I was working on yet another crash without a backtrace, I realized that we could just generate backtraces automatically upon crashes and tell people about them. This is how I ended up writing a debug tool for GIMP, popping up a dialog with a friendly text encouraging people to report bugs. You’ll notice that the main text is non-technical. The goal is not to display cryptic error messages which nobody will understand. All the technical part is in the section below, and is just meant to be copied with a single button click and reported to us verbatim. 🙂
This technical part contains: GIMP version (and commit information if available), compiler, versions of the main dependencies, and finally the errors and their backtraces.
Note: this doesn’t “report” the bug on your behalf. Someone still has to consciously go to our bug tracker. But we make things easier: a report is just a few button clicks and a copy-paste away.
Someone asked me if I could make a blog post about it, so here it is.
How does this work?
Used to be based on glib…
We already had some backtracing capability in GIMP, mostly using the glib API g_on_error_stack_trace(). The main problems of this API:
- it outputs to stdout (which means that you needed to run GIMP from a terminal to get the trace, and until now this was only used with specific command line options or on unstable builds);
- sometimes it was not working for weird reasons;
- it works only on Unix-like operating systems (in particular not on Windows);
- it is based on gdb only (as I soon discovered);
- …
So I ended up looking at what this function was doing. As I said, the basic idea is that it simply uses gdb if it is installed on the machine. I am still unsure why, but it was doing so in interactive mode, entering commands through standard input with a pipe. Why is it weird? Because gdb has a batch mode made especially for such non-interactive calls. I actually suspect that some of the times g_on_error_stack_trace() failed to work correctly, it was because it was stuck (but I am not sure, I have not tried to dig much more, so maybe I am talking nonsense). But the worst issue was that it simply printed to stdout. So if I wanted to get the output into a string in order to use it in the graphical interface (we should not expect people to run GIMP from a terminal!), I had to pipe the output yet again. Well, at some point, it was just ridiculous to stack processes one after another after another after…
… then based directly on GDB…
This is how I started to reimplement the feature. I simply run gdb in batch mode, and I keep the result in a string for later display in a dialog. This was actually very straightforward. See commit bb88a2d52f.
This also allowed me to get a slightly better stack trace since I could customize the command. So I request “backtrace full”, which gets us the contents of local variables.
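To make the idea concrete, here is a minimal sketch of running a debugger in batch mode and keeping its output in a string. It is not the code from the commit above: the function names `capture_command_output()` and `gdb_backtrace_full()` are made up for illustration, and only the `gdb` options `-batch`, `-ex` and `-p` are taken from gdb itself.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] and copy its stdout and stderr into
 * `out` (always NUL-terminated). Returns the bytes stored, or -1. */
ssize_t
capture_command_output (char *const argv[], char *out, size_t size)
{
  int     fds[2];
  pid_t   child;
  size_t  used = 0;
  ssize_t n;
  char    discard[256];

  assert (out != NULL && size > 0);
  out[0] = '\0';

  if (pipe (fds) == -1)
    return -1;

  child = fork ();
  if (child == -1)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (child == 0)
    {
      /* Child: redirect stdout and stderr into the pipe, then exec. */
      close (fds[0]);
      dup2 (fds[1], STDOUT_FILENO);
      dup2 (fds[1], STDERR_FILENO);
      close (fds[1]);
      execvp (argv[0], argv);
      _exit (127); /* exec failed, e.g. the debugger is not installed */
    }

  /* Parent: read until the buffer is full or the child closes the pipe. */
  close (fds[1]);
  while (used < size - 1 &&
         (n = read (fds[0], out + used, size - 1 - used)) > 0)
    used += (size_t) n;
  out[used] = '\0';

  /* Drain whatever did not fit so the child never blocks on write. */
  while (read (fds[0], discard, sizeof discard) > 0)
    ;
  close (fds[0]);
  waitpid (child, NULL, 0);

  return (ssize_t) used;
}

/* Hypothetical wrapper: ask gdb, in batch mode, for a full backtrace
 * of the process `pid`. */
ssize_t
gdb_backtrace_full (pid_t pid, char *out, size_t size)
{
  char  pid_arg[32];
  char *argv[] = { "gdb", "-batch", "-ex", "backtrace full",
                   "-p", pid_arg, NULL };

  snprintf (pid_arg, sizeof pid_arg, "%ld", (long) pid);
  return capture_command_output (argv, out, size);
}
```

A caller would pass the PID of the process to inspect and a buffer large enough for the trace. Whether gdb is allowed to attach depends on the system's ptrace policy, which this sketch does not handle.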
… and LLDB…
Then I remembered that some bug reporter on macOS was using lldb, the debugger from the LLVM project. Since LLVM is the default toolchain on macOS, I assumed that lldb is also much more common there than gdb. So I added support for it. This was quite easy too, I just had to look up the equivalent commands. See commit 4ca31b0571.
… and finally the GNU libc!
Finally I was told of the backtrace() and backtrace_symbols() API. This seems to be a GNU-only API (the man page says these are GNU extensions). Anyway this should mean they are always present on common Linux distributions, which is very good news. It means that we will always get “something” on Linux (also the result is much quicker than calling gdb or lldb). Unfortunately the output of backtrace() is not that exhaustive: basically you get function names, but neither file names nor line numbers (even if you built with debug info), nor the contents of variables and parameters. So it’s a bit less useful. Yet it’s better than nothing! See commit 4fd1c6c97c.
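For readers who have never used this API, here is a minimal sketch of turning the current stack into text with backtrace() and backtrace_symbols(). The function name `format_backtrace()` and the `MAX_FRAMES` limit are assumptions for the example, not something taken from GIMP.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64 /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: write one line per stack frame of the calling
 * thread into `out`. Returns the number of frames captured, 0 on failure. */
int
format_backtrace (char *out, size_t size)
{
  void   *frames[MAX_FRAMES];
  char  **symbols;
  size_t  used = 0;
  int     n_frames;
  int     i;

  assert (out != NULL && size > 0);
  out[0] = '\0';

  n_frames = backtrace (frames, MAX_FRAMES);
  /* backtrace_symbols() returns a single malloc()ed array of strings. */
  symbols = backtrace_symbols (frames, n_frames);
  if (symbols == NULL)
    return 0;

  for (i = 0; i < n_frames && used < size - 1; i++)
    {
      int written = snprintf (out + used, size - used,
                              "#%d %s\n", i, symbols[i]);

      if (written < 0)
        break;
      if ((size_t) written >= size - used)
        used = size - 1; /* output was truncated, buffer is full */
      else
        used += (size_t) written;
    }

  free (symbols);
  return n_frames;
}
```

With the GNU linker, names of functions in the executable itself typically only show up if the program is linked with `-rdynamic`. Otherwise you mostly get raw addresses.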
So in the end, my tool tries gdb, then if absent, lldb, and finally falls back to backtrace() if available. This should hopefully give us traces of crashes and errors in most cases!
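The fallback order can be sketched as follows. This is only an illustration of the decision: the `TraceBackend` enum and both function names are invented here, and looking the debuggers up in `PATH` is an assumption of the sketch (another option is simply to try launching each one and move on when that fails).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names for the three ways of getting a trace. */
typedef enum
{
  BACKEND_GDB,
  BACKEND_LLDB,
  BACKEND_LIBC /* backtrace() from the C library */
} TraceBackend;

/* Returns 1 if an executable called `name` is found in $PATH. */
int
program_in_path (const char *name)
{
  const char *dir = getenv ("PATH");
  char        candidate[4096];

  assert (name != NULL && name[0] != '\0');

  while (dir != NULL && *dir != '\0')
    {
      const char *sep = strchr (dir, ':');
      size_t      len = sep ? (size_t) (sep - dir) : strlen (dir);
      int         n   = snprintf (candidate, sizeof candidate, "%.*s/%s",
                                  (int) len, dir, name);

      if (len > 0 && n > 0 && (size_t) n < sizeof candidate &&
          access (candidate, X_OK) == 0)
        return 1;

      dir = sep ? sep + 1 : NULL;
    }

  return 0;
}

/* gdb first, then lldb, then the libc fallback. */
TraceBackend
choose_backend (void)
{
  if (program_in_path ("gdb"))
    return BACKEND_GDB;
  if (program_in_path ("lldb"))
    return BACKEND_LLDB;
  return BACKEND_LIBC;
}
```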
The difficulties
Issue 1: do not rely on memory allocation after a crash
There were still a few issues. One of them is that, as you may notice, I use this dialog for 2 kinds of errors: fatal errors (crashes) and non-fatal errors (WARNING, CRITICAL, etc.). I use the same code, but while testing, I realized that I often could not create the dialog from the main process when GIMP crashed. On Linux at least, once the program crashed, I was able to catch the terminating signal long enough to perform last-minute actions, but it seems allocating more memory was not among the possible actions (that was my assumption based on tests, I may be wrong, don’t take this as authoritative). Well, I guess it makes sense to forbid more memory management, especially if the crash is related to memory bugs. This means that even just creating a new dialog is not possible (since it requires allocating a new GTK+ widget).
This is why when crashing, I run the dialog as a separate process, whereas I run it from within the main process for non-fatal bugs.
Issue 2: backtrace() needs to be run by the main process
When running as a separate process, should the back trace be generated by this other process or from the main process? At first it made sense to have it generated through the new process, but then this has 2 inconveniences:
- I would be duplicating the backtrace generation code (since I sometimes need to run it from within GIMP, sometimes from outside) and code duplication is never good (maintenance-wise, you end up with diverging versions. This sucks). You can factor out common core code, but it’s just not ideal (it makes the build rules complicated).
- From the outside process, I can attach to the main process with gdb or lldb but I cannot use backtrace() anymore. That would mean that a lot of people would not get the auto-generated traces (not everyone installs a debugger!).
This is why I decided that the backtrace is always generated by the main process and, in case of a crash, it is passed along through a file instead of a parameter. I could have piped it, which would have been just as easy, but Dr. Mingw (see the Win32 section below) was already using a file. So I chose to do the same to be as consistent across platforms as possible (also a file has some advantages: in the extreme case where the dialog breaks too, we may ask a bug reporter to check whether a file with the info was still generated).
Also since, as I said in issue 1, memory allocations are more likely to fail during crash handling, you need to use backtrace_symbols_fd() instead of backtrace_symbols(). The _fd() variant is guaranteed to run without memory allocation (this is written in the man page). And now we have traces on most systems, still with the GNU libc fallback!
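As an illustration, writing the trace straight to a file with backtrace_symbols_fd() can look like the sketch below. The function name `write_backtrace_file()` and the frame limit are assumptions; the file path is up to the caller.

```c
#include <assert.h>
#include <execinfo.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_FRAMES 64 /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: dump the calling thread's stack to `path`.
 * Returns the number of frames written, or -1 if the file cannot
 * be opened. */
int
write_backtrace_file (const char *path)
{
  void *frames[MAX_FRAMES]; /* on the stack, no malloc() needed */
  int   n_frames;
  int   fd;

  assert (path != NULL);

  fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (fd == -1)
    return -1;

  n_frames = backtrace (frames, MAX_FRAMES);
  /* Unlike backtrace_symbols(), this writes one line per frame directly
   * to the file descriptor instead of returning allocated strings. */
  backtrace_symbols_fd (frames, n_frames, fd);
  close (fd);

  return n_frames;
}
```

The separate dialog process then only needs to be told the file name in order to read the trace back.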
Issue 3: error avalanches
Another issue is that, in case of non-fatal errors, you may often have a few of them one after another. Sometimes they may be generated as dominos (you get the second as a consequence of the first error), sometimes it’s because of long-running operations which would just reproduce the same errors many times.
Worst-case scenario: a long-time contributor, Massimo, directed me to a bug which would output tens of thousands of errors in a few seconds. Actually that depends on the size of a selection, and in some of my tests, I had hundreds of thousands of errors!
Obviously you don’t want to create a dialog each time (this example was not even a bug which crashes GIMP, but creating hundreds of thousands of dialogs may do the killing job!). So you have to just update the current dialog with additional errors. But even doing so is very time consuming. Updating a dialog hundreds of thousands of times in a few seconds is likely to freeze the whole GUI for a dozen minutes at least (I know, I tried!).
So I decided to limit not only the backtracing, but also the error handling. In a single dialog, I add up to 3 backtraces and 10 errors at most. Any further errors are just redirected to stderr.
Issue 4: debugging preferences
Moreover do we want the dialog to appear for every kind of error? In particular, we have WARNING, CRITICAL, then all fatal errors. CRITICAL are usually really bad, so we definitely want debug info here. But what about WARNING? I mean, they are bad too, and they are signs of a bug somewhere. But these are more minor bugs, sometimes also bugs in external data which we warn about (and have no control over). Also we often output warnings when we encounter bugs in other software (for instance, one of the recent bugs where my dialog worked was a bug in KDE’s API for color picking, and there is not much we can do about it in GIMP but report upstream). So I added finer-grained settings, because you certainly don’t want to make creating with GIMP painful if it pops up errors every few hours!
Actually it is even possible to disable all debugging through GIMP preferences, even during crashes, if someone is really not interested at all in reporting bugs, hence contributing to GIMP improvement.
Note: on Windows, the debugging preferences page doesn’t exist at all because the backend we use is not customizable anyway. See dedicated section below.
Issue 5: multi-threading
As explained, we don’t only handle crashes, but also runtime errors. Since GEGL is so close to the GIMP project, it made sense to handle its errors as well (actually long-term, it would make sense to handle errors from any dependency, but let’s do it step by step). So I also catch GEGL’s WARNINGs and CRITICALs. But then I realized that since GEGL uses a lot of multi-threading, getting a backtrace from the main thread when the error happened in another was completely useless.
Add to this the fact that GTK+ code must be run in the main thread: to create or update my debugging dialog, I need to pass the information from the thread where the bug occurred to the main GTK+ thread. This can be done with gdk_threads_add_idle_full(). This call obviously adds a delay, so you’d end up getting traces from the wrong code, and after an unknown delay. This is doubly useless.
As a consequence, to handle multi-threaded debugging, I needed to make sure that the stack trace was generated from the thread where the error happened, without any delay, and only then could it be sent to the main thread with an idle function.
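The ordering matters more than the mechanism, so here is a toolkit-free sketch of it. The trace is captured immediately in the thread that hit the error, then queued, and the main thread picks it up later. In GIMP the hand-over is an idle callback; in this sketch a mutex-protected list stands in for it, and every name is hypothetical. These are non-fatal errors, so allocating memory is fine here.

```c
#include <assert.h>
#include <execinfo.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64

typedef struct PendingError
{
  char               **symbols; /* from backtrace_symbols(), freed later */
  int                  n_frames;
  struct PendingError *next;
} PendingError;

static pthread_mutex_t  queue_lock = PTHREAD_MUTEX_INITIALIZER;
static PendingError    *queue_head = NULL;
static PendingError    *queue_tail = NULL;

/* Call this in the thread where the error happened: the trace is taken
 * right away, only its display is deferred. Returns the frame count. */
int
report_error_from_thread (void)
{
  void         *frames[MAX_FRAMES];
  PendingError *error = malloc (sizeof *error);
  int           n_frames;

  if (error == NULL)
    return -1;

  n_frames        = backtrace (frames, MAX_FRAMES);
  error->symbols  = backtrace_symbols (frames, n_frames);
  error->n_frames = error->symbols ? n_frames : 0;
  error->next     = NULL;

  pthread_mutex_lock (&queue_lock);
  if (queue_tail != NULL)
    queue_tail->next = error;
  else
    queue_head = error;
  queue_tail = error;
  pthread_mutex_unlock (&queue_lock);

  return n_frames;
}

/* Example consumer: print a trace instead of updating a dialog. */
void
print_trace (char **symbols, int n_frames)
{
  int i;

  for (i = 0; i < n_frames; i++)
    fprintf (stderr, "%s\n", symbols[i]);
}

/* Call this from the main thread: hands every queued trace to `show`
 * in arrival order. Returns how many errors were handled. */
int
drain_errors_on_main_thread (void (*show) (char **symbols, int n_frames))
{
  PendingError *list;
  int           handled = 0;

  assert (show != NULL);

  pthread_mutex_lock (&queue_lock);
  list = queue_head;
  queue_head = queue_tail = NULL;
  pthread_mutex_unlock (&queue_lock);

  while (list != NULL)
    {
      PendingError *next = list->next;

      show (list->symbols, list->n_frames);
      free (list->symbols);
      free (list);
      list = next;
      handled++;
    }
  return handled;
}
```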
Issue 6: the tweaking
Then you have all these little details to make the experience not too terrible (at least I am not saying we should make it a good experience, a bug is never a good experience! ;P).
For instance when handling a crash, I add a “Restart” button, allowing (as the name implies) to at least restart GIMP immediately.
When non-fatal bugs are reported, we should advise people to save their images and restart GIMP (of course, for crashes, they won’t have the possibility to save anything, so don’t make them sadder by reminding them).
Also I have to be extra careful to not generate new WARNING or CRITICAL from within this code because then you could create cyclic calls. You don’t want to end up crashing the software because of the debugger which initially fired up only for a minor bug.
Well you get the idea! This is the kind of tweaking you only discover as you implement such a system, and you have to take care of it as you go.
Future work
Something we have been discussing would be to save the opened images in backup files upon crashing. Of course with some kind of crashes, it may not be possible, but that is worth trying at least!
I’ve actually started working on it (with commit d916fedf92 from yesterday). As expected, it’s working most of the time, but while testing various crash conditions, I had some cases where the last-second backup failed. I have not dived into the code yet to understand why, and whether there is a solution to these.
GIMP is quite stable now (at least on GNU/Linux), and quite rarely crashes (well I say this but we had some instability these last few days because of core changes in selection and channels so the auto-debug dialog was very useful). But for this one time when it happens, handling it as gracefully as possible implies saving the current state of work. Then obviously the next step will be to propose recovery on the next GIMP start.
More on this later as I will continue working on it…
What about Windows?
Now the last remaining issue is Win32! Having GDB or LLDB there might be possible (I have not checked) but probably not the best path. It turns out a contributor, Mukund Sivaraman, did already add support for backtrace generation on Windows upon a crash, back in 2015. This is using the ExcHndl library from the Dr. Mingw project. Basically this is extremely easy to use since there are only 2 functions in the API: one to init the library, one to choose a file where the backtrace will be outputted.
void ExcHndlInit(void);
bool ExcHndlSetLogFileNameA(const char *szLogFileName);
So yes, since 2015, backtraces were simply outputted into a file somewhere, and people just never knew where and how to find it. What I did was simply to piggy-back on this feature, grab the backtrace from the generated file, and display it in our GUI. And that’s it!
Since I needed my own code to run after Dr. Mingw, I had a look at how this tool actually does its job. In its code, I saw it was using SetUnhandledExceptionFilter() to run its action just before the crash. What I did was add another exception handler with the same function, registering my handler before I init() Dr. Mingw. This way Dr. Mingw calls my handler immediately after its own, because it keeps track of any handler previously set and calls it after itself.
See commits ae3cd00fbd and 4e5a5dbb87.
Now this has a few limitations: the backtrace generated by Dr. Mingw is not that complete compared to a good gdb backtrace. Also sometimes, I had some crashes which this tool would not catch. I am no Win32 expert and did not spend much time on it, so I don’t know why.
Finally this works only on crashes. In particular I cannot generate backtraces on a whim as I can on other platforms, where I can generate them even for WARNING or CRITICAL messages for easier debugging, without any crash.
Well in the end, Win32 always ends up less featured and more annoying to debug. I guess there is not much to be done about it for now; as a reminder, we are still looking for Win32 developers on GIMP. We have had very few contributions from Windows developers in all the years I’ve been around, quite sadly! If you are interested in contributing to this cool piece of software, you are very welcome!
We got our first reports with automatic traces!
Even though the tool is still only present in the development version, some people build GIMP from master, and we already got a few bug reports with traces included directly! This is very cool.
Actually even Aryeom got such dialogs, which resulted in some bug fixes already (and more to come)! 🙂
So yeah when I fixed my first bugs thanks to these automatically generated back traces, that made me happy because I felt this new tool will make life a lot simpler and I knew my time was well spent. 😉
You’d think a developer of GIMP would not be happy to get a back trace. And yeah, I’d prefer that GIMP was perfectly bug-free. But there is no such thing, and as long as we get bugs, we may as well get well-illustrated reports to easily fix them. This is why I am happy! We are constantly on our way to a much more stable GIMP.
Reminder: my Free Software coding can be funded on: Liberapay, Patreon or Tipeee through ZeMarmot project.