Automatic bug report and stack traces for GIMP

While I was working on yet-another-crash without a backtrace, I realized that we could just generate automatic backtraces upon crashes and tell people about it. This is how I ended up writing a debug tool for GIMP, popping-up a dialog with a nice text encouraging to report bugs. You’ll notice that the main text is non-technical. The goal is not to display non-understandable error messages which nobody will understand. All the technical part is in the below section and is just to be copied by a single button click and reported to us verbatim. 🙂

This technical part contains: GIMP version (and commit information if available), compiler, main dependency version, and finally the errors and backtraces of these errors.

Note: this doesn’t “report” the bug on your behalf. Anyone still has to make the conscious action and go on our bug tracker. But we make things easier and just a few buttons and a copy-paste away.

Someone asked me if I could make a blog post about it, so here it is.

How does this work?

Used to be based on glib…

We already had some backtracing capability in GIMP, mostly using glib API g_on_error_stack_trace(). The main problems of this API:

that this function outputs to stdout (which means that you needed to run GIMP from terminal to get the trace, and until now this was only used with specific command line options or on unstable builds);
sometimes it was not working for weird reasons;
it works only in Unix-like operating systems (in particular not in Windows);
it is based on gdb only (as I soon discovered)
…

So I ended up looking what this function was doing. As I said, the basics is that it simply uses gdb if it is installed on the machine. I am still unsure why, but it was doing so using the interactive mode, therefore entering commands through the standard input with a pipe. Why is it weird? Because gdb has a batch mode especially done for such non-interactive calls. I suspect actually that some of the times g_on_error_stack_trace() failed to work correctly was maybe because it was stuck (but I am not sure, I have not tried to dig much more, so maybe I say shit). But the worse issue was that it was simply printing to stdout. So if I wanted to get the output inside a string in order to use it in the graphical interface (we should not expect people to run GIMP from a terminal!), I had to do more piping of the output. Well at some point, that was just ridiculous to stack processes one after another after another after…

… then based directly on GDB…

This is how I started to reimplement the feature. I simply run gdb in batch mode, and I keep the result in a string for later display in a dialog. This was actually very straightforward. See commit bb88a2d52f.

This also allowed me to get a slightly better stacktrace since I could customize the command. So I request “backtrace full“, getting us local variable contents.

… and LLDB…

Then I remembered that some bug reporter on macOS was using lldb, the debugger from the LLVM project. Since LLVM is default on macOS, I assumed that LLDB is much more common there than gdb too. So I added support for it. This was quite easy too, I just had to search for command equivalency. See commit 4ca31b0571.

… and finally the GNU libc!

Finally I was told of the backtrace() and backtrace_symbols() API. This seems to be a GNU-only API (man says these are GNU extensions). Anyway this should make these always present on common Linux distributions, which is very good news. It means that we will always get “something” on Linux (also the result is much quicker than calling gdb or lldb). Unfortunately the output of backtrace() is not that exhaustive: basically you get function names, and in particular neither file name nor line number even if you built with debug info, nor variable and parameters contents. So it’s a bit less useful. Yet it’s better than nothing! See commit 4fd1c6c97c.

So in the end, my tool tries gdb, then if absent, lldb, and finally fallbacks to backtrace() if available. This should hopefully gives us traces of crashes and errors in most cases!

The difficulties

Issue 1: do not rely on memory allocation after a crash

There were still a few issues. One of them is that you may notice that I use this dialog for 2 kind of errors: fatal errors (crashes) and non-fatal errors (WARNING, CRITICAL, etc.). l use the same code, but while testing, I realized that I often could not create the dialog from the main process when GIMP crashed. In Linux at least, once the program crashed, I was able to catch the terminating signal enough to do last minute actions, but it seems allocating more memory was not amongst the possible actions (that was my assumption based on tests, I may be wrong, don’t take this for manual talk). Well I guess that makes sense to forbid more memory management, especially if the crash is related to memory bugs. This means that even just creating a new dialog is not possible (requiring allocation of a new GTK+ widget).

This is why when crashing, I run the dialog as a separate process, whereas I run it from within the main process for non-fatal bugs.

Issue 2: backtrace() needs to be run by the main process

When running as a separate process, should the back trace be generated by this other process or from the main process? At first it made sense to have it generated through the new process, but then this has 2 inconveniences:

I am duplicating the back-trace generation code (since I sometimes need to run it from within GIMP, sometimes from outside) and code duplication is never good (even maintenance-wise, you end up with different version. This sucks). You can make common core code as exception, but it’s just not ideal (it makes the build rules complicated).
From the outside process, I can attach to the main process with gdb or lldb but I cannot use backtrace()anymore. That would mean that a lot of people would not get the auto-generated traces (not everyone installs a debugger!).

This is why I decided that the backtrace is always generated by the main process and in case of a crash, it is passed along through a file, instead of a parameter. I could have piped it which would have been just as easy, but Dr. Mingw (see below the Win32 section) was already using a file. So I chose to do the same to be as consistent across platforms as possible (also a file has some advantages: in the extreme case where the dialog breaks too, we may ask a bug reporter to look if a file has still been generated with the info).

Also since — as I said in issue 1 — memory allocations are more likely to fail during crash handling, you need to use backtrace_symbols_fd() instead of backtrace_symbols().
The _fd() variant is guaranteed to run without memory allocation (this is written in the man). And now we have traces on most systems, still with GNUlibc fallback!

Issue 3: error avalanches

Another issue is that, in case of non-fatal errors, you may often have a few of them one after another. Sometimes they may be generated as dominos (you get the second as a consequence of the first error), sometimes it’s because of long-running operations which would just reproduce the same errors many times.

Worse case scenario: a long-time contributor, Massimo, directed me to a bug which would output dozens of thousands of errors in a few seconds. Actually that depends on the size of a selection, and in some of my tests, I had hundreds of thousands of errors!
Obviously you don’t want to create a dialog each time (this example was not even a bug which crashes GIMP, but creating hundreds of thousands of dialogs may do the killing job!). So you have to just update the current dialog with additional errors. But even doing so is very time consuming. Updating a dialog hundreds of thousands of times in a few seconds is at least likely to freeze the whole GUI for a dozen of minutes (I know, I tried!).

So I decided to limit the backtracing, but even the error handling. In a single dialog, I add up to 3 backtraces and 10 errors at most. Any more errors would just be redirected to stderr.

Issue 4: debugging preferences

Moreover do we want the dialog to appear for every kind of errors? In particular, we have WARNING, CRITICAL then all fatal errors. CRITICAL are usually really bad, so we definitely want debug info here. But what about WARNING? I mean, they are bad too, and they are signs of a bug somewhere. But these are more minor bugs, sometimes also bugs on external data which we warn about (and have no control on). Also we often output warnings when we encounter bugs in other software (for instance, one of the recent bugs where my dialog worked was on a bug in KDE’s API for color picking, and there is not much we can do about it in GIMP but report upstream). So I added finer-grained settings, because you certainly don’t want to make creating with GIMP painful if it pop-ups errors every few hours!

Actually it is even possible to disable all debugging through GIMP preferences, even during crashes, if someone is really not interested at all in reporting bugs, hence contributing to GIMP improvement.

Note: on Windows, the debugging preferences page doesn’t exist at all because the backend we use is not customizable anyway. See dedicated section below.

Issue 5: multi-threading

As explained, we don’t only handle crashes, but also runtime errors. Since GEGL is so close to the GIMP project, it made sense to handle its errors as well (actually long-term, it would make sense to handle errors from any dependency, but let’s do it step by step). So I also catch GEGL’s WARNINGs and CRITICALs. But then I realized that since GEGL uses a lot of multi-threading, getting a backtrace from the main thread when the error happened in another was completely useless.

This combined to the fact GTK+ code must be run in the main thread, therefore to create or update my debugging dialog, I need to pass the information from the thread where the bug occurred to the main GTK+ thread. This can be done with gdk_threads_add_idle_full(). This call obviously adds a delay so you’d end up getting traces from the wrong code, and after an unknown delay. This is double useless.

As a consequence, to handle multi-threaded debugging, I needed to make sure that the stack trace was generated from the thread the error happened, without any delay, and only then it could be sent to the main thread with an idle function.

Issue 6: the tweaking

Then you have all these little details to make the experience not too terrible (at least I am not saying we should make it a good experience, a bug is never a good experience! ;P).

For instance handling a crash, I add a button “Restart”, allowing — as the name implies — to at least restart GIMP immediately.

When non-fatal bugs are reported, we should advise people to save their images and restart GIMP (of course, for crashes, they won’t have the possibility to save themselves, so don’t make them sadder by reminding them).

Also I have to be extra careful to not generate new WARNING or CRITICAL from within this code because then you could create cyclic calls. You don’t want to end up crashing the software because of the debugger which initially fired up only for a minor bug.

Well you get the idea! These are the kind of tweaking you just discover as you implement such a system and you have to take care of them as you go on.

Future work

Something we have been discussing would be to save the opened images in backup files upon crashing. Of course with some kind of crashes, it may not be possible, but that is worth trying at least!

I’ve actually started working on it (with commit d916fedf92 from yesterday). As expected, it’s working most of the time, but while testing various crash conditions, I had some cases where last-second backup failed. I have not dived into the code yet to understand why and what, and if there is a solution to these.

GIMP is quite stable now (at least on GNU/Linux), and quite rarely crashes (well I say this but we had some instability these last few days because of core changes in selection and channels so the auto-debug dialog was very useful). But for this one time when it happens, handling it the most gracefully possible implies saving the current state of work. Then obviously next step will be to propose recovery on next GIMP start.

More on this later as I will continue working on it…

What about Windows?

Now the last remaining issue is Win32! Having GDB or LLDB there might be possible (I have not checked) but probably not the best path. It turns out a contributor, Mukund Sivaraman, did already add support for backtrace generation on Windows upon a crash, back in 2015. This is using the ExcHndl library from the Dr. Mingw project. Basically this is extremely easy to use since there are only 2 functions in the API: one to init the library, one to choose a file where the backtrace will be outputted.

void ExcHndlInit(void);
bool ExcHndlSetLogFileNameA(const char *szLogFileName);

So yes, since 2015, backtraces were simply outputted into a file somewhere, and people just never knew where and how to find it. What I did was simply to piggy-back on this feature, grab the backtrace from the generated file, and display it in our GUI. And that’s it!

Since I needed my own code to run after Dr. Mingw, I had a look how this tool actually made its job. In its code, I saw it was using SetUnhandledExceptionFilter()to run its action just before the crash. What I did was adding another exception handler with the same function, but registering my handler first before I init() Dr. Mingw. This way Dr. Mingw call my handler immediately after its own because it keeps track of any handler previously set and call it after itself.
See commits ae3cd00fbd and 4e5a5dbb87.

Now this has a few limitations: the backtrace generated by Dr. Mingw is not that complete compared to a good gdb backtrace. Also sometimes, I had some crashes which this tool would not catch. I am no Win32 expert and did not spend much time on it, so I don’t know why.
Finally this works only on crashes, in particular I cannot generate backtraces on a whim as I can do on other platforms, which allows to generate backtraces even on WARNINGs or CRITICALs messages for easier debugging, even without a crash.

Well in the end, Win32 always ends up less featured and most annoying to debug. I guess there is nothing to be done since I remind we are still looking for Win32 developers on GIMP. We have had very few contributions of Windows developers for all the years I’ve been around, quite sadly! If you are interested to contribute on this cool piece of software, be very welcome!

We got our first reports with automatic traces!

Even though the tool is still only present in the development version, some people build GIMP from master, and we already got a few bug reports with traces included directly! This is very cool.
Actually even Aryeom got such dialogs, which resulted in some bug fixes already (and more to come)! 🙂

So yeah when I fixed my first bugs thanks to these automatically generated back traces, that made me happy because I felt this new tool will make life a lot simpler and I knew my time was well spent. 😉

You’d think a developer of GIMP would not be happy to get a back trace. And yeah, I’d prefer that GIMP was perfectly bug-free. But there is no such things, and as long as we get bugs, we may as well get well-illustrated reports to easily fix them. This is why I am happy! We are constantly on our way to a much more stable GIMP.

Yeah!

Reminder: my Free Software coding can be funded on:
Liberapay, Patreon or Tipeee through ZeMarmot project.

February 5, 2018February 6, 2018

Ibus-Hangul and Compose key: the incredible journey of a simple patch

Today I decided to tell how I reported a bug (then ended up fixing it) on a non-GIMP related project. Well I do regularly this kind of stuff, and this could have just been one more of these silent commits to a random project as I did many times in my life. But since I decided recently to post more articles, well… I may as well tell a story as one-time contributor (as opposed to “regular contributor”) for once!

Also I think the whole process of reporting a bug on projects you don’t know at all — worse! A whole stack of software you don’t know much! — is quite interesting for people wondering how they should report bugs happening to them.

Finally another reason is to advertise a bit my work, since I remind I am trying to get my Free Software coding crowdfunded through ZeMarmot project. I am hopeful these kinds of side-stories highlight how our project is useful for Free Software as a whole, not just GIMP. Because yes, that seems unrelated, but in the same time, if I can be funded to hack Free Software full-time, it means more such unrelated bugs can be fixed! So you can consider that this kind of patch is also funded when you crowdfund ZeMarmot. 😉

The context

My main input is the ibus-hangul (Korean) input engine, and the main languages I write are English (ibus-hangul is basically the same as “English (US)” layout when in LATIN mode, which is my default mode) and French! Sometimes when needed, I can easily switch to writing Korean while keeping the same layout.

To write French, I simply mapped a Compose key on the CapsLock key (which I don’t need). Why not switching to a layout with French characters? Well mostly because I don’t like the default French layout and am used to the US one (I have used it for a few years now), and I am also used to using Compose (I can write quickly enough all French characters with Compose, most people would not know the difference in term of typing speed).
Moreover I don’t mind changing layouts but this can be a bit confusing at times, especially if you need to change your layouts every few minutes. It’s better to just keep the same if possible.

Finally this is also the input which Aryeom uses (her main language being Korean) and she writes both in Korean and in French on daily use.

The route from bug to patch…

Or: how to report a bug … when you know very little about a software stack of dozens of software (really device input goes through so many layers that it’s making me crazy!)

Step 0: the bug!

We were using successfully our setup up until Fedora 25: Korean’s ibus-hangul IME with a Compose key. This Compose key settings was done through GNOME Tweaks Tool.

Then some fateful day, we updated to Fedora 26. And paf! The Compose key stopped working. Oh fateful day!

We hoped that it would be fixed by Fedora 27 (since we updated late to Fedora 26, Fedora 27 release was just a month away), but it didn’t.

Step 1: I click this button and it doesn’t work => GUI problem?

Back then though, I didn’t link the bug to ibus or ibus-hangul yet.

I first tried to track this issue in GNOME Tweaks tool (hoping that might have just been just a GUI bug; by the way, now the bug is fixed and the issue was not in Tweaks, but it seems I cannot close the report in gitlab and nobody is answering! If someone from the project reads me: just close the report, please) since that’s where I was setting the feature.

Of course, my secret hope was to just see other developers take care of it and not having to dive into unknown code. Don’t get me wrong, I’d love to fix every bug I encounter myself, but I also have a limited life span! ;P
Also I am lazy. I hear that’s a good trait by the way, nothing to be ashamed of. So really hoping someone else does the job at your place is my best expectation.

Step 2: going deeper into layers

But then after waiting for more than a month with a broken compose key, I couldn’t stand anymore to write my French texts without accents. So I investigated a bit more by myself and whistled in the wind (as you can see, I still haven’t had a single answer in 3 months on this first bug report. Bit sad ?), looking for which dconf key Tweaks Tool was changing, and seeing that it worked at least here: the dconf key was properly updated so Tweaks was likely not at fault.

So was GNOME the culprit not doing proper action with this dconf value? To decide this, after I understood that this was just a X11 options (the bug also happened on Wayland though), I tried various combinations of trying to set “compose:caps” directly with setxkbmap, or even updating XKBOPTIONS directly inside X11 config files (something I hoped I would never have to do ever again, and for years it seemed like it would hold) and restarting X.

It didn’t work. So I became more certain the problem may be deeper and something was broken in X and Wayland. I asked someone who knows better than I on this whole graphics desktop stack (Thanks Carlos for always helping me out! I know I can be annoying sometimes! ?). He advised me to open a bug report at libxkbcommon, which was used both for X11 and Wayland. So I did.

Step 3: …and deeper…

Continuing my own investigations, I realized that the problem was not happening on every input method, but even stranger, the input order changed the behavior! ?

This is when I was told the order indeed matters and that’s expected, which I find a bit weird since it makes for very particularly unique bugs. But anyway I was not here to argue about these kind of design decision. So I followed libxkbcommon‘s developer’s recommendation and opened another report at xkeyboard-config instead.

Step 4: and back up!

Then I left time pass again for about a month, but once again failed to get an answer, therefore tried to investigate more myself, once again. I reread and thought back about my previous investigations and realized that the broken inputs were all Ibus ones (it turns out I only use Ibus inputs nowadays, mainly the Korean or sometimes the Japanese one). For people wondering what is Ibus, this is the input framework for writing languages complicated enough for which just a layout change would not work (for instance Japanese, Korean, Chinese, etc.). This is when I first started to be on the right path.

But is the issue in Ibus or specifically Ibus-Hangul?
To answer this question, I thought to be clever (I wasn’t necessarily!) by checking Fedora repository contents. There I realized that Ibus-Hangul package didn’t change its version (1.5.0) from Fedora 25 to 27. Since my issue appeared in Fedora 26, I went with the assumption that the broken part was most likely Ibus (which indeed upgraded from Ibus 1.5.14 to 1.5.16). So I opened a bug report for ibus.

This time, I had a very quick response (thanks Takao!), then a few exchanges not only telling me that the issue was indeed triggered by a change in Ibus, yet that was not really a bug in Ibus (well it’s a bit sad that Ibus does not guarantee compatibility, but I can also understand why). Ibus-Hangul had to be updated to follow Ibus changes. Better, the developer of Ibus gave me the necessary hints to update Ibus-Hangul.

Step 5: the patch

And then a few hours later, I was able to clone, read, understand and finally simply patch myself Ibus-Hangul. I immediately installed it for myself and for Aryeom, then I went and did a pull-request on upstream project (⚠ important: never forget to contribute your patches upstream! ⚠ I know, it means a bit more work; sometimes making clean code which will be accepted upstream may even take longer than making the quick-and-dirty fix for just yourself. Yet that’s how the whole software ecosystem will improve! Imagine how bugged Free Software would be if others were not contributing their patches!).

Hopefully my pull request will be accepted soon even though ibus-hangul activity seems a bit slow (yet existing! There were commits 2 months ago). But anyone in our case reading this, you can already build and install with my patch if you need.

This is also a good example to work from if ever you use another Ibus input method with the same problem. That should be quite easy to update all Ibus engines hopefully.

Step 6: it’s never finished!

The issue is mostly fixed, at least on Ibus side apparently. There is still some remnant of issue, which is that my remapped CapsLock was now working both as a CapsLock and a Compose key which is obviously not the expected result! So if I want to write ‘œ’, it would output ‘Œ’ and locks capitalization. That’s not good…

Getting again a bit of help from Takao Fujiware from Ibus project (really thank you a lot!), it seems that the bug is in GNOME this time, since I can work around it by running directly a command line command:

setxkbmap -layout us -option compose:caps

Also I realize that the issue happens even with non-Ibus inputs this time. So somehow it would seem that GNOME may not set properly the option? Let’s wait and see for answers on this report. At least this time, I have a workaround (I even have 2 different workarounds, but let’s not complicate things!)! 🙂

Conclusion

On our beloved GNU/Linux operating systems, the text input situation is mostly ok, though sometimes I must admit I wonder how many people use slightly more exotic input settings since I regularly encounter funny bugs. Well I have lived in Japan for 2 years and in Korea for 1 year, so I also know that Free Software communities are unfortunately less common over there. And among these communities, people also needing a Compose key must be even rarer.

As you may know, I love languages, so I really hope that things will improve regarding any language support. 🙂

As for the whole route from a bug to a patch, it is interesting to see how it took 5 bug reports to finally get to the right project, and a little more than 3 months for a patch written in a few minutes once I knew what this was about. Somehow this is part of the “unseen”, slow, lengthy and boring work done by every contributors out there (even more one-time contributors since they start without the whole package). And no matter how good you may think you are, you are always a newbie somewhere. I personally even consider to be a constant beginner on most topics, even the ones I know best. This makes things easier because I don’t set impossible expectations on myself but also others.

Thanks for reading my story!

Reminder: my Free Software coding can be funded on:
Liberapay, Patreon or Tipeee through ZeMarmot project.