Note: this documentation is still in beta form. I plan to expand it sooner rather than later, but because this is a very important point for industrial applications, I thought it should be covered now.
As programs grow larger and make more use of multi-threaded operations, your chance of creating bugs also grows, usually faster than the size of your program. Here I describe how I cope with these bugs and keep them from shutting down an application that must run 24/7 in a critical use case, such as industrial controls.
The key to robust programming is to handle all exceptions, on top of the basics: structured programming, state machines and generally good design practice. It never hurts to use tools, like the excellent AQtime, to check for memory misuse and leaks. But with all the options your program normally supports, there will always be one you didn't fully test, to say nothing of all the possible interactions between threads. If you write high-performance applications, as I do, you will find that AQtime places a heavy load on a multi-threaded application and slows things down so much that complete testing may not be practical.
Now suppose you've tried to do everything right and still something goes wrong. Believe me, give your program to a customer and something will always go wrong. If only we could write programs for programmers, who understand that bugs happen. But then, if something went wrong, they would fix it and not tell us. But I digress. Here are the two main things I've found that go wrong:
- You free an object and another thread or the GUI still uses the pointer to it. I usually set pointers to NULL when I free them, to force an exception on use. If you don't, the other thread or GUI may keep running correctly for a while, or run incorrectly once the memory the pointer referenced is overwritten.
- You free an object and its memory is reused by another thread. The GUI then accesses it through the old pointer and crashes.
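Nulling a pointer on free, as in the first case above, can be wrapped in a small helper. The following is a minimal sketch, not a complete fix: destroy_and_null, Sensor and read_sensor are hypothetical names, and the sketch assumes the caller already serializes access to the pointer (for example with a mutex). Nulling one copy of a pointer does nothing for other copies held elsewhere.

```cpp
#include <cassert>

// Hypothetical helper: free the object and null the caller's pointer in one
// step, so a later use fails on a null pointer instead of touching freed memory.
template <typename T>
void destroy_and_null(T*& p) {
    delete p;      // deleting a null pointer is a harmless no-op
    p = nullptr;
}

// Hypothetical object shared between a worker and the GUI.
struct Sensor {
    int value;
};

// Reader that checks the pointer instead of assuming it is still valid.
// Returns false when the object has already been freed and nulled.
bool read_sensor(const Sensor* s, int& out) {
    if (s == nullptr) {
        return false;
    }
    out = s->value;
    return true;
}

// Usage: after destroy_and_null the pointer is null and the reader refuses it.
void demo_null_on_free() {
    Sensor* s = new Sensor{42};
    int v = 0;
    assert(read_sensor(s, v) && v == 42);
    destroy_and_null(s);
    assert(s == nullptr);
    assert(!read_sensor(s, v));
}
```

Taking the pointer by reference is the design point here: the helper can only null the variable it is handed, which is why every holder of the pointer needs to go through the same variable or be told separately.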
Many other similar cases are possible. They usually result in an unhandled exception, and your GUI and all its related threads disappear from the screen. If you are lucky, you might see an error dialog right before the GUI disappears, but usually it just goes away without warning.
Fixing Exceptions on Win32
Exceptions are fairly easy to handle on Win32 when programming in C++. Turn on structured exception handling in the C++ compiler (the /EHa option in Visual C++), include the windows.h and eh.h headers, and use the _set_se_translator function to install an exception trap function. You can then throw a normal C++ exception from the trap function, catch it, and take the proper action, which doesn't need to be termination of your program. The translator is set per thread, so you need to do this in each thread and also guard each major part of the GUI code you execute. Miss anything and goodbye program.
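The Win32 steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: SeException, translate_se, install_translator_for_this_thread and run_guarded are hypothetical names, the Windows part assumes Visual C++ with /EHa, and on other platforms the installer compiles to a no-op so that only ordinary C++ exceptions are caught.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <functional>
#include <stdexcept>

#ifdef _WIN32
#include <windows.h>
#include <eh.h>
#endif

// Hypothetical C++ exception type carrying the structured-exception code.
class SeException : public std::exception {
public:
    explicit SeException(unsigned int code) : code_(code) {}
    unsigned int code() const { return code_; }
    const char* what() const noexcept override { return "structured exception"; }
private:
    unsigned int code_;
};

#ifdef _WIN32
// Trap function: the runtime calls it when a structured exception (such as an
// access violation) is raised on this thread; it rethrows as a C++ exception.
static void translate_se(unsigned int code, EXCEPTION_POINTERS*) {
    throw SeException(code);
}
#endif

// The translator is per thread, so call this at the start of every thread.
void install_translator_for_this_thread() {
#ifdef _WIN32
    _set_se_translator(translate_se);
#endif
}

// Runs one unit of work; returns true if it finished, false if it was
// abandoned because of an exception. The program keeps running either way.
bool run_guarded(const std::function<void()>& work) {
    install_translator_for_this_thread();
    try {
        work();
        return true;
    } catch (const SeException& e) {
        std::fprintf(stderr, "structured exception 0x%08X\n", e.code());
    } catch (const std::exception& e) {
        std::fprintf(stderr, "exception: %s\n", e.what());
    } catch (...) {
        std::fprintf(stderr, "unknown exception\n");
    }
    return false;
}

// Usage: a failing task is reported instead of ending the program.
void demo_run_guarded() {
    bool finished = run_guarded([] { throw std::runtime_error("boom"); });
    assert(!finished);
}
```

What "the proper action" is after a false return (restarting the thread, resetting a state machine, raising an alarm) is application-specific and deliberately left out of the sketch.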
On Qt you must make sure you catch exceptions in every slot your GUI code connects to a signal. If you don't, the exception escapes into the event loop, the framework complains that it cannot handle an exception thrown from an event handler, and your program is still terminated.
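One way to avoid forgetting a try/catch in some handler is to wrap every handler in the same guard. The sketch below is plain C++ with no Qt dependency; guarded and Report are hypothetical names, and how the wrapped callable is connected to a signal is assumed to be handled by your own code.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical callback type used to report a failure to the application.
using Report = std::function<void(const std::string&)>;

// Returns a callable that runs `body` and never lets an exception escape,
// so nothing propagates into the framework's event loop.
std::function<void()> guarded(std::function<void()> body, Report report) {
    return [body, report]() {
        try {
            body();
        } catch (const std::exception& e) {
            report(e.what());
        } catch (...) {
            report("unknown exception");
        }
    };
}

// Usage: the handler throws, the guard reports it, and the caller carries on.
void demo_guarded_slot() {
    std::string last_error;
    auto slot = guarded(
        [] { throw std::runtime_error("bad input"); },
        [&last_error](const std::string& msg) { last_error = msg; });
    slot();
    assert(last_error == "bad input");
}
```

The report callback itself must not throw, otherwise the exception escapes after all.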
Fixing Exceptions on Linux
Doing this on Linux is much more involved. Basically, you need to trap signals (that is, Linux process signals, not Qt signals), but handlers normally only allow you to terminate your program. Actually taking control of the saved process exception state, changing the instruction pointer (IP) register to point to your trap function, generating an ordinary exception and catching it is very poorly documented. There is documentation out there on how to do this, but you really need to search for it. In practice it behaves like an undocumented kernel feature. When I get some more time I plan to solve this problem, but for now it will need to wait.
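A simpler and more limited technique than the register rewriting described above is to jump out of the signal handler with sigsetjmp/siglongjmp and throw the C++ exception afterwards from ordinary code. The following is a hedged sketch of that alternative, not the approach this article is aiming for. All function and variable names are hypothetical, and it assumes the guarded work holds no locks and no objects with destructors when the signal arrives, because the jump skips stack unwinding.

```cpp
#include <cassert>
#include <setjmp.h>
#include <signal.h>
#include <stdexcept>
#include <string>

// Per-thread recovery state (hypothetical names).
static thread_local sigjmp_buf g_recover_point;
static thread_local volatile sig_atomic_t g_armed = 0;
static thread_local volatile sig_atomic_t g_last_signal = 0;

// Signal handler: if a recovery point is armed on this thread, jump back to
// it; otherwise restore the default action so the process ends as usual.
extern "C" void on_fatal_signal(int signo) {
    if (g_armed) {
        g_last_signal = signo;
        siglongjmp(g_recover_point, 1);
    }
    signal(signo, SIG_DFL);
    raise(signo);
}

// Installs the handler for one signal, process-wide.
bool install_handler(int signo) {
    struct sigaction sa {};
    sa.sa_handler = on_fatal_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Runs work(); returns 0 if it finished, or the signal number that cut it short.
int run_guarded(void (*work)()) {
    if (sigsetjmp(g_recover_point, 1) == 0) {  // 1: save and restore the signal mask
        g_armed = 1;
        work();
        g_armed = 0;
        return 0;
    }
    g_armed = 0;  // reached via siglongjmp from the handler
    return g_last_signal;
}

// Converts the signal into an ordinary C++ exception, thrown from normal
// code rather than from inside the handler.
void run_or_throw(void (*work)()) {
    int signo = run_guarded(work);
    if (signo != 0) {
        throw std::runtime_error("fatal signal " + std::to_string(signo));
    }
}

// Usage: raise(SIGSEGV) stands in for a real fault.
void demo_signal_to_exception() {
    bool caught = false;
    bool installed = install_handler(SIGSEGV);
    try {
        run_or_throw([] { raise(SIGSEGV); });
    } catch (const std::runtime_error&) {
        caught = true;
    }
    assert(installed && caught);
}
```

After such a recovery, the state touched by the interrupted work should be treated as suspect. The safest use is to abandon that unit of work and reinitialize it rather than continue where it left off.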