Saturday 14 February 2009

Designing Good Error Handling

Usability is an important subject in software design. Your application can be the coolest thing since sliced bread, if it's difficult to use, people will just not bother. We all know that. There are lots of books and articles on usability. However, such material often forgets about one important aspect of usability: error handling and messages. When your application encounters an error, whether it is due to something the user did or to a bug, it is extremely important to ensure it behaves in a way that makes sense to the user, even more important than for normal behaviour. This is because when an error occurs, your user will typically be stopped in his tracks and asked to resolve a problem he wasn't expecting to be faced with. If the error handling adds to the confusion rather than help, the user will not know what to do and his confidence about the application and his own ability to understand it will significantly diminish. So here are a couple of example of what not to do, helpfully provided over the past week by applications I use at work.

Cryptic error message

We use Borland StarTeam at work to manage change and configuration. When starting the application earlier this week, I was faced with the following error message:

Borland StarTeam error message consisting of a single zero

Yes, the error message consists of a single zero. Difficult to do any shorter or any more cryptic. At that point, I am given two options: a Details button and an OK one. Assuming the Details button may lead me to a more understable explanation, I clicked it and it showed me the following message:

Borland StarTeam detailed error message showing a Java exception

This is absolutely meaningless for anybody who is not a Java developer. And even for those who are, it doesn't really help, it only shows that there is a bug in the product where an internal array is not checked for size before accessing its first entry. It doesn't tell me whether the application will keep working or whether it will horribly crash when I click OK. It so happens that clicking OK made the message box disappear and the application kept working normally after that.

So, how can we improve on such a message? The best way in my opinion would be to display a message to the user explaining that an internal problem has occurred in the application and how they can report it to Borland. What is displayed in the details screen would be invaluable to the developers in order to correct the bug. So directions on how to send that information to the product team would make the user feel that the company cares about the qulity of their product and that they are ready to receive bug reports. Not all users would actually go through the steps of reporting the bug but some would.

Lack of information

Another error I was faced with this week came from the automated backup tool. This tool, which is installed on all laptops will continually backup changes in the user's folder to a central server. It runs in the background, is relatively unobtrusive and flexible enough that you can select sub-directories that you don't want to have backed up. However, it sometimes require your attention and will pop up a message. I was faced with this one yesterday:

Error message saying: Part-complete actions have been found due to an interruption during last sync. Click OK to complete there actions before scanning. Click Cancel to continue with the scan and ignore the part-completed actions

Contrary to the previous example, this message is in plain English and I can understand every word of it. However, it is worded in such a generic way that I have no idea whether I should click OK or Cancel. I don't know what part-completed actions it refers to and therefore don't know whether those actions should be completed or not. I suspect that they should because the default button is the OK one but I can't be sure. Also note that the word scan is misleading: it only makes sense to me because I have a fair idea of how the backup tool works but it could make the user think this is a message from the anti-virus system.

So, what could be done to make it more useful? First, the message should identify what application triggered it. It is important in this case because it is triggered by a background process and is completely unrelated to what the user was doing at the time. Then the obvious thing to do would be to detail what actions are pending and require attention. However, this may not be enough. If the actions' descriptions are too cryptic, that won't help the user make a decision. So the message should also ensure that the descriptions for the pending actions are clear and concise.

Conclusion

Usability is good but it should also go hand in hand with error handling. So next time you design a piece of software, make sure all the error messages that can potentially be displayed to the user are clear and include enough information, especially if the user is expected to take a decision based on the message. If the message is triggered by an unexpected error, consider giving the user a way to report the error so that you can correct the bug in a subsequent version.

No comments: