Buffer Overflows: How to Find and Fix Them

This little write-up is about how to recognize, diagnose and fix memory bounds violations (otherwise known as the dreaded buffer overflow, BO or BOF). First, how to recognize the problem. When a portion of your program suddenly starts to show bugs or crash and you didn't make any changes to it that you feel could possibly be the cause, this is a sign. If you put in debug statements or make small changes to the program (anywhere) and the problem goes away as mysteriously as it arose, this is a sign. If the program runs fine (i.e., does not crash) but your output data is suddenly corrupted, this is a sign. If the problem is present in the release version (optimized, without debug info) but not in the debug version, this is a sign. If you cannot reproduce the problem when stepping through the code with your debugger, this is a sign.

A little explanation of why you see these effects: when you recompile changed code, you potentially shift around the layout of your code and data. The compiler inserts padding to keep data aligned so the CPU can load it efficiently into cache and registers (and, in debug mode, may add extra data to track program state), so which variable ends up next to the overrun buffer, and therefore what gets overwritten, is unpredictable unless you want to study how your compiler works.

String/character array buffer overflows often 'trash the stack', resulting in a program crash. However, an astute hacker/cracker may be able to take advantage of this stack trashing to take control of your program and execute arbitrary commands (a very BAD THING). Less commonly seen are bounds violations on other types of data. These typically involve arrays, though low-level routines like memset can also trash the memory surrounding a single instance of a data type. The end result is exactly the same: corruption of the memory outside the allocated bounds of the object.

The usual reasons for running off the end of your block of memory are miscounting the number of elements in your array (the pernicious off-by-one error, so common because array indices are zero-based and programmers count things one-based) or using functions/methods that cannot detect bounds violations, like scanf (if improperly used) and gets (which, btw, should NEVER be used! It CANNOT be made safe). Overflows are most common, though it is possible to underflow as well. This shifts where the errors are going to manifest: an underflow corrupts variables allocated just before the start address of the variable where the bounds violation occurs, while an overflow hammers variables allocated just after its end address.

Things get more complex to diagnose if you are using (and trashing) dynamic memory, as you need to be a very astute debugger to know where your variables are actually stored (if you are that experienced, you won't be reading this article, because you will have made this mistake so many times that fixing it has become a reflex). You can also corrupt the bookkeeping data that malloc/new use to track allocated memory; this generally happens with underflows, though massive overflows can do the same thing. When that happens you generally get a program crash, and looking at the core dump will give you all sorts of wild and apparently meaningless errors.
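
To make the two usual culprits concrete, here is a minimal C sketch, not a definitive implementation. The names `read_line`, `clear_ints` and `NAME_LEN` are made up for illustration. It shows a line reader built on fgets, which (unlike gets) is told how big the buffer is, and a loop whose bound stops at the last valid index.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16  /* hypothetical buffer size used in this sketch */

/* Read one line into buf: at most size-1 characters plus the terminating NUL.
 * Returns 1 on success, 0 on end-of-file or read error.
 * Anything that does not fit stays in the stream for the next read. */
int read_line(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';  /* strip the newline, if one was read */
    return 1;
}

/* Zero every element of an int array.
 * The classic off-by-one bug is writing `i <= count`, which touches
 * arr[count], one element past the end; valid indices are 0..count-1. */
void clear_ints(int *arr, size_t count)
{
    assert(arr != NULL);
    for (size_t i = 0; i < count; i++)
        arr[i] = 0;
}
```

A call such as `read_line(stdin, name, sizeof name)` with `char name[NAME_LEN]` can never write past the end of `name`, however long the input line is. If you must use scanf for strings, give it a field width one less than the buffer size, e.g. `scanf("%15s", name)` for a 16-byte buffer.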

The best way to look for bounds violations (other than getting someone else to review your code) is to set the variable that is getting trashed to a known value, then check its contents after every single statement in a debugger (yes, very tedious, but that is why they pay us the big bucks: to fix the problems we created in the first place). When you find the statement that causes the memory to change, you have found your overflow. Fix it and your problem will likely go away. A word of caution: this is a very common mistake, even amongst experienced programmers. If you have one, you almost certainly have several, so you need to look at every place where you write data into buffers. The vast majority of the offences are calls to gets and scanf/sscanf, since they do not take a buffer size as an argument (by default, in the case of scanf), and array writes that are off by one. There are some tools that can help you, but it builds character to track the bugs down and fix them yourself. I have, a couple of times, just thrown out code I was working on and started all over with a clean slate because I knew that somewhere I had made this mistake. Tracing it can be a nightmare, and even if you use tools they can bury you in too much information or miss the problem entirely.
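
The set-a-known-value-and-check technique can also be written into the code itself. The following is a rough sketch, with `watch_check` and `WATCH` as hypothetical names invented for this example: take a snapshot of the variable that keeps getting trashed, then sprinkle checks between the suspect statements. The first check that fails points at the statement just before it. (Many debuggers can automate the same idea with a watchpoint, for example the `watch` command in GDB.)

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compare a watched object against a snapshot of its known-good bytes.
 * Returns 1 if it is unchanged, 0 if it has changed. On a change, the
 * location of the check is reported so you can look at the statement
 * that ran just before it. */
int watch_check(const void *obj, const void *snapshot, size_t size,
                const char *file, int line)
{
    assert(obj != NULL && snapshot != NULL);
    if (memcmp(obj, snapshot, size) == 0)
        return 1;
    fprintf(stderr, "%s:%d: watched variable changed\n", file, line);
    return 0;
}

/* Convenience wrapper that fills in the size and current source location. */
#define WATCH(obj, snap) \
    watch_check(&(obj), &(snap), sizeof(obj), __FILE__, __LINE__)

/* Intended use (the function names here are placeholders):
 *
 *     int total = 42, total_snap = total;
 *     suspect_step_one();  WATCH(total, total_snap);
 *     suspect_step_two();  WATCH(total, total_snap);
 */
```

Keep in mind that adding checks changes the memory layout, so the symptom may move or hide; if it does, that in itself is one more hint that you are dealing with a bounds violation.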

As a side note: in an effort to plug holes from poor programming (mostly buffer overflows), a lot of compilers now make use of so-called 'canary' values (named after the canary in a coal mine, which warned miners of dangerous gas). A canary is a known value placed on the stack next to the data a function must protect, typically between its local buffers and its saved return address. If the canary has been stepped on by the time the function returns, the runtime presumes that the stack has been tampered with and stops the program; depending on the compiler and platform, this shows up as an abort or as an exception. Since this was a relatively new concept at the time of writing (a few years or so in production), there is a chance that the compiler has generated bad code, but if you suspect you have a buffer overflow and stepped on the stack yourself, you are 99.99% likely to be correct.
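
The canary idea is easy to imitate by hand, which can help when hunting an overflow in one particular buffer. What follows is only an illustration under stated assumptions: `guarded_buf`, `guarded_init`, `guarded_check`, `CANARY` and `BUF_SIZE` are invented for this sketch, and a real compiler's stack protection (for example GCC's `-fstack-protector` option) works differently in its details, typically using a value chosen at run time rather than a fixed constant.

```c
#include <assert.h>
#include <string.h>

#define CANARY   0xDEADC0DEu  /* arbitrary known value for this sketch */
#define BUF_SIZE 16           /* hypothetical buffer size */

/* A buffer with a known value on each side of it. */
struct guarded_buf {
    unsigned int front;   /* stepped on by an underflow of data */
    char data[BUF_SIZE];
    unsigned int back;    /* stepped on by an overflow of data */
};

/* Set both canaries and clear the buffer. */
void guarded_init(struct guarded_buf *g)
{
    assert(g != NULL);
    g->front = CANARY;
    memset(g->data, 0, sizeof g->data);
    g->back = CANARY;
}

/* Returns 0 if both canaries are intact, -1 if the front canary was
 * changed (underflow), 1 if the back canary was changed (overflow). */
int guarded_check(const struct guarded_buf *g)
{
    assert(g != NULL);
    if (g->front != CANARY)
        return -1;
    if (g->back != CANARY)
        return 1;
    return 0;
}
```

Call `guarded_check` after each piece of code that writes to `data`; a nonzero result tells you both that the buffer was overrun and in which direction.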

Keith (mitakeet) Oxenrider
March 30, 2005
koxenrider[at]sol[dash]biotech[dot]com
http://sol-biotech.com/code/BufferOverflows.html