Using GDB to Diagnose a Crash
#1 Mar 8, 2010 3:53 am
Last edited Mar 8, 2010 3:53 am by Samson
Black Hand
Group: Administrators
Posts: 3,722
Joined: Jan 1, 2002
Ok. So you're fiddling around one day with the latest and greatest nifty new feature for your MUD. You've labored for hours adding the code, playing with it to get things just right. You've compiled it, and GCC didn't raise any complaints. You're home free... except... wait. What the hell does "Segmentation fault (core dumped)" mean, and why won't the MUD boot?
Chances are at some point in your coding career you'll be greeted with this dreadful scenario. All of us have been there at one time or another. All of us know what it feels like to scratch your head wondering what happened.
My background in coding is primarily with Smaug MUDs, and specifically with the AFKMud project. I've had my fair share of things go wrong over the years and I'm no stranger to core dumps. I also find it's best to cover these things with real examples, so I'll share one I just caused in my own code today.
For GDB to give you meaningful information, your MUD must be compiled with debug information. This is generally done with GCC's -g option. I tend to stick with -g2 or better. You'll usually find it on one of the flag lines in your Makefile.
We're in the process of moving AFKMud to use C++ code, and some of you may be aware of pitfalls involved. I just got done making descriptors into a class and am still shaking things down. Lo and behold, I reboot, run a command, and am greeted with:
[samson@boralis: ~/Alsherok/src] Segmentation fault (core dumped)
Uh oh, looks like I fubared something. The first thing you need to do when a core dump happens is determine where your core file is. With Smaug, the core file will usually end up in your area directory. So you'll need to go there. Change into your area directory, and you should type something like:
gdb -c core ../src/smaug
In my case, AFKMud moves the core to the same directory as the source code, so I would do this:
[samson@boralis: ~/Alsherok/src] gdb -c core afkmud
Upon doing so, I am greeted by a whole bunch of output:
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Core was generated by `../src/afkmud 9500'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /usr/lib/libstdc++.so.5...done.
Loaded symbols for /usr/lib/libstdc++.so.5
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_nisplus.so.2...done.
Loaded symbols for /lib/libnss_nisplus.so.2
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
#0 0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
158 if( !mccp->out_compress )
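Wow. Ok. So it's basically told me why the program died (signal 11, Segmentation fault) and that it loaded the symbols for every library the MUD uses. What does all that mean? Generally not a great deal. The "Reading symbols" and "Loaded symbols" lines above #0 are system libraries you won't need to worry about. It's the stuff from #0 onward that you need to pay attention to: the function the MUD was in, the file and line number, and the source line itself.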
So now you have a general idea of what caused the problem. Something in the compressEnd() function did something it wasn't supposed to do. On its own, though, this is generally not enough information to go on. You probably want to know what led up to the problem, so you'll want to trace the chain of calls that got you here. Fortunately GDB makes that easy with the bt, or backtrace, command.
(gdb) bt
#0 0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
#1 0x080f2634 in ~descriptor_data (this=0x8a5a2f0) at descriptor.c:136
#2 0x0814ece7 in rent_adjust_pfile(char*) (argument=0x8a573df "******") at rent.c:1542
#3 0x0814fcc9 in rent_update() () at rent.c:2108
#4 0x0814530f in do_pfiles (ch=0x8a0b570, argument=0xbfffbf8c "tar -cf ../player/pfiles.tar ../player/*") at pfiles.c:550
#5 0x08113e61 in interpret(char_data*, char*) (ch=0x8a0b570, argument=0xbfffd9f6 "") at interp.c:907
#6 0x080dc9e9 in game_loop() () at comm.c:785
#7 0x080dd651 in main (argc=2, argv=0x8a0bcf8) at comm.c:1233
#8 0x40154758 in __libc_start_main () from /lib/tls/libc.so.6
Current language: auto; currently c++
The backtrace is listed in reverse order of the calls: frame #0 is the function the MUD was in when it crashed, and each higher-numbered frame is the function that called the one above it, all the way back to main(). Read from the bottom up, execution began in main() and ended in compressEnd(). So why did it crash? You find that out by entering the stack "frames", one per function call, and asking GDB what certain things were at the time. So in this case, we'll check frame 1, which is in descriptor.c on line 136:
(gdb) frame 1
#1 0x080f2634 in ~descriptor_data (this=0x8a5a2f0) at descriptor.c:136
136 compressEnd( );
You see here the call to compressEnd(). Ok, that's not enough info yet.
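If the frame numbering still seems backwards, a toy model may help. The sketch below is not how GDB works internally, and every name in it (g_calls, call_marker, render_backtrace, run_main and the stand-in functions) is made up for illustration. It only records function names on entry and prints them the way bt numbers them, innermost first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy call-stack recorder. All names are hypothetical; this only mimics
// the numbering of "bt" output, it does not inspect the real stack.
static std::vector<std::string> g_calls;   // oldest call first

// RAII helper: records a function on entry and forgets it on exit.
struct call_marker
{
   explicit call_marker( const std::string & name ) { g_calls.push_back( name ); }
   ~call_marker()
   {
      assert( !g_calls.empty() );
      g_calls.pop_back();
   }
};

// Render the recorded calls the way "bt" numbers them: #0 is the innermost
// (most recent) function, the highest number is the outermost caller.
std::string render_backtrace()
{
   std::string out;
   int frame = 0;
   for( auto it = g_calls.rbegin(); it != g_calls.rend(); ++it )
      out += "#" + std::to_string( frame++ ) + " " + *it + "\n";
   return out;
}

// Stand-ins for a shortened version of the real call chain. The innermost
// function takes the snapshot, as if the crash had happened there.
std::string compress_end() { call_marker m( "compressEnd" ); return render_backtrace(); }
std::string destructor()   { call_marker m( "~descriptor_data" ); return compress_end(); }
std::string game_loop()    { call_marker m( "game_loop" ); return destructor(); }
std::string run_main()     { call_marker m( "main" ); return game_loop(); }
```

Calling run_main() returns the four names with compressEnd as #0 and main as #3, the same orientation as the real backtrace above.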
Let's look at the call that killed it, in frame 0:
(gdb) frame 0
#0 0x080fafa5 in descriptor_data::compressEnd() (this=0x8a5a2f0) at features.c:158
158 if( !mccp->out_compress )
Aha, this is a hint - something in features.c on line 158 is amiss. Start checking this line methodically. Begin by asking it what "mccp" was equal to at the time:
(gdb) print mccp
$1 = (mccp_data *) 0x0
This tells you that the "mccp" pointer was NULL at the time. 0x0 stands for NULL, basically the absence of any data. Nothing, zero, zilch, etc. In this particular case, it means the structure which holds this person's mccp_data doesn't exist. It hasn't been initialized. Line 158 reads mccp->out_compress, and reading a member through a NULL pointer like that is what triggered the segmentation fault.
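To make the failure mode concrete, here is a minimal sketch of checking a pointer before reading through it. The mccp_data below is a hypothetical stand-in that models only the out_compress field from the crashing line, and has_out_compress is a name invented for this example, not something from the AFKMud source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real mccp_data. Only the field that
// appears on the crashing line is modelled here.
struct mccp_data
{
   void *out_compress;
};

// Returns true only when compression state exists and has an output
// stream attached. The order of the tests matters: && does not evaluate
// its right-hand side when the left-hand side is false, so
// mccp->out_compress is never read through a NULL pointer.
bool has_out_compress( const mccp_data *mccp )
{
   return mccp != NULL && mccp->out_compress != NULL;
}
```

Passing NULL to has_out_compress() simply returns false, where the unguarded `!mccp->out_compress` on line 158 crashed.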
Now that you know what happened, let's exit GDB.
(gdb) quit
You should return to a shell prompt. It's time to go fix your bug and try again.
Hopefully this article has proven useful. There are more advanced things you can do with GDB, but this should cover the basics of investigating a crash after the fact.
For reference, this is the code which crashed:
descriptor_data::~descriptor_data()
{
   close( descriptor );
   DISPOSE( host );
   delete [] outbuf;
   DISPOSE( pagebuf );
   STRFREE( client );
   compressEnd( );
   DISPOSE( mccp );
}
And this is what fixes it:
descriptor_data::~descriptor_data()
{
   close( descriptor );
   DISPOSE( host );
   delete [] outbuf;
   DISPOSE( pagebuf );
   STRFREE( client );
   if( mccp != NULL )
      compressEnd( );
   DISPOSE( mccp );
}
Note that the second version verifies mccp isn't NULL before ending compression.
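The fix above guards one call site. Another option, shown here only as a sketch and not as what AFKMud actually does, is to put the check inside compressEnd() itself so every caller is protected. The class below is a stripped-down, hypothetical stand-in: the real descriptor_data has many more members, the real cleanup uses the DISPOSE macro rather than delete, and enable_mccp() and the bool return value are invented for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified compression state.
struct mccp_data
{
   bool out_compress;
};

// Hypothetical, stripped-down descriptor with the NULL check moved
// into compressEnd() instead of the destructor.
class descriptor_data
{
 public:
   descriptor_data() : mccp( NULL ) {}

   ~descriptor_data()
   {
      compressEnd();   // safe even if compression was never started
      delete mccp;     // deleting a NULL pointer is a no-op
   }

   // Invented helper: allocate the compression state on demand.
   void enable_mccp()
   {
      if( mccp == NULL )
      {
         mccp = new mccp_data;
         mccp->out_compress = true;
      }
   }

   // Returns true if compression was actually shut down.
   bool compressEnd()
   {
      if( mccp == NULL || !mccp->out_compress )
         return false;
      mccp->out_compress = false;
      return true;
   }

 private:
   mccp_data *mccp;

   // Copying would double-delete mccp, so it is disabled.
   descriptor_data( const descriptor_data & );
   descriptor_data &operator=( const descriptor_data & );
};
```

Which approach is better depends on the codebase. A check at the call site keeps compressEnd() strict about its preconditions, while a check inside it means a forgotten guard elsewhere can't bring the MUD down.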
For a much more in-depth article on the use of GDB which also covers things like setting breakpoints and debugging while the game is running, check out Nick Gammon's GDB Guide.