A Guide to DEBUG
The Microsoft® Windows .EXE DOS Stub Program
Copyright©2004 by Daniel B. Sedory
This page may be freely copied for PERSONAL use ONLY! (It may NOT be used for ANY other purpose unless you have first contacted and received permission from the author!)
In the early days of Microsoft® Windows, the Windows 1.x, 2.x and 3.xx OS not only existed in the same volumes as Microsoft® DOS, but also ran on top of an MS-DOS OS. It was not only possible, but very probable, that a user might attempt to run some of the Windows® programs under DOS. Therefore, Microsoft® programmers made sure that all Windows® programs would have a simple 16-bit DOS program placed at the front of each Windows executable, alerting the user that it was in fact a Windows® program and could not be run under DOS; and that's all the DOS "Stub" program does.
One of the simplest .EXE programs you can run under DEBUG is the so-called DOS "Stub" found inside many Windows® executables. Let's examine one of these in detail. If you open a copy of NOTEPAD.EXE inside a Hex editor (such as FRHED), it will appear similar to this:
Offset   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000000  4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00  MZ..........ÿÿ..
000010  b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00  ¸.......@.......
000020  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000030  00 00 00 00 00 00 00 00 00 00 00 00 d8 00 00 00  ............Ø...
000040  0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c CD 21 54 68  ..º..´.Í!¸.LÍ!Th
000050  69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F  is program canno
000060  74 20 62 65 20 72 75 6E 20 69 6E 20 44 4F 53 20  t be run in DOS 
000070  6D 6F 64 65 2E 0D 0D 0A 24 00 00 00 00 00 00 00  mode....$.......
TABLE 1.
Note the first two bytes, "4d 5a" or their ASCII equivalent: "MZ". Whenever the DOS EXEC function is called to examine a file (anytime you load an .EXE or .COM program into DEBUG 2.0+ for example) and it finds "MZ" as the first two bytes, that file will always be considered an .EXE executable! So, what happens if you enter: debug notepad.exe at the prompt in a DOS-box? Well, the first bytes you'll see when you do a dump command are:
CS:0000 0E 1F BA 0E 00 B4 09 CD-21 B8 01 4C CD 21 54 68 ........!..L.!Th
"Hey, I thought DEBUG always loaded files from the command-line at offset 0100?" Well, if it were a .COM program or any other kind of file, it would. But, in the case of .EXE files, that isn't true. The EXEC function will examine an .EXE file's header area, which among other things, determines the location of its first instruction (CS:IP) and also that of the Stack Pointer (SS:SP). In this case, the header told EXEC to load this code at offset zero and set the IP register to that location as well.
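To make the header fields mentioned above concrete, here is a minimal Python sketch that decodes the first 40h bytes shown in Table 1. The field names (e_cparhdr, e_ip, e_cs, e_ss, e_sp, e_lfanew and so on) are the conventional names for the DOS .EXE header words; they do not appear in DEBUG itself, and the helper parse_mz_header is only an illustration, not how EXEC is actually implemented.

```python
import struct

# First 40h bytes of NOTEPAD.EXE, copied from Table 1.
HEADER = bytes.fromhex(
    "4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00"
    "b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
    "00 00 00 00 00 00 00 00 00 00 00 00 d8 00 00 00"
)

# Conventional names for the 13 Words that follow the "MZ" signature.
FIELDS = (
    "e_cblp", "e_cp", "e_crlc", "e_cparhdr", "e_minalloc", "e_maxalloc",
    "e_ss", "e_sp", "e_csum", "e_ip", "e_cs", "e_lfarlc", "e_ovno",
)

def parse_mz_header(raw):
    """Decode the DOS .EXE header Words (all stored little-endian)."""
    magic, *words = struct.unpack_from("<2s13H", raw, 0)
    if magic != b"MZ":
        raise ValueError("not an MZ executable")
    hdr = dict(zip(FIELDS, words))
    # Double Word at offset 3Ch: file offset of the Windows (PE) header.
    (hdr["e_lfanew"],) = struct.unpack_from("<I", raw, 0x3C)
    return hdr

hdr = parse_mz_header(HEADER)
print("header size :", hdr["e_cparhdr"] * 16, "bytes")   # paragraphs of 16 bytes
print("initial CS:IP = %04X:%04X (relative to load segment)" % (hdr["e_cs"], hdr["e_ip"]))
print("initial SS:SP = %04X:%04X (relative to load segment)" % (hdr["e_ss"], hdr["e_sp"]))
print("PE header at offset", hdr["e_lfanew"])
```

The header size of 4 paragraphs (40h bytes), the entry point of 0000:0000 and the initial SP of 00B8 agree with what DEBUG shows in its Registers display.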
Before proceeding with DEBUG, we should mention that Windows® executables can be very complex programs. When we load NOTEPAD.EXE into DEBUG, its length is given as 50,448 bytes. We already told you that the actual size is 50,960 bytes. From Table 1 above, which shows the actual beginning of the program, you can see the first 40h bytes are not loaded into DEBUG; that's NOTEPAD's DOS .EXE header. But 50,448 plus 64 (40h) adds up to only 50,512 bytes, appearing to leave 448 bytes unaccounted for. The reason is that the DOS header contains different information about this file than its Windows® header does! We told you they were complex! This particular PE (Portable Executable) program header says the file has the following sections and sizes (in bytes): Stub 216, Header 1320, Image 49152, Overlay 272. Those numbers add up to a file size of 50,960 bytes. Yet the DOS header works out to: Header 28 (not the whole area), Relocations 0, Empty 36, Image 1104, Overlay 49792; which adds up to the same total. At some time in the future, we might create a few pages dealing with all this header information and how to interpret it.
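The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch: the three header Words are copied from Table 1, and the formula for the DOS image size (512-byte pages, with the last page holding only e_cblp bytes) is the conventional reading of those Words, which we assume is how the "Image 1104" figure was obtained.

```python
FILE_SIZE = 50960  # actual size of this copy of NOTEPAD.EXE, in bytes

# The two views of the same file, as quoted in the text.
pe_view = {"Stub": 216, "Header": 1320, "Image": 49152, "Overlay": 272}
dos_view = {"Header": 28, "Relocations": 0, "Empty": 36,
            "Image": 1104, "Overlay": 49792}

# DOS header Words from Table 1 (offsets 02h, 04h and 08h).
e_cblp, e_cp, e_cparhdr = 0x0090, 0x0003, 0x0004

header_bytes = e_cparhdr * 16                 # 4 paragraphs = 40h bytes
if e_cblp:
    dos_total = (e_cp - 1) * 512 + e_cblp     # last page is only partly used
else:
    dos_total = e_cp * 512                    # last page is completely used
dos_image = dos_total - header_bytes          # what DOS regards as the program
dos_overlay = FILE_SIZE - dos_total           # everything DOS ignores

debug_length = 0xC510                         # length DEBUG reports in CX
gap = FILE_SIZE - (debug_length + header_bytes)

print(sum(pe_view.values()), sum(dos_view.values()))
print(dos_image, dos_overlay, debug_length, gap)
```

Both views sum to 50,960 bytes, and the computed image and overlay sizes match the figures given for the DOS header.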
The following illustrations show exactly what happened when we stepped through our copy of the NOTEPAD program using the following DEBUG commands (Segment values on your computer will most likely vary from those shown here!):
C:\WINNT>debug notepad.exe
-r
First we enter the R command, to bring up the Registers display!
AX=0000 BX=0000 CX=C510 DX=0000 SP=00B8 BP=0000 SI=0000 DI=0000
DS=0B5C ES=0B5C SS=0B6C CS=0B6C IP=0000 NV UP EI PL NZ NA PO NC
0B6C:0000 0E            PUSH CS
Note the CX Register above. This tells us the executable portion of NOTEPAD has a length of C510h (or 50,448) bytes; at least that's how EXEC interpreted the DOS header. But this value cannot be trusted for a complete picture of Windows executables. The Data Segment (DS Register) is 0B5C, Code Segment (CS) is 0B6C and the Instruction Pointer (IP) is at 0000. Each time an instruction is executed, the IP value will change. This first instruction will push the value of the CS Register onto the Stack. After entering the Trace (-t) command, you should see the following:
AX=0000 BX=0000 CX=C510 DX=0000 SP=00B6 BP=0000 SI=0000 DI=0000
DS=0B5C ES=0B5C SS=0B6C CS=0B6C IP=0001 NV UP EI PL NZ NA PO NC
0B6C:0001 1F            POP DS
Before continuing, let's take a quick look at the Stack. You can see above that the Stack Pointer (SP) changed from 00B8 to 00B6. Stacks always fill up (push) and get depleted (pop) in much the same manner as a spring-loaded tray rack at a cafeteria. Once a memory location has been assigned to the first byte in a Stack, every byte added to the Stack will subtract one from the Stack Pointer (SP); that is, the Stack grows downward in memory. In this case, a Word (of two bytes) was added to our Stack. Since the Stack Segment (SS) is set to 0B6C, but our Data Segment is still at 0B5C, we'll do a Dump of b6c:00b6 to b8 here:
-d b6c:00b6 b8
0B6C:00B0                    6C 0B-00                                  l..
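The effect of that PUSH on the Stack can be modeled in a few lines of Python. This is only a sketch under simple assumptions: the Stack Segment is a plain 64 KiB bytearray, and the helper names push and pop are ours, not anything provided by DEBUG or DOS.

```python
def push(stack, sp, value):
    """Model a 16-bit PUSH: lower SP by two, then store the Word."""
    sp = (sp - 2) & 0xFFFF
    stack[sp] = value & 0xFF                        # low byte at the lower address
    stack[(sp + 1) & 0xFFFF] = (value >> 8) & 0xFF  # high byte right after it
    return sp

def pop(stack, sp):
    """Model a 16-bit POP: read the Word, then raise SP by two."""
    value = stack[sp] | (stack[(sp + 1) & 0xFFFF] << 8)
    return value, (sp + 2) & 0xFFFF

stack = bytearray(0x10000)          # one whole segment, initially all zero
sp = push(stack, 0x00B8, 0x0B6C)    # PUSH CS, with CS=0B6C and SP=00B8
print(f"SP={sp:04X}", stack[0xB6:0xB9].hex(" ").upper())  # SP=00B6 6C 0B 00

ds, sp = pop(stack, sp)             # POP DS
print(f"DS={ds:04X} SP={sp:04X}")   # DS=0B6C SP=00B8
```

The three bytes printed for offsets 00B6 through 00B8 are the same ones DEBUG shows in the Dump above.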
Note that values which contain more than one byte, such as this Word 0B6Ch, are always stored in Memory with the Least Significant Byte first! Let's carry out another Trace:
AX=0000 BX=0000 CX=C510 DX=0000 SP=00B8 BP=0000 SI=0000 DI=0000
DS=0B6C ES=0B5C SS=0B6C CS=0B6C IP=0002 NV UP EI PL NZ NA PO NC
0B6C:0002 BA0E00        MOV DX,000E
The POP instruction moved 0B6C from the Stack to the DS Register, and changed the SP Register back to 00B8. And now that the Data Segment has been changed to the same value as the Code Segment, we can do a Dump of Offset 000Eh (and following) to see why the program wants to load that value into the DX (Data) Register. Enter the command "d 0e 38" and you should see:
-d 0e 38
0B6C:0000                                            54 68                 Th
0B6C:0010  69 73 20 70 72 6F 67 72-61 6D 20 63 61 6E 6E 6F   is program canno
0B6C:0020  74 20 62 65 20 72 75 6E-20 69 6E 20 44 4F 53 20   t be run in DOS 
0B6C:0030  6D 6F 64 65 2E 0D 0D 0A-24                        mode....$
We already knew that the string data would end with a "$" sign, so we went ahead and used offset 38h as the last location for the Dump command. These are the ASCII bytes and the characters they represent (shown on the right side of the display). Although many non-displayable bytes are shown as 'dots' in the ASCII part of DEBUG's Dump display, the "2Eh" byte (the first dot after "mode") is the real ASCII value for a period (punctuation character). The three dots that follow it stand for the non-displayable characters 0Dh, 0Dh and 0Ah; 0Dh is a Carriage Return and 0Ah is a Line Feed. We'll comment on the 24h byte below. Yet another Trace (-t) command gives us:
AX=0000 BX=0000 CX=C510 DX=000E SP=00B8 BP=0000 SI=0000 DI=0000
DS=0B6C ES=0B5C SS=0B6C CS=0B6C IP=0005 NV UP EI PL NZ NA PO NC
0B6C:0005 B409          MOV AH,09
-t
Before you carry out the next instruction, you need some information: INT 21h is the Interrupt through which DOS Functions are called; in this case, Function 09h (because AH=09). You should never use the Trace command on Interrupts! (Unless you really do want to attempt stepping through all of the MS-DOS code that comprises one.) Use the Proceed (-p) command instead. Basically, Function 09 of INT 21 will print out a string of characters (starting at the address pointed to by the DS:DX registers) until it encounters a 24h ("$") byte, which is not printed itself. After entering the Proceed command, you should see the string displayed on your screen as follows:
AX=0900 BX=0000 CX=C510 DX=000E SP=00B8 BP=0000 SI=0000 DI=0000
DS=0B6C ES=0B5C SS=0B6C CS=0B6C IP=0007 NV UP EI PL NZ NA PO NC
0B6C:0007 CD21          INT 21
-p
This program cannot be run in DOS mode.

AX=0924 BX=0000 CX=C510 DX=000E SP=00B8 BP=0000 SI=0000 DI=0000
DS=0B6C ES=0B5C SS=0B6C CS=0B6C IP=0009 NV UP EI PL NZ NA PO NC
0B6C:0009 B8014C        MOV AX,4C01
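What Function 09h did with DS:DX can be sketched in Python as a scan for the terminating "$" byte. This is only an illustration: the function name dos_print_string is ours, the SEGMENT bytes are rebuilt from the Dump of the stub shown earlier, and real DOS writes the characters to standard output rather than returning them.

```python
# The stub as it sits at offset 0 of the segment: 14 code bytes, then the string.
SEGMENT = (
    bytes.fromhex("0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21")
    + b"This program cannot be run in DOS mode.\r\r\n$"
)

def dos_print_string(segment, dx):
    """Return the text from offset dx up to, but not including, the '$' byte."""
    end = segment.index(b"$", dx)   # raises ValueError if no terminator exists
    return segment[dx:end].decode("ascii"), end

text, end = dos_print_string(SEGMENT, 0x000E)   # DX=000E, as set by MOV DX,000E
print(repr(text))
print(f"terminator found at offset {end:02X}h")
```

The terminator turns up at offset 38h, which is why 38h was a safe end address for the earlier Dump command.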
This is yet another DOS Interrupt (INT 21h) in the making... Function 4Ch (AH=4C) is the standard "Exit" (Terminate) code with Return (AL=return value; 01 in this case). By now, you should see that it's very important to obtain a list of all the Interrupts! Look for the link to Ralf Brown's (Free) Interrupt Listing on our Assembly page.
-t
AX=4C01 BX=0000 CX=C510 DX=000E SP=00B8 BP=0000 SI=0000 DI=0000
DS=0B6C ES=0B5C SS=0B6C CS=0B6C IP=000C NV UP EI PL NZ NA PO NC
0B6C:000C CD21          INT 21
-p
Program terminated normally
-q
As you can see, DEBUG reported "Program terminated normally" and we Quit the DEBUG session.
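The whole stub is only seven instructions long, so the steps traced above can be reproduced with a tiny table-driven decoder in Python. This is a sketch, not a real disassembler: the OPCODES table is our own and covers only the six opcodes that occur in these 14 bytes.

```python
# The 14 code bytes of the stub (offsets 0000 through 000D).
CODE = bytes.fromhex("0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21")

# opcode -> (text template, number of operand bytes that follow)
OPCODES = {
    0x0E: ("PUSH CS", 0),
    0x1F: ("POP DS", 0),
    0xB4: ("MOV AH,{:02X}", 1),
    0xB8: ("MOV AX,{:04X}", 2),
    0xBA: ("MOV DX,{:04X}", 2),
    0xCD: ("INT {:02X}", 1),
}

def disassemble(code):
    """Return a list of (offset, instruction text) pairs for the stub's code."""
    listing = []
    ip = 0
    while ip < len(code):
        template, size = OPCODES[code[ip]]   # KeyError on any other opcode
        # Operands are stored least significant byte first.
        operand = int.from_bytes(code[ip + 1:ip + 1 + size], "little")
        listing.append((ip, template.format(operand)))
        ip += 1 + size
    return listing

for offset, text in disassemble(CODE):
    print(f"{offset:04X}  {text}")
```

The offsets it prints (0000, 0001, 0002, 0005, 0007, 0009 and 000C) are the same IP values seen in the Registers displays during the session.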
There are variations of the "DOS Stub" program in existence. Basically they depend upon which software company made the compiler or linker that was used to create a Windows® program. For example, the string displayed by a program which was built with Borland's tlink32 linker should state: "This program must be run under Win32." when run under a real 16-bit DOS or in DEBUG.