A Guide to DEBUG
The Microsoft® Windows™ .EXE
DOS Stub Program

Copyright©2004 by Daniel B. Sedory

This page may be freely copied for PERSONAL use ONLY !
( It may NOT be used for ANY other purpose unless you have
first contacted and received permission from the author ! )




Why is there a DOS Stub Program
in a Windows™ Executable?

In the early days of Microsoft® Windows, versions 1.x, 2.x and 3.xx not only existed in the same volumes as Microsoft® DOS, but also ran on top of MS-DOS. It was not only possible, but quite likely, that a user would attempt to run a Windows® program under DOS. Microsoft® programmers therefore made sure that every Windows executable had a simple 16-bit DOS program placed at its front, alerting the user that it was in fact a Windows® program and could not be run under DOS; and that's all the DOS "Stub" program does.

All the Details of the DOS "Stub" Program

One of the simplest .EXE programs you can run under DEBUG is the so-called DOS "Stub" found inside many Windows® executables. Let's examine one of these in detail. If you open a copy of NOTEPAD.EXE inside a Hex editor (such as FRHED), it will appear similar to this:

Offset   0  1  2  3  4  5  6  7   8  9  A  B  C  D  E  F
 
000000  4d 5a 90 00 03 00 00 00  04 00 00 00 ff ff 00 00  MZ..........ÿÿ..
000010  b8 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  ¸.......@.......
000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
000030  00 00 00 00 00 00 00 00  00 00 00 00 d8 00 00 00  ............Ø...
000040  0e 1f ba 0e 00 b4 09 cd  21 b8 01 4c CD 21 54 68  ..º..´.Í!¸.LÍ!Th
000050  69 73 20 70 72 6F 67 72  61 6D 20 63 61 6E 6E 6F  is program canno
000060  74 20 62 65 20 72 75 6E  20 69 6E 20 44 4F 53 20  t be run in DOS 
000070  6D 6F 64 65 2E 0D 0D 0A  24 00 00 00 00 00 00 00  mode....$.......
( The beginning of NOTEPAD.EXE from Windows™ 2000; 12/7/1999, 5:00AM, 50,960 bytes.)
TABLE 1.

Note the first two bytes, "4d 5a" or their ASCII equivalent: "MZ". Whenever the DOS EXEC function is called to examine a file (anytime you load an .EXE or .COM program into DEBUG 2.0+ for example) and it finds "MZ" as the first two bytes, that file will always be considered an .EXE executable! So, what happens if you enter: debug notepad.exe at the prompt in a DOS-box? Well, the first bytes you'll see when you do a dump command are:

CS:0000  0E 1F BA 0E 00 B4 09 CD-21 B8 01 4C CD 21 54 68  ........!..L.!Th

"Hey, I thought DEBUG always loaded files from the command-line at offset 0100?" Well, if it were a .COM program or any other kind of file, it would. But, in the case of .EXE files, that isn't true. The EXEC function will examine an .EXE file's header area, which among other things, determines the location of its first instruction (CS:IP) and also that of the Stack Pointer (SS:SP). In this case, the header told EXEC to load this code at offset zero and set the IP register to that location as well.
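As a rough illustration (in Python rather than DEBUG), here is a minimal sketch of how those header fields can be read from the 64 bytes shown in Table 1. The field order follows the standard MZ header layout; the function name parse_mz_header and the dictionary keys are our own, chosen only for this example.

```python
import struct

# The first 64 bytes of NOTEPAD.EXE, copied from Table 1.
HEADER = bytes.fromhex(
    "4d 5a 90 00 03 00 00 00 04 00 00 00 ff ff 00 00 "
    "b8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "00 00 00 00 00 00 00 00 00 00 00 00 d8 00 00 00"
)

def parse_mz_header(data):
    """Return the DOS header fields that matter for loading (illustrative helper)."""
    # The first 28 bytes: a 2-byte signature followed by 13 little-endian Words.
    (magic, last_page_bytes, pages, relocations, header_paragraphs,
     min_alloc, max_alloc, ss, sp, checksum, ip, cs,
     reloc_table, overlay_no) = struct.unpack_from("<2s13H", data, 0)
    if magic != b"MZ":
        raise ValueError("not an .EXE file: missing 'MZ' signature")
    return {
        "header_bytes": header_paragraphs * 16,  # one paragraph = 16 bytes
        "cs": cs, "ip": ip,                      # entry point, relative to the load segment
        "ss": ss, "sp": sp,                      # initial stack, relative to the load segment
    }

fields = parse_mz_header(HEADER)
print("Header size : %d bytes" % fields["header_bytes"])
print("Entry CS:IP : +%04X:%04X" % (fields["cs"], fields["ip"]))
print("Stack SS:SP : +%04X:%04X" % (fields["ss"], fields["sp"]))
```

The CS and SS values in the header are relative to the segment where the program is loaded, which is why both are zero here while DEBUG will report an actual segment value that differs from machine to machine.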

Before proceeding with DEBUG, we should mention that Windows® executables can be very complex programs. When we load NOTEPAD.EXE into DEBUG, its length is given as 50,448 bytes. We already told you that the actual size is 50,960 bytes. From Table 1 above, which shows the actual beginning of the program, you can see the first 40h bytes are not loaded into DEBUG; that's NOTEPAD's DOS .EXE header. But 50,448 plus 64 (40h) adds up to only 50,512 bytes, which appears to leave 448 bytes unaccounted for. The reason is that the DOS header contains different information about this file than its Windows® header does! We told you they were complex! This particular PE (Portable Executable) program header says the file has the following sections and sizes: Stub 216, Header 1320, Image 49152, Overlay 272. Those numbers add up to a file size of 50,960 bytes. Yet the DOS header works out to: Header 28 (not the whole area), Relocations 0, Empty 36, Image 1104, Overlay 49792; which adds up to the same total. At some time in the future, we might create a few pages dealing with all this header information and how to interpret it.
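The arithmetic above can be checked with a short Python sketch. The two dictionaries simply restate the sizes quoted in the text; the three header values are read from Table 1 (offsets 02h, 04h and 08h), and the page rule used is the usual MZ convention of 512-byte pages with a partial last page. The names are ours, for illustration only.

```python
FILE_SIZE = 50_960  # actual size of NOTEPAD.EXE on disk

# The two breakdowns quoted in the text.
pe_view = {"Stub": 216, "Header": 1320, "Image": 49152, "Overlay": 272}
dos_view = {"Header": 28, "Relocations": 0, "Empty": 36,
            "Image": 1104, "Overlay": 49792}

# DOS header Words from Table 1 (stored least significant byte first).
LAST_PAGE_BYTES = 0x0090    # offset 02h: bytes used in the last 512-byte page
PAGES = 0x0003              # offset 04h: number of 512-byte pages
HEADER_PARAGRAPHS = 0x0004  # offset 08h: header size in 16-byte paragraphs

def dos_module_size(pages, last_page_bytes):
    """Bytes of the file that the DOS header claims as header plus image."""
    if last_page_bytes == 0:      # zero means the last page is completely full
        return pages * 512
    return (pages - 1) * 512 + last_page_bytes

module = dos_module_size(PAGES, LAST_PAGE_BYTES)
header = HEADER_PARAGRAPHS * 16
image = module - header
overlay = FILE_SIZE - module

print("PE view total :", sum(pe_view.values()))
print("DOS view total:", sum(dos_view.values()))
print("DOS header=%d image=%d overlay=%d" % (header, image, overlay))
```

Everything past the 1,168 bytes claimed by the DOS header, including the whole Windows® part of the program, is just "overlay" as far as DOS is concerned.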

Stepping Through a DOS Stub with DEBUG

The following illustrations show exactly what happened when we stepped through our copy of the NOTEPAD program using the following DEBUG commands (Segment values on your computer will most likely vary from those shown here!):

C:\WINNT>debug notepad.exe
-r

First we enter the R command, to bring up the Registers display!

AX=0000  BX=0000  CX=C510  DX=0000  SP=00B8  BP=0000  SI=0000  DI=0000
DS=0B5C  ES=0B5C  SS=0B6C  CS=0B6C  IP=0000   NV UP EI PL NZ NA PO NC
0B6C:0000 0E            PUSH    CS

Note the CX Register above. This tells us the executable portion of NOTEPAD has a length of C510h (or 50,448) bytes; at least that's how EXEC interpreted the DOS header. But this value cannot be trusted for a complete picture of Windows executables. The Data Segment (DS Register) is 0B5C, Code Segment (CS) is 0B6C and the Instruction Pointer (IP) is at 0000. Each time an instruction is executed, the IP value will change. This first instruction will push the value of the CS Register onto the Stack. After entering the Trace (-t) command, you should see the following:

AX=0000  BX=0000  CX=C510  DX=0000  SP=00B6  BP=0000  SI=0000  DI=0000
DS=0B5C  ES=0B5C  SS=0B6C  CS=0B6C  IP=0001   NV UP EI PL NZ NA PO NC
0B6C:0001 1F            POP     DS

Before continuing, let's take a quick look at the Stack. You can see above that the Stack Pointer (SP) changed from 00B8 to 00B6. Stacks always fill-up (push) and get depleted (pop) in much the same manner as a spring-loaded tray rack at a cafeteria. Once a memory location has been assigned to the first byte in a Stack, every byte added to the Stack will subtract one from the Stack Pointer (SP). In this case, a Word (of two bytes) was added to our Stack. Since the Stack Segment (SS) is set to 0B6C, but our Data Segment is still at 0B5C, we'll do a Dump of b6c:00b6 to b8 here:

-d b6c:00b6 b8
0B6C:00B0                    6C 0B-00                              l..
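For readers who prefer to see the mechanics spelled out, here is a minimal Python sketch of what PUSH CS just did (and what the matching POP will do). It models one 64 KiB segment as a bytearray; push_word and pop_word are hypothetical helpers written for this page, not part of DEBUG or DOS.

```python
def push_word(memory, sp, value):
    """Store a 16-bit value on the stack; return the new SP."""
    sp = (sp - 2) & 0xFFFF                         # the stack grows downward
    memory[sp] = value & 0xFF                      # low byte at the lower address
    memory[(sp + 1) & 0xFFFF] = (value >> 8) & 0xFF
    return sp

def pop_word(memory, sp):
    """Read a 16-bit value from the stack; return (value, new SP)."""
    value = memory[sp] | (memory[(sp + 1) & 0xFFFF] << 8)
    return value, (sp + 2) & 0xFFFF

stack_segment = bytearray(0x10000)  # stands in for segment 0B6C
sp = 0x00B8                         # initial SP from the register display

sp = push_word(stack_segment, sp, 0x0B6C)          # PUSH CS
dump = " ".join("%02X" % b for b in stack_segment[sp:sp + 2])
print("SP=%04X  bytes at SS:SP: %s" % (sp, dump))  # SP=00B6  bytes at SS:SP: 6C 0B

value, sp = pop_word(stack_segment, sp)            # a matching POP
print("popped %04X, SP=%04X" % (value, sp))        # popped 0B6C, SP=00B8
```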

Note that values which contain more than one byte, such as this Word 0B6Ch, are always stored in Memory with the Least Significant Byte first! Let's carry out another Trace:

AX=0000  BX=0000  CX=C510  DX=0000  SP=00B8  BP=0000  SI=0000  DI=0000
DS=0B6C  ES=0B5C  SS=0B6C  CS=0B6C  IP=0002   NV UP EI PL NZ NA PO NC
0B6C:0002 BA0E00        MOV     DX,000E

The POP instruction moved 0B6C from the Stack to the DS Register, and changed the SP Register back to 00B8. And now that the Data Segment has been changed to the same value as the Code Segment, we can do a Dump of Offset 000Eh (and following) to see why the program wants to load that value into the DX (Data) Register. Enter the command "d 0e 38" and you should see:

-d 0e 38
0B6C:0000                                            54 68                 Th
0B6C:0010  69 73 20 70 72 6F 67 72-61 6D 20 63 61 6E 6E 6F   is program canno
0B6C:0020  74 20 62 65 20 72 75 6E-20 69 6E 20 44 4F 53 20   t be run in DOS
0B6C:0030  6D 6F 64 65 2E 0D 0D 0A-24                        mode....$

We already knew that the string data would end with a "$" sign, so we went ahead and used offset 38h as the last location for the Dump command. These are the ASCII bytes and the characters they represent (shown on the right side of the display). Although many non-displayable bytes are shown as 'dots' in the ASCII part of DEBUG's Dump display, the "2Eh" byte (the first of the four dots after "mode") is the real ASCII value for a period (punctuation character). The three 'dots' that follow it stand for the non-displayable characters 0Dh and 0Ah, which are a Carriage Return and a Line Feed, respectively. We'll comment on the 24h byte below. Yet another Trace (-t) command gives us:

AX=0000  BX=0000  CX=C510  DX=000E  SP=00B8  BP=0000  SI=0000  DI=0000
DS=0B6C  ES=0B5C  SS=0B6C  CS=0B6C  IP=0005   NV UP EI PL NZ NA PO NC
0B6C:0005 B409          MOV     AH,09
-t

Before you carry out the next instruction, you need some information: INT 21h is the interrupt used to call DOS services, and the value in AH selects which one; in this case, Function 09h (because AH=09). You should never use the Trace command on Interrupts! (Unless you really do want to attempt stepping through all of the MS-DOS code that comprises one.) Basically, Function 09 of INT 21 will print out a string of characters (starting at the offset pointed to by the DS:DX registers), until it encounters a 24h ("$") byte. After entering the Proceed (-p) command, you should see the string displayed on your screen as follows:

AX=0900  BX=0000  CX=C510  DX=000E  SP=00B8  BP=0000  SI=0000  DI=0000
DS=0B6C  ES=0B5C  SS=0B6C  CS=0B6C  IP=0007   NV UP EI PL NZ NA PO NC
0B6C:0007 CD21          INT     21
-p
This program cannot be run in DOS mode.
AX=0924  BX=0000  CX=C510  DX=000E  SP=00B8  BP=0000  SI=0000  DI=0000
DS=0B6C  ES=0B5C  SS=0B6C  CS=0B6C  IP=0009   NV UP EI PL NZ NA PO NC
0B6C:0009 B8014C        MOV     AX,4C01
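The behavior of Function 09h that we just proceeded through can be imitated in a few lines of Python. This is only a sketch: SEGMENT rebuilds the stub's bytes as they sit at offset 0000 of the segment (14 code bytes followed by the message), and dos_string is a hypothetical helper name of ours, not a DOS routine.

```python
# The stub as loaded at CS:0000: 14 bytes of code, then the message.
CODE = bytes.fromhex("0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21")
MESSAGE = b"This program cannot be run in DOS mode.\r\r\n$"
SEGMENT = CODE + MESSAGE

def dos_string(memory, offset):
    """Return the bytes from offset up to (not including) the first '$'."""
    end = memory.index(b"$", offset)  # raises ValueError if no '$' is found
    return memory[offset:end]

dx = 0x000E                           # the value MOV DX,000E loaded
text = dos_string(SEGMENT, dx)
print(text.decode("ascii"), end="")   # the '$' itself is never printed
print("terminator found at offset %02Xh" % SEGMENT.index(b"$", dx))
```

Note that the terminating "$" turns up at offset 38h, the same last location used for the Dump command earlier.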

This is yet another DOS Interrupt (INT 21h) in the making... Function 4Ch (AH=4C) is the standard "Exit" (Terminate) code with Return (AL=return value; 01 in this case). By now, you should see that it's very important to obtain a list of all the Interrupts! Look for the link to Ralf Brown's (Free) Interrupt Listing on our Assembly page.

-t

AX=4C01  BX=0000  CX=C510  DX=000E  SP=00B8  BP=0000  SI=0000  DI=0000
DS=0B6C  ES=0B5C  SS=0B6C  CS=0B6C  IP=000C   NV UP EI PL NZ NA PO NC
0B6C:000C CD21          INT     21
-p

Program terminated normally
-q

As you can see, DEBUG reported "Program terminated normally" and we then Quit (-q) the DEBUG session.
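To recap the whole session, the 14 code bytes of the stub can be decoded with a very small table-driven disassembler. This Python sketch knows only the six opcodes that appear in this particular stub (it is in no way a general 8086 disassembler), and the names OPCODES and disassemble are our own.

```python
# Opcode -> (mnemonic template, number of operand bytes); only what this stub uses.
OPCODES = {
    0x0E: ("PUSH    CS", 0),
    0x1F: ("POP     DS", 0),
    0xBA: ("MOV     DX,{:04X}", 2),
    0xB4: ("MOV     AH,{:02X}", 1),
    0xCD: ("INT     {:02X}", 1),
    0xB8: ("MOV     AX,{:04X}", 2),
}

STUB_CODE = bytes.fromhex("0e 1f ba 0e 00 b4 09 cd 21 b8 01 4c cd 21")

def disassemble(code):
    """Return a list of (offset, instruction text) pairs."""
    listing = []
    offset = 0
    while offset < len(code):
        template, size = OPCODES[code[offset]]     # KeyError on an unknown opcode
        operand_bytes = code[offset + 1:offset + 1 + size]
        operand = int.from_bytes(operand_bytes, "little")  # LSB is stored first
        listing.append((offset, template.format(operand)))
        offset += 1 + size
    return listing

for offset, text in disassemble(STUB_CODE):
    print("%04X  %s" % (offset, text))
```

The offsets it prints (0000, 0001, 0002, 0005, 0007, 0009 and 000C) are the same IP values seen in the register displays above.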

There are variations of the "DOS Stub" program in existence. Basically, they depend upon which software company made the development tools (the linker in particular) that were used to create a Windows® program. For example, the string displayed by a program that was linked with Borland's tlink32 should state: "This program must be run under Win32." when run under a real 16-bit DOS or in DEBUG.

 


Last Update: October 12, 2004. (12.10.2004)




 

 

 
