Sunday, 20 November 2011

How are command-line parameters obtained in C programming?

Every time you run a DOS or Windows program, a Program Segment Prefix, or PSP, is created. When the DOS program loader copies the program into RAM to execute it, it first allocates 256 bytes for the PSP, then places the executable in the memory immediately after the PSP. The PSP contains all kinds of information that DOS needs in order to facilitate the execution of the program, most of which does not apply to this FAQ. However, there is at least one piece of data in the PSP that does apply here: the command line. At offset 128 (0x80) in the PSP is a single byte that contains the number of characters in the command line. The next 127 bytes contain the command line itself. Incidentally, that is why DOS limits your typing at the DOS prompt to 127 characters: it allocates only that much space to hold the command line. Unfortunately, the command-line buffer in the PSP does not contain the name of the executable; it contains only the characters you typed after the executable’s name (including the spaces).

For example, if you type

XCOPY AUTOEXEC.BAT AUTOEXEC.BAK
at the DOS prompt, XCOPY.EXE’s PSP command-line buffer will contain
AUTOEXEC.BAT AUTOEXEC.BAK

assuming that the XCOPY program resides in the DOS directory of drive C. It’s difficult to see in print, but you should note that the space character immediately after the XCOPY word on the command line is also copied into the PSP’s buffer. Another drawback of the PSP is that, in addition to the fact that you cannot find your own program’s name, any redirection of output or input noted on the command line is not shown in the PSP’s command-line buffer. This means that you also cannot know (from the PSP, anyway) that your program’s input or output was redirected.

By now you are familiar with using the argc and argv parameters in your C programs to retrieve the command-line information. But how does the information get from the DOS program loader to the argv pointer in your program? It does this in the start-up code, which is executed before the first line of code in your main() function. During the initial program execution, a function called _setargv() is called. This function copies the program name and command line from the PSP and DOS environment into the buffer pointed to by your main() function’s argv pointer. The _setargv() function is found in the xLIBCE.LIB file, x being S for Small memory model, M for Medium memory model, and L for Large memory model. This library file is automatically linked to your executable program when you build it. Copying the argument parameters isn’t the only thing the C start-up code does. When the start-up code is completed, the code you wrote in your main() function starts being executed.

OK, that’s fine for DOS, but what about Windows? Actually, most of the preceding description applies to Windows programs as well. When a Windows program is executed, the Windows program loader creates a PSP just like the DOS program loader, containing the same information. The major difference is that the command line is copied into the lpszCmdLine argument, which is the third (next-to-last) argument in your WinMain() function’s parameter list. The Windows C library file xLIBCEW.LIB contains the start-up function _setargv(), which copies the command-line information into this lpszCmdLine buffer. Again, the x represents the memory model you are using with your program. If you are using QuickC, the start-up code is contained in the xLIBCEWQ.LIB library file.

Although DOS and Windows programs manage the command-line information in basically the same way, the command line arrives in your C program in slightly different formats. In DOS, the start-up code takes the command line, which is delimited by spaces, and turns each argument into its own null-terminated string. You therefore could declare argv as an array of pointers (char * argv[]) and access each argument using an index value of 0 to n, in which n is the number of arguments (argc) minus one. On the other hand, you could declare argv as a pointer to pointers (char ** argv) and access each argument by incrementing or decrementing argv.

In Windows, the command line arrives as an LPSTR, or char _far *. Each argument in the command line is delimited by spaces, just as they would appear at the DOS prompt had you actually typed the characters yourself (which is unlikely, considering that this is Windows and they want you to think you are using a Macintosh by double-clicking the application’s icon). To access the different arguments of the Windows command line, you must manually walk across the memory pointed to by lpszCmdLine, separating the arguments, or use a standard C function such as strtok() to hand you each argument one at a time.

If you are adventurous enough, you could peruse the PSP itself to retrieve the command-line information. To do so, call function 0x62 of DOS interrupt 0x21 as follows (using Microsoft C):
#include <stdio.h>
#include <dos.h>

int main(int argc, char ** argv)
{
    union REGS regs;            /* DOS register access struct */
    char far * pspPtr;          /* pointer to PSP */
    int cmdLineCnt;             /* num of chars in cmd line */

    regs.h.ah = 0x62;           /* DOS function 0x62: get PSP address */
    int86(0x21, &regs, &regs);  /* call DOS through interrupt 0x21 */
    FP_SEG(pspPtr) = regs.x.bx; /* save PSP segment */
    FP_OFF(pspPtr) = 0x80;      /* set pointer offset */
    /* pspPtr now points to the command-line count byte */
    cmdLineCnt = *pspPtr;
    printf("Command line length: %d\n", cmdLineCnt);
    return 0;
}

It should be noted that in the Small memory model, or in assembly language programs with only one code segment, the segment value returned by DOS in the BX register is your program’s code segment. In the case of Large memory model C programs, or assembly programs with multiple code segments, the value returned is the segment of your program that contains the PSP. After you have set up a pointer to the PSP, you can read its data from your program.

Cross Reference:

XX.2: Should programs always assume that command-line parameters can be used?
