
17 January 2012

[C++] - The duty of Compiler and Linker on EXE file creation


1. Introduction

We know that a project delivers its output as an exe file. There are other forms too, like dll, ocx etc., but in this write-up I will discuss only the exe. EXE stands for executable, which means the operating system, with the help of the processor, executes the instructions specified in the exe file. The exe is a binary file. Its header carries the information required by the operating system, while the majority of its content is meant for the processor of the machine that runs it.

2. Exe File headers

The header of the exe file contains the information required by the operating system. From this header the OS knows how much memory to allocate for the data segment and how much code segment memory is required to load the instruction set packed inside the executable. As already told in earlier posts on this blog, the data segment holds the program-scoped constants, global variables etc., whereas the code segment holds the instruction set that the processor understands. On Windows this header is called the Portable Executable (PE) header. It should not be confused with the process control block, which is the record the OS keeps in memory for each running process. You can do a web search to know more about both.

Next to the header is the actual exe content, which is nothing but the instruction set and its data, stored in the processor's opcode format. We call this machine code. Assembly language is only the human-readable notation for these opcodes, so opening an exe file in a word processor will not reveal any assembly code to you.
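
One way to see that an exe is binary data rather than text is to look at its first two bytes. Windows executables begin with the ASCII characters 'M' and 'Z', the signature of the legacy DOS header that precedes the PE header. The sketch below checks a stream for that signature. The function names are made up for this illustration, and a real loader validates far more than these two bytes.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical helper: returns true when the stream starts with the
// two-byte "MZ" signature found at the beginning of Windows executables.
bool hasMzSignature(std::istream& in)
{
    char sig[2] = {0, 0};
    in.read(sig, 2);
    return in.gcount() == 2 && sig[0] == 'M' && sig[1] == 'Z';
}

// Small self-check that uses in-memory data instead of a real file.
void checkMzSignature()
{
    std::istringstream exeLike("MZ followed by the rest of the image");
    std::istringstream textLike("Hello");
    assert(hasMzSignature(exeLike));
    assert(!hasMzSignature(textLike));
}
```

To try it on a real file, open the file with std::ifstream in std::ios::binary mode and pass that stream to the function.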

3. How is the Exe generated?

Let us consider a simple example. Say you have an exe project, and the project has three header files and three implementation files. Once the coding is done without any error, you build the project. The build operation creates the exe output. Have a look at the below picture.


When you build the project, the development IDE (say VS2005) takes the following actions:

1)      The compiler conducts a pre-processing operation before doing its actual job of compilation. The pre-processing is conducted on the source files, I mean the cpp files. It replaces each macro with its content, each #include directive with the content of the named header file, and so on.
2)      Once the above operation is completed for a single file, say a.cpp, the compiler compiles that file and generates a.obj. This continues until every cpp file in the project (exe, dll or ocx, whatever it is) is converted to an object file.
3)      Now the linker comes into the picture. The linker takes as input the object files, a more compact form produced for each cpp file in the previous step. It combines all the object files, resolves the references between them, and generates the required binary, the exe file in our case.
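
The pre-processing step can be pictured with a tiny macro. In the sketch below, BUFFER_SIZE, SQUARE and the function names are invented for this illustration. By the time the compiler proper sees the function, the preprocessor has already substituted the macro text, as the comment indicates.

```cpp
#include <cassert>

#define BUFFER_SIZE 16             // object-like macro (illustrative name)
#define SQUARE(x) ((x) * (x))      // function-like macro (illustrative name)

// What the programmer writes:
int bufferArea()
{
    return SQUARE(BUFFER_SIZE);
}

// What the compiler sees after pre-processing:
//     int bufferArea()
//     {
//         return ((16) * (16));
//     }

// Small self-check of the expanded expression.
void checkBufferArea()
{
    assert(bufferArea() == 256);
}
```

If you want to look at the pre-processed text yourself, the Visual C++ command-line compiler writes it to a .i file when given the /P switch.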

In short, we feed input to the compiler in the form of the C++ programming language, and the linker is the one that actually generates the output binaries.

If your solution workspace contains 57 projects with dependencies properly set, the above compilation and linking takes place for each project. When you build such a big solution, just sit back and watch the output window. For each project you will see the build operation display the cpp files being processed, and at the end the linker generates the output.

4. Swapping the EXE process

The operating system reads the information from the header of the exe file and loads the exe's data and instruction set into memory. Once loaded, the operating system sees your exe as a running process. The processor executes the instruction sets of all the loaded processes in turn (in a multi-tasking OS like Windows). When there is not enough free memory to run a new process waiting in the queue, and that process has a high priority, the OS swaps the content of an already loaded exe (the instruction set part) and its state (all the global variable values) out to the physical disc. That process (your exe) is not terminated; it is only suspended for some time. Once your process needs to execute again, the OS takes the image of the process from the disc and places it back in main memory. This is called swapping. Some people say virtual memory, although strictly speaking swapping is one of the mechanisms behind virtual memory. This is shown below:



In the above picture, the processor executes the instruction set from the code segment of an exe that is loaded in main memory. When there are multiple processes to manage and main memory is limited, the OS sometimes swaps an exe process out to disc. In the above case, exe processes P1, P2 and P3 are in main memory, and P4 and P5 are kept on the physical disc, temporarily suspended.
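
The swap-out decision described in this section can be sketched as a toy model. Everything here (Process, ToyMemory, the priority numbers) is invented for illustration and models only the choice of which process goes to disc. A real operating system works on memory pages and uses far more elaborate policies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: every name in this block is hypothetical.
struct Process
{
    std::string name;
    int priority;   // larger value = higher priority
};

class ToyMemory
{
public:
    explicit ToyMemory(std::size_t slots) : slots_(slots) {}

    // Bring a process in. When memory is full, the lowest-priority resident
    // process is swapped out if the newcomer outranks it; otherwise the
    // newcomer itself waits on disc.
    void load(const Process& p)
    {
        if (resident_.size() < slots_)
        {
            resident_.push_back(p);
            return;
        }
        std::vector<Process>::iterator victim = std::min_element(
            resident_.begin(), resident_.end(),
            [](const Process& a, const Process& b) { return a.priority < b.priority; });
        if (victim != resident_.end() && victim->priority < p.priority)
        {
            swapped_.push_back(*victim);   // suspended, not terminated
            *victim = p;
        }
        else
        {
            swapped_.push_back(p);
        }
    }

    bool isResident(const std::string& name) const { return contains(resident_, name); }
    bool isSwappedOut(const std::string& name) const { return contains(swapped_, name); }

private:
    static bool contains(const std::vector<Process>& list, const std::string& name)
    {
        return std::any_of(list.begin(), list.end(),
            [&name](const Process& p) { return p.name == name; });
    }

    std::size_t slots_;
    std::vector<Process> resident_;   // processes in main memory
    std::vector<Process> swapped_;    // processes kept on disc
};

// Reproduces the picture: P1, P2, P3 in memory and P4, P5 on disc.
void checkToyMemory()
{
    ToyMemory memory(3);
    memory.load({"P1", 5});
    memory.load({"P2", 4});
    memory.load({"P3", 3});
    memory.load({"P4", 2});
    memory.load({"P5", 1});
    assert(memory.isResident("P1") && memory.isResident("P3"));
    assert(memory.isSwappedOut("P4") && memory.isSwappedOut("P5"));
}
```

With three slots, a later high-priority arrival would push the lowest-priority resident (P3 in this setup) out to disc and take its place.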


08 January 2012

[C++] - Safe Guard Header file inclusion using Pragma Once


Pragma Once - Introduction

Header files usually hold the declarations required by the definitions. Say, for example, a class and its layout are specified in the header file. This layout just has the variable and function names with their scopes. The implementation file actually makes use of the variables and supplies the function implementations.


Including the header file

Implementation files refer to header files by using the #include pre-processor directive. The file name extension for a header file is usually .h, and some C++ programmers use .hpp also. The implementation file, with the extension .cpp, includes these header files to know the declarations. So if some common declarations are required by more than one implementation file, we include the same header file in each of them.

You can include the header file in two different ways.
1) Using the <>
2) Using the ""

Below are the examples for it:

#include "stdafx.h"
#include <conio.h>

In the below section we will see the difference.


Two types of inclusion

A header file specified in between angle brackets <> is usually part of the C++ libraries or some other library outside the project. With Visual C++, the preprocessor searches for the file in the paths given by the compiler switch /I and then in the standard include paths set up for the compiler (the INCLUDE environment variable or the IDE's include directories). This is shown below:

Specifying search path


The project property shows where you can set this /I option.

OK. What about the other option, which puts the header between double quotes? In that case, the preprocessor first searches the directory of the file that has the #include statement. When the header file is not found there, it continues the search as it does for the <> form.


Multiple inclusions Problem

The #include is a pre-processor directive, and the statement is replaced by the file content before the actual compilation takes place. This can lead to a situation where a file, say ab.cpp, gets the same declaration or definition twice: it includes a header file directly, and it also includes a second header file that itself includes the first one.

To explain this, let us go with a simple example.

SimpleMath.h

Below is the content of the file. No explanation is required here, as it is just two simple function declarations and their implementations.

int Add_Numbers(int, int);
int Mult_Numbers(int, int);

int Add_Numbers(int a, int b)
{
            return (a + b);
}

int Mult_Numbers(int a, int b)
{
            return (a * b);
}


ExtendedMath.h

In this header file, the basic mathematical function that adds two numbers is extended to support adding three numbers. The function to add three numbers is implemented by making use of the function that adds two numbers, which is already defined in SimpleMath.h. So the ExtendedMath.h header file includes the SimpleMath.h header file. The file content is shown below:

#include "SimpleMath.h"

int Add_three_numbers(int, int, int);

int Add_three_numbers(int a, int b, int c)
{
            return ( Add_Numbers(a, b) + c );
}

Explanation

When the pre-processing (before the compilation) takes place, the compiler replaces each #include with the file content. Also, note that the compiler will not generate any object file for a header file on its own; there will be no SimpleMath.obj or ExtendedMath.obj.


CppTest.cpp

Did you get confused in the previous section when I said the compiler will not generate any object file such as SimpleMath.obj or ExtendedMath.obj for the header files? You may have a question: each header file provides some processing that computes multiplication as well as addition, so where is the object code? And how will the linker create the exe for those functions?

Right! The compiler generates object code for the CPP files. When a CPP file includes header files, the #include lines are replaced by the content of the headers, that replaced content goes into the object file of the CPP file, and the linker generates the exe from it. Look at the code for the CPP file:

// CppTest.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include "SimpleMath.h"
#include "ExtendedMath.h"
#include <conio.h>
int _tmain(int argc, _TCHAR* argv[])
{
            return 0;
}

When you compile the above file, or the project that has this cpp file, you will get the error shown below:
C2084 Compiler Error (function already has a body)


Why?

As I already told, #include is a pre-processor directive, and the compiler replaces it with the content of the header file before processing CppTest.cpp. When it does, the file looks as shown below (I am skipping stdafx.h and conio.h):

int Add_Numbers(int, int);
int Mult_Numbers(int, int);

int Add_Numbers(int a, int b)
{
            return (a + b);
}

int Mult_Numbers(int a, int b)
{
            return (a * b);
}

int Add_Numbers(int, int);
int Mult_Numbers(int, int);

int Add_Numbers(int a, int b)
{
            return (a + b);
}

int Mult_Numbers(int a, int b)
{
            return (a * b);
}

int Add_three_numbers(int, int, int);

int Add_three_numbers(int a, int b, int c)
{
            return ( Add_Numbers(a, b) + c );
}
int _tmain(int argc, _TCHAR* argv[])
{

            return 0;
}

The first set of code (the two declarations followed by Add_Numbers and Mult_Numbers) is the content of SimpleMath.h, included directly by the CPP file.
The second, identical set is also the content of SimpleMath.h (note that ExtendedMath.h includes SimpleMath.h).
The Add_three_numbers declaration and definition are the actual content of ExtendedMath.h.
The final _tmain function is the original content of the CPP file.
Now, once all the #includes are replaced, the compiler finds two bodies for Add_Numbers and for Mult_Numbers and returns the error. I need not explain that error any further, as you can understand it now.
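
It is worth noting that only the repeated function bodies cause the error; the repeated declarations are harmless. The sketch below shows this in a single file. The checkAddNumbers function is added only for this illustration.

```cpp
#include <cassert>

int Add_Numbers(int, int);      // declaration
int Add_Numbers(int, int);      // repeating a declaration is allowed

int Add_Numbers(int a, int b)   // definition: only one body per function
{
    return (a + b);
}

// Uncommenting the second body below reproduces the problem. Visual C++
// reports it as C2084 (function already has a body).
// int Add_Numbers(int a, int b)
// {
//     return (a + b);
// }

// Small self-check that the single definition works.
void checkAddNumbers()
{
    assert(Add_Numbers(2, 3) == 5);
}
```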


Avoiding multiple inclusions – A

We can avoid these multiple inclusion problems in two different ways. First, we will look at conditional inclusion using pre-processor statements. To do this we use #define in combination with #ifndef. This is shown below. The #define statement tells the preprocessor to mark a macro called TAG as defined. So the header file content is placed in between #ifndef and #endif, with a very first statement that defines the macro tested by the #ifndef.

Safe Guarding the Header file


First, remember that the compiler comes to a header file only when a source file (.cpp) refers to it. Keeping that in mind, follow the description given below for the above illustration:
1) The preprocessor first checks whether TAG is already defined.
2) When it is already defined, none of the header file content is included in the referring source file.
3) When it is not defined, the preprocessor first defines TAG and then includes the header content. TAG stays defined until the end of the referring source file (the one doing the #include), that is, until its object file is generated.

Now look at the change for SimpleMath.h header file as well as the ExtendedMath.h header file.

#ifndef _SIMPLEMATH_H__
#define _SIMPLEMATH_H__

int Add_Numbers(int, int);
int Mult_Numbers(int, int);

int Add_Numbers(int a, int b)
{
            return (a + b);
}

int Mult_Numbers(int a, int b)
{
            return (a * b);
}
#endif

==

#ifndef _EXTMATH_H_
#define _EXTMATH_H_
#include "SimpleMath.h"

int Add_three_numbers(int, int, int);
int Add_three_numbers(int a, int b, int c)
{
            return ( Add_Numbers(a, b) + c );
}

#endif

Now the compilation succeeds. CppTest.cpp has already included SimpleMath.h, so when the #include of SimpleMath.h inside ExtendedMath.h is processed, the guard macro is already defined and the compiler skips the header content the second time.
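
To watch the guard work without juggling several files, the sketch below pastes one guarded header body twice into a single file, mimicking what the preprocessor produces when the same header is included two times. The macro name DEMO_SIMPLEMATH_H and the checkGuard function are made up for this demo.

```cpp
#include <cassert>

// ---- first copy of the guarded header body ----
#ifndef DEMO_SIMPLEMATH_H       // not defined yet, so this copy is kept
#define DEMO_SIMPLEMATH_H
int Add_Numbers(int a, int b)
{
    return (a + b);
}
#endif

// ---- second copy, as if the header were included again ----
#ifndef DEMO_SIMPLEMATH_H       // already defined, so this copy is skipped
#define DEMO_SIMPLEMATH_H
int Add_Numbers(int a, int b)
{
    return (a + b);
}
#endif

// Small self-check: exactly one Add_Numbers body reached the compiler.
void checkGuard()
{
    assert(Add_Numbers(4, 6) == 10);
}
```

Removing the #ifndef, #define and #endif lines from both copies brings back the "already has a body" error.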


Avoiding multiple inclusions – B

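Oh! That’s simple. You need not use the #define, #ifndef and #endif set of directives. Instead, just put the #pragma once pre-processor directive at the top of the header file and it will take care of everything. Note that #pragma once is not part of the C++ standard, but Visual C++ and the other major compilers support it. Below is a sample for ExtendedMath.h: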

#pragma once
#include "SimpleMath.h"
int Add_three_numbers(int, int, int);

int Add_three_numbers(int a, int b, int c)
{
            return ( Add_Numbers(a, b) + c );
}


Closing notes

When going through the example you may be thinking: why did we include both SimpleMath.h and ExtendedMath.h in our Cpp file? Without using any pre-processor macro, I can solve the problem just by including ExtendedMath.h.

If you thought like that, I can say you are right. That definitely solves the problem, but at a cost: the person who makes use of our simple and extended math must know which header file includes which. Also, designing a header file without considering multiple inclusions is not a good habit for C++ coders.