Memory Layout of A Running Program (Process)

When a program is scheduled to run by the CPU, it allocates memory space for it. Specific part of this allocated memory is used for a specific purpose. Memory layout of a process pictorially represents different segments of the process’s memory to help us visualize what-why-how of the program logic as it runs, the understanding of which enables us to debug the program effectively.

There are six distinct segments in the memory layout of a process, each of which serves a different purpose as depicted in the following diagram.
memory-layout-of-a-process-csea

  1. Text segment: This is the actual code to be executed by the CPU. This area of memory is sharable and multiple instances of the program make use of a common copy of text segment to lower memory requirements. And is usually read-only so the program can’t edit its own code once loaded into the memory.
  2. Initialized data segment: This contains the “global variables” initialized by the programmer.
  3. Uninitialized data segment (bss): As the name suggests this segment contains all the uninitialized global variables, and are initialized to 0 (zero) or NULL pointer before the program begins to execute. This segment is also known as bss (Block Started by Symbol), an old assembly operator used by few old assemblers.
  4. Stack: Stack is a special memory area used by processes to keep track of the flow of execution during function calls. It’s a collection of stack frames, each of which corresponds to a function call.
  5. Heap: Heap is also a special memory area used by processes when they need memory “on the fly”. It’s the most dynamic memory in that chunks of memory allocated, coalesced and merged frequently to effectively manage the free space. When a programmer uses malloc() and friends or new, he is explicitly marking the usage of heap at run time.
  6. Commandline arguments & environment variables: The higher part of memory stores the commandline arguments and other environment variables if required by the program.

Given an object file or executable, you can see the size of each segment. Do note that these are files on the disk which eventually become residents of memory. Consider the following program.

// Memory layout learning
#include<stdio.h>

char banner[] = "Hello World";
int main()
{
   printf("%s\n",banner);
   return 0;
}

To compare the size of different sections, compile and link separately.

$ gcc -c hello.c
$ gcc -o hello hello.o

The size command can be used to list the various sections in the object file (hello.o) and the executable (hello).

$ size hello.o hello
   text	   data	    bss	    dec	    hex	filename
     77	     12	      0	     89	     59	hello.o
   1170	    564	      4	   1738	    6ca	hello

Here data is the combined size of initialized and uninitialized data segments. The dec and hex values represent the total size in decimal and hexadecimal respectively.

The size of the sections of object file can also be obtained by running objdump -h or objdump -x.

$ objdump -h hello.o

hello.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000015  0000000000000000  0000000000000000  00000040  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         0000000c  0000000000000000  0000000000000000  00000058  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  0000000000000000  0000000000000000  00000064  2**0
                  ALLOC
  3 .comment      00000035  0000000000000000  0000000000000000  00000064  2**0
                  CONTENTS, READONLY
  4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000099  2**0
                  CONTENTS, READONLY
  5 .eh_frame     00000038  0000000000000000  0000000000000000  000000a0  2**3
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

Think?!
size and objdump report different sizes for the text segment. Can you guess where the discrepancy comes from? Hint: How big is the discrepancy? See anything of that length in the source code?

Stack Implementation in C

In computer science, a stack is an abstract data type that serves as a collection of elements, with two principal operations: push, which adds an element to the collection, and pop, which removes the most recently added element that was not yet removed. The order in which elements come off a stack gives rise to its alternative name, LIFO (for last in, first out). Additionally, a peek operation may give access to the top without modifying the stack.

The name “stack” for this type of structure comes from the analogy to a set of physical items stacked on top of each other, which makes it easy to take an item off the top of the stack, while getting to an item deeper in the stack may require taking off multiple other items first.

Considered as a linear data structure, or more abstractly a sequential collection, the push and pop operations occur only at one end of the structure, referred to as the top of the stack. This makes it possible to implement a stack as a singly linked list and a pointer to the top element.

A stack may be implemented to have a bounded capacity. If the stack is full and does not contain enough space to accept an entity to be pushed, the stack is then considered to be in an overflow state. The pop operation removes an item from the top of the stack.

/*
 * C program to implement stack. Stack is a LIFO data structure.
 * Stack operations: PUSH(insert operation), 
                     POP(Delete operation)
 *                   and Display stack.
 */

#include <stdio.h>
#define MAXSIZE 5
 
struct stack
{
    int stk[MAXSIZE];
    int top;
};

typedef struct stack STACK;
STACK s;
 
void push(void);
int  pop(void);
void display(void);
 
void main()
{
    int choice;
    int option = 1;
    s.top = -1;
 
    printf ("STACK OPERATION\n");
    while (option)
    {
        printf ("--------------------------------\n");
        printf ("      1    -->    PUSH          \n");
        printf ("      2    -->    POP           \n");
        printf ("      3    -->    DISPLAY       \n");
        printf ("      4    -->    EXIT          \n");
        printf ("--------------------------------\n");
 
        printf("Enter your choice\n");
        scanf("%d", &choice);
        switch (choice)
        {
        case 1:
            push();
            break;
        case 2:
            pop();
            break;
        case 3:
            display();
            break;
        case 4:
            return;
        }
        fflush(stdin);
        printf("Do you want to continue(Type 0 or 1)?\n");
        scanf("%d", &option);
    }
}

/*  Function to add an element to the stack */
void push()
{
    int num;
    if (s.top == (MAXSIZE - 1))
    {
        printf ("Stack is Full\n");
        return;
    }
    else
    {
        printf("Enter the element to be pushed\n");
        scanf("%d", &num);
        s.top = s.top + 1;
        s.stk[s.top] = num;
    }
    return;
}

/*  Function to delete an element from the stack */
int pop()
{
    int num;
    if(s.top == - 1)
    {
        printf("Stack is Empty\n");
        return(s.top);
    }
    else
    {
        num = s.stk[s.top];
        printf ("poped element is = %dn", s.stk[s.top]);
        s.top = s.top - 1;
    }
    return(num);
}

/*  Function to display the status of the stack */
void display()
{
    int i;
    if (s.top == -1)
    {
        printf("Stack is empty\n");
        return;
    }
    else
    {
        printf("\n The status of the stack is \n");
        for(i = s.top; i >= 0; i--)
        {
            printf("%d\n", s.stk[i]);
        }
    }
    printf("\n");
}
OUTPUT
$ ./a.out
STACK OPERATION
------------------------------------------
      1    -->    PUSH
      2    -->    POP
      3    -->    DISPLAY
      4    -->    EXIT
------------------------------------------
Enter your choice
1
Enter the element to be pushed
34
Do you want to continue(Type 0 or 1)?
0
$ a.out
STACK OPERATION
------------------------------------------
      1    -->    PUSH
      2    -->    POP
      3    -->    DISPLAY
      4    -->    EXIT
------------------------------------------
Enter your choice
1
Enter the element to be pushed
34
Do you want to continue(Type 0 or 1)?
1
------------------------------------------
      1    -->    PUSH
      2    -->    POP
      3    -->    DISPLAY
      4    -->    EXIT
------------------------------------------
Enter your choice
2
poped element is = 34
Do you want to continue(Type 0 or 1)?
1
------------------------------------------
      1    -->    PUSH
      2    -->    POP
      3    -->    DISPLAY
      4    -->    EXIT
------------------------------------------
Enter your choice
3
Stack is empty
Do you want to continue(Type 0 or 1)?
1
------------------------------------------
      1    -->    PUSH
      2    -->    POP
      3    -->    DISPLAY
      4    -->    EXIT
------------------------------------------
Enter your choice
1
Enter the element to be pushed
50
Do you want to continue(Type 0 or 1)?
1
------------------------------------------
      1    -->    PUSH
      2    -->    POP
      3    -->    DISPLAY
      4    -->    EXIT
------------------------------------------
Enter your choice
1
Enter the element to be pushed
60
Do you want to continue(Type 0 or 1)?
1
------------------------------------------
      1    -->    PUSH
      2    -->    POP
      3    -->    DISPLAY
      4    -->    EXIT
------------------------------------------
Enter your choice
3
 
The status of the stack is
60
50
 
Do you want to continue(Type 0 or 1)?
1
------------------------------------------
      1    -->    PUSH
      2    -->    POP
      3    -->    DISPLAY
      4    -->    EXIT
------------------------------------------
Enter your choice
4

Compile and Run C Program From Within VIM

For those of us who write C programs in Vim, nothing could be more frustrating than to leave the editor (:x, :wq, shift+ZZ) to compile and run the source code we are working on. Here are a few productivity-enhancing tips in this regard.

1. “!”
The bang “!” symbol allows the execution of arbitrary commands inside the Vim editor. To compile and run the current source code file run,

:! gcc % && ./a.out

Here % is substituted by the currently opened file. Vim displays the output in a blank shell screen until we press Enter. Forget not to save the source file (:w) before you compile since Vim runs the command on the file, not on the buffer content.

vim-bang-command-gcc

In order to compile a specific file,

:!gcc somecode.c -o somecode && ./somecode

2. “make”
The Vim :make command can be used to build the source code from the Makefile. The simplest Makefile would contain,

program
        gcc hello.c

We can write a Makefile that encompasses our entire project.

3. “keymap”
The best solution is to associate an hotkey to compile-and-run just like in an IDE. Open .vimrc (create one if you don’t have) and add,

map <F8> :w <CR> :!gcc % -o %< && ./%< <CR>

Here “:w” saves the current file, and “%<” is the file name without the extension. Now every time you hit F8, Vim compiles the source code, creates the object file (without any file extension) and runs it for us.

Fuzzing

Fuzzing is an anomaly-triggering attempt on an application by providing it with random, often invalid data. The idea is to force the application to crash through unexpected inputs in order to detect potential memory leaks and other hard-to-find bugs in the application. Fuzzing is tangential to traditional software functional testing in that it deliberately focuses on the failing cases of the software, the root-cause analysis of which greatly enhances the robustness of the software. In this post, let’s pickle-taste the principles and general concepts of fuzzing before we can chew the deeper aspects of it in the coming weeks.

What to Fuzz? — The Inputs

fuzzing-inputs-of-a-program

The fuzzing vectors of a program are nothing but its input vectors. The input to a program can be from,

  1. Console: User interacting with the program through a terminal window
  2. File: Program reading the file containing arguments, settings and other inputs
  3. Network: Program making use of a well-defined network protocol
  4. API: Several ready-made libraries utilized by the program
  5. Devices: Data coming from I/O and sensors

All these possible data input paths to an application can be fuzzed independently or in combination to stress-test the application’s input validating, bounds checking, and self-defending capability.

Types of Fuzzers

A fuzzer can be categorized as follows:

  1. A fuzzer can be generation-based or mutation-based depending on whether inputs are generated from scratch or by modifying existing inputs,
  2. A fuzzer can be dumb or smart depending on whether it is aware of input structure, and
  3. A fuzzer can be white, grey, or blackbox depending on whether it is aware of program structure.

Reuse of existing input seeds

A mutation-based fuzzer leverages an existing corpus of seed inputs during fuzzing. It generates inputs by modifying (or rather mutating) the provided seeds. For example, when fuzzing the image library libpng, the user would provide a set of valid PNG image files as seeds while a mutation-based fuzzer would modify these seeds to produce semi-valid variants of each seed. The corpus of seed files may contain thousands of potentially similar inputs. Automated seed selection (or test suite reduction) allows to pick the best seeds in order to maximize the total number of bugs found during a fuzz campaign.

A generation-based fuzzer generates inputs from scratch. For instance, a smart generation-based fuzzer takes the input model that was provided by the user to generate new inputs. Unlike mutation-based fuzzers, a generation-based fuzzer does not depend on the existence or quality of a corpus of seed inputs.

Some fuzzers have the capability to do both, to generate inputs from scratch and to generate inputs by mutation of existing seeds.

Aware of input structure

Typically, fuzzers are used to generate inputs for programs that take structured inputs, such as a file, a sequence of keyboard or mouse events, or a sequence of messages. This structure distinguishes valid input that is accepted and processed by the program from invalid input that is quickly rejected by the program. What constitutes a valid input may be explicitly specified in an input model. Examples of input models are formal grammars, file formats, GUI-models, and network protocols. Even items not normally considered as input can be fuzzed, such as the contents of databases, shared memory, environment variables or the precise interleaving of threads. An effective fuzzer generates semi-valid inputs that are “valid enough” so that they are not directly rejected from the parser and “invalid enough” so that they might stress corner cases and exercise interesting program behaviours.

A smart (model-based, grammar-based, or protocol-based) fuzzer leverages the input model to generate a greater proportion of valid inputs. For instance, if the input can be modelled as abstract syntax tree, then a smart mutation-based fuzzer would employ random transformations to move complete subtrees from one node to another. If the input can be modelled by a formal grammar, a smart generation-based fuzzer would instantiate the production rules to generate inputs that are valid w.r.t. the grammar. However, generally the input model must be explicitly provided which is difficult when it is proprietary, unknown, or very complex. If a large corpus of valid and invalid inputs are available, a grammar induction technique, such as Angluin‘s L* algorithm would be able to generate an input model.

A dumb fuzzer does not require the input model and can thus be employed to fuzz a wider variety of programs. For instance, AFL is a dumb mutation-based fuzzer that modifies a seed file by flipping random bits, by substituting random bytes with “interesting” values, and by moving or deleting blocks of data. However, a dumb fuzzer might generate a lower proportion of valid inputs and stress the parser code rather than the main components of a program. The disadvantage of dumb fuzzers can be illustrated by means of the construction of a valid checksum for a cyclic redundancy check (CRC). A CRC is an is an error-detecting code that ensures that the integrity of the data contained in the input file is preserved during transmission. A checksum is computed over the input data and recorded in the file. When the program processes the received file and the recorded checksum does not match the re-computed checksum, then the file is rejected as invalid. Now, a fuzzer that is unaware of the CRC is unlikely to generate the correct checksum. However, there are attempts to identify and re-compute a potential checksum in the mutated input, once a dumb mutation-based fuzzer has modified the protected data.

Aware of program structure

Typically, a fuzzer is considered more effective if it achieves a higher degree of code coverage. The rationale is, if a fuzzer does not exercise certain structural elements in the program, then it is also not able to reveal bugs that are hiding in these elements. Some program elements are considered more critical than others. For instance, a division operator might cause a division by zero error, or a system call may crash the program.

A blackbox fuzzer treats the program as a black box and is unaware of internal program structure. For instance, a random testing tool that generates inputs at random is considered a blackbox fuzzer. Hence, a blackbox fuzzer can execute several hundred inputs per second, can be easily parallelized, and can scale to programs of arbitrary size. However, blackbox fuzzers may only scratch the surface and expose “shallow” bugs. Hence, there are attempts to develop blackbox fuzzers that can incrementally learn about the internal structure (and behavior) of a program during fuzzing by observing the program’s output given an input. For instance, LearnLib employs active learning to generate an automaton that represents the behavior of a web application.

A whitebox fuzzer leverages program analysis to systematically increase code coverage or to reach certain critical program locations. For instance, SAGE leverages symbolic execution to systematically explore different paths in the program. If the program’s specification is available, a whitebox fuzzer might leverage techniques from model-based testing to generate inputs and check the program outputs against the program specification. A whitebox fuzzer can be very effective at exposing bugs that hide deep in the program. However, the time used for analysis (of the program or its specification) can become prohibitive. If the whitebox fuzzer takes relatively too long to generate an input, a blackbox fuzzer will be more efficient. Hence, there are attempts to combine the efficiency of blackbox fuzzers and the effectiveness of whitebox fuzzers.

A greybox fuzzer leverages instrumentation rather than program analysis to glean information about the program. For instance, AFL and libFuzzer utilize lightweight instrumentation to trace basic block transitions exercised by an input. This leads to a reasonable performance overhead but informs the fuzzer about the increase in code coverage during fuzzing, which makes greybox fuzzers extremely efficient vulnerability detection tools.

C Tricky Question: Print “Hello World” Without Using Any Semicolon In The Program

Unlike a natural language such as English, a formal language such as C adheres to strict code of conduct in which no redundancy, ambiguity or word-play exists. Every statement in C must terminate with a semicolon for it to be syntactically correct. However, the conditional statements such as “if” contains two parts:

  1. Header — where the truth-test of the logical expression is evaluated
  2. Body — statement(s) to be executed should the logical expression evaluates to true (non-zero)

Considering if itself as a micro-function, only its body need to be terminated with semi-colon. Neither its header nor its closing brace should have a semi-colon. This is the key to answer the aforementioned question. Just embed the print statement inside the if’s logical condition and leave its body statement-less.

/*Program to print "Hello World" or any message 
without using any “;” in the program*/

#include <stdio.h>
void main()
{
    if(printf("Hello World\n"))
    { }
}

c-no-semicolon-hello-world-output