FAQ: C Runtime Environment

01 How to Produce Assembly Language Code of a C Program on Linux System
02 What are the Limits of One’s Runtime Environment
03 What is Function Prologue and Epilogue in a C Program
04 What is Stack Frame in C
05 Do Local Variables and Function Prototypes in a C Program Produce any Assembly Code
06 Define Different Segments of a C Program’s Address Space
07 Is this True that all Variables Declared to be Registers in a C Program Allocated Necessarily in Register Memory
08 Which Variable Should be Declared as Registers in a C Program
09 What is Difference Between Frame Pointer and Stack Pointer
10 How Stack Frame is Organised and How Protocols for Calling and Returning from Functions are Interpreted
11 What Happens During Function Epilogue Phase
01 How to Produce Assembly Language Code of a C Program on Linux System
Question: What is Assembly Language Code and How is it Produced for a C Program on Linux System
Answer: Assembly language is a close, human-readable representation of the machine code for a given implementation, so each architecture has its own assembly language. An assembly instruction begins with a mnemonic, a short string that is usually an abbreviation of the instruction’s function. For example, MULPS is the mnemonic for ‘MULtiply Packed Single-precision floating-point values’. The mnemonic is followed by zero or more operands. The legal number and types of operands depend on the instruction, but by convention, in AT&T syntax assembly the output operand is always the rightmost operand. An operand may be a register, a memory reference, or a literal value, usually called an immediate. Register operands can be used as input, output, or both, and are written by prefixing the register name with the % character. Immediate operands are always inputs and are written by prefixing the literal value with the $ character. Sometimes the literal value is not known until link time, for example the address of a global variable. In assembly, such a link-time constant is written symbolically, as in the following instruction, which moves the address of x into register EAX:

movl $x, %eax
Operands prefixed by neither ‘%’ nor ‘$’ are memory operands. Like register operands, memory operands can be used as input, output, or both.

My Linux system runs on the Intel x86-64 (64-bit) architecture, so ‘gcc’ generates x86-64 assembly code. Compared with IA32 (Intel’s 32-bit Instruction Set Architecture), x86-64 introduces several new features; most visibly, the number of general-purpose registers is doubled from 8 to 16, and all of them are 64 bits wide. You can still access sub-sections of each register: for example, the low 8, 16 and 32 bits of the 64-bit register ‘%rax’ are accessible as ‘%al’, ‘%ax’ and ‘%eax’. Your machine may differ from mine, so refer to the instruction set manual for your processor to understand the assembly code it generates. Let’s now write a simple C program, generate assembly code for it and try to understand it:

/* count.c */
#include <stdio.h>

int main(void)
{
int i, j = 0;

for (i = 0; i < 10; i++)
j = j + 1;

return 0;
}
Let’s produce assembly code for the above program by executing the following command:

# gcc -S count.c
When this command is executed, gcc writes the assembly code for the ‘count.c’ program into a file named ‘count.s’; the suffix “.s” indicates that the file contains assembly code.

Let’s look at the assembly code produced for the ‘count.c’ program, given below:

.file "count.c"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
#saves the caller's base pointer on the stack (the return address was already pushed by call)
pushq %rbp
#copies the current stack pointer into register 'rbp' (base pointer)
movq %rsp, %rbp
#stores j = 0 at 8 bytes below the base pointer
#and i = 0 at 4 bytes below the base pointer
movl $0, -8(%rbp)
movl $0, -4(%rbp)
#jump to label .L2 to test the loop condition first
jmp .L2
.L3:
#body of the for loop: j = j + 1, then the update i++
addl $1, -8(%rbp)
addl $1, -4(%rbp)
.L2:
#loop condition: compare i with 9
cmpl $9, -4(%rbp)
#jump to label .L3 if i is less than or equal to 9
jle .L3
#return value 0 goes in %eax
movl $0, %eax
#restores the caller's base pointer
popq %rbp
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 4.6.0 20110428 (Red Hat 4.6.0-6)"
.section .note.GNU-stack,"",@progbits
Notice that the “.text” directive switches the assembler to the text segment, where all executable instructions are stored. Further, I edited the output by deleting some irrelevant directives and inserting comments, prefixed with the ‘#’ character, to make the assembly code for the ‘count.c’ program easier to follow.
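The compiled loop tests its condition at the bottom: gcc turned ‘i < 10’ into ‘cmpl $9’ followed by ‘jle’ (i <= 9), and jumps straight to that test before the first iteration. As a sketch for mapping the listing back to C, the same control flow can be written with goto. count_loop() is a hypothetical name, and it returns j instead of 0 so that the result can be checked.

```c
#include <assert.h>

/* Hand-written C mirror of the control flow in count.s.
   count_loop() is hypothetical; count.c itself returns 0. */
int count_loop(void)
{
    int i, j;

    j = 0;              /* movl $0, -8(%rbp) */
    i = 0;              /* movl $0, -4(%rbp) */
    goto L2;            /* jmp .L2: test the condition first */

L3:                     /* .L3: loop body, then the update */
    j = j + 1;          /* addl $1, -8(%rbp) */
    i++;                /* addl $1, -4(%rbp) */

L2:                     /* .L2: loop condition */
    if (i <= 9)         /* cmpl $9, -4(%rbp); jle .L3 */
        goto L3;

    return j;           /* 10 iterations, so j == 10 */
}
```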


02 What are the Limits of One’s Runtime Environment
Question: What are the Limits of One’s Runtime Environment
Answer: The first step in exploring your runtime environment is to obtain an assembly language listing from your compiler. On a Linux system, the ‘-S’ compiler option makes the compiler write the assembly code for your C program to a file with the ‘.s’ suffix. For example, if ‘hello.c’ is a C program, then the command

# gcc -S hello.c
produces the file ‘hello.s’, which contains the assembly code for ‘hello.c’. You also need to be able to read that assembly code, although you need not be an expert assembly language programmer: a basic understanding of what each instruction does and how its addressing modes work is enough.

First, the table below lists common C declarations with their corresponding Intel data types, GCC assembler (GAS) suffixes and x86-64 sizes:

C declaration | Intel data type | GAS suffix | x86-64 size (bytes)
char | Byte | b | 1
short | Word | w | 2
int | Double word | l | 4
unsigned | Double word | l | 4
long int | Quad word | q | 8
unsigned long | Quad word | q | 8
char * | Quad word | q | 8
float | Single precision | s | 4
double | Double precision | d | 8
long double | Extended precision | t | 16
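To check the size column on your own machine, a small sketch like the one below can be compiled and run. The struct and function names are hypothetical; the values come from sizeof as evaluated by whatever compiler builds the file, so they should match the table with gcc on x86-64 Linux but may differ elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row per C declaration from the table above. */
struct type_size {
    const char *decl;
    size_t bytes;
};

static const struct type_size type_sizes[] = {
    { "char",          sizeof(char) },
    { "short",         sizeof(short) },
    { "int",           sizeof(int) },
    { "unsigned",      sizeof(unsigned) },
    { "long int",      sizeof(long int) },
    { "unsigned long", sizeof(unsigned long) },
    { "char *",        sizeof(char *) },
    { "float",         sizeof(float) },
    { "double",        sizeof(double) },
    { "long double",   sizeof(long double) },
};

#define N_TYPE_SIZES (sizeof type_sizes / sizeof type_sizes[0])

/* Prints each declaration with its size in bytes. */
void print_type_sizes(void)
{
    for (size_t k = 0; k < N_TYPE_SIZES; k++)
        printf("%-14s %zu\n", type_sizes[k].decl, type_sizes[k].bytes);
}
```

Calling print_type_sizes() from main() prints the column; rebuilding with ‘gcc -m32’ (if 32-bit support is installed) should show 4 for long int, unsigned long and char *, as on IA32.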
Let’s summarize the important features:

1. Long integers and pointers require 8 bytes, compared with 4 bytes on IA32.
2. Integer arithmetic operations support 8-, 16-, 32- and 64-bit data types.
3. The set of general-purpose registers is expanded from 8 to 16. The new registers are %r8 through %r15.
4. Much of the program state is held in registers rather than on the stack. Integer and pointer procedure arguments (up to 6) are passed via registers. Some procedures do not need to access the stack at all.
5. Conditional operations are implemented using conditional move instructions when possible, yielding better performance than traditional branching code.
6. Floating-point operations are implemented using a register-oriented instruction set, rather than the stack-based support provided by IA32.
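Point 5 can be tried with a function as small as the one below (max_long() is a hypothetical example). With optimization enabled, for example ‘gcc -O2 -S’, x86-64 compilers commonly emit a compare followed by a conditional move (a cmov instruction) instead of a jump; whether they do is the compiler’s choice, so check the generated ‘.s’ file.

```c
#include <assert.h>

/* A candidate for branch-free code: the ?: selects one of two values
   without needing a jump, which is the pattern cmov implements. */
long max_long(long a, long b)
{
    return a > b ? a : b;
}
```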

Passing arguments to procedures via registers:

With x86-64, up to six integer or pointer arguments can be passed to a procedure in registers. Let’s consider a simple C function, ‘max_reg_parameters()’, that takes more than six arguments:

void max_reg_parameters(int i, unsigned int ui, long int li,
unsigned long ul, char * p2c, long int *p2li, long double *p2ld)
{
/* manipulations here*/
}
x86-64 implementation of the above C code fragment

max_reg_parameters:
.cfi_startproc # function begins here
pushq %rbp # save the caller's %rbp (callee-saved)
movq %rsp, %rbp # stack pointer copied to %rbp
movl %edi, -4(%rbp) # int i arrives in %edi
movl %esi, -8(%rbp) # unsigned int ui arrives in %esi
movq %rdx, -16(%rbp) # long int li arrives in %rdx
movq %rcx, -24(%rbp) # unsigned long ul arrives in %rcx
movq %r8, -32(%rbp) # p2c arrives in %r8
movq %r9, -40(%rbp) # p2li arrives in %r9
popq %rbp # restore the caller's %rbp
ret # function returns
.cfi_endproc # function ends here
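Notice that the first six of the seven arguments were passed to ‘max_reg_parameters()’ in registers; as is typical of unoptimized code, the function immediately copies each of them from its register to the stack. int ‘i’ and unsigned int ‘ui’ arrive in ‘%edi’ and ‘%esi’, the 32-bit sub-sections of ‘%rdi’ and ‘%rsi’. long int ‘li’, unsigned long ‘ul’, pointer-to-char ‘p2c’ and pointer-to-long-int ‘p2li’ are each 64 bits wide and arrive, in that order, in ‘%rdx’, ‘%rcx’, ‘%r8’ and ‘%r9’. The seventh argument, ‘p2ld’, does not fit in the six argument registers, so the caller passes it on the stack; since the function body never uses it, no instruction here touches it.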

Notice also an interesting feature of this code: although it sets up ‘%rbp’, it never subtracts anything from ‘%rsp’, so no stack space is explicitly allocated. After ‘movq %rsp, %rbp’ the two registers are equal, so the stores to -4(%rbp) through -40(%rbp) land below the stack pointer. The x86-64 System V ABI lets a function use the 128 bytes beyond (at lower addresses than) the current value of ‘%rsp’ without adjusting it, and guarantees that signal and interrupt handlers will not modify this area, which is called the ‘red zone’. The advantage is that a leaf function, one that calls no other function, avoids the overhead of allocating and deallocating stack space; in optimized code it often needs no frame pointer either and addresses its values relative to the stack pointer. Moreover, if a procedure has six or fewer integer and pointer arguments and few local variables, an optimizing compiler may keep everything in registers, so that the procedure does not need the stack for its data at all.
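To watch a seventh argument take the stack route, here is a small hypothetical pair of functions (sum7() and call_sum7() are not from the example above). Compiling them with ‘gcc -S’ and no optimization should show the caller loading the first six values into the argument registers and placing the seventh on the stack; the exact instructions vary between gcc versions.

```c
#include <assert.h>

/* Under the System V AMD64 calling convention a..f arrive in
   %rdi, %rsi, %rdx, %rcx, %r8 and %r9. g does not fit, so it is
   passed on the stack; after "pushq %rbp; movq %rsp, %rbp" the
   callee finds it at 16(%rbp), just above the return address. */
long sum7(long a, long b, long c, long d, long e, long f, long g)
{
    return a + b + c + d + e + f + g;
}

/* Caller: six register loads plus one push or store for the 7th value. */
long call_sum7(void)
{
    return sum7(1, 2, 3, 4, 5, 6, 7);
}
```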

General Purpose Registers of x86-64 ISA

64-bit | Bits 31-0 | Bits 15-0 | Bits 15-8 | Bits 7-0 | Use
%rax | %eax | %ax | %ah | %al | Return value
%rbx | %ebx | %bx | %bh | %bl | Callee saved
%rcx | %ecx | %cx | %ch | %cl | 4th argument
%rdx | %edx | %dx | %dh | %dl | 3rd argument
%rsi | %esi | %si | - | %sil | 2nd argument
%rdi | %edi | %di | - | %dil | 1st argument
%rbp | %ebp | %bp | - | %bpl | Callee saved
%rsp | %esp | %sp | - | %spl | Stack pointer
%r8 | %r8d | %r8w | - | %r8b | 5th argument
%r9 | %r9d | %r9w | - | %r9b | 6th argument
%r10 | %r10d | %r10w | - | %r10b | Caller saved (temporary; static chain pointer)
%r11 | %r11d | %r11w | - | %r11b | Caller saved (temporary)
%r12 | %r12d | %r12w | - | %r12b | Callee saved
%r13 | %r13d | %r13w | - | %r13b | Callee saved
%r14 | %r14d | %r14w | - | %r14b | Callee saved
%r15 | %r15d | %r15w | - | %r15b | Callee saved
Notice that the eight existing registers are extended to 64-bit versions and eight new registers are added. Each register can be accessed as 8 bits (byte), 16 bits (word), 32 bits (double word) or 64 bits (quad word).
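The aliasing between a 64-bit register and its narrower names can be modelled in portable C with casts and shifts. This is only an illustration: the helper names below are hypothetical and no real register is touched. It also captures one rule worth knowing: writing a 32-bit register such as %eax zero-extends the result into the upper 32 bits of %rax, while writing %ax or %al leaves the other bits unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Reading the narrower names: each is a slice of the 64-bit value. */
uint32_t eax_of(uint64_t rax) { return (uint32_t)rax; }        /* bits 31..0 */
uint16_t ax_of(uint64_t rax)  { return (uint16_t)rax; }        /* bits 15..0 */
uint8_t  al_of(uint64_t rax)  { return (uint8_t)rax; }         /* bits 7..0  */
uint8_t  ah_of(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 15..8 */

/* Writing %eax: the upper 32 bits of %rax become zero. */
uint64_t write_eax(uint64_t rax, uint32_t v)
{
    (void)rax;              /* old contents are discarded entirely */
    return (uint64_t)v;
}

/* Writing %al: only the low byte changes. */
uint64_t write_al(uint64_t rax, uint8_t v)
{
    return (rax & ~(uint64_t)0xFF) | v;
}
```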

03 What is Function Prologue and Epilogue in a C Program
Question: What do the Terms Prologue and Epilogue Stand For in the Context of Functions in a C Program
Answer: Every function has three components: the function prologue, the function body and the function epilogue. The prologue is the part of the function that does all the work needed to start it up; it begins by reserving space for local variables and other values on the runtime stack. The body is where the function’s actual work is done, and the epilogue cleans up the stack frame just before the function returns to its caller. For example, consider the simple C function below and the assembly code that follows it:

void pro_epi(long x, long y)
{
long z;

z = x + y;
}
x86-64 implementation of the above C function follows

1. pro_epi:
2. .cfi_startproc # pro_epi() begins here
3. pushq %rbp # save the caller's %rbp (callee-saved)
4. movq %rsp, %rbp # stack pointer copied to '%rbp'
5. movq %rdi, -24(%rbp) # copy x (1st argument) onto stack
6. movq %rsi, -32(%rbp) # copy y (2nd argument) onto stack
7. movq -32(%rbp), %rax # load y
8. movq -24(%rbp), %rdx # load x
9. addq %rdx, %rax # add x and y
10. movq %rax, -8(%rbp) # store the sum in z
11. popq %rbp # restore the caller's %rbp
12. ret # return to the caller
13. .cfi_endproc # pro_epi() ends here
Notice that in the assembly code produced for the C function ‘pro_epi()’, the prologue consists of instructions 3, 4, 5 and 6, while the epilogue consists of instruction 11, followed by the ‘ret’ at 12. The instructions of the function body execute between the prologue and the epilogue.

Notice that ‘%rbp’ is a callee-saved register: a function that wants to use it, for example to hold temporary values, must first save its contents on the stack and must restore them before it returns. In the x86-64 calling convention the callee-saved registers are %rbx, %rbp and %r12 through %r15 (the stack pointer %rsp must also be restored).

Notice also that x86-64 allows up to six integer or pointer arguments to be passed to procedures in registers. Therefore, when a function has few arguments and local variables, as in the example above, it need not allocate any stack space with a ‘subq’: ‘pro_epi()’ stores x, y and z below ‘%rsp’, in the 128-byte red zone that any function may access relative to the stack pointer without adjusting it.

04 What is Stack Frame in C
Question: What is Stack Frame in C
Answer: When one function calls another, the called function needs space on the runtime stack to execute: to hold its automatic variables, other values and the return address. The prologue of the called function reserves space for local variables and other values, thereby creating the stack frame for the called function. A stack frame is the area of the stack that a function uses to store its local variables and other values. For example, consider the C code fragment below:

void sf_fun()
{ /* declarations */

long a1, a2, a3, a4, a5, a6,
a7, a8, a9, a10;

int *pi1, *pi2, *pi3, *pi4, *pi5,
*pi6, *pi7, *pi8, *pi9, *pi10;

/* assignments */

a1 = 10; a2 = 20; a3 = 30; a4 = 40; a5 = 50;
a6 = 60; a7 = 70; a8 = 80; a9 = 90; a10 = 100;

pi1 = (int *)100; pi2 = (int *)110; pi3 = (int *)120; pi4 = (int *)140;
pi5 = (int *)150; pi6 = (int *)160; pi7 = (int *)170; pi8 = (int *)180;
pi9 = (int *)190; pi10 = (int *)200;

}
And its x86-64 implementation:

sf_fun:
pushq %rbp # save the caller's '%rbp' on the stack
movq %rsp, %rbp # copy the current stack pointer to '%rbp'
subq $40, %rsp # reserve 40 bytes on the stack
movq $10, -8(%rbp) # a1 = 10
movq $20, -16(%rbp) # a2
movq $30, -24(%rbp) # a3
movq $40, -32(%rbp) # a4
movq $50, -40(%rbp) # a5
movq $60, -48(%rbp) # a6
movq $70, -56(%rbp) # a7
movq $80, -64(%rbp) # a8
movq $90, -72(%rbp) # a9
movq $100, -80(%rbp) # a10
movq $100, -88(%rbp) # pi1 = (int *)100
movq $110, -96(%rbp) # pi2
movq $120, -104(%rbp) # pi3
movq $140, -112(%rbp) # pi4
movq $150, -120(%rbp) # pi5
movq $160, -128(%rbp) # pi6
movq $170, -136(%rbp) # pi7
movq $180, -144(%rbp) # pi8
movq $190, -152(%rbp) # pi9
movq $200, -160(%rbp) # pi10
leave # restore '%rsp' and the caller's '%rbp'
ret
Notice that although ‘sf_fun()’ takes no arguments, it declares ten local variables of type ‘long’ and ten pointers to int, and then assigns a value to each of them.

Notice the fourth instruction in the assembly code above,
subq $40, %rsp
which subtracts 40 from the current stack pointer and so allocates 40 bytes of stack for this function. Yet the twenty 8-byte locals occupy 160 bytes, down to -160(%rbp). After the ‘subq’, ‘%rsp’ equals ‘%rbp’ minus 40, so the lowest 120 bytes of the locals lie below the stack pointer. This works because x86-64 allows any function to access up to 128 bytes below (at lower addresses than) the current stack-pointer value without adjusting it. This special area is called the ‘red zone’.


05 Do Local Variables and Function Prototypes in a C Program Produce any Assembly Code
Question: Do Local Variables and Function Prototypes in a C Program Produce any Assembly Code
Answer: This is easy to check by experimenting on our own machine. Below I’ve written a C function that declares several variables and pointers, declares two functions and then calls them.

void lvar_funpt()
{
long a1, a2, a3, a4, a5, a6,
a7, a8, a9, a10;

int *pi1, *pi2, *pi3, *pi4, *pi5,
*pi6, *pi7, *pi8, *pi9, *pi10;

int *fun_ret_ptr(); /* declarations without parameter information */
int fun_ret_int();

fun_ret_ptr();
fun_ret_int();
}
The x86-64 assembly code for the above function follows:

lvar_funpt:
pushq %rbp # save the caller's base pointer
movq %rsp, %rbp # copy stack pointer to base pointer
movl $0, %eax # set %eax to 0 before calling an unprototyped function
call fun_ret_ptr # call to function
movl $0, %eax
call fun_ret_int # call to function
popq %rbp # restore the caller's base pointer
ret

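Notice that the assembly code produced for the above C function contains no instructions for the declarations in the function; apart from the prologue and epilogue, every instruction comes from the two function calls. The ‘movl $0, %eax’ before each call belongs to the calling convention: a function called without a prototype might be variadic, so GCC sets %al to the number of vector registers used for arguments, here 0. Notice also that I annotated each assembly instruction at its far right.

Now let’s see what happens if the variables are assigned values. We modify the above C function to set up another interesting experiment: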

void lvar_funpt()
{
long a1, a2, a3, a4, a5, a6,
a7, a8, a9, a10;

int *pi1, *pi2, *pi3, *pi4, *pi5,
*pi6, *pi7, *pi8, *pi9, *pi10;

int *fun_ret_ptr(); /* declarations without parameter information */
int fun_ret_int();

/* assignment */
a1 = 10; a2 = 20; a3 = 30; a4 = 40; a5 = 50;

pi1 = (int *)100;
pi2 = (int *)200;

fun_ret_ptr();
fun_ret_int();
}
The x86-64 implementation of the modified C function:

lvar_funpt:
pushq %rbp # save the caller's base pointer
movq %rsp, %rbp # copy stack pointer to base pointer
subq $64, %rsp # allocate 64 bytes of stack
movq $10, -8(%rbp) # copied a1
movq $20, -16(%rbp) # copied a2
movq $30, -24(%rbp) # copied a3
movq $40, -32(%rbp) # copied a4
movq $50, -40(%rbp) # copied a5
movq $100, -48(%rbp) # copied pi1
movq $200, -56(%rbp) # copied pi2
movl $0, %eax
call fun_ret_ptr
movl $0, %eax
call fun_ret_int
leave
ret
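So, what do you observe now? There are assembly instructions for the variables and pointers you assigned values to: each assignment becomes a ‘movq’ to its slot in the stack frame. The same would happen if the variables had been initialized in their declarations, because initializing an automatic variable is also done by instructions executed at run time. Notice also that this time the prologue reserves 64 bytes with ‘subq’: because ‘lvar_funpt()’ calls other functions, it cannot keep its variables in the red zone below ‘%rsp’, where a call would overwrite them.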


06 Define Different Segments of a C Program’s Address Space
Question: Define Different Segments of a C Program’s Address Space
Answer: Generally, every program we write contains instructions, initialized and uninitialized data, variables local to functions, and requests for dynamically allocated storage. Once the program is written, we compile it; by default the output is ‘a.out’, an Executable and Linkable Format (ELF) file stored on secondary storage. This binary file is ready to execute and is organised into several segments. To see which segments it contains and the size of each, run the ‘size’ command on it. For example, for a program segments.c:

gcc segments.c
size ./a.out
The output looks something like this:

text data bss dec hex filename
1283 496 16 1795 703 ./a.out
The ‘dec’ column gives the total size of the three segments in decimal, and ‘hex’ gives the same value in hexadecimal: 1283 + 496 + 16 = 1795 = 0x703. Now let’s explore the segments individually. The text segment holds the executable instructions; the data segment holds initialized data, namely global and static variables with initializers. And where does uninitialized data go? The BSS (Block Started by Symbol) segment does not actually contain the uninitialized data in the file, only a note of its size.

The stack segment grows dynamically on demand; it holds the local variables of functions, return addresses and saved register values. Memory is allocated from the heap when malloc() is called.
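As a sketch of where each kind of object ends up, the hypothetical file below puts one object in each region; compiling it with gcc and running ‘size’ on the object file shows the effect on the text, data and bss columns. The comments describe gcc’s usual placement on Linux rather than a rule of the C language; the language itself only guarantees that objects like uninitialized_global start out as zero.

```c
#include <assert.h>
#include <stdlib.h>

int initialized_global = 42;   /* data segment: initial value stored in the file */
int uninitialized_global;      /* bss: only its size is recorded; zero at startup */

/* The machine code of this function goes into the text segment.
   heap_roundtrip() is a hypothetical name used for illustration. */
int heap_roundtrip(int value)
{
    int local = value;             /* automatic variable: in this call's stack frame */
    int *p = malloc(sizeof *p);    /* anonymous heap block, reachable only via p */
    int result;

    if (p == NULL)
        return -1;
    *p = local;
    result = *p;
    free(p);                       /* give the block back to the heap */
    return result;
}
```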

You might wonder why the ‘a.out’ file is organised into segments. This organisation makes it convenient for the loader to load the images of the segments into the process’s address space. The loader first loads the text segment and need not worry about it afterwards, since this segment typically changes neither in size nor in content. Then it copies the image of the data segment into the process’s address space. To allocate the BSS segment, the loader takes the size recorded in the executable and obtains a zero-filled block of that size for the uninitialized data. At this point the BSS and data segments are usually referred to jointly as the data segment: in OS memory-management terms a segment is simply a range of virtual addresses, so the two adjacent ranges coalesce. The data segment is typically the largest segment.

We still need memory for local variables, temporaries and parameter passing in procedures. A stack segment is allocated for this purpose; it is created at run time. Let’s now look at how C organises this runtime data structure.

The stack segment contains a single stack data structure. As we know from past programming experience, a stack is a dynamic area of memory that works as a last-in, first-out structure. The only valid operations are pushing a value onto the top of the stack and popping a value off the top; a push grows the stack and a pop shrinks it. A function can access variables local to its caller only indirectly, through parameters or global pointers. The runtime maintains a pointer, usually held in a register called sp (the stack pointer; ‘%rsp’ on x86-64), which indicates the current top of the stack. The stack is used for three main purposes:

1. It provides storage space for local variables declared inside a function. These are called ‘automatic variables’ in C programming.

2. The stack stores “housekeeping” information when a function is called. This housekeeping information, known as the “stack frame”, includes the address to return to when the called function finishes, parameters that don’t fit into registers, and saved register values.

3. The stack serves as a “scratch-pad area”: whenever the program needs temporary storage, for example while computing a long arithmetic expression, it can push partial results onto the stack and pop them off when needed.

Just as the stack grows dynamically on demand, the data segment contains an object that does the same, called the “heap”. The heap provides storage for memory allocated dynamically by calling malloc(); the allocated block is accessed through a pointer. Everything in the heap is anonymous: we can’t access it directly by name, only indirectly through a pointer. malloc() and its friends, calloc() and realloc(), are library calls that obtain memory from the heap. calloc() clears the block to 0 before returning a pointer to it, while realloc() changes the size of an existing block, growing or shrinking it; often it has to copy the contents somewhere else, in which case it returns a pointer to the new location.
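The following sketch uses calloc() and realloc() together; make_and_grow() is a hypothetical helper. It keeps the old pointer until realloc() succeeds, because on failure realloc() returns NULL and leaves the original block allocated, and it zero-fills the grown tail itself, since realloc(), unlike calloc(), does not clear new memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocates n_old zeroed ints, sets a[i] = i, then grows the array to
   n_new ints with a zero-filled tail. Returns NULL on failure (any
   block already obtained is freed). The caller must free() the result. */
int *make_and_grow(size_t n_old, size_t n_new)
{
    int *a, *bigger;

    assert(n_old > 0 && n_new > 0);      /* avoid zero-size allocations */
    a = calloc(n_old, sizeof *a);        /* block cleared to 0 */
    if (a == NULL)
        return NULL;
    for (size_t i = 0; i < n_old; i++)
        a[i] += (int)i;                  /* a[i] was 0, now i */

    bigger = realloc(a, n_new * sizeof *a);
    if (bigger == NULL) {                /* old block is still valid here */
        free(a);
        return NULL;
    }
    if (n_new > n_old)                   /* realloc leaves new bytes unset */
        memset(bigger + n_old, 0, (n_new - n_old) * sizeof *bigger);
    return bigger;
}
```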

Since heap memory is obtained and released by explicit calls to malloc() and free(), blocks don’t have to be returned in the order they were acquired, or returned at all. Allocating and freeing in arbitrary order causes heap fragmentation. The heap must keep track of its different regions and whether each is in use or available to malloc(). One scheme is to keep a linked list of available blocks (the free store), with each block handed out by malloc() preceded by a size count that goes with it.
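The free-list scheme can be sketched as a toy first-fit allocator over a fixed array, in the style of the classic K&R malloc(). toy_alloc() and toy_free() are hypothetical names; the sketch assumes a single thread, does not coalesce neighbouring free blocks (so it fragments exactly as described above), and only guarantees the alignment of its header type.

```c
#include <assert.h>
#include <stddef.h>

/* Block header: the size count stored just before the memory handed out. */
typedef struct header {
    struct header *next;   /* next block on the free list (free blocks only) */
    size_t units;          /* block size in header-sized units, header included */
} header;

#define ARENA_UNITS 256
static header arena[ARENA_UNITS];     /* the toy "heap" */
static header *free_list = NULL;      /* linked list of available blocks */
static int arena_ready = 0;

void *toy_alloc(size_t nbytes)
{
    header **link;
    size_t need;

    if (nbytes == 0)
        return NULL;
    /* round up to whole units, plus one unit for the header */
    need = (nbytes + sizeof(header) - 1) / sizeof(header) + 1;

    if (!arena_ready) {                /* start with one big free block */
        arena[0].next = NULL;
        arena[0].units = ARENA_UNITS;
        free_list = &arena[0];
        arena_ready = 1;
    }

    /* first fit: walk the free list until a block is big enough */
    for (link = &free_list; *link != NULL; link = &(*link)->next) {
        header *b = *link;
        if (b->units == need) {        /* exact fit: unlink the whole block */
            *link = b->next;
            return b + 1;              /* memory starts right after the header */
        }
        if (b->units > need) {         /* split: hand out the tail end */
            header *tail;
            b->units -= need;
            tail = b + b->units;
            tail->units = need;
            return tail + 1;
        }
    }
    return NULL;                       /* no free block is large enough */
}

void toy_free(void *p)
{
    header *b;

    if (p == NULL)
        return;
    b = (header *)p - 1;               /* step back to the size header */
    assert(b >= arena && b < arena + ARENA_UNITS);
    b->next = free_list;               /* push onto the free list; no coalescing */
    free_list = b;
}
```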

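The end of the heap is marked by a pointer called the “break” (‘brk’). Accessing memory beyond the break is an error: if the address falls in an unmapped page, the program is killed with a segmentation fault. To obtain more memory, malloc() moves the break upward with the brk() system call or its library wrapper sbrk(); glibc’s malloc() also uses mmap() for large requests.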
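On Linux the break can be inspected and moved directly with sbrk(). The hypothetical function below is a Linux/glibc-specific sketch: it grows the break by an assumed 4096-byte step, writes to the new memory, and moves the break back. sbrk() is a legacy interface, and real programs should leave the break to malloc().

```c
#define _DEFAULT_SOURCE        /* expose sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Returns the sum of the two bytes written into the new memory,
   or -1 if the break could not be read or moved. */
int grow_and_shrink_break(void)
{
    const intptr_t step = 4096;      /* assumed increment, one typical page */
    char *old_brk = sbrk(0);         /* sbrk(0) reports the current break */
    char *block;
    int value;

    if (old_brk == (void *)-1)
        return -1;
    block = sbrk(step);              /* on success returns the previous break */
    if (block == (void *)-1)
        return -1;
    block[0] = 'x';                  /* memory below the new break is usable */
    block[step - 1] = 'y';
    value = block[0] + block[step - 1];
    sbrk(-step);                     /* move the break back down */
    return value;
}
```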

Notice that the lowest part of the virtual address space is left unmapped: it lies within the process’s address space but is not mapped to any physical memory, so any reference to it is illegal. This region covers only a few kilobytes from address 0 upward, but it catches dereferences of null pointers and of pointers holding small integer values.
07 Is this True that all Variables Declared to be Registers in a C Program Allocated Necessarily in Register Memory
Question: Is it True that all the Variables Declared register in a C Program are Necessarily Allocated in Registers
Answer: By now we are familiar with the general-purpose register set of x86-64. Six of these registers are used to pass integer and pointer arguments to a procedure:

%rdi
%rsi
%rdx
%rcx
%r8
%r9
When one function calls another with up to six integer or pointer arguments, the arguments are placed in the registers above in order: the first in %rdi, the second in %rsi, and so on. Floating-point arguments are passed separately, in the SSE registers %xmm0 to %xmm7.

So, what happens if a function is called with more than six arguments and the called function declares them register? Let’s experiment: we write a simple C program for the issue at hand and then generate its assembly code.
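To see integer and floating-point arguments take separate register sequences, consider the hypothetical function below. Under the System V AMD64 calling convention all eight of its arguments travel in registers, even though there are more than six, because the two doubles use %xmm0 and %xmm1 rather than integer registers; compile it with ‘gcc -S’ to check on your system.

```c
#include <assert.h>

/* a..f: %rdi, %rsi, %rdx, %rcx, %r8, %r9
   x, y: %xmm0, %xmm1 (a separate sequence for floating point) */
double mixed_args(long a, long b, long c, long d, long e, long f,
                  double x, double y)
{
    return (double)(a + b + c + d + e + f) + x * y;
}
```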

#include <stdio.h>

void register_args(int, long, long, int, int *, long *,
register int, register char *);


int main(void)
{
int a1 = 10, d4 = 20, e7 = 30;
int *pi5 = &a1;
long b2 = 40, c3 = 50;
long *pl6 = &b2;
char f9 = 'A';
char *pc8 = &f9;

register_args(a1, b2, c3, d4, pi5, pl6, e7, pc8);
return 0;
}

void register_args(int a1, long b2, long c3, int d4, int *pi5,
long *pl6, register int e7, register char *pc8)
{
/* do here manipulations */
e7 += 1;
*pc8 = 'B';

}
Notice that in the above program main() calls ‘register_args()’ with 8 arguments, and arguments 7 and 8 are declared register in the called function. Let’s look at the assembly output to see what happened:

register_args:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movq %rdx, -24(%rbp)
movl %ecx, -8(%rbp)
movq %r8, -32(%rbp)
movq %r9, -40(%rbp)
movq 24(%rbp), %rax
movb $66, (%rax)
popq %rbp
ret

So, what do you observe? The assembly makes it clear. As expected, the first six arguments arrived in the six argument registers, and the function copied them to the stack. The remaining two, ‘e7’ and ‘pc8’, were not given registers even though they were declared register: the caller passed them on the stack, where the callee finds them at 16(%rbp) and 24(%rbp), and the callee loads ‘pc8’ from 24(%rbp) into ‘%rax’ only to store 'B' (66) through it. The calling convention fixes how arguments are passed, so a register qualifier on a parameter cannot change it: beyond the first six integer and pointer arguments, the rest go on the stack.
08 Which Variable Should be Declared as Registers in a C Program
Question: Which Variable Should be Declared as Registers in a C Program
Answer: Recall that x86-64 allows up to six integer or pointer arguments to be passed to a procedure in registers. Keeping values in registers avoids fetching them from memory, and so reduces the overhead of reading from and writing back to memory. Because so many registers are available, much of the program state is held in registers, and some procedures don’t need to access the stack at all.

Since registers avoid the overhead of reading from and writing back to memory, the variables worth declaring register are the ones used most frequently, such as loop counters and accumulators. Keep in mind that register is only a hint: the compiler may ignore it, and optimizing compilers make their own register allocation decisions. Also, C does not allow taking the address of a register variable.
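A typical candidate is a loop counter or accumulator, as in the hypothetical sum_array() below. Treat the register keyword here as a statement of intent rather than a guarantee; writing &i or &sum in this function would be rejected by the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Sums n longs; i and sum are used on every iteration, so they are
   the natural register candidates in this function. */
long sum_array(const long *v, size_t n)
{
    register size_t i;
    register long sum = 0;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```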


09 What is Difference Between Frame Pointer and Stack Pointer
Question: What is Difference Between Frame Pointer and Stack Pointer
Answer: Consider fragment of C code below,

long int simple_long(long int *xp, long int y)
{
long int t = *xp + y;
*xp = t;

return t;
}
When gcc is run on an x86-64 machine with the command line

gcc -osl32 -S -m32 slong.c
it generates code that is compatible with any IA32 machine:

simple_long:
pushl %ebp # save frame pointer
movl %esp, %ebp # create new frame pointer
subl $16, %esp # allocate 16 bytes of stack by subtracting from '%esp'
movl 8(%ebp), %eax # xp copied into '%eax'
movl (%eax), %eax # retrieve *xp
addl 12(%ebp), %eax # add y to get t
movl %eax, -4(%ebp) # copy t onto stack
movl 8(%ebp), %eax # xp copied into '%eax'
movl -4(%ebp), %edx # t copied to '%edx'
movl %edx, (%eax) # '%edx' copied to *xp
movl -4(%ebp), %eax # t copied to '%eax' as the return value
leave # stack deallocated; previous '%ebp' restored
ret # function returns
Notice in the assembly code above that the stack frame lies between the frame pointer (%ebp) and the stack pointer (%esp), and all locations are accessed relative to the frame pointer, which is also called the base pointer. Because IA32 passes arguments on the stack, xp and y are read from 8(%ebp) and 12(%ebp), above the saved %ebp and the return address. Notice also that each assembly instruction is annotated to its right.

Let’s now generate assembly for the same C code for x86-64 with the command line:

gcc -osl64 -S -m64 slong.c
Below is the assembly code; notice that xp arrives in ‘%rdi’ and y in ‘%rsi’:

simple_long:
pushq %rbp # save old frame pointer
movq %rsp, %rbp # create new frame pointer
movq %rdi, -24(%rbp) # copy xp
movq %rsi, -32(%rbp) # copy y
movq -24(%rbp), %rax # xp copied to '%rax'
movq (%rax), %rax # retrieve *xp
addq -32(%rbp), %rax # add y to *xp, obtaining t
movq %rax, -8(%rbp) # copy t
movq -24(%rbp), %rax # xp copied to '%rax'
movq -8(%rbp), %rdx # t copied to '%rdx'
movq %rdx, (%rax) # t copied to *xp
movq -8(%rbp), %rax # t copied to '%rax' as the return value
popq %rbp # restore old frame pointer
ret # function returns
Notice that in the x86-64 code the locations are addressed relative to ‘%rbp’, which at this point holds the same value as ‘%rsp’, so they are effectively relative to the stack pointer. No stack space is allocated by adjusting ‘%rsp’: the values of variables and partial results are stored in the area called the “red zone”. x86-64 allows a program to access up to 128 bytes below the current value of the stack pointer without incrementing or decrementing it.

In short, the frame pointer marks the base of the current stack frame, while the stack pointer points to the top of the stack. On x86, as on most processors, the stack grows downwards, i.e. towards lower memory addresses, so the top of the stack is its lowest address.

10 How Stack Frame is Organised and How Protocols for Calling and Returning from Functions are Interpreted
Question: How Stack Frame is Organised and How Protocols for Calling and Returning from Functions are Interpreted
Answer: x86-64 uses the stack differently from IA32. Because up to six arguments can be passed to a procedure in registers, some procedures do not need the stack at all. A stack frame becomes necessary when a procedure receives more than six arguments and/or has local variables of its own that cannot stay in registers. Even then, x86-64 allows any function to access up to 128 bytes below the current value of the stack pointer ‘%rsp’, addressing them relative to ‘%rsp’. This area is called the ‘red zone’; the ABI guarantees that signal and interrupt handlers will not modify it.

Now, what happens if the called function has large local arrays or structures? How is the larger memory requirement met? In that case a stack frame is allocated with the required amount of memory, which is computed at compile time; the stack pointer is then kept fixed, pointing to the top of the frame, and all locations in the frame are accessed relative to it. Unlike IA32 code, x86-64 code does not need a frame pointer: the ABI makes it optional, and GCC omits it when optimizing and uses ‘%rbp’ as an ordinary callee-saved register. The unoptimized listings in this FAQ still set up ‘%rbp’ as a frame pointer.

Recall that because the register set is doubled, programs depend less on the stack for storing and retrieving procedure information. This can greatly reduce the overhead of procedure calls and returns.
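As a sketch of a frame too large for the red zone, the hypothetical leaf function below keeps 512 bytes of locals. Compiled with ‘gcc -S’ and no optimization, its prologue should adjust ‘%rsp’ with a ‘subq’; the exact amount depends on the compiler version, and with optimization the array may be removed altogether.

```c
#include <assert.h>

/* 64 longs = 512 bytes of locals: more than the 128-byte red zone. */
long big_local_sum(long seed)
{
    long buf[64];
    long total = 0;

    for (int k = 0; k < 64; k++)
        buf[k] = seed + k;
    for (int k = 0; k < 64; k++)
        total += buf[k];
    return total;                /* 64 * seed + (0 + 1 + ... + 63) */
}
```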

Here are some of the highlights of how procedures are implemented with x86-64:

1. Arguments (up to the first six) are passed to procedures via registers, rather than on the stack. This eliminates the overhead of storing and retrieving values on the stack.

2. The call instruction stores a 64-bit return pointer on the stack.

3. Many functions do not require a stack frame. Only functions that cannot keep all local variables in registers need to allocate space on the stack.

4. Functions can access storage on the stack up to 128 bytes beyond (i.e., at a lower address than) the current value of the stack pointer. This allows some functions to store information on the stack without incrementing or decrementing the stack pointer.

5. Optimized code typically uses no frame pointer. Instead, references to stack locations are made relative to the stack pointer. Typical functions allocate their total stack storage needs at the beginning of the call and keep the stack pointer at a fixed position.

6. As with IA32, some registers are designated as callee-save registers (%rbx, %rbp and %r12 to %r15). A function that uses them must save and restore them.

11 What Happens During Function Epilogue Phase
Question: What Happens During Function Epilogue Phase?
Answer: As we know, a function has three components: the prologue, the body and the epilogue. The prologue creates space for local variables, saved registers and parameters being passed to other procedures. The body is where the useful work is performed. When the body finishes, the epilogue restores the saved registers and the previous frame pointer, and restores the stack pointer, before the function returns to where it was called from.

As we have seen, x86-64 code doesn’t need a frame pointer: optimized code accesses all stack locations relative to the stack pointer. Moreover, up to six arguments can be passed to a procedure in registers, so programs depend less on the stack for storing and retrieving procedure information. This can greatly reduce the overhead of procedure calls and returns.

Therefore, when a stack frame has been created, the only useful work done during the epilogue is to restore any callee-saved registers that were modified, restore the stack pointer, and pop the return address. Recall that on x86-64, a callee-saved register is one whose contents must be saved on the stack before it is used to hold temporaries and restored when the function is about to return. When no stack frame was created, the epilogue only has to pop the return address that the caller’s ‘call’ instruction pushed onto the stack; the ‘ret’ instruction does this and transfers control back to the calling function.
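The call/return protocol can be modelled with a toy simulated stack: an array of 8-byte slots whose index stands in for an address. Everything below is hypothetical and simplified (only %rsp and %rbp are modelled, with no callee-saved registers); it only shows that the prologue (pushq %rbp; movq %rsp, %rbp; subq) and the epilogue (leave, which is movq %rbp, %rsp followed by popq %rbp, then ret) undo each other exactly.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 64

/* Simulated machine: the stack grows toward index 0, just as the real
   stack grows toward lower addresses. */
typedef struct {
    uint64_t mem[SLOTS];
    int rsp;                 /* index of the current top-of-stack slot */
    int rbp;                 /* frame pointer: index of the saved %rbp */
} machine;

static void push(machine *m, uint64_t v) { m->mem[--m->rsp] = v; }
static uint64_t pop(machine *m) { return m->mem[m->rsp++]; }

/* call: the caller's call instruction pushes the return address */
void sim_call(machine *m, uint64_t return_addr)
{
    push(m, return_addr);
}

/* prologue: pushq %rbp; movq %rsp, %rbp; subq $8*local_slots, %rsp */
void sim_prologue(machine *m, int local_slots)
{
    push(m, (uint64_t)m->rbp);
    m->rbp = m->rsp;
    m->rsp -= local_slots;
}

/* epilogue: leave (movq %rbp, %rsp; popq %rbp), then ret */
uint64_t sim_epilogue(machine *m)
{
    m->rsp = m->rbp;             /* discard the locals */
    m->rbp = (int)pop(m);        /* restore the caller's frame pointer */
    return pop(m);               /* ret: pop the address to jump back to */
}
```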
