Linked List in Assembly
A while ago I took a course where I learned the (Intel) processor architecture and coding in Assembly Language. If you could get past the lackluster presentation and a certain lack of motivation from the teacher, the stuff presented was pretty impressive.
It was fun seeing C code that was usually taken for granted be deconstructed in such a low level yet logical manner. Later I wrote a small program that took advantage of existing C libraries and the OS to do something that, despite being better suited for higher level languages, was nice to tinker with. I will present it in today’s post.
The program is a set of procedures and data structures that represent a linked list, implementing a queue actually. The source code can be found in my Github corner. I wrote it in Linux, but forced the assembler (nasm) to compile Intel syntax, which I am more accustomed with.
First of all, the well known standard library functions used are:
- malloc
- free
- putchar
- puts
We have to declare them for the linker. Then, any self-respecting program has a data segment reserved, where I define the structure for the list node and several strings for messages. Finally, the .bss section is for our uninitialized variables, in our case the pointer to the NULL list head, because the list is empty at first.
extern malloc extern free extern putchar extern puts section .data size_i: ; Used to determine the size of the structure struc node info: resd 1 next: resd 1 endstruc len: equ $ - size_i ; Size of the data type nullMes: db 'Null pointer - the list is empty', 0 addMes: db 'Adding a new element', 0 printMes: db 'Printing a linked list:', 0 cleanMes: db 'Cleaning an element', 0 doneCleanMes: db 'Ready for cleaning...', 0 emptyListMes: db '- empty list -', 0 section .bss prim: resd 1
Several functions are required to define the behavior of our queue. They are:
- append: add an element at the end of the queue. If the queue is empty, it will be initialized
- pop: remove the element at the tail of the queue
- cleanup: free all resources and destroy the queue
- print: nicely print the elements to the screen. The data type that it holds is the ASCII character, but it can be extended to anything
Here is the code, in that order. First the append procedure:
; Procedure: append ; Appends an element at the end of a linked list ; If the linked list is empty, initialize the list ; Params (in order of pushing on the stack): ; dword element - data to be added ; dword prim - first element in the list ; Return: none ; Modifies the value of prim if it is null append: push ebp ; Save the stack mov ebp, esp push eax ; Save the registers push ebx push len ; Size to get from the heap and pass the size to the malloc function call malloc ; Call the malloc function - now eax has the address of the allocated memory mov ebx, [ebp + 12] mov [eax + info], ebx ; Add the element to the node data field mov dword [eax + next], 0 ; Address of the next element is NULL, because it is the last element in the list mov ebx, [ebp + 8] ; Retrieve the address to the first element cmp dword [ebx], 0 je null_pointer mov ebx, [ebx] ; This parameter was the address of the address ; Now it is the address of the first element, in this case, not null ; If it is not NULL, find the address of the last element next_element: cmp dword [ebx + next], 0 je found_last mov ebx, [ebx + next] jmp next_element found_last: push eax push addMes call puts add esp, 4 ; Restore the stack pop eax mov [ebx + next], eax ; Last element is this one from the newly allocated memory block go_out: pop ebx ; Restore registers pop eax mov esp, ebp pop ebp ret 8 ; Return to the caller function and cleaning the stack null_pointer: push eax push nullMes call puts add esp, 4 pop eax mov [ebx], eax ; Point the address of the first element to the allocated memory jmp go_out
Yup, “noodles” is the first thing that comes to mind when trying to analyze the code. In a nutshell, the code above retrieves one of the parameters from the stack – the data to add in the new node, then memory is allocated for the new node and its pointer placed in the EAX register. The node structure is filled with the required data and pointer to next element – NULL – as it will be the last element. Then a loop through all elements until the last one is reached, that will be linked to the newly created node through its “next” field. A nice message is printed when an element is added, or if something horrible happens.
Next on our menu is pop:
; Procedure: pop ; Retrieves the first element in the list in eax and deletes it. ; The list acts like a queue ; Params: ; - address to the address of the first element ; Return: eax = the element removed pop: push ebp mov ebp, esp push ebx push edx push ecx mov ebx, [ebp + 8] ; Address of the address of the first element mov eax, [ebx] ; Address of the first element cmp eax, 0 je donePop ; Don't do anything if the list is empty mov edx, eax mov eax, [eax + info] ; The information stored in the element mov ecx, [edx + next] mov [ebx], ecx ; The address to the first element changed to point to the next one pusha ; This is suboptimal but I don't know what registers are messed up by free (I cannot believe they coded it this way) push edx ; Free the allocated resources call free add esp, 4 popa donePop: pop ecx pop edx ; Done - restore the registers and return to the calling procedure pop ebx mov esp, ebp pop ebp ret 4
This one will store the data in the first element in EAX as a return value. Then the address to the first element is changed to its successor. Memory is freed and everyone is happy. “cleanup” is next in line:
; Procedure: cleanup ; Deallocate the memory used by a linked list ; This function is recursive ; Params: ; - address of the list node to free ([ebp + 8]) ; Return: none cleanup: push ebp ; Save the stack pointer mov ebp, esp push eax ; Save the working registers mov eax, [ebp + 8] ; Retrieve the parameters cmp eax, 0 ; If the address is NULL, then it is past the end of the list je doneCleaning ; No more recursive calls; print an appropriate message push dword [eax + next] ; Push the address of the next element as parameter for the next call to this procedure call cleanup pusha ; Print a message that cleaning is underway push cleanMes call puts add esp, 4 popa push eax ; Push the address of the current element as parameter for the free function call free add esp, 4 ; Aaargh I hate these unpredictable stdlib procedures! doneAll: ; Prepare to exit the procedure pop eax mov esp, ebp pop ebp ret 4 doneCleaning: ; Print a message that the last element was passed to the procedure pusha push doneCleanMes call puts add esp, 4 popa jmp doneAll
This is a recursive function. Memory is deallocated starting with the last element. Finally, to print to stdout use:
; Procedure: print ; Prints the elements of a linked list ; The elements are considered ASCII characters, otherwise things might blow in your face with various degrees of terribleness ; Params (in order of pushing on the stack): ; dword prim - address of the first element ; Return: none print: push ebp mov ebp, esp push ebx mov ebx, [ebp + 8] ; Address of the first element cmp ebx, 0 je emptyList push eax push printMes ; Print message "Printing a linked list" call puts add esp, 4 pop eax next_char: push dword [ebx + info] call putchar mov ebx, [ebx + next] cmp ebx, 0 jne next_char push dword 10 call putchar done: pop ebx mov esp, ebp pop ebp ret 4 emptyList: pusha push emptyListMes call puts add esp, 4 popa jmp done
Just some more pasta that iterates sequentially through the list to display each character in its glory.
Now lets write the main procedure. Here is where the .text segment is also declared, so all other procedures should follow this code snippet in the source file. The main function just appends elements in different, more or less catastrophic ways, prints the list, pops it, destroys it, etc.
section .text global main main: push ebp ; Standard procedure entry mov ebp, esp mov dword [prim], 0 ; Pointer is NULL initially - the list is empty push dword 'a' ; Element to add to the list push prim ; Address of the first element in the list and pass it to the init function, through the stack call append ; Call the init function push dword 'b' push prim call append push dword 'c' push prim call append mov ecx, 1 zzzZZZ: push ecx ; I don't know why or where ecx is changed, but if I don't save it, the loop breaks and becomes infinite push dword 'Z' push prim call append pop ecx loop zzzZZZ push dword [prim] ; Send the list to printing to the stdout call print push prim ; Remove the first element: a call pop push dword [prim] call print push prim ; Remove the first element: b call pop push dword [prim] call print push prim ; Remove the first element: c call pop push dword [prim] call print push prim ; Remove the first element: Z call pop push dword [prim] call print push prim ; Remove the first element: nil call pop push dword [prim] call print push dword [prim] ; Deallocate memory call cleanup mov esp, ebp ; Restore the stack pop ebp ret
In Linux, you can use this nifty script to assemble the program and start messing with it:
# Test if there is a filename argument
if test $# -eq 0;
then
echo "Usage: asm filename"
exit 0
fi
# If there is a file to be assembled
nasm -f elf $1
name=$(echo $1 | cut -d. -f1)
gcc -o $name $name.o
rm $name.o

Awesome work, it really helped me out!
I also wanted to ask, if I want to change it so it will accept a string or 2 digits, what changes will i need to do?
Thanks,
alex
Alex, do you mean the same thing that like is in my question??
Hi alex,
Sorry I’m a little late with the reply. Anyway, this code pretty much assumes it runs on a system with 32 bit addresses. Since the “info” field in the structure is also 4 bytes long, you can use it to store pointers to strings. I haven’t tried it, but I don’t see why this wouldn’t work:
push emptyListMes
push prim
call append
Of course, you’d have to change the way you print this data.
For 2 digits of 4 bytes length, you can reserve another 4 bytes in the structure, something like this:
struc node
digit1: resd 1
digit2: resd 1
next: resd 1
endstruc
You’d also have to change the append function to accept an additional parameter. Alternatively, you can just use the “info” field to point to whatever data structure you want, which could store your 2 digits, like in the strings example.
Awesome! But i don’t understand the first element of the list!
the address “prim” have the first element of the list, but if i have a struct like:
struc agenda
name resb 15
number resb 10
type resb 10
endstruc
How can i represent the first element of the list?
Hi Da Costa! I suppose you need to add something like a “next” field that stores the address to the next element. That’s what makes a linked list “linked” :).