Review S: ARM stacks & subroutines
printable versionS1: [1] [2] [3] [4] // S2: [1] // S3: [1] [2] [3] [4] [5] [6] [7] // S4: [1] [2] [3]
Problem S1.1
An ARM assembly language subroutine can be called from multiple locations within a program. From wherever it is called, the subroutine should return back to the instruction following the one that entered the subroutine. How does an ARM subroutine determine where to return after completing its computation? Your answer should describe where it can find the return location and how the value is normally placed there.
To enter a subroutine, ARM code uses the BL
instruction, which places the address of the instruction after
BL
into the “link register,” R14
.
Upon finishing, then, the subroutine knows that it can look into
R14
to find which instruction should be executed
next.
(In practice, the subroutine will often place R14
on
the stack, since R14
would change whenever the subroutine
calls any other subroutines.)
Problem S1.2
Explain the function of the link register (R14
) on the ARM
processor.
When calling a subroutine, we place into the link register
the address of the instruction following the entry in the
subroutine.
This way, the subroutine code can determine the next
instruction to be executed after it is ready to return.
For example, if subroutine A decides that it wishes
to utilize subroutine B, then before A
jumps into B, it first places into R14
the address
of the next instruction to be executed after B completes.
Subroutine B will be written so that once it
completes, it looks into R14
to determine which
instruction to use next. In this way, B can be
written so that it will be able to work regardless of which
subroutine uses B.
Problem S1.3
Explain how an ARM CPU changes its registers when it executes a BL
instruction to enter the subroutine starting at the address 0x560.
It copies the current value of R15
(the program
counter) into R14
(the link register),
and it places 0x560
into R15
so that the next
instruction fetched for execution will be the one at address
0x560
.
Problem S1.4
In the following ARM assembly language subroutine using the
ARM Procedure Call Standard, we take a value n
found in R0
and raise it to the tenth power
(n10), placing the result back
into R0
. The subroutine accomplishes this by calling a
subroutine pow
, which takes the value in R0
and
raises it to the power found in R1
, placing the value of
ab into R0
.
toTenth MOV R1, #10
BL pow
MOV PC, LR
Even though pow
abides by the ARM Procedure Call
Standard and works correctly,
we find that this toTenth
subroutine never returns.
Explain why precisely this happens, and describe
what you would do to repair it (without necessarily resorting to the
precise ARM assembly code you would add or change).
When we execute “BL pow
”, the address of the
following instruction (“MOV PC, LR
”) is placed
into the link register. Once pow
returns, the link
register still contains the address of that instruction, so
when we execute “MOV PC, LR
”, we are simply
resetting PC
to point again to that same MOV
instruction, and it will end up repeating that single instruction
indefinitely.
To repair it, we would need to store LR
somehow
(probably on the program stack) before entering the pow
subroutine. In returning, you would need to ensure that this
stored value of LR
is placed into PC
.
Problem S2.1
Write an ARM assembly language instruction(s) to push
R4
onto the program stack referenced by SP
.
Both STR R4, [SP, #-4]!
and STMDB SP!, { R4 }
are good solutions.
Problem S3.1
Distinguish between callee-save registers (R4
through
R11
in the ARM Procedure Call Standard) and caller-save registers (R0
through R3
in the ARM Procedure Call Standard).
If a subroutine uses callee-save registers, then it must ensure that their value upon returning is the same as their value when the subroutine begins. With caller-save registers, though, a subroutine is allowed to change them without restoring their initial values. (If the caller needs their values, it should save those registers' values itself.)
Problem S3.2
Below is a C function and a first attempt at translating it into
equivalent ARM subroutine.
However, while the ARM code assembles successfully, execution of
it reveals that it regularly gets stuck within fact
in
what is apparently an infinite loop. Explain why, and explain
how you can repair the ARM code so that it returns correctly
while preserving the recursive nature of the original C
function.
int fact(int n) { | fact CMP R0, #0 ; if argument n is 0, return 1 |
The BL fact
instruction modifies the link register
to reference the MUL
instruction, so that once the
recursive call completes it will return back to MUL
to
complete the original call. However, the original call depends
on the original value of the link register for its final
instruction, MOV PC, LR
; with the link register changed
to hold the MUL
instruction's address, it will
indefinitely loop back to this instruction instead.
The solution is to push the link register's value onto the
stack by inserting an instruction such as
“STMDB SP!, {LR}
” following the second
MOVEQ
instruction.
We'd then replace the “MOV PC, LR
”
instruction with something thta pops the pushed value back into
PC
: “LDMIA SP!, {PC}
”.
(Another problem — though not a correct answer to the
question — is that the recursive call will obliterate
the saved value of R3
. The solution there is to push it
onto the stack prior to the call and then to restore it.
This answer is not correct, because repairing this problem doesn't
remove the infinite-loop problem
mentioned in the question. Admittedly, though,
the above repair relating to LR
will lead to the function returning with the wrong value
without this issue also addressed.)
Problem S3.3
Using the ARM Procedure Call Standard, write a subroutine
max
that takes two parameters a and b and
returns the larger of the two.
max CMP R0, R1
MOVLT R0, R1
MOV PC, LR
Problem S3.4
Suppose a subroutine gcd
, using the ARM Procedure Call Standard,
already exists elsewhere in a program
to compute the greatest common denominator of two integer parameters.
We want to use this to compose a subroutine gcd3
that computes the greatest common denominator of \emph{three}
integer parameters.
Describe in English what goes wrong with the following solution, and
show how to repair the assembly language so that it works
correctly.
gcd3 BL gcd ; R0 = gcd(param0, param1)
MOV R1, R2
BL gcd ; R0 = gcd(R0, param2)
MOV PC, LR ; return R0
In entering gcd3
,
R2
and LR
will have values that the subroutine
later uses: The second line needs R2
, while the last
one needs LR
. However, the first BL
instruction will lead to both values being lost:
The BL
instruction will itself overwrite LR
as it enters gcd
to hold the address of gcd3
's
second instruction (so gcd
knows where to resume),
while the gcd
subroutine is allowed to change all
caller-save registers, including R2
.
Consequently, the second and fourth lines will use the wrong
values for R2
and LR
.
The solution is to push R4
and LR
onto the
stack upon entering the subroutine, using the callee-save
register R4
to remember the third parameter over the
subroutine call, and then restoring R4
and LR
just before returning.
gcd3 STMDB SP!, {R4, LR}
MOV R4, R2
BL gcd ; R0 = gcd(param0, param1)
MOV R1, R4
BL gcd ; R0 = gcd(R0, param2)
LDMIA SP!, {R4, LR}
MOV PC, LR ; return R0
Problem S3.5
Suppose we have in another file written a subroutine labeled by
remain
that given two parameters k and
n
returns the remainder when dividing n by k. This
subroutine uses the ARM Procedure Call Standard for parameter passing
and register usage.
Write an ARM program fragment that uses this remain
subroutine to add all the factors of R4
and place the
sum into R5
. For example, if R4
holds 12
before your fragment is executed, then after it is executed
R5
should hold 16, since 1 + 2 + 3 + 4 + 6 = 16.
You may assume that R4
is at least 2.
MOV R5, #0
SUB R6, R4, #1
loop MOV R0, R4
MOV R1, R6
BL remain
TST R0, R0
BNE next
ADD R5, R5, #1
next SUBS R6, R6, #1
BNE loop
Problem S3.6
Assuming we already have an ARM subroutine named sin
,
translate the following C function
into an ARM subroutine following the ARM Procedure Call Standard.
int addsin(int a, int b) {
int fa = sin(a);
int fb = sin(b);
return fa + fb;
}
addsin STMDB SP!, {R4, R5, LR}
MOV R4, R1
BL sin
MOV R5, R0
MOV R0, R4
BL sin
ADD R0, R0, R5
LDMIA SP!, {R4, R5, PC}
Problem S3.7
In the following ARM assembly code using the ARM Procedure Call
Standard, we wish to compute
n10 + n9 + n8 + … + n1,
where n is the value found in R0
upon entering
the addPows
subroutine. However, it doesn't work because
the call to pow
ends up changing the registers R0
through R3
, and this code depends on R1
,
R2
, and R3
remaining unchanged.
What specific ARM assembly code would you insert or change so that this works, while still conforming to the ARM Procedure Call Standard?
addPows MOV R1, #10
MOV R2, #0
MOV R3, R0
again MOV R0, R3
BL pow
ADD R2, R2, R0
SUBS R1, R1, #1
BNE again
MOV R0, R2
MOV PC, LR
We need to stash the values of these registers so that they
can be recovered following completion of the subroutine call.
You can store the register values on the program stack by inserting
“STMDB SP!, {R1-R3}
” preceding the “BL pow
”
instruction,
and you can restore their values by inserting
“LDMIA SP!, {R1-R3}
” following the BL
instruction.
Problem S4.1
Complete the partial fragment below for setting the last value
of a linked list to 0,
where R4
is initially the address of the list's first node.
Each node holds its integer data in the first four bytes, and
in the next four bytes is the address of the node following it in the list.
The last node has “0” marked as the address of the following
node.
loop ; load address of node following R4 into R5; i.e., r5 = r4->next
TST R5, R5
MOVNE R4, R5
BNE loop
MOV R5, #0
; change value in node R4 to be R5; i.e., r4->value = r5
The short solution:
LDR R5, [R4, #4]
and then
STR R5, [R4]
.
The first instruction might be replaced by two lines:
ADD R0, R4, #4
LDR R5, [R0]
Problem S4.2
Complete the partial fragment below for finding the sum of the integer
values of a linked list, where R0
is initially the address
of the list's first node. Each node holds its integer value in the
first four bytes, and in the next four bytes is the address of the
node following it in the list. The list's last node is marked by
having 0 as the address of its following node.
sumList MOV R1, #0 ; R1 accumulates sum of nodes seen
sumLoop ; TODO: R1 += R0->value
sumNext ; TODO: R0 = R0->next
TST R0, R0 ; if R0 isn't 0, repeat loop for following node
BNE sumLoop
; R1 will now hold sum of list's integer values
sumList MOV R1, #0 ; R1 should be sum after completing loop
sumLoop LDR R2, [R0] ; R1 += R0->value
ADD R1, R1, R2
sumNext LDR R0, [R0, #4] ; R0 = R0->next
TST R0, R0 ; if R0 isn't 0, repeat loop for following node
BNE sumLoop
Problem S4.3
Suppose R4
holds the memory address of the first node
in a linked list of integers, where each node holds its integer
data in its first four bytes and the address of the node
following it in the next four bytes. The linked list is
terminated by an address of 0. Write an ARM assembly
fragment that places into R0
how many nodes the list contains.
Your fragment may change any other registers as it pleases.
MOV R0, #0
B start
next LDR R4, [R4, #4]
ADD R0, R0, #1
start CMP R4, #0
BNE next