Review S: ARM stacks & subroutines: Questions

S1.1.

An ARM assembly language subroutine can be called from multiple locations within a program. From wherever it is called, the subroutine should return back to the instruction following the one that entered the subroutine. How does an ARM subroutine determine where to return after completing its computation? Your answer should describe where it can find the return location and how the value is normally placed there.

S1.2.

Explain the function of the link register (R14) on the ARM processor.

S1.3.

Explain how an ARM CPU changes its registers when it executes a BL instruction to enter the subroutine starting at the address 0x560.

S1.4.

In the following ARM assembly language subroutine using the ARM Procedure Call Standard, we take a value n found in R0 and raise it to the tenth power (n10), placing the result back into R0. The subroutine accomplishes this by calling a subroutine pow, which takes the value in R0 and raises it to the power found in R1, placing the value of ab into R0.

toTenth MOV R1#10
        BL pow
        MOV PCLR

Even though pow abides by the ARM Procedure Call Standard and works correctly, we find that this toTenth subroutine never returns. Explain why precisely this happens, and describe what you would do to repair it (without necessarily resorting to the precise ARM assembly code you would add or change).

S2.1.

Write an ARM assembly language instruction(s) to push R4 onto the program stack referenced by SP.

S3.1.

Distinguish between callee-save registers (R4 through R11 in the ARM Procedure Call Standard) and caller-save registers (R0 through R3 in the ARM Procedure Call Standard).

S3.2.

Below is a C function and a first attempt at translating it into equivalent ARM subroutine. However, while the ARM code assembles successfully, execution of it reveals that it regularly gets stuck within fact in what is apparently an infinite loop. Explain why, and explain how you can repair the ARM code so that it returns correctly while preserving the recursive nature of the original C function.

int fact(int n) {
  if (n == 0) {
    return 1;
  } else {
    return n * fact(n - 1);
  }
}
fact    CMP R0#0      ; if argument n is 0, return 1
        MOVEQ R0#1
        MOVEQ PCLR
        MOV R3R0      ; otherwise save argument n into R3
        SUB R0R0#1  ; and perform recursive call on R3 - 1
        BL fact
        MUL R0R3R0  ; multiply returned value by n
        MOV PCLR      ; and return
S3.3.

Using the ARM Procedure Call Standard, write a subroutine max that takes two parameters a and b and returns the larger of the two.

S3.4.

Suppose a subroutine gcd, using the ARM Procedure Call Standard, already exists elsewhere in a program to compute the greatest common denominator of two integer parameters. We want to use this to compose a subroutine gcd3 that computes the greatest common denominator of \emph{three} integer parameters. Describe in English what goes wrong with the following solution, and show how to repair the assembly language so that it works correctly.

gcd3 BL gcd     ; R0 = gcd(param0, param1)
     MOV R1R2
     BL gcd     ; R0 = gcd(R0, param2)
     MOV PCLR ; return R0
S3.5.

Suppose we have in another file written a subroutine labeled by remain that given two parameters k and n returns the remainder when dividing n by k. This subroutine uses the ARM Procedure Call Standard for parameter passing and register usage.

Write an ARM program fragment that uses this remain subroutine to add all the factors of R4 and place the sum into R5. For example, if R4 holds 12 before your fragment is executed, then after it is executed R5 should hold 16, since 1 + 2 + 3 + 4 + 6 = 16. You may assume that R4 is at least 2.

S3.6.

Assuming we already have an ARM subroutine named sin, translate the following C function into an ARM subroutine following the ARM Procedure Call Standard.

int addsin(int aint b) {
  int fa = sin(a);
  int fb = sin(b);
  return fa + fb;
}
S3.7.

In the following ARM assembly code using the ARM Procedure Call Standard, we wish to compute n10 + n9 + n8 + … + n1, where n is the value found in R0 upon entering the addPows subroutine. However, it doesn't work because the call to pow ends up changing the registers R0 through R3, and this code depends on R1, R2, and R3 remaining unchanged.

What specific ARM assembly code would you insert or change so that this works, while still conforming to the ARM Procedure Call Standard?

addPows MOV R1#10
        MOV R2#0
        MOV R3R0
again   MOV R0R3
        BL pow
        ADD R2R2R0
        SUBS R1R1#1
        BNE again
        MOV R0R2
        MOV PCLR
S4.1.

Complete the partial fragment below for setting the last value of a linked list to 0, where R4 is initially the address of the list's first node. Each node holds its integer data in the first four bytes, and in the next four bytes is the address of the node following it in the list. The last node has “0” marked as the address of the following node.

loop   ; load address of node following R4 into R5; i.e., r5 = r4->next

       TST R5R5
       MOVNE R4R5
       BNE loop
       MOV R5#0
       ; change value in node R4 to be R5; i.e., r4->value = r5
S4.2.

Complete the partial fragment below for finding the sum of the integer values of a linked list, where R0 is initially the address of the list's first node. Each node holds its integer value in the first four bytes, and in the next four bytes is the address of the node following it in the list. The list's last node is marked by having 0 as the address of its following node.

sumList   MOV R1#0           ; R1 accumulates sum of nodes seen
sumLoop                        ; TODO: R1 += R0->value


sumNext                        ; TODO: R0 = R0->next


          TST R0R0           ; if R0 isn't 0, repeat loop for following node
          BNE sumLoop
          ; R1 will now hold sum of list's integer values
S4.3.

Suppose R4 holds the memory address of the first node in a linked list of integers, where each node holds its integer data in its first four bytes and the address of the node following it in the next four bytes. The linked list is terminated by an address of 0. Write an ARM assembly fragment that places into R0 how many nodes the list contains. Your fragment may change any other registers as it pleases.

Review S: ARM stacks & subroutines: Solutions

S1.1.

To enter a subroutine, ARM code uses the BL instruction, which places the address of the instruction after BL into the “link register,” R14. Upon finishing, then, the subroutine knows that it can look into R14 to find which instruction should be executed next.

(In practice, the subroutine will often place R14 on the stack, since R14 would change whenever the subroutine calls any other subroutines.)

S1.2.

When calling a subroutine, we place into the link register the address of the instruction following the entry in the subroutine. This way, the subroutine code can determine the next instruction to be executed after it is ready to return. For example, if subroutine A decides that it wishes to utilize subroutine B, then before A jumps into B, it first places into R14 the address of the next instruction to be executed after B completes. Subroutine B will be written so that once it completes, it looks into R14 to determine which instruction to use next. In this way, B can be written so that it will be able to work regardless of which subroutine uses B.

S1.3.

It copies the current value of R15 (the program counter) into R14 (the link register), and it places 0x560 into R15 so that the next instruction fetched for execution will be the one at address 0x560.

S1.4.

When we execute “BL pow”, the address of the following instruction (“MOV PCLR”) is placed into the link register. Once pow returns, the link register still contains the address of that instruction, so when we execute “MOV PCLR”, we are simply resetting PC to point again to that same MOV instruction, and it will end up repeating that single instruction indefinitely.

To repair it, we would need to store LR somehow (probably on the program stack) before entering the pow subroutine. In returning, you would need to ensure that this stored value of LR is placed into PC.

S2.1.

Both STR R4, [SP#-4]! and STMDB SP!, { R4 } are good solutions.

S3.1.

If a subroutine uses callee-save registers, then it must ensure that their value upon returning is the same as their value when the subroutine begins. With caller-save registers, though, a subroutine is allowed to change them without restoring their initial values. (If the caller needs their values, it should save those registers' values itself.)

S3.2.

The BL fact instruction modifies the link register to reference the MUL instruction, so that once the recursive call completes it will return back to MUL to complete the original call. However, the original call depends on the original value of the link register for its final instruction, MOV PCLR; with the link register changed to hold the MUL instruction's address, it will indefinitely loop back to this instruction instead.

The solution is to push the link register's value onto the stack by inserting an instruction such as “STMDB SP!, {LR}” following the second MOVEQ instruction. We'd then replace the “MOV PCLR” instruction with something thta pops the pushed value back into PC: “LDMIA SP!, {PC}”.

(Another problem — though not a correct answer to the question — is that the recursive call will obliterate the saved value of R3. The solution there is to push it onto the stack prior to the call and then to restore it. This answer is not correct, because repairing this problem doesn't remove the infinite-loop problem mentioned in the question. Admittedly, though, the above repair relating to LR will lead to the function returning with the wrong value without this issue also addressed.)

S3.3.
max    CMP R0R1
       MOVLT R0R1
       MOV PCLR
S3.4.

In entering gcd3, R2 and LR will have values that the subroutine later uses: The second line needs R2, while the last one needs LR. However, the first BL instruction will lead to both values being lost: The BL instruction will itself overwrite LR as it enters gcd to hold the address of gcd3's second instruction (so gcd knows where to resume), while the gcd subroutine is allowed to change all caller-save registers, including R2. Consequently, the second and fourth lines will use the wrong values for R2 and LR.

The solution is to push R4 and LR onto the stack upon entering the subroutine, using the callee-save register R4 to remember the third parameter over the subroutine call, and then restoring R4 and LR just before returning.

gcd3 STMDB SP!, {R4LR}
     MOV R4R2
     BL gcd     ; R0 = gcd(param0, param1)
     MOV R1R4
     BL gcd     ; R0 = gcd(R0, param2)
     LDMIA SP!, {R4LR}
     MOV PCLR ; return R0
S3.5.
       MOV R5#0
       SUB R6R4#1
loop   MOV R0R4
       MOV R1R6
       BL remain
       TST R0R0
       BNE next
       ADD R5R5#1
next   SUBS R6R6#1
       BNE loop
S3.6.
addsin STMDB SP!, {R4R5LR}
       MOV R4R1
       BL sin
       MOV R5R0
       MOV R0R4
       BL sin
       ADD R0R0R5
       LDMIA SP!, {R4R5PC}
S3.7.

We need to stash the values of these registers so that they can be recovered following completion of the subroutine call. You can store the register values on the program stack by inserting “STMDB SP!, {R1-R3}” preceding the “BL pow” instruction, and you can restore their values by inserting “LDMIA SP!, {R1-R3}” following the BL instruction.

S4.1.

The short solution: LDR R5, [R4#4] and then STR R5, [R4].

The first instruction might be replaced by two lines:

       ADD R0R4#4
       LDR R5, [R0]
S4.2.
sumList   MOV R1#0           ; R1 should be sum after completing loop
sumLoop   LDR R2, [R0]         ; R1 += R0->value
          ADD R1R1R2
sumNext   LDR R0, [R0#4]     ; R0 = R0->next
          TST R0R0           ; if R0 isn't 0, repeat loop for following node
          BNE sumLoop
S4.3.
       MOV R0#0
       B start
next   LDR R4, [R4#4]
       ADD R0R0#1
start  CMP R4#0
       BNE next