Subroutines

We have already used subroutines in our example 4-bit counter. They hopefully made some sense at the time, but as is often the case, there is slightly more to them than at first appears. So now is a good time to look in more detail, uncovering some essential facts along the way.

How does a program run?

So far we've assumed that one line executes after another, in a logical fashion - and most of the time this is correct. But what actually happens inside the PIC?

It's time to introduce the concept of a program counter. This is a register that is reset to zero at power-on, or at reset, and always contains the address in program memory of the next instruction to be executed. As the next instruction is fetched from program memory, this register is automatically incremented. The only exception to this rule is when the instruction causes a branch.

The program counter is available to us - we can write to it at will, and as we'll see later this opens the door to some very powerful techniques. But for now, refer to the memory map (below), and note PCL, which is at memory address 02h - it's also mapped to 82h in Bank 1.

           Bank 0               Bank 1
00h (0)    Indirect addr        Indirect addr        80h (128)
01h (1)    TMR0                 OPTION               81h (129)
02h (2)    PCL                  PCL                  82h (130)
03h (3)    STATUS               STATUS               83h (131)
04h (4)    FSR                  FSR                  84h (132)
05h (5)    PORTA                TRISA                85h (133)
06h (6)    PORTB                TRISB                86h (134)
07h (7)    -                    -                    87h (135)
08h (8)    EEDATA               EECON1               88h (136)
09h (9)    EEADR                EECON2               89h (137)
0Ah (10)   PCLATH               PCLATH               8Ah (138)
0Bh (11)   INTCON               INTCON               8Bh (139)
0Ch (12)   68 GPRs              Mapped to Bank 0     8Ch (140)
   to      (general purpose                             to
7Fh (127)  registers)                                FFh (255)

-  Not implemented

Just to clarify things, a short program is shown in table form below to highlight how PCL relates to instructions in memory. This simple program flashes an LED connected to RB0 (bit 0 of PORTB - pin 6). It's not very useful as it stands, because there is no time delay, so the LED will flash far too quickly to be visible to the naked eye. But that's not important here - we just need something that is easy to follow so that we can concentrate on the significance of PCL.

Memory     Label       Instruction          Comment                   PCL
location
                       ORG   0              ;reset vector             0

0                      clrf  PORTA          ;all of porta low         1
1                      clrf  PORTB          ;all of portb low         2
2                      bsf   STATUS, RP0    ;change to bank1          3
3                      clrf  TRISA          ;all of porta outputs     4
4                      clrf  TRISB          ;all of portb outputs     5
5                      bcf   STATUS, RP0    ;back to bank0            6

6          Main_loop   bsf   PORTB, 0       ;Set RB0 - LED on         7
7                      bcf   PORTB, 0       ;Clear RB0 - LED off      8
8                      goto  Main_loop      ;and repeat...            9 -> 6

This table demonstrates the behaviour described above. Remember, PCL always points to the next instruction to be fetched and executed. At startup it is zero, and it changes to 1 as soon as the first instruction has been fetched. This simple pattern repeats until the PIC gets to the goto in memory location 8. PCL automatically changed to 9 as the fetch occurred, but on decoding the instruction and realising it was a "goto", the PIC changed PCL to 6. It knew that 6 was the required value because the assembler had already worked out that Main_loop is a label for memory location 6, and built that address into the goto instruction. In fact, this simple modification of the program counter is all that a goto instruction does.
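The fetch, increment and branch sequence can be modelled in a few lines of Python. This is a sketch rather than an emulator. It assumes one word per instruction and models only PCL, with no PCLATH and no timing. The operands are kept as text purely for readability.

```python
# Toy model of the PIC fetch/decode cycle for the LED-flashing program.
# Assumptions: one word per instruction, and only PCL is modelled (no
# PCLATH, no timing). Operands are kept as text for readability only.

PROGRAM = {
    0: ("clrf", "PORTA"),
    1: ("clrf", "PORTB"),
    2: ("bsf", "STATUS, RP0"),
    3: ("clrf", "TRISA"),
    4: ("clrf", "TRISB"),
    5: ("bcf", "STATUS, RP0"),
    6: ("bsf", "PORTB, 0"),   # Main_loop: LED on
    7: ("bcf", "PORTB, 0"),   # LED off
    8: ("goto", 6),           # the assembler has replaced Main_loop with 6
}


def trace(steps):
    """Return (address executed, PCL afterwards) for each instruction."""
    pcl = 0                                  # reset vector
    history = []
    for _ in range(steps):
        address = pcl
        opcode, operand = PROGRAM[address]   # fetch
        pcl += 1                             # PCL moves on during the fetch
        if opcode == "goto":
            pcl = operand                    # goto just overwrites PCL
        history.append((address, pcl))
    return history


for address, pcl in trace(12):
    print(f"executed {address}, PCL now {pcl}")
```

After the start-up instructions at addresses 0 to 5, the trace settles into 6, 7, 8, 6, 7, 8 and so on. The model records only the value PCL ends up with, so the momentary 9 after fetching the goto is folded into the final 6.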

It's not quite so simple...

Each register in data memory is 8 bits wide, which means it can hold one of 256 different values. PCL lives in data memory, so on its own it can only refer to 256 different locations in program memory. This might be OK for small PICs, but the PIC16F84 has 1024 words of program memory, which needs 10 address bits. The extra bits are handled through a register called PCLATH (0Ah). So, while PCL stands for program counter low, PCLATH stands for program counter latch high.

As the name PCLATH suggests, it's slightly more complicated than that. Strictly speaking, PCLATH is not the high bits of the PC (program counter) itself. Rather, it's a holding register that we can set up before doing a computed goto. We will look at this in much more detail later.
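To put numbers on this, the sketch below works out how many address bits 256 and 1024 words need. It also shows one way the upper bits get combined with PCL. It assumes the mid-range PIC16 behaviour described in Microchip's datasheets, where writing to PCL loads the upper bits of the program counter from the low five bits of PCLATH. The function names are made up for illustration, and the practical use of this (computed gotos) is still left for later.

```python
# Why PCL alone is not enough on the PIC16F84, and how PCLATH fills the gap.
# Assumption: mid-range PIC16 behaviour, where a write to PCL loads
# PC<12:8> from PCLATH<4:0>. Function names are illustrative only.

PROGRAM_WORDS = 1024  # PIC16F84 program memory


def bits_needed(locations):
    """Address bits required to reach every one of `locations` words."""
    return (locations - 1).bit_length()


def pc_after_pcl_write(pclath, new_pcl):
    """Full program counter after writing new_pcl to PCL."""
    return ((pclath & 0x1F) << 8) | (new_pcl & 0xFF)


print(bits_needed(256))                     # 8: all that PCL can cover
print(bits_needed(PROGRAM_WORDS))           # 10: what the 16F84 needs
print(hex(pc_after_pcl_write(0x03, 0x10)))  # 0x310, i.e. word 784
```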

Subroutines

We have seen already that we can jump to a subroutine using a "call" statement, and that the program will return to where it left off when the processor meets a "return". So how does "call" differ from "goto"?

Memory     Label       Instruction          Comment                   PCL
location

10                     incf  Counter, f     ;Increment Counter        11
11                     call  Output         ;Output                   12 -> 100
12                     movfw Loop           ;W = Loop                 13

100        Output      movfw Counter        ;W = Counter              101
101                    movwf PORTB          ;Output W to portb        102
102                    return               ;done...                  103 -> 12

This table shows two snippets of code taken from a larger program. The first 3 lines are an extract from the main program, where a variable called Counter is incremented, then a subroutine called Output is called. This subroutine simply writes the current value of Counter to PORTB. The actual code is not really important, but the program flow is.

The first line shown above happens to be in memory location 10, so while it executes, PCL is pointing to 11, the next location. When the processor fetches and decodes this next instruction, it realises that it is a "call". As before with "goto", the PIC will load the program counter with the new address, which is 100 in this case. But before doing this, it stores the current value of the program counter (12, the address of the instruction after the call) somewhere safe - this is how it knows where to go when it meets a "return". Understanding this is key!
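Here is the same idea as a small Python model of call and return, run over the snippet above. It is a sketch under simplifying assumptions. Each instruction takes one word, and operands are dropped wherever they don't affect program flow. An ordinary Python list stands in for the hardware stack, with its 8-level limit ignored for now.

```python
# Toy model of "call" and "return" over the Counter/Output snippet.
# Assumptions: one word per instruction, and a plain Python list stands in
# for the hardware stack (its 8-level limit is ignored here).

PROGRAM = {
    10: ("incf", None),       # incf  Counter, f
    11: ("call", 100),        # call  Output (Output is at 100)
    12: ("movfw", None),      # movfw Loop
    100: ("movfw", None),     # Output: movfw Counter
    101: ("movwf", None),     # movwf PORTB
    102: ("return", None),
}


def run(start, steps):
    """Return (address executed, PC afterwards, stack contents) per step."""
    pc, stack, trace = start, [], []
    for _ in range(steps):
        address = pc
        opcode, target = PROGRAM[address]   # fetch
        pc += 1                             # now points at the next word
        if opcode == "call":
            stack.append(pc)                # save the return address first
            pc = target                     # ...then act exactly like goto
        elif opcode == "return":
            pc = stack.pop()                # resume after the call
        trace.append((address, pc, list(stack)))
    return trace


for address, pc, stack in run(10, 6):
    print(f"executed {address:3}, PC now {pc:3}, stack {stack}")
```

The only difference between the "call" and a plain goto in this model is the single push of 12 before the jump.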

This "safe house" is called a stack, and this is a commonly found construct in microprocessors and software engineering in general. It is a simple buffer where the last number "pushed" onto the stack will be the first number taken from the stack. Think of a simple pile of playing cards on your desk. Just don't shuffle them!

This "last-in, first-out" (LIFO) behaviour is what allows us to "nest" subroutines. You saw this when we looked at the timing routines - it is perfectly acceptable to call a subroutine from within a subroutine:

PROCESSOR STACK:
0             1                             2
Loop   movlw d'100'
       call  WaitNms
              WaitNms      movwf timer_local
                           etc...
                           call  loop1ms
                                            loop1ms  addlw d'255'
                                                     etc...
                                                     return
                           etc...
                           return
       rest of code...

This diagram shows a simplified program that creates a 100ms delay using the delay routines studied before. Each column corresponds to the number of return addresses currently held on the stack. Starting at the top left, the main program puts 100 into the working register and calls WaitNms. At this point, the return address is pushed onto the stack and the stack pointer is incremented, moving us one column to the right, into the WaitNms subroutine. When the processor meets "call loop1ms", it pushes another return address onto the stack and jumps to another subroutine, moving a further column to the right. At the end of loop1ms, the processor meets the first "return" statement and "pops" the last address from the stack. The stack pointer is decremented, moving us one column to the left. The processor then meets the return at the end of WaitNms, again retrieves the return address from the stack, and we're back in our main program.
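Stripped down to just the stack, the nesting in that diagram looks like the sketch below. For readability, return addresses are written as descriptive strings rather than real program-memory addresses. Also, loop1ms is called only once, matching the simplified diagram.

```python
# Just the stack during the nested calls of the 100ms example.
# Assumption: return addresses are descriptive strings, not real addresses.

def nested_delay_trace():
    stack = []
    log = []

    def call(return_to):
        stack.append(return_to)           # push: one column to the right
        log.append(("call", len(stack)))

    def ret():
        back_to = stack.pop()             # pop: last pushed is first out
        log.append(("return to " + back_to, len(stack)))

    call("main program")                  # Loop: call WaitNms
    call("WaitNms")                       # WaitNms: call loop1ms
    ret()                                 # end of loop1ms
    ret()                                 # end of WaitNms
    return log


for event, depth in nested_delay_trace():
    print(f"{event:24} depth {depth}")
```

The depth goes 1, 2, 1, 0, which matches moving right, right, left, left across the diagram.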

The stack is a separate section of memory - it is not in program memory or data memory, and we cannot access it in any way. It is entirely managed by the processor during subroutine operations. This is in contrast to some other processors that let you store your own variables there.

VERY IMPORTANT

The stack is subject to one major limitation - it can only contain 8 locations!

This means that you CANNOT nest subroutines more than 8 deep. It is vitally important that you understand this point. Subroutines are an excellent way of making code somewhat structured and logical, but use them carefully! Unfortunately, there is no mechanism within the processor to tell you that you have exceeded the maximum number of locations.

Worse than that, the stack operates as a circular buffer. This means that when you push a ninth return address onto the stack, it wraps around and overwrites the first one! Using the example above, imagine that the WaitNms subroutine somehow made calls to many nested subroutines and used too many locations in the stack. When the processor meets the final return, it will try to recover the return address from the stack, and it will find the wrong address because the correct one was overwritten during the subroutine. At this point, the program will return to the wrong point and probably crash!
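The wrap-around can be modelled in a few lines. This sketch covers the behaviour only and assumes a simple 8-slot ring with a pointer that wraps. The class and the return addresses 101 to 109 are made up for illustration, and the real stack can't be inspected at all.

```python
# Sketch of an 8-level circular return stack, as described above.
# Assumption: this models the behaviour only; the real stack is hardware
# the program cannot read, and there is no overflow flag to check.

class CircularStack:
    LEVELS = 8

    def __init__(self):
        self.slots = [None] * self.LEVELS
        self.pointer = 0  # index of the next slot to write

    def push(self, address):
        self.slots[self.pointer] = address
        self.pointer = (self.pointer + 1) % self.LEVELS  # wraps silently

    def pop(self):
        self.pointer = (self.pointer - 1) % self.LEVELS
        return self.slots[self.pointer]


stack = CircularStack()
for depth in range(1, 10):          # nine nested calls: one too many
    stack.push(100 + depth)         # hypothetical return addresses 101..109

returns = [stack.pop() for _ in range(9)]
print(returns)
```

The first eight returns come back in the right order. The ninth should have gone to 101, the outermost caller, but that slot now holds 109, so the program "returns" somewhere it has already been.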

This is especially frustrating with larger programs, because the point where the code misbehaves will likely be quite separate from the part of the code that you happen to be working on. For example, when I was working on the hi-fi pre-amp, I remember working on code related to a menu option which seemed to work until I backed out of the menu structure and tried to return to the main screen - at which point the program crashed because there was one "call" too many in the menu option routine.

When writing your program, it's essential to match every call with a return - don't leave a subroutine by jumping out to another part of your code. That's bad practice at the best of times. You might get away with it on a processor with a larger stack, but PICs are especially unforgiving!

Optimising subroutines

This next tip might seem somewhat esoteric at this stage, but it's surprising how often you'll be able to use it, so it's worth mentioning now. It might appear to contradict the advice given in the previous paragraph, but provided you're careful, it's a useful technique.

Wait1Second  movlw    d'250'     ;250ms
             call     WaitNms    ;
             movlw    d'250'     ;250ms
             call     WaitNms    ;
             movlw    d'250'     ;250ms
             call     WaitNms    ;
             movlw    d'250'     ;250ms
             call     WaitNms    ;
             return
    

This subroutine generates a 1 second delay by calling WaitNms four times, each call taking 250 milliseconds. Look at what happens after the last call: when WaitNms returns, the very next instruction is another return statement, and the Wait1Second subroutine ends.

Let's replace the last three lines with these two:

             movlw    d'250'     ;250ms
             goto     WaitNms    ;
    

When the processor arrives at the goto statement, it will jump to the WaitNms subroutine instead of calling it. At the end of the WaitNms subroutine, the processor will meet a return. This return actually serves to mark the end of the Wait1Second subroutine - the value on the top of the stack is the return address for whatever called Wait1Second in the first place.

PROCESSOR STACK:
0             1                             2
Start  bsf   PORTB,0
       call  Wait1Second
              Wait1Second  movlw d'250'
                           call  WaitNms
                                            WaitNms  movwf
                                                     etc...
                                                     return
                           movlw d'250'
                           call  WaitNms
                                            WaitNms  movwf
                                                     etc...
                                                     return
                           movlw d'250'
                           call  WaitNms
                                            WaitNms  movwf
                                                     etc...
                                                     return
                           movlw d'250'
                           goto  WaitNms
              WaitNms      movwf
                           etc...
                           return
       rest of code...

This diagram attempts to show this. Note how WaitNms is called normally the first three times, but the last time, the program just jumps to it. This shortcut has saved a program memory word and a couple of processor cycles, because Wait1Second no longer needs its own return. Although it might not seem like a big deal, I promise you that you'll be able to use this often, and the savings soon stack up.
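To check the claim, the sketch below runs both versions of Wait1Second on a tiny simulated processor and counts program words and cycles. It assumes that movlw and movwf take 1 cycle, while call, goto and return take 2 (the usual PIC16 figures). WaitNms is cut down to a hypothetical two-instruction body, so only the calling overhead is measured, not the delay itself.

```python
# Compare Wait1Second with and without the tail call on a toy processor.
# Assumptions: movlw/movwf take 1 cycle and call/goto/return take 2; WaitNms
# is a hypothetical two-instruction stand-in, so no real delay is modelled.

CYCLES = {"movlw": 1, "movwf": 1, "call": 2, "goto": 2, "return": 2}


def wait1second(tail_call):
    """Instruction list for Wait1Second: four 250ms waits."""
    body = []
    for i in range(4):
        body.append(("movlw", 250))                 # W = 250
        last = tail_call and i == 3
        body.append(("goto" if last else "call", "WaitNms"))
    if not tail_call:
        body.append(("return", None))               # only the original needs this
    return body


def assemble(tail_call):
    """Lay the routines out in memory and record each label's address."""
    routines = [
        ("Start", [("call", "Wait1Second"), ("halt", None)]),
        ("Wait1Second", wait1second(tail_call)),
        ("WaitNms", [("movwf", None), ("return", None)]),
    ]
    program, labels = [], {}
    for label, body in routines:
        labels[label] = len(program)
        program.extend(body)
    return program, labels


def run(tail_call):
    """Return (words in Wait1Second, cycles from Start to halt)."""
    program, labels = assemble(tail_call)
    pc, stack, cycles = 0, [], 0
    while True:
        opcode, operand = program[pc]
        pc += 1
        if opcode == "halt":                        # simulation stop, not a PIC op
            break
        cycles += CYCLES[opcode]
        if opcode == "call":
            stack.append(pc)                        # push return address
            pc = labels[operand]
        elif opcode == "goto":
            pc = labels[operand]                    # no push: tail call
        elif opcode == "return":
            pc = stack.pop()
    return len(wait1second(tail_call)), cycles


for tail_call in (False, True):
    words, cycles = run(tail_call)
    print(f"tail_call={tail_call}: {words} words, {cycles} cycles")
```

Under those assumptions, the original version comes out at 9 words and 28 cycles, and the tail-call version at 8 words and 26 cycles. That is one word and one return (2 cycles) saved. In both versions, the return at the end of WaitNms lands back at Start.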

Incidentally, some years after realising I could do this, I stumbled across an article in an old magazine from the 1980s discussing the concept. And it has a name: tail call.

Summary and conclusions

In this section, we've looked in detail at the mechanism of gotos, subroutines, and the program counter. We will return to these subjects in the future.