Time Delays

If you've completed the 4-bit counter on the previous page, you'll know that we used a pre-written subroutine, and that we didn't look at the contents of this subroutine. Even at this early stage, we were using "modular programming". You'll find that as you write more code, you'll start to re-use bits of code for common applications and before long, you'll have your own "software library" to call on.

However, it's important to understand the time delays, and as we explain them, some new, important topics will be introduced.

There are two basic ways to create a time delay:

  1. Go around in a loop for a defined number of times.
  2. Wait for a pre-determined event.

We'll examine the first option here. There is an obvious problem with it - while sitting in a loop, the processor cannot do anything else, and for some applications, this is unacceptable. Option 2 might allow a solution to this, but it's one for later!

Overview

First, consider how quickly the processor runs. Depending on what version you buy, a mid-range PIC can run up to 20MHz - check the suffix in the part number for the highest speed applicable. For example, a PIC16F84-04 can run at up to 4MHz, whereas a PIC16F877-10 can work reliably up to 10MHz. As you might expect, the faster versions are more expensive, but not prohibitively so.

Note that these processors can run at any speed you require up to the rated speed. So, there's nothing stopping you running the processor at 1Hz if you want! If you have a variable frequency square-wave generator, you can use it as the clock - from an educational point of view, this is something to try when your PIC is running the 4 bit counter. This is possible because the internal memory of the PICs uses static memory technology (as opposed to dynamic memory, that must be "refreshed" frequently for reliable operation).

The clock is divided by 4 (which allows the internal "pipelining" to work), so a 4MHz crystal gives an internal instruction clock of 1MHz, and each instruction cycle lasts 1 microsecond. Thanks to the highly optimised core, the CPU executes most instructions in a single instruction cycle, so most lines of your code execute in just 1 microsecond. This also explains why 4MHz is a popular choice of clock frequency. In the timing calculations on this page, "clock cycle" means this 1 microsecond instruction cycle.
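The divide-by-4 arithmetic can be sketched in Python, used here purely as a calculator since PIC assembly can't be run directly on a PC. This minimal sketch assumes only what the paragraph states: the instruction clock is the oscillator frequency divided by 4. The function name `instruction_cycle_us` is our own.

```python
# Sketch: instruction-cycle length for a mid-range PIC, assuming the
# instruction clock is the oscillator frequency divided by 4.

def instruction_cycle_us(f_osc_hz: float) -> float:
    """Return the length of one instruction cycle in microseconds."""
    # Fosc / 4 instruction cycles per second, so one cycle lasts
    # 4 / Fosc seconds, which is 4_000_000 / Fosc microseconds.
    return 4_000_000 / f_osc_hz


if __name__ == "__main__":
    for f in (4_000_000, 10_000_000, 20_000_000):
        print(f"{f / 1e6:g} MHz crystal -> {instruction_cycle_us(f):g} us per instruction cycle")
```

At 20MHz this gives 0.2 microseconds per instruction cycle, and at the 1Hz mentioned above each instruction cycle would last 4 seconds.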

Time Delay design

When I wrote these routines, I decided to build in as much flexibility as possible. So, I wrote a subroutine that would take exactly 1 millisecond to execute - I then reasoned that I could call this N times whenever I needed a delay of N ms.

In contrast to the last section, I've provided you with written code, and the exercise now is to try and understand it...

Wait1ms

Here is the 1ms subroutine:

Wait1ms   movlw    d'250'       ;Initial value - tweak if req.
          nop
loop1ms   addlw    d'255'       ;dec W
          btfss    STATUS, Z    ;Zero flag set?
          goto     loop1ms      ;  No, keep looping
          return                ;  Yes, 1ms done
    

This short snippet of code is rather more complicated than it first appears, so let's begin with a simple explanation. The "entry point" is the first line, so by inserting call Wait1ms into your code, a 1ms delay is invoked. Note the label 2 lines down - this is effectively an "internal" label - my convention is to write "entry point" labels in mixed case, and "internal" labels in lower case. Feel free to choose your own method...

This subroutine works entirely with the working register - no other variables are required. It starts by loading 250 into W, and decrementing W until it equals zero. This is detected by the btfss statement. I should say that I found the starting value empirically, so this routine might not be exactly accurate!
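To see the countdown concretely, here is a small Python model of the W register during the loop. It is a sketch, not a simulator: it assumes addlw d'255' leaves (W + 255) mod 256 in W and sets Z when that result is zero, and it ignores timing. `count_passes` is a hypothetical name.

```python
# Sketch: model the Wait1ms countdown in an 8-bit W register.
# Assumes addlw d'255' gives (W + 255) mod 256 and sets Z when the result
# is zero; instruction timing is ignored here.

def count_passes(start: int) -> int:
    """Return how many times the loop1ms body runs before W reaches zero."""
    w = start & 0xFF
    passes = 0
    while True:
        w = (w + 255) & 0xFF   # addlw d'255' : effectively W = W - 1
        passes += 1
        if w == 0:             # btfss STATUS, Z : Z set, so skip the goto
            return passes
        # otherwise: goto loop1ms and go round again


if __name__ == "__main__":
    print(count_passes(250))
```

Starting from 250, the loop body runs 250 times. Starting from 0 it would run 256 times, since the first addlw wraps 0 round to 255.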

Why are we decrementing the W register? Well, it's much easier this way - we could count up, just like the 4 bit counter did, but we would then need the extra lines for the XOR comparison. This version is more code-efficient, which is important because programme memory is limited.

Note also that, referring to the instruction set, you'll see that there are no instructions to directly decrement (or increment) the working register, hence the use of addlw.

The curious part of the routine is the addlw d'255', which is supposed to decrement the working register. There are two questions that should spring to mind - first, how does that work, and second, why not use sublw d'1'?

We will answer the second question in more detail later, but suffice it to say at this point that the subtracting instructions don't work exactly as you might expect: sublw d'1' computes 1 - W, not W - 1. Let's deal with the first question - how does adding 255 to a number actually reduce it by one?
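A quick way to see the difference is to model the 8-bit result of each instruction. This sketch assumes the mid-range definitions addlw k: W = W + k and sublw k: W = k - W, both truncated to 8 bits; status flags are ignored, and the function names simply mirror the mnemonics.

```python
# Sketch: why sublw d'1' does not decrement W on a mid-range PIC.
# Models only the 8-bit result left in W; status flags are ignored.

def addlw(w: int, k: int) -> int:
    """addlw k : W = (W + k) mod 256"""
    return (w + k) & 0xFF


def sublw(w: int, k: int) -> int:
    """sublw k : W = (k - W) mod 256, i.e. literal minus W, not W minus literal"""
    return (k - w) & 0xFF


if __name__ == "__main__":
    w = 250
    print("addlw d'255':", addlw(w, 255))
    print("sublw d'1':  ", sublw(w, 1))
```

With W = 250, adding 255 gives 249, while sublw d'1' gives 1 - 250 = -249, which wraps round to 7.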

It's basically because of the magic of binary numbers. Data memory in the PICs is 8 bits wide. An 8-bit binary system can represent a total of 256 different states - 0 up to 255. But for the purposes of this explanation, let's work with a 3-bit binary system - otherwise should you print this page out, it will require a lot of paper!

Decimal   Binary ("4" "2" "1")
0         0 0 0
1         0 0 1
2         0 1 0
3         0 1 1
4         1 0 0
5         1 0 1
6         1 1 0
7         1 1 1

Just to be explicit, this table shows the full range of numbers available in our 3-bit binary system. As we only have 3 bits to represent the number, there is no question of displaying 8 or higher here.

So what happens if we are at 7, and add 1? Of course, 7+1 is 8, and 8 in binary is "1000". But that first "1" can't be held by our 3-bit system, so we would only see "000". So, the system will "wrap around". What if we were at 7, and added 2? This is 9, which looks like "1001" in binary. Thanks to wrap-around, we lose the first "1", and see "001". If you want to follow this through to its logical conclusion, try adding 7 to 7 - that gives 14, or "1110". Ignoring the leading "1", that's "110", or 6!
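The wrap-around can be reproduced by masking a sum down to 3 bits, which is exactly what discarding the leading "1" does. A minimal sketch; the names are illustrative.

```python
# Sketch: wrap-around addition in a 3-bit system, as in the table above.
BITS = 3
MASK = (1 << BITS) - 1   # 0b111: keep only the low 3 bits


def add_3bit(a: int, b: int) -> int:
    """Add two values, keeping only the low 3 bits (the carry is lost)."""
    return (a + b) & MASK


if __name__ == "__main__":
    for b in (1, 2, 7):
        total = 7 + b
        print(f"7 + {b} = {total} = {total:04b} -> kept {add_3bit(7, b):03b} = {add_3bit(7, b)}")
```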

So, the rule is as follows: to decrement a number in a binary system, add N-1 to the number, where N is the total number of possible states. I'll leave it to the reader to satisfy themselves of this - it's a good paper exercise.
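The paper exercise can also be checked exhaustively. This sketch tries every value in a 3-bit system (adding 7) and an 8-bit system (adding 255, as addlw d'255' does) and compares the result with an ordinary decrement that wraps round at zero. The helper names are our own.

```python
# Sketch: check the "add N - 1 to decrement" rule for a given bit width.

def decrement_by_adding(x: int, bits: int) -> int:
    n = 1 << bits              # N = number of possible states
    return (x + (n - 1)) % n   # add N - 1, then wrap around


def rule_holds(bits: int) -> bool:
    n = 1 << bits
    # (x - 1) % n is an ordinary decrement that wraps 0 round to n - 1
    return all(decrement_by_adding(x, bits) == (x - 1) % n for x in range(n))


if __name__ == "__main__":
    print(rule_holds(3), rule_holds(8))
```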

Back to the code. Assuming a 4MHz clock, just how long does this subroutine take to execute?

Wait1ms   movlw    d'250'       ;Initial value - tweak if req.
          nop
loop1ms   addlw    d'255'       ;dec W
          btfss    STATUS, Z    ;Zero flag set?
          goto     loop1ms      ;  No, keep looping
          return                ;  Yes, 1ms done
    

The first line takes 1 clock cycle, or 1µs, as does the nop, which means "no operation". These two lines are only executed once. We then enter the loop, which runs 250 times: W counts down from 250, and the loop ends when it reaches zero.

The first line within the loop, addlw d'255', takes 1µs. The next line takes either 1 or 2 clock cycles. Most instructions take one clock cycle, as I said earlier; the exceptions are instructions that cause a branch. A branch is when an instruction changes the normal flow of the programme - example instructions include goto, call, return, and btfss when it skips. The exact reasons for this will be explained later!

So most of the time the Z flag is not set, meaning this decision line executes in 1 clock cycle, and the programme flows normally onto the next line. This, however, is a goto, and it takes two clock cycles. So a normal pass around the loop takes 4 clock cycles, and 249 of the 250 passes are normal: that's 996µs. The very last pass is different: W has reached zero, so the btfss statement bypasses the goto, taking 2 clock cycles instead of the 3 that it plus the goto would take. That final pass therefore takes 3 clock cycles, giving 999µs for the whole loop. Add the 2 clock cycles from the start of the subroutine to give 1.001ms.

Sharp-eyed readers will have spotted the final problem here - we haven't included the 2 clock cycles taken by the initial call Wait1ms, or the 2 clock cycles caused by the return. Adding these in gives a total of 1005 clock cycles, or 1.005ms. Each step taken off the starting value removes one normal pass and saves 4 clock cycles, and removing the nop saves one more, so starting from d'249' without the nop should make this exactly 1ms - I can't remember exactly why I left the nop in.
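Because it is easy to be off by one when counting loop passes, here is a small Python model that steps through Wait1ms and totals the instruction cycles. It assumes the standard mid-range timings: 1 cycle for movlw, nop and addlw; 2 for call, goto and return; and 1 for btfss, or 2 when it skips. At 4MHz each cycle is 1µs. The parameters `start`, `nop` and `include_call` are our own, added so that variations can be tried.

```python
# Sketch: count the instruction cycles of Wait1ms by stepping through it.
# Assumed mid-range timings: movlw, nop, addlw = 1 cycle; call, goto,
# return = 2 cycles; btfss = 1 cycle, or 2 when it skips.

def wait1ms_cycles(start: int = 250, nop: bool = True, include_call: bool = True) -> int:
    cycles = 2 if include_call else 0    # call Wait1ms
    w = start & 0xFF
    cycles += 1                          # movlw start
    if nop:
        cycles += 1                      # nop
    while True:
        w = (w + 255) & 0xFF             # addlw d'255' (W = W - 1)
        cycles += 1
        if w == 0:
            cycles += 2                  # btfss skips the goto: 2 cycles
            break
        cycles += 1 + 2                  # btfss (no skip) + goto loop1ms
    cycles += 2                          # return
    return cycles


if __name__ == "__main__":
    print("as listed:     ", wait1ms_cycles())
    print("d'249', no nop:", wait1ms_cycles(start=249, nop=False))
```

For the routine as listed, the model gives 1005 cycles including the call and return; with a starting value of d'249' and no nop it gives exactly 1000. Treat these as figures to confirm in a simulator or on real hardware rather than as measurements.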

Such obsession with accuracy may seem pointless, but any errors will be magnified by the next routine:

WaitNms

A fixed 1ms delay is not all that useful by itself - the chances are you'll be needing much longer delays - we wanted a whole second for the 4-bit counter. So here is one solution:

WaitNms   movwf    timer_local    ;timer_local=W
loopNms   movlw    d'248'         ;revised value due to extras
          call     loop1ms        ;
          goto     $+1            ;2 clock cycles
          decfsz   timer_local,f  ;Dec loop variable and check
          goto     loopNms        ;  No, keep looping
          return                  ;  Yes, Nms done
    

This introduces yet more new ideas. Firstly, before this routine is called, the required delay in milliseconds must be placed into the working register - this is a simple way of passing parameters from one part of a program to another. Also, it uses the 1ms subroutine, so it's a subroutine that calls another subroutine.

Upon entry, the current value of the working register is stored in a variable called "timer_local". This variable must be declared at the start of the program - check 4bit.ASM to see how this is done. Once we've safely stored this number, we can enter the loop. Note that we are using the Wait1ms subroutine, but through a different entry point, loop1ms - this is perfectly valid, but could be confusing, so try to keep this sort of practice to a minimum! We need to do it here if we want our delays to be accurate: the other instructions in this loop add their own overhead to each millisecond, so we skip the movlw and nop at the top of Wait1ms and enter the loop with a smaller starting value instead.

The goto $+1 is a confusing line, as there is no label called "$+1". Actually, the assembler recognises $ as a special symbol meaning the address at which it is placing the current line of code, so $+1 is simply the address of the next instruction. So, it's actually a short-hand way of writing the following:

          goto     next_line      ;
next_line decfsz   timer_local,f  ;
    

And the only reason to do this is to take up two clock cycles! We could have achieved the same thing with two NOP statements, but this would have required more programme memory. Saving just one word of programme memory really is worth doing wherever possible!
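The trade-off between nop and goto $+1 can be summarised as a tiny padding planner. This sketch assumes nop costs 1 word and 1 cycle, and goto $+1 costs 1 word and 2 cycles; `pad` is a hypothetical helper, not part of any tool.

```python
# Sketch: fill a gap of idle cycles with the fewest words of programme
# memory, assuming nop = 1 word / 1 cycle and goto $+1 = 1 word / 2 cycles.

def pad(cycles: int) -> list:
    """Return padding instructions totalling exactly `cycles` cycles."""
    # Use as many 2-cycle goto $+1 lines as possible, then a nop for any odd cycle.
    return ["goto $+1"] * (cycles // 2) + ["nop"] * (cycles % 2)


if __name__ == "__main__":
    for c in (1, 2, 3, 4):
        print(c, "cycles:", pad(c), "->", len(pad(c)), "word(s)")
```

Three cycles of padding, for instance, need two words (goto $+1 then nop) instead of three nops.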

It's time to introduce another new instruction. PICs are superbly thought-out pieces of kit - they might appear awkward and limited at first, but when you start working with them you realise that they are highly optimised, and this instruction is a good example of that.

Refer back to the 4-bit counter, and recall how we decided if we'd got to 15 - we had to XOR the current value of Counter with our target of 16, and check the Z flag to see if we were there yet. This was necessary because we wanted our counter to count up. But often, we need a loop that executes a certain number of times, and we don't actually care whether the loop variable counts up or down - other code within the loop deals with the outcomes, and the loop variable is simply a counter. This subroutine is a good example of that.

For loops that count down, you could decrement the loop variable and check for zero this way:

          decf     Loop, f        ;Loop = Loop - 1
          movfw    Loop           ;Move Loop into W, affecting Z
          btfss    STATUS,Z       ;Is Z set?
          goto     main_loop      ;  No, so keep looping
                                  ;  Yes, all done
    

Hopefully this makes sense - using the decf instruction, we decrement the loop variable and make sure the result is placed back into memory. We then move the loop variable into the working register for no reason other than to affect the Z flag. Strictly, this isn't necessary, as the decf instruction already sets the Z flag, but personally I've found that it's best to include the extra movfw statements when in doubt - once the program is working, you can try "commenting out" the statements that you think you don't need and test the result.

Next, a bit-test statement is used, followed by a goto statement. This is the same as you saw in the 4-bit counter. However, while this would work, there is a much better way using a rather useful instruction:

          decfsz   Loop, f        ;Loop = Loop - 1. Is Loop=0 yet?
          goto     main_loop      ;  No, so keep looping
                                  ;  Yes, all done
    

This new instruction decrements a file register and skips the next instruction if the result is zero. It effectively combines several instructions into one, making for very fast and compact loop-handling. There is a complementary instruction - incfsz, which increments a file register and skips the next instruction if the result is zero.
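The behaviour of a decfsz loop, including what happens at the 8-bit boundary, can be modelled in a few lines. This sketch assumes decfsz leaves (Loop - 1) mod 256 in the register and skips the following goto when that result is zero; `decfsz_passes` is a hypothetical name.

```python
# Sketch: how many times a "decfsz Loop, f / goto main_loop" loop runs,
# modelling decfsz as an 8-bit decrement that skips when the result is 0.

def decfsz_passes(initial: int) -> int:
    loop = initial & 0xFF
    passes = 0
    while True:
        passes += 1                 # the loop body runs once per pass
        loop = (loop - 1) & 0xFF    # decfsz Loop, f
        if loop == 0:               # result zero: skip the goto and fall through
            return passes
        # otherwise: goto main_loop


if __name__ == "__main__":
    print(decfsz_passes(10), decfsz_passes(1), decfsz_passes(0))
```

Loading 10 gives 10 passes, as expected. Loading 0 gives 256 passes, because the first decrement wraps 0 round to 255; that is worth remembering before passing 0 to a routine like WaitNms.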

So, let's try to work out the execution times exactly.

WaitNms   movwf    timer_local    ;timer_local=W
loopNms   movlw    d'248'         ;revised value due to extras
          call     loop1ms        ;
          goto     $+1            ;2 clock cycles
          decfsz   timer_local,f  ;Dec loop variable and check
          goto     loopNms        ;  No, keep looping
          return                  ;  Yes, Nms done
    

The initial call takes 2 clock cycles, and the first line takes another. The return at the end accounts for another 2. On the final pass, the decfsz takes 2 cycles because it skips, but the 2-cycle goto it skips is never executed, so that pass is 1 cycle shorter than the others. So the one-off overhead amounts to 4µs, which is perhaps less important when considering that delays of 100ms might be required.

Into the loop, loading W with 248 takes 1µs. How long does the above 1ms delay take with the new entry point and starting value? Using the same reasoning as for Wait1ms, the loop makes 247 normal passes of 4 clock cycles plus one final pass of 3, which is 991µs; the call and return add 4 more, giving 995µs. The next line takes 2 clock cycles, so the total so far is 998µs. The decfsz takes 1 cycle normally, and the goto takes another 2 - total: 1001µs. To get exactly 1ms within the loop, we need to lose 1 cycle. Simple - change the goto $+1 for a nop.
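As a cross-check on the per-pass arithmetic, this sketch steps through WaitNms and counts cycles. It assumes the entry instruction stores W into timer_local (movwf, 1 cycle), the standard mid-range timings, and that the padding line costs 2 cycles for goto $+1 or 1 for nop (`pad_cycles` is our own parameter). The count includes the call into WaitNms but not the movlw that loads N beforehand.

```python
# Sketch: count the instruction cycles of WaitNms by stepping through it.
# Assumptions: the entry line stores W into timer_local (movwf, 1 cycle);
# mid-range timings (call/goto/return = 2, btfss/decfsz = 2 when skipping,
# otherwise 1); pad_cycles = 2 for goto $+1 or 1 for a nop.

def loop1ms_cycles(start: int) -> int:
    """Cycles from the loop1ms label up to and including its return."""
    w, cycles = start & 0xFF, 0
    while True:
        w = (w + 255) & 0xFF             # addlw d'255'
        cycles += 1
        if w == 0:
            return cycles + 2 + 2        # btfss skips (2) + return (2)
        cycles += 1 + 2                  # btfss (no skip) + goto loop1ms


def waitnms_cycles(n: int, inner_start: int = 248, pad_cycles: int = 2) -> int:
    """Cycles for 'call WaitNms' with W = n, including the call and return."""
    cycles = 2 + 1                       # call WaitNms + movwf timer_local
    timer = n & 0xFF
    while True:
        cycles += 1                      # movlw inner_start
        cycles += 2 + loop1ms_cycles(inner_start)  # call loop1ms ... return
        cycles += pad_cycles             # goto $+1 or nop
        timer = (timer - 1) & 0xFF       # decfsz timer_local, f
        if timer == 0:
            cycles += 2                  # decfsz skips the goto
            break
        cycles += 1 + 2                  # decfsz (no skip) + goto loopNms
    return cycles + 2                    # return


if __name__ == "__main__":
    for pad in (2, 1):
        per_pass = waitnms_cycles(2, pad_cycles=pad) - waitnms_cycles(1, pad_cycles=pad)
        print(f"pad {pad}: {per_pass} cycles per pass, N=250 -> {waitnms_cycles(250, pad_cycles=pad)}")
```

Under these assumptions the model gives 1001 cycles per millisecond pass with goto $+1 and 1000 with a nop, plus a fixed 4 cycles of overhead for the call, the store and the return.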

From a personal point of view, I wrote these delays many years ago, and I can't quite remember how I decided that the results were accurate. In fact, I wouldn't even claim that they are all my own work, as the tricks used were far too smart for me back then! This recent deconstruction has been useful - with much more experience under my belt, I can see some inconsistencies that crept in. Although this probably won't matter for 99.9% of all applications, it's intellectually satisfying to produce perfectly accurate code wherever possible. So, I will test these one day, using accurate test equipment.

Longer delays

WaitNms gives us a delay of up to 255ms - remember, this is an 8-bit processor, so we can't load anything more than 255 into W. For longer delays, we can simply call the routine several times in a row. Consider this subroutine:
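Choosing the values to load into W for a longer delay is simple division. This hypothetical helper splits a total number of milliseconds into pieces that each fit in W; the default piece of 250 matches the values used in Wait1Second below, but anything from 1 to 255 would work.

```python
# Sketch: split a long delay (in ms) into WaitNms-sized pieces, since W can
# hold at most 255. split_delay is a hypothetical helper for planning only.

def split_delay(total_ms: int, chunk: int = 250) -> list:
    if not 1 <= chunk <= 255:
        raise ValueError("chunk must fit in W (1-255)")
    pieces = [chunk] * (total_ms // chunk)   # as many full chunks as fit
    if total_ms % chunk:
        pieces.append(total_ms % chunk)      # plus one shorter piece for the rest
    return pieces


if __name__ == "__main__":
    print(split_delay(1000), split_delay(600))
```

For example, 1000ms splits into four pieces of 250, and 600ms into 250, 250 and 100.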

Wait1Second   movlw    d'250'    ;250ms
              call     WaitNms
              movlw    d'250'    ;250ms
              call     WaitNms
              movlw    d'250'    ;250ms
              call     WaitNms
              movlw    d'250'    ;250ms
              goto     WaitNms
    

This simply calls WaitNms four times, loading 250 into the working register first each time. Each call takes a quarter of a second. Remember that the working register is altered by the subroutine (WaitNms returns with W at zero), so we need to reload it before calling WaitNms again. Simple!
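Putting the pieces together, this sketch totals the cycles for Wait1Second. It assumes each WaitNms invocation, including its call (or the final goto) and its return, costs (999 + pad) * N + 4 cycles, where pad is 2 for goto $+1 or 1 for nop; that formula comes from a step-by-step count of the loop under standard mid-range timings. Entering WaitNms by goto rather than call costs the same 2 cycles.

```python
# Sketch: total cycles for Wait1Second. Assumes each WaitNms invocation,
# including its call (or goto) and return, costs (999 + pad) * n + 4
# cycles, where pad is 2 for goto $+1 or 1 for nop.

def waitnms_cycles(n: int, pad: int = 2) -> int:
    return (999 + pad) * n + 4


def wait1second_cycles(pad: int = 2) -> int:
    cycles = 2                                # call Wait1Second
    for _ in range(4):
        cycles += 1                           # movlw d'250'
        cycles += waitnms_cycles(250, pad)    # call (or final goto) WaitNms ... return
    return cycles


if __name__ == "__main__":
    print(wait1second_cycles(), wait1second_cycles(pad=1))
```

With goto $+1 this comes to 1,001,022 cycles, about 1.001 seconds at 4MHz; with a nop in its place it is 1,000,022 cycles.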

And this is how we got a 1 second delay for our 4-bit counter. Hopefully this makes sense. But as you look at the above lines of code, do you notice anything strange? Look at the last line - it says "goto WaitNms", instead of "call WaitNms". So what is going on here? This leads us neatly into the next section - Subroutine explained - but first:

Summary and conclusions

In this section, we've looked in detail at how we can create time delays with microsecond precision. We've built on the foundations learned during the 4-bit counter design, and hopefully by continued exposure to assembly language, you're feeling more confident with the subject.

We've learned the following new instructions:

decfsz FileReg, dest   Decrement a file register and skip the next instruction if the result is zero. Result is placed in dest.
incfsz FileReg, dest   Increment a file register and skip the next instruction if the result is zero. Result is placed in dest.
nop                    No operation.