How to use Assembler? part 4: Jumps (code branching and looping)
Previously we covered operations on the ALU. Now we will build on that. In the introduction I mentioned that your CPU can store data within itself in the form of registers. We have used registers all along, so you should be familiar with them by now. I also mentioned flags. Unlike registers, which generally hold numeric values of a certain bit width (eax: 32 bits; xmm*: 4×32 bits; st(*): 80 bits), flags are one-bit values (true or false) that carry some information about the last operation that was performed.
Most notable flags are:
overflow flag – indicates whether an overflow occurred.
zero flag – true if the result of the last operation was 0.
sign flag – indicates whether the result was negative (true) or not (false).
and many others…
There is a group of operations that take these flags as inputs and act on them. FS supports some of them: the conditional jumps. A conditional jump is an operation that reads the flags and, if they match a specific true/false pattern, jumps to a different part of the code marked with a label. Labels are defined by writing LabelName:
How do you update the flags deliberately? The best way is the cmp reg,reg; operation. cmp is short for compare. Electronically it performs a subtraction, the same as sub eax,int; but it does not store the result in the destination operand. It does, however, update the flags, including the zero and sign flags. The zero flag is true if the two values were equal ( N-N=0 ), so you may think of it as an “is equal” flag in this context.
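To make the "subtract but throw the result away" idea concrete, here is a minimal Python model of cmp. It is only a sketch for illustration, not FS code: the function name cmp_flags and the 32-bit mask are assumptions of this example, and only the zero and sign flags are modelled.

```python
MASK32 = 0xFFFFFFFF  # registers such as eax are 32 bits wide


def cmp_flags(a, b):
    """Model cmp a,b: subtract, discard the result, keep only two flags."""
    result = (a - b) & MASK32        # wrap to 32 bits, as the hardware would
    zero_flag = result == 0          # true when a == b
    sign_flag = (result >> 31) == 1  # true when the top bit of the result is set
    return zero_flag, sign_flag


print(cmp_flags(5, 5))  # (True, False)  -> equal
print(cmp_flags(3, 7))  # (False, True)  -> 3 - 7 is negative
print(cmp_flags(7, 3))  # (False, False) -> 7 - 3 is positive
```

A conditional jump then only has to look at these two booleans to decide whether to jump.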
Here is the list of conditional jumps in FS:
There is also an unconditional jump (when the code reaches it, it always jumps). It is not implemented in the FS assembler, but you can easily mimic the behavior, as seen in the image above.
Flags are updated only by the following operations: add, sub, cmp. SSE operations can’t update flags, because they work on 4 channels: what should the flags be if the first value is negative, the second positive, the third zero and the last one NaN, which is not even a number? ¯\(°_o)/¯
Here is an implementation of the fixed-size loop and hop from the Code component, with explanatory notes:
Static-sized loops and hops work fine in streams and mono4, because they only repeat or skip code a predetermined number of times. To implement code branching (the if/then/else type of thing) and custom-sized loops (for loop, while loop, repeat-until loop) you have to pick one channel as the decisive channel: the one that is the source of the decision whether to loop or skip, how many times, and so on. That naturally means the code will very likely be stream-incompatible. Alternatively, you may extract the decisive data from a combination of channels (for example, to skip code if the input in all channels is zero). Here are implementations of the most common branching and looping constructs (these are identical or very similar to what compilers produce for most programming languages).
Do not expect this to work simply by copy-pasting it into your code. Each label must have a unique name; if two labels are the same, the code will crash. This is another very common crash, especially in more complicated algorithms that involve multiple branching points and nested loops.
Here is another example: the code of a modified RBJ lowpass filter. The filter itself is usually very CPU-friendly; the part that calculates the coefficients is the CPU eater. In the Code component you usually save CPU by using hop on the coefficient part. With assembly you can use a jump to skip the coefficient calculation whenever none of the input parameters in any channel has changed. Note that this is pseudocode (the processing parts are DSP code, because the assembly version is too long to fit in one image):
As you can see, we have added quite a bunch of code, but notice that the coefficient calculation involves sine, cosine and division, which send your CPU load through the roof. In the “smart hop” code we used only moves, subtractions, logical OR and shuffles, which are very cheap on the CPU, possibly even cheaper than a single division. Also notice that we used the OR operation. We could have used add, but there is a slim chance that two input parameters change by the same amount in opposite directions; adding their differences would then give 0 and the algorithm would skip where it shouldn’t. An OR of two floats will most likely give a meaningless bit pattern, but that is OK: we are not interested in HOW the value changed, only in IF it changed (the difference is nonzero). An OR of two zeros (int or float) gives zero, and that is all we need to know.
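The difference between the two tests can be tried outside Flowstone. The following Python sketch is only an illustration of the reasoning above, not the filter code itself; the helper names and the two example parameters (a cutoff and a resonance value) are made up for this example.

```python
import struct


def float_bits(x):
    """Return the 32-bit pattern of x stored as a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def changed_by_add(old, new):
    """Flawed test: add up the per-parameter differences and compare with zero."""
    return sum(n - o for o, n in zip(old, new)) != 0.0


def changed_by_or(old, new):
    """OR the bit patterns of the differences; any set bit means 'changed'."""
    acc = 0
    for o, n in zip(old, new):
        acc |= float_bits(n - o)
    return acc != 0


old = (1000.0, 0.5)    # hypothetical (cutoff, resonance) from the previous sample
new = (1000.25, 0.25)  # cutoff rose by 0.25, resonance fell by 0.25

print(changed_by_add(old, new))  # False: +0.25 and -0.25 cancel out
print(changed_by_or(old, new))   # True: the OR still sees nonzero bits
print(changed_by_or(old, old))   # False: nothing changed, so skipping is safe
```

The value accumulated by the OR has no numeric meaning; it is only ever compared with zero.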
So far we have covered (almost) everything about the assembler in FS that you’ll ever need to know. There is only one topic left: MEM array management. This would normally belong under Array management, but we need jumps to set up a failsafe that prevents crashing. In Flowstone there is a MEM connector which lets you load wave files and wavetables. Both the Code component and the assembler have a memin input. However, this is a slightly “cheaty” way to use mems: Flowstone basically creates an SSE array and copies the contents of the mem into it. You may then handle this “mem array” as a normal array, because that is basically what it is. It does mean, however, that you can only read data from the mem: if you write data to the array, it has an effect only inside that Code/assembler block, because you are working with a copy of the MEM.
To truly work with the original content of the MEM you need the Flowstone primitive called “to int” (or “mem address” in 3.0.5). This primitive outputs the size of the mem in bytes and the address of the mem as a 32-bit integer. It is a little problematic to get a 32-bit integer losslessly into the assembler, because it accepts only float inputs. However, you can create a float with the same binary structure as the given int. Here are two ways to do it:
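What “same binary structure” means can also be shown outside Flowstone. The Python sketch below is not one of the FS methods; it only illustrates the idea of reinterpreting the bits, and the function names and the address value are made up for this example. Bit patterns that happen to encode NaN may not survive this Python round trip, so treat it as a demonstration only.

```python
import struct


def int_bits_as_float(address):
    """Reinterpret a 32-bit unsigned integer as a float with the same bits."""
    return struct.unpack("<f", struct.pack("<I", address))[0]


def float_bits_as_int(value):
    """Reinterpret a single-precision float as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


address = 0x12345678                    # made-up address, for illustration only
carrier = int_bits_as_float(address)    # a tiny float whose bits equal the address

print(float_bits_as_int(carrier) == address)         # True: the bits survived
print(float_bits_as_int(float(address)) == address)  # False: a numeric conversion changes the bits
```

The second line of output is the reason an ordinary int-to-float conversion is not enough: it preserves the approximate value, not the bit pattern, and a pointer needs the exact bits.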
We can convert the address of a MEM into a float (with the same binary structure), feed that to the assembler and use the value as a pointer to the array. We also need to make sure the MEM is initialized; otherwise we would attempt to read or write data in a nonexistent array = CRASH. We can do so by comparing the address with 0 and -1, which are the default values when the array is not initialized. To read data from a specific place in RAM (directly by address) we may use the following operations:
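The failsafe itself (compare the address with 0 and -1 and skip the access if either matches) can be modelled in a few lines of Python. This is only a simulation under stated assumptions: the dictionary standing in for RAM, the function name safe_read and the fallback value are all hypothetical, and real FS code would do the same with cmp and a conditional jump.

```python
# 0 and -1 viewed as unsigned 32-bit values
UNINITIALIZED = (0, 0xFFFFFFFF)


def safe_read(memory, address, index, fallback=0.0):
    """Read memory[address][index] only if the address looks initialized."""
    if (address & 0xFFFFFFFF) in UNINITIALIZED:
        return fallback  # skip the access, as a jump over the read would
    return memory[address][index]


# Hypothetical RAM: a single mem living at address 0x1000
memory = {0x1000: [0.1, 0.2, 0.3]}

print(safe_read(memory, 0x1000, 1))  # 0.2
print(safe_read(memory, 0, 1))       # 0.0, the read was skipped
print(safe_read(memory, -1, 1))      # 0.0, the read was skipped
```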
Now it is time to show one trick for freeing up the ebx register: you may use ebx if you push its value onto the stack before all your code and pop it after all your code. Now I’ll show you how you can transmit whole arrays of data between assembler blocks using the mem:
Of course, the code could have been written much more elegantly, but for the educational purpose of showing how to free the ebx register it works fine. Notice that the two assembler components are not actually connected: the first one writes data to the mem and the second one reads data from it. This opens up new possibilities for using the assembler to transmit arrays.
The last thing to mention is outline function calls. I will say right away that this is not possible without digging into the machine code and understanding it. Outline function calls should look something like this:
A function call is set up by first writing the code of the function, starting with a label and ending with the ret; command. This whole part containing the function bodies should be jumped over. Then, inside the code, you call the function using the call label; command. This command pushes its position onto the stack and jumps to the label. The code of the function is then executed, and once it reaches the ret; (return) command it pops the position from the stack and jumps back to where the call happened. This ensures the function always knows where to return to. There are conventions for providing inputs and outputs to the function (for example, in the common C convention you push the inputs onto the stack in a specific order and the result is typically returned in a register such as eax). The problem is that the call label; command is not supported in FS. Instead, call reg; is implemented. The value in the register specifies how many bytes of machine code to jump. It is possible to display the machine code, find the call and the position of the first instruction of the function, and count how many bytes to jump… but you would have to do that every time you revise the code, which is nearly impossible and completely impractical. To make a long story short: WRITE YOUR FUNCTIONS INLINE.
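The call/ret mechanism described above (push the return position, jump, then pop and jump back) can be watched in a toy interpreter. The Python sketch below is not FS or x86 code; the instruction set and the names in it are invented purely to show how the stack keeps track of where to return.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs and return the output log."""
    labels = {arg: i for i, (op, arg) in enumerate(program) if op == "label"}
    stack, log, pc = [], [], 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "jmp":
            pc = labels[arg]      # unconditional jump
        elif op == "call":
            stack.append(pc + 1)  # push the position of the next instruction
            pc = labels[arg]
        elif op == "ret":
            pc = stack.pop()      # pop the saved position and continue there
        elif op == "print":
            log.append(arg)
            pc += 1
        else:                     # a label does nothing at run time
            pc += 1
    return log


program = [
    ("jmp", "main"),            # jump over the function so it never runs by accident
    ("label", "hello"),
    ("print", "in function"),
    ("ret", None),
    ("label", "main"),
    ("print", "before"),
    ("call", "hello"),
    ("print", "after"),
    ("call", "hello"),
]

print(run(program))  # ['before', 'in function', 'after', 'in function']
```

The same function body is reached from two different places and returns to the right one each time, because each call leaves its own return position on the stack.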
And that is all there is to say for now… Hope you’ve learned something new and have a good time coding and optimizing in assembly. Cheers…