In the Y86 instruction-set architecture, instruction encoding relies on a little-endian encoding of integers, where the bytes appear in reverse order. However, when register and memory contents are displayed, integers are shown as big-endian numbers, with the bytes appearing in normal order, which can be a little confusing and potentially tricky if you generate self-modifying code.
Consider how the following yo file shows instruction encoding with little-endian coding.
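As a minimal sketch of the byte order involved, the fragment below loads a constant whose bytes are easy to tell apart. The constant and the register are arbitrary choices for illustration, and the object bytes in the comments assume the common Y86 encoding in which irmovl is 30, followed by F and the register number, followed by the four immediate bytes.

```asm
# Illustrative sketch: constant and register are arbitrary choices.
        irmovl $0x12345678, %eax   # object bytes: 30 f0 78 56 34 12
        halt                       # object byte:  00
```

The immediate 0x12345678 lands in the object code as 78 56 34 12, least significant byte first, which is the reversal to watch for when reading or patching instruction bytes.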
Now consider the output from running the program.
Note how the instructions were encoded using little-endian, while both registers and storage were represented using big-endian.
Assembler programmers can be fanatics, attempting to squeeze bytes and cycles out of a functioning program.
One common trick of the trade in the Y86 instruction architecture is to avoid an immediate move when setting a register to zero, as is done, for instance, in the statement irmovl $0, %eax. Instead, the register is zeroed by xor-ing it with itself, for instance by issuing the instruction xorl %ebx, %ebx, as shown in the program below.
Note from the program that the statement irmovl $0, %eax consumes 6 bytes, while the statement xorl %ebx, %ebx consumes only 2 bytes. Furthermore, presumably, the xorl instruction is also far faster than the irmovl instruction. And yet, the two instructions achieve the same purpose: Setting a register to zero.
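The size difference can be sketched as follows. The byte counts in the comments assume the usual Y86 instruction lengths (6 bytes for irmovl, 2 bytes for an arithmetic or logical operation), and the object bytes assume the encoding in which irmovl is 30 and xorl is 63.

```asm
# Two ways of zeroing a register (sketch; encodings are assumptions).
        irmovl $0, %eax      # 6 bytes: 30 f0 00 00 00 00
        xorl   %ebx, %ebx    # 2 bytes: 63 33
        halt
```

One difference is worth keeping in mind: xorl sets the condition codes (the zero flag ends up set), while irmovl leaves them untouched, so the two are only interchangeable where no later jump depends on the earlier flags.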
Sometimes reading a dump can be confusing — in particular if the reference documentation is wrong.
Most reference information for the Y86 instruction architecture indicates that the nop instruction has opcode 0x00 and the halt instruction has opcode 0x10. However, as shown in the program below, in current implementations of YAS the opcode for the nop instruction is 0x10, while the opcode for the halt instruction is 0x00. Current implementations of YIS process these opcodes on the same basis, ensuring correct execution.
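A two-instruction sketch is enough to check which assignment your own tools use. The object bytes in the comments are the ones described above for current YAS builds; an assembler following the older tables would emit them the other way around.

```asm
# Assemble this and compare the yo file with the comments.
        nop      # expected object byte: 10
        halt     # expected object byte: 00
```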
A student-client contacted us about a problem that she was experiencing. She claimed to have entered code that was syntactically valid, but that failed with a peculiar error code.
We recreated the problem, and found that she was correct. We then simplified the failing code, so as to better understand the issue. Here is the simplified code:
Looks good? Here is what happens when YIS crunches the code:
What? So here is the yo code:
And there we see it. Our halt instruction, opcode 0x00, has been inserted at address 0x0000, causing the immediate halt.
Note: if you are wondering why the opcode for the halt instruction is 0x00, rather than 0x10, read this posting.
So YIS does what it is supposed to, but YAS is misbehaving.
A little experimentation showed us that any statement on the last line of the ys file that is not followed by a newline gets processed by YAS as though it had been entered on the first line of the ys file. The solution to our student-client's problem was therefore simple: always terminate the ys file with a newline.
|Lesson learned: For a ys file to be processed correctly by YAS, it should always contain a last line that is blank.|
As we discussed in an earlier posting, assembler programmers can be fanatics, attempting to squeeze bytes and cycles out of a functioning program.
They can also be aestheticians, seeking to create elegant programs. And sometimes they are able to combine aesthetics and efficiency.
When students create their first assembly programs that require looping they tend to count upwards, towards the limit, rather than downwards, towards zero. In the Y86 instruction architecture, however, counting upwards is both inefficient and aesthetically unpleasing compared to counting downwards, because the architecture provides an automatic, neat, and free check for zero each time we execute a subl instruction, setting the zero flag as appropriate.
As we can see from the program below, we can achieve a 20-iteration loop without explicitly declaring and testing the loop limit, simply by counting down and allowing the architecture to do its thing.
Note that the first loop, built around a count-upwards approach, consumes 0x25 bytes, while the second loop, built around a count-downwards approach, consumes 0x22 bytes, yielding an 8% improvement. Moreover, it looks better, and by eliminating the explicit declaration of a loop limit, we have one less register integrity issue to worry about.
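The two styles can be sketched as below. The labels and register choices are hypothetical, the loop bodies are left as placeholders, and the byte counts in the comments apply to this sketch only, assuming the usual Y86 instruction lengths (irmovl 6 bytes, rrmovl and arithmetic operations 2 bytes, jumps 5 bytes).

```asm
# Count upwards: needs an explicit limit and an explicit comparison.
          irmovl $0, %eax       # counter = 0               (6 bytes)
          irmovl $1, %ebx       # step = 1                  (6 bytes)
          irmovl $20, %ecx      # limit = 20                (6 bytes)
UpLoop:   addl   %ebx, %eax     # counter = counter + 1     (2 bytes)
          # ... loop body would go here ...
          rrmovl %eax, %edx     # scratch copy of counter   (2 bytes)
          subl   %ecx, %edx     # edx = counter - limit     (2 bytes)
          jne    UpLoop         # repeat until they match   (5 bytes)

# Count downwards: subl sets the zero flag for us.
          irmovl $20, %eax      # counter = 20              (6 bytes)
          irmovl $1, %ebx       # step = 1                  (6 bytes)
DownLoop: # ... loop body would go here ...
          subl   %ebx, %eax     # counter = counter - 1     (2 bytes)
          jne    DownLoop       # repeat until counter is 0 (5 bytes)
          halt
```

In this sketch the count-up loop takes 29 bytes and ties up two extra registers, while the count-down loop takes 19 bytes. Both run 20 iterations. The only constraint in the second form is that nothing between the subl and the jne may change the condition codes.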
Sometimes faster and smaller truly is more beautiful.
Here is a question that we get a lot:
“The nop instruction is content neutral, the equivalent of doing nothing, so what is its value?”
Some computer instruction sets include an instruction whose explicit purpose is not to change the state of any of the programmer accessible registers, status flags, or memory and which may require a specific number of clock cycles to execute. In other instruction sets, a nop or noop instruction does not exist and has to be simulated by executing an instruction having operands that cause the same effect (on a SPARC processor, for instance, the instruction sethi 0, %g0 is the recommended way of achieving NOP.)
NOP instructions are most commonly used for timing purposes, to force memory alignment, to prevent hazards, to occupy a branch delay slot, or as a place-holder to be replaced by active instructions later on in program development (or to replace removed instructions when refactoring would be problematic or time-consuming). In some cases, nop can have minor side effects; for example, on the Motorola 68000 series of processors, the nop opcode causes a synchronization of the pipeline.
Y86 is related to the Intel x86 architecture. In the x86 architecture, the nop instruction is a one-byte instruction with opcode 0x90 (note: i686 allows for longer NOP instructions), the equivalent of the xchg eax, eax instruction, which also has the 0x90 opcode.
Why the nop instruction matters in Y86
In the Y86 instruction architecture, the nop instruction is a one-byte instruction with opcode 0x10, which can be used for the usual purposes, i.e. timing, forced memory alignment, prevention of hazards, occupation of a branch delay slot, or place-holding.
Note: If you are wondering why the opcode for the nop instruction is 0x10, rather than 0x00, read this posting.
If you are running the SEQ implementation of the Y86 instruction architecture, you will probably have access to the excellent SSIM simulator, which supports single-stepping through your assembler program, a critical debugging feature, and also enables visualization of the processor activity. Some of those applications of Y86 can be considered pretty exotic, but are really very useful in a real-life scenario (to see why this is the case, read this posting about self-modifying code or read this posting about error fixing on the fly).
However, in the SSIM simulator only instructions that generate opcodes appear, and, so, if you have a program label on a stand-alone line, it will not appear in the simulator, which can be somewhat confusing. To see program labels in the SSIM simulator, simply add a nop instruction on each program label line.
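A minimal sketch of the convention follows. The label names (Main, Loop, Done, Count) and the small countdown it performs are hypothetical and serve only to show where the nop instructions go.

```asm
# Each program label carries a nop so that it generates an opcode.
Main:   nop                    # program label, visible when stepping
        irmovl $20, %eax       # counter = 20
        irmovl $1, %ebx        # step = 1
Loop:   nop                    # program label, visible when stepping
        subl   %ebx, %eax      # counter = counter - 1, sets zero flag
        jne    Loop            # repeat until counter reaches zero
Done:   nop                    # program label, visible when stepping
        halt

        .align 4
Count:  .long 0                # data label: no nop attached
```

The nop inside the loop does not disturb the condition codes, so the subl and jne pair still behaves as before. The price is one extra byte per program label.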
In the above example, notice how we have attached a nop instruction to each program label (but not to the data labels). This will cost us an occasional byte, but will increase our ability to debug in SSIM.
|Lesson learned: The nop instruction is highly useful in the Y86 instruction architecture. Issuing a nop instruction on all program labels that would otherwise be empty will ensure that the program label is included in the SSIM simulator’s execution of the yo file, making the program labels visible and, therefore, greatly helping you in debugging your Y86 assembler code.|