ARM stands for Advanced RISC Machine; it was previously known as Acorn RISC Machine.
The ARM company was founded in 1990, and its headquarters are in Cambridge, UK.
The company has design centers in Cambridge, Austin and Sophia Antipolis, and other sales and engineering offices spread all over the world.
It is the leading company in the semiconductor IP industry, with higher revenue than any other semiconductor IP vendor.
The company is best known for its ARM processors, which are used in almost all well-known smartphones. The company also designs and licenses IP, and develops software tools and cell libraries. ARM today has more than one thousand partner companies, which form the ARM Connected Community. For example, ARM does not produce its own silicon; the chips are manufactured by its silicon partners. The ARM Connected Community also includes design support partners as well as software, training and consortia partners.
Some well-known applications of ARM are smartphone processors, processors for some laptops, Nintendo handheld consoles such as the Game Boy Advance, and TV systems.
Thumb Instruction set:
The Thumb instruction set is obtained by compressing a subset of the ARM instruction set from 32-bit instructions into 16-bit instructions. The main goal of this compression is to improve code density; on average, the compression alone gives an improvement of about 40%.
Definition of code density: the more work the microprocessor performs per instruction, and the less memory each instruction occupies, the higher the code density.
Thumb is not a complete architecture, because it does not cover all the instructions provided by the ARM instruction set. Some ARM instructions have no Thumb equivalent, and hence every classic core that supports the 16-bit Thumb set also supports the ARM instruction set.
Conversely, Thumb does not contain any instruction that is missing from the ARM instruction set.
Another benefit of Thumb is that it can improve performance in some cases: on a processor with a 16-bit data bus, Thumb code performs better than ARM code, whereas on a processor with a 32-bit data bus, ARM code performs better than Thumb code.
This technology is used in memory-constrained systems such as smartphones, because code density, which Thumb provides, is very important for these systems.
To switch to Thumb state, the programmer only has to execute the BX (Branch and Exchange) instruction with bit 0 of the target address set to 1. BX copies that bit into the T bit of the CPSR, and the processor then works in Thumb state. If T = 0, the processor is in ARM state.
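The effect of BX on the T bit can be modelled in a few lines of C. This is only a sketch under stated assumptions: the struct `cpu_state` and the functions `branch_exchange` and `in_thumb_state` are hypothetical names chosen for illustration, not part of any ARM API. It assumes the architectural behaviour that BX copies bit 0 of the target address into the T bit (bit 5 of the CPSR) and branches to the address with that bit cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPSR_T_BIT (1u << 5)  /* T (Thumb) flag sits at bit 5 of the CPSR */

/* Hypothetical, minimal model of the state touched by BX. */
struct cpu_state {
    uint32_t pc;
    uint32_t cpsr;
};

/* Model of "BX Rm": bit 0 of the target selects the state,
 * the remaining bits are the branch address. */
void branch_exchange(struct cpu_state *cpu, uint32_t target)
{
    assert(cpu != NULL);
    if (target & 1u)
        cpu->cpsr |= CPSR_T_BIT;   /* T = 1: continue in Thumb state */
    else
        cpu->cpsr &= ~CPSR_T_BIT;  /* T = 0: continue in ARM state */
    cpu->pc = target & ~1u;        /* bit 0 is never part of the address */
}

/* Returns non-zero when the modelled processor is in Thumb state. */
int in_thumb_state(const struct cpu_state *cpu)
{
    assert(cpu != NULL);
    return (cpu->cpsr & CPSR_T_BIT) != 0;
}
```

Calling `branch_exchange(&cpu, 0x8001)` in this model leaves `pc` at 0x8000 with T set, while a later `branch_exchange(&cpu, 0x9000)` clears T again.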
Because instructions are compressed to 16 bits, a Thumb data-processing instruction has no room to encode a shift of one of its operands, so every shift has to be issued as a separate instruction. In ARM state, a shift can be combined with an ALU operation in a single instruction.
Here we have two code listings, one in ARM code and the other in Thumb code. Both have the same functionality and perform the same operations. The ARM code has 5 instructions, each 32 bits (4 bytes) long, so it occupies 5 * 4 = 20 bytes of memory. The Thumb code has 6 instructions, but each one is only 16 bits (2 bytes) long, so it occupies 6 * 2 = 12 bytes, which is considerably less than the ARM code. We can conclude that this Thumb code is much denser than the ARM code. On average, Thumb code takes 70% of the memory space of the equivalent ARM code and consumes 30% less memory power.
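The size comparison above is plain arithmetic, and can be written out as a small C sketch. The names `code_size_bytes` and `percent_of` are hypothetical helpers introduced only for this illustration; the instruction counts (5 and 6) are the ones quoted in the text.

```c
#include <assert.h>

#define ARM_INSN_BYTES   4u  /* an ARM instruction is 32 bits long */
#define THUMB_INSN_BYTES 2u  /* a 16-bit Thumb instruction is 2 bytes long */

/* Bytes of memory taken by a sequence of fixed-width instructions. */
unsigned code_size_bytes(unsigned insn_count, unsigned bytes_per_insn)
{
    assert(bytes_per_insn == ARM_INSN_BYTES ||
           bytes_per_insn == THUMB_INSN_BYTES);
    return insn_count * bytes_per_insn;
}

/* Size of `part` as a whole-number percentage of `whole`. */
unsigned percent_of(unsigned part, unsigned whole)
{
    assert(whole != 0u);
    return (part * 100u) / whole;
}
```

With the figures from the example, `code_size_bytes(5, ARM_INSN_BYTES)` gives 20 and `code_size_bytes(6, THUMB_INSN_BYTES)` gives 12, so this particular Thumb sequence is 60% of the size of the ARM one, a little better than the quoted 70% average.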
The ADD instruction adds #3 to the register R0 and puts the result back in R0. This operation takes 2 operands in Thumb and 3 operands in ARM. In general, most Thumb instructions have a two-address format, and most ARM instructions have a three-address format.
In the figure, we can also see the representation of the CPSR in Thumb state. All the letters are in lower case, which means that the corresponding bits are equal to 0, except for the T bit, which is in upper case because it is set to 1. The mode field reads SVC, which means the processor is running in Supervisor mode.
Most Thumb instructions are unconditional (only branches can be conditional), whereas almost all ARM instructions can be executed conditionally.
Register    Access in Thumb state
r0-r7       fully accessible
r8-r12      only accessible by MOV, ADD, and CMP
r13 (sp)    limited accessibility
r14 (lr)    limited accessibility
r15 (pc)    limited accessibility
cpsr        only indirect access
spsr        no access
The ARM processor register set is presented in the table above. When the processor runs in Thumb state, Thumb instructions have full access to registers r0 to r7 and only limited access to registers r8 to r15, which can be reached by just a few instructions such as MOV, ADD and CMP. Because there is no direct access to the CPSR and SPSR, some ARM instructions cannot be performed, such as MRS (which saves the CPSR value in a register) and MSR (which writes a register value to the CPSR). This shows that not every ARM instruction has an equivalent Thumb instruction.
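The register table can be read as a simple classification by register number. The following C sketch encodes it directly; the enum `thumb_access` and the function `thumb_register_access` are hypothetical names used for illustration only, and the three classes are taken from the table rather than from any ARM header.

```c
#include <assert.h>

/* Access classes for general registers in Thumb state, as listed in the table. */
enum thumb_access {
    THUMB_FULL,      /* r0-r7: usable by every Thumb instruction */
    THUMB_HIGH_REG,  /* r8-r12: only reachable through MOV, ADD and CMP */
    THUMB_LIMITED    /* r13 (sp), r14 (lr), r15 (pc): limited accessibility */
};

/* Classifies a register number (0..15) according to the table. */
enum thumb_access thumb_register_access(unsigned reg)
{
    assert(reg <= 15u);
    if (reg <= 7u)
        return THUMB_FULL;
    if (reg <= 12u)
        return THUMB_HIGH_REG;
    return THUMB_LIMITED;
}
```

The CPSR and SPSR are left out on purpose: they are not numbered general registers, and in Thumb state they have only indirect access or no access at all.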
VFP stands for Vector Floating Point. It is a coprocessor (a processor that assists the main processor) that is mapped as coprocessors 10 and 11 in the ARM processor. It provides hardware support for floating-point arithmetic in half, single and double precision.
Single-precision floating point: a number format that takes 4 bytes in memory.
Double-precision floating point: a number format that takes 8 bytes in memory.
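The two sizes can be checked from C. This sketch assumes a toolchain in which `float` and `double` are the IEEE 754 single- and double-precision formats, which is the usual case for ARM compilers but is not guaranteed by the C standard; the function names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Storage size in bytes of the two formats, as seen by the C compiler. */
size_t single_precision_bytes(void) { return sizeof(float); }
size_t double_precision_bytes(void) { return sizeof(double); }

/* Aborts if the toolchain does not use the layouts assumed in this sketch:
 * 4 bytes with a 24-bit significand, and 8 bytes with a 53-bit significand. */
void check_float_layout(void)
{
    assert(sizeof(float) == 4 && FLT_MANT_DIG == 24);
    assert(sizeof(double) == 8 && DBL_MANT_DIG == 53);
}
```

The wider significand of the double-precision format is what gives it the extra precision, at the price of twice the memory per value.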
The VFP hardware can perform a wide range of single- or double-precision operations such as MUL, SQRT and COPY. Most of these operations are executed in a single instruction cycle; a few, such as MUL, take two instruction cycles.
In general, the main purpose of the VFP is to accelerate floating-point operations. Without the VFP, the processor cannot perform these operations in hardware and has to emulate them in software, which is much slower.
The most important applications of the VFP are automotive control applications such as powertrain, 3D graphics such as games, imaging such as laser printers and digital still cameras, and industrial control systems such as motion control. All these applications need precision and predictability, which the VFP provides.
ARM has introduced three VFP versions. The first version, VFPv1, is obsolete, and information about it is provided by ARM upon request. The second version, VFPv2, is provided in the ARMv5 and ARMv6 architectures. Finally, VFPv3 is available in the ARMv7 architecture, including ThumbEE, and can be implemented with either 32 or 16 doubleword registers. This newer version adds some new arithmetic operations and is more precise.
NEON instruction set:
NEON technology is the name of the 128-bit SIMD extension added to the ARM processor. It is now found across the ARM Cortex-A processors, and it works with its own register file and pipeline.
NEON instructions can be used in both ARM and Thumb states, which leads to more efficient software development and integration compared to using the VFP accelerator. NEON can also perform floating-point arithmetic, so today it has largely taken over that role from the VFP.
It also accelerates multimedia and signal processing algorithms for pictures, audio, video, 3D graphics, games and telephony, with about three times the performance of ARMv5 and two times the performance of ARMv6.
NEON instructions also perform data type conversion, data processing and memory access.
NEON has 32 registers of 64 bits, which can also be viewed as 16 registers of 128 bits, as shown in the figure:
In the figure above, each register is treated as a vector of elements, where all elements of a vector have the same data type. These data types can be signed or unsigned. Each element occupies one lane, and the same operation is performed in all lanes.
void add_int(int * __restrict pa,
             int * __restrict pb,
             unsigned int n, int x)
{
    unsigned int i;
    /* n & ~3u rounds n down to a multiple of four */
    for (i = 0; i < (n & ~3u); i++)
        pa[i] = pb[i] + x;
}
This code does not use NEON. The for loop runs serially over the first n & ~3 elements (n rounded down to a multiple of four): each iteration adds x to pb[i] and stores the result in pa[i].
With four elements, for example, the four additions are performed one after the other.
To take advantage of NEON, the loop can be rewritten (or unrolled by a vectorizing compiler) in this form:
void add_int(int *pa, int *pb,
             unsigned n, int x)
{
    unsigned int i;
    /* one iteration per group of four elements */
    for (i = ((n & ~3u) >> 2); i; i--)
    {
        *(pa + 0) = *(pb + 0) + x;
        *(pa + 1) = *(pb + 1) + x;
        *(pa + 2) = *(pb + 2) + x;
        *(pa + 3) = *(pb + 3) + x;
        pa += 4; pb += 4;
    }
}
With NEON, the four additions in the loop body can be carried out at the same time. The four values pb[0] to pb[3] are loaded into one vector register, the value x is copied into every lane of a second vector register, a single instruction adds the two vectors lane by lane, and the resulting vector is stored to pa[0] to pa[3].
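The unrolled loop above maps naturally onto NEON intrinsics. The following is a minimal sketch, not the code a compiler would necessarily generate: the function name `add_int_neon` is hypothetical, the element type is fixed to `int32_t` so that it matches the intrinsics, and a scalar fallback is included so that the same source also builds on targets where `__ARM_NEON` is not defined. The intrinsics used (`vdupq_n_s32`, `vld1q_s32`, `vaddq_s32`, `vst1q_s32`) come from the standard `arm_neon.h` header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Adds x to the first (n & ~3) elements of pb and stores the results in pa.
 * Any remaining 1 to 3 tail elements are left untouched, as in the loops above. */
void add_int_neon(int32_t *restrict pa, const int32_t *restrict pb,
                  unsigned int n, int32_t x)
{
    assert(pa != NULL && pb != NULL);
    unsigned int blocks = (n & ~3u) >> 2;   /* number of 4-element groups */

#if defined(__ARM_NEON)
    int32x4_t vx = vdupq_n_s32(x);          /* x copied into all four lanes */
    for (; blocks; blocks--) {
        int32x4_t vb = vld1q_s32(pb);       /* load pb[0..3] into one 128-bit register */
        vst1q_s32(pa, vaddq_s32(vb, vx));   /* four additions at once, then store */
        pa += 4;
        pb += 4;
    }
#else
    /* Scalar fallback: same result, one lane at a time. */
    for (; blocks; blocks--) {
        for (int lane = 0; lane < 4; lane++)
            pa[lane] = pb[lane] + x;
        pa += 4;
        pb += 4;
    }
#endif
}
```

Calling it with n = 6 processes exactly one group of four elements and ignores the last two, which is the effect of the `n & ~3` mask.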
NEON technology is used today in ARM Cortex-A and ARM Mali designs, which are found in smartphones, mobile computing devices and HDTVs.
Jazelle instruction set:
Jazelle is both a hardware solution and a software solution from ARM. At its core, it is a hardware extension to the processor whose main purpose is to accelerate the execution environment. The beauty of Jazelle is that it provides high performance at low cost, with low memory consumption and a low power budget. Its software part is a Java virtual machine.
Definition of the Java virtual machine: Java code does not run directly on the machine, so it is executed by a virtual machine, which interprets the Java instructions and passes them on to the processor.
Jazelle is found today in many ARM processors. It provides high performance for games and apps and a faster startup time. It also provides an excellent user experience at a very low system cost, and it has a very wide range of industry adoption.
To switch to Jazelle state, software executes the BXJ (Branch and Exchange Jazelle) instruction, which sets the J bit in the CPSR to 1; the processor is then in Jazelle state.
There are two modes in Jazelle: Jazelle DBX and Jazelle RCT.
First, Jazelle DBX, which stands for Jazelle Direct Bytecode Execution. The main aim of this mode is to allow the processor to execute Java bytecodes.
Java bytecode is the instruction set that Java virtual machines use to execute Java programs.
Jazelle DBX makes it possible to use Java on mobile devices without hurting memory consumption or battery life, while improving the startup time and the user experience. It is a proven solution, it is integrated with Java platforms, and it needs no compilation time. Today, smartphone manufacturers support Jazelle DBX to get faster execution of Java games and apps.
Second, Jazelle RCT, which stands for Jazelle Runtime Compilation Target.
It differs from Jazelle DBX and runs in ThumbEE state. It supports AOT and JIT compilation.
JIT (just-in-time) compilation is performed at run time and converts Java bytecodes into native instructions that the processor can execute directly.
AOT (ahead-of-time) compilation is performed before the program runs and transforms a high-level language such as Java into low-level machine code.
Jazelle RCT makes AOT compilation more practical, so it is used more often alongside JIT compilation.
Finally, Jazelle DBX and Jazelle RCT are like two flavors of Jazelle: Jazelle DBX is used mainly to save memory space, and Jazelle RCT is used to achieve very high performance.
To clarify state switching in ARM, consider the following two bits of the CPSR, the J bit and the T bit:
J  T  State
0  0  ARM
0  1  Thumb
1  0  Jazelle DBX
1  1  Jazelle RCT (ThumbEE)
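The J/T combinations listed above can be decoded from a CPSR value with two bit tests. This is a sketch for illustration: the enum and function names are hypothetical, and it assumes the architectural bit positions, with T at bit 5 and J at bit 24 of the CPSR.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_T_BIT (1u << 5)   /* Thumb state flag */
#define CPSR_J_BIT (1u << 24)  /* Jazelle state flag */

enum isa_state { STATE_ARM, STATE_THUMB, STATE_JAZELLE, STATE_THUMBEE };

/* Maps the (J, T) pair of a CPSR value to the instruction set state. */
enum isa_state decode_isa_state(uint32_t cpsr)
{
    int j = (cpsr & CPSR_J_BIT) != 0;
    int t = (cpsr & CPSR_T_BIT) != 0;

    if (!j && !t) return STATE_ARM;      /* J=0, T=0 */
    if (!j &&  t) return STATE_THUMB;    /* J=0, T=1 */
    if ( j && !t) return STATE_JAZELLE;  /* J=1, T=0 */
    return STATE_THUMBEE;                /* J=1, T=1 */
}

/* Human-readable name for a decoded state. */
const char *isa_state_name(enum isa_state state)
{
    switch (state) {
    case STATE_ARM:     return "ARM";
    case STATE_THUMB:   return "Thumb";
    case STATE_JAZELLE: return "Jazelle DBX";
    case STATE_THUMBEE: return "Jazelle RCT / ThumbEE";
    }
    assert(!"unknown state");
    return "?";
}
```

All other CPSR bits (condition flags, interrupt masks, mode field) are ignored by the decoder, so a full CPSR value can be passed in unchanged.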
TrustZone technology is a system-on-chip (SoC) and CPU system-wide approach to security for the processor.
This technology is found on billions of devices on the market, such as smartphones, tablets and laptops, where it protects their content. It is available on ARM Cortex-A processors and on newer Cortex-M processors.
TrustZone on ARM Cortex-A processors is used to run a trusted boot and a trusted OS, which together create the TEE.
The TEE (trusted execution environment) is a secure area in the processor created by TrustZone.
All applications that run in the TEE are known as trusted applications.
When TrustZone is added to Cortex-M processors, it brings a benefit not found on other TrustZone-enabled processors: switching between the secure and non-secure areas is done in hardware, which makes the switch faster and improves power efficiency.
On the software side, TrustZone offers protected debug operations.
TrustZone also handles interrupts automatically and protects registers, which ensures a highly secure system.
Assume that the picture represents the processor of a smartphone; the green region is the trusted area, called the TEE. TrustZone creates this separation using an SAU (Security Attribution Unit). It isolates hardware, software and data from the non-trusted area.
The environment in the green area runs in parallel with the smartphone's operating system but is isolated from that rich environment. TrustZone uses both hardware and software to protect its data, which provides a high level of security. An application running in the trusted area can use both trusted and non-trusted data, but an application running in the non-trusted area can only access non-trusted data.
A processor with TrustZone is like a normal processor, but it adds one bit to every piece of data travelling through the processor, the buses, the memory and the cache. This extra bit is a tag, known as the NS (non-secure) bit: if it is 0 the data is secure, and if it is 1 the data is non-secure.
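The access rule described above (trusted code may touch everything, non-trusted code only non-trusted data) can be sketched as a one-line check on the tag. This is a conceptual model only, with hypothetical names (`world`, `tagged_word`, `access_allowed`); real TrustZone enforcement happens in hardware on the bus and memory system, and the sketch deliberately uses symbolic tags rather than a particular bit encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two worlds separated by TrustZone. */
enum world { WORLD_NON_SECURE, WORLD_SECURE };

/* Hypothetical tagged data item: the payload plus its one-bit security tag. */
struct tagged_word {
    unsigned int value;
    enum world   tag;
};

/* Secure requesters may access any data; non-secure requesters
 * may access only data tagged as non-secure. */
bool access_allowed(enum world requester, const struct tagged_word *data)
{
    assert(data != NULL);
    if (requester == WORLD_SECURE)
        return true;
    return data->tag == WORLD_NON_SECURE;
}
```

Of the four requester and tag combinations, only a non-secure requester reading secure data is rejected.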