
26bit and 32bit Mode

Alban edited this page Feb 14, 2021 · 5 revisions

26bit and 32bit ARM modes

None of the Acorn computer systems could take gigabytes of memory.

The original systems shipped with 512K to 4MB of memory, back when 4MB was very expensive. Applications written for RISC OS are seldom very large.

Over time, the supported physical memory grew. I recall the Risc PC maxed out at 256MB of memory and 2MB of VRAM. Even so, it still made sense for these systems to use a 26bit addressing mode, which gives a 64MB (2^26 byte) address space. Applications could cheerfully run in this (at the time huge) address space. If there were many cooperating applications, they all benefited from the larger physical memory.

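In 26bit mode, the processor packs its status and control bits into the program counter register (R15) alongside the address. The flags N, Z, C, V, I and F occupy bits 31-26 and the processor mode occupies bits 1-0. A BL instruction copies all of this into the link register. This was efficient, but it means these registers do not hold just addresses.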
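The Python model below makes this layout concrete for a 26bit R15 value, and therefore a link value. It is a sketch for illustration only, not code from the kernel. The helper names `pack_r15` and `unpack_r15` are hypothetical. The model assumes the standard 26bit layout: flags in bits 31-26, the word-aligned address in bits 25-2 and the mode in bits 1-0.

```python
# Illustrative model of the 26bit ARM R15 layout (hypothetical helpers, not kernel code).
# Bits 31-26: flags N Z C V I F; bits 25-2: word-aligned address; bits 1-0: mode M1 M0.

PC_BITS_26 = 0x03FFFFFC  # address part of R15 in 26bit mode
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}


def pack_r15(address, flags="", mode=0):
    """Combine an address, flag letters and a 2-bit mode into one 26bit R15 value."""
    if address & ~PC_BITS_26:
        raise ValueError("address must be word aligned and below 64MB")
    if not 0 <= mode <= 3:
        raise ValueError("mode must fit in two bits")
    value = address | mode
    for name in flags:
        value |= 1 << FLAG_BITS[name]
    return value


def unpack_r15(r15):
    """Split a 26bit R15 (or link) value into (address, flag letters, mode)."""
    flags = "".join(name for name, bit in FLAG_BITS.items() if r15 & (1 << bit))
    return r15 & PC_BITS_26, flags, r15 & 0b11


# Return address &8004 saved by a BL while Z and C are set, in SVC mode (mode bits 11).
link = pack_r15(0x8004, flags="ZC", mode=3)
print(hex(link))         # 0x60008007
print(unpack_r15(link))  # (32772, 'ZC', 3)
```

The printed link value, 0x60008007, shows why a 26bit DOCOL cannot use link directly as an address.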

In the FORTH kernel, the flow of control often depends on the position of the program counter, which the kernel finds by examining the link register. Because it uses the address from the link register, it must mask out the 26bit flags.

DOCOL for 26bit.

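labcreate docol
  \ push the caller's IP onto the return stack
  stmfd rp !, { ip }
  \ IP = link with the 26bit flag and mode bits (31-26 and 1-0) cleared
  bic ip, link, # &fc000003
next c;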

Every regular (colon) FORTH word starts with DOCOL. DOCOL saves IP onto the return stack and sets IP to the start of the current word. This is part of the subroutine call mechanism and is run by the callee.

IP (the interpreter pointer) is essentially the program counter of the FORTH virtual machine. IP is saved on the FORTH return stack; RP is the return stack pointer.

For 26bit code, the bic instruction clears the flag and mode bits in link (the hardware return address) and loads the cleaned address into IP.
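The Python below sketches what the 26bit DOCOL computes. It models `bic ip, link, # &fc000003` as an AND with the inverted mask, truncated to 32 bits. The function names are hypothetical. The link value assumes a return address of &8004 saved with the Z and C flags set and mode bits 11. The model also checks that &FC000003 can be encoded as a single ARM immediate: &FF rotated right by 6.

```python
# Python model of the IP computed by the 26bit DOCOL (hypothetical names, not kernel code).
MASK_26 = 0xFC000003  # 26bit flag bits (31-26) and mode bits (1-0)
WORD = 0xFFFFFFFF     # registers are 32 bits wide


def bic(rn, imm):
    """ARM BIC: result = rn AND NOT imm, kept to 32 bits."""
    return rn & ~imm & WORD


def ror32(value, amount):
    """Rotate a 32-bit value right, as ARM does when decoding an immediate."""
    return ((value >> amount) | (value << (32 - amount))) & WORD


def docol_26_ip(link):
    """IP set by the 26bit DOCOL: link with the flag and mode bits cleared."""
    return bic(link, MASK_26)


# &FC000003 is a valid ARM immediate (8-bit &FF rotated right by an even amount, 6),
# so one bic instruction is enough.
assert ror32(0xFF, 6) == MASK_26

# link after a BL in 26bit mode: address &8004, Z and C set, mode bits 11
link = 0x60008007
print(hex(docol_26_ip(link)))  # 0x8004
```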

In 32bit mode, the status flags are held in a separate register (the CPSR), so link holds nothing but the address. The mask would clear bits of valid addresses.

The 26bit kernel still works in 32bit mode because the FORTH dictionary is located in application memory, which starts at 32K, and the application area is typically 640K or smaller. The mask only changes an address that lies at or above 64MB (&4000000) or is not word aligned, and ARM-state return addresses are always word aligned. It is improbable, verging on impossible, that our FORTH IP would find itself that high in memory.
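A short Python sketch can check the argument above. It compares the IP produced by the 26bit DOCOL (bic) with the IP produced by the 32bit DOCOL (mov) for 32bit-mode return addresses. The helper names are hypothetical. The 640K application area starting at 32K (&8000) is taken from the text as a typical case.

```python
# Where do the 26bit (bic) and 32bit (mov) DOCOLs disagree in 32bit mode? (illustrative sketch)
MASK_26 = 0xFC000003
WORD = 0xFFFFFFFF


def ip_from_bic(link):
    return link & ~MASK_26 & WORD   # 26bit DOCOL


def ip_from_mov(link):
    return link & WORD              # 32bit DOCOL


def same_ip(link):
    return ip_from_bic(link) == ip_from_mov(link)


APP_START = 0x8000        # application space starts at 32K
APP_SIZE = 640 * 1024     # typical application area size

# Every word-aligned return address in the application area is left unchanged by the mask.
assert all(same_ip(a) for a in range(APP_START, APP_START + APP_SIZE, 4))

# An address at or above 64MB would be silently redirected: &4008000 becomes &8000.
print(hex(ip_from_bic(0x04008000)))  # 0x8000
```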

Nonetheless, 32bit apps might be harmed by this 26bit code, and they are not helped by it in any way. I also assumed that the unnecessary bit clear would cost more than a move.

The DOCOL for 32bit just uses a move instruction.

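labcreate docol
  \ push the caller's IP onto the return stack
  stmfd rp !, { ip }
  \ IP = link unchanged: in 32bit mode every bit of link is address
  mov ip, link
next c;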

I assumed this would be faster, but the difference is not obvious at all in benchmarks.

Best practice for RISC OS

The best practice for RISC OS (advice from the ROOL forum) is to write code that behaves correctly on both 26bit and 32bit systems.

To do that, you need to test for the mode.

Here is a FORTH utility word that tests which mode the processor is in.

\ util2

\ is the ARM in 32 bit mode.
\ true if running in 32 bit mode false otherwise.

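code 32bit?
 \ push the previous top of stack; tos will hold the result
 stmfd sp !, { tos }
 \ r0 EOR r0 is zero, so this sets the Z flag
 teq r0, r0
 \ 32bit: both reads of pc match, so EQ
 \ 26bit: pc as the first operand omits the flag bits, as the second it includes them (Z is set), so NE
 teq pc, pc
 \ true (-1) in 32bit mode
 mvn eq tos, # 0
 \ false (0) in 26bit mode
 mov ne tos, # 0
next c;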

: mode? 32bit? if ." 32 " else ." 26 " then ." bit mode " cr ;
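The Python below simulates the two `teq` instructions to show why `32bit?` works. It is a model under stated assumptions and does not read the real processor state. It assumes the 26bit behaviour: R15 read as the first operand (Rn) gives only the address bits, and R15 read as the second operand (Rm) also includes the flag and mode bits. In 32bit mode, R15 is always just the address. The function names are hypothetical.

```python
# Simulation of "teq r0, r0 / teq pc, pc" (hypothetical names; a model, not real hardware).
PSR_BITS_26 = 0xFC000003
Z_FLAG = 1 << 30


def read_r15(address, psr, first_operand, mode_32):
    """Value seen by a data processing instruction that reads R15."""
    if mode_32 or first_operand:
        return address                      # address bits only
    return address | (psr & PSR_BITS_26)    # 26bit Rm: address plus flags and mode


def forth_32bit_flag(address, mode_32, psr=0, set_z=True):
    """Return the Forth flag that 32bit? leaves on the stack: -1 (true) or 0 (false)."""
    if set_z:
        psr |= Z_FLAG                       # teq r0, r0: r0 EOR r0 is 0, so Z is set
    rn = read_r15(address, psr, True, mode_32)
    rm = read_r15(address, psr, False, mode_32)
    equal = (rn ^ rm) == 0                  # teq pc, pc sets Z when the two reads match
    return -1 if equal else 0               # mvn eq tos, # 0 / mov ne tos, # 0


print(forth_32bit_flag(0x8000, mode_32=True))   # -1
print(forth_32bit_flag(0x8000, mode_32=False))  # 0
# Without teq r0, r0, user mode with every flag clear (including I and F)
# would be misreported as 32bit:
print(forth_32bit_flag(0x8000, mode_32=False, set_z=False))  # -1
```

The last line shows why the word sets Z first: it guarantees that at least one flag bit differs between the two reads of pc in 26bit mode.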

The best practice is to always run these test instructions and then conditionally do either a mov or a bic. RISC OS developers describe this as part of being 32 bit clean, meaning the same compiled app works on either system.
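A 32 bit clean DOCOL therefore has to choose between the two operations at run time. The Python below is a hedged sketch of that choice only. The function `clean_docol_ip` and its `mode_32` argument are hypothetical. In a real kernel, the mode would come from a test such as `32bit?` rather than from a parameter.

```python
# Sketch of the "32 bit clean" choice: mov in 32bit mode, bic in 26bit mode (hypothetical).
MASK_26 = 0xFC000003
WORD = 0xFFFFFFFF


def clean_docol_ip(link, mode_32):
    """IP a 32 bit clean DOCOL would compute from the link register."""
    if mode_32:
        return link & WORD             # mov ip, link
    return link & ~MASK_26 & WORD      # bic ip, link, # &fc000003


print(hex(clean_docol_ip(0x60008007, mode_32=False)))  # 0x8004 (flags stripped)
print(hex(clean_docol_ip(0x04008000, mode_32=True)))   # 0x4008000 (high address kept)
```

Both paths give correct results. The cost is that every word now has to make this decision.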

I imagine that this situation is fairly rare in most apps, whereas in Forth the test happens for every word that runs, so the Forth app executes the test code all of the time.

I tested the impact by amending every word that uses bic on link. The result was significantly slower than either the 26bit or the 32bit code.

So, however smart the Cortex-A72 branch predictor is, it is not smart enough for this.

Conclusion

I tried making the kernel 32 bit clean, which involved adding test instructions to a number of words. When I tested the 32 bit clean kernel, it was consistently and measurably slower.

Although I am interested in running RISC OS 5.28 and above on modern ARM processors, I do not want to prevent earlier systems from also working.

For now, I have split the kernel into 26bit and 32bit versions. There is no significant performance difference between the two kernels. Both work on the 32bit systems that I have tested; only the 26bit kernel would work on a 26bit operating system.

I can imagine scenarios where masking out parts of the address space could cause issues on a 32bit system with a 4GB address space. In practice, though, Forth code is unlikely to use much more than a few megabytes of space starting at 32K.

Other approaches

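On x86_64, mixing code and data in the same memory also has a major performance impact. Writes close to executing code can trigger the processor's self-modifying-code handling.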

Some versions of FORTH do split the dictionary into different code and data sections.

This involves tracking multiple dictionary pointers, so it is obviously more complicated.
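The Python below is a minimal sketch of a dictionary split into code and data regions, each with its own "here" pointer, to illustrate the extra bookkeeping. The class, method names and base addresses are hypothetical and are not taken from any particular FORTH.

```python
# Minimal sketch of a FORTH dictionary split into code and data regions (hypothetical).
class SplitDictionary:
    """Tracks two dictionary pointers instead of one."""

    def __init__(self, code_base, data_base):
        self.code_here = code_base   # next free address in the code region
        self.data_here = data_base   # next free address in the data region

    @staticmethod
    def _aligned(nbytes):
        return (nbytes + 3) & ~3     # round up to a whole number of 4-byte cells

    def allot_code(self, nbytes):
        """Reserve space for machine code and return its start address."""
        addr = self.code_here
        self.code_here += self._aligned(nbytes)
        return addr

    def allot_data(self, nbytes):
        """Reserve space for data (for example a VARIABLE's cell) and return its address."""
        addr = self.data_here
        self.data_here += self._aligned(nbytes)
        return addr


# Hypothetical layout: code from &8000, data from &48000.
d = SplitDictionary(0x8000, 0x48000)
code_addr = d.allot_code(6)   # e.g. the code of a new word
data_addr = d.allot_data(4)   # e.g. the cell that word's data lives in
print(hex(code_addr), hex(d.code_here))  # 0x8000 0x8008
print(hex(data_addr), hex(d.data_here))  # 0x48000 0x48004
```

Every defining word then has to decide which pointer to advance, and that is where the extra complexity comes from.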