Oct 28, 2015
Reaction score
First Language
Primarily Uses
Event programming
or that time I simulated an ARM processor with events on RMVX Ace


So, I think this one warrants a bit of context.

I'm not sure how this is on the international community, but I come from the brazilian RPG Maker community, and, at least to me, there seems to exist a big hesitancy from people there to learn scripting, often with the justification that it is too hard. People would actively avoid learning Ruby (and, more recently, JS) and make all sorts of systems with events (while occasionally throwing around script calls, mostly copied from somewhere); these would range from custom menus using pictures (which are mostly fine, to be honest) to whole battle systems (ABSs being specially common).

This always bugged me a little. The resulting systems were not bad per se (some were pretty awesome, actually), but more often than not they were insanely complicated, with a lot of duplication (I swear, one guy made a digital clock system that had a conditional for each minute of the day), and could be hugely simplified with some abstraction, which events unfortunately do not allow much of (and which scripting languages are specifically designed for). Making such systems with events (and especially making them look good and be usable) seems even harder than learning some programming and then writing them as scripts, at least to me.

With this in mind, I made this post [pt-br] a while back. It was mostly intended as a joke, pretending to be an introductory tutorial to "event programming", which then quickly derailed into radix math and then low level programming with C, kind of as a way of poking fun at that idea that events were easier to work with and that programming was some kind of dark magic that only the chosen ones could understand (in the end it might've ended up becoming more mystifying of programming than anything, though; my bad).

The end result was a number sorting program compiled from C with GCC (with an ARMv6 backend) and run inside an RMVX Ace game, all purely via events (no script calls involved), which I honestly think is pretty cool (albeit not really useful for making actual games, at least not directly).

I don't think translating the post word-by-word would be appropriate, because it was very snarky originally, and specifically targeted at a community I was a part of for a long time, so this post is going to have a more serious and direct tone. I still think this is relevant to post here because I learned some things in the process, which I'd like to share.

Throughout this post I'll be assuming that you have at least some understanding of programming; I'm not going to spend much time explaining how stuff works, because that's not really the point here. I'll try my best to offer some additional reading material for the interested reader that might not be familiar with some particular concept used here; ideally, the reader should have at least a basic understanding of programming, some math (especially radixes) and how CPUs generally operate. The goal here is not to explain these, but to use them to simulate a processor on RMVX Ace with events only.

Final product

So, just to give a little taste of what the final product looks like, this is the system working:

The first few seconds there, when nothing happens, I'm actually locked into the event while it loads the program (which is encoded as a sequence of variable assignments and common event calls) into memory. This takes some time because writing stuff into memory is kinda expensive (we'll see why later).

Some time later, the event "asks" for a numeric input (this is done by executing a LOAD instruction reading from a specific part of memory, which is mapped as a "peripheral device"). This corresponds to how many numbers I'm going to input after (i.e. the size of the array it's going to sort).

It then asks for numbers three more times (corresponding to the number I gave in the first prompt), and stays "silent" for some time (while I run around it to show that it is doing that in parallel).

When it finishes processing (sorting the list), it then shows three messages in sequence (again, this is done via a STORE instruction, writing into a specific part of memory): the numbers I gave it, but in ascending order.

It's clear that it's not the most efficient thing ever, and it would probably be much easier to just emulate the same behavior with event commands directly. But where's the fun in that?

System overview

You can download the complete system, with the sorting program loaded, here: Dropbox. I'm going to use it as reference throughout this post. It'd be kinda unwieldy to show the whole implementation here; it's much easier if you download it and have it by your side while you read this post, this way I can give a higher-level explanation of what each part does, and you can get a more concrete notion of how it's done by looking directly into the implementation.

We are going to simulate an ARMv6-M processor (architecture reference is available here). More specifically, the registers and execution state (section A2.3 in the documentation), memory architecture (chapter A3) and instruction set (A4, A5 and A6, and some details spread around the rest of the document). This ignores some aspects of the processor architecture (notably, exception and interrupt handling, privileged execution, and caches), but should be enough to be able to run most simple programs.

I chose ARMv6-M because it has an open specification, and it's a relatively simple processor, with a fairly reduced and very simple instruction set (no turing-complete MOV instructions on this one), which makes it easier to simulate.

The system will consist essentially of a bunch of common events, each of which does some aspect of the function of the CPU. There are two main "sections" of common events: memory-related (read/write from ROM, RAM and peripheral) and instruction-related (decoding and execution). The system also uses some variables: one for ROM, one for RAM (yes, the ROM and RAM are both a single variable each; more on that later), one for each register, and some auxiliary variables used by the different processes. Processor flags are represented by switches.

The next two sections will describe the general idea behind the memory-related events and instruction-related events, respectively. The last section will then show how a program is compiled and loaded into an event.


Personally, I think the most interesting part of the system is how it represents and manipulates memory. The method shown here also relates to why this is only possible on RM XP through VX Ace, and not on the more recent versions (yes, it has to do with Ruby).

There's two key concepts to explore here: Bignums (hence why we want Ruby-based engines, which offer automatic conversion into Bignums for numeric variables) and radixes.

How so? Well, memory in computers is usually a big array of bytes (at least conceptually, in practice that's not really true, but for our purposes this serves well as an abstraction). Likewise, numbers in positional notation are essentially arrays of digits. It's not hard then to see that numbers are essentially a generalization of memory! In particular, memory is just a number in base 256 (because each byte has one of 256 values, like a base-256 digit).

This Numberphile video exemplifies this concept well:

The idea here is to encode our memory as a single (very big) number, and then everytime we need to store or load a byte (or many bytes) we just run the algorithm that gets or sets a digit on the number.

We kind of need to be able to do this to simulate memory minimally effectively. We can't afford to go around creating one variable for each position in our memory, because they're limited to just 5k, and it would be hell to select which variable to set and read from a given position (also, reading multiple bytes becomes less convenient).

Anyhow, the algorithms for getting and setting a position in memory are the following:

Rich (BB code):
load(M: bignum, addr: int) -> byte
    exp = 1                    // start with 256^0 = 1
    addr times do    // multiply by 256 until we get 256^addr
        exp *= 256
    return (M / exp) mod 256   // divide by the exponent and take modulus with 256
store(ref M: bignum, addr: int, new: byte)
    // first we read the value, the same way we did on load
    exp = 1
    addr times do
        exp *= 256
    old = (M / exp) mod 256
    // subtract the old value times the exponent to make the corresponding digit 0
    M -= old * exp
    // add the new value times the exponent, setting the digit to the new value
    M += new * exp

The actual implementation has some much needed optimizations (in particular for sequential access), and could have even more (e.g. doing exponentiation by squaring might be nice, and we could take advantage of alignment when reading more than a single byte). These functions also only read/write bytes (8 bits), while the implementation has common events for reading/writing half-words (16-bit) and words (32-bit) too, but they shouldn't be hard to generalize.

One unfortunate aspect of this implementation is that it is slow. Bignum operations are not the cheapest things, and we need a fair amount of them, on fairly big numbers, to do our memory manipulation. Sadly, I couldn't think of a better way of doing this. One possible solution would be to implement some kind of cache, which I didn't really look into much at the time (because it was not really essential, to be honest).

You can find this implemented on the following events in the provided complete implementation:
  • [041: (AUX) Find Address] Does the "address finding" job, i.e. calculates the corresponding exponent for a given address. It is used as a subroutine of all the reading/writing common events.
  • [002-006: (ROM) Read/Write Byte/Half-Word/Word] [011-016: (RAM) Read/Write Byte/Half-Word/Word] Do the remaining work of subtracting and adding values into the memory (ROM, which contains programs, and RAM, respectively) using the calculated exponent.
One thing to note is that the ARMv6-M is a memory-mapped architecture, and as such when programs access memory they actually do it with a different address than the actual address in memory. For instance, the RAM actually "starts" at address 0x60000000, i.e. when a program reads from address 0x60000100, it's actually reading from address 0x100 (256) in RAM.

This also allows programs to communicate with peripheral devices (which, in our case, are message boxes and number inputs, i.e. the player) by reading from a specific range of addresses in memory (in our case, 0x40000000).

The peripheral device behavior is handles by these events in the provided implementation:
  • [031-036: (PER) Read/Write Byte/Half-Word/Word] Reads and writes bytes into the "peripheral devices". It shows messages with values written into it, and shows numeric inputs when read (it then returns the value it gets to the LOAD instruction as if it was written in that position in memory).
And the memory mapping is handled by these events:
  • [011-016: (MEM) Read/Write Byte/Half-Word/Word] Reads and writes bytes into "memory". It maps addresses accordingly, forwarding processing to the appropriate common event (ROM, RAM or PER, depending on the address range). This is the event that's actually used by the simulated CPU to run LOAD and STORE instructions.
And that's pretty much it, as far as memory goes. Now we turn to the actual ISA implementation.


This is the party where we go and actually simulate the processor. It's also waaaaay boring.

You see, most of what a CPU does is brainless, "manual" work:
  • Load an instruction from memory (this is done for us by the common events in the previous section);
  • Decode it (which is basically a long sequence of conditional statements, with some occasional bit manipulation);
  • Run it (which usually involves only very simple tasks, like adding numbers, comparing them, doing jumps, etc.);
  • Repeat.
(of course, actual, modern CPUs have very involved processes for each of these stages of execution, which is very important for performance, but not really necessary for our purposes here, so I'll ignore them)

The common events responsible for the CPU simulation in the implementation are the following:
  • [191: (CPU) Step] Does a single iteration of the process described above; it calls the next common events as subroutines as needed.
  • [192-203: (CPU) Group 1-7] Handles decoding of a given group of instructions. Instructions are split into groups according to similarities in how they're encoded (which usually has to do with them having related functionality, too);
  • [101-176: (CPU) <Instruction>] Simulates the execution of an instruction. Admittedly, I didn't implement some of the instructions that weren't needed for the example here; their implementation is left as an exercise to the reader, I guess? :kaoswt:
And that's it. Most of the implementation of these events is not really that remarkable, and writing them was mostly grunt work of reading the documentation, seeing how instructions are encoded (section A6), doing the decoding and implementing the instructions (the documentation has some useful pseudocode that can be used as reference to implement the more complicated ones).

Some auxiliary events used in the implementation of certain operations are pretty interesting though. Namely:
  • [045: (AUX) 2-Complement] Computes the two's complement of a given number (making it "negative in binary"). This is useful for subtraction (there are no negative bytes in memory!), as one might guess.
  • [046: (AUX) Not] Computes the binary negation of a given number. This is useful for (you guessed it), computing the binary negation of a given number.
  • [050: (AUX) Sign Extend 32] Extends a signed number (a.k.a. sexting; I know, lmao) to 32-bits. This is useful when you have to convert signed bytes or half-words into signed words (which happens for some instructions).

Compiling and loading code

Ok, so far we've got a functional CPU simulator, but that's no good if we have nothing to run on it.

Ideally, we'd also not really want to manually write ARMv6 assembly code and encode it to load into our system. So let's do something a bit more sophisticated: we'll compile C code into our target architecture, and then just use the compiler toolchain to get our binary code that the processor understands. This process is quite similar to how one would do it for an embedded system (which, not by coincidence, is the target of the ARMv6-M architecture).

For this section, I'm assuming you have access to a system running Ubuntu. If you're on Windows 10+, try WSL if you haven't already. It's usually pretty decent, and definitely good enough for our purposes.

So here's what you're going to need to do this:

sudo apt-get update -y
sudo apt-get install -y gcc-arm-none-eabi binutils-arm-none-eabi libc6-armel-cross libc6-dev-armel-cross

Each of these packages provide a component necessary to use GCC with an ARMv6 back-end (i.e. make GCC generate binary code targeting our processor architecture). In particular, gcc-arm-none-eabi will install the GCC toolchain for the specific architecture (by default, the commands are the usual gcc, objdump, objcopy, etc., but with the prefix gcc-arm-none-).

Now, let's get started. Here's the code we'll be compiling:
// file: sample.c

#include <stdint.h>

#define RAM(offset) ((uint8_t*) 0x60000000 + (offset))
#define STACK_BASE RAM(1024)
#define MEM(offset) (STACK_BASE + 1 + (offset))

volatile uint32_t* peripheral = (uint32_t*) 0x40000000;

// Read a word (32-bits) from the mapped peripheral device (executes the
// numeric input command and returns the value given)
inline static uint32_t input() {
    return *peripheral;

// Writes a word (32-bits) into the mapped peripheral device (executes the
// show message command with the given value)
inline static void output(uint32_t word) {
    *peripheral = word;

// Insertion sort
void sort(uint32_t* arr, uint32_t n) {
    for (int i = 1; i < n; i++) {
        uint32_t key = arr[i];

        int j;
        for (j = i - 1; j >= 0 && arr[j] > key; j--) {
            arr[j + 1] = arr[j];

        arr[j + 1] = key;

uint32_t* elements = (uint32_t*) MEM(0);

void _start() {
    uint32_t n = input();

    for (int i = 0; i < n; i++) {
        elements[i] = input();

    sort(elements, n);

    for (int i = 0; i < n; i++) {

The code will read a given amount of numbers as input, sort them, and then display them in order. There's a few things to unpack here before we proceed with compiling it. From the beginning:

#include <stdint.h>

#define RAM(offset) ((uint32_t*) 0x60000000 + (offset))
#define STACK_BASE RAM(1024)
#define MEM(offset) (STACK_BASE + 1 + (offset))

These macros are here to help us refer to our special memory mapped location without having to calculate them every time:
  • RAM calculates the mapped address for some address in memory;
  • STACK_BASE is the stack base pointer; it points to address 1024 in the RAM, meaning that we have a reserved stack of size 1024 bytes (the stack grows backwards);
  • MEM calculates the mapped address for some offset in memory that we can use (i.e. not the stack).
Notice that I'm also including the standard header stdint.h. This is OK: as long as the compiler knows how to get the code for the header, it should be able to compile it into our target architecture just fine (in this case, it doesn't even need to generate any code, it's just grabbing some type aliases). Just don't go including huge libraries with humungous functions (looking at you malloc), otherwise you'll be waiting a long time for your program to even load (remember, we're dealing with a potato here :kaoswt2:).

Further down, there's the code that deals with the input and output of the program. Notice that it simply reads from and writes into memory at the peripheral mapped address:

volatile uint32_t* peripheral = (uint32_t*) 0x40000000;

// Read a word (32-bits) from the mapped peripheral device (executes the
// numeric input command and returns the value given)
inline static uint32_t input() {
    return *peripheral;

// Writes a word (32-bits) into the mapped peripheral device (executes the
// show message command with the given value)
inline static void output(uint32_t word) {
    *peripheral = word;

One important detail here is that peripheral is marked as volatile; this disables some compiler optimizations that could otherwise break the program logic.

Right below, we see our sorting algorithm:
// Insertion sort
void sort(uint32_t* arr, uint32_t n) {
    for (int i = 1; i < n; i++) {
        uint32_t key = arr[i];

        int j;
        for (j = i - 1; j >= 0 && arr[j] > key; j--) {
            arr[j + 1] = arr[j];

        arr[j + 1] = key;

This is a pretty standard implementation of insertion sort. I chose it for the sorting algorithm because it is simple, does relatively few memory accesses (which are very expensive here) if done right, and it performs well for small lists (which is what we're going to have any chance to be sorting in an acceptable amount of time, anyway).

Other than that, there's nothing much special here.

Finally, the entrypoint of our program:
uint32_t* elements = (uint32_t*) MEM(0);

void _start() {
    uint32_t n = input();

    for (int i = 0; i < n; i++) {
        elements[i] = input();

    sort(elements, n);

    for (int i = 0; i < n; i++) {

The implementation of the _start function is pretty standard C code. There's no special meaning to the function name, by the way, it's just something we can use to identify where the program starts later.

Notice that elements is just some pointer into memory. If you're used to programming in C on a higher level, you might be worried that this could cause a segmentation fault or something like that; but that's of course not the case, because we don't have an operating system to get in our way here. The memory is all ours, to use as we please! :kaopride:

One detail here that you might have also noticed is that elements is a global variable. This is because it is constant, and having it be local on _start would make it part of the stack (which requires writing into memory, which takes time and make the program bigger). Making it global makes the program faster and smaller, which is a big win here. But please don't go making variables global everywhere when this is not a concern :kaosigh:

With the code explained, we can finally go and compile it.

Well... almost. First we need to tell the linker (the thing that links what the compiler spits out and turns it into an actual program) what our system looks like (memory mapping, stack pointer, that sort of stuff), because it's not really something very standard.

This is done with a linker script:
/* file: */

    ROM(rx) : ORIGIN = 0x00000000, LENGTH = 512M
    RAM(rwx) : ORIGIN = 0x60000000, LENGTH = 512M

    . = ORIGIN(ROM);
    .text : {
    } > ROM

    .data ALIGN(4) : {
        _data_start = .;
        . = ALIGN(4);
        _data_end = .;
    } > ROM

    _stacktop = ORIGIN(RAM) + 1024;
    _data_load = LOADADDR(.data);

This basically tells the linker that we have two regions in memory: ROM (from 0x00000000 to 0x1fffffff) and RAM (from 0x60000000 to 0x7fffffff); the program (here, the .text section) should go into ROM, as should the static data (.data). Also, the stack has 1024 bytes and is part of the RAM.

With this, we can finally compile our program (for real this time):
arm-none-eabi-gcc -nostdlib -Wall -O2 -Wl, -march=armv6 -mthumb sample.c -o sample.elf
arm-none-eabi-objcopy -O binary sample.elf sample.bin

This generates a file sample.elf, which contains the binary of our program, along with some extra data required by the ELF file format. We then extract only the program code into a file sample.bin with arm-none-eabi-objcopy.

You can see the resulting assembly code with arm-none-eabi-objdump, like this:
arm-none-eabi-objdump -d sample.elf --visualize-jumps

Which should give you something like this:
sample.elf:     file format elf32-littlearm

Disassembly of section .text:

00000000 <sort>:
   0:                b5f0       push    {r4, r5, r6, r7, lr}
   2:                4684       mov     ip, r0
   4:                2901       cmp     r1, #1
   6:   /----------- d916       bls.n   36 <sort+0x36>
   8:   |            0003       movs    r3, r0
   a:   |            3b04       subs    r3, #4
   c:   |            0089       lsls    r1, r1, #2
   e:   |            0006       movs    r6, r0
  10:   |            2700       movs    r7, #0
  12:   |            1858       adds    r0, r3, r1
  14:   |  /-------> 6875       ldr     r5, [r6, #4]
  16:   |  |         0039       movs    r1, r7
  18:   |  |         0033       movs    r3, r6
  1a:   |  |  /----> 001a       movs    r2, r3
  1c:   |  |  |      ca10       ldmia   r2!, {r4}
  1e:   |  |  |      42ac       cmp     r4, r5
  20:   |  |  |  /-- d904       bls.n   2c <sort+0x2c>
  22:   |  |  |  |   605c       str     r4, [r3, #4]
  24:   |  |  |  |   3b04       subs    r3, #4
  26:   |  |  |  |   3901       subs    r1, #1
  28:   |  |  \--|-- d2f7       bcs.n   1a <sort+0x1a>
  2a:   |  |     |   4662       mov     r2, ip
  2c:   |  |     \-> 3604       adds    r6, #4
  2e:   |  |         6015       str     r5, [r2, #0]
  30:   |  |         3701       adds    r7, #1
  32:   |  |         42b0       cmp     r0, r6
  34:   |  \-------- d1ee       bne.n   14 <sort+0x14>
  36:   \----------> bdf0       pop     {r4, r5, r6, r7, pc}

00000038 <_start>:
  38:          b5f8             push    {r3, r4, r5, r6, r7, lr}
  3a:          4f0d             ldr     r7, [pc, #52]   ; (70 <_start+0x38>)
  3c:          683d             ldr     r5, [r7, #0]
  3e:          6878             ldr     r0, [r7, #4]
  40:          6829             ldr     r1, [r5, #0]
  42:          2900             cmp     r1, #0
  44:   /----- d00f             beq.n   66 <_start+0x2e>
  46:   |      008c             lsls    r4, r1, #2
  48:   |      0003             movs    r3, r0
  4a:   |      1826             adds    r6, r4, r0
  4c:   |  /-> 682a             ldr     r2, [r5, #0]
  4e:   |  |   c304             stmia   r3!, {r2}
  50:   |  |   429e             cmp     r6, r3
  52:   |  \-- d1fb             bne.n   4c <_start+0x14>
  54:   |      f7ff ffd4        bl      0 <sort>
  58:   |      cf09             ldmia   r7!, {r0, r3}
  5a:   |      18e4             adds    r4, r4, r3
  5c:   |  /-> cb04             ldmia   r3!, {r2}
  5e:   |  |   6002             str     r2, [r0, #0]
  60:   |  |   42a3             cmp     r3, r4
  62:   |  \-- d1fb             bne.n   5c <_start+0x24>
  64:   |  /-> bdf8             pop     {r3, r4, r5, r6, r7, pc}
  66:   \--|-> 2100             movs    r1, #0
  68:      |   f7ff ffca        bl      0 <sort>
  6c:      \-- e7fa             b.n     64 <_start+0x2c>
  6e:          46c0             nop                     ; (mov r8, r8)
  70:          00000074         .word   0x00000074

This also gives us the exact position of the entrypoint on the generated binary code (here, 0x38, the address of the _start function).

We can then get the hexadecimal representation of that binary with hexdump:
hexdump -e '16/1 "%02x " "\n"' sample.bin

The output should be something like this:
f0 b5 84 46 01 29 16 d9 03 00 04 3b 89 00 06 00
00 27 58 18 75 68 39 00 33 00 1a 00 10 ca ac 42
04 d9 5c 60 04 3b 01 39 f7 d2 62 46 04 36 15 60
01 37 b0 42 ee d1 f0 bd f8 b5 0d 4f 3d 68 78 68
29 68 00 29 0f d0 8c 00 03 00 26 18 2a 68 04 c3
9e 42 fb d1 ff f7 d4 ff 09 cf e4 18 04 cb 02 60
a3 42 fb d1 f8 bd 00 21 ff f7 ca ff fa e7 c0 46
74 00 00 00 00 00 00 40 01 04 00 60

And we're done! Now all that's left is to copy the output of hexdump and the entrypoint position into the "Load binary code" script given in the implementation, and it should automatically configure the event of ID 1 on map 1 of your project to load the code of your program (just make sure that such event exists).

Once it's loaded, running the program is as simple as calling the common event [(CPU) Step] until the switch [Halt] is ON :kaojoy:

Final words

Overall, I was (and am) very happy with the result of this undertaking. It's pretty cool being able to compile C programs and run them on RPG Maker without a single line of script (with scripts it becomes kinda trivial, you can even make programs and run them on the fly, as I demonstrated in another early post of mine [pt-br]).

I guess this also shows just how powerful events can be, and how much you can do with them alone (not that you should). Limiting myself to them also made me have to think of some creative solutions to problems that would never have been even a thing had I been using a scripting language, so I think it also counts as a cool brain teaser.

One funny consequence of this is that events are Turing complete (I mean, we probably could've guessed, but it's nice to have an actual proof by construction, right?), and thus it's impossible to decide if a given event ever halts, in general. I guess this shows that events are at least as complicated as scripts...?

My point here is, events are pretty cool, but please don't overuse them. And VX Ace is still the superior engine.

Thank you for your time.:kaohi:
Last edited:


Keeper of the Nuts
Global Mod
Mar 2, 2012
Reaction score
First Language
Primarily Uses
I really got a kick out of this.

I know you've give a step-by-step, so it's kind of a tutorial. But I think it's more for general interest and I really can't see anyone actually using this in a game, so I'm going to move it to General Discussion (where I also think it might get more views and discussion).

If you protest vehemently or other mods believe it should be a tutorial, we can move it back.

I've moved this thread to General Discussion. Thank you.


Latest Threads

Latest Profile Posts

imgur sure is getting weird, one day I lose gradually all my images, the other I get them back randomly and then again they disappear again.
Despite OPT2 praise it still has some strange stories in it. Lady Mikka is trying to work herself to death because of guilt. Guilt over what? ¯\_(ツ)_/¯ So instead of finding a NPC to have a heart to heart with her they decide the cure is a new kimono. So when she drops dead she'll at least be well dressed. I haven't even got to the strange part yet.
Did so much work on the game today. I wish I could post it all in this status update but there is a character limit of course haha. I thought about making a topic for updates, though... maybe.
The most recent sign that I am old. I have done martial arts for over 4 decades. Never HAD to stretch out. Good, of course, but never required. Was doing some kicks in the kitchen because, why not, and I felt a pop in the back of my thigh. Now I am limping around. Gotta love getting old.
One of the biggest perks of being fluent in English is how many new and interesting recipes I am able to find and try out that I would have hardly come across if I just spoke my mothertounge :3

Forum statistics

Latest member