About

This is an exploratory project into virtual machines and assembly language. It is by no means ready for production use, nor is it particularly well maintained. The language is inspired by x86 and ARM assembly and does very little hand-holding.

Check out the bin/example.wasm example source for an overview of the language, or keep reading!

Design

From Text To Runtime Behaviour

In order to turn the source text into executable code we use 3 passes:

  • Pass 1: tokenization (syntax check) and preprocessing (substitution)
  • Pass 2: interpretation (semantics check)
  • Pass 3: execution (runtime check)

After pass 2, all ties to the source text are lost, so an error that occurs afterwards (that is, at runtime) can be cryptic about where in the source it originated.

Notation

  • [operation][number type], e.g. divi for divide (div) integer
  • %[register] for addressing registers, e.g. %A
  • $[value] for using immediate (literal) integer values, e.g. $38
  • '[character]' for using immediate character values, e.g. 'r'
  • ; for end of statement (mandatory)
  • [label]: for labels, e.g. loop:
  • #[text] for comments: any text is ignored till a newline (\n) is found
  • [[%register|$value]] for accessing memory, e.g. [$104]
  • Elements must be separated by whitespace character(s)
    • Good: addi $2 $5 %A;
    • Bad: addi $2$5%A;
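
The notation rules above can be sketched as a small classifier. This is a minimal Python illustration, not the project's actual C++ tokenizer: the function names classify and split_statement and the category names are hypothetical, and character immediates that themselves contain whitespace are not handled.

```python
def classify(token):
    """Return a rough category for one whitespace-separated element."""
    if token.startswith("[") and token.endswith("]"):
        return "memory"            # e.g. [$104] or [%A]
    if token.startswith("%"):
        return "register"          # e.g. %A
    if token.startswith("$"):
        return "immediate-int"     # e.g. $38
    if len(token) == 3 and token[0] == "'" and token[2] == "'":
        return "immediate-char"    # e.g. 'r'
    if token.endswith(":"):
        return "label"             # e.g. loop:
    return "word"                  # operation name or label reference


def split_statement(statement):
    """Split one statement into (element, category) pairs."""
    body = statement.strip()
    if not body.endswith(";"):
        raise ValueError("statement must end with ';'")
    # Elements are separated by whitespace; the trailing ';' is dropped.
    return [(tok, classify(tok)) for tok in body[:-1].split()]


print(split_statement("addi $2 $5 %A;"))
```

Because splitting happens on whitespace only, the "bad" form addi $2$5%A; would come out as a single malformed element instead of three arguments.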

Examples

Divide register A by 5 and store the result in register A: divi %A $5 %A;

Increment B until it is 10:

# Set B to zero
addi $0 $0 %B;

loop:
addi $1 %B %B;
lti %B $10;
jmp loop;

Read the integer at memory location 1024 into register A:

seti %A [$1024];

Remember not to use spaces inside the square brackets.

Reserved Symbols

The following whitespace characters are used to separate symbols:

  • space ( )
  • tab (\t)
  • carriage return (\r)
  • newline (\n)

The following characters are used as identifiers:

  • dollar ($) for immediate (literal) integer values
  • single quote (') for immediate character values
  • percent sign (%) for register identifiers
  • colon (:) for jump labels
  • semicolon (;) for statement termination
  • hash (#) for comments
  • square brackets ([ and ]) for addressing memory

Memory Model

The stack, which you interact with through the pop/push operations, grows from memory location 0 towards the end of the memory. There is no strict checking on whether your own memory operations through [] affect the stack: this is a feature, not a bug. Keep in mind that the stack can underflow and overflow, and that memory is addressed in bytes (8 bits) whereas the registers are all 32 bits wide. A read from location $900 therefore covers bytes 900 to 903 and shares 3 bytes with a read from location $901: the first byte read from $901 is the second byte read from $900.
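
The overlap can be made concrete with a short Python sketch. It assumes little-endian byte order, which this README does not specify, and the helper names read_word and write_word are hypothetical.

```python
WORD_BYTES = 4  # registers are 32 bits wide, memory is addressed per byte


def read_word(memory, address):
    # Assumption: little-endian byte order (not stated in the README).
    return int.from_bytes(memory[address:address + WORD_BYTES], "little")


def write_word(memory, address, value):
    memory[address:address + WORD_BYTES] = (value & 0xFFFFFFFF).to_bytes(WORD_BYTES, "little")


memory = bytearray(1024)
write_word(memory, 900, 0x44332211)  # occupies bytes 900..903
write_word(memory, 901, 0xDDCCBBAA)  # occupies bytes 901..904, overwriting 3 of them

print(hex(read_word(memory, 900)))   # only byte 900 (0x11) survives from the first write
```

Under these assumptions, spacing 32-bit values at least 4 locations apart avoids the overlap.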

Symbols

All symbols are reserved keywords and can therefore NOT be used as labels. There is currently no strict checking, so be careful.

Preprocessor

All preprocessor directives are prefixed by a #. Ill-formed preprocessor directives do not halt compilation; they are simply ignored. All preprocessing is done in a single pass, so recursion and defining one directive in terms of another are not supported.

  • DEFINE <x> [y] replaces every occurrence of the first argument (x) with the optional second argument (y). The second argument can be left out, which effectively deletes all occurrences of x. Quotes are currently not supported and arguments are separated by whitespace. If x is defined more than once, the later definition overwrites the earlier one.
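
A minimal Python sketch of this behaviour follows. It is an illustration under assumptions, not the project's implementation: it assumes the directive is written as #DEFINE at the start of a line, that replacement is plain textual substitution, and that all definitions are collected before substituting. The function name preprocess is hypothetical.

```python
import re


def preprocess(source):
    defines = {}
    body = []
    for line in source.splitlines():
        parts = line.split()
        if parts and parts[0] == "#DEFINE":
            if 2 <= len(parts) <= 3:
                # A later definition of the same name overwrites the earlier one.
                defines[parts[1]] = parts[2] if len(parts) == 3 else ""
            # Ill-formed directives are dropped without reporting an error.
            continue
        body.append(line)

    text = "\n".join(body)
    if not defines:
        return text

    # One substitution pass: replaced text is never scanned again, so one
    # define cannot expand into another. Longer names are tried first.
    names = sorted(defines, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: defines[match.group(0)], text)


print(preprocess("#DEFINE LIMIT $10\nlti %B LIMIT;"))  # prints: lti %B $10;
```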

Registers

All registers are 32 bits wide. The following 4 general purpose registers currently exist:

  • A
  • B
  • C
  • D

Immediates

The immediate integer value for 42 is, for example, $42. Negative values are allowed, for example $-42. Notation must be decimal; hexadecimal and octal are not supported.

The immediate character value for the letter g is 'g'. Character values must be a single character; escaped and multi-byte characters are not supported.

Operations

  • addi add the first to the second argument and store the result in the third argument
  • subi subtract the first from the second argument and store the result in the third argument
  • divi divide the first by the second argument and store the result in the third argument
  • muli multiply the first by the second argument and store the result in the third argument
  • shli shift left the first argument by the number of positions given by the second argument and store the result in the third argument
  • shri shift right the first argument by the number of positions given by the second argument and store the result in the third argument
  • seti set the first register argument to the second argument
  • int calls the interrupt specified by the first (integer) argument
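
The argument order of the arithmetic operations can be sketched in Python as follows. This is a literal reading of the list above and not the project's C++ code. It assumes results wrap to 32 bits and treats all values as unsigned, which this README does not state; the names OPS and execute are hypothetical.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

# Each entry takes the first and second argument and returns the value that
# would be stored in the third argument.
OPS = {
    "addi": lambda first, second: first + second,
    "subi": lambda first, second: second - first,   # "subtract the first from the second"
    "divi": lambda first, second: first // second,  # "divide the first by the second"
    "muli": lambda first, second: first * second,
    "shli": lambda first, second: first << second,
    "shri": lambda first, second: first >> second,
}


def execute(op, first, second):
    # Assumption: results wrap around to an unsigned 32-bit value.
    return OPS[op](first, second) & MASK


print(execute("divi", 10, 5))  # models divi $10 $5 %A; and prints 2
print(execute("subi", 2, 5))   # models subi $2 $5 %A; and prints 3
```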

Control Flow

  • jmp jump to the label given by the first argument
  • call put the next statement to execute on the stack and jump to the label given by the first argument
  • ret pop the next statement to execute off the stack, i.e. return to the statement that follows the matching call
  • lti execute the next statement if argument 1 is less than argument 2, else skip the next statement
  • gti execute the next statement if argument 1 is greater than argument 2, else skip the next statement
  • eqi execute the next statement if argument 1 is equal to argument 2, else skip the next statement
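
The "execute or skip the next statement" rule can be sketched with a tiny Python interpreter. It is a hedged illustration: the tuple-based program format and the function name run are hypothetical, only addi, lti and jmp are modelled, registers are written as strings and immediates as integers.

```python
def run(program, registers):
    def value(arg):
        # Strings name registers, integers are immediates.
        return registers[arg] if isinstance(arg, str) else arg

    # Map each label name to its position in the program.
    labels = {stmt[1]: i for i, stmt in enumerate(program) if stmt[0] == "label"}
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "addi":
            registers[args[2]] = value(args[0]) + value(args[1])
        elif op == "lti":
            if not value(args[0]) < value(args[1]):
                pc += 1  # condition is false: skip the next statement
        elif op == "jmp":
            pc = labels[args[0]]
    return registers


# The "increment B until it is 10" example from above.
program = [
    ("addi", 0, 0, "B"),
    ("label", "loop"),
    ("addi", 1, "B", "B"),
    ("lti", "B", 10),
    ("jmp", "loop"),
]
print(run(program, {"B": 99}))  # prints {'B': 10}
```

As long as B is less than 10 the jmp is executed; once B reaches 10 the jmp is skipped and the loop ends.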

Memory

  • popi pops the top value off the stack into the register given as the first argument
  • pushi pushes the value of the register or immediate given as the first argument onto the stack
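
How the stack shares memory with direct [] accesses can be sketched in Python. Several details are assumptions not taken from this README: each stack entry is 4 bytes, byte order is little-endian, and underflow and overflow raise an error. The class name Stack is hypothetical.

```python
class Stack:
    def __init__(self, memory):
        self.memory = memory
        self.top = 0  # the stack starts at memory location 0 and grows upwards

    def push(self, value):
        if self.top + 4 > len(self.memory):
            raise OverflowError("stack overflow")
        self.memory[self.top:self.top + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")
        self.top += 4

    def pop(self):
        if self.top < 4:
            raise IndexError("stack underflow")
        self.top -= 4
        return int.from_bytes(self.memory[self.top:self.top + 4], "little")


memory = bytearray(8)
stack = Stack(memory)
stack.push(7)
memory[0] = 42      # a direct memory write changes the value on the stack
print(stack.pop())  # prints 42, not 7
```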

Interrupts

  • [0..9] Output to STDOUT
    • 0 put value of register A as ASCII character on stdout
    • 1 put value of register A as decimal integer on stdout
    • 2 put value of register A as hexadecimal integer on stdout
    • 3 put the string pointed to by register A on stdout, for the number of characters given by register B
  • [10..19] Input from STDIN
    • 10 get a single ASCII character from STDIN and store it in register A
    • 11 get a string of a maximum length determined by register B and store it in the address specified by register A. After execution register B will contain the number of characters actually read.