
Internal arrays (counter arrays) #213

Open
20 of 26 tasks
cardillan opened this issue Jan 27, 2025 · 7 comments

Comments

@cardillan
Owner

cardillan commented Jan 27, 2025

Internal arrays

Internal arrays ("counter arrays") are planned for release 3.1. Implementing them requires a number of changes.

Status

Arrays (internal/external) - grammar & code generation

  • Array declaration
  • Array initialization
  • Array assignments
  • For-each loop over arrays
  • Parallel for-each loop
  • Passing arrays as varargs

Subarrays (internal/external) - grammar & code generation

  • Subarray specification
  • Subarray assignments
  • For-each loop over subarrays
  • Passing subarrays as varargs
  • Subarray syntax on memory blocks (cell1[32 ... 64])

Error detection and handling

  • Static bound checks (compile time)
  • Static bound checks (optimization)
  • Compiler option for dynamic bound checks
  • Generating bound checks

Optimization/finalization

  • Virtual instruction expansion
  • Adapting unreachable code detection
  • Handling virtual instruction side effects
  • Resolving constant indexes
  • Data Flow Optimization
  • Loop Hoisting
  • Update benefit computation in loop unrolling
  • Array access inlining
  • Updating compiler output to match new operations

Documentation

  • Update readme/syntax
  • Prepare/adapt samples to use internal arrays

Grammar

Declarations

Allowing array variable declarations using the var keyword:

var a[10];                // Uninitialized
var a[] = (1, 2, 3);      // Initialized
var a[3] = (1, 2, 3);     // Initialized
var a[4] = (1, 2, 3);     // Error - size mismatch

For now, always global and with zero-based indexes.

Array declaration creates the following variables:

  • .<array>*<index>: array members, e.g. .array*0 to .array*99.
  • .<array>*rval: transfer variable for reads, e.g. .array*rval
  • .<array>*wval: transfer variable for writes, e.g. .array*wval
  • .<array>*rret: return address from read routine, e.g. .array*rret
  • .<array>*wret: return address from write routine, e.g. .array*wret

The * separator marks a compiler-generated variable and ensures no collision with user variables. This scheme would work even for local arrays.

Subarray selection

Allows selecting a subset of the array for later operations. Primarily intended for passing a portion of the array as a vararg argument, or for use with a for-each loop over a portion of the array.

var a[10];
for i in a[5 .. 7] do
    // Loops over a[5], a[6] and a[7] 
end;

The subarray indexes will have to be constant for now. Dynamic ranges might be achievable with for-each loops and would perhaps allow jump-in-the-middle loop unrolling optimizations. That would be nice.

For-each loops

For-each loops won't use jump tables (the counter array mechanism); instead, the array will be expanded automatically into individual elements and the loop will iterate over them. This leads to far more efficient code than random access.

The current vararg syntax concatenates the vararg list with other arguments. Arrays will behave the same way:

var a[] = (1, 2, 3);

// prints "01234"
for i in 0, a, 4 do
    print(i);
end;

var x[] = (1, 2, 3);
var y[] = (4, 5, 6);


// prints "123456"
for i, j in x, y do
    print(i, j);
end;

We need new syntax to express that the arrays should be traversed in parallel. Proposal:

var x[] = (1, 2, 3);
var y[] = (4, 5, 6);

// prints "142536"
for var i; var j in x; y do
    print(i, j);
end;

Can be combined:

var x[] = (1, 2, 3);
var y[] = (4, 5, 6);
var z[] = (7, 8, 9);

// prints "127348569"
for var i, j; var k in x, y; z do
    print(i, j, k);
end;
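The three iteration modes above map onto familiar sequence operations. As a Python analogy (semantics only, not the compiler's mechanism):

```python
# Python analogy for the proposed for-each semantics over arrays.
from itertools import chain

x, y, z = [1, 2, 3], [4, 5, 6], [7, 8, 9]

# `for i in 0, a, 4` - values and arrays concatenate into one flat sequence:
concat = list(chain([0], x, [4]))               # [0, 1, 2, 3, 4]

# `for i, j in x, y` - the concatenated sequence is consumed two at a time:
pairs = list(zip(*[iter(chain(x, y))] * 2))     # [(1, 2), (3, 4), (5, 6)]

# `for var i; var j in x; y` - arrays separated by `;` advance in parallel:
parallel = list(zip(x, y))                      # [(1, 4), (2, 5), (3, 6)]

# `for var i, j; var k in x, y; z` - concatenation within each group,
# groups traversed in parallel:
combined = list(zip(zip(*[iter(chain(x, y))] * 2), z))
# [((1, 2), 7), ((3, 4), 8), ((5, 6), 9)] -> printing i, j, k gives "127348569"
```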

The new syntax allows this as well:

var a[10];

// Bubblesort!
do
    swapped = false;
    for out i; out j in a[0 ... 9]; a[1 ... 10] do
        if i > j then
            x = i; i = j; j = x;
            swapped = true; 
        end;
    end;
while swapped;

Code generation

A new virtual instruction will be made for array access, modeled after read and write (value is a standard input/output parameter to be freely optimized by DFO):

readarr value array index
writearr value array index

Instructions will be resolved into local code plus a jump table (one read and one write jump table, shared among all reads and writes of the array).

For each array, the following jump tables will be generated:

# Read jump table
label readLabel
set .array*rval .array*0
set @counter .array*rret
set .array*rval .array*1
set @counter .array*rret
...

# Write jump table
label writeLabel
set .array*0 .array*wval
set @counter .array*wret
set .array*1 .array*wval
set @counter .array*wret
...

Virtual instructions will be resolved into this code:

# readarr value array index
set .array*rret labelX 
op mul *tmpXxx index 2
op add @counter readLabel *tmpXxx
label labelX 
set value .array*rval

# writearr value array index
set .array*wret labelY
set .array*wval value 
op mul *tmpYyy index 2
op add @counter writeLabel *tmpYyy
label labelY

The multijump instruction (op add @counter) in the expanded code needs to be aware of implicit reads/writes to the transfer variables and communicate this to the DFO. A new mechanism for this will be needed.
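To make the dispatch concrete, here's a toy Python interpreter (an illustration only, not compiler output; the table layout and sentinel are assumptions) simulating a read through the jump table: each element owns a two-instruction entry, and the entry address is computed from the index.

```python
# Toy simulation of the counter-array read dispatch described above.
def readarr(slots, index):
    """Simulate: set .a*rret; op mul *tmp index 2; op add @counter readLabel *tmp."""
    RETURN = -1                      # sentinel standing in for the caller's label
    rret = RETURN                    # set .array*rret <return address>
    rval = None

    # Jump table: entry k is `set .array*rval .array*<k>` + `set @counter .array*rret`.
    table = []
    for k in range(len(slots)):
        table.append(("load", k))
        table.append(("ret",))

    pc = index * 2                   # op mul *tmp index 2 (table base address is 0 here)
    while pc != RETURN:
        op = table[pc]
        if op[0] == "load":
            rval = slots[op[1]]      # set .array*rval .array*<k>
            pc += 1
        else:
            pc = rret                # set @counter .array*rret
    return rval                      # set value .array*rval

print(readarr([10, 20, 30], 1))      # -> 20
```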

Optimizations

Unoptimized cost: 6 instructions per read/write.

  • Only very basic optimization: a readarr/writearr with a constant index will be replaced by set value <array variable>/set <array variable> value.
  • A new optimization phase, EXPANDED, will be executed after the ITERATED phase if there are array access instructions.

Between ITERATED and EXPANDED phases jump tables will be generated and instructions will be resolved. Then another round of optimization iterations will happen. The current optimizations should then improve the code like this:

  • The assignment of the return addresses will be hoisted out of loops if possible (this is why we have a separate read and write return address).
  • Computation of the jump index will be performed just once in the read-then-write scenarios.
  • The array transfer variables for reads will be propagated to expressions down the line when possible. (Current optimizations probably won't propagate array transfer write variables into previous instructions, a specific "opposite direction" Data Flow optimization might be needed for that.)

In the EXPANDED phase, replacing dynamic access with static access probably won't be possible anymore (certainly not in 3.1).

Possible future optimizations

Future optimizations (not in 3.1) - optimizations for speed (performed on cost/benefit basis):

  • Array access inlining: a specific jump table will be generated for given array access instruction. That way the return address will be baked in (so no need to set .array*rret/.array*wret), and the transfer variable will be replaced directly by the value being read/written. Reduces cost to 4 instructions per access.
    • Inlining should only be applied inside loops/functions that access the array at multiple indexes. A single read-then-write access in a loop will be optimized to nearly the same performance in the un-inlined form, and won't bloat the code if there's another access to the same array at a different place.
  • Array access fusing: if two or more array access instructions (of different arrays) are used together with the same index, they can be replaced by a fused instruction, saving the costs of out-of-line calls of all accesses except the first. Always performed together with inlining. Reduces cost to 3+n instructions, where n is the number of instructions fused. Complex, as the resulting instruction needs to be aware of all input/output variables.
    • A non-inlined fused jump table could be used as a jump table for accessing the last item in the array, saving instruction space - a possibility but not planned at the moment.
    • More complex fusing is possible, e.g. fusing access to two adjacent elements of the same array: optimizes cases like array[i] + array[i + 1].
  • Array code injection: when a value is read, updated and written again (think array[i]++), the update can be injected to the jump table. Saves 3 instructions.
    • More complex injection is possible, e.g. comparing and swapping two adjacent elements of the array (bubblesort?).
@cardillan
Owner Author

cardillan commented Jan 28, 2025

Additions:

Array assignments

Assigning an array to another array will assign its elements. Subarray syntax will allow assigning parts of an array, or even shifting the array up or down (the copy needs to be done in reverse order when shifting up). Code will be generated for direct element-to-element assignment. In the future, a mechanism for indirect access (a for loop) might be generated instead, suitable for large arrays where instruction limits could be hit.

var a[10], b[10];
a = b;

// Shift all values towards the beginning, add new value at the end
a[0 .. 8] = a[1 .. 9];
a[9] = newValue;
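The reverse-order requirement for upward shifts can be seen with a plain element-by-element copy. A Python sketch (illustration only; function names are made up):

```python
# Why shifting up requires reverse-order copies: a forward copy of
# a[0 .. 8] into a[1 .. 9] overwrites each source before reading it.

def shift_up_forward(a):
    for i in range(1, len(a)):          # ascending order - buggy
        a[i] = a[i - 1]                 # a[i-1] was already overwritten
    return a

def shift_up_reverse(a):
    for i in range(len(a) - 1, 0, -1):  # descending order - correct
        a[i] = a[i - 1]
    return a

print(shift_up_forward(list(range(5))))  # -> [0, 0, 0, 0, 0]
print(shift_up_reverse(list(range(5))))  # -> [0, 0, 1, 2, 3]
```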

External arrays

Allows allocating an external array from the heap (in 3.1, support for declaring an external variable with storage other than the heap will also be added, but that's not part of this issue). Unless the array starts at index 0, this incurs the cost of adding the base index to the element index, but this might get optimized away, as the base index will be constant.

Array assignment between internal and external arrays will also be supported.

Array assignment between two external arrays might be encoded using a loop.

Bounds checking

A new compiler option for bounds checking. Possible values:

  • None: no checking; undefined behavior when accessing a wrong index.
  • Simple: two conditional jumps around a stop instruction, which gets triggered when the index is out of bounds.
  • Described: two conditional jumps around a print statement containing an error message, followed by a stop instruction.
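The shape of the generated guards could be sketched like this (a Python stand-in for the generated mlog; function names and the message wording are assumptions, not actual compiler output):

```python
# Illustrative stand-in for the two checking modes.

def check_simple(index, size):
    # Simple: two conditional jumps guarding a `stop` instruction.
    if not (0 <= index < size):
        raise SystemExit            # stands in for the mlog `stop`

def check_described(index, size, name):
    # Described: print an error message, then stop.
    if not (0 <= index < size):
        print(f"Index {index} out of bounds for array '{name}' (size {size})")
        raise SystemExit
```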

When an out-of-bounds access is detected at compile time, a compile error will always be triggered. A way to find the position of the error in optimized (say, loop-unrolled) code might need to be implemented. Currently, temporary variables do not hold a source position; they should be assigned the source position of the expression they correspond to.

@JeanJPNM

Will internal arrays be passed by value to functions?

@cardillan
Owner Author

cardillan commented Jan 28, 2025

At this moment, only global arrays are planned, and there is no support for passing them as function arguments, except into inline functions.
Passing arrays as vararg arguments is very easy (there's no support for randomly accessing elements of varargs as of now), and I'll probably be able to implement this in 3.1.

I don't plan to support passing arrays by value in any way. All other problems aside, creating copies of arrays is costly both in execution time and instruction space. Is there a particular use case for passing arrays by value?

Support could be relatively easily added to pass arrays into inline functions as general parameters of an array type. However, as inline functions are compiled anew for each call, no element copying will take place and the arrays will effectively be passed by reference. This will get implemented at the same time as variable types, or shortly thereafter.

I have a hazy idea of emulating "pointers to arrays" which could be stored in variables and passed around to functions, something similar to function pointers. It's a plan, but a distant one.

Edit: for the "array pointers", it might be better to interleave the read and write jump tables. To read, a jump to baseAddress + 4 * index would be executed, and to write, a jump to baseAddress + 2 + 4 * index, so just one pointer would need to be stored in such a variable. Hm. Maybe I'll lay out the jump tables like this to be ready for this possibility.
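Quick arithmetic check of the interleaved layout (the base address is made up for the example; the layout itself is only a proposal): with two instructions per entry, read and write entries tile the table without overlapping.

```python
# Element k's read entry at base + 4*k, its write entry at base + 2 + 4*k;
# each entry is two instructions long, so entries never collide.

base, size = 100, 4     # arbitrary example base address and array size
read_addrs = [base + 4 * k for k in range(size)]
write_addrs = [base + 2 + 4 * k for k in range(size)]

print(read_addrs)       # -> [100, 104, 108, 112]
print(write_addrs)      # -> [102, 106, 110, 114]

# One stored pointer (base) is enough to derive both entry addresses:
assert not set(read_addrs) & set(write_addrs)
```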

@cardillan
Owner Author

I have an early prototype - and it works! Yay! (Implementing the code generation was pretty straightforward - the compiler rewrite definitely paid off. The optimizations will be harder.)

#set loop-unrolling = none;
const SIZE = 3;

var a[SIZE];

for i in 0 ... SIZE do
    a[i] = i;
end;

for i in 0 ... SIZE do
    println(i, ": ", a[i]);
end;

# Compiled code

set :i 0
set .a*wret 5
set .a*w :i
op mul *tmp4 :i 2
op add @counter *tmp4 27
op add :i :i 1
jump 1 lessThan :i 3
set :i 0
print :i
print ": "
set .a*rret 13
op mul *tmp5 :i 2
op add @counter *tmp5 20
set *tmp3 .a*r
print *tmp3
print "\n"
op add :i :i 1
jump 8 lessThan :i 3
printflush message1
end
set .a*r .a*0
set @counter .a*rret
set .a*r .a*1
set @counter .a*rret
set .a*r .a*2
set @counter .a*rret
end
set .a*0 .a*w
set @counter .a*wret
set .a*1 .a*w
set @counter .a*wret
set .a*2 .a*w
set @counter .a*wret
print "Compiled by Mindcode - github.com/cardillan/mindcode"

(There's no optimization of internal arrays whatsoever yet. They get resolved after all optimizations are done.)

For comparison, almost the same syntax can be used to create external arrays:

#set loop-unrolling = none;
allocate heap in cell1[10 ... 20]; // Non-zero index, forces index computation
const SIZE = 3;

external a[SIZE];

for i in 0 ... SIZE do
    a[i] = i;
end;

for i in 0 ... SIZE do
    println(i, ": ", a[i]);
end;

# Compiled code

jump 0 equal cell1 null
set :i 0
op add *tmp3 :i 10
write :i cell1 *tmp3
op add :i :i 1
jump 2 lessThan :i 3
set :i 0
op add *tmp5 :i 10
print :i
print ": "
read *tmp6 cell1 *tmp5
print *tmp6
print "\n"
op add :i :i 1
jump 7 lessThan :i 3
printflush message1

@cardillan
Owner Author

Code after applying some (already existing) optimizations to it:

const SIZE = 3;
param LIMIT = SIZE;

var a[SIZE];

// This loop will be unrolled
for i in 0 ... SIZE do
    a[i] = i + 1;
end;

// Using LIMIT prevents loop unrolling
for i in 0 ... LIMIT do
    print(a[i]++);
end;

// Verify the values were updated
print(a[0]);
print(a[1]);
print(a[2]);

// Outputs 123234

# Compiled code

set LIMIT 3
set .a*0 1
set .a*1 2
set .a*2 3
set :i 0
set .a*rret 10
set .a*wret 13
jump 16 greaterThanEq 0 LIMIT
op mul *tmp6 :i 2
op add @counter 21 *tmp6
op add *tmp4 .a*r 1
set .a*w *tmp4
op add @counter 27 *tmp6
print .a*r
op add :i :i 1
jump 8 lessThan :i LIMIT
print .a*0
print .a*1
print .a*2
printflush message1
end
set .a*r .a*0
set @counter .a*rret
set .a*r .a*1
set @counter .a*rret
set .a*r .a*2
set @counter .a*rret
set .a*0 .a*w
set @counter .a*wret
set .a*1 .a*w
set @counter .a*wret
set .a*2 .a*w
set @counter .a*wret
print "Compiled by Mindcode - github.com/cardillan/mindcode"

Notes:

  • Initialization loop got unrolled and replaced with static access.
  • Both set .a*rret 10 and set .a*wret 13 got hoisted out of the loop.
  • Jump offset op mul *tmp6 :i 2 gets computed only once and reused for write access.
  • .a*r gets propagated into print .a*r.
  • There's still a problem with op add *tmp4 .a*r 1; set .a*w *tmp4 - the goal is to have these two instructions merged, but it isn't happening now. I don't immediately see the reason, so it's going to be a debugging session...
  • print(a[0]) prints the array element directly (this is done by the code generator actually, not by the optimizer).

This also illustrates the benefits of having different transfer variables for reads and writes: if it weren't so, .a*r would have to be copied to a temporary variable for printing. There might be use cases where the read value could be written right back without modification (say, if the update was conditional), but I assume these will be less frequent than cases where the written value is always updated.

Instructions now hold information about side effects (i.e. variables read or written implicitly by the instruction without being part of the argument list). I'll need to make sure this new information is used in all places. Currently the optimization code is pretty convoluted, unfortunately.

I'm also aware there is a possibility to perform induction variable optimization on i and *tmp6 (i.e. compute LIMIT * 2 once and increase *tmp6 by two in each iteration, avoiding the op mul *tmp6 :i 2 instruction altogether), but this is actually a separate issue planned for later. It will be quite useful to all loops accessing arrays by index.
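The induction-variable idea amounts to classic strength reduction. A Python sketch of the transformation (illustrative; function names are made up):

```python
# Before: one multiplication per iteration, as in `op mul *tmp6 :i 2`.
def offsets_mul(limit):
    out, i = [], 0
    while i < limit:
        out.append(i * 2)          # recomputed every iteration
        i += 1
    return out

# After: the offset becomes an induction variable stepped by 2,
# compared against `limit * 2` computed once before the loop.
def offsets_reduced(limit):
    out, offset = [], 0
    bound = limit * 2              # computed once
    while offset < bound:
        out.append(offset)
        offset += 2                # the multiply is gone from the loop body
    return out

print(offsets_mul(3) == offsets_reduced(3))   # -> True
```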

@cardillan
Owner Author

cardillan commented Jan 31, 2025

I'm kinda bragging here, so please forgive me: @A4-Tacks created a program computing a Pascal Triangle as a compiler benchmark (description in English).

When adapting it to use internal arrays - like this:

const TRIANGLE_SIZE = 10;

noinit linked message1;

var previousLine[TRIANGLE_SIZE];
var currentLine[TRIANGLE_SIZE];

begin
    println("1");
    println("1 1");

    var lineLength = 2;

    previousLine[0] = 1;
    previousLine[1] = 1;

    for var i in 3 .. TRIANGLE_SIZE do
        currentLine[0] = 1;
        for var j in 1 .. lineLength do
            currentLine[j] = previousLine[j - 1] + previousLine[j];
        end;
        currentLine[lineLength] = 1;
        lineLength++;

        for var c in 0 ... lineLength do
            print(currentLine[c]);
            previousLine[c] = currentLine[c];
            if c < lineLength - 1 then
                print(" ");
            end;
        end;
        println();
    end;

    printflush(message1);
end;

it evaluates everything at compile time, producing

print "1\n1 1\n1 2 1\n1 3 3 1\n1 4 6 4 1\n1 5 10 10 5 1\n1 6 15 20 15 6 1\n1 7 21 35 35 21 7 1\n1 8 28 56 70 56 28 8 1\n1 9 36 84 126 126 84 36 9 1\n"
printflush message1

It can be done because the triangle size is fixed.

Compile-time evaluation is one of Mindcode's strengths, but its impact in real-life programs isn't as profound, so the benchmark is a bit unbalanced. A version using external arrays (memory cells) would still unroll all loops, but wouldn't compile-time evaluate expressions based on memory cells. When further declaring TRIANGLE_SIZE as a parameter instead of a constant, loops couldn't be unrolled and the resulting code would be a wee bit worse than BangLang's, since BangLang does not re-read currentLine[c] from external memory in previousLine[c] = currentLine[c];. That's an optimization Mindcode currently doesn't have.
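For reference, the constant-folded string in the single print instruction above matches a straightforward Pascal's triangle computation (Python, for checking only):

```python
# Reference computation of the first ten rows of Pascal's triangle,
# matching the fully evaluated output the compiler emits.

def pascal(n):
    rows, row = [], [1]
    for _ in range(n):
        rows.append(" ".join(map(str, row)))
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return "\n".join(rows) + "\n"

print(pascal(3), end="")   # prints "1", "1 1", "1 2 1" on three lines
```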

@cardillan
Owner Author

I wanted to implement passing arrays into vararg functions, but ended up just expanding arrays into the parameter list, as is already the case with varargs themselves. As a result, it is possible to pass arrays into any function. E.g.:

var a[3] = (1, 2, 3);
print(a);   // prints "123"

allocate heap in cell1;
external b[3] = (1, 2, 3);
print(max(b));  // prints "3"

compiles into

print "123"
jump 1 equal cell1 null
write 1 cell1 0
write 2 cell1 1
write 3 cell1 2
read *tmp0 cell1 0
read *tmp1 cell1 1
op max *tmp3 *tmp0 *tmp1
read *tmp2 cell1 2
op max *tmp3 *tmp3 *tmp2
print *tmp3
printflush message1

The function call mechanism will be completely rewritten with the type system. However, all vararg functions (which include built-in ones, such as print or max) will continue to accept arrays.
