Replies: 2 comments 1 reply
-
Thanks @damip! Overall, I think this conveys the general idea of the IR, but I think we should try to pin down a specification of sorts, or at least, describe in detail the following elements:
Aside from elaborating on those items, I have a few questions/notes on the structure of the IR:
Blocks and Control FlowThe notion of a block here is a bit unusual. Typically, a block in a compiler IR refers to the concept of a basic block, i.e. a sequence of primitive instructions free from control flow. The only time you end up with multiple blocks in the IR is in the presence of conditional branches, i.e. code which should only be executed when a given condition holds, and the branch is taken. This is in contrast to something like While AirScript doesn't really support control flow in the normal sense, it does have some forms of it, both implicit and explicit - notably comprehensions and conditional constraints both express some limited forms of control. What you've described as "block calls" is really an unstructured control flow primitive called an "unconditional branch". Unstructured control flow is very general, and very powerful, but difficult to analyze and optimize as a result of its lack of constraints - this tradeoff is usually worth it for general-purpose programming languages though. Proposal: A Dataflow-Centric AlternativeIn our case though, I think we might benefit from a different style of IR that is more suited to representing limited forms of control, i.e. structured control flow. As part of that, I think it also would be good to make the dataflow graph the focus, rather than the control flow graph (typically, compiler IRs, especially SSA IRs, tend to be CFG-oriented, and use analysis to reify a DFG). There is in fact a formalism for the type of IR I'm describing, called the Regionalized Value-State Dependence Graph (RVSDG), in case you are interested in digging deeper into that, you can also look at MLIR, which has a design that takes a lot of inspiration from RVSDG. I'm not necessarily saying we implement it 1:1, but rather use some of its concepts/primitives for the AirScript IR. The RVSDG can be described as a hierarchical dataflow graph, where regions and various kinds of operations form the hierarchy. For our purposes, we're primarily interested in two kinds of operations: "simple", i.e. primitive operations like arithmetic, loads and stores, etc.; and "structural", operations which introduce new nested regions, e.g. if/then/else expressions, Dataflow in RVSDG is expressed via operations, in the form of:
Inputs to an operation are always the output of some other operation, thus RVSDG is by construction, always in SSA form. In many cases, there is a 1:1 correlation between inputs/operands and outputs/results. An example of when this isn't the case, can be observed with the "Structured" operations also have one or more regions, the semantics of which depend on the specific operation. Regions consist of blocks, which are your typical basic block. A block contains one or more operations. A region always has at least one block, called the entry block, whose parameter list corresponds to the operands for its containing region. For example, a So to make this more concrete, I'm imagining the following IR entities and general structure:
The way in which value definitions and their uses are tracked can vary, but one of the advantages of the structure I described above, is that it makes traversing and manipulating the use-def graph of a function very simple. Implementing it is a bit tricky however. So assuming we have such a basic structure, and assuming all of the primitive ops are pretty self-explanatory, I imagine AirScript IR would have the following "structured" ops:
This would provide a straightforward lowering for the current AirScript syntactic constructs like list comprehensions, comprehension constraints, conditional constraints, etc. A couple of other key benefits:
The above suggestion/proposal doesn't have to be where we go with this by the way, but hopefully it will be useful as a way to spur some discussion on the design we do ultimately end up with. I'd also point to it as an example of the kind of information I'm looking for in the design document, i.e. the specific elements of the IR you've defined, their semantics/relationships, etc. I'm less focused on the specific details of the IR - more so that it is cohesive, has a comprehensive spec, and delivers on the benefits we're hoping to gain from the effort, i.e. that it makes analysis and optimization easier, is more maintainable, and can be extended without significant refactoring (within reason). So long as all of that is accounted for, I'm not too particular about the specific design details - so even though I just brain-dumped a whole bunch of information about a possible design, feel free to discard that in favor of your own approach if you have a clear vision for how to implement it. |
Beta Was this translation helpful? Give feedback.
-
Thanks for the precious feedback @bitwalker! A better explanation of the choicesGuidelinesThe fundamental design guideline we followed is:
As for the "set of target optimizations" we targeted:
Basically the goal is to move as many optimization steps as possible away from AST/AIR and have them applied on the IR itself. AST to IRTo achieve the goal of having a low-vocabulary IR, we thought of eliminating the concept of type and unrolling loops/vectors/lists when passing from the AST to the AIR. Instructions and "blocks"The instructions are regrouped under "blocks" that take arguments and can return values. The instructions are:
More on blocksThe reason why we didn't go for the usual We went for function-like blocks of conditions or values that are semantically likely to be reused by being called multiple times from different places. As for why the arguments of each block start with an The translation rules are essentially:
This has the advantage of explicitly including the multiplications generated by nested Values and constraints within list comprehensions (lists are unrolled on translation to IR), evaluators and functions are all replaced with the unique concept of block. Note that constant propagation automatically eliminates unused Here is an example translation process for a nested match: Input (as AST):
We first expand the First, blocks are created for the
Then the match is replaced within the
We proceed the same way for
The nice thing is that the IR remains readable and keeps a similar structure as the initial code thanks to blocks, despite being very close to the AIR. For IR -> IR optimizations:
Then, to translate to AIR we simply need to expand blocks into the AIR tree and link them according to calls. Questions relative to your proposal and RVSDGWe read the article presenting RVSDG (https://arxiv.org/pdf/1912.05036), and it makes a lot of sense. In particular, the structure you described, and the optimization pass description, would certainly be useful to have during implementation. Given the guidelines outlined above that we used to guide our design proposal, our questions are mainly:
This would be translated into a block, containing an operation with two regions, within which we have blocks containing an operation with two regions (nested structure).
Last considerationsOne more idea. In the end (AIR) we are talking about symbolic math using only additions, constants, subtractions and multiplications. There are several libraries that would allow us to optimize math much better than traditional sequential compilers do, for example by resorting to factoring, algebraic identities and so on. Making sure that after all its optimization passes the IR only resorts to basic arithmetic and is compatible with symbolic algebra frameworks would open a whole new world of optimizations using existing libs directly on the IR. It would also make the IR -> AIR transform straightforward. To wrap up everything, we feel that we could take the structure from RVSDG and apply it with a limited vocabulary (especially if we can unroll the loops during lowering from AST to the IR and work on scalar elements), to take advantage of all this structure and the underlying information to optimize better, while still staying as close to the AIR as possible. We really appreciate your feedback on this, and aside from our planned meetings we can also discuss more about the constraints that guide this design in more depth if needed. |
Beta Was this translation helpful? Give feedback.
-
The goal of this proposal is to create an Intermediate Representation (IR) in order to move some optimizations away from the AST and towards the IR itself.
AirScript code:
IR:
Note that all expressions in the IR are in SSA form
Optimizations (IR -> IR):
Beta Was this translation helpful? Give feedback.
All reactions