Add mnemonic mapping for pseudoinstructions #45

Draft: wants to merge 12 commits into base: json


Author (Linda-Njau):

This PR introduces functionality to retrieve the mnemonic for a base instruction associated with a pseudoinstruction.
This lays the groundwork for converting pseudoinstruction data into JSON.

It includes the following key updates:

Parse pseudoinstruction data

Parse the AST data and store pseudoinstruction-to-base-instruction mappings in a Hashtbl.
Example:

```
LA: [
  assembly(UTYPE(imm[31..12], rd, RISCV_AUIPC)),
  assembly(ITYPE(imm[11..0], reg_name("x0"), rd, RISCV_ADDI))
]
```

Stored as:

```
Adding to hashtable with key: LA, id_inner: UTYPE, args_inner_list: [subrange_bits(imm, 31, 12), rd, RISCV_AUIPC]
Adding to hashtable with key: LA, id_inner: ITYPE, args_inner_list: [subrange_bits(imm, 11, 0), reg_name_backwards("x0"), rd, RISCV_ADDI]
```

Mnemonic retrieval (get_mnemonic)
Add logic to extract a mnemonic in one of two ways: directly from the assembly string, or by mapping a parameter to an argument through map_param_to_arg and then the argument to a mnemonic through map_arg_to_mnemonic.

Direct retrieval:

```
C_ZEXT_W: c.zext.w, spc, creg_name(rsdc)
```
Returns: `c.zext.w`

Parameter to argument to mnemonic:

```
ITYPE: itype_mnemonic(op), spc, reg_name(rd), sep, reg_name(rs1), sep, hex_bits_signed_12(imm)
```
Returns: `op` -> `RISCV_ADDI` -> `addi`

Map parameter to argument (map_param_to_arg)
Add a function that matches a parameter to its corresponding argument in the parsed data.

Example:

```
Input list: ITYPE: imm, rs1, rd, op
Argument list: [subrange_bits(imm, 11, 0), reg_name("x0"), rd, RISCV_ADDI]
```
Returns `RISCV_ADDI` for `op`.

Map argument to mnemonic (map_arg_to_mnemonic)
Add a function that matches an argument to its corresponding mnemonic using a predefined enum-mnemonic mapping.

Example:

```
Matched RISCV_AUIPC with mnemonic auipc.
Matched RISCV_ADDI with mnemonic addi.
```

These changes enable the `process_base_instruction` function to complete the first step of converting raw pseudoinstruction data into JSON: retrieving the mnemonics of all associated base instructions.

Each entry in the `base_instructions` Hashtbl is keyed by the pseudoinstruction's AST identifier (for example, `LA`), and its value is a tuple containing:

1. The base instruction's type.
2. The base instruction's argument list.

Each base instruction is stored as its own entry.
```
Adding to hashtable with key: LA, id_inner: UTYPE, args_inner_list: [subrange_bits(imm, 31, 12), rd, RISCV_AUIPC]
Adding to hashtable with key: LA, id_inner: ITYPE, args_inner_list: [subrange_bits(imm, 11, 0), reg_name_backwards("x0"), rd, RISCV_ADDI]
```
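
A minimal OCaml sketch of this storage scheme. It assumes arguments are kept as plain strings (the PR stores Sail AST nodes), and `base_instrs_of` is a hypothetical helper. The scheme works because `Hashtbl.add` keeps earlier bindings for the same key instead of replacing them:

```ocaml
(* Hypothetical sketch: arguments are plain strings here, whereas the PR
   stores Sail AST nodes. Each value is (base instruction type, argument list). *)
let base_instructions : (string, string * string list) Hashtbl.t = Hashtbl.create 16

let () =
  (* Hashtbl.add keeps earlier bindings for the same key, so both base
     instructions of LA are stored as separate entries. *)
  Hashtbl.add base_instructions "LA"
    ("UTYPE", [ "subrange_bits(imm, 31, 12)"; "rd"; "RISCV_AUIPC" ]);
  Hashtbl.add base_instructions "LA"
    ( "ITYPE",
      [ "subrange_bits(imm, 11, 0)"; "reg_name_backwards(\"x0\")"; "rd"; "RISCV_ADDI" ] )

(* find_all returns the newest binding first; reverse it to get insertion order. *)
let base_instrs_of key = List.rev (Hashtbl.find_all base_instructions key)
```

`Hashtbl.iter` and `Hashtbl.fold` also visit every binding, so iterating over the table reaches both `LA` entries. For a given key, they are passed in reverse order of insertion.
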
`process_base_instruction` iterates over the `base_instructions` Hashtbl and calls `get_mnemonic` on each entry to retrieve its mnemonic.
`get_mnemonic` retrieves a mnemonic in one of two ways:
1. Parameter extraction:
If a parameter exists in the assembly string, it is extracted and mapped to an argument using `map_param_to_arg`.
The result is then processed by `map_arg_to_mnemonic` to get the final mnemonic.

For example, in the assembly string:
```
ITYPE:itype_mnemonic(op), spc, reg_name(rd), sep, reg_name(rs1), sep, hex_bits_signed_12(imm)
```
`op` is mapped to `RISCV_ADDI`, which results in `addi`.

2. Direct mnemonic:
If no parameter is found, the mnemonic is retrieved directly from the assembly string and cross-checked against the `assembly_clean` Hashtbl for accuracy.

Example:
```
C_ZEXT_W:c.zext.w, spc, creg_name(rsdc)
```

Returns:
```
c.zext.w
```

Returns `None` if no mnemonic is found in either path.
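
The two paths can be sketched in OCaml as follows. This is a hypothetical illustration, not the PR's code, and it rests on several assumptions:

* The assembly clause is stored as a string list whose head is the mnemonic part (e.g. `["itype_mnemonic(op)"; "spc"; ...]`).
* The PR's `Str` regex is replaced with Stdlib string functions.
* The `resolve_param` callback stands in for `map_param_to_arg` followed by `map_arg_to_mnemonic`.

```ocaml
(* Hypothetical sketch of get_mnemonic's two retrieval paths.
   Assumed, not taken from the PR: the assembly clause is a string list
   whose head holds the mnemonic part, and [assembly_clean] maps an
   instruction id to its clean mnemonic (sample data below). *)
let assembly_clean : (string, string) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.replace assembly_clean "C_ZEXT_W" "c.zext.w"

(* Return the text inside "name(...)", e.g. "itype_mnemonic(op)" -> Some "op".
   A head without parentheses, such as "c.zext.w", gives None. *)
let extract_param (head : string) : string option =
  match String.index_opt head '(' with
  | Some l when l > 0 -> (
      match String.index_from_opt head (l + 1) ')' with
      | Some r -> Some (String.sub head (l + 1) (r - l - 1))
      | None -> None)
  | _ -> None

(* [resolve_param] stands in for map_param_to_arg + map_arg_to_mnemonic. *)
let get_mnemonic ~(resolve_param : string -> string option) (id : string)
    (asm : string list) : string option =
  match asm with
  | head :: _ -> (
      match extract_param head with
      (* Path 1: a parameter was found, so resolve it to a mnemonic. *)
      | Some param -> resolve_param param
      (* Path 2: use the head directly, but only if assembly_clean agrees. *)
      | None -> (
          let candidate = String.trim head in
          match Hashtbl.find_opt assembly_clean id with
          | Some clean when clean = candidate -> Some candidate
          | _ -> None))
  (* An empty clause yields no mnemonic on either path. *)
  | [] -> None

(* Toy resolver for the ITYPE example: op -> RISCV_ADDI -> addi. *)
let toy_resolve = function "op" -> Some "addi" | _ -> None
```

For example, `get_mnemonic ~resolve_param:toy_resolve "ITYPE" ["itype_mnemonic(op)"; "spc"]` evaluates to `Some "addi"`. With the same resolver, `get_mnemonic ~resolve_param:toy_resolve "C_ZEXT_W" ["c.zext.w"; "spc"]` evaluates to `Some "c.zext.w"`. Passing the resolver in as a function keeps the dispatch testable without building the `inputs` and `mappings` tables.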

`map_param_to_arg` gets the input list from the `inputs` Hashtbl:
```
ITYPE: imm, rs1, rd, op
```
It then finds the index of the parameter (`op` at index 3) and returns the corresponding argument at that index.

For example, with `op` at index 3, the function returns `RISCV_ADDI` from the argument list `[subrange_bits(imm, 11, 0), reg_name("x0"), rd, RISCV_ADDI]`.
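
The following sketch shows the index-based lookup. It assumes plain-string arguments and a hand-filled `inputs` table. The hand-written `get_index` is equivalent to `List.find_index (( = ) elem)`, which is available from OCaml 5.1:

```ocaml
(* Sketch of map_param_to_arg with plain-string arguments (an assumption).
   [inputs] maps an instruction id to its parameter names; sample data only. *)
let inputs : (string, string list) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.replace inputs "ITYPE" [ "imm"; "rs1"; "rd"; "op" ]

(* Index of the first element equal to [elem]; same result as
   List.find_index (( = ) elem) on OCaml >= 5.1. *)
let get_index elem lst =
  let rec go i = function
    | [] -> None
    | x :: rest -> if x = elem then Some i else go (i + 1) rest
  in
  go 0 lst

(* The parameter's position in the input list selects the argument at the
   same position; any missing piece yields None. *)
let map_param_to_arg id param args_list =
  match Hashtbl.find_opt inputs id with
  | Some inputl -> (
      match get_index param inputl with
      | Some index -> List.nth_opt args_list index
      | None -> None)
  | None -> None
```
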

`map_arg_to_mnemonic` looks up a list of enum-mnemonic pairs in the `mappings` Hashtbl.
It then matches the provided `arg` with the enum and returns the corresponding mnemonic.

```
Matched RISCV_ADDI with mnemonic: addi
```
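
For a single mapping, the match can be sketched with `List.assoc_opt`. The pairs below are sample data modeled on Sail's `itype_mnemonic` mapping, and the log format mirrors the output above:

```ocaml
(* Sketch: match [arg] against (enum, mnemonic) pairs from one mapping and
   log the hit. The pair list is hypothetical sample data. *)
let itype_mnemonic_pairs =
  [ ("RISCV_ADDI", "addi"); ("RISCV_SLTI", "slti"); ("RISCV_ANDI", "andi") ]

let map_arg_to_mnemonic arg pairs =
  match List.assoc_opt arg pairs with
  | Some mnemonic ->
      Printf.printf "Matched %s with mnemonic: %s\n" arg mnemonic;
      Some mnemonic
  | None -> None
```
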

Owner (ThinkOpenly) left a comment:

Thanks for breaking this PR into a group of smaller commits.
And, it's good work, especially given the complexity! I only have minor comments.

```
@@ -433,10 +434,51 @@ let parse_funcl fcl =
debug_print ("id_of_dependent: " ^ id);
let source_code = extract_source_code (Ast_util.exp_loc e) in
Hashtbl.add functions id source_code
| Pat_exp (P_aux (P_app (i, pl), _), e) | Pat_when (P_aux (P_app (i, pl), _), e, _) ->
| Pat_exp (P_aux (P_app (i, pl), _), e) | Pat_when (P_aux (P_app (i, pl), _), e, _) -> (
debug_print ("FCL_funcl execute " ^ string_of_id i);
let source_code = extract_source_code (Ast_util.exp_loc e) in
```
Owner (ThinkOpenly):

Move this line down to just above Hashtbl.add executes below, since it's not used in the pseudo_of processing.

Author (Linda-Njau):

@ThinkOpenly, I intended for the line you're suggesting to move to be part of pseudo_of processing. My thought process is that pat matches similar patterns with different id values: pseudo_of, execute, and pseudo_execute. That's why I nested the id matches within the pat match. Do you think this approach is flawed?

Owner (ThinkOpenly):

Just to make sure we're talking about the same line, I was referring to:

```
let source_code = extract_source_code (Ast_util.exp_loc e) in
```

Author (Linda-Njau):

Aah, okay... I had the wrong line. We're on the same page now : )

Comment on lines 454 to 455
```
List.iteri
(fun index inner_value ->
```
Owner (ThinkOpenly):

Do you need List.iteri here? I don't see a use of index below.

```
let get_mnemonic id args_list =
match Hashtbl.find_opt assembly id with
| Some (str :: _) ->
if Str.string_match (Str.regexp ".+(\\(.*\\))") str 0 then (
```
Owner (ThinkOpenly):

Might be nice to describe what this regex is looking for, or make it a variable with a meaningful name.

Comment on lines 503 to 504
```
| Some _ -> None
| None -> None
```
Owner (ThinkOpenly):

These could be combined in a catch-all using "_".

Comment on lines 506 to 507
```
| Some [] -> None
| None -> None
```
Owner (ThinkOpenly):

These could be combined in a catch-all using "_".
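
Here is a small illustration of the suggested catch-all. It uses a hypothetical `first_element` helper on the same `option` shape:

```ocaml
(* Hypothetical helper: both "nothing useful" shapes fall into one arm. *)
let first_element (v : string list option) : string option =
  match v with
  | Some (x :: _) -> Some x
  (* replaces the separate "| Some [] -> None" and "| None -> None" arms *)
  | _ -> None
```

With `option`, the wildcard loses nothing. On a variant type that may gain constructors later, listing the cases explicitly keeps the compiler's exhaustiveness warning useful.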

```
let map_param_to_arg id param args_list =
match Hashtbl.find_opt inputs id with
| Some inputl -> (
match get_index param inputl with Some index -> List.nth_opt args_list index | None -> None
```
Owner (ThinkOpenly):

Does dune fmt leave this on a single line? If possible, I'd prefer multiple lines here.
Nice use of List.nth_opt! 🙂

Author (Linda-Njau):

Yeah, that's dune fmt
Thanks : )

Comment on lines 488 to 489
```
let get_index elem lst =
List.find_map (fun (i, x) -> if x = elem then Some i else None) (List.mapi (fun i x -> (i, x)) lst)
```
Owner (ThinkOpenly):

Can you use List.find_index instead?

```
)
else None
)
(Hashtbl.find_all mappings (String.lowercase_ascii (id ^ "_mnemonic")))
```
Owner (ThinkOpenly):

This is relying on a convention used to identify mappings for mnemonics in the Sail code. Since this is not guaranteed, please add a comment indicating assumptions you are making here.

Also, I see similar conventions which likely violate the assumptions, like:

```
mapping f_madd_type_mnemonic_D : f_madd_op_D <-> string = {
    FMADD_D  <-> "fmadd.d",
    FMSUB_D  <-> "fmsub.d",
    FNMSUB_D <-> "fnmsub.d",
    FNMADD_D <-> "fnmadd.d"
}
```

Would it work to use all entries in the "mappings" table?

Author (Linda-Njau):

Decided to go the all-entries route for this.

Move a misplaced line in `parse_funcl` next to the code that uses it.
Replace `List.iteri` with `List.iter`, since the index is not used.
Update `get_index` to use `List.find_index`, simplifying the implementation while maintaining functionality.
Replace the key concatenation assumption with a fold over the entire `mappings` Hashtbl, ensuring correctness across all cases.
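
The all-entries approach can be sketched with `Hashtbl.fold`. The sketch assumes that each binding in `mappings` is one (enum, mnemonic) pair keyed by the mapping's name; this is an assumption about the table's shape, and the entries are sample data. Because no key is derived from the instruction id, a mapping such as `f_madd_type_mnemonic_D` is found even though it does not follow the `<id>_mnemonic` naming pattern:

```ocaml
(* Sketch of the "all entries" lookup. Assumed shape: mapping name ->
   one (enum, mnemonic) pair per binding; entries below are sample data. *)
let mappings : (string, string * string) Hashtbl.t = Hashtbl.create 64

let () =
  Hashtbl.add mappings "itype_mnemonic" ("RISCV_ADDI", "addi");
  Hashtbl.add mappings "utype_mnemonic" ("RISCV_AUIPC", "auipc");
  Hashtbl.add mappings "f_madd_type_mnemonic_D" ("FMADD_D", "fmadd.d");
  Hashtbl.add mappings "f_madd_type_mnemonic_D" ("FMSUB_D", "fmsub.d")

(* Scan every binding, including repeated keys, and keep the first hit.
   Assumes enum names are unique across mappings, since the fold order
   is unspecified. *)
let map_arg_to_mnemonic arg =
  Hashtbl.fold
    (fun _name (enum, mnemonic) acc ->
      match acc with
      | Some _ -> acc
      | None -> if enum = arg then Some mnemonic else None)
    mappings None
```

This scans the whole table on every call. If that becomes a cost, building a reverse index from enum to mnemonic once would make each lookup constant-time.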