Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON parsing passes through and emits Unicode characters that Python's json module can't parse #14

Open
ThinkOpenly opened this issue Apr 1, 2024 · 2 comments

Comments

@ThinkOpenly
Copy link
Owner

In the RISC-V Sail specification, model/riscv_insts_vext_vset.sail:

function clause execute VSETI_TYPE(ma, ta, sew, lmul, uimm, rd) = {
[...]
  /* set vl according to VLMAX and AVL */
  vl = if AVL <= VLMAX then to_bits(sizeof(xlen), AVL)
       else if AVL < 2 * VLMAX then to_bits(sizeof(xlen), (AVL + 1) / 2)
       else to_bits(sizeof(xlen), VLMAX);
  /* Note: ceil(AVL / 2) ≤ vl ≤ VLMAX when VLMAX < AVL < (2 * VLMAX)
   * TODO: configuration support for either using ceil(AVL / 2) or VLMAX
   */

Note that the code uses <=, but the comment uses .

In the JSON emitted by the JSON backend, the latter Unicode character trips up the Python (3.10.12) "json" module:

$ python3 -c "import json; f = open('json.out'); j = json.load(f); print(json.dumps(j),indent=2)"
[...]
json.decoder.JSONDecodeError: Invalid \escape: line 2090 column 1290 (char 403777)

We already use String.escaped before emitting the function text. sigh

@vishisht-dubey
Copy link

Hi @ThinkOpenly what is the expected behaviour would you please elaborate??

@ThinkOpenly
Copy link
Owner Author

what is the expected behaviour would you please elaborate??

I'm not sure exactly what is being asked. There are two places in the RISC-V Sail code where Unicode characters are embedded in the comments within a function clause execute. The JSON backend is, as of very recently, extracting the code in the body of the function verbatim and emitting it in JSON, as an escaped (by OCaml's String.escaped) string. When that JSON is read by Python (3.10.12), those characters result in an error being emitted by the Python JSON decoder, as shown above.

Expected behavior is that the function body code in the Sail specification can be encoded as a JSON string and parsed by Python's JSON decoder without error.

I'm not asserting that the Sail comments are wrong, but it does seem like using different characters between the code ("<=") and the comments ("") could cause confusion, and arguably should be discouraged.

When I replaced the single character by the two-character sequence, the flow did not emit any errors. Unless there is great need for the single-character representation, a simple replacement by the two-character representation may be the best solution. This should happen upstream.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants