Macros: support for character and string literals #409

sheaf · 2025-02-07T16:15:16Z

This commit adds support for C character literals and string literals in macros.

Some remarks:

- The implementation assumes that the source file is encoded in UTF-8. I believe that's the only encoding that the Clang library supports anyway.
- The implementation doesn't support wide character types, such as char16_t.
- C character literals are of type int, not char. We distinguish two types of such literals, as per the C standard:
  - code unit: a specific int value, whose interpretation as a linguistic or symbolic character depends on the choice of a text encoding,
  - Unicode code point: a unique numeric identifier for a specific character, whose value (e.g. in a char * array) depends on the text encoding chosen.
- We translate character literals to CInt, and use unboxed Addr# literals to translate string literals to CStringLen (allowing for the possibility of inner null bytes).
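
To make the last remark concrete, here is a minimal sketch of what such translated bindings could look like. It is not the code generated by this PR: the macros `LETTER_A` and `GREETING` and the names `letterA`, `greeting` and `stringBytes` are hypothetical and exist only for illustration.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Word (Word8)
import Foreign.C.String (CStringLen)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import GHC.Exts (Ptr (Ptr))

-- Hypothetical C macro:  #define LETTER_A 'a'
-- A C character literal has type int, so it maps to a CInt value.
letterA :: CInt
letterA = 97

-- Hypothetical C macro:  #define GREETING "hi\0there"
-- The Addr# literal points at static bytes; carrying the length (8)
-- explicitly is what lets the inner null byte survive.
greeting :: CStringLen
greeting = (Ptr "hi\0there"#, 8)

-- Read the bytes back using the stored length instead of scanning
-- for a terminating null byte.
stringBytes :: CStringLen -> IO [Word8]
stringBytes (ptr, len) = peekArray len (castPtr ptr)
```

Pairing the pointer with a length, rather than using a plain CString, is the design choice that keeps everything after the first null byte reachable.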

phadej · 2025-02-07T16:44:34Z

Add examples (tests)

sheaf · 2025-02-07T16:46:04Z

> Add examples (tests)

Yes, that's the next step. This is a draft.


phadej · 2025-02-07T16:51:57Z

hs-bindgen/src/HsBindgen/SHs/AST.hs

@@ -142,6 +145,7 @@ data SExpr ctx =
  | EIntegral Integer (Maybe HsPrimType)
  | EFloat Float HsPrimType -- ^ Type annotation to distinguish Float/CFLoat
  | EDouble Double HsPrimType
+  | EString [Word8]


Can't we use ByteArray, or some better representation? [Word8] doesn't feel good.
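
As a rough illustration of this suggestion, the sketch below stores the bytes in a packed `ByteArray`. It assumes the `ByteArray` from `Data.Array.Byte` (base 4.17 or later); the comment may equally have meant the one from the primitive package. `Expr`, `mkString` and `stringLength` are hypothetical, cut-down stand-ins, not the actual `SExpr` definition.

```haskell
import Data.Array.Byte (ByteArray)
import Data.Word (Word8)
import GHC.Exts (fromList, toList)

-- Hypothetical, cut-down stand-in for the SExpr type in the diff above.
data Expr
  = EIntegral Integer
  | EString ByteArray -- packed bytes instead of a linked list of Word8
  deriving (Eq, Show)

-- Build a string expression from a list of bytes via the IsList instance.
mkString :: [Word8] -> Expr
mkString = EString . fromList

-- Number of bytes in a string expression, if it is one.
stringLength :: Expr -> Maybe Int
stringLength (EString bytes) = Just (length (toList bytes))
stringLength _ = Nothing
```

A packed array stores one byte per element, whereas `[Word8]` allocates a list cell per byte, which is presumably the concern behind the comment.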

phadej · 2025-02-07T16:53:11Z

hs-bindgen/src/HsBindgen/Hs/Translation.hs

+    goChar (C.CharLiteral { charLiteralValue = c }) =
+      ( `Hs.VarDeclIntegral` HsPrimCInt ) <$>
+        case c of
+          C.CodeUnit u


Extract this into separate functions. Preferably don't invent UTF-8 parsing or validation; there are plenty of implementations in the libraries.
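
One way to read this suggestion is sketched below, assuming the text and bytestring packages are available. `CharValue` is a hypothetical stand-in for the literal value type in the diff, and the policy in `charValueToCInt` (accept a code point only when it encodes to a single UTF-8 code unit) is an assumption made for illustration, not necessarily what the PR does.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)
import Foreign.C.Types (CInt)

-- Hypothetical stand-in for the two kinds of character literal value.
data CharValue
  = CodeUnit Word8 -- a raw code unit
  | CodePoint Char -- a Unicode code point
  deriving (Eq, Show)

-- UTF-8 code units of a code point, delegated to the text library
-- rather than hand-written encoding logic.
utf8CodeUnits :: Char -> [Word8]
utf8CodeUnits = BS.unpack . TE.encodeUtf8 . T.singleton

-- Assumed policy: a code point becomes a CInt only if it fits in a
-- single UTF-8 code unit; anything longer is rejected.
charValueToCInt :: CharValue -> Maybe CInt
charValueToCInt (CodeUnit u) = Just (fromIntegral u)
charValueToCInt (CodePoint c) =
  case utf8CodeUnits c of
    [u] -> Just (fromIntegral u)
    _ -> Nothing
```

Keeping the encoding step in its own function means the case analysis in the translation stays small and the UTF-8 details stay in the library.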

sheaf force-pushed the macro-strings branch 2 times, most recently from 0a5e856 to d225e6d, February 7, 2025 16:26

sheaf force-pushed the macro-strings branch from d225e6d to 8a6748f, February 7, 2025 16:49

phadej reviewed Feb 7, 2025
