Macros: support for character and string literals #409

Draft
wants to merge 1 commit into base: main

Conversation

sheaf
Collaborator

@sheaf sheaf commented Feb 7, 2025

This commit adds support for C character literals and string literals in macros.

Some remarks:

  • The implementation assumes that the source file is encoded in UTF-8. I believe that's the only encoding that the Clang library supports anyway.
  • The implementation doesn't support wide character types, such as char16_t.
  • C character literals are of type int, not char. We distinguish two types of such literals, as per the C standard:
    • code unit: a specific int value, whose interpretation as a linguistic or symbolic character depends on the choice of a text encoding,
    • Unicode code point: a unique numeric identifier for a specific character, whose value (e.g. in a char * array) depends on the text encoding chosen.
  • We translate character literals to CInt, and use unboxed Addr# literals to translate string literals to CStringLen (allowing for the possibility of inner null bytes).
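As a rough illustration of the last remark, here is a minimal sketch of what bindings for two macros might look like. The macro names (`LETTER`, `GREETING`), the Haskell names, and the helper `bytesOf` are hypothetical and not taken from this PR; the actual generated code may differ.

```haskell
{-# LANGUAGE MagicHash #-}

import Data.Word (Word8)
import Foreign.C.String (CStringLen)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Array (peekArray)
import Foreign.Ptr (castPtr)
import GHC.Exts (Ptr (..))

-- Hypothetical translation of:  #define LETTER 'a'
-- A C character literal has type int, hence CInt rather than CChar.
letter :: CInt
letter = 97

-- Hypothetical translation of:  #define GREETING "hi\0there"
-- The unboxed Addr# literal holds the raw bytes. The explicit length (8)
-- keeps the bytes after the inner NUL, which a NUL-terminated CString
-- would silently drop.
greeting :: CStringLen
greeting = (Ptr "hi\0there"#, 8)

-- Helper (assumption, for inspection only): read back the raw bytes.
bytesOf :: CStringLen -> IO [Word8]
bytesOf (ptr, len) = peekArray len (castPtr ptr)
```

Carrying the length alongside the pointer is what lets a consumer see all eight bytes, including the embedded zero.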

@sheaf sheaf force-pushed the macro-strings branch 2 times, most recently from 0a5e856 to d225e6d on February 7, 2025 16:26
@phadej
Collaborator

phadej commented Feb 7, 2025

Add examples (tests)

@sheaf
Collaborator Author

sheaf commented Feb 7, 2025

> Add examples (tests)

Yes, that's the next step. This is a draft.

This commit adds support for C character literals and string literals
in macros.

Some remarks:

  - The implementation assumes that the source file is encoded in UTF-8.
    I believe that's the only encoding that the Clang library supports
    anyway.
  - The implementation doesn't support wide character types,
    such as 'char16_t'.
  - C character literals are of type 'int', not 'char'. We distinguish
    two types of such literals, as per the C standard:
      * code unit: a specific 'int' value, whose interpretation as a
        linguistic or symbolic character depends on the choice of a
        text encoding,
      * Unicode code point: a unique numeric identifier for a specific
        character, whose value (e.g. in a 'char *' array) depends on the
        text encoding chosen.
  - We translate character literals to 'CInt', and use unboxed 'Addr#'
    literals to translate string literals to 'CStringLen' (allowing for
    the possibility of inner null bytes).
@@ -142,6 +145,7 @@ data SExpr ctx =
   | EIntegral Integer (Maybe HsPrimType)
   | EFloat Float HsPrimType -- ^ Type annotation to distinguish Float/CFloat
   | EDouble Double HsPrimType
+  | EString [Word8]
Collaborator

Can't we use ByteArray, or some better representation? [Word8] doesn't feel good.
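One way to read this suggestion, sketched under assumptions: `ShortByteString` from the `bytestring` package is backed by an unpinned byte array, so it stores the literal compactly instead of as a linked list of boxed `Word8` values. The `StringLit` type and its helpers below are hypothetical stand-ins, not part of the PR.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)

-- Hypothetical stand-in for the payload of the EString constructor.
newtype StringLit = StringLit SBS.ShortByteString
  deriving (Eq, Show)

-- Build a literal from the bytes produced by the macro parser.
mkStringLit :: [Word8] -> StringLit
mkStringLit = StringLit . SBS.pack

-- Length in bytes, without traversing a list.
stringLitLength :: StringLit -> Int
stringLitLength (StringLit s) = SBS.length s

-- Recover the bytes, e.g. when emitting an Addr# literal.
stringLitBytes :: StringLit -> [Word8]
stringLitBytes (StringLit s) = SBS.unpack s
```

Inner null bytes are preserved, since the length is stored with the array rather than inferred from a terminator.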

goChar (C.CharLiteral { charLiteralValue = c }) =
  ( `Hs.VarDeclIntegral` HsPrimCInt ) <$>
    case c of
      C.CodeUnit u
Collaborator

Extract this into separate functions. Preferably don't reinvent UTF-8 parsing and validation; there are plenty of implementations in existing libraries.
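A minimal sketch of that approach, assuming the `text` and `bytestring` packages are acceptable dependencies: `decodeUtf8'` performs the validation and returns `Left` on malformed input, so the character-literal logic only has to check that exactly one code point came out. The function name `singleCodePoint` is hypothetical.

```haskell
import qualified Data.ByteString as BS
import Data.Char (ord)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8')
import Data.Word (Word8)

-- Decode the bytes of a character literal as UTF-8.
-- Returns the code point if the input is valid UTF-8 encoding exactly
-- one character, and Nothing otherwise.
singleCodePoint :: [Word8] -> Maybe Int
singleCodePoint bytes =
  case decodeUtf8' (BS.pack bytes) of
    Right t | [c] <- T.unpack t -> Just (ord c)
    _ -> Nothing
```

Keeping this in its own function also addresses the first half of the comment, since `goChar` would then only map the result to a `CInt` declaration.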
