Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[configuration] Force type when doing environment variable substitution #4281

Open
flyfire2002 opened this issue Nov 4, 2024 · 12 comments
Open
Labels
spec:miscellaneous For issues that don't match any other spec label triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted

Comments

@flyfire2002
Copy link

flyfire2002 commented Nov 4, 2024

What are you trying to achieve?

When substituting env variables like "00001" (as a string) (e.g. into an attribute), the type of the produced value is automatically Int. I want to be able to force it to type Str at substitution. Note, currently with attribute: convert it is done after the value is already substituted as an Int, so in this case it will be doing Int(1) => Str(1), which is still not the same as Str(00001).

** Context/Example **

With config

  resource/common_labels:
    attributes:
    - key: some_id
      value: "${id}"
      action: upsert

When the env var id = "00001", the value of the attribute "some_id" will be Int(1).
If we do a

    - key: some_id
      action: convert
      converted_type: string

afterwards, the value is Str(1) which is still different from "00001" / Str(00001).

@flyfire2002 flyfire2002 added the spec:miscellaneous For issues that don't match any other spec label label Nov 4, 2024
@jack-berg
Copy link
Member

Seems like this should be transferred to opentelemetry-collector-contrib.

@flyfire2002
Copy link
Author

Filed open-telemetry/opentelemetry-collector-contrib#36183 though I feel like this is not a one-processor-specific problem.

@trask
Copy link
Member

trask commented Nov 5, 2024

hi @flyfire2002, we don't think this is a spec issue since the spec doesn't say anything about collector processors

@trask trask added the triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details label Nov 5, 2024
@Aneurysm9
Copy link
Member

I think this is indeed a spec issue as the issue is not with collector processors but with how the collector configuration system handles environment variable substitution. The expressed intent there is for the behavior to "support the exact same syntax as the Configuration WG." Here collector processors were only used as the medium for conveying a description of the issue.

If the behavior the Configuration WG expects is for an environment variable containing 0001 to be cast as an integer 1 before any further processing can happen with no mechanism for signaling that it should instead be treated as a string then I think the specification is defective. As an example, AWS account IDs are strings that match the regex ^[0-9]{12}$. That means 012345678912 is a valid AWS account ID that would be impossible to represent correctly in a configuration using environment variable substitution as it will be interpreted as the integer value 12345678912.

@github-actions github-actions bot added the triage:followup Needs follow up during triage label Jan 21, 2025
@trask trask added triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted and removed triage:deciding:needs-info Not enough information. Left open to provide the author with time to add more details triage:followup Needs follow up during triage labels Jan 21, 2025
@trask
Copy link
Member

trask commented Jan 21, 2025

@open-telemetry/configuration-approvers can you review @Aneurysm9's #4281 (comment)? thanks!

@jack-berg
Copy link
Member

jack-berg commented Jan 22, 2025

with no mechanism for signaling that it should instead be treated as a string

There are two mechanisms for this. First, if you want to ensure that an ambiguous value like 0001 is interpreted as string "0001" instead of an integer 1, you can put it in double quotes:

# input, with id=0001
key: "${id}" # yields string "0001"

You can see an example of this in the spec here.

Second, you can use YAML tags to coerce the parser to interpret the value as a specific type:

key1: 1 # yields integer 1
key2: !!str 1 # yields string "1"
key3: !!str ${id} # yields "0001"

@Aneurysm9
Copy link
Member

There are two mechanisms for this. First, if you want to ensure that an ambiguous value like 0001 is interpreted as string "0001" instead of an integer 1, you can put it in double quotes:

# input, with id=0001
key: "${id}" # yields string "0001"

You can see an example of this in the spec here.

This does not work. Perhaps this is a defect in the collector's handling of substitutions. I'm also not sure the linked example is illustrating this as I can't tell whether "true" is being coerced to the boolean true and then inserted into the exogenous quotes. This would not have the expected results with an integer type as coercing it to an integer would make 0001 become 1, which was the behavior we observed.

To be more concrete, here's an example of a collector configuration chunk that has the issue:

ACCOUNT_ID=012345678912
  attributes/add_labels:
    actions:
      - action: insert
        key: account_id
        value: ${env:ACCOUNT_ID} # evaluates to `12345678912`

Wrapping the substitution in quotes at this layer of processing has no effect as the input has already been cast as an integer. IOW, this configuration produced the same result, namely a truncated account ID:

  attributes/add_labels:
    actions:
      - action: insert
        key: account_id
        value: "${env:ACCOUNT_ID}" # evaluates to `12345678912`

Wrapping the value in the environment variable in quotes such that they were part of the value prevented integer conversion and truncation, but then resulted in undesired literal quotes in the attribute value:

ACCOUNT_ID="\"012345678912\""
  attributes/add_labels:
    actions:
      - action: insert
        key: account_id
        value: ${env:ACCOUNT_ID} # evaluates to `"012345678912"`

Ultimately, in this particular situation we were able to work around the issue by reading the environment variable with the quotes into a temporary attribute, extracting the desired digits from that temporary attribute, and then deleting the temporary attribute. This is obviously less than ideal and simply not an option in some situations.

  attributes/add_labels:
    actions:
      - action: insert
        key: tmp_account_id
        value: ${env:ACCOUNT_ID}
      - action: extract
        key: tmp_account_id
        pattern: (?<account_id>\d{12})
      - action: delete
        key: tmp_account_id

Second, you can use YAML tags to coerce the parser to interpret the value as a specific type:

key1: 1 # yields integer 1
key2: !!str 1 # yields string "1"
key3: !!str ${id} # yields "0001"

Perhaps this is a collector defect, but I tried this:

ACCOUNT_ID="!!str 012345678912"

and the injected attribute value was !!str 012345678912 and not 012345678912 as expected. This seems to be the desired behavior based on the collector SIG's documentation.

@jack-berg
Copy link
Member

I'm also not sure the linked example is illustrating this as I can't tell whether "true" is being coerced to the boolean true and then inserted into the exogenous quotes.

The string value of the environment variable is substituted in place of the env var reference as is, without coercion. Then, the YAML parser is response for taking the resulting value and resolving the tag type.

For example:

key: "true" # value resolves as type string, tag URI tag:yaml.org,2002:str
key: true     # value resolves as type bool, tag URI tag:yaml.org,2002:bool

This behavior should produce the results you're looking for. Assuming ACCOUNT_ID=0001, I would expect the following:

key: "${env:ACCOUNT_ID}"
# expected
key: "0001" # value resolves as type string, tag URI tag:yaml.org,2002:str

Perhaps this is a defect in the collector's handling of substitutions
Perhaps this is a collector defect

I can't say whether this is intentional design of collector or a defect, but I can say that this is not the intended or actual behavior of declarative config.

ACCOUNT_ID="!!str 012345678912"

I would have expected you to do ACCOUNT_ID="012345678912", and:

key: !!str ${env:ACCOUNT_ID}

That should result in a value resolved to type string, tag URI tag:yaml.org,2002:str.

@Aneurysm9
Copy link
Member

This behavior should produce the results you're looking for. Assuming ACCOUNT_ID=0001, I would expect the following:

key: "${env:ACCOUNT_ID}"
# expected
key: "0001" # value resolves as type string, tag URI tag:yaml.org,2002:str

Yes. That's exactly the behavior I would expect as well. It is not the behavior that was encountered nor is it the documented expected behavior from the collector SIG where they believe they are aligning with this specification.

I can't say whether this is intentional design of collector or a defect, but I can say that this is not the intended or actual behavior of declarative config.

This is worrying, then, as the collector SIG's docs say:

This means we support the exact same syntax as the Configuration WG.

If the collector SIG doesn't correctly interpret this spec, can anyone?

I would have expected you to do ACCOUNT_ID="012345678912", and:

key: !!str ${env:ACCOUNT_ID}

That doesn't work because the ${env:ACCOUNT_ID} value has already been parsed as an integer by the time it is evaluated as part of the larger structure. See also open-telemetry/opentelemetry-collector#8565 (comment) which indicates the expectation that the !!str modifier would need to be in the environment variable value to ensure the environment variable was evaluated to a string.

@Aneurysm9
Copy link
Member

@mx-psi can you or someone else from @open-telemetry/collector-maintainers take a look at this and confirm where the issue lies?

@jack-berg
Copy link
Member

If the collector SIG doesn't correctly interpret this spec, can anyone?

Here's a unit test in the java implementation showing how it resolves types when env var substitution is used, demonstrating with and without quotes: https://github.com/open-telemetry/opentelemetry-java/blob/13e2b8c4dfeba09f361b47986febbd792ab1ae60/sdk-extensions/incubator/src/test/java/io/opentelemetry/sdk/extension/incubator/fileconfig/FileConfigurationParseTest.java#L591-L659

That doesn't work because the ${env:ACCOUNT_ID} value has already been parsed as an integer by the time it is evaluated as part of the larger structure.

Oh I see. It depends on how the implementation substitutes the value and when the type is resolved. In the java implementation a reference, given !!str ${env:ACCOUNT_ID}, the env var reference is replaced as is, without any attempt to interpret its type. After all env vars values are all strings and carry no type. So we end up with !!str 012345678912, which a YAML parser should resolve to type string with tag URI tag:yaml.org,2002:str.

I'd argue this is the correct implementation. It seems like the collector is trying to interpret the type of the env var value and injecting a string representation of the interpreted type. If this is indeed the case, I'm not sure what the advantage of this could be.

@Aneurysm9
Copy link
Member

It seems like the collector is trying to interpret the type of the env var value and injecting a string representation of the interpreted type. If this is indeed the case, I'm not sure what the advantage of this could be.

Oh, it's (better|worse) than that. It's not injecting a string representation of the interpreted type, it is evaluating the contents of the environment variable as a YAML document and injecting the resulting value in the place where the placeholder had been, potentially as a structure in its own right. This is wonderful if you want to have entire chunks of your configuration injected from environment variables. Not so much when you need to ensure a value is treated as a string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:miscellaneous For issues that don't match any other spec label triage:deciding:community-feedback Open to community discussion. If the community can provide sufficient reasoning, it may be accepted
Projects
None yet
Development

No branches or pull requests

4 participants