Feature Request: Detect and Flag Non-ASCII Characters in Identifiers #3912

konarx · 2025-01-28T14:18:16Z

Summary
Add a StyleCop rule (or rules) to detect and flag identifiers that contain non-ASCII characters (e.g., Greek, Cyrillic), which can be visually indistinguishable from standard Latin letters.

Use Case / Motivation

When coding in C#, developers sometimes inadvertently switch keyboard layouts (e.g., to Greek) and end up typing characters that look identical to standard Latin letters but are actually different Unicode code points. For instance:

public interface ΙMyService // 'Ι' here is Greek capital Iota (U+0399)
{
    // ...
}

public class IMyService : ΙMyService // Latin 'I' (U+0049) in the class name: a different identifier that only looks like the interface name
{
    // ...
}

It’s very easy to end up troubleshooting odd compile errors or references not matching, only to discover a single character is from the wrong alphabet.

A StyleCop rule that flags these occurrences would provide immediate feedback to developers, preventing such subtle bugs.
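
To make the problem concrete, here is a small sketch showing that the two look-alike names are different strings. The `HomoglyphDemo` class and its `CodeUnits` helper are hypothetical names used only for illustration; the identifiers are written with `\u` escapes so the difference survives copy and paste.

```csharp
using System;
using System.Text;

public static class HomoglyphDemo
{
    // Formats every UTF-16 code unit of the input as U+XXXX, separated by spaces.
    public static string CodeUnits(string text)
    {
        var builder = new StringBuilder();
        foreach (char c in text)
        {
            if (builder.Length > 0) builder.Append(' ');
            builder.Append("U+").Append(((int)c).ToString("X4"));
        }
        return builder.ToString();
    }

    public static void Demo()
    {
        string latin = "\u0049MyService"; // starts with Latin capital I
        string greek = "\u0399MyService"; // starts with Greek capital Iota

        Console.WriteLine(latin == greek);                    // False
        Console.WriteLine(CodeUnits(latin.Substring(0, 1)));  // U+0049
        Console.WriteLine(CodeUnits(greek.Substring(0, 1)));  // U+0399
    }
}
```

Both names render the same in most fonts, yet the compiler treats them as unrelated identifiers.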

Proposed Solution

New Rule:
- ID: Suggest something like SA???? (whatever fits StyleCop’s numbering scheme).
- Name: “IdentifiersMustUseAsciiCharacters”
- Category: “Naming” or “Maintainability.”
- Severity: Configurable; default to Warning.
Behavior:
- For each identifier (class, interface, method, property, field, local variable, parameter, etc.), scan the text for any character outside the ASCII range (> 0x7F).
- If found, report a diagnostic indicating which identifier is problematic.
Configuration:
- Allow users to choose between disallowing all non-ASCII characters and disallowing only scripts with known homoglyphs (e.g., Greek, Cyrillic).
- Possibly allow ignoring some characters if needed for legitimate non-English names (but that might be out of scope for a first pass).
Rationale:
- This rule prevents confusion caused by visually identical but semantically different characters, saving time and reducing friction during development.
- Many teams adopt “English-only identifiers” as a best practice to avoid these pitfalls, so providing built-in enforcement aligns with real-world usage.
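
As a rough sketch of how the two configuration modes and an optional allow-list might fit together, assuming nothing about how StyleCop would expose the setting: `IdentifierCharacterPolicy`, `IdentifierCharacterCheck`, and `FindFirstViolation` are hypothetical names. The "known homoglyph" mode here covers only the Greek and Coptic (U+0370 to U+03FF) and Cyrillic (U+0400 to U+04FF) blocks, which is a simplification.

```csharp
using System;

// Hypothetical option values; the issue does not define concrete setting names.
public enum IdentifierCharacterPolicy
{
    DisallowAllNonAscii,
    DisallowGreekAndCyrillicOnly,
}

public static class IdentifierCharacterCheck
{
    // Returns the index of the first disallowed character,
    // or -1 when the identifier passes under the given policy.
    public static int FindFirstViolation(
        string identifier,
        IdentifierCharacterPolicy policy,
        params char[] allowedCharacters)
    {
        for (int i = 0; i < identifier.Length; i++)
        {
            char c = identifier[i];

            if (c <= 0x7F) continue;                                  // plain ASCII
            if (Array.IndexOf(allowedCharacters, c) >= 0) continue;   // explicitly allowed

            if (policy == IdentifierCharacterPolicy.DisallowAllNonAscii || IsGreekOrCyrillic(c))
            {
                return i;
            }
        }

        return -1;
    }

    // Greek and Coptic block: U+0370..U+03FF; Cyrillic block: U+0400..U+04FF.
    private static bool IsGreekOrCyrillic(char c) => c >= '\u0370' && c <= '\u04FF';
}
```

With this shape, `FindFirstViolation("\u0399MyService", IdentifierCharacterPolicy.DisallowAllNonAscii)` reports index 0. A name such as `caf\u00E9` passes under `DisallowGreekAndCyrillicOnly` but is flagged at index 3 under `DisallowAllNonAscii`, unless `'\u00E9'` is passed as an allowed character.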

Potential Implementation Details

Roslyn:

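Analyze every token of kind SyntaxKind.IdentifierToken. Identifiers are tokens rather than nodes, so a syntax tree action that walks the tokens (or a symbol action that checks declared names) fits better than a syntax node action.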

Perform a quick check:

foreach (char c in identifierText)
{
    if (c > 0x7F) // outside the ASCII range
    {
        // Report one diagnostic for this identifier, then stop scanning it
        break;
    }
}

Message:
- Something like: “Identifier {0} contains non-ASCII characters and may cause confusion.”
Example:
```
public void ΜyMethod() // This 'Μ' might be Greek capital Mu
{
}
```
The analyzer would produce a warning explaining that the identifier is using a non-ASCII character.
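
Putting the pieces of this section together, a minimal analyzer sketch could look like the following. It assumes a project that references the Microsoft.CodeAnalysis.CSharp package and is not how StyleCop Analyzers structures its own rules. The class name `NonAsciiIdentifierAnalyzer`, the helper `ContainsNonAscii`, and the ID `SAXXXX` are placeholders, since the real rule ID is left open above.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class NonAsciiIdentifierAnalyzer : DiagnosticAnalyzer
{
    // "SAXXXX" is a placeholder; the issue leaves the real rule ID open.
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "SAXXXX",
        title: "Identifiers must use ASCII characters",
        messageFormat: "Identifier '{0}' contains non-ASCII characters and may cause confusion",
        category: "Naming",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSyntaxTreeAction(AnalyzeTree);
    }

    // True when any UTF-16 code unit lies outside the ASCII range.
    public static bool ContainsNonAscii(string text)
    {
        foreach (char c in text)
        {
            if (c > 0x7F) return true;
        }
        return false;
    }

    private static void AnalyzeTree(SyntaxTreeAnalysisContext context)
    {
        SyntaxNode root = context.Tree.GetRoot(context.CancellationToken);
        foreach (SyntaxToken token in root.DescendantTokens())
        {
            if (!token.IsKind(SyntaxKind.IdentifierToken)) continue;

            // ValueText has escapes such as \u0399 already decoded.
            if (ContainsNonAscii(token.ValueText))
            {
                // One diagnostic per identifier token.
                context.ReportDiagnostic(
                    Diagnostic.Create(Rule, token.GetLocation(), token.ValueText));
            }
        }
    }
}
```

Because this walks every identifier token, it flags uses as well as declarations. A real rule might report only at the declaration site to reduce noise.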

Benefits

- Immediate Feedback: Prevents confusion from near-homoglyphs that can break references or cause subtle bugs.
- Aligns with Common Practices: Many coding standards advise using only ASCII for public-facing identifiers.
- Minimal Overhead: Implementation is straightforward (simple character check).
- Highly Configurable: Could provide toggles or whitelists for teams who need exceptions.

Possible Downsides or Considerations

Legitimate Use of Non-ASCII: In some projects, non-English words or domain-specific terminology might be intentionally used. A global rule might cause false positives.
- Mitigation: Provide .editorconfig or rule settings so the user can suppress or allow certain code blocks or whitelisted characters.

Thank you for all the great work on StyleCop Analyzers. We’d love to see this feature to help developers avoid tricky Unicode/homoglyph issues in their day-to-day C# projects.
