Feature Request: Detect and Flag Non-ASCII Characters in Identifiers #3912

Open
konarx opened this issue Jan 28, 2025 · 0 comments
Feature Request: Detect and Flag Non-ASCII Characters in Identifiers

Summary
Add a StyleCop rule (or rules) to detect and flag identifiers that contain non-ASCII characters (e.g., Greek, Cyrillic), which can be visually indistinguishable from standard Latin letters.


Use Case / Motivation

When coding in C#, developers sometimes inadvertently switch keyboard layouts (e.g., to Greek) and end up typing characters that look identical to standard Latin letters but are actually different Unicode code points. For instance:

public interface ΙMyService // 'Ι' here is Greek capital Iota (U+0399)
{
    // ...
}

public class IMyService : ΙMyService // Class name starts with Latin 'I' (U+0049), so it is a different identifier from the interface it implements
{
    // ...
}

It’s very easy to end up troubleshooting odd compile errors or references not matching, only to discover a single character is from the wrong alphabet.

A StyleCop rule that flags these occurrences would provide immediate feedback to developers, preventing such subtle bugs.
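To make the mismatch concrete, here is a minimal sketch that prints the code units of two look-alike names. The `HomoglyphDemo` class and `CodePoints` helper are hypothetical names used only for illustration; the Greek Iota is written as an escape so the difference is visible in the source.

```csharp
using System;

public static class HomoglyphDemo
{
    // Formats each UTF-16 code unit of the text as U+XXXX, separated by spaces.
    public static string CodePoints(string text)
    {
        var parts = new string[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            parts[i] = "U+" + ((int)text[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    // True only when both names consist of exactly the same code units.
    public static bool SameIdentifier(string a, string b)
    {
        return string.Equals(a, b, StringComparison.Ordinal);
    }
}
```

For example, `HomoglyphDemo.CodePoints("I")` returns `U+0049`, `HomoglyphDemo.CodePoints("\u0399")` returns `U+0399`, and `HomoglyphDemo.SameIdentifier("IMyService", "\u0399MyService")` returns `false`, even though both names render almost identically in most fonts.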


Proposed Solution

  1. New Rule:

    • ID: Suggest something like SA???? (whatever fits StyleCop’s numbering scheme).
    • Name: “IdentifiersMustUseAsciiCharacters”
    • Category: “Naming” or “Maintainability.”
    • Severity: Configurable; default to Warning.
  2. Behavior:

    • For each identifier (class, interface, method, property, field, local variable, parameter, etc.), scan the text for any character outside the ASCII range (> 0x7F).
    • If found, report a diagnostic indicating which identifier is problematic.
  3. Configuration:

    • Allow users to choose between disallowing all non-ASCII characters and disallowing only scripts with known homoglyphs (e.g., Greek, Cyrillic).
    • Possibly allow ignoring some characters if needed for legitimate non-English names (but that might be out of scope for a first pass).
  4. Rationale:

    • This rule prevents confusion caused by visually identical but semantically different characters, saving time and reducing friction during development.
    • Many teams adopt “English-only identifiers” as a best practice to avoid these pitfalls, so providing built-in enforcement aligns with real-world usage.
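The behavior and configuration items above could be sketched as a plain character check with no Roslyn dependency. Everything named here is hypothetical: the `NonAsciiMode` enum, the `allowed` set, and `FindFirstViolation` are not existing StyleCop options or APIs. The "confusable scripts" test is also an assumption that covers only the Greek and Coptic (U+0370 to U+03FF) and Cyrillic (U+0400 to U+04FF) blocks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical setting; not an existing StyleCop option.
public enum NonAsciiMode
{
    DisallowAllNonAscii,
    DisallowConfusableScriptsOnly,
}

public static class IdentifierCharacterCheck
{
    // Returns the index of the first offending UTF-16 code unit, or -1 if the
    // identifier passes. Characters in the optional allowed set are skipped.
    public static int FindFirstViolation(
        string identifier, NonAsciiMode mode, ISet<char> allowed = null)
    {
        for (int i = 0; i < identifier.Length; i++)
        {
            char c = identifier[i];
            if (c <= 0x7F)
            {
                continue; // plain ASCII is always accepted
            }
            if (allowed != null && allowed.Contains(c))
            {
                continue; // explicitly permitted by the user
            }
            if (mode == NonAsciiMode.DisallowAllNonAscii || IsGreekOrCyrillic(c))
            {
                return i;
            }
        }
        return -1;
    }

    // Assumption: only the Greek and Coptic block (U+0370-U+03FF) and the
    // Cyrillic block (U+0400-U+04FF) count as confusable scripts here.
    private static bool IsGreekOrCyrillic(char c)
    {
        return c >= '\u0370' && c <= '\u04FF';
    }
}
```

With this sketch, `"\u0399MyService"` is rejected at index 0 in both modes, while `"caf\u00E9"` is rejected only in `DisallowAllNonAscii` mode unless `é` is placed in the allowed set. A real rule would probably need a proper confusables table rather than whole Unicode blocks.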

Potential Implementation Details

  • Roslyn:
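    • Analyze tokens of kind SyntaxKind.IdentifierToken, for example from a syntax tree action (as far as I know, RegisterSyntaxNodeAction accepts node kinds rather than token kinds), or analyze declared symbols through a symbol action.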
    • Perform a quick check:
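      // identifierText: the identifier's text, e.g. the token's ValueText
      foreach (char c in identifierText)
      {
          if (c > 0x7F) // outside the ASCII range
          {
              // Report diagnostic (once per identifier)
              break;
          }
      }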
  • Message:
    • Something like: “Identifier {0} contains non-ASCII characters and may cause confusion.”
  • Example:
    public void ΜyMethod() // This 'Μ' might be Greek capital Mu
    {
    }
    The analyzer would produce a warning explaining that the identifier is using a non-ASCII character.
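Putting the pieces above together, a standalone analyzer might look roughly like the sketch below. It assumes a reference to the Microsoft.CodeAnalysis.CSharp package, and it is not written against StyleCop Analyzers' own base classes or settings. The class name `NonAsciiIdentifierAnalyzer` and the ID `SAXXXX` are placeholders, since the real ID is left open above.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class NonAsciiIdentifierAnalyzer : DiagnosticAnalyzer
{
    // "SAXXXX" is a placeholder ID, not a real StyleCop rule number.
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "SAXXXX",
        title: "Identifiers must use ASCII characters",
        messageFormat: "Identifier '{0}' contains non-ASCII characters and may cause confusion",
        category: "Naming",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSyntaxTreeAction(AnalyzeTree);
    }

    private static void AnalyzeTree(SyntaxTreeAnalysisContext context)
    {
        SyntaxNode root = context.Tree.GetRoot(context.CancellationToken);
        foreach (SyntaxToken token in root.DescendantTokens())
        {
            if (!token.IsKind(SyntaxKind.IdentifierToken))
            {
                continue;
            }

            // ValueText holds the identifier's value rather than its raw source text.
            foreach (char c in token.ValueText)
            {
                if (c > 0x7F)
                {
                    context.ReportDiagnostic(
                        Diagnostic.Create(Rule, token.GetLocation(), token.ValueText));
                    break; // one diagnostic per identifier token
                }
            }
        }
    }
}
```

Note that this sketch flags every identifier token, so references are reported as well as declarations. Restricting the rule to declarations (for example through a symbol action) would be a design decision for the maintainers.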

Benefits

  • Immediate Feedback: Prevents confusion from near-homoglyphs that can break references or cause subtle bugs.
  • Aligns with Common Practices: Many coding standards advise using only ASCII for public-facing identifiers.
  • Minimal Overhead: Implementation is straightforward (simple character check).
  • Highly Configurable: Could provide toggles or whitelists for teams who need exceptions.

Possible Downsides or Considerations

  • Legitimate Use of Non-ASCII: In some projects, non-English words or domain-specific terminology might be intentionally used. A global rule might cause false positives.
    • Mitigation: Provide .editorconfig or rule settings so the user can suppress or allow certain code blocks or whitelisted characters.

Thank you for all the great work on StyleCop Analyzers. We’d love to see this feature to help developers avoid tricky Unicode/homoglyph issues in their day-to-day C# projects.
