PRE30-C. Do not create a universal character name through concatenation

The C Standard supports universal character names that may be used in identifiers, character constants, and string literals to designate characters that are not in the basic character set. The universal character name \Unnnnnnnn designates the character whose 8-digit short identifier (as specified by ISO/IEC 10646) is nnnnnnnn. Similarly, the universal character name \unnnn designates the character whose 4-digit short identifier is nnnn (and whose 8-digit short identifier is 0000nnnn).
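As a minimal sketch of the two forms, the following function compares a string literal written with the 4-digit form against one written with the 8-digit form of the same character. The function name ucn_forms_match is hypothetical and chosen only for illustration; the sketch assumes a compiler that accepts universal character names in string literals (C99 or later).

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the 4-digit and 8-digit forms
 * designate the same character, 0 otherwise. */
int ucn_forms_match(void) {
  const char *short_form = "\u0401";     /* 4-digit form: nnnn = 0401 */
  const char *long_form = "\U00000401";  /* 8-digit form: 0000 followed by nnnn */

  /* Both literals are encoded in the same execution character set,
   * so identical characters yield identical byte sequences. */
  return strcmp(short_form, long_form) == 0;
}
```

Both literals are written out in full; neither universal character name is assembled from fragments.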

The C Standard, 5.1.1.2, paragraph 4 [ISO/IEC 9899:2024], says

If a character sequence that matches the syntax of a universal character name is produced by token concatenation (6.10.5.3), the behavior is undefined.

See also undefined behavior 3.

In general, avoid universal character names in identifiers unless absolutely necessary.

Noncompliant Code Example

This code example is noncompliant because it produces a universal character name by token concatenation:

Non-compliant code
#define assign(uc1, uc2, val) uc1##uc2 = val

void func(void) {
  int \u0401;
  /* ... */
  assign(\u04, 01, 4);
  /* ... */
}

Implementation Details

This code compiles and runs with Microsoft Visual Studio 2013, assigning 4 to the variable as expected.

GCC 4.8.1 on Linux refuses to compile this code; it emits the diagnostic "stray '\' in program", which refers to the universal character name fragment \u04 in the invocation of the assign macro.

Compliant Solution

This compliant solution uses a universal character name but does not create it by using token concatenation:

Compliant code
#define assign(ucn, val) ucn = val
 
void func(void) {
  int \u0401;
  /* ... */
  assign(\u0401, 4);
  /* ... */
}
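The compliant macro can be exercised with a small variant that returns the assigned value so the result can be checked. This is a sketch under stated assumptions: the function name func_value is hypothetical, and the compiler is assumed to accept universal character names in identifiers (recent GCC and Clang releases do).

```c
/* Same macro as in the compliant solution: the universal character
 * name arrives as one complete token and is never pasted together. */
#define assign(ucn, val) ucn = val

/* Hypothetical variant of func() that exposes the assigned value. */
int func_value(void) {
  int \u0401 = 0;      /* identifier spelled with a complete UCN */
  assign(\u0401, 4);   /* expands to: \u0401 = 4 */
  return \u0401;
}
```

Because the ## operator is not involved, the preprocessor never has to form a universal character name from separate tokens, so the undefined behavior described in this rule does not arise.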

Risk Assessment

Creating a universal character name through token concatenation results in undefined behavior. See undefined behavior 3.

| Rule | Severity | Likelihood | Detectable | Repairable | Priority | Level |
|---|---|---|---|---|---|---|
| PRE30-C | Low | Unlikely | Yes | No | P2 | L3 |

Automated Detection

| Tool | Version | Checker | Description |
|---|---|---|---|
| Astrée | 25.10 | universal-character-name-concatenation | Fully checked |
| Axivion Suite | 7.12.0 | CertC-PRE30 | Fully implemented |
| CodeSonar | 9.2p0 | LANG.PREPROC.PASTE; LANG.PREPROC.PASTEHASH | Macro uses ## operator; ## follows # operator |
| Cppcheck | 2.15 | preprocessorErrorDirective | |
| Cppcheck Premium | 24.11.0 | preprocessorErrorDirective | |
| Helix QAC | 2025.2 | C0905; C++0064, C++0080 | Fully implemented |
| Klocwork | 2025.2 | MISRA.DEFINE.SHARP | Fully implemented |
| LDRA tool suite | 9.7.1 | 573 S | Fully implemented |
| Parasoft C/C++test | 2026.1 | CERT_C-PRE30-a | Avoid token concatenation that may produce universal character names |
| Polyspace Bug Finder | R2025b | CERT C: Rule PRE30-C | Checks for universal character name from token concatenation (rule fully covered) |
| RuleChecker | 25.10 | universal-character-name-concatenation | Fully checked |
| Security Reviewer - Static Reviewer | 6.02 | RTOS_27 | Fully implemented |

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Bibliography

[ISO/IEC 10646-2003]
[ISO/IEC 9899:2024] Subclause 5.1.1.2, "Translation Phases"