5 Lexical Conventions [Lex]

5.1 Unit of Translation [Lex.Translation]

The text of HLSL programs is collected in source and header files. The distinction between source and header files is social, not technical. An implementation will construct a translation unit from a single source file and any source or header files included via the #include preprocessing directive, which conforms to the ISO C preprocessor specification.

An implementation may implicitly include additional sources as required to expose the HLSL library functionality defined in [Runtime].

5.2 Phases of Translation [Lex.Phases]

HLSL inherits the phases of translation from ISO C++ with minor alterations, specifically the removal of support for trigraph and digraph sequences. The phases are described below.

1. Characters in source files are mapped to the basic source character set in an implementation-defined manner.
2. Each sequence of a backslash (\) immediately followed by a new line is deleted, splicing the lines together.
3. Tokenization occurs and comments are isolated. If a source file ends in a partial comment or a partial preprocessing token, the program is ill-formed and a diagnostic shall be issued. Each comment shall be treated as a single white-space character.
4. Preprocessing directives are executed, macros are expanded, and pragma and other unary operator expressions are executed. Processing an #include directive causes all preceding phases to be executed on the resolved file, and this can continue recursively. Finally, all preprocessing directives are removed from the source.
5. Characters and escape sequences in character and string literals are converted into the appropriate character set for the execution environment.
6. Adjacent string literal tokens are concatenated.
7. White space is no longer significant. Syntactic and semantic analysis translates the whole translation unit into an implementation-defined representation.
8. The translation unit is processed to determine required instantiations, the definitions of the required instantiations are located, and the translation and instantiation units are merged. The program is ill-formed if any required instantiation cannot be located or fails during instantiation.
9. External references are resolved, library references are linked, and all translation output is collected into a single output.
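As an illustration only, the line-splicing and string-concatenation phases can be sketched in Python. The helper names `splice_lines` and `concat_adjacent_strings` are hypothetical; the sketch assumes `\n` line endings and plain double-quoted string tokens whose escape sequences have already been converted.

```python
def splice_lines(src: str) -> str:
    # Line splicing: delete every backslash that is immediately followed by a
    # newline, together with that newline (assumes "\n" line endings).
    return src.replace("\\\n", "")


def concat_adjacent_strings(tokens: list) -> list:
    # String concatenation: merge each run of adjacent string-literal tokens,
    # e.g. '"ab"' '"cd"' -> '"abcd"' (assumes unprefixed, double-quoted literals).
    out = []
    for tok in tokens:
        if out and tok.startswith('"') and out[-1].startswith('"'):
            out[-1] = out[-1][:-1] + tok[1:]  # drop the closing and opening quotes
        else:
            out.append(tok)
    return out
```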

5.3 Character Sets [Lex.CharSet]

The basic source character set is a subset of the ASCII character set. The table below lists the valid characters and their ASCII values:

Hex ASCII Value	Character Name	Glyph or C Escape Sequence
0x09	Horizontal Tab	`\t`
0x0A	Line Feed	`\n`
0x0D	Carriage Return	`\r`
0x20	Space
0x21	Exclamation Mark	`!`
0x22	Quotation Mark	`"`
0x23	Number Sign	`#`
0x25	Percent Sign	`%`
0x26	Ampersand	`&`
0x27	Apostrophe	`'`
0x28	Left Parenthesis	`(`
0x29	Right Parenthesis	`)`
0x2A	Asterisk	`*`
0x2B	Plus Sign	`+`
0x2C	Comma	`,`
0x2D	Hyphen-Minus	`-`
0x2E	Full Stop	`.`
0x2F	Solidus	`/`
0x30 .. 0x39	Digit Zero .. Nine	`0 1 2 3 4 5 6 7 8 9`
0x3A	Colon	`:`
0x3B	Semicolon	`;`
0x3C	Less-than Sign	`<`
0x3D	Equals Sign	`=`
0x3E	Greater-than Sign	`>`
0x3F	Question Mark	`?`
0x41 .. 0x5A	Latin Capital Letter A .. Z	`A B C D E F G H I J K L M N O P Q R S T U V W X Y Z`
0x5B	Left Square Bracket	`[`
0x5C	Reverse Solidus	`\`
0x5D	Right Square Bracket	`]`
0x5E	Circumflex Accent	`^`
0x5F	Underscore	`_`
0x61 .. 0x7A	Latin Small Letter a .. z	`a b c d e f g h i j k l m n o p q r s t u v w x y z`
0x7B	Left Curly Bracket	`{`
0x7C	Vertical Line	`|`
0x7D	Right Curly Bracket	`}`

An implementation may allow source files to be written in alternate extended character sets as long as that set is a superset of the basic character set. The translation character set is an extended character set or the basic character set as chosen by the implementation.
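The basic source character set table can be transcribed into a simple membership test. This Python sketch is illustrative only: `BASIC_SOURCE_CHARS` and `uses_only_basic_source_chars` are hypothetical names, and the check says nothing about the extended character sets an implementation may accept.

```python
import string

# Transcription of the basic source character set table: tab, line feed,
# carriage return, space, the listed punctuation, digits, and ASCII letters.
BASIC_SOURCE_CHARS = frozenset(
    "\t\n\r !\"#%&'()*+,-./:;<=>?[\\]^_{|}"
    + string.digits
    + string.ascii_letters
)


def uses_only_basic_source_chars(src: str) -> bool:
    # True when every character of src appears in the table.
    return all(ch in BASIC_SOURCE_CHARS for ch in src)
```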

5.4 Preprocessing Tokens [Lex.PPTokens]

preprocessing-token:
header-name
identifier
pp-number
character-literal
string-literal
preprocessing-op-or-punc
each non-whitespace character from the translation character set that cannot be one of the above

Each preprocessing token that is converted to a token shall have the lexical form of a keyword, an identifier, a literal, or an operator or punctuator.

Preprocessing tokens are the minimal lexical elements of the language during translation phases 3 through 6 (5.2). Preprocessing tokens can be separated by white space in the form of comments, white-space characters, or both. White space may appear within a preprocessing token only as part of a header name or between the quotation characters in a character literal or string literal.

Header name preprocessing tokens are only recognized within #include preprocessing directives, __has_include expressions, and implementation-defined locations within #pragma directives. In those contexts, a sequence of characters that could be either a header name or a string literal is recognized as a header name.

5.5 Tokens [Lex.Tokens]

token:
identifier
keyword
literal
operator-or-punctuator

There are four kinds of tokens: identifiers, keywords, literals, and operators or punctuators. White-space characters and comments are ignored except as they separate tokens.

5.6 Comments [Lex.Comments]

The characters /* start a comment which terminates with the characters */. The characters // start a comment which terminates at the next new line.
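A rough Python sketch of how translation phase 3 isolates comments and replaces each one with a single space. `strip_comments` is a hypothetical helper; it assumes lines have already been spliced and protects only double-quoted string literals, so character literals and digit separators are not handled.

```python
def strip_comments(src: str) -> str:
    """Replace each comment with one space (illustrative sketch only)."""
    out = []
    i, n = 0, len(src)
    while i < n:
        if src[i] == '"':
            # Copy a string literal verbatim so "//" or "/*" inside it survives;
            # a backslash skips the character after it.
            j = i + 1
            while j < n and src[j] not in '"\n':
                j += 2 if src[j] == "\\" else 1
            out.append(src[i:j + 1])
            i = j + 1
        elif src.startswith("//", i):
            # Line comment: runs up to, but not including, the next newline.
            end = src.find("\n", i)
            out.append(" ")
            i = n if end == -1 else end
        elif src.startswith("/*", i):
            # Block comment: runs through the first following "*/".
            end = src.find("*/", i + 2)
            if end == -1:
                raise SyntaxError("source file ends inside a comment")
            out.append(" ")
            i = end + 2
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```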

5.7 Header Names [Lex.Headers]

header-name:
< h-char-sequence >
" q-char-sequence "

h-char-sequence:
h-char
h-char-sequence h-char

h-char:
any character in the translation character set except newline or >

q-char-sequence:
q-char
q-char-sequence q-char

q-char:
any character in the translation character set except newline or "

Character sequences in header names are mapped to header files or external source file names in an implementation-defined way.
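A minimal Python sketch of the header-name grammar, assuming the token has already been recognized in one of the contexts where header names are allowed. `parse_header_name` is a hypothetical helper; mapping the returned sequence to a file remains implementation-defined.

```python
# Closing delimiter for each opening delimiter of a header-name.
CLOSING = {"<": ">", '"': '"'}


def parse_header_name(token):
    """Split a header-name token into (grammar production, character sequence)."""
    if len(token) < 3 or token[0] not in CLOSING or token[-1] != CLOSING[token[0]]:
        raise ValueError(f"not a header-name: {token!r}")
    body = token[1:-1]
    # h-char and q-char exclude newline and their own closing delimiter.
    if "\n" in body or CLOSING[token[0]] in body:
        raise ValueError(f"invalid character in header-name: {token!r}")
    kind = "h-char-sequence" if token[0] == "<" else "q-char-sequence"
    return kind, body
```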

5.8 Preprocessing Numbers [Lex.PPNumber]

pp-number:
digit
. digit
pp-number identifier-continue
pp-number ' digit
pp-number ' nondigit
pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .

Preprocessing numbers begin with a digit, or with a period (.) followed by a digit, and may continue with identifier characters, digit separators ('), periods, and the exponent sign sequences e+, e-, E+, E-, p+, p-, P+, and P-. Preprocessing number tokens lexically include all integer-literal and floating-literal tokens.

Preprocessing numbers do not have types or values. Types and values are assigned to integer-literal, floating-literal, and vector-literal tokens on successful conversion from preprocessing numbers.

A preprocessing number cannot end in a period (.) if the immediate next token is a scalar-element-sequence ([Lex.Literal.Vector]). In this situation the pp-number token is truncated to end before the period.
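The following Python sketch approximates pp-number scanning together with the truncation rule above. It is illustrative only: identifier-continue is restricted to ASCII letters, digits, and underscore, and a scalar-element-sequence is assumed to be a run of only `x` or only `r` characters (for example the `xxx` in `1.xxx`); [Lex.Literal.Vector] holds the normative definition. `lex_pp_number` is a hypothetical name.

```python
import re

# A digit (optionally preceded by "."), then any mix of exponent signs, digit
# separators, identifier characters, and periods. Exponent signs are tried
# first so "e+" is consumed as a unit.
PP_NUMBER = re.compile(r"\.?[0-9](?:[eEpP][+-]|'[0-9A-Za-z_]|[0-9A-Za-z_.])*")

# Assumption: a scalar-element-sequence is "x"+ or "r"+.
SCALAR_ELEMENTS = re.compile(r"x+|r+")


def lex_pp_number(src, pos=0):
    """Return the pp-number starting at src[pos], or None if there is none."""
    match = PP_NUMBER.match(src, pos)
    if match is None:
        return None
    text = match.group()
    # Truncate before a period whose remaining characters form a
    # scalar-element-sequence, so "1.xxx" lexes as "1" followed by ".xxx".
    for i, ch in enumerate(text):
        if ch == "." and SCALAR_ELEMENTS.fullmatch(text[i + 1:]):
            return text[:i]
    return text
```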

5.9 Identifiers [Lex.Ident]

identifier:
identifier-start
identifier identifier-continue

identifier-start:
nondigit
an element of the translation character set of class XID_Start

identifier-continue:
digit
nondigit
an element of the translation character set of class XID_Continue

nondigit: one of
a b c d e f g h i j k l m
n o p q r s t u v w x y z
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z _

digit: one of
0 1 2 3 4 5 6 7 8 9

The XID_Start and XID_Continue properties are Derived core properties defined in UAX44. The characters which have the property are defined in UAX31.

An XID_Start character is a character of the translation character set whose corresponding code point in UTF has the XID_Start property. An XID_Continue character is a character of the translation character set whose corresponding code point in UTF has the XID_Continue property. An identifier shall conform to Normalization Form C as specified in UTF.
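Python's `str.isidentifier()` applies the same shape of rule (an XID_Start character or underscore, followed by XID_Continue characters), so the identifier rules can be approximated as below. This is a sketch under stated assumptions: Python's Unicode tables may be a different Unicode version from the one an HLSL implementation uses, and `is_valid_identifier` is a hypothetical helper that does not exclude keywords.

```python
import unicodedata


def is_valid_identifier(name: str) -> bool:
    # isidentifier(): first character is XID_Start or "_", the rest are XID_Continue.
    # is_normalized (Python 3.8+): the name must already be in Normalization Form C.
    return name.isidentifier() and unicodedata.is_normalized("NFC", name)
```

For example, a precomposed `é` (U+00E9) passes, while `e` followed by a combining acute accent (U+0301) is rejected because it is not in NFC.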

Some identifiers are reserved for use by implementations of this standard; an implementation is not required to emit a diagnostic when a program uses them:

Any identifier that contains a double underscore (__), or begins with an underscore (_) followed by a capital letter is reserved for any use by an implementation.
Any identifier that begins with an underscore (_) is reserved for any implementation to use as a name in the global namespace.
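The two reservation rules can be expressed as a small predicate. In this Python sketch, `is_reserved_identifier` is a hypothetical helper and "capital letter" is assumed to mean ASCII A through Z.

```python
def is_reserved_identifier(name: str, at_global_scope: bool = False) -> bool:
    # Reserved for any use: contains "__", or is "_" followed by a capital letter.
    if "__" in name or (len(name) > 1 and name[0] == "_" and "A" <= name[1] <= "Z"):
        return True
    # Reserved only as a name in the global namespace: any leading "_".
    return at_global_scope and name.startswith("_")
```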

5.10 Keywords [Lex.Keywords]

keyword: any of
auto bool break case cbuffer center centroid class column_major
const constexpr continue discard do double else enum export
false float for globallycoherent groupshared if in indices
inline inout int interface line lineadj linear namespace nointerpolation
noperspective operator out packoffset payload point precise
primitives reordercoherent return row_major sampler_state shared sizeof
snorm static struct switch tbuffer template this triangle
triangleadj true typedef typename uniform unorm unsigned using vertices
void while

The identifiers defined in the grammar above are reserved as keywords. During the phases of translation when preprocessing tokens are converted to tokens, keywords are unconditionally treated as keywords except when they appear in attribute specifiers ([Decl.Attributes]).
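A sketch of the keyword rule that transcribes the keyword grammar into a set. `classify_word` and its `in_attribute_specifier` flag are hypothetical; a real implementation would know from parser context whether it is inside an attribute specifier.

```python
# Transcribed from the keyword grammar.
KEYWORDS = frozenset("""
    auto bool break case cbuffer center centroid class column_major
    const constexpr continue discard do double else enum export
    false float for globallycoherent groupshared if in indices
    inline inout int interface line lineadj linear namespace nointerpolation
    noperspective operator out packoffset payload point precise
    primitives reordercoherent return row_major sampler_state shared sizeof
    snorm static struct switch tbuffer template this triangle
    triangleadj true typedef typename uniform unorm unsigned using vertices
    void while
""".split())


def classify_word(word: str, in_attribute_specifier: bool = False) -> str:
    # Keywords are unconditionally keywords, except inside attribute specifiers.
    if word in KEYWORDS and not in_attribute_specifier:
        return "keyword"
    return "identifier"
```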

5.11 Operators and Punctuators [Lex.Operators]

preprocessing-op-or-punc:
preprocessing-operator
operator-or-punctuator

preprocessing-operator:
#
##

operator-or-punctuator: one of
{ } [ ] ( ) ; : ...
? :: .
! + - * / % ^ & |
= += -= *= /= %= ^= &= |=
== != < > <= >= && ||
<< >> <<= >>= ++ -- ,

HLSL inherits a set of operators and punctuators from ISO C++. The tokens in the preprocessing-op-or-punc grammar production are either interpreted by the preprocessor ([Preprocessing]) or converted to tokens.

Each operator-or-punctuator that is not handled completely by the preprocessor is converted to a single token.
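Assuming HLSL follows the ISO C++ rule that each preprocessing token is the longest sequence of characters that could form one, operators and punctuators can be split greedily. The Python sketch below handles only operator and punctuator characters and white space; `split_operators` is a hypothetical helper.

```python
# The operator-or-punctuator list, longest spellings first so that the
# first match at any position is also the longest one.
OPERATORS = sorted(
    """{ } [ ] ( ) ; : ... ? :: . ! + - * / % ^ & |
       = += -= *= /= %= ^= &= |= == != < > <= >= && ||
       << >> <<= >>= ++ -- ,""".split(),
    key=len,
    reverse=True,
)


def split_operators(text: str) -> list:
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # white space only separates tokens
            continue
        for op in OPERATORS:
            if text.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise ValueError(f"not an operator or punctuator at {i}: {text[i]!r}")
    return tokens
```

For example, `+++` splits as `++` followed by `+`, not as `+` followed by `++`.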

5.12 Literals [Lex.Literals]

literal:
integer-literal
character-literal
floating-literal
string-literal
boolean-literal
vector-literal

5.12.1 Integer Literals [Lex.Literals.Int]

integer-literal:
decimal-literal integer-suffix_opt
octal-literal integer-suffix_opt
hexadecimal-literal integer-suffix_opt

decimal-literal:
nonzero-digit
decimal-literal digit

octal-literal:
0
octal-literal octal-digit

hexadecimal-literal:
0x hexadecimal-digit
0X hexadecimal-digit
hexadecimal-literal hexadecimal-digit

nonzero-digit: one of
1 2 3 4 5 6 7 8 9

octal-digit: one of
0 1 2 3 4 5 6 7

hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F

integer-suffix:
unsigned-suffix long-suffix_opt
long-suffix unsigned-suffix_opt

unsigned-suffix: one of
u U

long-suffix: one of
l L

An integer-literal is a decimal-literal, an octal-literal, or a hexadecimal-literal, followed by an optional type suffix. An integer literal shall not contain a period or an exponent specifier.

The type of an integer literal is the first of the corresponding list in the table below in which its value can be represented.

Suffix	Decimal constant	Octal or hexadecimal constant
none	`int32_t`, `int64_t`	`int32_t`, `uint32_t`, `int64_t`, `uint64_t`
`u` or `U`	`uint32_t`, `uint64_t`	`uint32_t`, `uint64_t`
`l` or `L`	`int64_t`	`int64_t`, `uint64_t`
Both `u` or `U` and `l` or `L`	`uint64_t`	`uint64_t`

If the specified value of an integer literal cannot be represented by any type in the corresponding list, the integer literal has no type and the program is ill-formed.

An implementation may support the integer suffixes ll and ull as equivalent to l and ul respectively.
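The type-selection table can be sketched as a lookup followed by a range check. In this Python sketch `integer_literal_type` is a hypothetical helper. It accepts only the suffixes in the integer-suffix grammar (the optional `ll` and `ull` forms are not handled) and assumes the usual ranges of the fixed-width types.

```python
import re

# decimal-literal, octal-literal (including plain 0), or hexadecimal-literal.
INT_BODY = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")

# Candidate types, in order, keyed by (suffix kind, is decimal).
CANDIDATES = {
    ("", True): ["int32_t", "int64_t"],
    ("", False): ["int32_t", "uint32_t", "int64_t", "uint64_t"],
    ("u", True): ["uint32_t", "uint64_t"],
    ("u", False): ["uint32_t", "uint64_t"],
    ("l", True): ["int64_t"],
    ("l", False): ["int64_t", "uint64_t"],
    ("ul", True): ["uint64_t"],
    ("ul", False): ["uint64_t"],
}

# Largest value representable in each type.
MAX_VALUE = {"int32_t": 2**31 - 1, "uint32_t": 2**32 - 1,
             "int64_t": 2**63 - 1, "uint64_t": 2**64 - 1}


def integer_literal_type(literal: str) -> str:
    body = literal.rstrip("uUlL")
    suffix = literal[len(body):].lower()
    if suffix not in ("", "u", "l", "ul", "lu") or not INT_BODY.fullmatch(body):
        raise ValueError(f"malformed integer literal: {literal!r}")
    kind = "ul" if suffix == "lu" else suffix
    if body[:2] in ("0x", "0X"):
        value, is_decimal = int(body[2:], 16), False
    elif body.startswith("0"):
        value, is_decimal = int(body, 8), False
    else:
        value, is_decimal = int(body, 10), True
    # The first candidate type that can represent the value wins.
    for candidate in CANDIDATES[(kind, is_decimal)]:
        if value <= MAX_VALUE[candidate]:
            return candidate
    raise ValueError(f"{literal!r} has no type; the program is ill-formed")
```

The same value 4294967295 gets `int64_t` when written in decimal but `uint32_t` when written as `0xFFFFFFFF`, because only octal and hexadecimal constants consider unsigned types without a suffix.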

5.12.2 Floating-point Literals [Lex.Literal.Float]

floating-literal:
fractional-constant exponent-part_opt floating-suffix_opt
digit-sequence exponent-part floating-suffix_opt

fractional-constant:
digit-sequence_opt . digit-sequence
digit-sequence .

exponent-part:
e sign_opt digit-sequence
E sign_opt digit-sequence

sign: one of
+ -

digit-sequence:
digit
digit-sequence digit

floating-suffix: one of
h f l
H F L
f16 f32 f64
F16 F32 F64

A floating literal is written either as a fractional-constant with an optional exponent-part and optional floating-suffix, or as an integer digit-sequence with a required exponent-part and optional floating-suffix.

The type of a floating literal is float unless a suffix specifies otherwise. The suffixes h and H specify half, the suffixes f and F specify float, and the suffixes l and L specify double. The explicitly sized suffixes f16, F16, f32, F32, f64, and F64 specify the floating-point type whose bit width is given by the number in the suffix. The f16 and F16 suffixes are supported only when an implementation supports native 16-bit types. If a value specified in the source is not in the range of representable values for its type, the program is ill-formed.
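A Python sketch of how a floating literal's type follows from its suffix. The regular expression mirrors the floating-literal grammar; `float_literal_type` is hypothetical, the names `float16_t`, `float32_t`, and `float64_t` for the explicitly sized types are assumptions, and neither the native 16-bit availability check nor the range check is modeled.

```python
import re

FLOAT_LITERAL = re.compile(
    r"(?:(?:[0-9]*\.[0-9]+|[0-9]+\.)(?:[eE][+-]?[0-9]+)?"  # fractional-constant exponent-part_opt
    r"|[0-9]+[eE][+-]?[0-9]+)"                             # digit-sequence exponent-part
    r"(?P<suffix>[fF](?:16|32|64)|[hHfFlL])?"              # floating-suffix_opt
)

# Suffix (lowercased) to type; no suffix means float. Sized names are assumed.
SUFFIX_TYPES = {
    "": "float",
    "h": "half",
    "f": "float",
    "l": "double",
    "f16": "float16_t",
    "f32": "float32_t",
    "f64": "float64_t",
}


def float_literal_type(literal: str) -> str:
    match = FLOAT_LITERAL.fullmatch(literal)
    if match is None:
        raise ValueError(f"not a floating literal: {literal!r}")
    return SUFFIX_TYPES[(match.group("suffix") or "").lower()]
```

Note that `1` alone is rejected: without a period it needs an exponent-part to be a floating literal.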