5 Lexical Conventions[Lex]

5.1 Unit of Translation[Lex.Translation]

The text of HLSL programs is collected in source and header files. The distinction between source and header files is social and not technical. An implementation will construct a translation unit from a single source file and any included source or header files referenced via the #include preprocessing directive conforming to the ISO C preprocessor specification.

An implementation may implicitly include additional sources as required to expose the HLSL library functionality as defined in ([Runtime]).

5.2 Phases of Translation[Lex.Phases]

HLSL inherits the phases of translation from ISO C++, with minor alterations, specifically the removal of support for trigraph and digraph sequences. The phases are described below.

  1. The characters of each source file are mapped to the basic source character set in an implementation-defined manner.

  2. Any sequence of backslash (\) immediately followed by a new line is deleted, resulting in splicing lines together.

  3. Tokenization occurs and comments are isolated. If a source file ends in a partial comment or a partial preprocessing token, the program is ill-formed and a diagnostic shall be issued. Each comment block shall be treated as a single white-space character.

  4. Preprocessing directives are executed, macros are expanded, pragma and other unary operator expressions are executed. Processing of #include directives results in all preceding steps being executed on the resolved file, and can continue recursively. Finally all preprocessing directives are removed from the source.

  5. Character and string literal specifiers are converted into the appropriate character set for the execution environment.

  6. Adjacent string literal tokens are concatenated.

  7. White-space is no longer significant. Syntactic and semantic analysis occurs translating the whole translation unit into an implementation-defined representation.

  8. The translation unit is processed to determine required instantiations, the definitions of the required instantiations are located, and the translation and instantiation units are merged. The program is ill-formed if any required instantiation cannot be located or fails during instantiation.

  9. External references are resolved, library references linked, and all translation output is collected into a single output.
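As an illustration of phase 2 only, the following is a minimal sketch of line splicing. The function name `splice_lines` and the sample macro are hypothetical, and the treatment of CRLF line endings is an assumption, since the mapping of physical line endings belongs to the implementation-defined phase 1.

```python
def splice_lines(source):
    """Phase 2 sketch: delete every backslash that is immediately followed by a new line."""
    # Assumption: CRLF is normalized to LF first so that a backslash before
    # "\r\n" is also spliced. A real implementation may map line endings differently.
    normalized = source.replace("\r\n", "\n")
    return normalized.replace("\\\n", "")


# A macro definition continued across two physical lines becomes one logical line.
example = "#define ADD(a, b) \\\n    ((a) + (b))\nfloat x = ADD(1, 2);\n"
spliced = splice_lines(example)
```

After splicing, the directive occupies a single logical line, which is what allows phase 4 to treat it as one preprocessing directive.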

5.3 Character Sets[Lex.CharSet]

The basic source character set is a subset of the ASCII character set. The table below lists the valid characters and their ASCII values:

Hex ASCII Value Character Name Glyph or C Escape Sequence
0x09 Horizontal Tab \t
0x0A Line Feed \n
0x0D Carriage Return \r
0x20 Space
0x21 Exclamation Mark !
0x22 Quotation Mark "
0x23 Number Sign #
0x25 Percent Sign %
0x26 Ampersand &
0x27 Apostrophe '
0x28 Left Parenthesis (
0x29 Right Parenthesis )
0x2A Asterisk *
0x2B Plus Sign +
0x2C Comma ,
0x2D Hyphen-Minus -
0x2E Full Stop .
0x2F Solidus /
0x30 .. 0x39 Digit Zero .. Nine 0 1 2 3 4 5 6 7 8 9
0x3A Colon :
0x3B Semicolon ;
0x3C Less-than Sign <
0x3D Equals Sign =
0x3E Greater-than Sign >
0x3F Question Mark ?
0x41 .. 0x5A Latin Capital Letter A .. Z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0x5B Left Square Bracket [
0x5C Reverse Solidus \
0x5D Right Square Bracket ]
0x5E Circumflex Accent ^
0x5F Underscore _
0x61 .. 0x7A Latin Small Letter a .. z a b c d e f g h i j k l m n o p q r s t u v w x y z
0x7B Left Curly Bracket {
0x7C Vertical Line |
0x7D Right Curly Bracket }

An implementation may allow source files to be written in alternate extended character sets as long as that set is a superset of the basic character set. The translation character set is an extended character set or the basic character set as chosen by the implementation.
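A membership check against the basic source character set could look like the sketch below. It is built from the hexadecimal values in the table as reproduced here, so any character the table does not list is reported. The names `BASIC_SOURCE_RANGES`, `BASIC_SOURCE_CHARS`, and `non_basic_characters` are hypothetical, and an implementation that accepts an extended character set would not reject these characters.

```python
# Inclusive code point ranges transcribed from the hex column of the table above.
BASIC_SOURCE_RANGES = [
    (0x09, 0x0A),  # horizontal tab, line feed
    (0x0D, 0x0D),  # carriage return
    (0x20, 0x23),  # space ! " #
    (0x25, 0x3F),  # % through ?, including the digits
    (0x41, 0x5F),  # A-Z [ \ ] ^ _
    (0x61, 0x7D),  # a-z { | }
]

BASIC_SOURCE_CHARS = frozenset(
    chr(cp) for lo, hi in BASIC_SOURCE_RANGES for cp in range(lo, hi + 1)
)


def non_basic_characters(text):
    """Return (offset, character) pairs for characters outside the basic set."""
    return [(i, ch) for i, ch in enumerate(text) if ch not in BASIC_SOURCE_CHARS]
```

For example, `non_basic_characters("a$b")` reports the dollar sign at offset 1, because 0x24 does not appear in the table.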

5.4 Preprocessing Tokens[Lex.PPTokens]

preprocessing-token:
header-name
identifier
pp-number
character-literal
string-literal
preprocessing-op-or-punc
each non-whitespace character from the translation character set that cannot be one of the above

Each preprocessing token that is converted to a token shall have the lexical form of a keyword, an identifier, a constant, a string literal or an operator or punctuator.

Preprocessing tokens are the minimal lexical elements of the language during translation phases 3 through 6 (5.2). Preprocessing tokens can be separated by whitespace in the form of comments, white space characters, or both. White space may appear within a preprocessing token only as part of a header name or between the quotation characters in a character constant or string literal.

Header name preprocessing tokens are only recognized within #include preprocessing directives, __has_include expressions, and implementation-defined locations within #pragma directives. In those contexts, a sequence of characters that could be either a header name or a string literal is recognized as a header name.

5.5 Tokens[Lex.Tokens]

token:
identifier
keyword
literal
operator-or-punctuator

There are four kinds of tokens: identifiers, keywords, literals, and operators or punctuators. All whitespace characters and comments are ignored except as they separate tokens.

5.6 Comments[Lex.Comments]

The characters /* start a comment which terminates with the characters */. The characters // start a comment which terminates at the next new line.
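The two comment forms, together with the phase 3 rule that each comment is treated as a single white-space character, can be sketched as a small scanner. This is an illustration under stated assumptions, not a conforming lexer: the function name `replace_comments` is hypothetical, line splicing is assumed to have already run, and header names are not handled specially.

```python
def replace_comments(source):
    """Replace each comment with one space, leaving string and character literals intact."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if source.startswith("/*", i):
            end = source.find("*/", i + 2)
            if end == -1:
                # A source file ending in a partial comment is ill-formed.
                raise ValueError("unterminated /* comment")
            out.append(" ")
            i = end + 2
        elif source.startswith("//", i):
            end = source.find("\n", i)
            out.append(" ")
            i = n if end == -1 else end  # the new line itself is kept
        elif ch in "\"'":
            # Copy the literal verbatim, skipping over backslash escapes,
            # so comment markers inside literals are not misread.
            j = i + 1
            while j < n and source[j] != ch and source[j] != "\n":
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Keeping the new line after a // comment preserves line structure, which preprocessing directives in phase 4 depend on.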

5.7 Header Names[Lex.Headers]

header-name:
< h-char-sequence >
" q-char-sequence "

h-char-sequence:
h-char
h-char-sequence h-char

h-char:
any character in the translation character set except newline or >

q-char-sequence:
q-char
q-char-sequence q-char

q-char:
any character in the translation character set except newline or "

Character sequences in header names are mapped to header files or external source file names in an implementation-defined way.
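The header-name grammar above maps directly onto a regular expression. The sketch below is one possible reading of it; `HEADER_NAME` and `parse_header_name` are hypothetical names, and the file names in the usage note are made up for illustration. It only recognizes the lexical form and does not model how the character sequence is mapped to a file.

```python
import re

# < h-char-sequence >  or  " q-char-sequence "
# h-char: anything except a new line or '>'; q-char: anything except a new line or '"'.
HEADER_NAME = re.compile(r'<(?P<h>[^\n>]+)>|"(?P<q>[^\n"]+)"')


def parse_header_name(text):
    """Return ("angle", name) or ("quote", name), or None if text is not a header-name."""
    m = HEADER_NAME.fullmatch(text)
    if m is None:
        return None
    if m.group("h") is not None:
        return ("angle", m.group("h"))
    return ("quote", m.group("q"))
```

With this sketch, `parse_header_name("<common/lighting.hlsli>")` yields the angle form and `parse_header_name('"my file.h"')` yields the quote form, while an empty sequence such as `<>` is rejected because both sequences require at least one character.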

5.8 Preprocessing numbers[Lex.PPNumber]

pp-number:
digit
. digit
pp-number digit
pp-number non-digit
pp-number e sign
pp-number E sign
pp-number p sign
pp-number P sign
pp-number .

Preprocessing numbers begin with a digit or period (.), and may be followed by valid identifier characters and floating point literal suffixes (e+, e-, E+, E-, p+, p-, P+, and P-). Preprocessing number tokens lexically include all integer-literal and floating-literal tokens.

Preprocessing numbers do not have types or values. Types and values are assigned to integer-literal, floating-literal, and vector-literal tokens on successful conversion from preprocessing numbers.

A preprocessing number cannot end in a period (.) if the immediate next token is a scalar-element-sequence ([Lex.Literal.Vector]). In this situation the pp-number token is truncated to end before the period.
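The pp-number grammar can be approximated with a single regular expression, as in the sketch below. It assumes that non-digit means an ASCII letter or underscore, as in ISO C++, and it deliberately omits the truncation rule for scalar-element-sequences, which depends on [Lex.Literal.Vector]. The names `PP_NUMBER` and `lex_pp_number` are hypothetical.

```python
import re

# An optional leading period and a digit, followed by any run of:
#   e/E/p/P with a sign, a digit or non-digit, or a period.
PP_NUMBER = re.compile(r"\.?[0-9](?:[eEpP][+-]|[0-9A-Za-z_]|\.)*")


def lex_pp_number(text, pos=0):
    """Return the longest pp-number starting at pos, or None if there is none."""
    m = PP_NUMBER.match(text, pos)
    return m.group(0) if m else None
```

Note that the grammar is intentionally permissive: `lex_pp_number("1.2.3")` returns the whole string even though it is not a valid literal, and `lex_pp_number("0xe+1")` consumes the plus sign because `e+` is part of the pp-number form. Such tokens are only rejected later, on conversion to integer-literal or floating-literal tokens.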