Lexical Analyzer v0.1.0
 
Token Classification

Token type definitions and classification utilities.

Typedefs

typedef enum tokcat_e tokcat_e
 
typedef enum tok_e tok_e
 

Enumerations

enum  tokcat_e { PRE_PROC , SYMBOLS , LITERAL , NFKI_LITERAL }
 Token category enumeration for categorizing different token types in the pre-processing phase.
 
enum  tok_e {
  KEYWORD , OPERATOR , PUNCTUATION , NUMERIC_LITERAL ,
  FLOATING_POINT_LITERAL , CHARACTER_LITERAL , STRING_LITERAL , INVALID_IDENTIFIER ,
  IDENTIFIER , PRE_PROCESSOR_OPERATOR
}
 Token type enumeration for categorizing various types of tokens during lexical analysis.
 

Functions

const char * toktyp_rval (const tok_e type)
 Returns a string representation of a token type.
 
tok_e get_toktyp (const char *const value, const tokcat_e type)
 Determines the specific token type from a given token string and its category.
 

Detailed Description

Token type definitions and classification utilities.

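This group contains all enums and functions related to token classification: the token category (tokcat_e) and token type (tok_e) enumerations, mapping a token string to its specific type (get_toktyp), and converting a token type to a readable string (toktyp_rval).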

Enumeration Type Documentation

◆ tok_e

enum tok_e

Token type enumeration for categorizing various types of tokens during lexical analysis.

This enumeration defines the different types of tokens that can be encountered during the tokenization phase of source code parsing. Each token type corresponds to a specific element or construct in the source code, such as keywords, operators, punctuation, literals, and identifiers.

Enumerator
KEYWORD 

Keyword token type.

This type represents keywords in the source code (e.g., int, if, while, return).

OPERATOR 

Operator token type.

This type represents operators such as arithmetic (+, -, *, /), logical (&&, ||), relational (<, >, ==), and others.

PUNCTUATION 

Punctuation token type.

This type represents punctuation characters such as semicolons (;), commas (,), parentheses ((, )), and braces ({, }).

NUMERIC_LITERAL 

Numeric literal token type.

This type represents integer numeric literals (e.g., 123, 456).

FLOATING_POINT_LITERAL 

Floating point literal token type.

This type represents floating-point numeric literals (e.g., 3.14, 1.618).

CHARACTER_LITERAL 

Character literal token type.

This type represents character literals (e.g., 'a', 'b', '1').

STRING_LITERAL 

String literal token type.

This type represents string literals (e.g., "Hello, world!", "1234").

INVALID_IDENTIFIER 

Invalid identifier token type.

This type represents tokens that are invalid identifiers (e.g., identifiers starting with a number or containing illegal characters).

IDENTIFIER 

Identifier token type.

This type represents valid identifiers, such as variable and function names.

PRE_PROCESSOR_OPERATOR 

Preprocessor operator token type.

This type represents operators related to preprocessor directives (e.g., #define, #include).
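The declaration below is reconstructed from the enumerators documented above, for readers who want to see tok_e as C code. The enumerator order follows the listing on this page; since no explicit values are documented, relying on specific numeric values is an assumption. The tok_is_literal helper is hypothetical and not part of this API; it only shows how a switch can group the four literal types.

```c
#include <stdbool.h>

/* Reconstructed from the documented enumerators; numeric values are
   implicit (0..9) and should not be relied on. */
enum tok_e {
    KEYWORD,
    OPERATOR,
    PUNCTUATION,
    NUMERIC_LITERAL,
    FLOATING_POINT_LITERAL,
    CHARACTER_LITERAL,
    STRING_LITERAL,
    INVALID_IDENTIFIER,
    IDENTIFIER,
    PRE_PROCESSOR_OPERATOR
};
typedef enum tok_e tok_e;

/* Hypothetical helper: true for the four literal token types. */
bool tok_is_literal(tok_e t)
{
    switch (t) {
    case NUMERIC_LITERAL:
    case FLOATING_POINT_LITERAL:
    case CHARACTER_LITERAL:
    case STRING_LITERAL:
        return true;
    default:
        return false;
    }
}
```

Grouping by switch keeps such predicates independent of the enumerators' numeric order.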

◆ tokcat_e

enum tokcat_e

Token category enumeration for categorizing different token types in the pre-processing phase.

This enumeration represents the categories of tokens encountered during lexical analysis (tokenization): preprocessor directives, symbols (operators and punctuation), string and character literals, and a combined category for numeric literals (including floating-point numbers), keywords, and identifiers.

Note
The NFKI_LITERAL category includes tokens that represent numerical literals, floating-point literals, keywords, or identifiers.
Enumerator
PRE_PROC 

Preprocessor directives (#define, #include).

SYMBOLS 

Operators/punctuation (+, ;).

LITERAL 

String/character literals.

NFKI_LITERAL 

Numerical literals, keywords, or identifiers.

Note
Includes integers, floats, and reserved words.
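The category scheme can be illustrated with a hypothetical helper, guess_tokcat, that picks a tokcat_e from the first character of a lexeme. It is not part of this API, and the first-character rule is an assumption about how a tokenizer might route lexemes before calling get_toktyp; the enum declaration follows the documented enumerators.

```c
#include <ctype.h>

enum tokcat_e { PRE_PROC, SYMBOLS, LITERAL, NFKI_LITERAL };
typedef enum tokcat_e tokcat_e;

/* Hypothetical: choose a category from the first character of a
   lexeme. An empty string falls through to SYMBOLS. */
tokcat_e guess_tokcat(const char *lexeme)
{
    unsigned char c = (unsigned char)lexeme[0];

    if (c == '#')                 /* #define, #include, ... */
        return PRE_PROC;
    if (c == '"' || c == '\'')    /* string or character literal */
        return LITERAL;
    if (isalnum(c) || c == '_')   /* numbers, keywords, identifiers */
        return NFKI_LITERAL;
    return SYMBOLS;               /* operators and punctuation */
}
```

A rule this simple would misroute a float written as .5 (it starts with '.'), so a real tokenizer would need an extra check there.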

Function Documentation

◆ get_toktyp()

tok_e get_toktyp ( const char *const value,
const tokcat_e type )

Determines the specific token type from a given token string and its category.

This function maps a token string and its broader token category (from tokcat_e) to a specific token type (from tok_e). It internally uses helper functions to check the nature of the token based on its string value.

Depending on the token category, it checks the token string against known operators, punctuation, literals, keywords, and identifiers.

Parameters
value: The token string to be classified.
type: The general category of the token, given as a tokcat_e value.
Returns
The specific token type as a value from tok_e. Returns INVALID_IDENTIFIER if no match is found.
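The following is a minimal sketch of the category-based dispatch described above, not the library's implementation. The keyword, operator, and punctuation tables are abbreviated assumptions drawn from the examples on this page, and the helpers in_table, classify_number, and classify_word are hypothetical stand-ins for the undocumented internal helpers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum tokcat_e { PRE_PROC, SYMBOLS, LITERAL, NFKI_LITERAL };
typedef enum tokcat_e tokcat_e;
enum tok_e {
    KEYWORD, OPERATOR, PUNCTUATION, NUMERIC_LITERAL,
    FLOATING_POINT_LITERAL, CHARACTER_LITERAL, STRING_LITERAL,
    INVALID_IDENTIFIER, IDENTIFIER, PRE_PROCESSOR_OPERATOR
};
typedef enum tok_e tok_e;

/* Abbreviated, assumed lookup tables (NULL-terminated). */
static const char *const keywords[] = { "int", "if", "while", "return", NULL };
static const char *const operators[] = {
    "+", "-", "*", "/", "&&", "||", "<", ">", "==", NULL
};
static const char *const punctuation[] = { ";", ",", "(", ")", "{", "}", NULL };

/* Returns 1 if s appears in the NULL-terminated table. */
static int in_table(const char *const table[], const char *s)
{
    for (size_t i = 0; table[i] != NULL; i++)
        if (strcmp(table[i], s) == 0)
            return 1;
    return 0;
}

/* Lexeme starting with a digit: all digits -> integer,
   exactly one '.' -> float, anything else (e.g. "1abc") -> invalid. */
static tok_e classify_number(const char *s)
{
    int dots = 0;
    for (; *s != '\0'; s++) {
        if (*s == '.')
            dots++;
        else if (!isdigit((unsigned char)*s))
            return INVALID_IDENTIFIER;
    }
    if (dots == 0)
        return NUMERIC_LITERAL;
    return dots == 1 ? FLOATING_POINT_LITERAL : INVALID_IDENTIFIER;
}

/* Lexeme starting with a non-digit: keyword, valid identifier,
   or invalid identifier if it contains an illegal character. */
static tok_e classify_word(const char *s)
{
    if (in_table(keywords, s))
        return KEYWORD;
    for (const char *p = s; *p != '\0'; p++)
        if (!isalnum((unsigned char)*p) && *p != '_')
            return INVALID_IDENTIFIER;
    return IDENTIFIER;
}

tok_e get_toktyp(const char *const value, const tokcat_e type)
{
    if (value == NULL || value[0] == '\0')
        return INVALID_IDENTIFIER;

    switch (type) {
    case PRE_PROC:      /* assumed: every directive maps to one type */
        return PRE_PROCESSOR_OPERATOR;
    case SYMBOLS:
        if (in_table(operators, value))
            return OPERATOR;
        if (in_table(punctuation, value))
            return PUNCTUATION;
        return INVALID_IDENTIFIER;
    case LITERAL:       /* decided by the opening quote */
        if (value[0] == '"')
            return STRING_LITERAL;
        if (value[0] == '\'')
            return CHARACTER_LITERAL;
        return INVALID_IDENTIFIER;
    case NFKI_LITERAL:
        if (isdigit((unsigned char)value[0]))
            return classify_number(value);
        return classify_word(value);
    }
    return INVALID_IDENTIFIER;   /* no match: documented fallback */
}
```

With these rules, get_toktyp("3.14", NFKI_LITERAL) yields FLOATING_POINT_LITERAL, and get_toktyp("1abc", NFKI_LITERAL) yields INVALID_IDENTIFIER, in line with the description of invalid identifiers as names that start with a digit.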

◆ toktyp_rval()

const char * toktyp_rval ( const tok_e type)

Returns a string representation of a token type.

This function takes a token type from the tok_e enumeration and returns its corresponding human-readable string name. It's primarily used for debugging, logging, or displaying token types in a user-friendly format.

Parameters
type: The token type (a tok_e value) whose string representation is to be retrieved.
Returns
A constant string representing the name of the token type. If the token type is unrecognized, returns "Invalid Identifier".
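A switch is one straightforward way to implement a mapping like toktyp_rval; the sketch below is illustrative only. Apart from the documented "Invalid Identifier" fallback, the returned strings are placeholders, and print_token is a hypothetical logging helper showing the debugging use described above.

```c
#include <stdio.h>

enum tok_e {
    KEYWORD, OPERATOR, PUNCTUATION, NUMERIC_LITERAL,
    FLOATING_POINT_LITERAL, CHARACTER_LITERAL, STRING_LITERAL,
    INVALID_IDENTIFIER, IDENTIFIER, PRE_PROCESSOR_OPERATOR
};
typedef enum tok_e tok_e;

/* Sketch: names other than "Invalid Identifier" are placeholders. */
const char *toktyp_rval(const tok_e type)
{
    switch (type) {
    case KEYWORD:                return "Keyword";
    case OPERATOR:               return "Operator";
    case PUNCTUATION:            return "Punctuation";
    case NUMERIC_LITERAL:        return "Numeric Literal";
    case FLOATING_POINT_LITERAL: return "Floating Point Literal";
    case CHARACTER_LITERAL:      return "Character Literal";
    case STRING_LITERAL:         return "String Literal";
    case IDENTIFIER:             return "Identifier";
    case PRE_PROCESSOR_OPERATOR: return "Preprocessor Operator";
    case INVALID_IDENTIFIER:
    default:                     /* also catches out-of-range values */
        return "Invalid Identifier";
    }
}

/* Hypothetical helper: print a lexeme next to its token type name. */
void print_token(const char *lexeme, tok_e type)
{
    printf("%-10s %s\n", lexeme, toktyp_rval(type));
}
```

The default branch lets the function return the documented fallback even when it is passed a value outside the enumeration.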