Source code to token conversion.
Functions
size_t tokcnt (const char *const line)
    Counts the number of tokens in a given string (or file content).
void toknz_segtoset (tokset_t *const set, const size_t token_index, const char *const line, const size_t start, const size_t end, const size_t line_no, const tokcat_e category, const size_t column)
    Tokenizes a segment of a line and stores the resulting token in the token set.
tokset_t * toknz (const char *const line)
    Tokenizes a line (or multiple lines of code) into a set of tokens.
size_t tokcnt (const char *const line)
Counts the number of tokens in a given string (or file content).
This function counts the number of tokens in a provided line of text (or a whole block of code, if the entire file content is passed). It categorizes tokens into string literals, character literals, operators, punctuators, and identifiers/keywords/literals. The function processes the string character by character and takes escape sequences into account where applicable.
line | A constant pointer to a string containing the code to be analyzed. This string can represent a single line of code or the entire content of a file. |
Returns the number of tokens found in the input.
Handles: string literals, character literals (including escape sequences), operators, punctuators, and identifiers/keywords/literals.
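A minimal usage sketch follows. The header name toknz.h is an assumption; this page does not state which header declares tokcnt.

```c
#include <stdio.h>
#include "toknz.h"   /* assumed header; the actual header name is not given on this page */

int main(void)
{
    const char *snippet = "int x = 42;";   /* a single line of code */
    size_t count = tokcnt(snippet);        /* counts identifiers, literals, operators, punctuators */
    printf("token count: %zu\n", count);
    return 0;
}
```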
tokset_t * toknz (const char *const line)
Tokenizes a line (or multiple lines of code) into a set of tokens.
This function processes the given string (or entire code block) and divides it into individual tokens. It supports string literals, character literals, operators, punctuators, separators, preprocessor directives, identifiers, and keywords. The tokens are stored in the returned token set.
The function uses toknz_segtoset to tokenize individual segments of the line and store them in the token set. It ensures that each token is associated with its type, position (line number, column), and appropriate category (e.g., literal, operator, identifier). If the number of tokens in the resulting set does not match the expected token count, the function reports an error and returns NULL. It processes both single lines and whole files, depending on the input provided.
line | A constant pointer to the string (or code block) to be tokenized. This can represent a single line or the entire content of a file. |
Returns a pointer to the token set (tokset_t) containing all identified tokens. Returns NULL if an error occurs during tokenization (e.g., a mismatch in the expected token count).
NULL | on tokenization error |
Processes: string literals, character literals, operators, punctuators, separators, preprocessor directives, identifiers, and keywords.
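A hedged sketch of the typical call pattern follows. The header name is an assumption, only the toks member of tokset_t is documented on this page, and how (or whether) the returned set must be released is not covered here.

```c
#include <stdio.h>
#include "toknz.h"   /* assumed header declaring toknz() and tokset_t */

int main(void)
{
    const char *src = "if (x > 0) {\n    return x;\n}\n";   /* a small code block */
    tokset_t *set = toknz(src);
    if (set == NULL) {
        /* toknz reports an error and returns NULL, e.g. on a token-count mismatch */
        fprintf(stderr, "tokenization failed\n");
        return 1;
    }
    /* set->toks now holds the identified tokens with their line/column
       positions and categories; how to iterate over or free them is defined
       by the rest of the project, not by this page. */
    return 0;
}
```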
void toknz_segtoset (tokset_t *const set, const size_t token_index, const char *const line, const size_t start, const size_t end, const size_t line_no, const tokcat_e category, const size_t column)
Tokenizes a segment of a line and stores the resulting token in the token set.
This function extracts a substring from the given line, determines its token type based on the specified token category, and stores the token in the provided token set at the specified index. It handles memory allocation for the substring, creates a token using the tok_ptor constructor, and stores the token in the set->toks array. If memory allocation fails, it sets the token to NULL.
set | A pointer to the tokset_t object in which the token will be stored. |
token_index | The index in the token set where the token will be stored. |
line | A constant pointer to the string containing the source line of code. |
start | The starting index of the token within the line. |
end | The ending index of the token within the line. |
line_no | The line number where the token appears in the source code. |
category | The category of the token, from tokcat_e (e.g., literals, symbols). |
column | The starting column number of the token within the line. |
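The sketch below illustrates how a caller such as toknz might hand one segment of a line to this helper. It is an assumption-laden illustration, not the library's own code: the header name, the tokcat_e enumerator used for identifiers, and whether end is an inclusive or exclusive bound are all unspecified on this page, so the set and the category are taken as parameters rather than invented.

```c
#include "toknz.h"   /* assumed header declaring tokset_t, tokcat_e, toknz_segtoset() */

/* Store the identifier "count" from a fixed line as token 0 of `set`.
 * `ident_category` stands in for whichever tokcat_e enumerator the project
 * uses for identifiers; its real name is not documented on this page. */
void store_first_token(tokset_t *const set, const tokcat_e ident_category)
{
    const char *const line = "count = 0;";

    toknz_segtoset(set,
                   0,               /* token_index: slot 0 of set->toks              */
                   line,            /* source line being tokenized                   */
                   0, 4,            /* start/end of "count"; whether end is an
                                       inclusive or exclusive bound is not stated    */
                   1,               /* line_no: first line of the input              */
                   ident_category,  /* category taken from tokcat_e                  */
                   0);              /* column at which the token begins              */
}
```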