A Systematic Literature Review of Lexical Analyzer Implementation Techniques in Compiler Design


  • Vaikunta Pai T Associate Professor, College of Computer Science & Information Science, Srinivas University, Mangalore – 575001, India.
  • A. Jayanthila Devi Professor, College of Computer Science & Information Science, Srinivas University, Mangalore – 575001, India.
  • Aithal P. S. Professor, College of Management & Commerce, Srinivas University, Mangalore – 575001, India.




Keywords: Lexical Analysis, Scanner, Lexical Analyzer, Finite Automata, Regular Expression, Compiler, Tokens, Parallel Tokenization, Multi-core Machines


The term “lexical” in the lexical analysis phase of compilation is derived from the word “lexeme”, the basic conceptual unit of morphological study in linguistics. In computer science, lexical analysis, also referred to as lexing, scanning or tokenization, is the process of transforming the string of characters in a source program into a stream of tokens, where a token is a string with an assigned and identified meaning. It is the first phase of the analysis stage of compilation, the part of the two-part (analysis and synthesis) compilation model in which the compiler determines the structure and meaning of the input source program. Its objective is to group the character stream into words and recognize the token type of each. The parser then uses the generated token stream to determine the syntax of the source program. The compiler component that performs lexical analysis is termed a lexical analyzer, lexer, scanner or tokenizer. Lexical analyzers are used in various computer science applications, such as word processing, information retrieval systems, pattern recognition systems and language-processing systems; the scope of this review, however, is limited to language processing.
Various tools are used to generate lexical analyzers automatically, and these tools are better suited to sequential execution of the process. Recent advances in multi-core architectures have created a need to re-engineer the compilation process to take advantage of them. Parallelizing token recognition across multiple cores allows those cores to be used more effectively, thus reducing compilation time. To attain parallelism in tokenization on multi-core machines, the lexical analysis phase of compilation needs to be restructured for multi-core architectures by exploiting language constructs that can run in parallel and the concept of processor affinity.
This paper provides a systematic analysis of the literature to discuss emerging approaches and issues related to lexical analyzer implementation and the adoption of improved methodologies. This has been achieved by reviewing 30 published articles on the implementation of lexical analyzers. The review identifies various techniques, recent developments, and current approaches for implementing automatically generated scanners and hand-crafted scanners. Based on these findings, we draw conclusions on the efficacy of the lexical analyzer implementation techniques discussed in the selected studies and outline future research challenges, including the need to explore previously under-researched areas of scanner implementation.
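To make the transformation from a character stream into a token stream concrete, the following is a minimal sketch of a hand-crafted, regular-expression-driven scanner in Python. The token classes (`NUMBER`, `KEYWORD`, `IDENT`, `OP`, `PUNCT`), the toy keyword set and the function name `tokenize` are hypothetical, chosen for illustration only and not taken from the reviewed studies. It relies on Python's ordered regex alternation (the first matching alternative wins), so pattern order stands in for rule priority; this only approximates the longest-match rule that Lex-style generated scanners apply when they compile such patterns into a single finite automaton.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str    # token type, e.g. "IDENT"
    lexeme: str  # the matched characters
    line: int    # 1-based source line number


# Hypothetical token classes for a toy language. Order matters: the first
# alternative that matches wins, so keywords are tried before identifiers
# and two-character operators before one-character ones.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("KEYWORD", r"\b(?:if|else|while|return)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|!=|<=|>=|[+\-*/=<>]"),
    ("PUNCT", r"[(){};,]"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("MISMATCH", r"."),  # any other single character is a lexical error
]

# Combine all patterns into one regex with a named group per token class.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Scan source left to right and yield tokens, skipping whitespace."""
    line = 1
    for match in MASTER.finditer(source):
        kind = match.lastgroup  # name of the group that matched
        lexeme = match.group()
        if kind == "NEWLINE":
            line += 1
            continue
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} on line {line}")
        yield Token(kind, lexeme, line)
```

For example, scanning `if (x1 >= 10) return x1;` produces `KEYWORD if`, `PUNCT (`, `IDENT x1`, `OP >=`, `NUMBER 10` and so on, which a parser would then consume. Because the `\b` anchors require a word boundary, an identifier such as `iffy` is not split into the keyword `if` plus `fy`.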






How to Cite

Vaikunta Pai T, A. Jayanthila Devi, & Aithal P. S. (2020). A Systematic Literature Review of Lexical Analyzer Implementation Techniques in Compiler Design. International Journal of Applied Engineering and Management Letters (IJAEML), 4(2), 285–301. https://doi.org/10.47992/IJAEML.2581.7000.0087
