Let’s build a simple math expression parser

In this tutorial, we're going to build a simple but powerful math expression parser that can evaluate expressions like "2+3*4" or "(1+2)/(3-1)". We'll implement this using parser combinators—a functional programming technique that allows us to build complex parsers by composing smaller, simpler ones.

2025-01-24

Let’s build a simple math expression parser

In this tutorial, we're going to build a simple but powerful math expression parser that can evaluate expressions like "2+3*4" or "(1+2)/(3-1)". We'll implement this using parser combinators—a functional programming technique that allows us to build complex parsers by composing smaller, simpler ones.

Parser combinators aren’t just for toy examples. They’re ideal whenever you need:

A domain-specific language (DSL) for custom user inputs or configurations
Structured, complex input parsing without juggling a tangle of regexes
Clean, maintainable, type-safe code that mirrors your grammar

They offer an elegant solution for these problems because they allow us to:

Express complex grammar rules by combining small parsers
Maintain strong type safety throughout the parsing process
Write clear, composable code that follows our grammar naturally

We'll start by implementing basic parser combinators and gradually build up to a complete expression parser that can handle arithmetic operations with proper operator precedence.

1. What is a Parser Combinator?

A parser combinator is a higher-order function (or a group of higher-order functions) used in the construction of parsers.

Rather than writing an entire parser by hand or using a separate code generation tool (like traditional parser generators such as Yacc, ANTLR, etc.), parser combinators allow you to build complex grammars directly in a host language by composing smaller, reusable parsing functions. To achieve this flexibility, we need to implement a basic parser. Let's define an interface for it. Our parser will receive the remaining unparsed string as input and return either a success or failure object to indicate whether the parsing was successful.

Now that we understand the concept of parser combinators, let's define our needed interfaces, helper types, and utility functions:

// -- Result Types ---------------------------------------------------
interface Success<T = string> {
  parsedData: T;
  remainder: string;
  success: true;
}

interface Failure {
  message: string;
  success: false;
}

// -- Parser Type ----------------------------------------------------
type Parser<T = string> = (input: string) => Success<T> | Failure;

// -- Utility Types --------------------------------------------------
type UnwrapParser<T> = T extends Parser<infer U> ? U : T;

type UnwrapParsers<T extends any[], Acc extends any[] = []> =
  T extends [infer Head, ...infer Rest]
    ? UnwrapParsers<Rest, [...Acc, UnwrapParser<Head>]>
    : Acc;
   
// -- Utility Functions --------------------------------------------------

function success<T = string>(args:{ parsedData:T, remainder: string } ): Success<T> {
	return { success:true, ...args }
}

function failure(message?:string ): Failure {
	return { success:false, message: message ?? "An error occurred." }
}

// -- Example Parser -------------------------------------------------
const parseUnsignedInteger: Parser<number> = (input) => {
  const matched = input.match(/^[1-9]\d*/)
  if (!matched) return failure("Input doesn't start with a number")
  const num = parseInt(matched[0]!)
  return isNaN(num)
    ? failure(`Expected number. Received NaN. Raw match: ${matched[0]}`)
    : success({ parsedData: num, remainder: input.substring(matched[0].length)});
};

// -- Example Type Extraction ----------------------------------------
// Extracts `number` from the parser above
type ParsedNumber = UnwrapParser<typeof parseUnsignedInteger>;

2. A Basic `char` Parser

Let's implement the basic char parser which will serve as our fundamental building block for more complex parsers. This parser will check if a string starts with a specific character and return a success or failure result accordingly. We'll then use this as a foundation for building more sophisticated parsers.

function char<T extends string>(ch:T):Parser<T> {
	return function(input){
		if (input[0]!==ch) return failure(`Expected input to start with "${ch}" but received "${input[0]}"`)
		return success({ parsedData:ch, remainder:input.substring(1) })	
	}
}

Now that we have our basic char parser implemented, let's test it with a simple example to see how it works:

const parseA = char("a")
console.log(parseA("abc")) // Success: { parsedData: "a", remainder: "bc" }
console.log(parseA("xyz")) // Failure: { message: "Expected input to start with \"a\" but received \"x\"" }

This demonstrates how our parser can recognize a specific character at the start of an input string and either succeed or fail appropriately. With this foundation, we can move on to building more complex parsers.

Let's implement another basic parser that will be useful - the zero parser that matches the character "0". This will complement our existing char parser and help us build more complex number parsers later:

const zero = char("0")
console.log(zero("0123")) // Success: { parsedData: "0", remainder: "123" }
console.log(zero("abc")) // Failure: { message: "Expected input to start with \"0\" but received \"a\"" }

3. Combining Parsers: `sequence`, `or`, `many`, `optional`

We'll move on to the combinator part of parser combinators. These functions allow us to take two or more parsers and compose them into more powerful parsers.

Sequence

The sequence combinator takes multiple parsers and checks if they all succeed in the given order. If any parser fails, the whole sequence fails.

function sequence<const T extends Parser<any>[]>(parsers: T): Parser<UnwrapParsers<T>> {
	return function(remainder){
		const parsedData: UnwrapParsers<T> = [] as any;
		for (const parser of parsers) {
			const result = parser(remainder);
			if (!result.success) return result;
			remainder = result.remainder;
			parsedData.push(result.parsedData as never);
		}
		return success({ parsedData, remainder });
	}
}

or

The or combinator takes multiple parsers and returns the result of the first parser that succeeds. If all parsers fail, it returns a failure.

function or<const T extends Parser<any>[]>(parsers: T): Parser<UnwrapParser<T[number]>> {
	return function(remainder){
		for (const parser of parsers) {
			const result = parser(remainder);
			if (!result.success) continue;
			return result;
		}
		return failure(`Expected any parser to match`);
	}
}

many

The many combinator takes a single parser and tries to match it zero or more times. The result is an array of parsed data.

function many<T>(parser: Parser<T>):Parser<T[]>{
	return function(remainder){
	  const parsedData: T[] = [];
		while (true){ 
			 const result = parser(remainder)
			 if (!result.success) return success({ parsedData, remainder });
			 // Note: if the parser doesn't make any progress you would have an infinite loop
			 if (remainder === result.remainder) return success({ parsedData, remainder });
			 parsedData.push(result.parsedData);
			 remainder = result.remainder;

		}	
	}
}

optional

The optional combinator takes a single parser and tries to match it zero or one time. If the parser fails, optional returns success with null (or any default value).

function optional<T>(parser: Parser<T>):Parser<T | null>{
	return function(remainder){
			const result = parser(remainder);
			if (result.success) return result;
			return success({ parsedData: null, remainder });
	}
}

With these parsers we can now start to combine parsers to more complex ones:


const one = char("1");
const two = char("2");
// ...
const nine = char("0");

const digit = or([zero, one,/*...*/, nine])
const oneToNine = or([one,/*...*/, nine])

const unsignedInteger = sequence([oneToNine, many(digit)])

At this point, unsignedInteger would parse a string like "1234" into an array of characters ['1',['2','3','4']], plus the remainder.

4. Transform a parse result with `map`

The only thing we can do now is to check the structure of a string and get somehow an AST of the matched parsers. We have to do one more step to unchain the full potential of parser combinators. We have to introduce the map functionality. the map function takes in a parser and transform the parsed result T to some arbitrary type U and also returns a Parser. Lets see how it’s implemented

function map<T,U>(parser: Parser<T>,cb: (inner:T) => U): Parser<U> {
	return function(remainder){
		const result = parser(remainder);
		if (!result.success) return result;
		return success({
		  ...result,
			parsedData: cb(result.parsedData),
		});
	}
}

So now let’s see how we can make use of the map-function by transforming the result of our unsignedInteger parser to a number

5. A Simple Math Expression Parser

We can now build our math expression parser. Let’s outline the key steps and constraints for this parser:

Unsigned integers only (no floats, and no sequences like +-+---4).
The operators +, , , /.
Parentheses (, ) for grouping.

Operator Precedence

Operator precedence is critical:

Parentheses have the highest precedence.
Multiplication () and division (/) come next.
Addition (+) and subtraction () have the lowest precedence.

Instead of manually managing precedence, we’ll implicitly enforce it by the order in which our parsers are composed.

Grammar

We’ll use the following rough grammar:

factor → number | '(' expression ')'
term → factor ((' * ' | '/' ) factor)*
expression → term (('+' | '-') term)*

Where each successive production rule is responsible for a lower precedence level.

High-Level Structure

const TokenParser = {
  // All tokens and special characters go here
};

function mathExpression(input: string): Success<number> | Failure {
  /**
   * We'll build up our parser in steps:
   *   1. Parentheses/factor parser
   *   2. Multiplication/division (term)
   *   3. Addition/subtraction (expression)
   */
  return expression(input);
}

Below, we’ll define the pieces—parentheses & factor, term (with multiplication/division), and expression (with addition/subtraction)—showing how they fit together.

Parentheses and factor parsing

Parentheses: We parse an opening parenthesis (, then recursively parse an expression, then parse a closing parenthesis ).
Factor: A factor is either a parenthesized expression or an unsigned integer (unsignedInteger).

const TokenParser = { 
 OPENING_PAR: char("("),
 CLOSING_PAR: char(")"),
 /* ...other tokens... */
}

function mathExpression(input:string):Success<number> | Failure {
	// factor = '(' expression ')' | unsignedInteger
	const factor:Parser<number> = (input) => or([
    map(
      sequence([
        TokenParser.OPENING_PAR,
        mathExpression,
        TokenParser.CLOSING_PAR
      ]),
      // we ignore parentheses themselves and just return the parsed sub-expression
      ([, expr]) => expr
    ),
    unsignedInteger
	  ])(input);

  // ... next steps (term, expression) ...

  return expression(input);
}

Multiplication and division

A term starts with a factor, followed by zero or more * factor or / factor. We store each operator and operand in an array, then evaluate them sequentially.

const TokenParser = {
  /* ... */
  MUL: char("*"),
  DIV: char("/"),
  MUL_OR_DIV: or([char("*"), char("/")]),
  /* ... */
};

function mathExpression(input: string): Success<number> | Failure {
  // ... factor definition above ...

  // A term is a factor, optionally followed by (* factor) or (/ factor)
  const term = sequence([
    factor,
    many(
      map(
        sequence([
          TokenParser.MUL_OR_DIV,
          factor
        ]),
        ([op, val]) => ({ op, val })
      )
    )
  ]);

  // Evaluate that partial result in a map
  const evaluatedTerm = map(term, ([first, rest]) => {
    let result = first;
    for (const { op, val } of rest) {
      if (op === "*") result *= val;
      else result /= val;
    }
    return result;
  });

  // ... next step (expression) ...

  return expression(input);
}

Addition and subtraction

Finally, the expression parser handles low-precedence operations (+ and -). It matches a term, followed by any number of + term or - term.

const TokenParser = {
  /* ... */
  ADD: char("+"),
  SUB: char("-"),
  ADD_OR_SUB: or([char("+"), char("-")])
  /* ... */
};

function mathExpression(input: string): Success<number> | Failure {
  // ... term definition above ...

  // An expression is a term, optionally followed by (+ term) or (- term)
  const expression = sequence([
    evaluatedTerm,
    many(
      map(
        sequence([
          TokenParser.ADD_OR_SUB,
          evaluatedTerm
        ]),
        ([op, val]) => ({ op, val })
      )
    )
  ]);

  // Evaluate the expression
  const evaluatedExpression = map(expression, ([first, rest]) => {
    let result = first;
    for (const { op, val } of rest) {
      if (op === "+") result += val;
      else result -= val;
    }
    return result;
  });

  // Return the final parse result
  return evaluatedExpression(input);
}

6. Testing the Math Expression Parser

Once our parser is assembled, let’s put it to the test:

console.log(mathExpression("1+1"));    // Success: 2
console.log(mathExpression("1*2"));    // Success: 2
console.log(mathExpression("(1+2)*3")) // Success: 9
console.log(mathExpression("(1+2)* 3")) 
// Surprise! This doesn't fail, even though there's a space before the 3 
// that remains unparsed.

Why That Last Example Succeeds

The final example, "(1+2)* 3", still returns a success because our parser never explicitly consumes the entire string. After parsing (1+2), it sees )* as valid tokens, and the space plus 3 remain as unused remainder. By default, returning a Success simply means it recognized a valid expression prefix; it doesn’t guarantee the entire input was parsed.

Enforcing a Full Parse (EOF)

If we want to ensure that every character in the input is consumed—i.e., it truly represents a single math expression and nothing else—we can enforce an "end-of-file" token. One simple trick is to parse an empty string (char("")) immediately after the main expression:

const mathExpressionWithEoF = sequence([mathExpression,char(""))

console.log(mathExpressionWithEoF("(1+2)* 3") // Failure

Here, if there’s any leftover input, matching char("") fails, causing the parser to return a Failure instead of a partial success.

7. Wrapping Up & Next Steps

We’ve now:

Defined a set of small, composable parser combinators for characters, sequences, alternatives, repetition, and optional tokens.
Built a math expression parser that handles parentheses, multiplication/division, and addition/subtraction with correct operator precedence.
Ensured type safety throughout by using TypeScript.
Demonstrated how to require a full parse by enforcing an EOF check.

This approach provides a clean, readable, and extensible foundation for parsing and evaluating arithmetic expressions in TypeScript. You can extend it with features like negative numbers, floating-point parsing, or more complex grammar rules.

Beyond that, you can enhance this small parsing library by adding:

Enhanced Debugging and Tracing: Optional logging in each combinator to track parsing decisions.
Error Recovery: Strategies for handling unexpected tokens gracefully (e.g., skipping them to continue parsing).
Richer Combinators: Specialized combinators for whitespace, comments, or more elaborate syntactic constructs.
Abstract Syntax Trees (AST): Instead of computing results directly, transform the input into a tree structure for further analysis or code generation.
Domain-Specific Features: Custom extensions like function calls, variable references, or specialized operators.

Ultimately, parser combinators offer a flexible, type-safe, and highly composable approach to language design in TypeScript—letting you shape your parser exactly how you need.

Happy parsing!

PS: Here is the code https://tsplay.dev/NndAkW

When Modern Technologies Annoy: A DX Odyssey with tRPC, Drizzle, and TypeScript

Philipp Dehler

Color Spaces on the Web: A Developer's Guide