mirror of https://github.com/YesDrX/nim-treestand synced 2026-01-02 06:14:35 +00:00

Treestand

Treestand is a complete re-implementation of the Tree-sitter parser generator in Nim. It takes a grammar.js file or grammar rules defined in Nim, plus an optional external scanner (scanner.c), and automatically generates high-performance Nim parsers.

Treestand leverages Nim's powerful metaprogramming capabilities to make parser generation seamless and efficient:

Compile-time macros (tsGrammar, importGrammar, buildGrammar) that generate parsers directly in your code
Zero runtime overhead for grammar processing, which happens entirely at compile time
Type-safe DSL for defining grammars in pure Nim with full IDE support

Purpose

The goal of Treestand is to provide a native Nim implementation of the Tree-sitter ecosystem, matching its rigorous standards for conflict detection and parsing performance, while eliminating the need for a C runtime during the generation phase (though scanner.c files are still supported).

Features

Direct Execution: Executes grammar.js using Bun or Node.js via an integrated DSL.
Full Compatibility: Closely follows Tree-sitter's internal algorithms (NFA, DFA, LR(1) table construction).
Nim-Native: Generates pure Nim code for the parser and runtime.
External Scanners: Native support for both scanner.nim (pure Nim) and scanner.c (C/C++) external scanners:
- Auto-detection: Automatically finds scanner.nim or scanner.c in the grammar directory when external tokens are defined
- Priority order: scanner.nim → scanner.c → scanner.cc → src/scanner.c → src/scanner.cc
- Nim scanners: Imported directly as Nim modules (zero FFI overhead)
- C scanners: Compiled using {.compile.} and {.importc.} pragmas
Excellent Conflict Detection: Accurate reporting of Shift/Reduce and Reduce/Reduce conflicts.
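
The scanner priority order listed under External Scanners can be pictured as a first-match search over candidate paths. The following is a minimal sketch of that idea, not Treestand's actual implementation: the helper name findScanner and the constant scannerCandidates are hypothetical, and only the order of the candidates comes from the list above.

```nim
import std/[os, options]

# Candidate paths, in the priority order stated in the feature list.
# The constant and the proc below are illustrative names, not Treestand API.
const scannerCandidates = [
  "scanner.nim",
  "scanner.c",
  "scanner.cc",
  "src/scanner.c",
  "src/scanner.cc",
]

proc findScanner(grammarDir: string): Option[string] =
  ## Returns the first candidate that exists inside `grammarDir`, if any.
  for candidate in scannerCandidates:
    let path = grammarDir / candidate
    if fileExists(path):
      return some(path)
  none(string)

when isMainModule:
  # Build a throwaway grammar directory containing two scanners.
  let dir = getTempDir() / "treestand_scanner_demo"
  removeDir(dir)
  createDir(dir / "src")
  writeFile(dir / "src/scanner.c", "")
  writeFile(dir / "scanner.c", "")
  # scanner.c in the grammar directory wins over src/scanner.c.
  echo findScanner(dir)
  removeDir(dir)
```

Returning an Option keeps the "no scanner found" case explicit, which matters because a scanner is only looked up when the grammar defines external tokens.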

Installation

nimble install treestand

Or install from source:

nimble install https://github.com/YesDrX/nim-treestand

CLI Usage

Treestand provides a unified CLI for generating and testing parsers.

Generate a Parser

treestand --cmd generate --grammarPath grammar.js --outputDir ./generated

Options:

--grammarPath: Path to the grammar.js file.
--outputDir: Output directory for the generated parser.nim (default: .).
--dslPath: (Optional) Explicit path to dsl.js.
--parserName: (Optional) Custom name for the generated parser.

Test a Grammar

treestand --cmd test --fixtureDir ./tests/fixtures/my_grammar

This generates the parser, compiles it with a test runner, and verifies the result against corpus.txt.

Library APIs

Treestand is also available as a library. The main entry point is treestand.nim.

import treestand

when isMainModule:
  generateParser(
    grammarPath = "/path/to/tree-sitter-mylang/grammar.js",
    outputDir = "/path/to/output"
  )

Using `tsGrammar` Macro (Recommended)

The easiest way to define grammars is using the tsGrammar macro with a concise, PEG/EBNF-like syntax:

import treestand

tsGrammar "my_lang":
  # Rule Assignment
  program     <- +stmt

  # Sequence (*) and Choice (|)
  stmt        <- assign * semi
  
  # Repetition
  # +rule  -> One or more
  # *rule  -> Zero or more
  # ?rule  -> Optional
  assign      <- (variable: identifier) * eq * (value: expr) # Named fields use the (name: rule) form
  expr        <- identifier | number | external_token # external_token would come from an external scanner (a C function); tsGrammar does not support this yet
  
  # Lexical Tokens
  # Use token() wrapper for lexical rules
  # String literals and regex patterns are auto-wrapped with str() or patt()
  identifier  <- token(re"\w+")
  number      <- token(re"\d+")
  eq          <- token("=")
  semi        <- token(";")
  
  # Configuration
  extras      = token(re"\s+")
  # word        = "identifier"

when isMainModule:
  echo parseMyLang("a = 1; b=a;")

Benefits:

Clean Syntax: Readable PEG/EBNF-like notation
Named Fields: Easy AST node field access with (name: rule)
Operator Sugar: * for sequence, | for choice, +/*/? for repetition
Set Syntax: {"a", "b"} for keyword choices
Embedded Actions: Execute Nim code during parsing (see below)
No JavaScript: Pure Nim, zero dependencies
Compile-time: All generation happens at compile-time
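
The operator sugar above relies on ordinary Nim operator rules: prefix operators bind tighter than binary ones, and * binds tighter than |, so a sequence groups before a choice. The sketch below illustrates this with a toy rule tree. It is not how tsGrammar is implemented (tsGrammar is a macro), and every name in it (Rule, sym, render, the rk* kinds) is hypothetical.

```nim
# Illustrative only: a tiny rule tree built with overloaded operators.
# None of these names are Treestand's API.
type
  RuleKind = enum
    rkSym, rkSeq, rkChoice, rkOneOrMore, rkOptional
  Rule = ref object
    kind: RuleKind
    name: string          # set for rkSym
    children: seq[Rule]   # set for every other kind

proc sym(name: string): Rule = Rule(kind: rkSym, name: name)
proc `*`(a, b: Rule): Rule = Rule(kind: rkSeq, children: @[a, b])
proc `|`(a, b: Rule): Rule = Rule(kind: rkChoice, children: @[a, b])
proc `+`(a: Rule): Rule = Rule(kind: rkOneOrMore, children: @[a])
proc `?`(a: Rule): Rule = Rule(kind: rkOptional, children: @[a])

proc render(r: Rule): string =
  ## Prints the tree in prefix form so the grouping is visible.
  case r.kind
  of rkSym: r.name
  of rkSeq: "seq(" & render(r.children[0]) & ", " & render(r.children[1]) & ")"
  of rkChoice: "choice(" & render(r.children[0]) & ", " & render(r.children[1]) & ")"
  of rkOneOrMore: "oneOrMore(" & render(r.children[0]) & ")"
  of rkOptional: "optional(" & render(r.children[0]) & ")"

when isMainModule:
  let a = sym("a")
  let b = sym("b")
  let c = sym("c")
  echo render(a * b | c)   # choice(seq(a, b), c)
  echo render(a | b * c)   # choice(a, seq(b, c))
  echo render(+a * ?b)     # seq(oneOrMore(a), optional(b))
```

When a rule needs a choice inside a sequence, parentheses are required, for example a * (b | c).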

Embedded Actions

Attach Nim code blocks to rules to execute actions during parsing:

import treestand
import std/[tables, strutils]  # Table and parseInt

type Env = object
  vars: Table[string, int]

tsGrammar "calc", userdata: Env:
  assign <- ident * eq * number * semi:
    # Access matched node via `node`
    let varName = node.child("ident").text
    let value = parseInt(node.child("number").text)
    userdata.vars[varName] = value
  
  ident  <- token(re"[a-zA-Z_]\w*")
  number <- token(re"\d+")
  eq     <- token("=")
  semi   <- token(";")

# Usage
var env = Env()
if matchCalc("x = 10; y = 20;", env):
  echo env.vars  # {"x": 10, "y": 20}

Actions can build custom AST structures, populate symbol tables, perform semantic validation, and more. See examples/09_ast_actions for a complete example.

See docs/using_dsl.md for the complete macro reference.

Using `importGrammar` Macro

For existing Tree-sitter grammars, use importGrammar to generate parsers from grammar.js files:

import treestand
import std/os

# Import grammar at compile-time - generates parser directly in your module
importGrammar(currentSourcePath.parentDir / "grammar.js")

when isMainModule:
  # Use the auto-generated parser
  let tree = parseJson("""{"key": "value", "number": 42}""")
  echo tree

Benefits:

No intermediate parser.nim files
All parser generation happens at compile-time
Zero runtime overhead
Self-contained modules

See docs/import_grammar.md for detailed documentation.
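
The importGrammar call above builds its path from currentSourcePath, which Nim resolves while compiling and relative to the source file rather than the directory the compiler was started from. The sketch below shows only that standard-library mechanism. It does not call Treestand, and the constant name grammarPath is an illustrative choice.

```nim
import std/os

# Evaluated by the compiler: the directory of this source file joined
# with the grammar file name. No grammar.js needs to exist for this sketch.
const grammarPath = currentSourcePath.parentDir / "grammar.js"

static:
  # Runs inside the compiler, so this line prints during compilation.
  echo "grammar expected at: ", grammarPath

when isMainModule:
  # At run time the path is already a plain string constant.
  echo grammarPath.extractFilename
  echo grammarPath.isAbsolute
```

Because the path is a compile-time constant, the resulting binary does not depend on the working directory it is later run from.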

Using `buildGrammar` Macro

To define a grammar programmatically in pure Nim, with no JavaScript dependency, use buildGrammar with a proc that returns an InputGrammar:

import treestand
import std/options

# Define grammar using pure Nim DSL
proc createMathGrammar(): InputGrammar =
  InputGrammar(
    name: "math",
    variables: @[
      Variable(name: "program", kind: vtNamed, rule: rep(sym("expression"))),
      Variable(name: "expression", kind: vtNamed, 
               rule: choice(sym("number"), sym("binary_op"))),
      Variable(name: "binary_op", kind: vtNamed,
               rule: prec_left(1, seq(sym("expression"), sym("op"), sym("expression")))),
      Variable(name: "number", kind: vtNamed, rule: token(patt("\\d+"))),
      Variable(name: "op", kind: vtNamed, rule: token(patt("[+\\-*/]")))
    ],
    extraSymbols: @[token(patt("\\s+"))]
  )

# Build parser from pure Nim grammar
buildGrammar(createMathGrammar)

when isMainModule:
  let tree = parseMath("1 + 2 * 3")
  echo tree

Benefits:

No JavaScript: Zero JavaScript dependencies
Type-safe DSL: Full Nim type checking and IDE support
Single file: Grammar and code in one place
Compile-time: All generation at compile-time

See docs/build_grammar.md for detailed documentation.

Project Structure

src/treestand.nim - Main entry point (CLI & Library Exports)
src/treestand/cli/ - CLI implementation
- generate.nim - Parser generation command
- test.nim - Grammar testing command
src/treestand/ - Core library modules
- pragmas.nim - Compile-time macros (importGrammar, buildGrammar)
- grammar.nim - Grammar types and structures (InputGrammar, Variable, Rule)
- dsl.nim - Pure Nim DSL for grammar definitions
- parse_grammar.nim - JSON grammar parser (for grammar.js)
- prepare_grammar.nim - Grammar preparation (flattening, inlining, optimization)
- build_tables.nim - NFA/DFA/LR(1) table construction
- codegen.nim - Nim code generation for parsers
- js_exec.nim - JavaScript execution engine (for grammar.js)
- query.nim - Tree-sitter compatible query engine
- parser_types.nim - Parser runtime types
- parser_runtime.nim - Parser runtime implementation
- dsl.js - JavaScript DSL used during grammar.js execution
docs/ - Documentation
- getting_started.md - Installation and first parser
- import_grammar.md - importGrammar macro guide
- build_grammar.md - buildGrammar macro guide
- using_dsl.md - Nim DSL reference
- query.md - Query engine documentation
- advanced_usage.md - Conflict resolution and debugging
examples/ - Example projects
- 05_importGrammar/ - Using importGrammar with grammar.js
- 06_buildGrammar/ - Using buildGrammar with pure Nim

License

MIT
