How to upgrade the grammar for a language
As with adding a language, most of these steps happen in ocaml-tree-sitter.
Let's call our language "X".
Summary (ocaml-tree-sitter)
In ocaml-tree-sitter:
- Update submodule tree-sitter-X.
- From lang/, run ./test-lang X.
- From lang/, ask a Semgrep team developer to run ./release X.
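The ocaml-tree-sitter steps can be sketched as a dry-run shell function that only prints the commands for a given language. This is a minimal sketch, not the project's own tooling: the function name upgrade_plan is hypothetical, the submodule path is the one given in the component list of this document, the commit to check out is left as a placeholder, and updating the submodule with git fetch and git checkout is an assumption about how you would move it to a newer revision.

```shell
#!/bin/sh
# Dry-run sketch: prints the upgrade commands for one language; it runs nothing.
# "upgrade_plan" is a hypothetical helper name, not part of ocaml-tree-sitter.
upgrade_plan() {
  lang="$1"
  # Location of the grammar submodule inside an ocaml-tree-sitter checkout.
  submodule="lang/semgrep-grammars/src/tree-sitter-$lang"

  # Step 1: move the tree-sitter-X submodule to the revision you want.
  # (Assumption: plain git fetch + checkout; the commit is a placeholder.)
  echo "git -C $submodule fetch origin"
  echo "git -C $submodule checkout <commit-to-upgrade-to>"

  # Step 2: regenerate and test the parser from lang/.
  echo "(cd lang && ./test-lang $lang)"

  # Step 3: publishing is done by a Semgrep team developer.
  echo "(cd lang && ./release $lang)"
}

# Example: show the plan for Ruby.
upgrade_plan ruby
```

Printing the commands rather than running them lets you review the plan first, and leaves the release step to the person who has the rights to run it.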
In semgrep:
- In the semgrep repo, update submodule semgrep-X.
- In the semgrep repo, update the OCaml code that maps the CST to the generic AST.
In the end, make sure the generated code used by the main branch of semgrep can be regenerated from the main branch of ocaml-tree-sitter:
- Merge your semgrep branch.
- Merge your ocaml-tree-sitter branch.
Components
Here are the main components:
- the OCaml code generator ocaml-tree-sitter: generates OCaml parsing code from tree-sitter grammars extended with ... and such. It publishes the generated code into git repos named semgrep-X.
- the original tree-sitter grammar tree-sitter-X, e.g., tree-sitter-ruby. This is the git submodule lang/semgrep-grammars/src/tree-sitter-X in ocaml-tree-sitter. It is installed at the project's root in node_modules by invoking npm install.
- syntax extensions to support semgrep patterns, such as ellipses (...) and metavariables ($FOO). This is lang/semgrep-grammars/src/semgrep-X. It can be tested from that folder with make && make test.
- an automatically-modified grammar for language X in lang/X. It is modified to accommodate various requirements of the ocaml-tree-sitter code generator. lang/X/src and lang/X/ocaml-src contain the C/C++/OCaml code that will be published into semgrep-X, e.g., semgrep-ruby, and used by semgrep.
- semgrep-X: provides generated OCaml/C parsers as a dune project. It is a submodule of semgrep.
- semgrep: uses the parsers provided by semgrep-X, which produce a CST. The program's CST or pattern's CST is further transformed into an AST suitable for pattern matching.
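As a quick reference, the locations named in the list above can be printed for any language with a small shell function. This is only an illustrative sketch: the function name component_paths is hypothetical, and the paths are simply the ones from the list with X replaced by the language name, relative to the root of an ocaml-tree-sitter checkout.

```shell
#!/bin/sh
# Sketch: print where each component for a language lives in ocaml-tree-sitter.
# "component_paths" is a hypothetical helper name; paths come from the list above.
component_paths() {
  lang="$1"
  echo "original grammar:   lang/semgrep-grammars/src/tree-sitter-$lang"
  echo "semgrep extensions: lang/semgrep-grammars/src/semgrep-$lang"
  echo "modified grammar:   lang/$lang"
  echo "generated code:     lang/$lang/src lang/$lang/ocaml-src"
}

# Example: list the component locations for Ruby.
component_paths ruby
```

For example, to test the semgrep syntax extensions for Ruby, change to the "semgrep extensions" directory printed above and run make && make test there.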
Make sure the above is clear in your mind before proceeding further. If you have questions, the best way to get help is to reach out on the Semgrep Community Slack channel.