Releasing Semgrep 1.0

Announcing a milestone release: Semgrep 1.0

Yoann Padioleau
December 1st, 2022
Introduction

After almost 3 years of development and 123 0.x releases, we are finally releasing Semgrep 1.0! It’s been an incredible ride and we, at r2c, are really excited about reaching this milestone.

There are different ways and criteria for open source (OSS) projects to reach 1.0, as discussed in this Stack Overflow post. In the case of Semgrep, I think a few of the criteria mentioned in the posted answers could match, but I think the most honest and fitting one is: “we've been at 0.x for 10 years and people are happy, I guess we might as well [release 1.0]”.

Semgrep has been pretty stable and robust for a long time now. It is used by thousands of companies and runs in CI all over the world (we recently reached more than 10 million Docker pulls). Release after release, we’ve tried hard not to break the workflow of all the people who now depend on Semgrep, and while we occasionally introduce regressions, we fix them as quickly as we can.

The rule syntax has also been pretty stable for a long time. We continue to add new features and syntax (e.g., extract mode, taint propagators), but rules written with old versions of Semgrep continue to work on recent versions of Semgrep, a bit like how Java 1 programs can still be compiled with recent JDKs.

The last breaking change I remember was the removal of pattern-where-python: and its related command-line flags more than a year ago in Semgrep version 0.65. However, this was done for excellent reasons (safety and performance), and we introduced a better replacement for it with metavariable-comparison: long before we removed pattern-where-python:.
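To make the replacement concrete, here is a minimal sketch of a rule that uses metavariable-comparison:. The rule id, the message, and the set_timeout function it matches are hypothetical and only serve as illustration; the rule is held in a Python string so the snippet can write it to disk as-is.

```python
import tempfile
from pathlib import Path

# Hypothetical rule: the id, the message, and the set_timeout() function
# are invented for illustration. Only the rule keys are Semgrep syntax.
RULE_YAML = """\
rules:
  - id: timeout-too-large
    languages: [python]
    severity: WARNING
    message: Timeout of $N seconds exceeds the 60-second limit
    patterns:
      - pattern: set_timeout($N)
      - metavariable-comparison:
          metavariable: $N
          comparison: $N > 60
"""


def write_rule(directory: str) -> Path:
    """Write the rule to <directory>/timeout-too-large.yaml and return its path."""
    path = Path(directory) / "timeout-too-large.yaml"
    path.write_text(RULE_YAML, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_rule(tmp).read_text(encoding="utf-8"))
```

Once written, a rule file like this can be passed to the CLI with semgrep --config followed by the file path. The comparison is evaluated by Semgrep itself on the matched value, rather than by running arbitrary Python as pattern-where-python: did.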

By releasing 1.0 today, we also want to convey that the current rule syntax and Semgrep command-line interface (CLI) are stable and here to stay! You can expect most, if not all, rules written with Semgrep 1.0 to still work with Semgrep 1.123 (in fact, they will also probably still work with Semgrep 20.42!!).

Semantic versioning and Semgrep

Up until now, Semgrep was using, like many other OSS projects, a “Semantic Versioning” (SemVer) versioning scheme. While we think SemVer makes perfect sense for OSS libraries and APIs, it is not the right fit for programming languages and tools like Semgrep. For example, Java, TypeScript, OCaml, and many popular OSS projects such as Emacs, the Linux kernel, and GCC do not use SemVer.

In the case of libraries, it is not rare for an API to evolve in a non-backward compatible way and as the author of the library you want to clearly indicate these major (breaking) changes to users of this library by incrementing the major version number of the library.

In the case of languages (and in essence, Semgrep is a language), the syntax evolves but is usually backward compatible. As I mentioned before, most if not all Java 1 programs are still accepted by recent Java compilers. For languages, the major version number is usually incremented for the introduction of major new features (e.g., generics in Java 5.0); we plan to do the same for Semgrep.

Even when we release new major versions of Semgrep, we plan to remain backward compatible, just like Java. The full rule syntax is now formally specified using a JSON schema. The same is true for the JSON output of Semgrep (also available as a more readable ATD specification here), as well as the core data structure of Semgrep, the generic AST. Those specifications are not only useful for external people to build tools on top of Semgrep, but they are also used internally to statically and dynamically check that the input and output of Semgrep remain backward compatible as we release new versions of Semgrep.
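As an illustration of building on the JSON output, here is a minimal sketch that summarizes findings from a report. The sample below is hand-written, with an invented rule id, path, and message, and it uses only a small subset of fields (results, check_id, path, start.line, extra.severity); treat the published specification as the authority on the full shape.

```python
import json

# Hand-written sample shaped like the output of `semgrep --json`.
# The rule id, file path, and message are invented for illustration.
SAMPLE_OUTPUT = """
{
  "results": [
    {
      "check_id": "timeout-too-large",
      "path": "app/client.py",
      "start": {"line": 12, "col": 5},
      "end": {"line": 12, "col": 24},
      "extra": {
        "message": "Timeout of 120 seconds exceeds the 60-second limit",
        "severity": "WARNING"
      }
    }
  ],
  "errors": []
}
"""


def summarize(raw: str) -> list[str]:
    """Return one 'path:line: [SEVERITY] rule-id' string per finding."""
    report = json.loads(raw)
    lines = []
    for finding in report.get("results", []):
        # Fall back to UNKNOWN if a finding carries no severity.
        severity = finding.get("extra", {}).get("severity", "UNKNOWN")
        location = f'{finding["path"]}:{finding["start"]["line"]}'
        lines.append(f'{location}: [{severity}] {finding["check_id"]}')
    return lines


if __name__ == "__main__":
    for line in summarize(SAMPLE_OUTPUT):
        print(line)
```

Because the output format is specified and checked for backward compatibility, a small consumer like this should not need to change when Semgrep itself is upgraded.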

However, at r2c we want to remain agile and occasionally introduce experimental features to get early feedback from users. We test these experimental features and if the community does not benefit from them, we remove these features. However, those experimental features should be marked clearly from the beginning in our documentation as well as in the specifications above (look for the EXPERIMENTAL tag in comments in the schema files).

As a part of release 1.0, we are maturing Autofix, Generic pattern matching, and Metavariable Analysis out of the experiments section. Those features have been used extensively at r2c and externally, and should not be considered “experiments” anymore. They have been so helpful that external people even posted blog posts about them advertising their use (for example 1, 2, 3).

The Semgrep community

We could not have reached 1.0 without the help, feedback, and contributions from the amazing community that grew around Semgrep in the last few years. As of 1.0, Semgrep has surpassed 7,400 stars on GitHub, supports more than 25 programming languages, has more than 2,000 members on the Semgrep Community Slack, and has more than 1,700 followers on Twitter! We at r2c would like to thank you all!

I personally would like to give my first thank you to Ajin Abraham, who was the first person outside r2c who believed in Semgrep. Ajin made a big bet on Semgrep when in 2020 he switched to Semgrep internally for his nodejsscan tool. Later on, he used Semgrep from the start for his more recent mobsfscan tool. I would also like to thank Sjoerd Langkemper and Ruin0x11 for adding support for Lua, Rust, C#, and PHP, for filing many (too many?) bug reports, fixing many bugs, and answering many questions on our Slack. Thanks also to Damian Gryski, Kurt Boberg, Lewis Arden, Blaise Parsia, Michael Sorens, Erwan le Rousseau, Ben Cambourne, and many other people I forgot for their rules, bug reports, and also many answers on Slack. Thanks to Jacob Salassi from Snowflake and Dev Akhawe from Figma for taking the leap to be our early customers. Last but not least, I would like to thank Max Brunsfeld and Douglas Creager for creating and maintaining tree-sitter (along with all maintainers of tree-sitter), which powers an important part of Semgrep.

The future of Semgrep

Semgrep would be nothing without its rules. This is why an important part of our roadmap at r2c is to write new rules for new languages and new frameworks, to detect new classes of bugs, and to improve our existing rules. Like the Papa John’s ads say: “Better rules, better pizza!”.

We will continue to add new features to the Semgrep OSS engine to make it even easier to write rules, support more languages, and further improve Semgrep's performance. As I write this blog post, we already have some plans for 2.0, so stay tuned! In any case, we will continue to follow the original philosophy of Semgrep!

In the next few hours, we will probably be removed from the list of https://0ver.org/.

I will leave you with those last words that inspired this new release and blog post: “Done is better than perfect”.
