What about Windows?
Here at Semgrep, we make a super fast SAST tool that users can run locally from their command line. We publish Semgrep binaries for Linux and macOS, but not Windows. While Windows users are able to work around this limitation by using Windows Subsystem for Linux (WSL), this comes with a new set of problems: complicated setup, performance penalties, and organizations that flat-out disallow WSL on their machines.
We’ll get into why we haven't already published Windows binaries in a second, but first: why do we care so much?
While it’s common for Windows shops to have a Docker-based CI pipeline (meaning they can scan their code with Semgrep in a Docker container), these folks can't run Semgrep locally - either for pre-commit, or while coding in the IDE! Ultimately, we want every developer to benefit from Semgrep as early as possible in the development process, regardless of their environment.
Forcing Windows developers to wait until CI time to get feedback from Semgrep isn't great - we've worked hard to build a product that surfaces findings that are accurate and easy to act on, so our goal is to inform every developer of security issues while they code, not just after the fact.
OCaml on Windows is hard
OCaml is a great language, with a lot of features tailored to static analysis, but one thing it lacks is solid cross-platform support. Compiling OCaml for a *nix platform is easy and pleasant, akin to using any other popular compiler. There are distributions of OCaml for Windows, but as of writing Windows is not a first-class citizen (although this should change with OCaml 5 and Opam 2.2).
The hard part is that it's something we've never done before, with a lot of unknowns. As a startup with limited resources, instead of trying to compile Semgrep straight to Windows, we decided to go down a route we've already walked.
How do we get OCaml on Windows?
So what other options do we have to support Windows users? Last year during a hack week project an engineer was able to compile Semgrep from OCaml to JavaScript, to run it in the browser. You can, and should, read the blog post about his harrowing journey through the land of JS and Wasm.
What else runs JavaScript? That’s right, Windows does. Since we want to target native Windows, and not a browser (since most people don’t do all their development in the browser), we’ll use the Node JavaScript engine, since it runs everywhere, including natively on Windows.
Unix and Node
Thanks to our previous work transpiling OCaml to JS and compiling our C dependencies to Wasm, we got a working prototype of Semgrep in JS pretty quickly. If you haven’t read the blog post linked above, I’ll quickly summarize how we do this. We use two main tools: Js_of_ocaml, an OCaml → JS transpiler, and Emscripten, a C → Wasm compiler.
Although Js_of_ocaml can target Node and does a good job mapping some basic system primitives (such as IO), we quickly ran into an issue: Semgrep uses many more advanced Unix-specific primitives, such as system calls for threads, process spawning, forking, signals, and much more.
Additionally, there was some missing functionality in Js_of_ocaml: a lot of the work it did for us was specific to Unix and untested on Windows. There is an escape hatch though: Js_of_ocaml maps system calls to JS functions. So a call to spawn a process (spawn()), when made from OCaml, will in turn map to a JS function called unix_spawn that’s been declared globally. We can add these functions to our final JS file, and the transpiled OCaml will see and use them. Node also has a ton of useful OS-agnostic primitives that map directly to these, which makes implementing them super easy.
This is great, we can get somewhere now, but there’s a lot of functionality within Semgrep, and implementing every system call it uses would take a long, long time. Most users can get away with the Semgrep IDE extensions though, which means there are some system calls we don’t need, like signals, pthreads, and some filesystem calls used in other modes of the CLI. So we decided to limit our surface area by building just the Language Server, which is what powers the IDE, and running only one job for scanning, so there is no need for system calls involving spawning/signaling between processes.
Virtual Modules
So now that we’ve decided to limit our surface area, we can write some OCaml and set up our build system to transpile only the Language Server. We see, though, that a bunch of Unix primitives are somehow still missing, and it’s expecting something called libev.
As it turns out, we use an OCaml library called Lwt for async I/O and concurrency when running the Language Server, so that blocking operations (like a scan that takes a long time) don't make the extension appear unresponsive. Under the hood, Lwt uses a C library called libev to schedule its async promises. Luckily there’s a version of Lwt that plays nice with Js_of_ocaml and will work when transpiled to JavaScript, but this presents a problem: we still want to be able to build native Semgrep, so we can’t switch out our Lwt version wholesale. And if we try to include both and choose one at runtime, the native version of Lwt pulls in libev bindings that can’t be transpiled, since they’re C. So we need a way to switch which version of Lwt we use at build time.
A quick primer on OCaml build systems
Dune is a tool that manages the builds of an OCaml project. Within a project there are module interface files (.mli) and module implementation files (.ml). Module interfaces provide just a set of types describing what is in a module, while module implementations contain the actual logic; this lets authors expose only a subset of functions to consumers of the module/library. It's similar to how C has header and source files, or how TypeScript has type declaration files alongside TypeScript files.
Now enter Dune’s virtual modules, our solution for choosing between two versions of Lwt. Normally OCaml requires exactly one module implementation for every interface, but Dune’s virtual module system allows writing as many implementations as you want for a single interface. This means we can wrap Lwt in a virtual module and, at build time, choose either the native or the JavaScript-friendly version of the library, pulling in only its respective dependencies instead of all of them.
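As a sketch, a virtual-module setup along these lines might look like the following dune files. The library and module names here are made up for illustration; this is not Semgrep's actual build config:

```
; dune file for the virtual library: one interface, no implementation
(library
 (name lwt_platform)
 (virtual_modules lwt_platform))

; dune file for the native implementation, pulling in lwt.unix (and libev)
(library
 (name lwt_platform_unix)
 (implements lwt_platform)
 (libraries lwt lwt.unix))

; dune file for the JS implementation, pulling in the js_of_ocaml glue
(library
 (name lwt_platform_js)
 (implements lwt_platform)
 (libraries lwt js_of_ocaml-lwt))
```

The native build then links `lwt_platform_unix` and the JS build links `lwt_platform_js`, so only the chosen implementation's dependencies end up in each artifact.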
JSON Parsing Deep Dive
Now we finally have a version of the Language Server that’s in JavaScript and runs on a Linux machine. However, there’s a problem: if we log in and ask the server to use rules configured for Semgrep’s own organization on the Semgrep Cloud Platform, the server overflows its stack. Weird, but JavaScript has great tooling, especially around memory usage. A quick check showed the issue: the default rule set is small, so the program has enough stack to parse it, but our internal rule set is way larger than the default, and it easily overflows Node’s default stack size of 984kB.
After increasing it to 4GB, we still got a stack overflow. Needing a 4GB stack to parse JSON is not normal, and it was taking a super long time - ~10 minutes, when it usually takes less than one. When debugging, we saw a huge number of stack frames, meaning somewhere we were calling a ton of functions without returning. They were also all the same function, meaning this was a recursive call. This makes sense, as we use a pure OCaml JSON parser, and OCaml has tail call optimization (TCO).
What is Tail Call Optimization?
TCO is a feature some languages have that allows them to make deep recursive calls without growing the stack: when a function's last action is a call, the compiler reuses the current stack frame instead of pushing a new one. Many OCaml libraries take advantage of this, as it’s a common pattern in most functional languages. Sadly no JavaScript runtime supports this, except Safari for some weird reason. This means that our transpiled code blows up in the JavaScript world, so we’re going to need a different approach.
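Here's a tiny JavaScript example of the failure mode (the numbers are illustrative):

```javascript
// A tail-recursive countdown: an OCaml compiler would turn this into a
// loop, but V8 keeps every frame on the stack.
function countDownRec(n) {
  if (n === 0) return 0;
  return countDownRec(n - 1); // a tail call, yet no frame is reused
}

console.log(countDownRec(1000)); // shallow recursion is fine: prints 0

// Deep recursion blows the stack instead of running in constant space.
try {
  countDownRec(10_000_000);
} catch (e) {
  console.log(e instanceof RangeError); // true: "Maximum call stack size exceeded"
}
```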
A nice feature of Js_of_ocaml is that it allows you to call JavaScript functions from OCaml. This seems bizarre, as OCaml doesn’t have a JavaScript runtime and doesn’t know how to run JavaScript. Remember, though, that the OCaml is being transpiled to JavaScript, so OCaml doesn't actually need to know how to run JavaScript; instead, Js_of_ocaml inserts the function calls into the final transpiled JavaScript. This allows us to use the native JavaScript JSON parser, which gives us a huge speedup and lets us parse JSON without a 4GB stack.
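As a rough illustration of why handing parsing to the engine helps (the depth here is made up, and Semgrep's real workload is rule sets, not nested arrays):

```javascript
// Deeply nested JSON that a frame-per-level recursive parser struggles
// with, but the engine's heavily optimized built-in parser handles easily.
const depth = 1000;
const input = "[".repeat(depth) + "0" + "]".repeat(depth);

const parsed = JSON.parse(input); // fast, no 4GB stack required

// Walk back down to confirm the structure survived.
let node = parsed;
let levels = 0;
while (Array.isArray(node)) {
  node = node[0];
  levels++;
}
console.log(levels === depth && node === 0); // true
```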
Finally the Language Server transpiled to JavaScript is “just working.” We should be able to copy some JS and Wasm files over to Windows, right?
Windows Weirdness
After trying to run the transpiled Language Server on Windows, we ran into three more bugs of increasing difficulty. Those of you familiar with Windows systems and how they differ from *nix systems may have seen this coming.
Parsing Newlines
The first issue, a minor one, is that files on Windows have different line endings than on Linux. Since a huge part of Semgrep is reading files and parsing them, this really matters. In *nix systems, lines end in just the newline character, \n. On Windows they end with a carriage return, then a newline: \r\n. Ask your local elder engineer what a carriage return is.
This led to us parsing files incorrectly, and a ton of other file-related things breaking. Luckily, the solution was easy: take this into account when reading files, by patching some of the common library functions used in Semgrep to also accept Windows newlines when the system is Windows.
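In spirit, the fix looks like the snippet below (the actual patch lives in Semgrep's OCaml helpers, not in JavaScript):

```javascript
// Line splitting that tolerates both Unix (\n) and Windows (\r\n) endings.
function splitLines(text) {
  return text.split(/\r?\n/);
}

console.log(splitLines("foo\nbar"));   // [ 'foo', 'bar' ]
console.log(splitLines("foo\r\nbar")); // [ 'foo', 'bar' ]
```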
Windows HOME
The next, medium-sized issue was that when users tried to log in through the Language Server, their session wouldn’t persist. On *nix systems we write your login token to /home/<username>/.semgrep, or wherever you’ve configured the environment variables HOME or XDG_CONFIG_HOME. On Windows the closest equivalents would be USERPROFILE or HOMEDRIVE/HOMEPATH.
The first one is shorter to type, so we used that, and now we actually persist login sessions. This bug was a lot less obvious, as Semgrep's failure mode was to silently write nothing rather than produce an error. It also wasn’t clear from a brief search of the Internet what the consensus was on where to store data like this, so we had to do some digging to figure out the proper equivalent.
Where is the Void in Windows?
Now for the last issue, a big bad Windows bug. In *nix world, when a developer doesn’t want to print something to standard out, they’ll just write it to /dev/null. This is just a file that is always empty, essentially a black hole - anything written here will disappear into the void.
When running on Windows, the Language Server would try to write to <current_path>/NUL, which raised an error because the file didn't exist. Clearly NUL must be the Windows equivalent of /dev/null, but why is it mad the file doesn’t exist? After a quick Google search, the Internet tells us that you can access NUL at any path, i.e. C:\<path>\NUL, and it will work like /dev/null. So why isn’t this happening?
As it turns out, the Internet lies, but only in a way that’s more misleading than an outright false statement. The real NUL path is the monstrosity \\.\NUL, because NUL is really a device, not a file, and Windows file systems are weird. So we had to patch some code on the Js_of_ocaml side to handle this correctly.
It “Just” Works
At this point we have something that works across *nix and Windows systems, written purely in JS and Wasm.
Now we can ship a few extra JS files with our extension, and everything will work on any *nix and Windows OS that can run Node, with any architecture, without having to install a binary.
An LLVM Digression
Normally, that’s where a blog post would end. Writing tests is usually boring and straightforward, but not this time.
Semgrep is made up of a bunch of parsers generated by a library called tree-sitter, and those parsers are written in C. This is one of the reasons we have to use Emscripten to compile C into Wasm. This build process, which has to happen before the tests run, was taking an extraordinarily long time for a small number of the languages we support. Normally, we can compile the Ruby parser in less than two seconds; compiling it to Wasm was taking up to an hour. This made building and testing near impossible, as compiling all the parsers would take hours. Wasm is very different from “real” instruction sets (what CPUs run) like RISC-V or x86, but a 1000x slowdown seemed extreme, so we decided to investigate.
Emscripten uses Clang, and thus LLVM, as its backend. After playing around with compiling some other C to Wasm using LLVM, we confirmed our suspicion that, no, compiling to Wasm doesn’t normally take that long. In fact, the compile times are comparable to Arm or x86 for a similar amount of C code. Something specific to our code was causing the compiler to take this long.
Luckily, LLVM has a flag, -ftime-report, which breaks down where time is spent during the compilation process. This time report has a lot of information, most of which we don’t understand, but we do understand wall time, namely how long something took IRL. Let’s look at a report from compiling the Ruby parser to Wasm, which at the time of writing takes seven minutes, but previous versions of this parser took up to 40.
We see at the bottom of this report the total was 428 seconds:
Most of the passes, and entire sections seem to take less than a second.
Lo and behold, it seems like we spend a lot of time on this section of passes:
Now we see that we spend a few milliseconds on compiling, and a few minutes on one specific optimization pass called WebAssemblyFixIrreducibleControlFlow. After turning off all optimizations (using -opt-bisect-limit) we were frustrated to see that compilation still took roughly the same amount of time.
At this point it was time to dive into the LLVM code. As it turns out, there are some optimization passes that are always on, even when using -opt-bisect-limit or setting the optimization level to 0.
So why is this pass taking so long anyways? Is there something we can change about our code to shorten this? After looking at the documentation, we can see that this code does some complex stuff that we don't have time to get into (info about this is in the documentation - there’s even a paper on the topic), but its time complexity is:
O(NumBlocks * NumNestedLoops * NumIrreducibleLoops + NumLoops * NumLoops)
Okay - now the problem is becoming clearer. We’re compiling a parser, which is bound to have a huge number of blocks and loops, and with numbers that big, those multiplied terms add up to a runtime that explodes.
These parsers are generated, and hundreds of thousands of lines long, so fixing them is not really an option. It doesn’t seem like this optimization is actually needed, so we decided to disable it and see if things break. After digging around in the LLVM flags some more, it looks like there are ways to disable some other mandatory passes, but not this one. Adding a flag to disable an LLVM pass took only a few lines of code, so we went ahead and added one. After compiling with our new flag enabled, we see clearly that this optimization pass was our culprit:
The question is, does this break anything? We ran everything against our large test suite, and everything passed. We decided to make a PR and add this flag to LLVM, so we can actually use it in our build system, and soon enough the LLVM maintainers accepted our PR. This flag saves us a few hours on every single build, so the effort was definitely worth it.
Windows support is in beta!
After we transpiled OCaml to JavaScript, mapped Unix system calls to Node, tackled Windows weirdness, patched a pathologically slow compiler pass, and a whole bunch more, we finally have a nice JavaScript file that’ll run our Language Server anywhere.
We’re still ensuring that everything is up to snuff and speedy as ever, so go try out our VSCode and IntelliJ extensions on Windows and let us know what you think in our community Slack.