There’s a trend to unify security and developer flows that goes by various names like DevSecOps and “shift left.” Generally security teams seem more enthusiastic about this than developer teams, who refer to security as a “soul-withering chore” or “insufferably boring procedural hinderance.” If developers don’t believe in code hardening ideas, things can go wrong in amusing ways.
I was recently reading the 2019 UK government security evaluation of Huawei which provides a humorous example: developers for one of the LTE base stations literally used a #define
to redefine unsafe_functions like strcpy
to safer varients like strlcpy
. Specifically:
Report from the Huawei Cyber Security Evaluation Centre Oversight Board, page 29, March 2019
In net, the report found that about 11% of memcpy
-like and 22% of strcpy
-like function calls in the codebase were to the least safe variants. And assuming safety just from the function name is simplistic—even the safe variants could still be dangerous.
In Huawei’s defense, while they have been subjected to an unusual level of public scrutiny they are definitely not an outlier in having trouble getting developers to adopt secure coding guidelines. In the memcpy
case, it’s been banned at Microsoft since 2009, but I haven’t personally seen any other companies outside the FAANG (Facebook/Apple/Amazon/Netflix/Google) that have done the same. You can actually tell who has banned the bad POSIX functions empirically, by looking at binaries—a non-profit named CITL did a great overview of this and more in the IoT space. As you’d probably guess, the results are dismal.
Clint Gibler and I recently presented at Global AppSec SF 2020 about what we think are some of the core principles for successfully bridging the developer/security divide. If you’re looking for a quick takeaway, we suggest three principles:
Be fast: scanning should run in parallel and be faster than the slowest test. Otherwise you’re blocking developer workflow.
Be early: if you’re going to complain, the editor is better than commit-time is better than CI.
Autofix: don’t just complain, offer a fix. Even an imperfect suggestion is better than nothing.
P.S. If you have a giant C codebase sitting around and you want to figure out how much strcpy
is used, the two fastest ways are probably ripgrep, a hyper-fast grep (noisy, fastest) or Semgrep, a semantic grep-for-code (precise, slower; full disclosure—I work on Semgrep!). Here’s how you could do it in both:
1(webkit@aadf31) $ rg --quiet --stats strcpy -tc
2163 matches
3...
But ripgrep includes some false positives, like commented code:
145:// string.h is not guaranteed to provide strcpy on C++ Builder.
We could obviously try to improve our regex to avoid //
at the start of the line, but then we’d have to consider /*
also, plus a lot more if we really want to do it right, so it might be nice to have a proper parser, which is what Semgrep provides. If you’re curious about the syntax, there’s an online tutorial that takes a few minutes or you can read the docs.
1(webkit@aadf31) $ semgrep -e "memcpy(...)" --lang=c .
2...
3ran 1 rules on 1710 files: 80 findings
(Note that Semgrep has some parse failures on Webkit; C support is still in alpha)