Everyone knows how to write shell scripts, or so they think. In case that isn't always true, we've just released experimental support for Bash in Semgrep. It lets you write rules that catch misuses of shell syntax as well as unsafe usage of various commands. Without further ado, here are three examples where Semgrep works better than plain grep.
Detecting a call to a forbidden command
Detecting variable splitting
Many would expect $X or ${X} to be replaced by the value of the X variable. This is incorrect when the expansion is unquoted, because the expanded value then undergoes splitting on whitespace, or even on other characters as specified by the IFS variable.
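To make the splitting concrete, here is a minimal sketch. The helper `count_args` and the variable `greeting` are made up for illustration, and the default value of IFS is assumed.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
  echo "$#"
}

greeting="hello big world"

count_args $greeting      # unquoted: the value is split into 3 words
count_args "$greeting"    # quoted: the value is passed as 1 argument
```

With the default IFS, the first call should print 3 and the second 1.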
First, let's protect ourselves against the obscure problem of the IFS variable. IFS is a special shell variable that determines the separators used by Bash when splitting strings. The default value is whitespace (space, tab, or newline). Let's ensure IFS is not set globally to avoid the risk of splitting strings where it's not intended:
$ docker run -it ubuntu
root@d43da008a9b3:/# echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@d43da008a9b3:/# IFS=:
root@d43da008a9b3:/# echo $PATH
/usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin
Uh oh, the colon separators are now missing. A legitimate use of this feature would be to read comma-separated values from standard input:
$ IFS="," read -a values # read values from stdin into an array
1,23,456
$ echo "${values[@]}" # print array
1 23 456
IFS hasn't changed for the commands that follow, as you can see:
$ x=hello,world
$ echo $x
hello,world
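The same scoping can be sketched in a script rather than an interactive session. The helper `split_pair` below is hypothetical. It avoids arrays so that it should also run under a plain POSIX shell.

```shell
# split_pair is a hypothetical helper: it splits "key,value" on the comma
# and prints the two fields separated by a space.
split_pair() {
  # IFS is assigned for this one `read` command only;
  # the here-document feeds $1 to read on standard input.
  IFS=',' read -r key value <<EOF
$1
EOF
  printf '%s %s\n' "$key" "$value"
}

split_pair "hello,world"   # prints: hello world

# IFS still has its previous value here, so the comma is not a separator.
x=hello,world
echo $x                    # prints: hello,world
```

Because the assignment is a prefix to a single command, it never becomes a global setting.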
Great. So, we only need to prevent IFS from being set globally. Here's a simple Semgrep rule that takes care of it:
Now that we have IFS issues out of the way, let's try to catch variable expansions that are unquoted and would get split when they contain whitespace. All we have to do is express "an expansion of any shell variable not surrounded by double quotes". Here's a solution:
There are two subtleties in this approach. First, the pattern ${$VAR} can be surprising:
${} is the expansion of a shell variable, as is the usual case in Bash.
$VAR in a pattern is a Semgrep metavariable. It's not the expansion of a shell variable. Here it stands for any shell variable.
Therefore, ${$VAR} means "the expansion of any variable", which Semgrep captures under the name $VAR. The captured value of $VAR is recalled in the pattern-not-inside: "...${$VAR}..." which filters out matches where the variable expansion is double-quoted.
Here's a key to what different Semgrep patterns mean:
$METAVAR: a Semgrep metavariable, matches any expression.
${SHELLVAR}: the expansion of the shell variable SHELLVAR. It will match both ${SHELLVAR} and $SHELLVAR in a script. The syntax $SHELLVAR can't be used in a pattern because it conflicts with the syntax for metavariables.
$shellvar or ${shellvar}: the expansion of the shell variable shellvar.
...: a Semgrep ellipsis, matches any sequence of items.
The second gotcha in this rule is the YAML syntax. The following wouldn't work because YAML itself understands double-quoted strings:
- pattern-not-inside: "...${$VAR}..."
The pattern above is interpreted as ...${$VAR}..., which is not what we want. To keep the quotes (and line breaks) verbatim, we use the pipe | syntax:
- pattern-not-inside: |
    "...${$VAR}..."
Detecting an iteration over the output of ls
This example implements ShellCheck rule SC2045. It should be self-explanatory:
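For readers who haven't met SC2045, here is a minimal sketch of the problem such a rule targets. The scratch directory and the file name are made up for illustration.

```shell
# Work in a scratch directory containing one file whose name has a space.
workdir=$(mktemp -d)
touch "$workdir/my file.txt"

# Fragile: the output of ls is word-split, so one file yields two iterations.
count_ls=0
for f in $(ls "$workdir"); do
  count_ls=$((count_ls + 1))
done

# Robust: a glob yields exactly one word per file, whatever the name contains.
count_glob=0
for f in "$workdir"/*; do
  count_glob=$((count_glob + 1))
done

echo "ls loop: $count_ls, glob loop: $count_glob"
rm -rf "$workdir"
```

With the default IFS, the first loop should run twice (once for "my", once for "file.txt") and the second loop once.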
A word of caution
As of this week, Bash support in Semgrep is still experimental. Many bugs exist and some constructs can't be matched against. We've been trying to implement the most essential features first. Here's where we're at:
Parsing: about 92% of the Bash/sh code is parsed successfully.
Searching for the following constructs should mostly work:
simple commands
pipelines foo | bar | baz
if, for, while, case
function definitions
assignments
simple variable expansions $X, ${X}
double-quoted strings
command substitution $(cmd)
subshells (cmd) and command grouping { cmd; }
The following Semgrep patterns are supported in most places where they make sense:
ellipsis ...
metavariables $MV
deep ellipsis <... foo ...>
Features that aren't supported yet include:
matching over file redirections, e.g. cmd > file
matching over background jobs specifically, e.g. cmd &
scanning scripts without a .sh or bash extension
understanding the syntax of popular commands, e.g. set -eu vs. set -u -e aren't treated as equivalent for now.
matching over array accesses, e.g. ${arr[$i]}, arr[$i]=foo, ${#arr[@]}, etc.
matching over C-style loops
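As a side note on the set -eu item above, the two spellings leave the shell in the same state, which is why a rule author may currently need to account for each form separately. A minimal sketch, using a hypothetical helper named `shell_flags`:

```shell
# shell_flags is a hypothetical helper: it applies the given `set` options
# inside a subshell and prints the resulting option flags ($-).
shell_flags() (
  set "$@"
  printf '%s\n' "$-"
)

# Both spellings enable the same two options.
if [ "$(shell_flags -eu)" = "$(shell_flags -u -e)" ]; then
  echo "same options"
fi
```

The function body runs in a subshell, so the options don't leak into the calling script.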
Enjoy!