Command injection prevention for Python
This is a command/code injection prevention cheat sheet by r2c. It contains code patterns of potential ways to run an OS command or arbitrary code in an application. Instead of scrutinizing code for exploitable vulnerabilities, the recommendations in this cheat sheet pave a safe road for developers that mitigates the possibility of command/code injection in your code. By following these recommendations, you can be reasonably sure your code is free of command/code injection.
Mitigation summary
In general, try not to let dynamic content into APIs intended for code / OS command execution. If this is not an option then perform proper input validation and contextually escape user data.
Check your project for these conditions:
semgrep --config p/python-command-injection
1. Running an OS command
1.A. Using subprocess module
The subprocess module allows you to start new processes, connect to their input/output/error pipes, and obtain their return codes. Methods such as Popen, run, call, check_call, and check_output are intended for running commands provided as an argument ('args'). Allowing user input in a command that is passed as an argument to one of these methods can create an opportunity for a command injection vulnerability.
Example:
import subprocess
import sys
# Vulnerable
user_input = "foo && cat /etc/passwd" # value supplied by user
subprocess.call("grep -R {} .".format(user_input), shell=True)
# Vulnerable
user_input = "cat /etc/passwd" # value supplied by user
subprocess.run(["bash", "-c", user_input])
# Not vulnerable to shell injection: the input is a single argument and no shell parses it
user_input = "cat /etc/passwd" # value supplied by user
subprocess.Popen(['ls', '-l', user_input])
# Not vulnerable: constant command passed as a list of arguments
subprocess.check_output(['ls', '-l', 'dir/'])
References:
- subprocess module documentation
- shlex.split documentation
- shlex.quote documentation
- CVE-2020-7698: Gerapy Command Injection
- CVE-2020-11981: Apache Airflow Command Injection
Mitigation
Do not let user input into subprocess methods. Alternatively,
- Always try to use an internal Python API (if it exists) instead of running an OS command.
- Don't pass user-controlled input.
- If that is not possible, use an array with a sequence of program arguments instead of a single string.
- Use shlex.split to correctly parse a command string into an array and shlex.quote to correctly sanitize input as a command-line parameter.
- Avoid running sh as a command with arguments. If that is not possible, strip everything except alphanumeric characters from any input provided for the command string and arguments.
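The argument-list and shlex recommendations above can be sketched as follows. This is a minimal illustration rather than a complete defense. The child command (the current Python interpreter echoing its first argument) is an assumption chosen only so that the example runs on any platform, and all variable names are hypothetical.

```python
import shlex
import subprocess
import sys

user_input = "foo && cat /etc/passwd"  # hostile value supplied by user

# Argument list with the default shell=False: the input reaches the child
# process as exactly one argv element and is never parsed by a shell.
argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input]
echoed = subprocess.run(argv, capture_output=True, text=True, check=True).stdout.strip()

# shlex.split parses a trusted command string into an argument list.
parsed = shlex.split("grep -R 'two words' .")

# shlex.quote escapes the input so that a POSIX shell reads it as one word.
quoted = shlex.quote(user_input)
round_trip = shlex.split("grep -R " + quoted + " .")
```

Here `echoed` is the unmodified input, which shows that `&&` was never interpreted as a shell operator. Note that shlex.quote targets POSIX shells only.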
Semgrep rule
python.lang.security.audit.dangerous-subprocess-use
1.B. shell=True
Functions from the subprocess module have the shell argument for specifying whether the command should be executed through the shell.
Using shell=True is dangerous because it propagates current shell settings and variables.
This means that variables, glob patterns, and other special shell features in the command string are processed before the command is run, making it much easier for a malicious actor to execute commands.
Example:
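import subprocess
# Prints the home directory: the shell expands $HOME
subprocess.call('echo $HOME', shell=True)
# Raises FileNotFoundError on POSIX: without a shell, the whole string is taken as the program name
subprocess.call('echo $HOME', shell=False)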
Mitigation
Avoid using shell=True. Use the default shell=False instead and pass the command as a list of arguments.
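When a command relied on the shell only for variable expansion, the lookup can be done in Python and the result passed as an ordinary argument. The sketch below assumes a hypothetical DEMO_HOME environment variable and uses the current Python interpreter as a stand-in child program.

```python
import os
import subprocess
import sys

os.environ["DEMO_HOME"] = "/home/demo"  # hypothetical variable, set only for this example

# Stand-in child program: prints its first argument unchanged.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Without a shell, "$DEMO_HOME" is passed through literally and is not expanded.
literal = subprocess.run(child + ["$DEMO_HOME"], capture_output=True, text=True).stdout.strip()

# Look the value up in Python, then pass it as a plain argument.
looked_up = subprocess.run(child + [os.environ["DEMO_HOME"]], capture_output=True, text=True).stdout.strip()
```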
Semgrep rule
python.lang.security.audit.subprocess-shell-true
1.C. Using os module to execute commands
The os module provides a portable way of using operating system dependent functionality. Methods such as system, popen, and the deprecated popen2, popen3, and popen4 are intended for running commands provided as a string. Letting user-supplied data into a command that is passed as an argument to one of these methods can create an opportunity for a command injection vulnerability.
Example:
import os
# Vulnerable
user_input = "foo && cat /etc/passwd" # value supplied by user
os.system("grep -R {} .".format(user_input))
# Vulnerable
user_input = "foo && cat /etc/passwd" # value supplied by user
os.popen("ls -l " + user_input)
Mitigation
Do not let user input into os methods. Alternatively,
- Always try to use an internal Python API (if it exists) instead of running an OS command.
- Consider using subprocess functions with an array of program arguments.
- Don't pass user-controlled input.
- If that is not possible, do not allow arbitrary commands to be run. Use an allow list for inputs.
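An allow list, as recommended above, might look like the following sketch. The command names and the ALLOWED_COMMANDS table are hypothetical; the point is that the user selects a key, and the argument list behind it is fixed by the application.

```python
import subprocess
import sys

# Hypothetical allow list: user-facing names mapped to fixed argument lists.
ALLOWED_COMMANDS = {
    "hello": [sys.executable, "-c", "print('hello')"],
    "version": [sys.executable, "--version"],
}

def run_allowed(name: str) -> str:
    """Run a pre-defined command chosen by name and reject everything else."""
    if name not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {name!r}")
    result = subprocess.run(ALLOWED_COMMANDS[name], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

A call such as `run_allowed("foo && cat /etc/passwd")` raises ValueError before any process is started.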
Semgrep rule
python.lang.security.audit.dangerous-system-call
1.D. Using os module to spawn a process
The os module allows executing the program at a given path in a new process. Variations of the spawn method, including spawnl, spawnle, spawnlp, spawnlpe, spawnv, spawnve, spawnvp, spawnvpe, posix_spawn, and posix_spawnp, are intended for spawning a process with a program passed as a string argument. Allowing spawning of arbitrary programs or running shell processes with arbitrary arguments may result in a command injection vulnerability.
Example:
import os
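# Vulnerable (the "l" variants take the arguments one by one, starting with the program name)
user_input = "/foo/bar" # value supplied by user
os.spawnlpe(os.P_WAIT, user_input, user_input, "-a", os.environ)
# Vulnerable (the argument list starts with the program name)
user_input = "cat /etc/passwd" # value supplied by user
os.spawnve(os.P_WAIT, "/bin/bash", ["/bin/bash", "-c", user_input], os.environ)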
Mitigation
Do not let user input into spawn methods. Alternatively,
- Always try to use an internal Python API (if it exists) instead of running an OS command.
- Don't pass user-controlled input.
- Use shlex.split to correctly parse a command string into an array and shlex.quote to correctly sanitize input as a command-line parameter.
- Avoid running sh as a command with arguments. If it's not possible to avoid, strip everything except alphanumeric characters from any input provided for the command string and arguments.
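As an illustration of the first recommendation, a directory listing does not need a spawned `ls` process at all. The sketch below uses os.listdir; the helper name list_directory and the temporary files are assumptions made for the example.

```python
import os
import tempfile

def list_directory(path: str) -> list:
    """Return the sorted entry names using os.listdir instead of spawning ls."""
    return sorted(os.listdir(path))

with tempfile.TemporaryDirectory() as demo_dir:
    # Create a few files, including one whose name contains a space.
    for name in ("b.txt", "a.txt", "c d.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    listing = list_directory(demo_dir)
```

Because no external program is started, there is no command line for the input to influence.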
Semgrep rule
python.lang.security.audit.dangerous-spawn-process.dangerous-spawn-process
1.E. Replacing current process with exec
Execution methods of the os module are intended to execute a new program, replacing the current process. The available methods are execl, execle, execlp, execlpe, execv, execve, execvp, and execvpe. Allowing running of arbitrary programs or running shell processes with arbitrary arguments may result in a command injection vulnerability.
Example:
import os
# Vulnerable
user_input = "/evil/code" # value supplied by user
os.execl(user_input, '/foo/bar', '--do-smth')
# Vulnerable
user_input = "cat /etc/passwd" # value supplied by user
os.execve("/bin/bash", ["/bin/bash", "-c", user_input], os.environ)
Mitigation
Do not let user input into exec methods. Alternatively,
- Always try to use an internal Python API (if it exists) instead of running an OS command.
- Don't pass user-controlled input.
- Use shlex.split to correctly parse a command string into an array and shlex.quote to correctly sanitize input as a command-line parameter.
- Avoid running sh as a command with arguments. If it's not possible to avoid, strip everything except alphanumeric characters from any input provided for the command string and arguments.
Semgrep rule
python.lang.security.audit.dangerous-spawn-process.dangerous-spawn-process
1.F. Wildcard character in a system call that spawns a shell
Spawning a shell or executing a Unix shell command with a wildcard leads to normal shell expansion, which can have unintended consequences if any file names are non-standard. Consider a file named '-e sh script.sh': when the wildcard expands, 'rsync' treats the name as an option and executes the script.
Example:
# directory example
[root@user public]# ls -al
total 20
drwxrwxr-x. 5 user user 4096 Oct 28 17:04 .
drwx------. 22 user user 4096 Oct 28 16:15 ..
drwxrwxr-x. 2 user user 4096 Oct 28 17:04 DIR1
drwxrwxr-x. 2 user user 4096 Oct 28 17:04 DIR2
drwxrwxr-x. 2 user user 4096 Oct 28 17:04 DIR3
-rw-rw-r--. 1 user user 0 Oct 28 17:03 file1.txt
-rw-rw-r--. 1 nobody nobody 0 Oct 28 16:38 -rf
# running Python code like this will use `-rf` as an argument for rm and force delete all directories
os.system("/bin/rm *")
Mitigation
Avoid wildcards in Unix shell commands.
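One way to avoid shell wildcards is to expand the pattern inside Python and operate on the resulting paths directly. The following sketch recreates a small version of the directory above in a temporary location (the file names are taken from the listing, everything else is an assumption) and removes regular files without invoking rm.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_dir:
    # Recreate part of the listing above, including the file named "-rf".
    for name in ("file1.txt", "-rf"):
        open(os.path.join(demo_dir, name), "w").close()
    os.mkdir(os.path.join(demo_dir, "DIR1"))

    # glob.glob expands the pattern in Python. No shell is involved, so
    # "-rf" is only ever a file name and never an option to a command.
    for path in glob.glob(os.path.join(demo_dir, "*")):
        if os.path.isfile(path):
            os.remove(path)

    remaining = sorted(os.listdir(demo_dir))
```

After the loop, only the directory DIR1 is left, because the file named -rf was deleted like any other file.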
Semgrep rule
python.lang.security.audit.system-wildcard-detected
1.G. Running shell commands asynchronously
asyncio.subprocess is an async/await API to create and manage subprocesses.
Methods such as create_subprocess_shell and the event loop's subprocess_shell are intended for running shell commands provided as the 'cmd' argument.
Allowing user input into a command that is passed as an argument to one of these methods can create an opportunity for a command injection vulnerability.
Example:
import asyncio
# Both calls below return coroutines; the command runs once the coroutine is awaited or run by the loop
# Vulnerable
user_input = "cat /etc/passwd" # value supplied by user
loop = asyncio.new_event_loop()
# This is similar to the standard library subprocess.Popen class called with shell=True
loop.subprocess_shell(asyncio.SubprocessProtocol, user_input)
# Vulnerable
user_input = "cat /etc/passwd" # value supplied by user
asyncio.subprocess.create_subprocess_shell(user_input)
Mitigation
Do not let user input into asyncio subprocess methods. Alternatively,
- Always try to use an internal Python API (if it exists) instead of running an OS command.
- Consider using asyncio.subprocess functions with an array of program arguments (e.g. create_subprocess_exec).
- Don't pass user-controlled input.
- If that's not possible, do not allow arbitrary commands to be run. Use an allow list for inputs.
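The create_subprocess_exec recommendation can be sketched like this. The child program (the current Python interpreter echoing its first argument) and the helper name echo_argument are assumptions made so that the example is self-contained.

```python
import asyncio
import sys

async def echo_argument(user_input: str) -> str:
    """Pass user input as one argument through create_subprocess_exec (no shell)."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import sys; print(sys.argv[1])", user_input,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode().strip()

result = asyncio.run(echo_argument("foo && cat /etc/passwd"))
```

The child receives the input as a single argument, so `result` equals the original string and nothing after `&&` is executed.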
Semgrep rule
python.lang.security.audit.dangerous-asyncio-shell.dangerous-asyncio-shell
1.H. Creating subprocesses asynchronously
asyncio.subprocess also allows the creation of subprocesses asynchronously.
Methods such as create_subprocess_exec and the event loop's subprocess_exec are intended for creating a subprocess from one or more string arguments specified by args.
Allowing user input into a command that is passed as an argument to one of these methods can create an opportunity for a command injection vulnerability.
Example:
import asyncio
# The program and its arguments are passed as separate positional arguments, not as a list
# Vulnerable
user_input = "/evil/exe" # value supplied by user
loop = asyncio.new_event_loop()
loop.subprocess_exec(asyncio.SubprocessProtocol, user_input, "--parameter")
# Vulnerable
user_input = "cat /etc/passwd" # value supplied by user
asyncio.subprocess.create_subprocess_exec("bash", "-c", user_input)
# Not vulnerable
user_input = "/evil/exe" # value supplied by user
loop = asyncio.new_event_loop()
loop.subprocess_exec(asyncio.SubprocessProtocol, 'ls', '-l', user_input)
Mitigation
Do not let user input into asyncio subprocess methods. Alternatively,
- Always try to use an internal Python API (if it exists) instead of running an OS command.
- Don't pass user-controlled input.
- Use shlex.split to correctly parse a command string into an array and shlex.quote to correctly sanitize input as a command-line parameter.
- Avoid running sh as a command with arguments. If it's not possible to avoid, strip everything except alphanumeric characters from any input provided for the command string and arguments.
Semgrep rule
python.lang.security.audit.dangerous-asyncio-exec.dangerous-asyncio-exec
python.lang.security.audit.dangerous-asyncio-create-exec.dangerous-asyncio-create-exec
2. Executing/evaluating code
2.A. Executing code with exec
The exec() function supports dynamic execution of Python code.
exec() can be dangerous if used to execute dynamic content.
If this content can be input from outside the program, this may be a code injection vulnerability.
Example:
# Value supplied by user
user_input = "');import requests;requests.get('localhost:3000');print('"
# Vulnerable
exec("foobar('{}')".format(user_input))
Mitigation
Do not use exec() for non-literal values. Alternatively,
- Ensure executed content is not definable by external sources.
- If that's not possible, strip everything except alphanumeric characters from any input that becomes part of the executed code.
- Don't try to make exec safe with tricks like {'__builtins__':{}}
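The last point can be illustrated with a short sketch. Emptying `__builtins__` blocks direct use of builtin names, but attribute access on any literal still reaches `object` and, through it, the classes loaded in the interpreter. The variable names below are hypothetical.

```python
# A namespace with the builtins emptied out, as in the trick above.
restricted = {"__builtins__": {}}

# Direct use of a builtin name is blocked...
try:
    exec("open", restricted)
    builtin_blocked = False
except NameError:
    builtin_blocked = True

# ...but attribute access still walks from a tuple literal to `object`
# and on to the classes that are loaded in the interpreter.
exec("found = ().__class__.__base__.__subclasses__()", restricted)
reachable_classes = restricted["found"]
```

Executed code can use `reachable_classes` as a starting point to recover powerful functionality, which is why the restricted namespace should not be treated as a sandbox.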
Semgrep rule
python.lang.security.audit.exec-detected.exec-detected
2.B. Evaluating code with eval
The eval() function evaluates a string value as a Python expression.
eval() can be dangerous if used to evaluate dynamic content.
If this content can be input from outside the program, this may be a code injection vulnerability.
Example:
# Value supplied by user (raw string, so the inner \' escapes reach eval intact)
user_input = r"__import__('code').InteractiveInterpreter().runsource('import requests;requests.get(\'localhost:3000\')')"
# Vulnerable
eval(user_input)
Mitigation
Do not use eval(). Alternatively,
- Ensure evaluated content is not definable by external sources.
- If that's not possible, strip everything except alphanumeric characters from any input that becomes part of the evaluated expression.
- Don't try to make eval safe with tricks like {'__builtins__':{}}
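When eval() is only being used to read data such as numbers, strings, lists, or dictionaries, the standard library's ast.literal_eval is a commonly used substitute. This is a different technique from the ones listed above, offered as a sketch; the helper name parse_literal is hypothetical, and very large or deeply nested inputs can still exhaust resources.

```python
import ast

def parse_literal(text: str):
    """Parse a Python literal without executing any code."""
    return ast.literal_eval(text)

parsed = parse_literal("{'retries': 3, 'hosts': ['a', 'b']}")

# Anything that is not a plain literal, such as a function call, is rejected.
try:
    parse_literal("__import__('os').system('id')")
    rejected = False
except ValueError:
    rejected = True
```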
Semgrep rule
python.lang.security.audit.eval-detected.eval-detected
2.C. Accepting logging configuration with logging.config.listen()
logging.config.listen() starts up a socket server on the specified port, and listens for new configurations.
Because portions of the logging configuration are passed through eval(), use of this function may open its users to a security risk.
While the function only binds to a socket on localhost, and so does not accept connections from remote machines, there are scenarios where untrusted code could be run under the account of the process which calls listen().
Example:
# Server example: starting up a socket server on 9999 port, and listening for new configurations.
import logging
import logging.config
logging.config.fileConfig('logging.conf')
t = logging.config.listen(9999)
t.start()
# Client example: sending configuration from `data_to_send` variable to localhost:9999
import socket, sys, struct
# Config example: print("pwned") will be evaluated and "pwned" will be printed to the console
data_to_send = """
[loggers]
keys=root
[handlers]
keys=hand01
[formatters]
keys=form01
[logger_root]
level=NOTSET
handlers=hand01
[handler_hand01]
class=StreamHandler
level=NOTSET
formatter=form01
args=(print("pwned"),)
[formatter_form01]
format=F1 %(asctime)s %(levelname)s %(message)s
datefmt=
class=logging.Formatter
"""
payload = data_to_send.encode("utf-8")  # sockets carry bytes in Python 3
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 9999))
s.send(struct.pack('>L', len(payload)))
s.send(payload)
s.close()
Mitigation
Verify what is sent across the socket. Alternatively,
- To avoid the risk, use the verify argument to logging.config.listen() to prevent unrecognized configurations from being applied. This could be done by encrypting and/or signing what is sent across the socket, such that the verify callable can perform signature verification and/or decryption.
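A signature-checking verify callable could be sketched as follows. The shared key, the helper names sign and verify, and the choice of HMAC-SHA256 with the tag prepended to the payload are all assumptions made for illustration; the only library contract relied on is that the callable receives the bytes from the socket and returns the bytes to process, or None to discard them.

```python
import hashlib
import hmac
import logging.config

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical shared key
DIGEST_SIZE = hashlib.sha256().digest_size

def sign(config_bytes: bytes) -> bytes:
    """Client side: prepend an HMAC-SHA256 tag to the configuration."""
    tag = hmac.new(SECRET_KEY, config_bytes, hashlib.sha256).digest()
    return tag + config_bytes

def verify(received: bytes):
    """Server side: return the configuration if the tag matches, else None."""
    tag, config_bytes = received[:DIGEST_SIZE], received[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, config_bytes, hashlib.sha256).digest()
    if hmac.compare_digest(tag, expected):
        return config_bytes
    return None

# The listener thread is created with the verify callable (not started here).
listener = logging.config.listen(9999, verify=verify)
```

With this in place, a client would send `sign(payload)` instead of the raw configuration, and unsigned or tampered data is discarded before it reaches the configuration parser.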
Semgrep rule
python.lang.security.audit.logging.listeneval.listen-eval
2.D. Running code in interactive interpreter
The code module provides facilities to implement read-eval-print loops in Python.
Two classes, InteractiveInterpreter and InteractiveConsole, are used for that.
Both have methods that can execute Python code: InteractiveInterpreter.runcode executes a code object and InteractiveConsole.push interprets a string as Python code.
This is dangerous if external data can reach these function calls because it allows a malicious actor to run arbitrary Python code.
Example:
import code
# Value supplied by user
user_input = "print('pwned')"
console = code.InteractiveConsole()
# Vulnerable
console.push(user_input)
# Value supplied by user
user_input = "print('pwned')"
interpreter = code.InteractiveInterpreter()
# Vulnerable
interpreter.runcode(code.compile_command(user_input))
Mitigation
Do not let user input into InteractiveInterpreter/InteractiveConsole methods. Alternatively,
- Ensure evaluated content is not definable by external sources.
- If that's not possible, strip everything except alphanumeric characters from any input that becomes part of the executed code.
Semgrep rule
python.lang.security.audit.dangerous-code-run.dangerous-interactive-code-run
2.E. Using subinterpreter to run code
_xxsubinterpreters.run_string is an internal Python function that interprets the string as Python code.
If unverified user data can reach run_string, this is a code injection vulnerability.
A malicious actor can inject a malicious string to execute arbitrary Python code.
Example:
import _xxsubinterpreters
# Value supplied by user
user_input = "print('pwned')"
# Vulnerable
_xxsubinterpreters.run_string(_xxsubinterpreters.create(), user_input)
Mitigation
Do not let user input into _xxsubinterpreters methods. Alternatively,
- Ensure evaluated content is not definable by external sources.
- If that's not possible, strip everything except alphanumeric characters from any input that becomes part of the executed code.
Semgrep rule
python.lang.security.audit.dangerous-subinterpreters-run-string.dangerous-subinterpreters-run-string
2.F. Running subinterpreter from regression tests package
run_in_subinterp is a function from the regression tests package for Python that runs code in a subinterpreter.
This is dangerous if external data can reach this function call because it allows a malicious actor to run arbitrary Python code.
Example:
import _testcapi
# Value supplied by user
user_input = "print('pwned')"
# Vulnerable
_testcapi.run_in_subinterp(user_input)
from test import support
# Value supplied by user
user_input = "print('pwned')"
# Vulnerable
support.run_in_subinterp(user_input)
Mitigation
Do not let user input into the run_in_subinterp function. Alternatively,
- Ensure evaluated content is not definable by external sources.
- If that's not possible, strip everything except alphanumeric characters from any input that becomes part of the executed code.
Semgrep rule
python.lang.security.audit.dangerous-testcapi-run-in-subinterp
3. Abusing built-in functions
3.A. Accessing dictionary with current global/local symbol table
globals() and locals() return a dictionary representing the current global/local symbol table. Using non-static data to retrieve values from this table is extremely dangerous because it may allow an attacker to execute arbitrary code on the system.
Example:
# Name of the arbitrary function supplied by user
user_input = "Name of the function"
# Vulnerable call of arbitrary function
function = locals().get(user_input)
function()
# Name of the arbitrary function supplied by user
user_input = "Name of the function"
# Vulnerable call of arbitrary function (test1 stands for any function defined in the module)
function = test1.__globals__[user_input]
function()
Mitigation
Do not access global/local symbol tables. Refactor your code not to use globals() and locals().
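One possible refactoring is an explicit registry that names exactly the functions a caller may select. The handler functions and the HANDLERS table below are hypothetical and only sketch the idea.

```python
# Hypothetical handlers the application intends to expose.
def start() -> str:
    return "started"

def stop() -> str:
    return "stopped"

def _internal_reset() -> str:
    return "reset"  # deliberately left out of the registry

# Explicit registry: only these names can be selected by a caller.
HANDLERS = {"start": start, "stop": stop}

def call_handler(name: str) -> str:
    """Look the name up in the registry instead of in globals() or locals()."""
    try:
        handler = HANDLERS[name]
    except KeyError:
        raise ValueError(f"unknown handler: {name!r}") from None
    return handler()
```

Unlike a lookup in globals(), a request for `_internal_reset` fails even though the function exists in the module.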
Semgrep rule
python.lang.security.dangerous-globals-use.dangerous-globals-use
3.B. Dynamically updating and accessing code annotations
Annotations passed to typing.get_type_hints() are evaluated in the globals and locals namespaces.
Make sure that no arbitrary value can be written as the annotation and passed to the typing.get_type_hints function.
Example:
from typing import get_type_hints
class C:
    member: int = 0
def smth():
    # Changing annotation for `member` property of class C
    C.__annotations__["member"] = "print('pwn')"
    # Annotations are evaluated and `print('pwn')` code gets executed
    get_type_hints(C)
Mitigation
Do not programmatically rewrite code annotations. Alternatively,
- Ensure that annotations are not definable by external sources.