Code injection prevention for Python

This is a code injection prevention cheat sheet by Semgrep, Inc. It contains code patterns that can allow arbitrary code to run in an application. Instead of scrutinizing code for exploitable vulnerabilities, the recommendations in this cheat sheet pave a safe road for developers that mitigates the possibility of code injection. By following these recommendations, you can be reasonably sure your code is free of code injection.

Check your project using Semgrep

The following command runs an optimized set of rules for your project:

semgrep --config p/default

1. Executing or evaluating code

1.A. Executing code with exec

The exec() function supports the dynamic execution of Python code. The exec() function can be dangerous if it is used to execute dynamic content (non-literal content). If this dynamic content has an input controllable by a user, it can cause a code injection vulnerability.

Example:

# Value supplied by user
user_input = "');import requests;requests.get('localhost:3000');print('"

# Vulnerable
exec("foobar('{}')".format(user_input))

References

exec documentation

Mitigation

Do not use exec() for non-literal values. Alternatively:

Ensure executed content is not controllable by external sources.
If it's not possible, strip everything except alphanumeric characters from the input.
Don't try to make exec safe with tricks such as {'__builtins__':{}}.
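The alphanumeric fallback from the list above can be sketched as follows. This is a minimal illustration, not a complete defense: the helper name keep_alphanumeric is hypothetical, and the sketch assumes that ASCII letters and digits are the only characters the application needs.

```python
import re

def keep_alphanumeric(value):
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", value)

# The payload from the example above loses its quotes, parentheses,
# semicolons, dots, and spaces, so it can no longer break out of the
# string literal it is formatted into.
user_input = "');import requests;requests.get('localhost:3000');print('"
cleaned = keep_alphanumeric(user_input)
print(cleaned)  # importrequestsrequestsgetlocalhost3000print
```

Stripping changes the meaning of legitimate input as well, so rejecting input that contains unexpected characters may be preferable where the application can afford it.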

Semgrep rule

python.lang.security.audit.exec-detected.exec-detected

1.B. Evaluating code with eval

The eval() function supports the dynamic evaluation of Python expressions. The eval() function can be dangerous if it is used to evaluate dynamic content (non-literal content). If this dynamic content has an input controllable by a user, it can cause a code injection vulnerability.

Example:

# Value supplied by user
# (raw string, so the \' escapes reach eval() intact and the payload parses)
user_input = r"__import__('code').InteractiveInterpreter().runsource('import requests;requests.get(\'localhost:3000\')')"

# Vulnerable
eval(user_input)

References

eval() documentation

Mitigation

Do not use eval(). Alternatively:

If you need to use eval() with non-literal values, ensure that executed content is not controllable by external sources.
If it's not possible, strip everything except alphanumeric characters from the input.
Don't try to make eval safe with tricks such as {'__builtins__':{}}.
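The sketch below illustrates why the `__builtins__` trick does not help, and shows one possible alternative for the common case where the input is expected to be plain data. The alternative, ast.literal_eval from the standard library, is a suggestion and is not part of the original recommendations; the helper name parse_literal is hypothetical.

```python
import ast

# An empty __builtins__ mapping does not sandbox eval(): attribute access on
# a plain literal still reaches the classes loaded in the interpreter.
escape = "().__class__.__base__.__subclasses__()"
leaked_classes = eval(escape, {"__builtins__": {}})
print(len(leaked_classes) > 0)  # True: subclasses of object are exposed

def parse_literal(text):
    """Parse a Python literal; raise ValueError for anything else."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a plain literal: {text!r}") from exc

print(parse_literal("[1, 2, {'a': 3}]"))  # [1, 2, {'a': 3}]
```

ast.literal_eval accepts only literal structures such as strings, numbers, tuples, lists, dicts, sets, booleans, and None, so function calls and attribute access are rejected. Very large or deeply nested input can still exhaust memory or the stack, so limiting input size remains advisable.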

Semgrep rule

python.lang.security.audit.eval-detected.eval-detected

1.C. Accepting logging configuration with logging.config.listen()

The logging.config.listen() function starts a socket server on the specified port, and listens for new configurations. As the logging.config.listen() configuration is passed through eval(), the use of this function can lead to a security risk. While the function only binds to a socket on localhost, and so does not accept connections from remote machines, there are scenarios where untrusted code can potentially run under the account of the process which calls listen().

Example:

# Server example: starting up a socket server on port 9999 and listening for new configurations.
import logging
import logging.config

logging.config.fileConfig('logging.conf')
t = logging.config.listen(9999)
t.start()


# Client example: sending configuration from `data_to_send` variable to localhost:9999
import socket, struct

# Config example: print("pwned") is evaluated and "pwned" is printed to the console
data_to_send = """
[loggers]
keys=root

[handlers]
keys=hand01

[formatters]
keys=form01

[logger_root]
level=NOTSET
handlers=hand01

[handler_hand01]
class=StreamHandler
level=NOTSET
formatter=form01
args=(print("pwned"),)

[formatter_form01]
format=F1 %(asctime)s %(levelname)s %(message)s
datefmt=
class=logging.Formatter
"""

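# Sockets carry bytes in Python 3, so encode the configuration first
payload = data_to_send.encode('utf-8')

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 9999))
# The listener expects a 4-byte big-endian length prefix before the data
s.sendall(struct.pack('>L', len(payload)))
s.sendall(payload)
s.close()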

References

logging.config.listen() documentation

Mitigation

Verify what is sent across the socket. Pass a callable as the verify argument of logging.config.listen() to prevent applying unrecognized configurations. The callable receives the bytes sent across the socket and returns the bytes to process, or None to discard them.
One way to do this is to encrypt or sign what is sent across the socket, such that the verify callable can perform signature verification or decryption.
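The signing approach mentioned above could look roughly like the following sketch. It assumes an HMAC-SHA256 signature prepended to the configuration and a secret shared by client and server; SECRET_KEY, sign_config, verify_config, and start_listener are hypothetical names chosen for illustration.

```python
import hashlib
import hmac
import logging.config

# Hypothetical shared secret; in practice, load it from a secure store.
SECRET_KEY = b"replace-with-a-real-secret"
SIGNATURE_SIZE = hashlib.sha256().digest_size  # 32 bytes

def sign_config(config_bytes):
    """Client side: prepend an HMAC-SHA256 signature to the configuration."""
    signature = hmac.new(SECRET_KEY, config_bytes, hashlib.sha256).digest()
    return signature + config_bytes

def verify_config(received):
    """Server side: return the configuration if the signature matches, else None."""
    signature = received[:SIGNATURE_SIZE]
    config_bytes = received[SIGNATURE_SIZE:]
    expected = hmac.new(SECRET_KEY, config_bytes, hashlib.sha256).digest()
    if hmac.compare_digest(signature, expected):
        return config_bytes
    return None  # listen() discards the configuration when verify returns None

def start_listener(port=9999):
    """Start the configuration listener with signature verification enabled."""
    listener = logging.config.listen(port, verify=verify_config)
    listener.start()
    return listener
```

A client would send sign_config(payload) with the same length prefix as in the example above. Signing only proves that the sender knows the secret; the configuration content is still evaluated, so the secret must stay out of reach of untrusted code.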

Semgrep rule

python.lang.security.audit.logging.listeneval.listen-eval

1.D. Running code in an interactive interpreter

The code module provides read-eval-print loops in Python. Two classes are included to provide interactive prompts: InteractiveInterpreter and InteractiveConsole. Both classes can execute Python code: InteractiveInterpreter.runcode executes a code object and InteractiveConsole.push interprets a string as Python code. This is dangerous if external data reaches these method calls, as it allows a malicious actor to run arbitrary Python code.

Example:

import code

# Value supplied by user
user_input = "print('pwned')"
console = code.InteractiveConsole()
# Vulnerable
console.push(user_input)

# Value supplied by user
user_input = "print('pwned')"
interpreter = code.InteractiveInterpreter()
# Vulnerable
interpreter.runcode(code.compile_command(user_input))

References

code module documentation

Mitigation

Do not pass user input to InteractiveInterpreter or InteractiveConsole methods. Alternatively:

Ensure that content that Python interprets is not controllable by external sources.
If it's not possible, strip everything except alphanumeric characters from the input.

Semgrep rule

python.lang.security.audit.dangerous-code-run.dangerous-interactive-code-run

1.E. Using subinterpreter to run code

_xxsubinterpreters.run_string is an internal Python function that interprets a string as Python code. This causes a code injection vulnerability when unverified user data reaches run_string. A malicious actor can inject a malicious string to execute arbitrary Python code.

Example:

import _xxsubinterpreters

# Value supplied by user
user_input = "print('pwned')"

# Vulnerable
_xxsubinterpreters.run_string(_xxsubinterpreters.create(), user_input)

References

subinterpreters documentation

Mitigation

Do not pass user input to _xxsubinterpreters methods. Alternatively:

Ensure that content that Python interprets is not controllable by external sources.
If it's not possible, strip everything except alphanumeric characters from the input.

Semgrep rule

python.lang.security.audit.dangerous-subinterpreters-run-string.dangerous-subinterpreters-run-string

1.F. Running subinterpreter from regression tests package

run_in_subinterp is a function from Python's regression test package (test) that runs code in a subinterpreter. This is dangerous if external data reaches the run_in_subinterp function call because it allows a malicious actor to run arbitrary Python code.

Example:

import _testcapi

# Value supplied by user
user_input = "print('pwned')"

# Vulnerable
_testcapi.run_in_subinterp(user_input)


from test import support

# Value supplied by user
user_input = "print('pwned')"

# Vulnerable
support.run_in_subinterp(user_input)

References

test module documentation

Mitigation

Do not pass user input to the run_in_subinterp function. Alternatively:

Ensure that content that Python interprets is not controllable by external sources.
If it's not possible, strip everything except alphanumeric characters from the input.

Semgrep rule

python.lang.security.audit.dangerous-testcapi-run-in-subinterp

2. Abusing built-in functions

2.A. Accessing dictionary with current global or local symbol table

The globals() and locals() functions return a dictionary representing the current global or local symbol table. Using non-static data to retrieve values from this table is extremely dangerous because it can allow an attacker to call any object reachable through the table and so execute arbitrary code on the system.

Example:

# Name of the arbitrary function supplied by user
user_input = "Name of the function" 

# Vulnerable call of arbitrary function
function = locals().get(user_input)
function()

# Name of the arbitrary function supplied by user
user_input = "Name of the function"

# Vulnerable call of arbitrary function
# (test1 stands for any function defined in the current module)
function = test1.__globals__[user_input]
function()

References

Mitigation

Do not access global or local symbol tables. Refactor your code not to use globals() and locals().
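One common way to refactor such lookups is an explicit dispatch table, sketched below. The function names, the ALLOWED_ACTIONS mapping, and run_action are hypothetical and stand in for whatever operations the application exposes.

```python
def start_job():
    return "started"

def stop_job():
    return "stopped"

# Explicit allowlist: only the functions named here can be reached by input.
ALLOWED_ACTIONS = {
    "start": start_job,
    "stop": stop_job,
}

def run_action(user_input):
    """Call the handler registered for user_input, or reject the input."""
    try:
        handler = ALLOWED_ACTIONS[user_input]
    except KeyError:
        raise ValueError(f"unknown action: {user_input!r}") from None
    return handler()

print(run_action("start"))  # started
```

Unlike a lookup in globals() or locals(), the table cannot resolve names such as imported modules or built-in functions, so the set of callable targets stays under the developer's control.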

Semgrep rule

python.lang.security.dangerous-globals-use.dangerous-globals-use

2.B. Dynamically updating and accessing code annotations

Annotations passed to the typing.get_type_hints() function are evaluated in globals and locals namespaces. Ensure that no arbitrary value can be written as the annotation and passed to the typing.get_type_hints function.

Example:

from typing import get_type_hints

class C:
    member: int = 0

def smth():
    # Changing annotation for `member` property of class C
    C.__annotations__["member"] = "print('pwn')"

    # Annotations are evaluated and `print('pwn')` code gets executed
    get_type_hints(C)

References

typing.get_type_hints documentation

Mitigation

Do not programmatically rewrite code annotations. Alternatively:

Ensure that annotations are not controllable by external sources.
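Where annotations might be modified at runtime, one possible safeguard is to check string annotations against an allowlist before resolving them. The sketch below is an assumption-laden illustration: ALLOWED_ANNOTATIONS and safe_type_hints are hypothetical names, and the check covers only classes and their base classes.

```python
from typing import get_type_hints

# Hypothetical allowlist of annotation strings this application expects.
ALLOWED_ANNOTATIONS = {"int", "str", "float", "bool"}

def safe_type_hints(cls):
    """Resolve type hints only if every string annotation is allowlisted."""
    for base in cls.__mro__:
        raw = getattr(base, "__annotations__", {})
        for name, annotation in raw.items():
            if isinstance(annotation, str) and annotation not in ALLOWED_ANNOTATIONS:
                raise ValueError(
                    f"unexpected annotation for {name!r}: {annotation!r}"
                )
    # Only reached when no string annotation falls outside the allowlist.
    return get_type_hints(cls)

class C:
    member: "int" = 0

print(safe_type_hints(C))  # {'member': <class 'int'>}
```

Reading `__annotations__` does not evaluate string annotations, which is why the check can run before get_type_hints() does. An exact-match allowlist is deliberately strict; annotations such as "list[int]" would need to be added explicitly.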

Semgrep rule

python.lang.security.audit.dangerous-annotations-usage
