get this check now in Bento:
$ pip3 install bento-cli && bento init
Before getting started, if you haven’t seen the first post in this series I highly recommend checking it out. It will give you a good understanding of the ReDoS bug class and we will build on its concepts here. See Bento check: Catch catastrophic backtracking ReDoS bugs.
This post will highlight how we can not only use the r2c distributed analysis platform to analyze code and triage findings, but also to improve our tooling. As the creator of Dlint, that seems like the most appropriate tool to take a look at. From there we will break down another important ReDoS finding (CVE-2020-6817) and wrap up with next steps. Note Dlint has recently been added to Bento’s list of supported tools.
Improving Dlint
Dlint is a tool for encouraging best coding practices and helping ensure Python code is secure. Previously we took a look at how we can use the r2c platform to run custom Dlint static analysis jobs. When performing this type of analysis it’s natural to find false positives. That is, findings the tool detected, but it wasn’t supposed to. After triaging a job's findings the platform provides a convenient way to see the results in aggregate:
7/123
true positives isn’t great, but hey, it’s a start. ReDoS detection is
difficult for a variety of reasons and it’s tough to anticipate all false
positives, so we might as well run the tool and look for deficiencies. This is
where the scale of the r2c platform really shines. Running a tool against
thousands of repositories will quickly tell you if your rules are working as
intended. In this situation we decided to tackle one class of false positives:
subexpressions that aren't "backtrackable."
Backtracking occurs when a regular expression engine encounters a character it can’t match and begins to move backwards. Backtracking is a necessary ingredient for catastrophic backtracking and ReDoS. It’s the reason why one of these two very similar expressions causes ReDoS and the other doesn't:
>>> import re
>>> re.search('(a+)+b', 'a' * 64 + 'c')
...Spins...
>>> import re
>>> re.search('(a+)+', 'a' * 64 + 'c')
<_sre.SRE_Match object; span=(0, 64), match='aaa...'>
When b
in the first expression fails to match, the engine begins to
backtrack. In this case it’s catastrophic because the engine exponentially
alternates in and out of the nested quantifier. The issue here is Dlint
was detecting both of these expressions as catastrophic. To remedy this,
a change was made
in the ReDoS detection algorithm to ensure subexpressions are backtrackable
before reporting a finding. This was accomplished by ensuring each
catastrophic subexpression has a non-optional subexpression following it. Or,
per above, something like b
comes after something like (a+)+
. For more
information on the fix please see the above change.
After a change like this we'd like to confirm the fix and get hard numbers on what’s changed. The r2c platform provides exactly this:
Here we fixed 28 / 123 = 22.8%
of our false positives. Not too bad!
Subsequent platform jobs automatically move findings from false positive to
true negative if it detects the tooling has been fixed!
There’s still plenty of work to be done to reduce false positives. If you’d like to help out, take a look at this issue.
CVE-2020-6817
The r2c platform is continuously evaluating large input sets (groups of code repositories). This allows us to constantly be on the hunt for new bugs and security vulnerabilities. Embracing this mentality is crucial in modern DevSecOps practices. With that, we can always be improving our tooling, running analysis jobs, and finding new issues.
After running a Dlint job against a new input set there was one finding that piqued my interest:
To confirm the finding we can pull down the repository and run Dlint manually:
$ python -m flake8 --select=DUO138 bleach/
bleach/sanitizer.py:595:20: DUO138 catastrophic "re" usage - denial-of-service possible
bleach/sanitizer.py:604:16: DUO138 catastrophic "re" usage - denial-of-service possible
Manual inspection reveals that the finding on line 604 is a false positive and
the finding on line 595 is a true positive. Next we have to confirm that the
vulnerability is exploitable. First, let's understand what the bleach
library
actually does.
Bleach is a fantastic library developed by Mozilla that performs HTML sanitization. Or, as they put it:
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes.
The confirmed finding is in the bleach.BleachSanitizerFilter.sanitize_css
function. This means we'll likely be using CSS to reach the code path. The
main interface to this code path is the bleach.clean
function. HTML
elements with style
attributes eventually make their way to the sanitize_css
function. Let's take a closer look at the ReDoS bug:
def sanitize_css(self, style):
...
gauntlet = re.compile(
r"""^([-/:,#%.'"\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$""",
flags=re.U
)
for part in parts:
if not gauntlet.match(part):
return ''
Let’s break down the regular expression further. We’ll be considering alternation statements with overlapping character space. To do so let’s look at the alternation statements individually:
..1.. ..2.. ..3.. ..4.. ..5..
^([-/:,#%.'"\s!\w]|\w-\w|'[\s\w]+'\s*|"[\s\w]+"|\([\d,%\.\s]+\))*$
The issue here is that alternation statements 2
, 3
, and 4
have complete
character overlap with statement 1
. Since the alternation subexpression is
also nested inside a quantifier we can cause catastrophic backtracking. Let’s
consider statements 1
and 2
: [-/:,#%.'"\s!\w]
and \w-\w
. It's clear
that statement 2
completely overlaps with statement 1
. \w
and -
are in
both statements. This means our ReDoS payload can be something like:
'b-b' * 64
Remember, \w
means word characters.
But first we must also cause the engine to backtrack. Remember the Dlint
improvements above? We can accomplish this via the $
at the very end of the
expression. $
matches the end of the string, so we need something at the end
of our payload that doesn't match the end of the string. This causes the
engine to backtrack. This is a little tricky because that character also must
not match anything in any of the alternation statements. Matching an alternation
statement means the character will be consumed and there will be no
backtracking. Looking through all the alternation statements reveals some
symbols like $
, ^
, and *
are not included in any statement. So our full
payload is:
'b-b' * 64 + '^'
Remember the vulnerability is in the sanitize_css
function. We must
pass our payload into the CSS style
sanitization functionality. Looking through
the code reveals that the following will work:
>>> payload = "'" + 'b-b' * 64 + '^' + "'"
>>> bleach.clean('<a style=' + payload + '></a>', attributes={'a': ['style']})
...Spins...
Note the attributes={'a': ['style']}
. style
attributes are not parsed by
default, so we must enable this functionality first. This is a mitigating
factor that means the functionality is not vulnerable by default. Also note
that a
is not special here. Any HTML element included in
bleach.sanitizer.ALLOWED_TAGS
or passed in via the tags
keyword argument
that also enables the style
attribute will be vulnerable.
So what does this actually mean? This means we can cause a DoS on calls to the
bleach.clean
function if style
attributes are enabled for the HTML element.
Since bleach
is often used to sanitize untrusted HTML input this is not good.
Fortunately the Mozilla security team was exemplary to work with. They were
always quick to respond and had no problem understanding and fixing the bug.
For more information see CVE-2020-6817
and the mozilla/bleach
security advisory.
Next Steps
Since the last post, Dlint and r2c’s distributed analysis platform also helped find the following ReDoS bugs: