How to prevent HTML email injection in Python web apps

Avoid accidental HTML injection when sending emails from an application

Grayson Hardaway
July 1st, 2020

tl;dr:

  • HTML injection is a vulnerability in which attacker-provided input is rendered as HTML. HTML injection in emails can lead to attackers phishing users from a legitimate email address.

  • Subtle Flask defaults can lead to HTML injections in emails--Flask only escapes templates with certain file extensions.

  • You can automatically detect and prevent email HTML injection in your code.

HTML Injection in Email

Did you know that injection vulnerabilities can occur in HTML-formatted emails?

Emails can be sent with the text/plain or text/html MIME types (and more). Text emails are just that: plain text. No fancy markup, no formatting, no images. Email clients render text emails exactly as written--HTML tags, for instance, are not processed.
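As a minimal sketch of the difference, the snippet below builds the same body twice with Python's standard-library email.message.EmailMessage, once as text/plain and once as text/html. The body string and variable names are made up for illustration.

```python
from email.message import EmailMessage

# Illustrative body containing an HTML tag.
body = "Hello, <b>Jerry</b>!"

# set_content() defaults to text/plain: clients display the tags literally.
text_msg = EmailMessage()
text_msg.set_content(body)

# subtype="html" opts in to text/html: clients interpret the tags.
html_msg = EmailMessage()
html_msg.set_content(body, subtype="html")

print(text_msg.get_content_type())
print(html_msg.get_content_type())
```

The bytes of the body are identical in both messages; only the declared MIME type tells the client whether to treat the tags as markup.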

On the other hand, clients will process HTML tags in an HTML email, allowing for the rich, colorful email experience we have today. HTML in emails, though, means your emails are vulnerable to some of the same issues as web pages.

This post demonstrates how an attacker can inject HTML into emails sent by Python web apps to phish users, how apps can prevent this, and how to automatically detect email HTML injection and keep it from entering your code.

Sending stylized emails from the backend

Some email libraries send text emails by default, such as Python's built-in email.message module. Developers must explicitly opt in to HTML emails. This avoids most problems when whipping up quick notifications. But what if you want those clean, professional-looking emails coming from your app? You'll have to style your email using HTML. Beware though: if you let user data creep into your auto-generated emails, you have potentially introduced a vulnerability.

Let's see an example of how this works using this sample Flask app. It's actually a variation of a real Flask app where I made this exact mistake. Here's the winning formula:

  1. I write up a really nice-looking HTML email template, templates/welcome_message.email (Note the file extension. It matters later.)

<h2>
  Hello, {{ name }}!
</h2>

Hey! You're all signed up to get matched with a friend-of-a-friend to offset
housing costs. You can reach out to anyone on this site! Or, if you wait a bit,
we will make introductions with someone on your behalf. :) When you're all done,
click here to delete your entry.

<a href="{{ delete_link }}">{{ delete_link }}</a>

<em>We can't wait to see you!</em>
  2. I want to personalize my app's email based on the new signup's name using form data. Who doesn't like polite computers?

name = request.form.get('name', "")
...
render_template("welcome_message.email", name=name, delete_link=delete_link)
  3. I fire off my newly minted, custom-tailored email to the new signup with this snippet of code:

import smtplib
from flask import render_template
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

message = MIMEMultipart("alternative")
message['Subject'] = config.get('subject', 'Successful Signup for Roomshare')
message['From'] = config.get('smtp_sender_email', "noreply")
message['To'] = email
message.attach(MIMEText(render_template("welcome_message.email", name=name, delete_link=delete_link), "plain"))
message.attach(MIMEText(render_template("welcome_message.email", name=name, delete_link=delete_link), "html"))
...
s = smtplib.SMTP(smtp_host, 587)
...
s.sendmail(config.get('smtp_sender_email', "noreply"), email, message.as_string())

What could go wrong?

...What if my new signup's name looks like this?

Jerry!</h2><a href="http://www.evil.xyz/give_me_your_password">Click here</a> to view your registration! <div style="display:none">

In Gmail, the result looks like this:

What happened? The closing </h2> tag ensures the rest of the text looks normal. Now, a relatively benign-looking message is displayed with a link going to http://www.evil.xyz/give_me_your_password.

Further, <div style="display:none"> hides all the content of the original email template. So, now I have a legitimate-looking, attacker-controlled email coming from a real, recognizable email address--my email address--encouraging my signup to click through to an attacker-controlled page. Yikes.
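To see the injection without running the whole app, here is a small sketch that uses the standard library's string.Template as a stand-in for a template rendered without escaping. This is an assumption for illustration only: the real app uses Flask's Jinja templates, and the template text here is shortened.

```python
from string import Template

# Shortened stand-in for welcome_message.email, rendered WITHOUT escaping.
template = Template(
    "<h2>\n  Hello, $name!\n</h2>\n\nHey! You're all signed up.\n"
)

# The attacker-supplied "name" from the signup form.
payload = (
    'Jerry!</h2><a href="http://www.evil.xyz/give_me_your_password">'
    'Click here</a> to view your registration! <div style="display:none">'
)

rendered = template.substitute(name=payload)
print(rendered)
```

In the rendered output, the attacker's link sits right after the greeting, and the unclosed div with display:none comes before the legitimate body text, which is why the original content disappears in the client.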

Impact

An attacker with control of an email from a legitimate domain can create phishing emails that trick users into opening attacker-controlled pages. In all likelihood, such a page would look similar to the original site and ask users for sensitive information, such as passwords or account information.

Big-name email providers will strip out <script> tags and will dump javascript: URIs from anchor tags (I tried many variants), so the impact of email HTML injection is limited to phishing emails.

Preventing HTML injection in emails

So, how do we prevent this? The same way we prevent HTML issues normally: by escaping HTML characters!
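As a quick illustration of what escaping does to the malicious name, this sketch runs it through html.escape from the standard library. Template engines with autoescaping enabled perform an equivalent transformation for you; calling html.escape by hand is shown here only to make the effect visible.

```python
from html import escape

# The attacker-supplied name (shortened).
payload = (
    'Jerry!</h2><a href="http://www.evil.xyz/give_me_your_password">'
    "Click here</a>"
)

# escape() replaces &, <, >, and quote characters with HTML entities,
# so the email client shows the text instead of interpreting it as markup.
safe = escape(payload)
print(safe)
```

Once escaped, the value contains no literal angle brackets, so it can no longer close the heading or open a new anchor tag.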

The above vulnerability is actually possible due to a comedy of errors. Both of these conditions must be true for an attacker to control the HTML contents of an email:

  1. The email template is not HTML escaped when rendered.

  2. The email is an HTML email.

Since the demo app is a Flask app, I'll focus on why this happened in Flask.

Escaping behavior of Flask templates

Regarding (1), Flask only autoescapes templates rendered with render_template when their names end in .html, .htm, .xml, or .xhtml (newer Flask releases also include .svg). Any other extension, such as .email, is rendered without escaping. By simply changing the extension of our email template from .email to .html, we mitigate the problem. However, if you didn't immediately say to yourself "obviously .email extensions aren't escaped" while reading... you can understand how easy it is to make this mistake. (We wrote about this subtle escaping behavior previously, and you can read more about this escaping behavior in Flask here.)

from flask import render_template

name = request.form.get('name', "")
...
# Templates with '.html' are escaped.
render_template("email.html", name=name, delete_link=delete_link)
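The extension rule can be approximated with a few lines of plain Python. This is only a sketch of the documented behavior, not Flask's own code: is_autoescaped is a hypothetical helper, and the extension list is an assumption you should check against the Flask version you run.

```python
# Extensions Flask documents as autoescaped by render_template().
# Assumption: newer Flask releases also add ".svg"; verify for your version.
AUTOESCAPED_EXTENSIONS = (".html", ".htm", ".xml", ".xhtml")


def is_autoescaped(template_name: str) -> bool:
    """Hypothetical helper approximating Flask's extension-based check."""
    return template_name.endswith(AUTOESCAPED_EXTENSIONS)


for name in ("email.html", "welcome_message.email"):
    print(name, is_autoescaped(name))
```

A check like this could back a unit test that walks your templates directory and fails on any template whose extension falls outside the escaped set.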

Prevent this in your code

If you have a Flask app, you can use Semgrep to scan your code for templates rendered with extensions that Flask does not autoescape. You can scan your project if it's on GitHub or locally:

$ semgrep --config=https://semgrep.dev/c/r/python.flask.security.unescaped-template-extension

Further, you can keep this problem from ever happening again by integrating this check into CI. This way, you can always ensure HTML escaping is applied, which is especially helpful when working with a team.

Only use text emails

Regarding (2), the issue occurs because the email is explicitly given an HTML portion. The application uses Python's built-in email library, using email.mime.multipart.MIMEMultipart and email.mime.text.MIMEText objects to construct an HTML portion. Had the email not included this portion, the email would only be text. Therefore, it would be safe because email clients would not process the injected HTML.

The code without an HTML portion would look like this:

import smtplib
from flask import render_template
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

message = MIMEMultipart("alternative")
message['Subject'] = config.get('subject', 'Successful Signup for Roomshare')
message['From'] = config.get('smtp_sender_email', "noreply")
message['To'] = email
message.attach(MIMEText(render_template("welcome_message.email", name=name, delete_link=delete_link), "plain"))
...
s = smtplib.SMTP(smtp_host, 587)
...
s.sendmail(config.get('smtp_sender_email', "noreply"), email, message.as_string())

A Django example

Flask's autoescaping behavior caught me by surprise, leading to the issue above. However, sometimes developers will throw data straight into an email! No escaping, no templates! This is a snippet that resembles real code I have encountered in Django apps. This send_email function lets an attacker completely control the contents of an HTML email.

from django.http import HttpResponse
from django.http import HttpResponseBadRequest
from django.core.mail import EmailMessage
from app.models import Recipients

def send_email(request):
    subj = "Daily Crossword"
    from_email = "daily_crossword@example.com"
    recipient_objs = Recipients.objects.all()
    recipients = [recip.email_address for recip in recipient_objs]
    message = request.POST.get('message')
    if not message:
        return HttpResponseBadRequest("No message set.")
    email = EmailMessage(subj, message, from_email, recipients)
    email.content_subtype = "html" # Sets the email to HTML
    email.send()

    return HttpResponse("Email sent successfully!")

This could easily be prevented by sending a text email instead (simply delete email.content_subtype = "html").

If HTML emails are needed, use an automatically escaping template engine (like the one Django provides) instead of reflecting user data directly into an email. The code to render an email body with a template looks like this:

from django.http import HttpResponse, HttpResponseBadRequest
from django.core.mail import EmailMessage
from app.models import Recipients
from django.template.loader import render_to_string

def send_email(request):
    subj = "Daily Crossword"
    from_email = "daily_crossword@example.com"
    recipient_objs = Recipients.objects.all()
    recipients = [recip.email_address for recip in recipient_objs]
    try:
        puzzle = request.POST.get('puzzle')
        message = render_to_string("emails/crossword.html", {"puzzle": puzzle})
    except Exception:
        return HttpResponseBadRequest("Problem generating email.")
    email = EmailMessage(subj, message, from_email, recipients)
    email.content_subtype = "html"
    email.send()

    return HttpResponse("Email sent successfully!")

You can scan a Django app for EmailMessages directly using request data with Semgrep on GitHub or locally:

$ semgrep --config=https://semgrep.dev/c/r/python.django.security.injection.email

And as before, you can keep request data from ever flowing directly into an EmailMessage by integrating checks for this issue into CI.

Conclusion

In summary, be careful when auto-generating HTML emails!

  • HTML injection in emails can lead to attackers phishing from legitimate domains.

  • Make sure your email content is escaped. Read your framework's documentation to understand its escaping behavior.

  • If you're paranoid, consider using text-only emails.

  • Set up automatic scanning, such as Semgrep, for your code to prevent dangerous code from entering the codebase.

And for me, personally, I'm switching my email client to text only!
