
XML external entity prevention for Java

This article documents two attacks related to XML entities: XML exponential entity expansion and XML external entity injection. In Java, the JDK protects applications from exponential entity expansion by default: it throws an exception when the expansion limit is exceeded. No additional configuration is necessary, but if this exception is not caught and handled, an attack can still cause a denial of service (DoS). In contrast, there are various countermeasures to protect a Java XML parser from XML external entity injection, but not all are effective. This cheat sheet also provides test results, availability, and the effect of different security measures on several Java classes that process XML documents; see section 3, Overview of the effect of security measures for each class.

Check your project using Semgrep

You may use Pro rules to check your project for XXE vulnerabilities.

Catching XXE bugs in Java with Semgrep taint labels

See the process behind creating a Semgrep rule that detects XXE vulnerabilities. You can create and improve your own rules using Semgrep Playground.

Mitigation summary

It is not easy to summarize Java XML security. The XML standard allows multiple ways to include external content, including but not limited to external Document Type Definitions (DTD), external Entity References to external data, general entity references, and external parameter references. There are also several ways to include external content in XML Schemas and stylesheets.

The Java API for XML Processing (JAXP) contains multiple interfaces to process XML content:

  • The Document Object Model (DOM) interface
  • The Simple API for XML (SAX) interface
  • The Streaming API for XML (StAX) interface

Each of these interfaces has its own mitigation measures to disable the use of different sources of external content. The inconsistency in the availability, and even the effect, of these measures across the different APIs, combined with the variety of ways to include external content, makes it difficult to provide general mitigation recommendations. Many configurations that are secure for one parser are not secure for another. However, the following security settings have a consistent effect on specific parsers:

  • For SAXBuilder, SAXParser, and XMLReader use:

    setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
  • For DocumentBuilderFactory, TransformerFactory, SAXTransformerFactory, SchemaFactory, and Validator use:

    setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
  • For SAXReader the only secure configuration is:

    setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
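As a minimal sketch of how two of the settings above might be applied with JDK classes only: the class and method names below (HardenedParsers, secureDocumentBuilderFactory, secureSaxParser) are hypothetical. Note that javax.xml.parsers.SAXParser exposes the ACCESS_EXTERNAL_DTD setting through its setProperty method.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class HardenedParsers {

    // DocumentBuilderFactory: enable the Feature for Secure Processing.
    public static DocumentBuilderFactory secureDocumentBuilderFactory() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return dbf;
    }

    // SAXParser: an empty protocol list forbids access to external DTDs.
    public static SAXParser secureSaxParser() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return parser;
    }
}
```

Creating the parser through a small helper like this keeps the hardening in one place, so that every caller receives the same configuration.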

The complete Semgrep set of rules that covers vulnerabilities mentioned in this cheat sheet is distributed as part of the rules available in the Team tier. This set of rules verifies the settings mentioned in the list above and also includes combinations of several other security features.

The following infographic provides an overview of Java XXE security features:

[Infographic: XXE Java security features overview]


1. Exponential entity expansion

Exponential entity expansion occurs when several layers of nested entities each reference several other entities. See an example of such a payload, also called an XML bomb, in Semgrep's research project on GitHub.

When a document that contains an XML bomb is parsed, the parser expands each of the nested references. The number of expansions grows exponentially with the nesting depth, which leads to excessive use of resources and denial of service (DoS). In Java, the JDK throws an exception when such excessive use of resources is detected. If this exception is caught and handled, the attack is prevented and no further explicit security measures need to be taken. This is the case for all recent versions of Java (17, 18, 19) and all older versions with long-term support (8, 11).

To handle the exception, catch it, log the event, and display in the UI that the upload failed. The following example uses XMLReader; the same approach works for any of the XML processing classes.

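import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Assumes `logger` is a logging field (for example SLF4J) declared on the enclosing class.
public boolean uploadXmlFile(File xmlFile) {
    // try-with-resources closes the stream even when parsing fails.
    try (InputStream is = new FileInputStream(xmlFile)) {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.parse(new InputSource(is));
    } catch (Exception e) {
        // Reached when the JDK aborts on its entity expansion limit; other parse and I/O errors also land here.
        logger.error("The parser has encountered more than \"64000\" entity expansions in this document; this is the limit imposed by the JDK.", e);
        return false;
    }
    logger.info("XML File uploaded successfully.");
    return true;
}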
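To illustrate the JDK limit described in this section, the following sketch builds a small XML bomb in memory and checks whether a default DocumentBuilderFactory refuses it. The class BombDemo, its methods, and the entity names are hypothetical and chosen for illustration only; the payload is a scaled-down variant of the classic nested-entity bomb, not the payload from Semgrep's research project.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

public class BombDemo {

    // Builds a nested-entity document: each level references the previous level ten times.
    public static String buildBomb(int levels) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE bomb [\n<!ENTITY e0 \"x\">\n");
        for (int i = 1; i <= levels; i++) {
            sb.append("<!ENTITY e").append(i).append(" \"");
            for (int j = 0; j < 10; j++) {
                sb.append("&e").append(i - 1).append(';');
            }
            sb.append("\">\n");
        }
        sb.append("]>\n<bomb>&e").append(levels).append(";</bomb>");
        return sb.toString();
    }

    // Returns true when the parser aborts with a parse exception instead of finishing.
    public static boolean isRejected(String xml) throws Exception {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }
}
```

With six levels the document asks for about a million expansions, well above the JDK limit, so isRejected(buildBomb(6)) is expected to report a rejection, while an ordinary document passes.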

2. XML external entity injection

XML external entity injection (XXE) occurs when the system identifier that points to external content contains data controlled by an attacker. The parser then dereferences this identifier and includes the attacker-chosen content, which can lead to the disclosure of confidential data, or even to arbitrary code execution.
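The dereferencing step can be made visible without reading any file. The sketch below is an illustration under assumptions: the class XxeProbe, the payload, and the replacement text "blocked" are hypothetical. It installs an EntityResolver on a default DocumentBuilder, records each system identifier the parser asks for, and answers with harmless placeholder text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class XxeProbe {

    // Hypothetical payload: the entity "xxe" points at a local file through its system identifier.
    public static final String PAYLOAD =
            "<!DOCTYPE data [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
            + "<data>&xxe;</data>";

    // Parses the XML with a default parser and returns the system identifiers it tried to resolve.
    public static List<String> requestedSystemIds(String xml) throws Exception {
        List<String> requested = new ArrayList<>();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Record the request and return placeholder text, so the real resource is never opened.
        builder.setEntityResolver((publicId, systemId) -> {
            requested.add(systemId);
            return new InputSource(new StringReader("blocked"));
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return requested;
    }
}
```

Calling requestedSystemIds(PAYLOAD) shows that a default parser asks for the attacker-chosen location as soon as the entity is referenced, which is the behavior the security measures in section 2.B aim to prevent.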

2.A Ways to include external content in XML documents

There are many potential ways in which external resources can be referenced in XML documents, XML Schemas, and XSLT stylesheets. External content can be loaded through an External Document Type Definition (DTD), an External Entity Reference to external data, a General Entity reference, or an External Parameter Entity reference. Additional XML code can also be included through XInclude, or references to XML Schema components using the schemaLocation attribute of import or include elements. In stylesheets, multiple sheets can be combined using the xsl:include element, the ?xml-stylesheet processing instruction, or the document() function.

For examples of each of these ways of including external resources, see the Java API for XML Processing (JAXP) Security Guide in Oracle documentation. In addition, you can review Semgrep's research project on GitHub.

2.B Security measures

There are several security measures, but not all of them are available for each class. For an overview of which measures are available for which class, together with Semgrep's research results, see section 3, Overview of the effect of security measures for each class.

Feature for Secure Processing (FSP)

Feature for Secure Processing (FSP) is considered the central mechanism for secure XML processing. It is defined as javax.xml.XMLConstants.FEATURE_SECURE_PROCESSING. For various classes, this feature can protect the parser from all available payloads. However, this is not the case for all of the classes that have been tested to create this cheat sheet.
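A minimal sketch of enabling FSP, assuming the JDK's built-in TransformerFactory and SchemaFactory implementations; the class and method names (SecureProcessing, secureTransformerFactory, secureSchemaFactory) are hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerFactory;
import javax.xml.validation.SchemaFactory;

public class SecureProcessing {

    // Enables the Feature for Secure Processing on a TransformerFactory.
    public static TransformerFactory secureTransformerFactory() throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return tf;
    }

    // Enables the Feature for Secure Processing on a SchemaFactory for W3C XML Schema.
    public static SchemaFactory secureSchemaFactory() throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return sf;
    }
}
```

Setting the feature explicitly documents the intent in code, even for classes where an implementation may already enable it by default.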

Disabling DTD processing

Disable DTD processing with one of the available mechanisms:

  • The setFeature method with the following URL: http://apache.org/xml/features/disallow-doctype-decl
  • The setProperty method to set XMLInputFactory.SUPPORT_DTD to false.
  • The setAttribute method to set XMLConstants.ACCESS_EXTERNAL_DTD to an empty string ("").

For some parsers, there are multiple methods to disable DTD processing. For most parsers, disabling DTD processing makes the processing of XML documents more secure, but it does not affect the processing of schemas or stylesheets. Be careful with classes such as DocumentBuilderFactory, Validator, and SchemaFactory: they accept several of the settings above without throwing an exception, but only one of them has any effect on the security of the parser.
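The three mechanisms can be sketched as follows, using DocumentBuilderFactory and the StAX XMLInputFactory from the JDK. The class DtdOff and its method names are hypothetical, and the sketch does not claim that each setting is effective for every class; see the tables in section 3 for that.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class DtdOff {

    // Mechanism 1: reject any document that contains a DOCTYPE declaration.
    public static DocumentBuilderFactory noDoctype() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return dbf;
    }

    // Mechanism 2 (StAX): turn off DTD support on the input factory.
    public static XMLInputFactory noDtdStax() {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        xif.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        return xif;
    }

    // Mechanism 3: an empty protocol list forbids access to external DTDs.
    public static DocumentBuilderFactory noExternalDtd() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return dbf;
    }

    // Returns true if a parser created from the factory refuses the document.
    public static boolean rejects(DocumentBuilderFactory dbf, String xml) throws Exception {
        try {
            dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }
}
```

With the first mechanism, any document that carries a DOCTYPE declaration is refused outright, including documents that only declare harmless internal entities, so it suits applications that never need DTDs.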

Disabling external schema and stylesheet processing

The setAttribute method can be used to set XMLConstants.ACCESS_EXTERNAL_SCHEMA and XMLConstants.ACCESS_EXTERNAL_STYLESHEET to an empty string to disable these features. The effects on security, however, are often limited with no observed effect for several classes.
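As a sketch of these two settings with JDK classes: SchemaFactory takes the value through setProperty, while TransformerFactory takes it through setAttribute. The class ExternalAccessOff and its method names are hypothetical.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerFactory;
import javax.xml.validation.SchemaFactory;

public class ExternalAccessOff {

    // An empty protocol list forbids loading schemas referenced through schemaLocation, import, or include.
    public static SchemaFactory noExternalSchema() throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        sf.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return sf;
    }

    // An empty protocol list forbids loading stylesheets referenced from a stylesheet.
    public static TransformerFactory noExternalStylesheet() {
        TransformerFactory tf = TransformerFactory.newInstance();
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return tf;
    }
}
```

Instead of an empty string, both properties also accept a comma-separated list of allowed protocols, such as "file", when some external access is required.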

Disabling external entities

Call setFeature with the value false for both http://xml.org/sax/features/external-general-entities and http://xml.org/sax/features/external-parameter-entities. When both of these features are disabled, all parsers are protected against all of the tested payloads, except for the Validator class, where the features have no effect at all. Based on our testing, calling setExpandEntityReferences(false) seems to have the same effect as disabling external general entities.
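A minimal sketch of disabling both features on a DocumentBuilderFactory, with a helper that checks whether the content of a temporary file leaks into the parsed document. The class NoExternalEntities, its methods, and the temporary file are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoExternalEntities {

    // Disables external general entities and external parameter entities.
    public static DocumentBuilderFactory hardenedFactory() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        return dbf;
    }

    // Parses a document whose entity points at a temporary file and returns the resulting text.
    public static String parseWithExternalEntity(DocumentBuilderFactory dbf, String secret) throws Exception {
        Path file = Files.createTempFile("xxe-demo", ".txt");
        try {
            Files.write(file, secret.getBytes(StandardCharsets.UTF_8));
            String xml = "<!DOCTYPE data [<!ENTITY xxe SYSTEM \"" + file.toUri() + "\">]>"
                    + "<data>&xxe;</data>";
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

If the features work as described, the text returned for the hardened factory does not contain the content of the temporary file.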

3. Overview of the effect of security measures for each class

Summarizing security measures against XXE is not easy because not all security measures work for every class (as mentioned in section 2.B Security measures). The tables in this section can help you choose the best security measures against a specific vulnerability.

Table legend
  • Each table represents one Java class. For example DocumentBuilderFactory.
  • Columns display attack payloads that can be potentially used to exploit a vulnerability (for example DTD or XML bomb).
  • Rows represent ways to configure the parsers.

✅ - Secure
❌ - Not Secure
⚠️ - An exception is thrown
N/A - Not available

3.A DocumentBuilderFactory

javax.xml.parsers.DocumentBuilderFactory can process all 10 payloads. In its default configuration, it is vulnerable to 4 of the attack payloads.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity | XSL: DTD, import, include, document | XSD: DTD, import, include
Default configuration ✅⚠️
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "") N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setFeature("http://xml.org/sax/features/external-general-entities", false) ✅⚠️
setFeature("http://xml.org/sax/features/external-parameter-entities", false) ✅⚠️
setValidating(false) ✅⚠️
setExpandEntityReferences(false) ✅⚠️
setXIncludeAware(false) ✅⚠️
setFeature("http://apache.org/xml/features/xinclude", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false) ✅⚠️
setFeature("http://apache.org/xml/features/namespaces", false) ✅⚠️
setNamespaceAware(true) ✅⚠️

3.B SAXBuilder

org.jdom2.input.SAXBuilder can process all 10 payloads. In the default configuration, it is vulnerable to 4 of the attack payloads.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity | XSL: DTD, import, include, document | XSD: DTD, import, include
Default configuration ✅⚠️
Default configuration DTDValidating ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
Default configuration XSDValidating ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "") N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setFeature("http://xml.org/sax/features/external-general-entities", false) ✅⚠️
setFeature("http://xml.org/sax/features/external-parameter-entities", false) ✅⚠️
setExpandEntityReferences(false) ✅⚠️
setFeature("http://apache.org/xml/features/xinclude", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) ✅⚠️

3.C SAXParserFactory

javax.xml.parsers.SAXParserFactory can process all 10 attack payloads. In its default configuration, it is vulnerable to 4 of the attack payloads.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity | XSL: DTD, import, include, document | XSD: DTD, import, include
Default configuration ✅⚠️
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "") N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setFeature("http://xml.org/sax/features/external-general-entities", false) ✅⚠️
setFeature("http://xml.org/sax/features/external-parameter-entities", false) ✅⚠️
setValidating(false) ✅⚠️
setExpandEntityReferences(false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setXIncludeAware(false) ✅⚠️
setFeature("http://apache.org/xml/features/xinclude", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/namespaces", false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setNamespaceAware(true) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A

3.D SAXParser

javax.xml.parsers.SAXParser can process all 10 payloads. In its default configuration, it is vulnerable to 4 attack payloads.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity | XSL: DTD, import, include, document | XSD: DTD, import, include
Default configuration
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "") N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
All other features N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A

3.E SAXReader

org.dom4j.io.SAXReader can process all 10 payloads. In its default configuration, it is vulnerable to 4 of the attack payloads.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity | XSL: DTD, import, include, document | XSD: DTD, import, include
Default configuration ✅⚠️
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "") N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setFeature("http://xml.org/sax/features/external-general-entities", false) ✅⚠️
setFeature("http://xml.org/sax/features/external-parameter-entities", false) ✅⚠️
setValidating(false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setExpandEntityReferences(false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setXIncludeAware(false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/xinclude", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/namespaces", false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setNamespaceAware(true) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setIncludeExternalDTDDeclarations(false) ✅⚠️

3.F TransformerFactory & SAXTransformerFactory

javax.xml.transform.TransformerFactory and javax.xml.transform.sax.SAXTransformerFactory are not able to process XML Schema Definition (XSD) payloads. Out of the 7 payloads they can process, the default configuration parses 6 insecurely.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity | XSL: DTD, import, include, document
Default configuration ✅⚠️
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "") ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
All other features N/A N/A N/A N/A N/A N/A N/A

3.G SchemaFactory

javax.xml.validation.SchemaFactory is unable to process Extensible Stylesheet Language (XSL) documents. Out of the 6 payloads it can process, the default configuration is vulnerable to 5.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity | XSD: DTD, import, include
Default configuration ✅⚠️
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") ✅⚠️ ✅⚠️ ✅⚠️
setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) ✅⚠️ ✅⚠️
All other features N/A N/A N/A N/A N/A N/A

3.H Validator

javax.xml.validation.Validator can only process XML documents, not XSL or XSD. Out of the 3 attack payloads it can process, the default configuration is vulnerable to 2.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity
Default configuration ✅⚠️
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") ✅⚠️
setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) ✅⚠️
setFeature("http://xml.org/sax/features/external-general-entities", false) ✅⚠️
setFeature("http://xml.org/sax/features/external-parameter-entities", false) ✅⚠️
All other features N/A N/A N/A
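As an illustration of the first two configurations in the table above, a Validator might be set up as in the following sketch. The class SecureValidator and its method are hypothetical, and the call to newSchema() without arguments only keeps the example short; an application would normally load its own schema there.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

public class SecureValidator {

    public static Validator newSecureValidator() throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Placeholder schema for the sketch; replace with the application's own schema.
        Validator validator = sf.newSchema().newValidator();
        // Enable the Feature for Secure Processing on the validator itself.
        validator.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // An empty protocol list forbids access to external DTDs.
        validator.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        return validator;
    }
}
```

On Validator, the ACCESS_EXTERNAL_DTD setting is applied through setProperty.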

3.I XMLReader

org.xml.sax.XMLReader can process all 10 payloads. The default configuration is vulnerable to 4 of the attack payloads.

Parser configuration | Attack payload
XML: XML Bomb, DTD, Parameter entity | XSL: DTD, import, include, document | XSD: DTD, import, include
Default configuration ✅⚠️
setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true) ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "") ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "") ✅⚠️
setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "") N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/disallow-doctype-decl", true) ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️ ✅⚠️
setFeature("http://xml.org/sax/features/external-general-entities", false) ✅⚠️
setFeature("http://xml.org/sax/features/external-parameter-entities", false) ✅⚠️
setValidating(false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setExpandEntityReferences(false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setXIncludeAware(false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/xinclude", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false) ✅⚠️
setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setFeature("http://apache.org/xml/features/namespaces", false) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A
setNamespaceAware(true) N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A