Reentrancy Woes in Smart Contracts

ethereum smart contracts July 13, 2016 at 10:45 AM Emin Gün Sirer

Smart contracts are pretty difficult to get right.

Signs of Trouble

This should come as no surprise. We knew that programming in general is difficult, that most of the valley runs on cut&paste from stack overflow, directed by technological decisions made by reading hearsay carefully planted by marketing professionals masquerading as programmers on social media. We knew that there are wholesale industries (hello NoSQL, first generation!) which take pride in selling software that provides no guarantees at all. It's just difficult, what with all the Slack chat and Pokemon Go effort, to get all of those pesky little pre- and post-conditions right to build solid code that actually works.

We also knew that some notable professors had given up on trying to teach concurrency to their students, instead preferring to teach them how to use "event-driven" frameworks. An "event-driven framework" is just some code that someone else wrote where the framework "handles concurrency" (a.k.a. kills it) by grabbing a mutex and making it impossible for the student's code to take advantage of concurrency, thereby avoiding concurrency bugs. (For the record, this is the exact opposite of the approach we take in our systems courses at Cornell, after I revamped them).

What Could Possibly Go Wrong?

Even though Ethereum smart contracts lack concurrency, they are exposed to its cousin, reentrancy. And reentrancy turns out to have many hidden gotchas, as anyone who has followed the attacks on The DAO can attest.

This distilled example illustrates just how subtle the problems can be. Take a look at this token contract, courtesy of Peter Borah. It's only 31 lines, and it seems, at first glance, to be an excellent attempt at condition-oriented programming, a good practice. See if you can spot the error. I'll provide some hints.

contract TokenWithInvariants {
  mapping(address => uint) public balanceOf;
  uint public totalSupply;

  modifier checkInvariants {
    _
    if (this.balance < totalSupply) throw;
  }

  function deposit(uint amount) checkInvariants {
    balanceOf[msg.sender] += amount;
    totalSupply += amount;
  }

  function transfer(address to, uint value) checkInvariants {
    if (balanceOf[msg.sender] >= value) {
      balanceOf[to] += value;
      balanceOf[msg.sender] -= value;
    }
  }

  function withdraw() checkInvariants {
    uint balance = balanceOf[msg.sender];
    if (msg.sender.call.value(balance)()) {
      totalSupply -= balance;
      balanceOf[msg.sender] = 0;
    }
  }
}

Hints

First, you'll note that this contract performs state-changing operations whose correctness is predicated on previous checks, after an external call. It does not perform the external call at the end. It does not have a mutex to prevent reentrant calls. So it does not conform to best practices. This is on purpose -- the author was writing code to illustrate what not to do. This anti-pattern is exactly what The DAO got wrong. So, hint #1 is that this usage should be a red flag. But luckily, not every red flag indicates an error, and judicious sprinkling of invariant checks can prohibit unwanted behaviors, right?

Second, you'll note that the code is written in a very nice style where it actually checks a global invariant, namely, that the contract balance (this.balance) should never be below what the contract thinks it ought to be (totalSupply). This is excellent defensive programming. There is nothing wrong with this; in fact, despite the bad pattern, the contract looks safe because the invariant checks seem to protect the key assumptions in the contract.

Except the invariant checks are performed in function modifiers, at function entry. So this particular stylistic pattern is treating a global invariant, which ought to hold at all times, as if it's just a post-condition. This is hint #2, though, it should always be safe to treat an invariant as a post-condition, right?

Finally, note that the bug got introduced when line 7 got changed from

if (this.balance != totalSupply) throw;

if (this.balance < totalSupply) throw;

So we went from checking a pretty strong condition, to a slightly weaker condition. This is hint #3, though, again, this ought to be safe, right? After all, so what if the actual balance of the contract is higher than what the contract thinks it ought to be? The weakening was actually necessary to avoid another problem, where someone could send a sliver of a payment to the contract and have it disable itself because the balance no longer matched the internal accounting. The new check still avoids the important case, where there is less cash on hand than there ought to be.

Now is a good time to think about how these three issues, which together allow the contract to hold more money than it thinks it should, would enable an attacker to withdraw more than their share.

Edit: Note, also, that the deposit function is flawed because it takes the amount that the user specified, and not msg.amount. It should be made internal, and there should be a default function in the contract that calls deposit with msg.amount. These are missing to simplify exposition and cut back on magical Solidity syntax. Assume that deposit is always called with the same amount as in msg.amount.

The Hack

The brilliant trick is best explained by the person who discovered it. Roughly speaking, the attacker starts out by pretending to withdraw ether, lowering the contract's idea of how much it holds, then reentrantly deposits the same ether back and transfers it to a second address. Now that the contract's balance is in excess of what the contract thinks it holds, all of the excess can be drained.

"Oy vey" is about all I could muster after looking at the brilliant exploit. Reasoning in the presence of reentrant calls is quite difficult. A 31-line simple token contract, written with a defensive pattern where invariants are diligently checked on function entry, ends up being prone to reentrancy attacks.

The Takeaways

The immediate takeaway for contract programmers is as clear now as it was when The DAO got hacked: do not perform external calls in contracts. If you do, ensure that they are the very last thing you do. If that's not possible, use mutexes to guard against reentrant calls. And use the mutexes in all of your functions, not just the ones that perform an external call.

There are some takeaways for toolchain developers as well: the Solidity compiler, or lint-like tools, need to detect and warn about these kinds of anti-patterns.

At a higher level, I see no good reason why the EVM should enable the default function of a contract to engage in arbitrarily complex behaviors. In particular, the EVM can simply prohibit a contract B and all of its callees C, D, ..., Z, from making calls back to contract A when B is invoked by A, unless explicitly permitted to do so by A. That is, a default ban on cross-contract reentrancy, unless optionally disabled. Contract A still gets to call its own internal functions all day long, but if it calls out to another function, there is no coming back.

I suspect that such a prohibition of reentrant calls would affect very few contracts, and it would stave off a potential attack vector. The contracts that want to be invoked in a reentrant fashion can do so by specifying this explicitly. This opt-in behavior would enable programmers to not have to worry about a large and complicated class of subtle errors. Sure, this fix does not prohibit all external-invocation-related errors (there can still be cross-contract errors), but it's a step in the right direction. And default-off behavior for reentrancy does not give up any expressiveness or performance or anything else, as it can always be turned on explicitly by contracts that require it.

My suggestion here still seems a bit hacky. Perhaps there are more comprehensive ways of staving off such errors. If so, I'd love to hear about them.