Let us set the scene: it is the end of July 2022. A few months back, Elon Musk made an offer to buy Twitter. The deal has since fallen apart and is scheduled for a showdown in Delaware Chancery Court.

Axios has a more detailed timeline of events.

In its SEC filings, Twitter estimates that “false or spam accounts” represent less than 5% of Monetizable Daily Active Users (mDAU).

The Twitter Annual Report (Section: NOTE REGARDING KEY METRICS) provides the following definition for mDAU:

We define mDAU as people, organizations, or other accounts who logged in or were otherwise authenticated and accessed Twitter on any given day through twitter.com, Twitter applications that are able to show ads, or paid Twitter products, including subscriptions

Musk asserts that Twitter is at least 20% bots. As far as I’m aware, he hasn’t provided any basis for this claim.

One notable omission from the Axios timeline is the day in May when Musk decided to create his own estimate by enlisting his followers to sample on his behalf:

This simplistic approach created quite a storm on Stats Twitter with many suspecting that they were being trolled.

Putting aside the accuracy of Musk’s claim for a moment, the two claims aren’t even incompatible. Both can be true.

Twitter’s claim pertains to a percentage of active users, whereas Musk’s seems to pertain to all users (or perhaps all sampled users). The two claims use very different denominators. The vagueness of Musk’s claims has carried through to the press, where active users and all users are conflated.
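To make the denominator point concrete, here is a small Python sketch. Every number in it is invented purely for illustration (none of them are Twitter’s figures, and the variable names are my own), but it shows how "at least 20% bots" and "less than 5% of mDAU" could both hold for the same platform at once.

```python
# All figures are hypothetical, chosen only to illustrate the arithmetic.
total_accounts = 1_000_000   # every account ever created (assumed)
spam_accounts = 200_000      # spam accounts among all accounts (assumed)

mdau = 200_000               # accounts counted as mDAU on a given day (assumed)
spam_in_mdau = 8_000         # spam accounts that also count as mDAU (assumed)

# Same numerator concept, two different denominators.
share_of_all = spam_accounts / total_accounts
share_of_mdau = spam_in_mdau / mdau

print(f"spam share of all accounts: {share_of_all:.0%}")
print(f"spam share of mDAU:         {share_of_mdau:.0%}")
```

With these made-up inputs, a fifth of all accounts are spam while only a small fraction of the daily active population is. Neither figure contradicts the other, because they answer different questions.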

Notably, Twitter doesn’t seem to report total users over all time[1], because it isn’t a very meaningful metric; hence the use of DAU, MAU, and so on.

Of course, this doesn’t even address whether there is a common basis for defining “false or spam accounts”, or whether bots should be included.

Three big considerations when assessing the accuracy of a metric are:

  1. The definition of the metric and the basis of calculation.
  2. The correctness of the implementation of 1.
  3. The correctness of the data being used.

If you think a metric is wrong, then you need to be able to articulate what you think is wrong about it in terms of the above. If you’re working with a different definition, then of course you’re going to get different results (assuming you even have access to the data and are able to work with it at the required scale).

Then there is the obvious point that most activity counted in mDAU would be read-only: users who are reading tweets but not doing anything else. This usage cannot be established externally. And we certainly don’t expect much of this form of usage[2] from spam accounts (however they are defined), which would be oriented towards writing tweets, likely posting them via the API, which would exclude them from mDAU. As Twitter has pointed out, you can’t calculate mDAU with only externally available data.
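A toy sketch of why this matters, in Python. The event schema, the surface names, and both functions are hypothetical and of my own invention; Twitter’s real pipeline is certainly far more involved. The only things taken from the definition quoted earlier are that mDAU counts authenticated access through ad-capable surfaces, and the assumption (as argued above) that API-only posting falls outside it.

```python
from dataclasses import dataclass

# Hypothetical surface names; the real list lives inside Twitter's systems.
AD_CAPABLE_SURFACES = {"web", "ios_app", "android_app", "paid_product"}


@dataclass(frozen=True)
class Event:
    account_id: str
    surface: str  # where the authenticated access happened, e.g. "web" or "api"
    action: str   # "read" or "write"


def mdau(events):
    """Accounts seen on an ad-capable surface in one day's events."""
    return {e.account_id for e in events if e.surface in AD_CAPABLE_SURFACES}


def externally_visible(events):
    """Accounts an outsider could observe: only those that wrote something."""
    return {e.account_id for e in events if e.action == "write"}


# One invented day of activity.
events = [
    Event("reader_1", "web", "read"),
    Event("reader_2", "ios_app", "read"),
    Event("poster_1", "android_app", "write"),
    Event("spam_bot", "api", "write"),
]

print("mDAU:              ", sorted(mdau(events)))
print("visible externally:", sorted(externally_visible(events)))
```

In this invented day, the two read-only accounts are part of mDAU but invisible from outside, while the API-only spam account is visible from outside but not part of mDAU. An external sample is drawn from the second set, so it cannot recover the first.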

I am a data engineer and I have spent far too many months of my life producing activity-based metrics. I can personally attest to Vicki Boykis’s observation that counting users is hard. I really feel for the data teams at Twitter. If the Delaware Chancery Court forces Musk to close and buy a company he no longer wants, they’re going to be in for an unpleasant time.


  1. For me it is a red flag when companies present themselves in terms of total user base or total downloads instead of activity (depending on the company/industry of course). ↩︎

  2. If humans are operating spam accounts and happen to be reading tweets then they probably should count as mDAU because they’re still seeing ads. ↩︎