Okay, to kick things off on this forum, I figured I’d post something I wrote up a while back and never actually posted on /r/badmathematics. If you’re a stats person, feel free to correct me in the comments if I fucked something up.

Also, I know this is likely somewhat basic to most of the users coming over here, but I envision a good badmathematics post as being written, as much as possible, for someone totally unfamiliar with the math in question, so hopefully this hits that mark.

I wrote this up back when Elon Musk was trying to get out of his contract to buy Twitter, so let’s harken back to that age:

Legal Woes

If you guys haven’t been following the Elon Musk Twitter saga, you should, because it’s been pretty entertaining. Elon Musk put in an offer to purchase Twitter for 44 billion dollars, an offer staggeringly in Twitter’s favor. Twitter accepted, and they signed a contract to lock everything in. (The contract itself contains such amazing terms as: Elon waives all due diligence, Elon agrees that he can be forced to go through with the purchase instead of paying monetary damages for breaking the contract, and Elon agrees that if the deal fails to go through on his side for any other reason, he will pay a 1 billion dollar fee to Twitter.)

Now, Elon isn’t quite so eager to go through with this deal after all, and he filed notice that he was terminating the deal due to, in part, Twitter allegedly misrepresenting the number of bots there were on the site. Naturally, Twitter rebutted with a hearty “you can’t do that as per the contract” and started legal proceedings. Their complaint against Elon is here.

To date, Twitter and Elon have had a couple of exchanges in court filings, and it’s one particular claim by Elon, as well as all the people on Twitter I see repeating it, that brings this saga into relevancy for /m/badmathematics. We can see the claim in this tweet.

Elon objects to Twitter’s method for estimating the number of bots on Twitter. Twitter is estimating the number of bots in their set of “monetizable daily active users,” or mDAU. Basically, mDAU refers to all accounts that use or view Twitter and could hypothetically be advertised to. Obviously, if Twitter knows an account is a bot, it’s not counted as an mDAU. Estimating the number of bots in mDAU is essentially estimating how many bots slip through their normal filtering systems. For years, Twitter has estimated bots to make up about 5% of mDAU (and has said as much in SEC filings), and this 5% figure is what Elon is using to try and terminate the deal. Specifically, he thinks this number is bogus and the way Twitter calculates it is bad. (None of that will probably end up mattering, as far as the legal side of things goes, because Elon waived due diligence, so he needs to clear an incredibly high bar to show this matters.)

But We’re Here for the Math

Elon says that Twitter’s process for making this estimate is bad. Twitter randomly samples 100 mDAU accounts every day, a human reviewer makes a judgement on which ones are bots, and Twitter uses those numbers to make their estimate. Our question is: is Elon right that this process is unreasonable?

The major objection Elon makes in his filing is with regard to the sample size. 100 users a day is 0.00005% of daily users! Surely you can’t make any judgements from this! Well, they test 100 accounts per day, but over a 90-day quarter those daily batches add up to a sample of 9,000 accounts. Is 9,000 accounts enough to get a decent estimate?

The Joys of Stats

In this case, we have a series of trials where we look at a random account and go “bot” or “not bot”. Let’s say if an account is a bot, we give it the number 0, and if an account is a human, we give it the number 1. The Law of Large Numbers states that if we take the mean of n samples, that sample mean will converge to the mean of the overall population as n gets larger. Notice that this makes no reference to the total population size. All that matters is the number of samples you take. In Twitter’s case, they took 100 samples a day, or 9,000 samples a quarter, or 36,000 samples a year, and they got a sample mean of roughly 0.95, since they reported 5% bots and 95% humans. Should we expect these percentages to be close to the real percentage?
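If you want to see that convergence for yourself, here’s a minimal Python sketch, assuming (purely for illustration) a true bot rate of exactly 5%. Notice that the total population size never appears anywhere in the code; only the number of samples does.

```python
# A minimal sketch of the Law of Large Numbers, assuming (hypothetically)
# a true bot rate of exactly 5%. Humans are 1 and bots are 0, as above,
# so the sample mean should settle toward 0.95 as n grows.
import numpy as np

rng = np.random.default_rng(seed=0)
TRUE_BOT_RATE = 0.05  # an assumption for illustration, not Twitter's data

for n in [100, 9_000, 36_000, 1_000_000]:
    # Each draw is 1 (human) with probability 0.95, 0 (bot) otherwise
    samples = rng.random(n) >= TRUE_BOT_RATE
    print(f"n = {n:>9,}  sample mean = {samples.mean():.4f}")
```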

The Central Limit Theorem states that a collection of sample means (with some scaling and translation) will form a normal distribution as your number of samples goes to infinity. The normal distribution is your classic bell curve type distribution (centered on zero in its standardized form). It’s very well studied and comes with a lot of tools, and because of the Central Limit Theorem, we can use those tools to examine Twitter’s data. (Also, notice again: no reference to the total population size. It’s all about the number of samples.)
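Here’s a quick sketch of that as well, again assuming a hypothetical 5% true bot rate: simulate many independent “quarters” of 9,000 samples each, and the spread of the resulting sample means lands right on the sqrt(p(1-p)/n) the theory predicts.

```python
# A sketch of the Central Limit Theorem at work, again assuming a 5% true
# bot rate. Simulate 10,000 independent "quarters" of 9,000 accounts each;
# the sample means form a bell shape whose spread matches the theory.
import numpy as np

rng = np.random.default_rng(seed=0)
p, n, trials = 0.05, 9_000, 10_000

bot_counts = rng.binomial(n, p, size=trials)  # bots found in each "quarter"
sample_means = bot_counts / n

print(f"observed spread of sample means: {sample_means.std():.5f}")
print(f"spread predicted by theory:      {np.sqrt(p * (1 - p) / n):.5f}")
```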

Most importantly for us, we can calculate a confidence interval. The idea here is that we construct an interval of percentages around the sample mean by a procedure which, for a chosen confidence level of x%, produces an interval containing the true percentage of bots x% of the time. Let’s aim for 90% confidence. That is to say, let’s calculate a confidence interval such that if we were to repeatedly run the tests and construct this confidence interval anew each time, we would expect it to contain the true percentage of bots 90% of the time.
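Concretely, for a proportion like ours, the standard normal-approximation interval is the sample proportion plus or minus z · sqrt(p(1-p)/n), where p is the sample proportion and z = 1.645 for 90% confidence. A minimal sketch (binomial_ci is just a name I made up, not anything from Twitter’s process):

```python
# A minimal sketch of the normal-approximation ("Wald") confidence interval
# for a proportion. binomial_ci is a hypothetical helper name, not anything
# from Twitter's filings. z = 1.645 cuts off 5% in each tail of the normal
# distribution, leaving 90% in the middle.
import math

def binomial_ci(successes: int, n: int, z: float = 1.645) -> tuple[float, float]:
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)
```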

Calculating this for just one day of Twitter’s process (100 samples, 5 bots), we can find a 90% confidence interval of [1.42%, 8.58%].

If we calculate the 90% confidence interval for a whole quarter of testing (9,000 samples, 450 bots), then the range gets considerably tighter. Now, we get a 90% confidence interval of [4.62%, 5.38%].

If we go for a whole year of testing? 36,000 accounts with roughly 1,800 bots found? Then we get a 90% confidence interval of [4.81%, 5.19%].
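For what it’s worth, plugging those numbers into the binomial_ci sketch from above reproduces all three intervals (give or take a hundredth of a percentage point from rounding):

```python
# Reproducing the three intervals above with the binomial_ci sketch
for label, bots, n in [("one day", 5, 100),
                       ("one quarter", 450, 9_000),
                       ("one year", 1_800, 36_000)]:
    low, high = binomial_ci(bots, n)
    print(f"{label:>11}: [{low:.2%}, {high:.2%}]")
```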

Granted, this is said with caveats. Statistics is remarkably finicky, and there are tons and tons of things you can quibble about. Were these tests independent? Does it appreciably matter that 9000 samples were taken over 90 days? That being said, the idea that the sample size is even remotely a concern with Twitter’s bot estimation is a pipe dream, as far as the legal situation goes.

And if you think I’m just wrong about all this (which is a very valid concern, since I have no clue what I’m talking about), we can always just look at the people who know what they’re doing and see how they handle it. The Pew Research Center, for instance, basically does nothing but surveys. How do they handle sample size? They do a bunch of surveys covering the entirety of the US, which has a current population of 329 million. Twitter has an estimated 229 million mDAU, so they’re in the same ballpark. Pew does its US surveys by polling the American Trends Panel, which is a panel of roughly 10,000 US adults selected from the US Postal Service’s address files. What’s more, they don’t even necessarily poll the entire panel for every survey, with many surveys only polling a fraction of it. So, they’re certainly not afraid to extrapolate from a few thousand samples to a population of hundreds of millions. That’s probably a good signal that Twitter’s similar extrapolation isn’t suffering from some drastic issue of sample size.

Conclusion

The sample size Twitter is working with is more than fine for making judgements about the number of bots on Twitter. I don’t think the confidence intervals I calculated are perfect representations of the underlying statistics, but I don’t think it’s likely that a proper statistical analysis would find wildly different numbers. It’s also not very important to the overall legal shenanigans. The sort of bot population that would need to exist on Twitter for this contract to even have a chance of being overturned is probably a solid order of magnitude or more greater than the 5% Twitter estimates. The contract Elon signed is heavily tilted against him, and he’d need the bot issue to amount to what’s known as a Material Adverse Effect (if you wanna look that up). I’m not really sure there is a number of bots on Twitter that could amount to that.

Other quick things I was going to talk about but then forgot

It’s funny for Elon to write about 100 samples being too small (and so shocking that it would leave him flabbergasted), when he himself recommended doing that exact same thing.