Longest Run of Heads

Tossing a Coin

By Silvana Acosta in Statistics Coding

January 21, 2020

A coin is tossed \(n\) times, thereby generating a sequence of heads (H) and tails (T). A subsequence of consecutive heads is called a run of heads. Consequently, the following sequence of length \(n = 20\):

\begin{equation} HHTHTTHHHHTTTHHTTTTH
\label{eq:tosses} \end{equation}

contains 5 runs of heads with lengths 2,1,4,2,1, respectively. The longest run of heads is \(M_n = 4\).


If we toss a coin 1000 times and get that the largest number of heads in a row was 6, is this a fair coin? What is the expected number of tosses to get a longest run of heads of 6?

Generate Tosses

Since we will be simulating we first set a seed to be able to later reproduce the results we get:

my_seed <- 3
set.seed(my_seed)

We now write a function that generates random sequences of heads and tails for any value of \(n\). We simulate just 15 tosses (n_toss=15), not even 1000, using rbinom and we inspect them:

n_toss <- 15 # select your desired number of trials n
p_head <- 0.5 

tosses_b <- rbinom(n = n_toss, 
                   size = 1, 
                   prob = p_head)

tosses_b # inspect the 15 bernoulli trials
 [1] 0 1 0 0 1 1 0 0 1 1 1 1 1 1 1

We could have also done this using the sample function to simulate \(H\) and \(T\) uniformly and compute the frequency table to count cases.

code
tosses_u <- sample(c("T", "H"), # unif sampling: 0.5 to each case
                   n_toss,
                   replace = T) # to avoid getting seq prob (weighting by remaining obs)
tosses_u
 [1] "H" "H" "T" "H" "T" "H" "T" "T" "H" "T" "T" "T" "H" "H" "H"
table(tosses_u) # freq table to inspect tosses
tosses_u
H T 
8 7 

Compute Run

We write a function to compute the length of the longest run of heads in an arbitrary sequence of heads and tails. But we could have just used the rle function (stands for Run Length Encoding).

get_longest_run <- function(tosses) {
  
  cum_heads <- vector()
  cum_heads[1] <- tosses[1] # the 1st cum value is the initial toss
  
  for (n in 1:(length(tosses) - 1)) {
    
    if (tosses[n] == 1 & tosses[n + 1] == 1) { # if next is also H 
      
      cum_heads[n + 1] <- tosses[n + 1] + cum_heads[n] # keep adding cum H
      
      } else {
        
        cum_heads[n + 1] <- tosses[n + 1]
        
      }
    }
  
  return(max(cum_heads, na.rm = T))
  
  }

get_longest_run(tosses = tosses_b)
[1] 7

The rle function computes the lengths and values of runs of equal values in a vector. Then, applying the max function to the rle result, you can get the longest numbers of heads and the longest number of tails.

code
tosses_rle <- rle(tosses_b)
tosses_rle$values
[1] 0 1 0 1 0 1
# How many times do each of these two values above appear in tosses_b?:
tosses_rle$lengths 
[1] 1 1 2 2 2 7
# Identify max length for each of these two values:
max_lengths <- tapply(tosses_rle$lengths, tosses_rle$values, max)
max_lengths
0 1 
2 7 

Expectation & Fairness

So it could be that the coin is fair if we tossed it 1000 times and observed the longest run of heads to be \(M_n = 6\). The one I simulated above is fair (the probability of each binary outcome is equal to 0.5), and it has been flipped only \(n=15\) times, and I still obtained \(M_n=7\).

Every two flips we expect to get one head and one tail, but of course both can turn out to be heads. Then we would have a run of heads of 2. And we could have got one of 6. On average we would not expect to get “too many” heads or tails in a row, particularly with very few tosses, but it can happen.

What is the expected number of coin flips until we get 6 heads in a row?

You fail as soon as you get a tail, so you have to keep tossing when this happens. Consider the outcomes T, HT, HHT, HHHT, HHHHT, HHHHHT, HHHHHH. Note that the outcome T contains TT, TH, and all other outcomes that start with T. And HTT is contained in the outcome HT, etc.

The first outcome implies that you will need to keep tossing the coin and you have already tossed it once: \(n_h+1\). This happens with probability 1/2. The second outcome implies that you will need to keep tossing the coin some \(n_h\) times, and you have already tossed it twice: \(n_h+2\). This happens with probability 1/2*1/2=1/4. Reasoning analogously for the other outcomes, the expected tosses \(n_h\) is given by:

\begin{equation} \label{eq:exptosses} n_h = \sum_{i=1}^{i=6} \frac{1}{2^i}(i+n_h) \end{equation}

eq <- function(n_h) 
  1/2*(1+n_h)+1/4*(2+n_h)+1/8*(3+n_h)+1/16*(4+n_h)+1/32*(5+n_h)+1/64*(6+n_h)+1/64*(6)-n_h

n_h_sol <- uniroot(eq, c(0, 500))$root # uses bisec method and needs a plausible interval

n_h_sol
[1] 126