Like many these days, I’ve been having my Twitter feed filled up with endless tweets of “Wordle”. I was reluctant to learn what Wordle was until recently (mainly due to some hipster based stubbornness of not wanting to go with trends probably). My main hangup with these simple games is, rather appropriately, simple: I don’t find them fun. What I do find fun however, is learning how they work so that I can cheat at them. You play your way, I’ll play mine.

So, I wrote a script that allows me to cheat at Wordle. In this post, I want to cover how it works and the results that have come out of it.

The code accompanying this blog post can be found at https://github.com/sinkingpoint/blog-codes/tree/master/wordle-cheating

A quick run down of Wordle

For those of you who are like me and are stubbornly refusing to work out what those colored squares in your Twitter feeds are, here’s a quick run down of the game:

Every day there’s a newly chosen 5 letter word, and you have six guesses to work out what the word is. When you make a guess, each character in your guess is marked a color. Grey means that letter is not in the word, yellow means the letter is in the word but in a different place, and green means that the letter is correct and in the right place. Your goal is to work out what the word is with the least number of guesses. Not unlike Hangman etc.

A Boring Method of Cheating

Of course, the easiest course of action is to work out the word in a private browser, and then immediately input the word correctly elsewhere. That’s pretty boring, and it’s obvious that you’re cheating. It also requires no skill, so I’m not going to talk about it.

Getting a Word List

For my solution, the first thing I needed was a way to work out what potential words were available for Wordle to choose; that would give us a finite list we could whittle down into the correct one. One interesting feature of Wordle is that if you enter a set of characters that is not a valid word into the input, then the input is rejected as such. This response was too quick to be making a network request (especially on my terrible Australian Internet), so I figured it must be doing something locally.

Wordle serves its code in a single minified Javascript file. An unminified copy can be found in the above repo. The first thing you notice is that its 14000 lines long, and 12000 of those are two giant arrays of words, Ta and Ca (although these change depending on the minification). At first I thought that there was something to these being two seperate arrays, but no; they’re both checked when checking if its a valid word:

if (((e = s), !Ca.includes(e) && !Ta.includes(e))) return a.setAttribute("invalid", ""), void this.addToast("Not in word list");

If I had to guess, this was the minifier splitting the list into two chunks of some maximum length but I can’t be certain of that (if you know otherwise, please let me know!)

Edit: I’ve been informed that Ta is the list that the correct words are chosen from, where as Ca is the list of words that can be inputted that aren’t selectable words. That means that only ~2300 of our ~13000 words are actually potential solutions! That opens up a whole world of optimisations.

So, I thought I was going to have to go through a dictionary and query an API, but here we have a complete list of words right there for us. Convenient. I’ve included a sorted list of all 12972 words in the above Git repo

What word to pick?

Now that we have all the potential words it could be, how can we choose our guesses strategically? This was my thought process: Letters are much more likely to be rejected than accepted; there’s at most 5 letters in a word, and 26 letters to choose from. This gives a 1/5 chance that any given letter is in the word on the first go. Thus, with the likelihood that each letter we choose is going to be incorrect, I want to find the word that will reject the most number of words if all its letters are wrong.

To this end, I came up with the following score functions:

def count_chars(words):
    counts = Counter()
    for word in words:
        for c in set(word):
            counts[c] += 1

    return counts

def score_words(words):
    counts = count_chars(words)
    scores = []
    for word in words:
        score = sum(counts[c] for c in set(word))
        scores.append((word, score))
    
    return scores

First, we count the number of words that each character appears in in our word list (in the count_chars function) to come up with a “character score”, then each word is assigned a score based on the sum of its unique character scores. For a concrete example:

Let’s say the character scores of a, b, c, d, and e were as follows:

a = 1
b = 2
c = 3
d = 4
e = 5

Then the score of the word abcde would be 1 + 2 + 3 + 4 + 5 = 15. The score of the word aaabb would be 1 + 2 = 3 because of the repeated characters.

Scoring words like that, my system comes up with “aeros” as the best word to choose with zero knowledge. This sort of makes logical sense to me - it has 3 whole vowels, which would eliminate a large number of words if those were wrong.

Iterating on our guess

So, we start each round by guessing “aeros”. Where do we go from there? Well first, we have to input those characters into Wordle and get our prompts out:

Our guess of aeros, with a and r in the word in the wrong position, and e, o and s not being in the word

So, we know that a and r are in the word somewhere. Inputting that into our system, we can start to pare down the list of words. In particular, we can store the letters that are definitely not in the word, the letters that are in the word and the positions they are not in (the yellow results), and the letters that are in the word with their known positions. From there, we can start to trim down the list of potential words, which I wrote the following function to do:

def still_valid(word, no, maybe, yes):
    # If the word contains any chars that we have already
    # found don't exist, it's not valid
    if any(c in word for c in no):
        return False
    
    # If a word contains a different letter in a place where
    # we already know the letter, it's not valid
    if any(word[i] != c for c, i in yes):
        return False
    
    # For every position we have the character in the wrong place
    # if the word has that character in that place, or if the word _doesn't_
    # contain that character, it's not valid
    for c, positions in maybe:
        if any(word[i] == c for i in positions):
            return False
        
        if c not in word:
            return False
    
    return True

This takes our above lists, and a word and returns if that word is still valid given the known information. The great thing about this is that we only eliminate words - no words can come back given new information. Thus, with our above result for “aeros” we have trimmed our list down by a whopping 12683, down to 289. Our next guess, given the above scoring and the remaining words is “liart”, followed by “grand”, and finally the word: “crank”.

The whole game played out with the above guesses

Potential improvements

I’m under no illusion that this is an optimal solution - it was hacked out over a drink and half an hour of iteration. That being said, it consistently gets the answer out in 5 or less guesses (generally 3-4), over the last couple of days at least. Off the top of my head, there’s a few basic improvements we could make:

  • Because I’m counting the scores of each individual character in a word, there’s bound to be some double counting where two letters not existing both discount the same word, so words could be inflated in their scores
  • Because our chance of having correct characters improves over time, it would make sense to switch to an opportunistic guess where we assume characters are correct, rather than that they are incorrect and adjust our scoring respectively
  • New: Because only a subset of our words are actually potential solutions, after a number of guesses (or perhaps once we’ve narrowed it down to a certain number of potential solutions), we should start only suggesting words from the real solutions set

What have we learned?

This was an interesting quick exersise in automating a simple game. That being said, we learned quite a bit: Unminimising a Javascript file, probabilities, and optimisation techniques! If anyone has competing algorithms, it would be fascinating to compare techniques - maybe we could even get a bit of a competition going, running algorithms against each other to find out which can find the word the fastest?

I'm on Twitter: @sinkingpoint and BlueSky: @colindou.ch. Come yell at me!