I’m Colten Webb, an MSCS student in the College of Computer Science at Georgia Tech.

How I'm using ChatGPT

I see lots of hype and lots of skepticism towards ChatGPT. Most of the skepticism at this point seems rooted in two claims: that LLMs are overhyped the way web3 was, and that they will only ever be useful for copywriting.

Proponents of ChatGPT think it is something like a better alternative to search, believing it can provide a new interface for the internet.

I see two use cases here: search solves one, and ChatGPT solves the other. Both the skeptics and the proponents of LLMs seem to conflate these use cases, so I think both their conclusions are wrong (or maybe they’re right, but for the wrong reasons).

The first use case is when you need the answer to something specific. These are the sorts of questions you might use Wikipedia or a how-to video to answer. Since you are trying to do something with the information, you need the source to be accurate, or you need a quick and easy heuristic for verifying its accuracy. These are problems solved by like buttons and brands. Because of this need for a quick-and-dirty way to gain trust, Google search is a great solution: Google doesn’t need to gain your trust, it needs to connect you with sites that can.

The second use case is when you need to answer questions that are more open-ended or ambiguous. These are exactly the sorts of questions a researcher, writer, or any otherwise curious person asks! More specifically, LLMs are great at providing context for subjects one isn’t yet familiar with. Although many are worried that LLMs generate text with inaccurate information, what I see is a robustly censorship-resistant system: OpenAI even tried to prevent people from getting answers to certain questions, but ChatGPT was jailbroken within hours of its release. So if you are new to a subject, it makes a lot of sense to query ChatGPT about it. Then, once you have a list of topics of interest, use reputable sources (however you define them) to learn more. In the worst case, you spend time studying something that was unrelated, but you will be immune to inaccurate information.

This is how I use ChatGPT. For example, when I was studying wavelet trees, I queried ChatGPT for related topics to see their use cases and similar data structures. If I had googled “topics related to wavelet trees,” I would have had less success.

ChatGPT is useful right now for learning. It is therefore already unreasonable to compare LLMs to web3, and silly to believe they will only be useful for copywriting. Its inability to verify its own information means it isn’t yet a replacement for search either. But it has carved out a niche for itself that is genuinely useful.

You got 20.23% on an exam, oof.

What’s the minimum number of questions you got right? An almost equivalent question, slightly less optimistic: what’s the minimum number of questions that were on the test?

We can use the Stern-Brocot tree to answer this. The Stern-Brocot tree contains every positive rational exactly once, in lowest terms, and an in-order traversal of any subtree visits its fractions in increasing order. Also, every node’s parent has a numerator and denominator no bigger than the node’s own. Therefore we can binary search the Stern-Brocot tree to find the fraction with the smallest numerator and denominator inside a range.

If we make the starting matrix

$$S= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

then we can move to the left child by multiplying with

$$L= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

and to the right child with

$$R= \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$$

The following Python code performs this binary search:

import numpy as np

# Any true score in [lo, hi) displays as 20.23% after rounding
lo = .20225
hi = .20235

# Stern-Brocot nodes as 2x2 matrices: the columns are the interval
# endpoints, and multiplying by E sums them to give the mediant
S = np.array([[0, 1], [1, 0]])  # root, representing 1/1
L = np.array([[1, 1], [0, 1]])  # step to the left child (smaller)
R = np.array([[1, 0], [1, 1]])  # step to the right child (larger)
E = np.array([[1], [1]])

def value(mat):
    # the fraction at this node, as a float
    mediant = mat @ E
    return mediant[0, 0] / mediant[1, 0]

cur = S
# Binary search: descend until the node's fraction lands in [lo, hi)
while not (lo <= value(cur) < hi):
    if value(cur) < lo:
        print("R", end="")
        cur = cur @ R
    else:
        print("L", end="")
        cur = cur @ L

print()
print(cur @ E)     # numerator over denominator
print(value(cur))

This spits out the fraction 35/173! Indeed, 35/173 ≈ 0.202312, which rounds to 20.23%. Therefore, you got at least 35 questions right, on an exam with at least 173 questions!

[For more info, see Wikipedia]

Is the universe $H_n=o(1)$?

…where $H_n$ is the $n$th-order empirical entropy (of everything!).

When compressing text (possibly with the help of a wavelet tree), you can use the statistical distribution of characters in the text to be more efficient. The 0th-order empirical entropy ignores context, meaning the characters of a sentence could be shuffled without affecting it. Intuitively this is better than nothing, since in English sentences “z” is less common than “e.” The 1st-order empirical entropy takes into account the preceding character. This means “qu” can be stored about as efficiently as “q” alone, since “u” almost always follows “q.” Then $H_2$ takes into account the preceding 2 characters, and so on. If $H_n \to 0$, then the system can be approximated arbitrarily well; you just need a big enough Markov model.

So what I’m asking is, given enough historical context, could the next state of the universe be perfectly predicted? The answer to this question probably lies somewhere in quantum physics. If there is any true randomness at all, the answer is no.

What's the empirical entropy of Wikipedia?

Suppose you build a Markov model empirically from the first 100MB of Wikipedia, and use it to measure the $n$th-order entropy of the text bytewise. I got the following values:

| Order | Entropy (bits/symbol) |
| ----- | --------------------- |
| 0     | 5.08014               |
| 1     | 3.88591               |
| 2     | 3.08797               |

These data make sense: we would expect the entropy to decrease as one gains more context. But this was calculated over all 256 possible byte values in the UTF-8 encoded text. What happens to the entropy if we only look at alphanumeric characters?

| Order | Entropy (bits/symbol) |
| ----- | --------------------- |
| 0     | 4.71990               |
| 1     | 4.03609               |
| 2     | 3.52428               |

The entropies still decrease with increasing order, so these results are sane. However, the higher-order entropies increased and the 0th-order entropy decreased relative to before. The smaller 0th-order entropy is expected, because there are far fewer symbols competing. The higher-order entropies increased because the context available for predicting the next character is smaller, reduced from $256^n$ to $62^n$ possibilities.

Bonus: Classical Latin writing sometimes employed scriptio continua, in which no word dividers or punctuation were used. Very nice and artsy, but more difficult to read, which we can quantify with entropy: by reintroducing spaces, we can decrease $H_2$ from 3.52 to 3.18!

Wavelet trees

A wavelet tree is a data structure that stores a string and supports two special queries, rank and select, quickly. The rank query takes two arguments, a symbol and an index, and returns the number of occurrences of that symbol up to that index in the string. The select query takes a symbol and a number k, and returns the index of the kth occurrence of the symbol.

Though wavelet trees are often used for text-based applications, the alphabet can represent integers or floats as well, which makes wavelet trees useful in some situations that would otherwise require complicated and memory-intensive segment trees. For example, a wavelet tree can support range quantile queries (what’s the kth smallest element in a range of indices) and range counting queries (how many elements between x and y are in a range of indices); see the sketch after the rank implementation below. These would usually be done with something like a 2D segment tree, or a sweep-line algorithm if you can reorder the queries in batches. Wavelet trees can also be extended to support some update queries, like push to front or back and toggle element.

Quick-and-dirty Python implementation of the construction and the rank query:

from itertools import accumulate

# A node is a tuple T = (m, C, left, right):
#   m     : midpoint of this node's symbol range (the symbol itself at a leaf)
#   C     : C[i] = number of symbols in positions 0..i routed to the left child
#   left  : subtree for symbols with ord <= m
#   right : subtree for symbols with ord > m
def build(inp, m1, m2):
    m = (m1 + m2) // 2
    c = list(accumulate(1 if ord(x) <= m else 0 for x in inp))
    if m1 == m2:
        return (m, c, (), ())
    inp1 = [x for x in inp if ord(x) <= m]  # symbols routed left
    inp2 = [x for x in inp if ord(x) > m]   # symbols routed right
    return (m, c, build(inp1, m1, m), build(inp2, m + 1, m2))

# occurrences of q in positions 0..idx (inclusive)
def rank(T, q, idx):
    if idx == -1:
        return 0
    if T[2] == () and T[0] == ord(q):  # leaf: every symbol here is q
        return 0 if len(T[1]) == 0 else T[1][idx]
    if ord(q) > T[0]:
        # count of right-routed symbols in 0..idx is (idx+1) - C[idx]
        return rank(T[3], q, idx - T[1][idx])
    else:
        return rank(T[2], q, T[1][idx] - 1)

I like this because it is conceptually simple, has several applications, and appears underrated.

https://users.dcc.uchile.cl/~jperez/papers/ioiconf16.pdf
https://users.dcc.uchile.cl/~gnavarro/ps/cpm12.pdf