Discrete vs. Continuous: A Tale of Two Approaches to Language Modeling
Balancing n-gram models and neural networks for next-token prediction
By Taewoon Kim
Modern Natural Language Processing (NLP) revolves around language modeling—the art
of predicting the next token given the previous ones. Formally, if we have a sequence of
tokens \(w_1, w_2, \dots, w_{n-1}\), we want to learn the conditional distribution

\[
P(w_n \mid w_1, w_2, \dots, w_{n-1}).
\]
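To make this concrete, here is a minimal sketch of estimating that distribution with a bigram (\(n = 2\)) count model in Python. The corpus, function names, and the maximum-likelihood estimate below are illustrative assumptions, not the article's implementation:

```python
from collections import Counter

def train_bigram(tokens):
    """Count context (unigram) and pair (bigram) frequencies."""
    unigrams = Counter(tokens[:-1])             # contexts w_{n-1}
    bigrams = Counter(zip(tokens, tokens[1:]))  # pairs (w_{n-1}, w_n)
    return unigrams, bigrams

def next_token_prob(unigrams, bigrams, context, token):
    """MLE estimate: P(token | context) = count(context, token) / count(context)."""
    if unigrams[context] == 0:
        return 0.0
    return bigrams[(context, token)] / unigrams[context]

corpus = "the cat sat on the mat the cat ran".split()
uni, bi = train_bigram(corpus)
print(next_token_prob(uni, bi, "the", "cat"))  # 2/3, since "the" is followed by "cat" twice out of three occurrences
```

A real n-gram model would add smoothing for unseen pairs; this sketch only shows the raw counting that the discrete approach rests on.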