What is Bayes' rule used for?
What does each of the terms in Bayes' rule signify?
What does the 'N' in 'N-grams' indicate?
In the illustration, say we're in the state marked 'went to'. Assuming a deterministic model, what state will we be in if the next input is 'the'?
In the illustration, say we're in the state 'to the'. How much does the model remember about the state we were in before that?
Since a higher 'N' in our N-gram language model gives higher accuracy, why wouldn't we want to use, say, 100-grams to model a language?
What is the sparse data problem?