Perplexity is a measurement used in natural language processing to quantify how well a probability model predicts a sample. Formally, it is the exponentiated average negative log-likelihood the model assigns to each token in the sample, so it can be read as the model's effective branching factor: how many equally likely choices it is "confused" between at each step. A lower perplexity indicates a model that predicts the sample more accurately.
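As a minimal sketch of the definition above, the following computes perplexity from a list of per-token probabilities assigned by a model (the function name and input format are illustrative, not from any specific library):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    the model assigns to each token in the sample."""
    n = len(token_probs)
    avg_nll = -sum(math.log(p) for p in token_probs) / n
    return math.exp(avg_nll)

# A model that assigns probability 0.25 to every token is, on average,
# choosing among 4 equally likely options, so its perplexity is ~4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

This makes the "lower is better" reading concrete: a model that assigned probability 0.5 to each token would score a perplexity of about 2 instead.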