Data augmentation is a machine learning technique that increases the diversity of the data available for training without collecting new data. It works by modifying existing examples, for instance by rotating, cropping, or altering the colors of images, or by introducing variations in text data. The modified copies are added to the training set alongside the originals. The goal is to make the model more robust and to improve its ability to generalize from the training data to new, unseen data.
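The modifications described above can be illustrated with a minimal sketch that uses only the Python standard library. Everything named here is an assumption made for illustration: the helper names `augment_image` and `augment_text`, the parameters `crop_size`, `max_shift`, and `drop_prob`, and the choice to represent an image as a small square grid of grayscale values in [0, 1]. A real pipeline would more likely rely on an image or text processing library.

```python
import random


def augment_image(image, rng, crop_size=3, max_shift=0.2):
    """Return a rotated, cropped, brightness-shifted copy of a grayscale image.

    Assumption: `image` is a square list of rows with pixel values in [0, 1].
    """
    # Copy the rows so the original image is never modified.
    out = [row[:] for row in image]

    # Rotate clockwise by a random multiple of 90 degrees.
    for _ in range(rng.randrange(4)):
        out = [list(row) for row in zip(*out[::-1])]

    # Crop a random crop_size x crop_size window.
    top = rng.randrange(len(out) - crop_size + 1)
    left = rng.randrange(len(out[0]) - crop_size + 1)
    out = [row[left:left + crop_size] for row in out[top:top + crop_size]]

    # Shift brightness by one random offset and clamp to [0, 1].
    shift = rng.uniform(-max_shift, max_shift)
    return [[min(1.0, max(0.0, p + shift)) for p in row] for row in out]


def augment_text(sentence, rng, drop_prob=0.1):
    """Return a variant with words randomly dropped and one adjacent pair swapped."""
    words = sentence.split()
    # Drop each word with probability drop_prob; fall back to all words if none survive.
    kept = [w for w in words if rng.random() >= drop_prob] or words
    if len(kept) > 1:
        i = rng.randrange(len(kept) - 1)
        kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return " ".join(kept)


rng = random.Random(0)  # fixed seed so repeated runs give the same variants

# A hypothetical 4x4 grayscale image with values from 0.0 to 1.0.
image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]
image_variants = [augment_image(image, rng) for _ in range(3)]

sentence = "the quick brown fox jumps over the lazy dog"
text_variants = [augment_text(sentence, rng) for _ in range(3)]
```

Each call produces a new variant from the same source example, so one image or sentence can yield several training examples. Whether a given modification is appropriate depends on the task: a transformation that changes the meaning of an example (such as rotating a handwritten 6 into a 9) would work against the model rather than for it.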