https://www.ibm.com/think/topics/self-supervised-learning
Self-supervised learning is a machine learning technique that uses unsupervised learning for tasks that conventionally require supervised learning.
… self-supervised learning is technically a subset of unsupervised learning … related to supervised learning …
In self-supervised learning (SSL) you use the inputs $x$ themselves (or a modification of them, e.g. a crop or a data-augmented view) as the supervision. In contrast, in unsupervised learning (UL) there is no supervision at all.
To clarify, SSL and UL have in common that explicit targets are missing. UL uses no supervision whatsoever, while SSL derives the targets from the inputs $x$, recovering a form of 'supervision'.
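To make the "inputs as supervision" idea concrete, here is a minimal sketch of a denoising pretext task in PyTorch (the architecture, dimensions, and noise level are illustrative assumptions, not from any of the sources): the model sees a corrupted view of $x$ and is trained to reconstruct the original $x$, so the data labels itself.

```python
import torch
import torch.nn as nn

# Toy denoising pretext task: the input x is its own label.
# Shapes and architecture are illustrative assumptions.
x = torch.randn(64, 32)                      # a batch of 64 unlabeled inputs
x_corrupted = x + 0.1 * torch.randn_like(x)  # a "modified view" of x (noise augmentation)

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
decoder = nn.Linear(16, 32)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

for step in range(100):
    z = encoder(x_corrupted)                 # representation to reuse downstream
    x_hat = decoder(z)
    loss = nn.functional.mse_loss(x_hat, x)  # supervision recovered from x itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```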
- SSL is mostly used for pre-training and representation learning, i.e. to bootstrap a model for a later downstream task.
- UL, at least in classical ML, is used for density estimation, dimensionality reduction and clustering.
An important thing is not to confuse self-supervised with semi-supervised or weakly-supervised learning: the latter two refer to a dataset $D$ in which some examples $x$ are not labeled (semi-supervised) or the labels are imprecise (weakly-supervised), but the labels $y$ do exist.
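For contrast, a minimal semi-supervised sketch with scikit-learn (the synthetic data is made up; the -1 marker for unlabeled points is scikit-learn's convention): the labels $y$ exist, but only a fraction of the examples carry them.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Semi-supervised: labels y exist, but only for some examples.
# In scikit-learn's convention, -1 marks an unlabeled point.
X = np.random.RandomState(0).randn(100, 2)
y = np.where(X[:, 0] > 0, 1, 0)   # true labels (synthetic)
y_partial = y.copy()
y_partial[20:] = -1               # pretend 80% of the labels are missing

model = LabelPropagation()
model.fit(X, y_partial)           # uses both labeled and unlabeled points
print(model.transduction_[:5])    # labels inferred for all examples
```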
So you can see SSL as sitting at the intersection of supervised and unsupervised learning. Actually, things get even more blurred in modern unsupervised deep learning methods, which tend to mix approaches from both SSL and UL: for example, an autoencoder (AE) that also has a density-estimation head, or embeddings that are first learned by SSL and then fine-tuned for clustering in an unsupervised manner.
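A minimal sketch of that last combination (the encoder here is an untrained stand-in; in practice it would come out of an SSL pre-training loop like the one above): SSL embeddings are frozen and then clustered with K-means.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Stand-in for an encoder already pre-trained with an SSL objective.
encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())

x = torch.randn(200, 32)          # unlabeled data
with torch.no_grad():
    z = encoder(x).numpy()        # frozen SSL embeddings

# Unsupervised step on top of the self-supervised representation.
clusters = KMeans(n_clusters=5, n_init=10).fit_predict(z)
print(clusters[:10])
```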
An unusual example is unsupervised reinforcement learning, in which you usually maximize an entropy objective (e.g. over visited states) as a pre-training step to encourage exploration.
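As a very rough sketch of the entropy objective (purely illustrative; real state-entropy methods work with continuous states and non-parametric entropy estimators): compute the entropy of the empirical state-visitation distribution, which an exploratory pre-training policy would try to maximize.

```python
import numpy as np

# Entropy of the empirical state-visitation distribution over a
# discrete state space; maximizing it rewards broad exploration.
def visitation_entropy(visited_states, n_states):
    counts = np.bincount(visited_states, minlength=n_states)
    p = counts / counts.sum()
    p = p[p > 0]                  # avoid log(0)
    return -np.sum(p * np.log(p))

# Toy comparison: a narrow policy vs. an exploratory one (synthetic data).
narrow = np.zeros(1000, dtype=int)           # always visits state 0
broad = np.random.randint(0, 10, size=1000)  # visits states roughly uniformly
print(visitation_entropy(narrow, 10))        # ~0.0
print(visitation_entropy(broad, 10))         # ~log(10) ≈ 2.30
```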
What is an example of unsupervised learning that is definitely not self-supervised learning?
Density estimation, dimensionality reduction (e.g. PCA, t-SNE) and clustering (e.g. K-means), at least seen from a classical ML perspective, are completely unsupervised: PCA, for instance, just tries to preserve variance. Indeed, in DL things tend to blur: e.g. you can use a (variational) autoencoder for dimensionality reduction too.
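A quick sketch of the "PCA just preserves variance" point (synthetic data, illustrative shapes): no labels appear anywhere, and the fitted object reports how much variance each component retains.

```python
import numpy as np
from sklearn.decomposition import PCA

# Purely unsupervised dimensionality reduction: no labels, no self-labels.
rng = np.random.RandomState(0)
X = rng.randn(500, 2) @ np.array([[3.0, 0.0], [0.0, 0.5]])  # stretched cloud

pca = PCA(n_components=1).fit(X)
X_reduced = pca.transform(X)
print(pca.explained_variance_ratio_)  # fraction of variance preserved
```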
https://mljourney.com/self-supervised-learning-vs-unsupervised-learning/
Key Differences: Self-Supervised Learning vs Unsupervised Learning
| Feature | Self-Supervised Learning | Unsupervised Learning |
|---|---|---|
| Labeling Process | Creates labels from raw data | No labels used at all |
| Goal | Learns meaningful representations for downstream tasks | Finds patterns and structures in data |
| Common Use Cases | NLP, Computer Vision, Pretrained Models | Clustering, Dimensionality Reduction, Anomaly Detection |
| Requires Pretraining? | Yes, followed by fine-tuning | No pretraining required |
| Scalability | Requires more computation for self-labeling | More scalable for large datasets |
| Examples | BERT, GPT, SimCLR | K-Means, PCA, DBSCAN |
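Since the table lists SimCLR: here is a compact sketch of its NT-Xent contrastive loss (batch size, embedding size and temperature are illustrative; this is not the reference implementation), which shows how the "labels" are just indices of which pairs are augmented views of the same input.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss for two augmented views z1, z2 of the
    same batch. Each example's positive pair is its other view, so the
    targets come from the data itself with no human annotation."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2n, d) unit vectors
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    # Example i's positive sits at index i + n (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Two augmented "views" of the same batch (synthetic embeddings).
z1 = torch.randn(8, 16)
z2 = z1 + 0.1 * torch.randn(8, 16)  # a second view, close to the first
print(nt_xent(z1, z2).item())
```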