https://www.ibm.com/think/topics/self-supervised-learning
Self-supervised learning is a machine learning technique that uses unsupervised learning for tasks that conventionally require supervised learning.
… self-supervised learning is technically a subset of unsupervised learning … related to supervised learning …
In self-supervised learning (SSL) you use the inputs $x$ themselves (or a modification of them, e.g. a crop or a data-augmented view) as the supervision. In contrast, in unsupervised learning (UL) there is no supervision at all.
To clarify, SSL and UL have in common that explicit targets are missing. UL uses no supervision whatsoever, while SSL derives the targets from the inputs $x$, recovering a form of 'supervision'.
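To make the "inputs as supervision" idea concrete, here is a minimal sketch of a denoising pretext task in PyTorch (the architecture, dimensions, and noise level are illustrative assumptions, not from any of the sources): the model sees a corrupted view of $x$ and is trained to reconstruct the original $x$, so the data labels itself.

```python
import torch
import torch.nn as nn

# Toy denoising pretext task: the input x is its own label.
# Shapes and architecture are illustrative assumptions.
x = torch.randn(64, 32)                      # a batch of 64 unlabeled inputs
x_corrupted = x + 0.1 * torch.randn_like(x)  # a "modified view" of x (noise augmentation)

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
decoder = nn.Linear(16, 32)
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

for step in range(100):
    z = encoder(x_corrupted)                 # representation to reuse downstream
    x_hat = decoder(z)
    loss = nn.functional.mse_loss(x_hat, x)  # supervision recovered from x itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```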
- SSL is mostly used for pre-training and representation learning, i.e. to bootstrap a model for a later downstream task.
- UL, at least in classical ML, is used for density estimation, dimensionality reduction and clustering.
An important thing is not to confuse self-supervised with semi-supervised or weakly-supervised learning: the latter two refer to a dataset $D$ in which some examples $x$ are not labeled (semi-supervised) or the labels are imprecise (weakly-supervised), but the labels $y$ do exist.
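For contrast, a minimal semi-supervised sketch with scikit-learn (the synthetic data is made up; the -1 marker for unlabeled points is scikit-learn's convention): the labels $y$ exist, but only a fraction of the examples carry them.

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Semi-supervised: labels y exist, but only for some examples.
# In scikit-learn's convention, -1 marks an unlabeled point.
X = np.random.RandomState(0).randn(100, 2)
y = np.where(X[:, 0] > 0, 1, 0)   # true labels (synthetic)
y_partial = y.copy()
y_partial[20:] = -1               # pretend 80% of the labels are missing

model = LabelPropagation()
model.fit(X, y_partial)           # uses both labeled and unlabeled points
print(model.transduction_[:5])    # labels inferred for all examples
```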
So you can see SSL as sitting at the intersection of supervised and unsupervised learning. Actually, things get even more blurred in modern unsupervised deep learning methods, which tend to mix approaches from both SSL and UL: for example, an autoencoder (AE) that also has a density-estimation head, or embeddings that are first learned by SSL and then fine-tuned for clustering in an unsupervised manner.
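A minimal sketch of that last combination (the encoder here is an untrained stand-in; in practice it would come out of an SSL pre-training loop like the one above): SSL embeddings are frozen and then clustered with K-means.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Stand-in for an encoder already pre-trained with an SSL objective.
encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())

x = torch.randn(200, 32)          # unlabeled data
with torch.no_grad():
    z = encoder(x).numpy()        # frozen SSL embeddings

# Unsupervised step on top of the self-supervised representation.
clusters = KMeans(n_clusters=5, n_init=10).fit_predict(z)
print(clusters[:10])
```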
An unusual example is unsupervised reinforcement learning, in which you usually maximize an entropy objective (e.g. over visited states) as a pre-training step to encourage exploration.
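As a very rough sketch of the entropy objective (purely illustrative; real state-entropy methods work with continuous states and non-parametric entropy estimators): compute the entropy of the empirical state-visitation distribution, which an exploratory pre-training policy would try to maximize.

```python
import numpy as np

# Entropy of the empirical state-visitation distribution over a
# discrete state space; maximizing it rewards broad exploration.
def visitation_entropy(visited_states, n_states):
    counts = np.bincount(visited_states, minlength=n_states)
    p = counts / counts.sum()
    p = p[p > 0]                  # avoid log(0)
    return -np.sum(p * np.log(p))

# Toy comparison: a narrow policy vs. an exploratory one (synthetic data).
narrow = np.zeros(1000, dtype=int)           # always visits state 0
broad = np.random.randint(0, 10, size=1000)  # visits states roughly uniformly
print(visitation_entropy(narrow, 10))        # ~0.0
print(visitation_entropy(broad, 10))         # ~log(10) ≈ 2.30
```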
What is an example of unsupervised learning that is definitely not self-supervised learning?
Density estimation, dimensionality reduction (e.g. PCA, t-SNE) and clustering (e.g. K-means), at least seen from a classical ML perspective, are completely unsupervised: PCA, for instance, just tries to preserve variance. Indeed, in DL things tend to blur: e.g. you can use a (variational) autoencoder for dimensionality reduction too.
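A quick sketch of the "PCA just preserves variance" point (synthetic data, illustrative shapes): no labels appear anywhere, and the fitted object reports how much variance each component retains.

```python
import numpy as np
from sklearn.decomposition import PCA

# Purely unsupervised dimensionality reduction: no labels, no self-labels.
rng = np.random.RandomState(0)
X = rng.randn(500, 2) @ np.array([[3.0, 0.0], [0.0, 0.5]])  # stretched cloud

pca = PCA(n_components=1).fit(X)
X_reduced = pca.transform(X)
print(pca.explained_variance_ratio_)  # fraction of variance preserved
```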
https://mljourney.com/self-supervised-learning-vs-unsupervised-learning/
Key Differences: Self-Supervised Learning vs Unsupervised Learning
| Feature | Self-Supervised Learning | Unsupervised Learning |
|---|---|---|
| Labeling Process | Creates labels from raw data | No labels used at all |
| Goal | Learns meaningful representations for downstream tasks | Finds patterns and structures in data |
| Common Use Cases | NLP, Computer Vision, Pretrained Models | Clustering, Dimensionality Reduction, Anomaly Detection |
| Requires Pretraining? | Yes, followed by fine-tuning | No pretraining required |
| Scalability | Requires more computation for self-labeling | More scalable for large datasets |
| Examples | BERT, GPT, SimCLR | K-Means, PCA, DBSCAN |
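Since the table lists SimCLR: here is a compact sketch of its NT-Xent contrastive loss (batch size, embedding size and temperature are illustrative; this is not the reference implementation), which shows how the "labels" are just indices of which pairs are augmented views of the same input.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss for two augmented views z1, z2 of the
    same batch. Each example's positive pair is its other view, so the
    targets come from the data itself with no human annotation."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2n, d) unit vectors
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    # Example i's positive sits at index i + n (and vice versa).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Two augmented "views" of the same batch (synthetic embeddings).
z1 = torch.randn(8, 16)
z2 = z1 + 0.1 * torch.randn(8, 16)  # a second view, close to the first
print(nt_xent(z1, z2).item())
```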