By Philipp Zimmermann in Artificial Intelligence — Dec 30, 2023

Unsupervised Machine Learning

AI-generated image by DALL·E

In today's post, we will focus on a specific learning method called "unsupervised" machine learning. It is one of four commonly distinguished learning methods; the other three are supervised learning, semi-supervised learning, and reinforcement learning.

In simple words...

Unsupervised machine learning can be likened to a fun game of sorting colorful candies without any labels or instructions. Imagine you have a big bag filled with candies of various shapes and colors, but you have no idea how many different types there are or which ones are similar. Your task is to group these candies based on their similarities, all without anyone telling you what each group should contain.

You begin by examining the candies one by one and soon notice that some of them share similar colors or shapes, so you put those candies together in separate piles. As you continue, you realize that some piles are getting bigger because more candies with similar traits keep popping up.

However, you don't have names or categories for these piles yet; you're just grouping the candies based on visual similarities. This process of organizing candies into piles without prior knowledge or guidance captures the core idea of unsupervised machine learning.

As you keep sorting and grouping, you start to see distinct patterns emerge. Some piles contain candies with similar colors, some with similar shapes, and others might have a mix of both. The interesting part is that you didn't need any predefined labels or examples; you discovered these patterns all on your own.

In unsupervised machine learning, algorithms work in a similar way. They analyze data without any predefined labels or categories and instead group similar data points together based on shared characteristics or patterns. As the algorithm refines these groupings, it identifies clusters that can uncover hidden insights within the data, such as groups of customers with similar shopping habits or trends in unstructured data like text documents.
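To make the candy analogy concrete, here is a minimal sketch of k-means, one widely used clustering algorithm. The article does not name a specific algorithm, so this choice is an assumption. Each candy is described by two hypothetical features, redness and roundness, scaled to 0..1; the `kmeans` helper and the data are illustrative only and written in plain Python rather than with a library such as scikit-learn. The algorithm alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points, much like piles forming around a "typical" candy.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group points into k clusters with Lloyd's k-means (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialize the centroids with k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Converged: the centroids no longer move.
        centroids = new_centroids
    return labels, centroids


# Hypothetical candies described by (redness, roundness), both scaled to 0..1.
candies = [
    (0.90, 0.10), (0.85, 0.15), (0.95, 0.20),  # red, elongated candies
    (0.10, 0.90), (0.15, 0.85), (0.20, 0.95),  # non-red, round candies
]
labels, centroids = kmeans(candies, k=2)
print(labels)     # which pile each candy landed in
print(centroids)  # the "typical" candy of each pile
```

With these well-separated groups, the first three candies share one label and the last three share the other. Which pile is called 0 and which is called 1 depends on the random initialization, since no labels exist to name them.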

Pros

Discover Hidden Patterns
Unsupervised learning can reveal hidden patterns, structures, or relationships within data that may not be apparent through manual inspection.
No Labeling Required
Unlike supervised learning, unsupervised learning doesn't require labeled data, making it applicable to a wider range of problems where obtaining labeled examples can be challenging or expensive.
Scalability
Many unsupervised learning algorithms, such as k-means clustering, scale well to large datasets and can process vast amounts of information efficiently.
Exploratory Analysis
It serves as a valuable tool for exploratory data analysis, allowing data scientists to gain insights and a better understanding of the data before formal modeling.
Anomaly Detection
Unsupervised learning can identify anomalies or outliers within data, which is critical for fraud detection, quality control, and cybersecurity.
Clustering
Unsupervised learning algorithms can group similar data points into clusters, helping in customer segmentation, recommendation systems, and image segmentation.
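As an illustration of the anomaly-detection point, here is a sketch of one simple unsupervised technique: the modified z-score built on the median absolute deviation (MAD). This is a swapped-in example, not a method from the article. The transaction amounts are hypothetical, the `mad_outliers` helper is illustrative, and the 3.5 cutoff is a commonly used convention rather than a universal rule.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # More than half the values are identical; no spread to measure.
    outliers = []
    for i, v in enumerate(values):
        # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
        score = 0.6745 * (v - med) / mad
        if abs(score) > threshold:
            outliers.append(i)
    return outliers


# Hypothetical daily card-transaction amounts; none of them is labeled as fraud.
amounts = [42.0, 38.5, 45.0, 40.2, 39.9, 41.7, 980.0, 43.3, 37.8]
print(mad_outliers(amounts))  # → [6]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value would otherwise inflate the spread estimate and partly hide itself.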

Cons

Interpretability
The clusters or patterns discovered may not always have clear, human-interpretable meanings, making it challenging to extract actionable insights.
No Ground Truth
Since there are no labeled examples to compare against, it can be challenging to assess the accuracy of unsupervised learning models objectively.
Subjectivity
The choice of algorithms and hyperparameters can be subjective, leading to variations in results depending on the practitioner's decisions.
Quality of Clustering
The quality of clustering results can vary with the chosen algorithm and its initial conditions (for example, the randomly chosen starting centroids in k-means), so the results require careful evaluation and validation.
Computational Complexity
Some unsupervised learning algorithms can be computationally intensive and time-consuming, especially for large datasets.
Curse of Dimensionality
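Unsupervised learning can struggle with high-dimensional data: as the number of dimensions grows, the data space becomes increasingly sparse and the distances between points become more similar to one another, making it harder to find meaningful patterns.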
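The curse of dimensionality can be observed with a small experiment. The sketch below is an illustration, not from the article, and assumes uniformly random data in the unit hypercube. For one query point, it divides the distance to the farthest neighbor by the distance to the nearest one. As the number of dimensions grows, this ratio tends toward 1, meaning every point looks roughly equally far away, which weakens the distance-based grouping that clustering relies on. The `distance_contrast` helper is hypothetical.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of farthest to nearest neighbor distance from one random query point.

    Points are drawn uniformly from the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(query, p) for p in points]
    return max(dists) / min(dists)


for dim in (2, 10, 100, 1000):
    # A ratio close to 1 means "near" and "far" neighbors are barely distinguishable.
    print(dim, round(distance_contrast(dim), 2))
```

The exact numbers depend on the random seed, but the ratio drops sharply between 2 and 1000 dimensions.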

Thank you for reading this article. I hope you enjoyed it. If you have any questions about this topic, feel free to drop a comment below. If you want to continue your learning journey with more machine learning basics, have a look at the following page, where I keep all my AI articles organized.

Citation

If you found this article helpful and would like to cite it, you can use the following BibTeX entry.

@misc{
	hacking_and_security, 
	title={Unsupervised Machine Learning}, 
	url={https://hacking-and-security.cc/unsupervised-machine-learning}, 
	author={Zimmermann, Philipp},
	year={2023}, 
	month={Dec}
}