So far, we’ve focused on Supervised Learning. Now, we enter the world of Unsupervised Learning, where the machine is given data without any labels and must find patterns on its own.
What is Unsupervised Learning?
Unlike supervised learning, there is no “teacher” or “answer key.” The algorithm looks at the input data and tries to find structure, such as grouping similar items together.
Two of the main types are:
1. Clustering: Grouping similar data points.
2. Association: Finding rules that describe your data (e.g., people who buy X also buy Y).
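To make the "people who buy X also buy Y" idea concrete, here is a minimal sketch that measures how often Y shows up in baskets that contain X. This ratio is usually called the confidence of the rule X → Y. The function name `confidence` and the sample baskets are made up for illustration.

```python
def confidence(baskets, x, y):
    """Fraction of baskets containing x that also contain y."""
    with_x = [basket for basket in baskets if x in basket]
    if not with_x:
        return 0.0  # x never appears, so the rule has no support
    return sum(1 for basket in with_x if y in basket) / len(with_x)


# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

# Bread appears in 3 baskets, and 2 of those also contain butter.
print(confidence(baskets, "bread", "butter"))
```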
Clustering with K-Means 🎲
One of the most popular clustering algorithms is K-Means. It aims to partition the data into K clusters, where K is a number you choose in advance.
How it Works:
- Initialize: Randomly place K points called Centroids.
- Assign: Assign each data point to the nearest centroid.
- Update: Calculate the center (mean) of all points in each cluster and move the centroid to that new center.
- Repeat: Keep assigning and updating until the centroids stop moving.
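The four steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation. It assumes points are tuples of numbers, and it picks K of the data points as the starting centroids (one simple way to do the random initialization). The helper names `squared_distance` and `mean_point` and the `max_iters` safety limit are also assumptions of this sketch.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Initialize: use k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iters):
        # Assign: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update: move each centroid to the mean of its cluster.
        # An empty cluster keeps its old centroid.
        new_centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Repeat: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

Because the starting centroids are random, different seeds can give different results on harder datasets. Running the algorithm a few times and keeping the best run is a common remedy.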
Measuring Success: SSE
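How do we know if our clusters are good? We use the Sum of Squared Errors (SSE): for every point, take the squared distance to its assigned centroid, then add all of those values up. A lower SSE means the clusters are tighter. Keep in mind that SSE tends to shrink as K grows, so it is most useful for comparing results that use the same K.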
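Here is a small sketch of the SSE calculation, assuming clusters are lists of point tuples and each cluster has a matching centroid at the same index. The function names and the sample numbers are made up for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(clusters, centroids):
    """Sum of squared distances from each point to its cluster's centroid."""
    return sum(
        squared_distance(point, centroid)
        for cluster, centroid in zip(clusters, centroids)
        for point in cluster
    )


clusters = [[(1, 1), (3, 1)], [(8, 8), (8, 10)]]
centroids = [(2, 1), (8, 9)]

# Every point sits exactly 1 unit from its centroid, so SSE = 1 + 1 + 1 + 1.
print(sse(clusters, centroids))  # prints 4
```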
Bisecting K-Means
Bisecting K-Means is a variant that is often less sensitive to where the initial centroids land. It starts with all the data in one giant cluster and repeatedly splits the cluster with the highest SSE into two (using ordinary K-Means with K = 2), until the desired number of clusters (K) is reached.
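The splitting loop can be sketched as follows. This is a simplified illustration under a few assumptions: points are distinct tuples of numbers, each split is a single run of 2-means started from two randomly chosen points of the cluster, and the helper names (`two_means`, `cluster_sse`) are invented for this example.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def cluster_sse(points):
    """SSE of one cluster, measured against its own mean."""
    center = mean_point(points)
    return sum(squared_distance(p, center) for p in points)


def two_means(points, rng, max_iters=100):
    """Split one cluster into two halves with K-Means (K = 2)."""
    centroids = rng.sample(points, 2)
    halves = [[], []]
    for _ in range(max_iters):
        halves = [[], []]
        for p in points:
            d0 = squared_distance(p, centroids[0])
            d1 = squared_distance(p, centroids[1])
            halves[0 if d0 <= d1 else 1].append(p)
        new_centroids = [
            mean_point(half) if half else centroids[i]
            for i, half in enumerate(halves)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return halves


def bisecting_k_means(points, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]  # start with one giant cluster
    while len(clusters) < k:
        # Only clusters with at least two points can be split.
        splittable = [c for c in clusters if len(c) >= 2]
        if not splittable:
            break
        # Split the cluster with the highest SSE into two.
        worst = max(splittable, key=cluster_sse)
        clusters.remove(worst)
        clusters.extend(half for half in two_means(worst, rng) if half)
    return clusters


points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 0), (101, 0)]
print(bisecting_k_means(points, k=3))
```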
Summary
Unsupervised learning is powerful for:
- Customer Segmentation: Grouping customers by buying habits.
- Anomaly Detection: Finding data points that sit far from every cluster center (potential fraud).
- Data Compression: Representing a large dataset with a few cluster centers.
Exercise: Look for a visualization of the K-Means algorithm online. Seeing the centroids “dance” into position makes the logic much clearer!