Machine Learning: Supervised or Unsupervised

More and more enterprises these days are investing in machine learning techniques to understand their data better and make better decisions for their clients. The advent of big data technologies has only helped this cause, since processing and understanding data has become faster and cheaper. In this blog, we would like to help you decide whether machine learning fits your problem and, if so, which approach to apply.

One of the most common uses of machine learning is to automatically build a relationship between an entity and a group, or label.
For example: identifying whether an insurance claim is fraudulent or not, based on the claim details. Here the "claim" is the entity and the groups are fraud and not-fraud.

For the algorithm to do this, it needs experience with data, so that it can learn the patterns present and make a calculated decision. The process of providing this experience is called training. Generally, the more (and the more representative) training data a system is given, the more accurate its results tend to be.
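To make "training" concrete, here is a minimal Python sketch that learns a decision rule from labeled insurance claims. It assumes, purely for illustration, that each claim is described by a single number (its amount) and that the labeled amounts below are invented; a real fraud model would use many claim details. Training here simply means choosing the cut-off amount that misclassifies the fewest training examples.

```python
# Hypothetical training data: (claim amount in dollars, label).
# Amounts and labels are invented purely for illustration.
training_claims = [
    (200, "not-fraud"),
    (450, "not-fraud"),
    (800, "not-fraud"),
    (1200, "not-fraud"),
    (5000, "fraud"),
    (7000, "fraud"),
    (9500, "fraud"),
]


def train_threshold(examples):
    """Learn a cut-off amount: claims above it are predicted as fraud.

    Tries a threshold midway between each pair of neighbouring amounts
    and keeps the one that misclassifies the fewest training examples.
    """
    amounts = sorted(amount for amount, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(amounts, amounts[1:])]

    def errors(threshold):
        # Count examples where the prediction disagrees with the label.
        return sum(
            (amount > threshold) != (label == "fraud")
            for amount, label in examples
        )

    return min(candidates, key=errors)


def predict(threshold, amount):
    return "fraud" if amount > threshold else "not-fraud"


threshold = train_threshold(training_claims)
print(threshold)                  # 3100.0
print(predict(threshold, 6000))   # fraud
print(predict(threshold, 300))    # not-fraud
```

This one-split rule is essentially a decision stump (a one-level decision tree). Every learned parameter comes from the labeled examples, which is why the quality and quantity of training data matter.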

The groups into which an entity has to be placed may or may not be known in advance. This distinction brings us to the two main types of learning algorithms: supervised and unsupervised.

Supervised Learning
In this technique the groups are known, and the experience provided to the algorithm is a set of actual entities, each paired with the group it belongs to. It is called supervised because the machine is told, for a large number of examples, which entity belongs to which group, and is then expected to make this prediction on its own.
The claims example above is an example of supervised learning. Below are a few more examples –
– Identify whether a news article is about sports or politics.
– Classify an animal into one of the predefined classes, such as mammal or bird.
– Classify a person as male or female based on the products they have bought.
There are many open datasets available for trying out supervised learning.

Algorithms
Below are some of the most widely used supervised learning algorithms –
– Naïve Bayes
– Support Vector Machines
– Random Forests
– Decision Trees
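As an illustration of one algorithm from this list, the sketch below implements a small multinomial Naïve Bayes classifier in plain Python for the sports-vs-politics news example. The six training "articles" are invented, words are split on whitespace only, and add-one (Laplace) smoothing is a common choice assumed here; a real system would use a proper tokenizer and far more data.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical training set: (article text, label). Invented for illustration.
training_articles = [
    ("the team won the match in the final minute", "sports"),
    ("the striker scored two goals in the match", "sports"),
    ("the coach praised the team after the game", "sports"),
    ("parliament passed the new budget bill", "politics"),
    ("the minister announced a new election date", "politics"),
    ("voters head to the polls in the election", "politics"),
]


def train_naive_bayes(examples):
    """Count how often each label occurs and how often each word occurs per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        words = text.split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability score for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # log P(label): the prior, estimated from label frequencies.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # log P(word | label) with add-one smoothing, so a word never
            # seen with this label does not make the probability zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(training_articles)
print(classify(model, "the team scored in the match"))  # sports
print(classify(model, "the election bill passed"))      # politics
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on longer texts; the "naïve" part is the assumption that words are independent given the label.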

Unsupervised Learning
This technique is used when the groups (categories) in the data are not known in advance. It is called unsupervised because it is left to the learning algorithm to discover patterns in the data provided. Clustering is an example of unsupervised learning, in which data items are grouped into clusters of closely related items.
Some use cases of unsupervised learning are as follows –
– Given a set of news reports, cluster related news items together (used by news.google.com).
– Given a set of users and their movie preferences, cluster users who have similar tastes.
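To show what clustering related news items together can look like, here is a minimal Python sketch of agglomerative (hierarchical) clustering with single linkage. The headlines are hypothetical, similarity is measured with Jaccard distance over each headline's set of words, and stopping once a chosen number of clusters remains is an assumption made for this example. It is not meant to describe the approach used by news.google.com.

```python
def jaccard_distance(a, b):
    """1 - (size of intersection / size of union) for two sets of words."""
    return 1 - len(a & b) / len(a | b)


def single_linkage(items, num_clusters):
    """Agglomerative clustering: start with one cluster per item and
    repeatedly merge the two closest clusters. With single linkage, the
    distance between two clusters is the smallest distance between any
    pair of their members."""
    word_sets = [set(text.lower().split()) for text in items]
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > num_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    jaccard_distance(word_sets[x], word_sets[y])
                    for x in clusters[i]
                    for y in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [[items[x] for x in cluster] for cluster in clusters]


# Hypothetical headlines, invented for illustration.
headlines = [
    "storm hits coast",
    "storm damage on coast",
    "coast storm warning issued",
    "team wins cup final",
    "cup final goes to extra time",
]

clusters = single_linkage(headlines, num_clusters=2)
for cluster in clusters:
    print(cluster)
# ['storm hits coast', 'storm damage on coast', 'coast storm warning issued']
# ['team wins cup final', 'cup final goes to extra time']
```

Full hierarchical clustering records every merge (a dendrogram) and lets you "cut" it at any level; stopping at a fixed number of clusters is just one such cut.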

Algorithms
Below are some of the most widely used unsupervised learning algorithms –
– K-Means
– Fuzzy clustering
– Hierarchical clustering
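K-Means alternates between two steps: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it, repeating until the assignments stop changing. The sketch below applies it to the "users with similar taste" use case. The users and their ratings of three movies are invented, and the centroids are seeded with a simple deterministic farthest-point rule, an assumption made here so the result is reproducible; implementations commonly use random or k-means++ seeding instead.

```python
import math

# Hypothetical ratings (1-5) for three movies: (action, romance, comedy).
# Users and ratings are invented for illustration.
ratings = {
    "alice": (5, 1, 2),
    "bob": (4, 1, 1),
    "carol": (1, 5, 4),
    "dave": (2, 4, 5),
    "erin": (5, 2, 1),
    "frank": (1, 5, 5),
}


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain K-Means: returns a cluster index for each point."""
    centroids = farthest_point_init(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: nothing moved
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment


users = list(ratings)
labels = kmeans([ratings[u] for u in users], k=2)
groups = {}
for user, label in zip(users, labels):
    groups.setdefault(label, []).append(user)
print(groups)  # {0: ['alice', 'bob', 'erin'], 1: ['carol', 'dave', 'frank']}
```

The starting centroids matter: K-Means only converges to a local optimum, so a poor start (for example, both starting centroids drawn from the same taste group) can settle on a worse grouping. This is why libraries typically use careful seeding and several restarts.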


3 Comments

  1. Samarth Bhargav says:

    Some other dataset links that I’ve bookmarked:
    The UCI ML Lib: archive.ics.uci.edu/ml/
    Kaggle: Kaggle.com/competitions
    KD Nuggets: http://www.kdnuggets.com/datasets/index.html
    This is so far the most comprehensive: http://kevinchai.net/datasets

    Can you also do an article on semi-supervised learning? I haven’t understood what that is.

    1. Jaskaran singh says:

      Thanks Samarth for the additional dataset links. Will try to do one on semi-supervised learning.

  2. Manish Carpenter says:

    Useful information.
    Nice explanation.
    Really helpful article. Thanks for sharing.

    In some cases a man buys products for his wife. Would supervised learning then classify him as female?
