When you search on a website, you are often given several ways to sort the results. If you are looking to buy something, you can sort by price; if you are searching for a job, you can sort by location, distance from a zip code, or salary. Most of these searches also offer another option: sort by relevance. It is the default sort order on most websites. But have you ever wondered what sort by relevance means? What is the relevance that search engines are talking about, and how do they determine the relevance criteria?
Some time back, a client engaged us to extract news and articles related to their business that could be posted on their website to boost the rate of returning customers and improve the site’s stickiness. Our team started developing a data mining algorithm that relied on our optimization software and techniques to pick up fresh news from all over the world. We were able to pull together a good news feed in a couple of days and went back to the client with the results of our initial prototype.
Although the client liked what they saw, they were not sure these were the best news and articles for their business. The articles were fresh all right, but not that ‘relevant’. We came out of the first meeting with the challenge of making the content more relevant to the client’s business. We went back to the drawing board to figure out what relevance meant in the context of this client’s business. After spending some time on the corpus data, we realized that the articles themselves could not tell us how relevant they were. We needed to look at what was happening around the articles to figure out how ‘relevant’ they were.
So, what makes content relevant? Here is what we learnt during this search for relevance:
Finding #1 – It’s not in the data!
Surprise, surprise! The data does not tell you how relevant it is. Yet the data is what most search engines look at to decide how relevant a job posting or a product is to your search term. That approach is based on counting how often the search keywords occur in the content. If you truly want to measure relevance, you need to look at what people are doing with the data rather than at the data itself.
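To make the keyword approach concrete, here is a minimal sketch of occurrence counting. The function name `keyword_score` and the sample articles are made up for illustration; real site search engines use more elaborate scoring, but the limitation is the same: the score sees only the text.

```python
import re
from collections import Counter


def keyword_score(text, query):
    """Count how many times the words of the query occur in the text."""
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    terms = re.findall(r"[a-z0-9]+", query.lower())
    return sum(words[term] for term in terms)


# Hypothetical sample articles, keyed by an arbitrary id.
articles = {
    "a": "Search engines rank search results by keyword matches.",
    "b": "New trends in search.",
}

# Sort by keyword score, highest first.
ranked = sorted(
    articles,
    key=lambda article_id: keyword_score(articles[article_id], "search engines"),
    reverse=True,
)
```

Nothing in this score reflects whether anyone read, shared, or dismissed either article.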
Finding #2 – Figure out your relevance signals
When people read an article and find it relevant, they share it with others, tweet about it, post it on Facebook, and so on. Looking at how many times an article has been shared can give you an insight into how relevant it is. So, what people do with an article is the first signal of relevance. Secondly, the profile of the person acting on the article also speaks volumes about its relevance. For instance, if the head of a leading search company shares an article about new trends in search engines, that share says a lot more about the article’s relevance than the same article being shared by you or me. So, it becomes very important to figure out who the important influencers for a business are in order to monitor relevant content.
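The two signals described here, how often an article is shared and who shares it, could be collected along these lines. This is only a sketch: the `Share` record, the `INFLUENCE` table, and its values are hypothetical, not the data model we actually used.

```python
from dataclasses import dataclass


@dataclass
class Share:
    article_id: str
    sharer: str


# Hypothetical influence weights; anyone not listed counts as 1.0.
INFLUENCE = {"search_exec": 10.0}
DEFAULT_INFLUENCE = 1.0


def share_signals(shares, article_id):
    """Return (share count, influence-weighted share total) for one article."""
    matching = [s for s in shares if s.article_id == article_id]
    count = len(matching)
    weighted = sum(INFLUENCE.get(s.sharer, DEFAULT_INFLUENCE) for s in matching)
    return count, weighted


shares = [
    Share("x", "search_exec"),
    Share("x", "reader1"),
    Share("y", "reader2"),
]
count, weighted = share_signals(shares, "x")
```

With these made-up weights, one share by the influencer counts for as much as ten shares by ordinary readers.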
Finding #3 – Weigh your signals
Not all the signals you monitor carry the same importance. To measure the relevance of a piece of content accurately, you need a good weighted matrix of signals. The weighted signals can then be combined into an overall relevance score that is used to pick out the most relevant content.
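One simple way to combine weighted signals is a weighted sum. The sketch below assumes each signal has already been normalized to the range 0 to 1; the signal names and the weights in `WEIGHTS` are illustrative assumptions, and the right values depend on the business.

```python
# Hypothetical weights; they need not sum to 1, but doing so keeps the
# score in the same 0-to-1 range as the normalized signals.
WEIGHTS = {"shares": 0.2, "tweets": 0.3, "influencer_shares": 0.5}


def relevance_score(signals, weights=WEIGHTS):
    """Weighted sum of normalized signals; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


# Two hypothetical articles with normalized signal values.
candidates = {
    "widely_shared": {"shares": 1.0, "tweets": 1.0, "influencer_shares": 0.0},
    "influencer_pick": {"shares": 0.1, "tweets": 0.2, "influencer_shares": 1.0},
}

ranked = sorted(candidates, key=lambda k: relevance_score(candidates[k]), reverse=True)
```

Here the article picked up by influencers outranks the one that was merely shared a lot, because the influencer signal carries the largest weight.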
Finding #4 – Weed out the anti-relevant
How often an article has been shared, and by whom, gives you a good idea of its relevance, but you may still end up with cases where an influencer has shared an article saying ‘this is a joke’ or ‘this guy does not know what he is talking about’. To tackle such cases, you need to analyze the context in which an article has been shared. We used our sentiment analysis engine to determine the tone of each share and used that to decide whether the share makes the article anti-relevant.
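As an illustration of the idea, a share with a dismissive tone can be made to count against an article instead of for it. The phrase list below is a toy stand-in, not our sentiment analysis engine, and the function names are hypothetical.

```python
# Toy stand-in for a sentiment engine: a short list of dismissive phrases.
NEGATIVE_PHRASES = ("this is a joke", "does not know what")


def share_polarity(comment):
    """Return -1 for a dismissive share comment, +1 otherwise."""
    text = comment.lower()
    return -1 if any(phrase in text for phrase in NEGATIVE_PHRASES) else 1


def net_share_score(shares):
    """Sum influence weights, subtracting the weight of dismissive shares.

    `shares` is a list of (influence_weight, comment) pairs.
    """
    return sum(weight * share_polarity(comment) for weight, comment in shares)


shares = [
    (10.0, "This is a joke"),      # influencer mocking the article
    (1.0, "Great read on search"),  # ordinary reader recommending it
]
score = net_share_score(shares)
```

In this example the influencer's dismissive share outweighs the ordinary reader's recommendation, so the article ends up with a negative score.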
Depending on the context, different signals tell you how relevant an item is. This case was about articles, but if we were looking for relevant food items or doctors, the signals would change. The framework for determining relevance would remain the same.
Now, getting back to our original question: what is sort by relevance anyway? It is the means by which a search engine determines how relevant a given search result is to the keyword you searched for. Unfortunately, relevance today is still largely calculated from what is contained in the data. Web search engines have come a long way in finding better ways to calculate relevance, but local website search engines still have a lot of ground to cover.
One can only imagine how much business a website loses because its search engine is unable to help customers discover the right products. Because such search engines focus only on keywords and not on what is happening around the product, they miss out on a lot of discovery options.