Comparing Search Engines in Context of e-Commerce

For an e-commerce company, product search (or product discovery, as it is often called) is a critical factor in enabling purchase decisions. If customers cannot find the product they are looking for, they will quickly leave for another site. Getting product search right depends on many things, and one of the most important is that the underlying search engine can handle the nuances of product discovery on an e-commerce site. Here, I would like to talk about three search engines that are most commonly used on e-commerce sites.

  • Oracle Endeca – Commercial search engine from Oracle Corp., largely used in retail and e-commerce scenarios.
  • Apache SOLR – Open source search engine from Apache. It is built on top of Lucene and is known for its full-text search capabilities and rich document handling (Word, PDF, etc.).
  • Elastic Search – Also open source under the Apache License. It is known as a schema-free, REST- and JSON-based document store, and is highly scalable thanks to its strong distribution and sharding features.

Now, let's look at each of them in a bit more detail:

Oracle Endeca

Positive Aspects

  • Built for e-commerce requirements, with strong browse and search features.
  • Has a strong, business-user-friendly tool called Experience Manager, which gives business users complete control over the structure of a page and the factors that influence its content. One can define landing pages for every category and also control the relevancy ranking for each category.
  • Also known for its developer tools, which make it easy to configure search interfaces, thesaurus entries, auto-phrasing, keyword redirects, and relevancy ranking methods.
  • Tightly integrated with ATG Commerce Server and ATG BCC.

Relatively Challenging Stuff

  • Commercial licensing model, as defined by Oracle.
  • Scaling to very high volumes is a challenge, because the index cannot be distributed across nodes.
  • Sharding is not well supported. The entire index has to be copied instead.
  • Accessing the Endeca APIs from PHP- and Ruby-based clients is a challenge. They have to make a web service call and process large JSON objects.

Apache SOLR

Positive Aspects

  • Open source, and hence a low total cost of ownership.
  • Strong full-text search capability, and known for handling rich documents (PDF, Word, etc.).
  • Flexible and adaptable architecture. One can plug in custom search workflows, request workflows, scoring scripts, function queries, field types, etc.
  • Supports APIs that can return JSON, XML, or Java objects.
  • One can control how document scores are calculated, mostly through function queries and different boosting methods.
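
To make the last point a little more concrete, here is a minimal sketch of how a client might ask SOLR to boost results using the extended DisMax parser. The parameters `defType`, `qf`, `bf`, and `bq` are standard SOLR request parameters, but the core name `products`, the host, and the field names (`name`, `brand`, `description`, `popularity`, `in_stock`) are assumptions I made up for illustration.

```python
from urllib.parse import urlencode


def build_boosted_query(user_text,
                        handler_url="http://localhost:8983/solr/products/select"):
    """Build a select URL that favours popular, in-stock products.

    The URL, core name, and all field names are hypothetical.
    """
    params = {
        "q": user_text,
        "defType": "edismax",                 # extended DisMax query parser
        "qf": "name^3 brand^2 description",   # per-field weights for matching
        "bf": "log(sum(popularity,1))",       # additive boost from a function query
        "bq": "in_stock:true^5",              # boost query for in-stock items
        "wt": "json",                         # ask for a JSON response
    }
    return handler_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_boosted_query("red running shoes"))
```

The point is only that relevancy tuning in SOLR lives in request parameters and configuration like this, which is why it tends to sit with developers rather than business users.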

Relatively Challenging Stuff

  • Queries need to be built programmatically when going beyond the Lucene query syntax.
  • Has a flat document structure. Hierarchies cannot be defined directly and have to be modeled through a workaround.
  • Relies on a schema definition for each index.
  • Integration with ATG or any other commerce engine needs to be implemented.
  • Many changes depend on the IT team.
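
One common workaround for the flat document structure is to flatten a category hierarchy into depth-prefixed path tokens and store them in a multi-valued field. The sketch below shows the idea. The field name `category_path` and the sample product are assumptions for illustration, not something SOLR prescribes.

```python
def category_path_tokens(path):
    """Flatten a category path into depth-prefixed tokens.

    Each token encodes its depth and the full path down to that level,
    so a flat multi-valued field can still describe a hierarchy.
    """
    return ["/".join([str(depth), *path[:depth + 1]])
            for depth in range(len(path))]


def child_facet_prefix(parent_path):
    """Prefix that matches only the direct children of parent_path."""
    return "/".join([str(len(parent_path)), *parent_path]) + "/"


# Hypothetical product document, ready to be sent to the index.
doc = {
    "id": "sku-1001",
    "name": "Example smartphone",
    "category_path": category_path_tokens(
        ["Electronics", "Phones", "Smartphones"]),
}

if __name__ == "__main__":
    print(doc["category_path"])
    print(child_facet_prefix(["Electronics"]))
```

With tokens like these, faceting on `category_path` with the `facet.prefix` parameter set to the value from `child_facet_prefix` returns only the next level of the tree, which is usually what a category navigation needs.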

Elastic Search

Positive Aspects

  • Open Source and hence low total cost of ownership
  • Sharding and replication are straightforward, which helps it scale better.
  • Distributing nodes and clustering are also easy, which again helps with scaling.
  • Supports dynamic data structures, i.e. you can have multiple types of documents in a single index. This also helps with dynamic refresh of the index.
  • Querying is more structured, as queries are specified as JSON.
  • Nested indexing and nested queries are supported.
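
As a rough illustration of the sharding, JSON querying, and nested query points, here is a sketch of an index definition and a search body written as plain Python dictionaries. The settings `number_of_shards` and `number_of_replicas` and the `nested` query are real Elastic Search features, but the field names (`name`, `variants`, `color`, `price`) are assumptions, and the mapping layout follows recent releases (older releases nest the mapping under a type name).

```python
import json

# Index definition: shard/replica counts plus a nested field for variants.
index_body = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "variants": {
                "type": "nested",
                "properties": {
                    "color": {"type": "keyword"},
                    "price": {"type": "float"},
                },
            },
        }
    },
}


def variant_query(text, color, max_price):
    """Match products by name that have ONE variant satisfying both filters."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": text}}],
                "filter": [{
                    "nested": {
                        "path": "variants",
                        "query": {
                            "bool": {
                                "filter": [
                                    {"term": {"variants.color": color}},
                                    {"range": {"variants.price": {"lte": max_price}}},
                                ]
                            }
                        },
                    }
                }],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(variant_query("running shoes", "red", 80), indent=2))
```

The nested query matters for product data because it requires the colour and the price to match within the same variant, rather than a red variant and a cheap variant of the same product matching separately.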

Relatively Challenging Stuff

  • Spell correction is not available out of the box, so a “Did you mean this?” sort of feature is not readily available.
  • Does not support de-duplication of documents while indexing. You need to implement de-duping yourself.
  • There is little tooling for business users, so IT teams are required to implement changes. However, it is relatively easier for developers to work with than SOLR.
  • Integration with ATG needs to be implemented.
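
For the de-duplication point, one simple approach is to derive a deterministic ID from the fields that define "the same product" and use it as the document ID, so that indexing the same product again overwrites the earlier copy instead of adding a new one. The sketch below is one way to do that on the client side. The key fields (`brand`, `name`, `model`) are an assumption and would depend on your catalog.

```python
import hashlib


def content_id(product, key_fields=("brand", "name", "model")):
    """Stable ID built from normalized key fields (hypothetical choice)."""
    normalized = [str(product.get(field, "")).strip().lower()
                  for field in key_fields]
    return hashlib.sha1("|".join(normalized).encode("utf-8")).hexdigest()


def dedupe(products):
    """Return {id: product}, keeping the first product seen for each ID."""
    unique = {}
    for product in products:
        unique.setdefault(content_id(product), product)
    return unique


if __name__ == "__main__":
    feed = [
        {"brand": "Acme", "name": "Trail Shoe", "model": "TS-1"},
        {"brand": "acme ", "name": "trail shoe", "model": "ts-1"},  # same product
        {"brand": "Acme", "name": "Road Shoe", "model": "RS-2"},
    ]
    for doc_id, product in dedupe(feed).items():
        print(doc_id[:8], product["name"])
```

Each key of the returned dictionary can then be sent as the document ID when indexing.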

Conclusion

 It is a horses-for-courses sort of thing. In my view, if business teams need to control search relevancy and configuration, then Endeca is the best bet. If IT needs to control and be accountable for the relevancy rules, then SOLR and Elastic Search are a better bet. If you need to index lots of unstructured data from different sources, then Elastic Search is better.
