In our last blog post, we discussed why it is important to classify your project as a BI or an Unstructured Data project, with or without the Big Data flavor. That distinction will help you execute your project efficiently and with a high ROI. In this post, we will talk about a second subject that is very important yet often deferred: the role of the business in the success of an unstructured data analytics project.
Now, I am not saying that business teams are not involved in Unstructured Data Analytics projects or that they do not participate. They do. But whether you attribute it to all the hype around big data/unstructured data technologies or to the fact that most business folks have worked on structured, transactional projects, business people often have unrealistic expectations of Unstructured Data Analytics projects.
Most business sponsors and business analysts are used to measuring and validating numbers and facts: the number of orders placed on the website, the weekly user visits report, revenue by department, even the customer satisfaction index. Unstructured Data Analytics, on the other hand, is not a perfect science. It relies heavily on inference, experimentation and feedback-based fine-tuning. Unlike a transactional system, an Unstructured Data Analytics project is not going to deliver results from the very first day. It needs time (at Rare Mile, we refer to this as the system ‘thaw’ time) to orient itself to the data, learn and come to life.
So, before you take on an unstructured data project, it will serve you well to explain the following three differences between transactional systems and unstructured data analytics projects to your business stakeholders:
- A perfect system is not the goal of the UAT (User Acceptance Test).
- The project will have a learning curve in production.
- The solution will improve over time, but it will always have some imperfections.
UAT – With transactional systems, the business expects a system that is good enough for production: one that is free of major issues and does its work predictably and reliably. If you apply the same yardstick to Unstructured Data Analytics systems, you will never get out of UAT. This is primarily because such systems need to consume large volumes of data to learn new patterns and become effective.
A comprehensive UAT would require loads of data and time. So, as a principle, Unstructured Data Analytics systems are moved out of UAT when they give satisfactory results on a given set of data, run in a stable fashion and show that they can learn easily. Once the system holds up under load and new rules can be configured quickly, you are ready to take it to production. Do not wait for a near-perfect system; if you do, you will never get out of UAT.
One Unstructured Data Analytics product that we all use today is Google Search. Remember that it was not perfect when it launched, and it still isn’t perfect, but it is better than any other search system. It also did not have many analytical rules when it was rolled out. Over time, it has reached its current state by observing failed search queries and adding new rules and patterns to the base search algorithm. This is a good UAT blueprint for most Unstructured Data Analytics projects.
Production Learning Curve – Another big contrast you need to be prepared for in Unstructured Data Analytics projects is that the system will be ineffective in production for some time after launch. This follows from the previous point about UAT: in production, the system will encounter new data, new uses and new patterns. It needs a learning period during which its knowledge is enhanced and it starts producing satisfactory results. In our experience, this period lasts between six and twelve weeks, depending on the complexity of the data and the algorithms.
One trick for managing this period is to put the system in production but keep it dormant. It receives all the data and performs the analytics, but the results are not shown to users during the dormant phase. Once a sufficient number of new rules and patterns have been plugged into the system and it starts showing reasonable results, you can switch it on. For a fairly complex system, a success rate of 80-85% or more is a reasonable target for the analytics, but you will almost never hit 100%. Which brings us to our next point…
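The dormant phase described above is essentially a "shadow mode" gate. The sketch below is a minimal illustration, not Rare Mile's implementation: it assumes that some of the system's outputs are checked by a reviewer during the dormant phase, and it switches the system on once rolling accuracy over those reviews reaches a threshold. The class name `DormantGate` and its parameters (`threshold`, `min_reviewed`, `window`) are hypothetical; the 0.85 default simply mirrors the 80-85% figure above.

```python
from collections import deque


class DormantGate:
    """Hypothetical shadow-mode gate: results are computed but withheld
    until rolling accuracy on reviewed items reaches a threshold."""

    def __init__(self, threshold=0.85, min_reviewed=200, window=1000):
        self.threshold = threshold        # assumed switch-on accuracy target
        self.min_reviewed = min_reviewed  # assumed minimum reviews before trusting the rate
        self._outcomes = deque(maxlen=window)  # True/False per reviewed item, most recent only
        self.live = False                 # dormant until the gate opens

    def accuracy(self):
        # Share of reviewed items the system got right within the window.
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def record_review(self, predicted, expected):
        # Compare the system's output with a reviewer's answer (assumed available).
        self._outcomes.append(predicted == expected)
        if (not self.live
                and len(self._outcomes) >= self.min_reviewed
                and self.accuracy() >= self.threshold):
            self.live = True  # one-way switch: stays on once opened

    def publish(self, result):
        # During the dormant phase, results are computed but not shown to users.
        return result if self.live else None
```

In this sketch, the switch is one-way. In practice you would probably keep tracking accuracy after going live so that new, uncaught patterns still surface for rule tuning.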
Living with Imperfections – While transactional systems such as website ordering or insurance premium calculation are expected to operate flawlessly, you will almost never achieve perfection with Unstructured Data Analytics systems. Over time, they will come close to the 90-95% accuracy range, but there will still be an odd miss, an occasional error or a new pattern that has not been caught yet. Your business team needs to appreciate this and get on board with it. Even if a perfect unstructured data analytics system could hypothetically be built, the money and time it would take would make it unviable. On the other hand, a system that can analyze unstructured data with more than 85% accuracy can have a very positive impact on your business.
Understanding and communicating these nuances of an Unstructured Data Analytics project will help you get all the stakeholders on the same page. It will also go a long way toward ensuring that you get a high ROI for your project and that the technology and business teams are aligned on what these kinds of projects can and cannot do.