terms of justifying hard ROI on investing in Big Data projects. While this should ease as the field moves up the standard maturity continuum, a few ideas that can work well are: starting small (and not being too ambitious) to showcase quick results, and using innovative "pay for results" business models that appeal to the psyche of people on the edge of decision making around investment in Big Data projects.

Getting the right data and infrastructure architecture for performance and scalability
This is an obvious technical challenge which, although it should be easily addressable, becomes tremendously complex due to the many variables involved. Recent investments in legacy infrastructure severely restrict coming up with an ideal "future-ready" architecture, so the endeavor is to arrive at the most "optimal" architecture, one that allows for as much re-use of legacy infrastructure as possible. The Big Data tech stack has also been evolving far too fast, with technologies climbing the hype curve and then suddenly losing favor to a newer alternative (e.g. Storm vs. Spark). Until the stack stabilizes, the technology architecture roadmap should be crafted very thoughtfully, based on a detailed assessment of what is trending in the market and, more importantly, of the internal stack and future needs.

Turn-around time from Data acquisition to insights
One of the most common problems we've encountered is the high overall turn-around time from data acquisition through clean-up, modeling and deploying models at scale in production; in many cases it is long enough for the data to lose its relevance. A typical flow looks like:
· Data ingestion (from multiple sources)
· Data clean-up and transformations/enrichment
· Iterative model development (Data Science)
· Deploying models in production
In more enterprises than not, this is a sequential process, and some of the key bottlenecks are data clean-up, exporting/importing data in and out of statistical modeling tools like R, and eventually coding the models into production applications. Three to four weeks to get data ready for statistical analysis, and a similar time-frame for manual analysis and iterative model building by the data scientists (for well-formulated problems), is very typical. Again, more often than not, we've seen the models and recommendations from the data science team being hard-coded into applications, which takes another few weeks of development, QA and release management time; a sketch of the alternative, handing the model over as a deployable artifact, appears at the end of this piece. By that time, in many use cases, the data may not be very relevant anymore.

Data Quality
Data quality is one of the most under-estimated issues in Big Data implementations, especially with respect to the schedule and cost implications it can cause. Our approach to data quality issues has been a pragmatic one that builds realistic schedules and costs into the plan for handling them.

Data Governance and Security
Historically, data governance has followed what would be a "waterfall" approach in software development parlance, i.e. data was governed as it was discovered or brought into the enterprise, which would typically mean that enterprises would integrate data and govern it to the highest required standard. That is too slow for the "agile" world of Big Data. An "agile" governance would entail discovering, understanding and "profiling" the data, and applying appropriate controls without inhibiting speed and flexibility.
A comprehensive yet "agile" data governance mechanism would not only ensure that enterprises protect their own and their customers' information assets, but also allow the flexibility to deploy innovative Big Data approaches and technologies.
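To make "profile first, then govern" concrete, here is a minimal, illustrative sketch in Python (standard library only) that profiles each column of a newly discovered dataset for null rate, cardinality and likely sensitivity, and lets that profile drive the control applied. The file name customers.csv, the PII_HINTS naming convention and the control labels are hypothetical assumptions, not any specific governance product.

```python
# Minimal "profile first, then govern" sketch; all names here
# (PII_HINTS, customers.csv, control labels) are illustrative assumptions.
import csv
from collections import Counter

PII_HINTS = ("email", "phone", "ssn", "name", "address")  # assumed column-naming convention

def profile_column(name, values):
    """Build a lightweight profile used to decide which control to apply."""
    non_empty = [v for v in values if v.strip()]
    return {
        "column": name,
        "null_rate": 1 - len(non_empty) / max(len(values), 1),
        "distinct": len(set(non_empty)),
        "top_values": Counter(non_empty).most_common(3),
        "likely_pii": any(hint in name.lower() for hint in PII_HINTS),
    }

def profile_file(path):
    """Profile every column of a CSV file as it is discovered."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    columns = rows[0].keys() if rows else []
    return [profile_column(c, [r[c] for r in rows]) for c in columns]

if __name__ == "__main__":
    for p in profile_file("customers.csv"):  # hypothetical newly discovered dataset
        control = "mask/encrypt" if p["likely_pii"] else "standard retention"
        print(f"{p['column']}: null_rate={p['null_rate']:.0%}, "
              f"distinct={p['distinct']}, control={control}")
```

The point is not the specific checks but the sequencing: controls follow from a fast, automated profile of the data rather than from an up-front, exhaustive governance exercise.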
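Returning to the turn-around-time bottleneck above: one way teams shorten the last mile is to hand the fitted model itself to production as a serialized artifact instead of having developers re-code it. Below is a hedged sketch of that idea, assuming a Python workflow with scikit-learn and pickle; the churn features, labels and the file name churn_model.pkl are purely illustrative, not the actual setup described in this article.

```python
# Sketch: ship the fitted model as an artifact instead of re-coding it.
# scikit-learn and the file/feature names are illustrative assumptions.
import pickle
from sklearn.linear_model import LogisticRegression

# --- data science side: fit and export the model artifact -------------
X_train = [[35, 1], [22, 0], [58, 1], [41, 0]]   # e.g. [age, has_contract]
y_train = [1, 0, 1, 0]                           # e.g. churn label
model = LogisticRegression().fit(X_train, y_train)

with open("churn_model.pkl", "wb") as f:         # versioned hand-off artifact
    pickle.dump(model, f)

# --- production side: load and score, no re-implementation ------------
with open("churn_model.pkl", "rb") as f:
    deployed = pickle.load(f)

print(deployed.predict_proba([[47, 1]]))         # score a new customer
```

R-centric teams can make a similar hand-off with saveRDS or PMML exports; the underlying idea is that the hand-off becomes an artifact plus an interface contract rather than weeks of re-development, QA and release management.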