Solving Data Science Problems from First Principles

Dr. T. Ravindra Babu, Global Head, Data Science, Sahaj Ai

Dr. T. Ravindra Babu is Global Head, Data Science, Sahaj Ai. He previously served as Vice President and Head, Data Science, at Myntra Designs Pvt Ltd; Principal Data Scientist at Flipkart; Principal Data Scientist at Infosys Research; and Scientist at ISRO. He received his PhD and MSc (Engg) from the Department of Computer Science and Automation, Indian Institute of Science, Bangalore.

With massive amounts of data being collected every second of our daily lives, building models over such data is increasingly important, both to provide business advantage and to improve personal comfort. Consider an online store: when we visit it to make a purchase, its systems capture the time we spend on the home page, the items we view, the time we take to decide whether or not to purchase, the time we spend on the content the company shows, and so on. Other data collection touch points include the online content we view, the daily routes we travel and the songs we listen to. Such data is in turn used to improve the customer experience through relevant advertisements and personalized recommendations, while also providing commercial value to the businesses that serve those customers.

Vague Problem Statements
On most occasions, especially in the startup world, the problem statements that business teams pose to data scientists are vague. Some examples are: “the customer addresses are noisy, can you do something about it?”, “users are not getting converted (not making purchases online); how could data science models help?”, “the catalog content is manually curated; can machine learning algorithms automate this?”, “to whom should we extend the option to buy through installments?”, and “how can we detect and solve online fraud?”. At this stage, it is up to the maturity of the data scientist to convert such a statement into a formal mathematical problem, seek data that captures the relevant signals, and build models.
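As an illustration of what such a conversion might look like, and not as the author's own formulation, the conversion question could be posed as supervised binary classification. The symbols are assumptions introduced for this sketch: x_i is a feature vector for browsing session i (for example, time on the home page, items viewed, time taken to decide), y_i is 1 if the session ends in a purchase and 0 otherwise, f_θ is a scoring model with parameters θ, and σ is the logistic function. Fitting the model then means minimizing the average log loss over n labeled sessions:

```latex
\hat{\theta} = \arg\min_{\theta} \; -\frac{1}{n} \sum_{i=1}^{n}
  \Big[ y_i \log \sigma\big(f_\theta(\mathbf{x}_i)\big)
      + (1 - y_i) \log\big(1 - \sigma(f_\theta(\mathbf{x}_i))\big) \Big],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

Writing the problem this way makes the data requirement explicit: sessions with recorded features and a known purchase outcome.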
Solving from First Principles
A data scientist can approach these problems in multiple ways: checking whether a similar problem has been solved in the research literature, studying solutions from other industrial settings, or working from first principles. On most occasions, the problems are unique and need specific attention and solutions. In practice, one can usually only adapt abstract ideas from the literature rather than replicate published solutions. The first-principles approach means understanding the problem on the ground and its domain, and then finding a solution through formal machine learning principles.

Let us discuss one of these problems: noisy customer addresses. Starting from the broad direction provided by the vague business problem, a data scientist needs to understand the actual challenges on the ground, get a hands-on feel for the data, decide what kind of model could solve the problem, and work out how it would be deployed and measured through the entire modeling life cycle. This requires visits to mother hubs and delivery hubs to understand how shipments are received and distributed among delivery agents before being delivered to customers. Discussions with delivery agents and managers provide insights into the challenges. At this stage, in the parlance of machine learning, a decision is made on whether to use supervised or unsupervised learning, and on the challenges that must be surmounted to deploy the solution in the hubs. On most occasions one chooses supervised learning algorithms, since their performance can be measured directly against labeled outcomes. A model is built on these considerations and deployed.
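To make the point about measurability concrete, here is a minimal sketch in Python. It assumes, hypothetically, that the address problem is framed as predicting the delivery hub for a noisy address string, with labels taken from past deliveries. The toy addresses, the hub names and the functions (trigrams, cosine, predict_hub, accuracy) are illustrative inventions, not the author's model. A 1-nearest-neighbour classifier over character trigrams is used only because it is short; because the task is supervised, accuracy on a held-out labeled set gives a direct measure of quality.

```python
import math
from collections import Counter

# Hypothetical toy data: (noisy address, delivery hub label). Illustrative only.
TRAIN = [
    ("12 MG Road, Indiranagar, Bangalore", "HUB_EAST"),
    ("45 100 Feet Rd, Indiranagar", "HUB_EAST"),
    ("7 Jayanagar 4th Block, Bangalore", "HUB_SOUTH"),
    ("22 Jaya Nagar 9th Block", "HUB_SOUTH"),
]
HELD_OUT = [
    ("flat 3, indira nagar 100 ft road", "HUB_EAST"),
    ("jayanagr 4th blk near park", "HUB_SOUTH"),
]


def trigrams(text: str) -> Counter:
    """Count character trigrams of a lower-cased, whitespace-normalised address."""
    s = " " + " ".join(text.lower().split()) + " "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_hub(address: str, train: list[tuple[str, str]]) -> str:
    """1-nearest-neighbour: return the hub of the most similar labeled address."""
    query = trigrams(address)
    _, hub = max(train, key=lambda pair: cosine(query, trigrams(pair[0])))
    return hub


def accuracy(labeled: list[tuple[str, str]], train: list[tuple[str, str]]) -> float:
    """Fraction of labeled addresses whose predicted hub matches the true hub."""
    correct = sum(predict_hub(addr, train) == hub for addr, hub in labeled)
    return correct / len(labeled)


if __name__ == "__main__":
    print(f"held-out accuracy: {accuracy(HELD_OUT, TRAIN):.2f}")
```

Running the script prints the held-out accuracy. In a real setting the held-out set would be much larger and collected separately from the training addresses, so that the number reflects performance on unseen shipments.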

Summary
Building data science models requires domain understanding, insight into the data, and a solution, derived from first principles, that is apt for the given scenario. First-principles approaches are the most effective. While developing the solution, it is necessary to visualize the final operational scenario, the latency requirements, and the challenges in adoption by the relevant stakeholders.