It is typically asked by beginning data scientists, analysts and managers new to data science.
Their bosses are under pressure to show some ROI from all the money that has been spent on systems to collect, store and organize the data (not to mention the money being spent on data scientists).
Sometimes they are lucky — they may be asked to solve a very specific and well-studied problem (e.g., predict which customer is likely to cancel their mobile contract). In this situation, there are numerous ways to skin the cat and it is data science heaven.
But often they are simply asked to “mine the data and tell me something interesting”.
Where to start?
This is a difficult question and it doesn’t have a single, perfect answer. I am sure experienced practitioners have evolved many ways to do this. Here’s one way that I have found to be useful.
To borrow an analogy from Andy Grove’s High Output Management, complex systems are black boxes, and an insight is like a window cut into the side of the black box that “sheds light” on what’s going on inside.
So the search for insight can be thought of as the effort to understand how something complicated really works by analyzing its data.
But this is the sort of thing that scientists do! The world is unbelievably complex and they have a tried-and-tested playbook to gradually increase our understanding of it — the Scientific Method.
Before you explore the data, write down a short list of what you expect to see in the data: the distribution of key variables, the relationships between important pairs of them, and so on. Such a list is essentially a prediction based on your current understanding of the business.
Now analyze the data. Make plots, do summaries, whatever is needed to see if it matches your expectations.
Is there anything that doesn’t match? Anything that makes you go “That’s odd” or “That doesn’t make any sense”?
Zoom in and try to understand what in your business is making that weird thing show up in the data like that. This is the critical step.
You may have just found an insight into the business and increased your understanding*.
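The loop above (write down expectations, analyze, look for mismatches) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the names `summarize` and `find_surprises`, the two expectations, and the numbers are all hypothetical and invented for this sketch. They do not come from any real dataset.

```python
import statistics
from typing import Callable, Dict, List, Tuple

# An expectation is a plain-English description plus a check on the summary.
Expectation = Tuple[str, Callable[[Dict[str, float]], bool]]


def summarize(values: List[float]) -> Dict[str, float]:
    """Compute a few basic summary statistics for one numeric variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def find_surprises(values: List[float],
                   expectations: List[Expectation]) -> List[str]:
    """Return the descriptions of expectations that the data does not meet."""
    summary = summarize(values)
    return [desc for desc, check in expectations if not check(summary)]


# Step 1: write down what you expect BEFORE looking at the data.
# (Hypothetical expectations, chosen only for illustration.)
expectations: List[Expectation] = [
    ("all amounts are strictly positive", lambda s: s["min"] > 0),
    ("median is below the mean (right-skewed)", lambda s: s["median"] < s["mean"]),
]

# Step 2: analyze the data. These numbers are made up for the sketch.
amounts = [12.5, 30.0, 18.0, 0.0, 25.0, 0.0, 90.0]

# Step 3: list whatever does not match, then go and investigate it.
surprises = find_surprises(amounts, expectations)
for desc in surprises:
    print("That's odd:", desc)
```

The code does not find the insight for you. It only flags where reality and your written-down expectations disagree, which is where the zooming in should start.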
Here’s a real example. A few years ago, we were looking at transaction data from a large B2C retailer. One of the fields in the dataset was ‘transaction amount’.
Before “flipping the page”, pause for a few seconds to guess what sort of things you would expect to see. You may find that this increases the contrast and you are better able to spot interesting things in a sea of numbers.