Working as a market researcher with clients at all levels of sophistication has led me to build the following framework for how I think about levels of knowledge. It has three levels, each generated by its own process. The levels are data, information, and intelligence, and the processes are observation, analysis, and synthesis.
As I was writing this, I realized that this framework isn't original to me, or at least that it's a common way to parse this hierarchy, known as the DIKW Pyramid.
Anyway, here’s how I think about it:
Data are recorded observations from the real world. Examples are CSV files, folders of images, clickstreams, etc.
Information consists of summaries of data. Examples include scatterplots, crosstabs, regression coefficients, select count(*) queries, etc. You can think of it as the data compression step.
Intelligence is the human interpretation of information, usually with a goal in mind: “In order to maximize growth within a $75 CPA cap, we should target our advertising spend to these demographic cells in these ZIP codes”; “we should raise our prices 15%”; etc.
The processes of observation, analysis, and synthesis are simply the transitions into each level: from the real world to data, from data to information, and from information to intelligence.
Observation is the act of recording data, whether it’s a SQL transaction, a survey, a camera taking a picture, a webscraper, or something else. It can be observational or interventional (coming from an experimental design). The result of observation is data.
Analysis is the act of taking data and producing summaries of it. This could be something as simple as a sum or a mean, a visual summary like a chart, or something as complicated as a Gaussian process regression with a crazy link function.
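To make the data-to-information step concrete, here is a minimal sketch in Python. The survey rows, the field names, and the summarize function are all made up for illustration; the point is only that a handful of raw records gets compressed into a count, a mean, and a crosstab.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: each dict is one recorded observation.
responses = [
    {"region": "East", "plan": "basic", "spend": 20.0},
    {"region": "East", "plan": "premium", "spend": 55.0},
    {"region": "West", "plan": "basic", "spend": 25.0},
    {"region": "West", "plan": "premium", "spend": 60.0},
    {"region": "West", "plan": "premium", "spend": 65.0},
]


def summarize(rows):
    """Compress raw rows (data) into a few summaries (information)."""
    return {
        # Equivalent in spirit to a select count(*) query.
        "n": len(rows),
        # A single-number summary of the spend column.
        "mean_spend": mean(r["spend"] for r in rows),
        # A crosstab of region by plan, stored as counts per (region, plan) pair.
        "crosstab": Counter((r["region"], r["plan"]) for r in rows),
    }


info = summarize(responses)
print(info["n"], info["mean_spend"])
for (region, plan), count in sorted(info["crosstab"].items()):
    print(region, plan, count)
```

Nothing in the output says what to do about any of it, which is exactly where analysis stops and synthesis begins.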
Finally, synthesis is all the hours that one typically spends thinking about how to take action based on the information.
While the observation and analysis steps are usually well known to the analyst, the synthesis step is not, and there aren’t as many tools, or as much active research on such tools, to help analysts perform that step.
Additionally, while I’ve laid out the steps in sequence, that is not how they have to occur temporally. In fact, at Gradient, we try to perform “upfront synthesis” to lay out a decision-making model for our clients in advance of collecting data or performing analyses.
Finally, it’s often implicitly assumed that quantitative tools to perform synthesis don’t exist, or aren’t possible. But that isn’t true. Tools like decision trees, optimizers, simulators, etc., are quantitative techniques for performing synthesis of information. But there needs to be substantially more research in this area.
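As a rough illustration of quantitative synthesis, here is a small sketch of an optimizer in Python that turns information into a recommendation. It borrows the $75 CPA cap from the earlier example; the demographic cells, their estimated CPAs, the spend limits, the budget, and the allocate function are all assumptions invented for this sketch, and the cap is applied per cell for simplicity. It is not a description of how any real client model works.

```python
CPA_CAP = 75.0  # the cap from the example above

# Hypothetical information: estimated cost per acquisition (CPA) and the
# most we could usefully spend in each demographic cell.
cells = [
    {"cell": "cell A", "cpa": 50.0, "max_spend": 1000.0},
    {"cell": "cell B", "cpa": 80.0, "max_spend": 2000.0},
    {"cell": "cell C", "cpa": 60.0, "max_spend": 1500.0},
]


def allocate(cells, budget, cpa_cap):
    """Greedily fund the cheapest eligible cells until the budget runs out."""
    # Drop cells whose estimated CPA exceeds the cap, cheapest first.
    eligible = sorted(
        (c for c in cells if c["cpa"] <= cpa_cap),
        key=lambda c: c["cpa"],
    )
    plan = {}
    remaining = budget
    for c in eligible:
        spend = min(c["max_spend"], remaining)
        if spend <= 0:
            break
        plan[c["cell"]] = spend
        remaining -= spend
    return plan


plan = allocate(cells, budget=2000.0, cpa_cap=CPA_CAP)
for cell, spend in plan.items():
    print(cell, spend)
```

Because each cell's acquisitions are assumed to scale linearly with spend up to its limit, funding the cheapest cells first maximizes expected acquisitions for a fixed budget. Writing the rule down before any data arrives is also one way to do the kind of upfront synthesis described earlier: the decision model exists first, and the data and analysis only fill in its inputs.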