Decision Trees


Understanding how explanatory variables interact to explain a variable of interest

Decision Tree Analysis is an exploratory method which explains some variable of interest without using a strict regression model.

It falls halfway between Segmentation and Key Drivers Analysis.

It is designed primarily for situations where:

  • The variable of interest is a two-category (Yes / No) or multi-category (choose one option from 3+) variable.
  • Most of the potential predictors are categorical, rather than numeric scales.
  • We are interested in understanding the interactions between predictors, rather than their overall effects.
  • A simple model based on observed data is required, rather than a regression model.
  • We want to identify small groups in the population that contain a high proportion of cases belonging to some category of interest.
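To illustrate the last two points, here is a minimal sketch in Python using made-up counts. The predictors (age band, region), the outcome (buys / does not buy) and the helper names `marginal_rate` and `summarise` are all assumptions for illustration, not part of any particular package. It compares the buying rate for each predictor on its own with the rate inside each combination of predictors.

```python
# Hypothetical survey counts: (age band, region) -> (buyers, non-buyers).
cells = {
    ("under 35", "urban"): (45, 15),
    ("under 35", "rural"): (10, 90),
    ("35+", "urban"): (30, 170),
    ("35+", "rural"): (15, 125),
}


def marginal_rate(cells, position, level):
    """Buying rate for one level of one predictor, ignoring the other predictor."""
    yes = sum(y for combo, (y, n) in cells.items() if combo[position] == level)
    size = sum(y + n for combo, (y, n) in cells.items() if combo[position] == level)
    return yes / size


def summarise(cells):
    """Describe every combination of predictor levels, highest buying rate first."""
    total = sum(y + n for y, n in cells.values())
    total_yes = sum(y for y, _ in cells.values())
    rows = [
        {
            "group": combo,
            "share_of_sample": (y + n) / total,
            "rate": y / (y + n),
            "share_of_buyers": y / total_yes,
        }
        for combo, (y, n) in cells.items()
    ]
    return sorted(rows, key=lambda row: row["rate"], reverse=True)


summary = summarise(cells)
top = summary[0]
print(top)
print(marginal_rate(cells, 0, "under 35"), marginal_rate(cells, 1, "urban"))
```

With these invented counts, urban respondents under 35 make up 12% of the sample but contain 45% of the buyers, and 75% of them buy. Neither predictor on its own gives a buying rate above 35%, so the group only becomes visible when the two predictors are looked at together.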

The best-known form of decision tree analysis is CHAID (Chi-squared Automatic Interaction Detector). Predictors are added in layers to form a tree with a root, nodes and branches, with each split chosen using a chi-squared test of association against the variable of interest. For multi-category predictors, CHAID merges categories that do not differ significantly from one another, which simplifies the resulting model and reduces the number of branches.
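The category-merging idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a full CHAID implementation: the outcome is Yes / No, the predictor is treated as nominal (any two categories may be merged), there is no Bonferroni adjustment or re-splitting of merged categories, and merging stops once two categories remain. The function names, the `alpha_merge` threshold of 0.05 and the counts are all hypothetical.

```python
import math
from itertools import combinations


def chi2_p_value_2x2(a, b):
    """Pearson chi-squared p-value (1 degree of freedom) for two (yes, no) count pairs."""
    (y1, n1), (y2, n2) = a, b
    row1, row2 = y1 + n1, y2 + n2
    col_yes, col_no = y1 + y2, n1 + n2
    total = row1 + row2
    if 0 in (row1, row2, col_yes, col_no):
        return 1.0  # degenerate table: no evidence of a difference
    chi2 = 0.0
    for observed, row, col in ((y1, row1, col_yes), (n1, row1, col_no),
                               (y2, row2, col_yes), (n2, row2, col_no)):
        expected = row * col / total
        chi2 += (observed - expected) ** 2 / expected
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def merge_categories(counts, alpha_merge=0.05):
    """Repeatedly merge the pair of categories that is least significantly different."""
    groups = {(name,): tuple(c) for name, c in counts.items()}
    while len(groups) > 2:
        (g1, g2), p = max(
            ((pair, chi2_p_value_2x2(groups[pair[0]], groups[pair[1]]))
             for pair in combinations(groups, 2)),
            key=lambda item: item[1],
        )
        if p <= alpha_merge:
            break  # every remaining pair differs significantly
        y1, n1 = groups.pop(g1)
        y2, n2 = groups.pop(g2)
        groups[g1 + g2] = (y1 + y2, n1 + n2)
    return groups


# Hypothetical counts of (buyers, non-buyers) of brand X by age band.
counts = {
    "18-24": (40, 60),
    "25-34": (42, 58),
    "35-44": (20, 80),
    "45+": (18, 82),
}
merged = merge_categories(counts)
for categories, (yes, no) in merged.items():
    print(categories, yes, no)
```

With these invented counts the two younger bands behave alike, as do the two older bands, so the four age bands collapse into two branches. A real analysis would normally rely on an established implementation rather than a hand-rolled test.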

These models are often useful in their own right, for example to answer “Which combination of demographic categories typifies someone who buys brand X?” They are also useful as an exploratory lead-in to other methods, helping to identify which categories of a potential predictor should be merged in, say, a Drivers Analysis.

Case Studies
  • Employment Tribunals

    We have worked on several CHAID analyses to determine the combination of features of employment…

Key Factors

  • Identify small groups in the population which account for a high proportion of some behaviour or category of interest
  • Intuitive segments defined by combinations of existing variables
  • Optimal merges and combinations
  • Ideal for categorical data (e.g. demographics)


