Understanding the characteristics of high growth companies using non-traditional data sources
High growth businesses drive economic growth in the UK. Predicting if a business has the potential to show high growth – or alternatively low performance – could be used to target where and how much people invest, where they choose to work and what support structures and policies are developed and put in place. Therefore, understanding the characteristics that may lead to high performance is an area of active research. Many of these research approaches tend to use more traditional datasets and methods. “Non-traditional data” in this context broadly refers to data initially collected for a purpose other than statistics, research or administration, for example, data collected about a company from the web. We’ve produced a report outlining our work to explore how non-traditional data sources and data science methods can be combined with more conventional business data to help understand the characteristics and behaviours of high growth companies.
The work was carried out as part of a wider project led by the Department for Business, Energy and Industrial Strategy (BEIS) aiming to identify the characteristics of businesses with high growth potential using HM Revenue and Customs (HMRC) tax data. Our full project report outlines work that combined business administrative data with non-traditional datasets, such as geographical features and websites data, with the objective of understanding if this adds insight into the main features that characterise companies with high growth. An exploration of the datasets and the features engineered from their variables is described along with an assessment of whether they contribute to understanding high growth.
Understanding features using prediction models
A number of different classification models and data balancing techniques were used to investigate the main features that could characterise high growth companies. No improvement in classification was observed when company website data was added, although company connection measures did appear among the top features, suggesting that high growth firms are more likely to have a larger network.
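Data balancing matters here because high growth companies are typically a small minority of any business population, so a classifier can score well by ignoring them. As a minimal sketch of one such technique, the example below uses random oversampling. This is an assumption chosen for illustration, not necessarily one of the techniques used in the report, and the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is topped up to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data, made up for illustration: 1 = high growth, 0 = standard.
rows = [[i, i % 3] for i in range(10)]
labels = [1, 1] + [0] * 8

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Balancing of this kind is normally applied to the training data only, so that the evaluation data still reflects the real proportion of high growth companies.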
Using free text from company websites, it was observed that high growth companies tend to discuss general overview terms such as team and management more than specific terms like tax, law and manufacturing. This holds regardless of whether they are describing the organisation, describing their people's roles and biographies, or posting news and jobs. This is an interesting first insight into the language businesses use on their websites, and could suggest that high growth companies are more interested in people and processes than in specific tools or terms.
Figure 1: Summary of words used by high growth companies in free text
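One simple way to compare the language used by two groups of companies is to measure each term's share of all words in each group's text and look at the difference. The sketch below illustrates that idea only. The function names `term_shares` and `share_difference`, the tokenisation rule and the example sentences are all assumptions made for illustration, and the report may have used a different method.

```python
import re
from collections import Counter


def term_shares(documents):
    """Return each term's share of all tokens across the documents."""
    counts = Counter()
    for text in documents:
        # Crude tokenisation: lower-case runs of letters only.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return {term: n / total for term, n in counts.items()}


def share_difference(group_a, group_b, terms):
    """Share of each term in group_a minus its share in group_b."""
    a, b = term_shares(group_a), term_shares(group_b)
    return {t: a.get(t, 0.0) - b.get(t, 0.0) for t in terms}


# Made-up example text, not taken from any real company website.
high_growth_text = ["Our team and management lead the business", "Meet the team"]
standard_text = ["We advise on tax and law", "Tax services for manufacturing"]

diff = share_difference(high_growth_text, standard_text,
                        ["team", "management", "tax"])
for term, value in diff.items():
    print(f"{term}: {value:+.2f}")
```

A positive value means the term takes up a larger share of the first group's text, a negative value a larger share of the second group's.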
A wide distribution of both standard and high growth companies is observed across the UK but, using the Office for National Statistics (ONS) rural-urban indicator, high growth companies are more likely to be found in major conurbations than in cities, towns or the fringes of towns. Using Ordnance Survey’s retail cluster data for six districts of the UK, we observe that high growth companies are more likely to be located in a retail cluster. This is particularly the case for the two large conurbations investigated, Birmingham and Glasgow.
Figure 2: Retail cluster (in blue) shown near the centre of Birmingham
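The retail cluster comparison comes down to asking what share of each group of companies sits inside a cluster. A minimal sketch of that calculation follows. It assumes each company already carries a flag saying whether its location falls within a retail cluster (working that out from the cluster boundaries is a separate step not shown here). The field names and the records are hypothetical and made up for illustration.

```python
def share_in_cluster(companies, high_growth):
    """Share of companies in the chosen group that are located
    inside a retail cluster, or None if the group is empty."""
    group = [c for c in companies if c["high_growth"] == high_growth]
    if not group:
        return None
    return sum(c["in_retail_cluster"] for c in group) / len(group)


# Made-up records: 4 high growth companies (3 in a cluster)
# and 6 standard companies (2 in a cluster).
companies = (
    [{"high_growth": True, "in_retail_cluster": True}] * 3
    + [{"high_growth": True, "in_retail_cluster": False}] * 1
    + [{"high_growth": False, "in_retail_cluster": True}] * 2
    + [{"high_growth": False, "in_retail_cluster": False}] * 4
)

high_share = share_in_cluster(companies, high_growth=True)
standard_share = share_in_cluster(companies, high_growth=False)
print(f"High growth in cluster: {high_share:.2f}")
print(f"Standard in cluster:    {standard_share:.2f}")
```

Running the same comparison separately for each district would show whether the difference is concentrated in particular places.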
What makes a company high growth is varied and complex. Our analysis has confirmed existing research showing that it’s difficult to predict high growth firms, and that both clustering effects and being well networked are weakly associated with high growth firms. We have, nevertheless, uncovered some interesting relationships which might bear further investigation. New data sources such as Glass AI web data and new geographical measures do not give us a step change in this knowledge, but do allow a small insight into the characteristics of these businesses. Given more data, these insights could be developed further to help tailor targeting and policy towards businesses that could potentially be high growth.
More detail is available in the full project report.
Authors: David Pugh, Sonia Williams, Caitlin Johnson
Full details of the web reading principles used to obtain the website free text data are available in the report appendix.