Data Science Accelerator

We offer analysts from across the public sector a range of mentoring opportunities to develop their data science skills. The Data Science Accelerator started in 2015 and is backed by the Government Digital Service (GDS), the Office for National Statistics (ONS), the Government Office for Science and the Civil Service analytical professions (statistics, economics, operational research and social research). We have been running the South-West and Wales hub since 2016.

The Data Science Academy is a similar programme, open only to ONS staff, which we have also been running since 2016. We also offer mentoring to teams across the public sector. This follows a similar structure to the programmes for individuals, but with time-scales adapted to each team's needs. Examples of past and current projects are listed below.

Automation of object detection from satellite imagery (UK Hydrographic Office)

Developed a method that processes open-source satellite data using image classification, object recognition and machine learning techniques to validate and discover maritime hazards, and to create a dataset of global offshore infrastructure.

Forecasting the condition of the school estate (Department for Education)

Applied clustering and machine learning techniques to identify patterns of underinvestment in schools, and to test whether these patterns correlate with schools that have larger condition-related needs and with schools that have applied to targeted funding programmes.

Reducing potential harm through improved risk profiling (NHS Wales)

Developed an automatic multifactorial risk assessment process that estimates a patient's likelihood of having a fall while in hospital, using their demographics, treatment history, previous admissions and other available data.

Using machine learning to match individually collected prices to web-scraped prices (Office for National Statistics)

Produced a system that uses natural language processing to match individual price quotes, gathered by local price collectors, to the corresponding web-scraped alternatives. A Python notebook was developed to perform the matching on large-scale data.

Pathway mining & analysis for patient-level data (Public Health Wales)

Produced an adaptable methodology in R for linking datasets in order to include additional components of a patient pathway. The tool generates graphs of the hierarchical pathways that patients have taken for a particular dependent variable.

Developing a tool to maximise the use of Trafficmaster data (Welsh Government)

Developed an automated process and an interactive mapping tool that enable faster, wider-ranging analysis of commercially collected, anonymised traffic data to provide useful estimates of congestion.

AIS-derived products for improved defence situational awareness (UK Hydrographic Office)

Produced electronic charts and geographic information system (GIS) layers, including a heatmap that highlights no-go areas as probabilities, together with a model that predicts the heatmap for a future date given that date and other parameters.

Sounding selection tool to detect seabed changes (UK Hydrographic Office)

Created a process that can significantly reduce the manual effort of selecting the correct soundings from a survey for charting. The process combines several data sources to select the depths most relevant to mariners.

Better analysis and dissemination of the annual June survey of agriculture in England (Department for Environment, Food and Rural Affairs)

Set up an interactive online database and built an interactive online tool with built-in statistical disclosure control, which applies primary and secondary suppression rules to produce tables that protect individuals' data.
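
To show what primary and secondary suppression mean in practice, here is a minimal sketch for a two-way table of counts whose row and column totals are published. Primary suppression hides every cell below a threshold. Secondary suppression then hides extra cells so that no row or column is left with exactly one hidden cell, because that cell could otherwise be recovered by subtracting the visible cells from the total. The threshold of 5 and the "hide the smallest visible cell" rule are illustrative assumptions, not the rules used in the June survey tool. This simple heuristic also does not guarantee protection against every form of inference.

```python
from typing import List, Optional


def suppress(table: List[List[int]], threshold: int = 5) -> List[List[Optional[int]]]:
    """Apply primary then secondary cell suppression; hidden cells become None."""
    rows, cols = len(table), len(table[0])

    # Primary suppression: hide every cell whose count is below the threshold.
    hidden = {(r, c) for r in range(rows) for c in range(cols) if table[r][c] < threshold}

    def protect(cells: List[tuple]) -> bool:
        """If exactly one cell in this row/column is hidden, also hide the smallest visible one."""
        hidden_here = [cell for cell in cells if cell in hidden]
        visible = [cell for cell in cells if cell not in hidden]
        if len(hidden_here) == 1 and visible:
            hidden.add(min(visible, key=lambda rc: table[rc[0]][rc[1]]))
            return True
        return False

    # Secondary suppression: repeat until no row or column has a lone hidden cell.
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            changed |= protect([(r, c) for c in range(cols)])
        for c in range(cols):
            changed |= protect([(r, c) for r in range(rows)])

    return [[None if (r, c) in hidden else table[r][c] for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    counts = [[12, 3, 20], [8, 15, 9], [30, 25, 40]]
    for row in suppress(counts):
        print(row)
```

The loop always terminates because each pass either hides at least one more cell or stops, and the table has a finite number of cells. Dedicated disclosure-control software typically chooses secondary suppressions by optimisation rather than by this greedy rule.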

Using machine learning techniques in Economic Statistics to improve survey methodology results (Office for National Statistics)

Used machine learning techniques and the attributes of businesses engaged in trading activities to improve on the currently published estimates from the exporters/importers survey and to minimise selection bias.

Propensity Matching with Clothing and Formula Effect (Office for National Statistics)

Used propensity score/nearest neighbour matching and smooth filtering to improve the identification of comparable replacement products in the clothing products dataset and replace missing products with the closest matches.
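
The nearest-neighbour step can be sketched as follows. This assumes propensity scores have already been estimated for every product, for example with a logistic regression on product attributes. Each missing product is then matched, with replacement, to the available product whose score is closest, provided the gap is within a caliper. The caliper value, function name and product identifiers are hypothetical, and the smooth filtering used in the project is not shown.

```python
from typing import Dict, Optional


def nearest_neighbour_match(
    missing: Dict[str, float],
    available: Dict[str, float],
    caliper: float = 0.05,
) -> Dict[str, Optional[str]]:
    """Match each missing product to the available product with the closest propensity score.

    Matching is with replacement: one available product may replace several missing ones.
    A missing product maps to None when no candidate lies within the caliper.
    """
    matches: Dict[str, Optional[str]] = {}
    for product, score in missing.items():
        # Closest available product by absolute difference in propensity score.
        best = min(available, key=lambda cand: abs(available[cand] - score), default=None)
        if best is not None and abs(available[best] - score) <= caliper:
            matches[product] = best
        else:
            matches[product] = None
    return matches


if __name__ == "__main__":
    missing = {"jumper_a": 0.62, "coat_b": 0.91}
    available = {"jumper_x": 0.60, "jumper_y": 0.70, "coat_z": 0.80}
    print(nearest_neighbour_match(missing, available))
```

The caliper prevents a poor match from being forced when no comparable product exists. Such products are left unmatched rather than replaced by something dissimilar.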

Improving the accessibility of statistics on specific crime types (Office for National Statistics)

Transferring the dissemination of crime statistics to an R Shiny application, providing more engaging ways of visualising this information, including the potential for enhanced graphics, the addition of new data and better representation of existing data.

The Welsh name strategy (Pembrokeshire County Council)

Building a statistical model in R, using multiple regression, to predict the profile of a target audience on which promotion of Welsh language courses can be focused. The analysis will be presented visually using R Markdown and a Shiny application.

Automated patent casework allocation (Intellectual Property Office)

Creating an automated case allocation system that analyses the text of patent applications and accurately allocates each one to an appropriate patent examiner. The tool will use text mining, natural language processing, machine learning and document classification.

Machine learning as an alternative estimation method for later period VAT turnover data (Office for National Statistics)

Explored various machine learning techniques to evaluate their suitability for estimating the missing 40% of VAT turnover data and produced a sample dataset that included the estimated data.

For more information on how to participate in these programmes, or to talk to us about mentoring, contact Ioannis Tsalamanis or the Data Science Campus.