The Data Science Campus – five years of data science for public good
The birth of the Campus
The paint was still wet on the walls of the Campus (to the detriment of at least one jacket!). We had an audience of UK and international data science leaders from across the public, private and academic sectors, a team of eight, some brilliant presentations, and a lot of excitement about doing Data Science for the Public Good. This was the launch of the Data Science Campus five years ago.
In 2016, the Independent Review of UK Economic Statistics (Bean, 2016) called for the:
‘recruitment of a cadre of data scientists … active learning and experimentation … facilitated through collaboration with relevant partners – in academia, the private and public sectors, and internationally.’
And so, on 27 March 2017, the Data Science Campus was officially born. Our goal was to be the hub for data science across the UK public sector, building data science capability, developing new data science products, and testing out new tools, techniques and datasets.
Data Science Campus HQ, 1 June 2017 (yes, this is still Newport in Wales!)
So, how have we done over the last five years?
We have grown from eight people in 2017 to over 80 in 2022
This includes data scientists (of course), trainers and lecturers, content developers, software engineers, project and delivery managers, data engineers, and our outstanding business support team, led by Anya Crisp-Patterson, who are the backbone of the Campus. They have organised countless events (including the MDataGov Symposium and the Government Data Science Conference, now a virtual Festival), delivered our recruitment campaigns, supported international visits both to and from the Campus, resolved our issues, and even helped to design Campus HQ.
We have completed 87 projects, with 31 currently in progress
That is a lot of projects. Roughly half the Campus team are data scientists, and we currently have our highest ever number in post.
I am so proud of the volume, complexity, and sheer range of different products our skilful, committed team have produced. There isn't space here to list all of our projects, but you can find out more by exploring our website. I have attempted to pick a top five projects (have you spotted the theme yet?). It is a difficult task, a bit like choosing your favourite child!
- Publishing the world’s first faster economic indicators, using alternative data sources, which paved the way for:
- Our coronavirus (COVID-19) response, where, among other things, we used aggregated mobility data to explore the impact of lockdowns and built a novel, community-level COVID-19 risk model using a range of socio-economic data (to be published soon). We used streamed, publicly available traffic camera images to explore busyness in town and city centres, and analysed COVID-19 notices on business websites. We also spun up a cloud environment in record time and used it to share analyses across departments (possibly the first instance of its kind); it served as a pathfinder for the cross-Government Integrated Data Service.
- Improving the speed and quality of evidence assessment for the National Institute for Health and Care Excellence; if you’re interested in natural language processing, the report is an excellent guide.
- How green is my street? Enter your postcode (Cardiff and Newport) to find out how green your street is; mine is a rather disappointing 1%.
- Counting cows from space in South Sudan, where livestock accounts for about one third of the economy.
We have utterly smashed the National Data Strategy target of training 500 government analysts in data science by March 2021
Our very capable Capability Team has delivered training to more than 4,000 people! This includes:
- degree-level data analytics apprenticeships
- Master’s in Data Analytics for Government (MDataGov)
- Data Science Graduate Programme, with 150 new grads due to start in October
- data science, data visualisation and international accelerator mentoring programmes
- Data Masterclass for Senior Leaders, building data literacy for our senior leaders; the number of people completing the Masterclass passed 1,000 last week
- the regular training that our Faculty lecturers and trainers deliver, which they rapidly moved online at the outset of the coronavirus pandemic
We have helped to develop data science as an attractive career in the public sector. You can hear more about this, and join the discussion on what is next, at the Government Data Science Festival 2022 from 27 April to 11 May 2022. Register now as spaces are going fast!
We have built a strong reputation, both in the UK and internationally
We are recognised as the hub of data science in the UK and are the first port of call for many public sector organisations looking for support on their data science journey.
The Campus is recognised as a world leader in data science and big data in the public sector. Our International Team co-ordinates the United Nations Economic Commission for Europe (UNECE) Machine Learning Group (ML 2022), which currently has more than 350 members from over 45 different statistical organisations around the world. This shows how important data science and the exploration of new data sources are to a global audience.
We are supporting low- and middle-income countries through our Data Science Hub embedded with the Foreign, Commonwealth and Development Office (FCDO), and we have a long-standing relationship with the National Institute for Statistics in Rwanda. We support the development of the UN's Global Platform and the creation of the UN Regional Hubs for Big Data. It has been fascinating to work with and learn from colleagues across the world, all of whom face very similar challenges to us.
We have built some really productive partnerships across academia and the private sector
One of our goals was to develop strong links with academia, and we have done this through the Office for National Statistics' (ONS') strategic partnerships with Cardiff University and the Alan Turing Institute, as well as by working with academics on individual projects such as the traffic cameras and mobility work.
We have also worked with Barclays Plc, including a brilliant two-day hackathon, with 25 ONS staff and 25 Barclays staff tackling a variety of challenges. Our work with Barclays proved invaluable during the coronavirus pandemic, where their aggregate data helped to model travel services, while face-to-face interviewing was suspended. We also worked with O2 Motion and glass.ai, among others, focussing on accessing interesting data sources.
If you’re interested in working with us, please get in touch!
We have come a long way from that day with the wet paint! What have we learned?
Here are five things that we have learned along the way.
There are no unicorns; we are all pieces of the bigger jigsaw
Data science is a collective effort, and to make it work we absolutely need people with different skills, knowledge and ways of thinking. This includes not only both ends of data science (data and software engineering as well as mathematical modelling and statistics), but also subject matter experts, problem-owners, people who really understand the data, communicators, and of course, delivery managers.
Don’t do a project unless you have a product owner
You will never get your use case nailed down, be able to prioritise, or be able to hand over the product unless someone is responsible for it.
Six months is not long enough to complete a project with impact
In the heady early days, we set out thinking we would do very rapid projects. Even leaving aside the realities of data access and getting up to speed, six months is enough time to complete something worthwhile only in very exceptional circumstances. See also 'it will only take two weeks': this has never been true in the history of humanity!
Good delivery managers are invaluable
I want our data scientists to be doing data science, and so do they. Our delivery management team, led by the fabulous Sharon Hill, has developed agile processes for us which work in the research environment. This is one of our biggest successes. Our processes mean that we are transparent, clear about goals, deliverables and priorities, have good communication with our stakeholders, and deliver at pace. The evidence is in the number of projects we have been able to deliver. Our delivery managers also take the burden of crucial non-data-science activities, such as data access or chasing stakeholders, off the data scientists, freeing up their time for actual data science. I cannot overstate the value of good delivery management.
Scale-up is harder than prototyping – and this is true for teams as well as for data science projects
Prototyping is easy and fun. Scaling up often takes longer and requires different skills. But if you don't plan for the scale-up and think about handover from the beginning, you are doomed: if not to complete failure, then to a very long and frustrating period of completely rewriting your code. If you follow good coding standards, your future self (and your colleagues) will thank you. If you also involve your customers from the start, handover will be less painful. Your code is only useful if others can use and understand it. This is the voice of experience!
A small start-up team needs fewer processes than a large established team. Agreeing the minimum viable bureaucracy with the teams at each stage as the Campus has grown has been an important part of our development.
We couldn’t have done it without…
I’d like to end with some thanks.
First, to everyone who made the Campus possible, including Tom Smith, our former Managing Director, and Peter Fullerton and Dave Johnson, my predecessors, and to everyone who has worked in the Campus. I hope you all read this and take a moment to congratulate yourselves on what you have achieved. To cover everything we have done would require an enormous book (even longer than the two-year report!), but I hope this has reminded you of some of your achievements.
Second, to everyone who has supported us at the ONS: the cloud architect and data access teams, who are so fundamental to everything we do; the commercial team who have responded so positively to every new and crazy request we’ve had; the Facilities team who built the physical Campus in such record time; and everyone else who has worked with us to build something truly new and innovative.
Third, to the public sector data science community, who have been part of this journey, sharing knowledge, challenges and an enthusiasm for data science. I hope to see you at the Government Data Science Festival 2022, to discuss the Future of Data Science for Public Good.
And finally, to all the people we have worked with, both in the UK and internationally, in academia, in the private sector and in other public sector organisations, and in other national statistics institutes.
What does the future look like?
After five years in the Campus myself, my time here is coming to an end.
But as I leave, the Campus is stronger than ever, with a great team that has the ambition and the skills to deliver on it. There are exciting developments afoot in our Capability Team. There is our Turing programme to complete, including work on synthetic data and privacy-preserving techniques, nowcasting and economic networks. And the potential of mobility data to transform our understanding of how populations move around the UK is still to be realised. The next five years will be exciting, challenging, and, I hope, as much fun as the first five years!