Bridging data science and statistics for international development
Tuli Amutenya is a Graduate Data Scientist from the Namibia Statistics Agency (NSA) who has spent the last 6 months at ONS. Here, Tuli shares her experiences and the learning she will be taking home.
Identifying the challenges
I worked at the NSA for three years, leading the data processing team responsible for application development, database management and data processing for all national surveys and censuses. During this time, I was always motivated to keep up to date with the latest techniques, methods and tools in data analytics, such as machine learning and natural language processing.
As data science is a fast-emerging global discipline, the idea of studying overseas excited me, so I applied for and was granted a Chevening Scholarship, funded by the UK Foreign and Commonwealth Office (FCO), which gave me the opportunity to complete an MSc in Data Science at the University of Salford.
As part of my master’s degree, I had to complete a data science project tackling a real data challenge. Through the relationship between the NSA and the International Development team at the UK Office for National Statistics (ONS), I was fortunate to receive a three-month mentorship for my dissertation at the ONS Data Science Campus, based in Newport, South Wales.
Taking actions
Namibia, like other developing countries, often faces constraints that limit how frequently it can carry out data collection surveys and the census, leading to data gaps when reporting progress on the Sustainable Development Goals (SDGs). These data shortages make it harder for African nations to respond to current issues and tackle problems.
Agricultural statistics is one of the most affected areas because of the significant labour and cost involved in data collection. In Namibia and the rest of Sub-Saharan Africa, smallholder farmers who depend on crop production for nutrition and as their main source of household income are the most affected, with few statistics available to policymakers on their land use and crop production. It is crucial that African statistical institutions find more innovative, data-driven methods to tackle these challenges.
The objective of my study was to explore the potential use of Earth observation data to complement official agricultural statistics. The study combined Earth observation data with data science tools and techniques to extract statistics such as land use, to estimate plot areas and to build a machine learning model for estimating crop yields on small farms.
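One common way to estimate the area of a digitised plot is the shoelace formula applied to its boundary vertices. The sketch below is illustrative only: it assumes the plot outline is already available as (x, y) coordinates in a projected coordinate system measured in metres (for example a UTM zone), and the function names are hypothetical rather than those used in the study.

```python
# Minimal sketch: area of a digitised field plot from its boundary vertices.
# Assumption: coordinates are in a projected CRS measured in metres
# (e.g. a UTM zone), not in latitude/longitude degrees.


def polygon_area_m2(vertices):
    """Return the area of a simple polygon using the shoelace formula.

    `vertices` is a list of (x, y) tuples in metres, listed in order
    around the boundary; the first vertex need not be repeated at the end.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_area_ha(vertices):
    """Convert the shoelace area from square metres to hectares."""
    return polygon_area_m2(vertices) / 10_000.0


# Hypothetical 100 m x 50 m rectangular plot
plot = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
print(polygon_area_ha(plot))  # 0.5
```

Taking the absolute value means the vertices can be listed clockwise or anticlockwise; the formula assumes the boundary does not cross itself.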
I used Sentinel-2 satellite imagery from the European Space Agency (ESA) to calculate vegetation indices, time series and ratios to distinguish crop from non-crop land. The findings are a promising start towards building a crop yield model; the main constraint was the lack of ground truth data, which had to be estimated using data sources with similar signature profiles.
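NDVI, one widely used vegetation index, is computed from Sentinel-2's red band (B4) and near-infrared band (B8), both available at 10 m resolution. The sketch below applies a simple rule to each pixel's NDVI time series to separate crop from non-crop land, then counts crop pixels to approximate cropped area. The thresholds `PEAK_MIN` and `AMPLITUDE_MIN` and the function names are hypothetical assumptions for illustration, not the values or method used in the study; in practice they would need calibrating against ground truth.

```python
# Minimal sketch: NDVI-based crop / non-crop labelling of Sentinel-2 pixels.
# Assumptions (hypothetical, not from the study): inputs are surface
# reflectance values for band B4 (red) and band B8 (near infrared), and
# the thresholds below are illustrative rather than calibrated.

PEAK_MIN = 0.5        # hypothetical minimum peak NDVI for a crop canopy
AMPLITUDE_MIN = 0.3   # hypothetical minimum seasonal rise in NDVI
PIXEL_AREA_HA = 0.01  # one 10 m x 10 m pixel covers 100 m^2 = 0.01 ha


def ndvi(red, nir):
    """Normalised difference vegetation index: (NIR - red) / (NIR + red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def ndvi_series(red_series, nir_series):
    """NDVI for each acquisition date of a single pixel."""
    return [ndvi(r, n) for r, n in zip(red_series, nir_series)]


def is_crop(series):
    """Label a pixel as crop if its NDVI peaks high and rises strongly.

    The rule assumes cultivated fields show a clear green-up and harvest
    cycle, while bare soil stays low and evergreen vegetation varies less.
    """
    peak, trough = max(series), min(series)
    return peak >= PEAK_MIN and (peak - trough) >= AMPLITUDE_MIN


def crop_area_ha(pixel_series):
    """Approximate cropped area by counting crop-labelled pixels."""
    return sum(1 for s in pixel_series if is_crop(s)) * PIXEL_AREA_HA


# Hypothetical four-date time series for two pixels
field = ndvi_series([0.12, 0.08, 0.05, 0.10], [0.18, 0.30, 0.45, 0.20])
soil = ndvi_series([0.20, 0.21, 0.22, 0.20], [0.25, 0.26, 0.27, 0.25])
print(is_crop(field), is_crop(soil))  # True False
print(crop_area_ha([field, soil]))    # 0.01
```

A fixed-threshold rule like this is only a baseline; a trained classifier using several indices and ratios over the season would be the natural next step once labelled ground truth is available.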
The statistics extracted are aimed at addressing the following Sustainable Development Goal 2 (Zero Hunger) indicators:
- 2.3.1 Volume of production per labour unit by classes of farming/pastoral/forestry enterprise size
- 2.4.1 Proportion of agricultural area under productive and sustainable agriculture
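Both indicators are, at their core, ratios. The sketch below shows the basic arithmetic on hypothetical farm-level records: production divided by labour days within each farm-size class for 2.3.1, and the sustainably managed share of agricultural area for 2.4.1. The field names, units and sample values are invented for illustration; the official SDG metadata define the exact inputs, units and sustainability criteria, which are more detailed than this.

```python
# Minimal sketch of the arithmetic behind SDG indicators 2.3.1 and 2.4.1,
# using hypothetical farm records. Field names, units and values are
# invented for illustration; the official SDG metadata define the real inputs.
from collections import defaultdict


def production_per_labour_unit(farms):
    """2.3.1-style ratio: total production / total labour days, per size class."""
    production = defaultdict(float)
    labour = defaultdict(float)
    for farm in farms:
        production[farm["size_class"]] += farm["production"]
        labour[farm["size_class"]] += farm["labour_days"]
    return {c: production[c] / labour[c] for c in production if labour[c] > 0}


def sustainable_area_share(farms):
    """2.4.1-style ratio: sustainably managed area / total agricultural area."""
    total = sum(farm["agri_area_ha"] for farm in farms)
    sustainable = sum(farm["sustainable_area_ha"] for farm in farms)
    return sustainable / total if total > 0 else float("nan")


farms = [  # hypothetical: production in kg, labour in working days, areas in ha
    {"size_class": "small", "production": 1200.0, "labour_days": 300.0,
     "agri_area_ha": 2.0, "sustainable_area_ha": 1.5},
    {"size_class": "small", "production": 800.0, "labour_days": 200.0,
     "agri_area_ha": 1.0, "sustainable_area_ha": 0.5},
    {"size_class": "large", "production": 9000.0, "labour_days": 1000.0,
     "agri_area_ha": 22.0, "sustainable_area_ha": 18.0},
]
print(production_per_labour_unit(farms))  # {'small': 4.0, 'large': 9.0}
print(sustainable_area_share(farms))      # 0.8
```

Earth observation outputs such as estimated plot areas and yields would feed the production and area inputs here, while labour inputs would still come from surveys.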
The journey and take-home solutions
During my time at the ONS, I have had many opportunities, including attending data science conferences, in-house training, visits and mentorship. I have learned that the ONS treats data as a service, tailored to different clients according to their needs. Below are my main takeaways: practices that I believe could benefit many African statistical offices.
Introducing skills development schemes
Introduce training programmes, such as mentorships, graduate schemes and other short-term coaching opportunities, that integrate theory and practice. These give students the opportunity to access data for research and to work on real-life problems.
Investing in statistical process automation
Developing automated statistical production pipelines frees up capacity for value-added data products, shortens lead times, creates reusable algorithms and improves documentation. For instance, if a statistical office moves towards open-source programming languages such as Python and R, the money saved on proprietary software licences can be redirected into training and the development of new data science techniques and methods.
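As an illustration of what a reusable, self-documenting pipeline can look like in Python, the sketch below chains small, single-purpose processing steps and logs the record count after each one. The step functions, field names and validation rules are hypothetical examples, not an NSA or ONS production system.

```python
# Minimal sketch of a reproducible processing pipeline, assuming survey
# records arrive as dictionaries. Step names and rules are hypothetical.
import logging
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[List[Record]], List[Record]]

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Remove records missing a household ID or region (hypothetical rule)."""
    return [r for r in records if r.get("household_id") and r.get("region")]


def standardise_region(records: List[Record]) -> List[Record]:
    """Trim whitespace and title-case region names so groups match."""
    return [{**r, "region": str(r["region"]).strip().title()} for r in records]


def run_pipeline(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order, logging row counts so every run is auditable."""
    for step in steps:
        before = len(records)
        records = step(records)
        log.info("%s: %d -> %d records", step.__name__, before, len(records))
    return records


raw = [
    {"household_id": "H1", "region": "  khomas "},
    {"household_id": "H2", "region": None},
    {"household_id": "H3", "region": "ERONGO"},
]
clean = run_pipeline(raw, [drop_incomplete, standardise_region])
```

Because each run applies the same ordered list of functions and none of them modifies its input, the same raw data always gives the same output, and the step list itself documents what was done to the data. New steps can be added or reused in other surveys without rewriting the rest.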
Integrating data science into the statistical environment
Making use of big data from various sources to create insights and help solve challenges faced by African countries. This will not happen overnight; data science is a new field, so constant research needs to be carried out to identify, compare and adopt techniques to fit the local context.
Creating a collaborative environment
Collaborating with other offices means that resources, best practices and skills can be shared among the data science community. A close working relationship with fellow National Statistics Offices (NSOs), local universities and the private sector is crucial for creating robust data-driven solutions.
Strengthening the relationship with existing partners
Sharing resources and experiences, and complementing each other’s skills in specialised areas, can help African statistical offices move forward in this era of data revolution while avoiding duplication of work that other statistical offices have already done.
What’s next
Although the NSA is a relatively new agency, a lot of work has been done to ensure that good-quality statistics are produced and that we operate within international statistical frameworks, policies and guidelines. The Agency recently completed a Generic Statistical Business Process Model (GSBPM) assessment, with the aim of aligning itself with the standards of the statistical value chain. Moving forward, I would like to see Namibia adopt robust, coherent and reproducible automated pipelines to improve efficiency and help us produce better statistics.
With data science, statistical offices have an opportunity to tap into the many data sources available, bring new insights and make a profound difference in tackling the social and environmental challenges facing the African continent. Upon my return to Namibia, I will work to strengthen relationships with academic institutions. I believe that collaboration among NSOs to build a cross-government data science community will put African countries in a better position to become data-driven nations and ensure that no one is left behind.
5 comments on “Bridging data science and statistics for international development”
Dear Tuli,
Hearty congratulations!
Hey Tuli?
I read it all….Congratulations!!
Very interesting to read of your experience, Tuli! I agree completely regarding moving towards open-source programming languages and fostering a close relationship between statistics offices, universities and the private sector.
I will be running an R workshop in collaboration with the statistics department at UNAM in March and we plan to launch a local R Users Group in Windhoek. Maybe you/your colleagues would be interested? I would be happy to discuss further by email, or possibly in person while you are still in the UK (I am based near Newport).
You (or anyone else interested in this project) are welcome to contact me at rowforwards@gmail.com (the contact address for Forwards, the R Foundation taskforce for underrepresented groups).
Hi Tuli, congratulations are in order and all the best as you embark on using the acquired skills to take NSA to a global level
Impressive my sister, bring those skills back and let’s move Namibia forward.