Examples of International Data Science Accelerator projects

Here are some examples of projects submitted to the International Accelerator that we would support.

If you are an experienced data scientist and can support a mentee through a project like this, please contact us.

Web scraping of prices for the Consumer Price Index (CPI) in a National Capital Region (NCR)

The purpose of this study is to explore using prices from online stores as inputs for compiling the monthly CPI in an NCR. The data consist of prices web-scraped from selected online stores between February 2020 and December 2021.

To achieve the objectives of this study, the average prices, price relatives, subclass indices, class indices, growth rates and graphs will be generated in R.

The expected outputs for this project are tables for the average prices and price relatives by commodity, indices at the subclass (5-digit) and class (4-digit) level, and month-on-month and year-on-year growth rates for the online prices. Graphical outputs for the indices in the form of image (.png) files will also be produced.
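The calculations listed above can be sketched as follows. This is a minimal illustration in Python rather than R, and it rests on assumptions that the project description does not state: average prices are simple arithmetic means of the scraped quotes, subclass indices are unweighted geometric means of commodity price relatives (a Jevons-type elementary index), and class indices are weighted means of subclass indices using hypothetical weights. All function names are hypothetical.

```python
from math import prod
from statistics import mean


def average_prices(quotes):
    """Average the scraped quotes for each (commodity, month) pair.

    quotes: iterable of (commodity, month, price) tuples.
    Returns a dict mapping (commodity, month) to the arithmetic mean price.
    """
    grouped = {}
    for commodity, month, price in quotes:
        grouped.setdefault((commodity, month), []).append(price)
    return {key: mean(prices) for key, prices in grouped.items()}


def price_relative(avg, commodity, month, base_month):
    """Average price in `month` divided by the average price in `base_month`."""
    return avg[(commodity, month)] / avg[(commodity, base_month)]


def subclass_index(avg, commodities, month, base_month):
    """Unweighted geometric mean (Jevons-type) of price relatives, base = 100."""
    relatives = [price_relative(avg, c, month, base_month) for c in commodities]
    return 100 * prod(relatives) ** (1 / len(relatives))


def class_index(subclass_indices, weights):
    """Weighted arithmetic mean of subclass indices (the weights are hypothetical)."""
    total_weight = sum(weights[k] for k in subclass_indices)
    return sum(subclass_indices[k] * weights[k] for k in subclass_indices) / total_weight


def growth_rate(current, previous):
    """Percentage change between two index values.

    Pass the previous month's index for month-on-month growth, or the index
    for the same month one year earlier for year-on-year growth.
    """
    return 100 * (current / previous - 1)
```

Because year-on-year growth compares a month with the same month twelve months earlier, data starting in February 2020 would give the first year-on-year rates for February 2021.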

Desertification assessment analysis in the north of Mexico

This project will measure the evolution of desertification in the north of Mexico, where there is a subtle but constant loss of vegetation. The analysis will focus on Natural Protected Areas (NAPs), which will serve as control regions because existing policies protect them.

Combating desertification is part of the United Nations agenda to improve quality of life across nations. The objective is to measure, quantify and detect trends in vegetation loss in NAPs in the north of Mexico. Decision makers can use the results to modify existing public policies for NAPs and to develop new ones that ensure these areas are protected.

The approach is to analyse different land cover datasets, measure changes over time in NAPs, and quantify and analyse trends in land cover transitions. Together, these sources provide a rich collection of data for the analysis, covering the period from 1990 to 2019. The main objective of this project is to develop a land cover predictor that takes Landsat images (available since 1980) and predicts the land cover for a given year, and then to apply the proposed methodology to perform the analysis. Well-known spectral vegetation indices derived from satellite images, such as the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI), can be used to enrich the analysis.
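NDVI and EVI have standard definitions based on surface reflectance in the near-infrared (NIR), red and blue bands: NDVI = (NIR - Red) / (NIR + Red) and EVI = G * (NIR - Red) / (NIR + C1 * Red - C2 * Blue + L), commonly with G = 2.5, C1 = 6, C2 = 7.5 and L = 1. The sketch below computes both, plus two simple helpers for the transition and trend analysis described above: a cross-tabulation of land cover classes for the same pixels at two dates, and a least-squares slope of a yearly series. The helpers and their names are hypothetical illustrations, not the project's proposed methodology, and the code assumes reflectances have already been extracted from the Landsat scenes.

```python
from collections import Counter


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, g=2.5, c1=6.0, c2=7.5, l=1.0):
    """Enhanced Vegetation Index with the commonly used coefficients.

    Inputs are surface reflectances on a 0-1 scale.
    """
    return g * (nir - red) / (nir + c1 * red - c2 * blue + l)


def transition_counts(before, after):
    """Count pixels moving from each land cover class at date 1 to each class at date 2.

    before, after: land cover labels for the same pixels, in the same order.
    """
    return dict(Counter(zip(before, after)))


def linear_trend(years, values):
    """Ordinary least-squares slope of a yearly series (change per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx
```

A negative slope in, for example, the mean annual NDVI of a protected area would be consistent with vegetation loss, although a significance test would be needed before drawing conclusions.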

Developing a web application from scratch for life table data using the R package Shiny

The main goal of the project is to create a web application in the R programming language for disseminating, exporting and visualising life table data for the general public. An existing R script implements an algorithm for calculating life tables that supports various input parameter settings, including the choice of statistical model for smoothing and mortality modelling.

At present, only the limited number of colleagues who know the basics of the R programming language can work with the script. The web application will let users work with life table data more flexibly, set the various input parameters, and disseminate the results to the general public not only visually but also as tables for publication.
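The kind of calculation such an application would wrap can be illustrated with the standard abridged life table relationships. This Python sketch is not the project's R algorithm: it omits the smoothing and mortality modelling options, assumes deaths occur on average halfway through each closed age interval (a_x = n/2), and assumes the last age group is open-ended with a positive death rate. The function name and output layout are hypothetical.

```python
def life_table(mx, widths, radix=100_000):
    """Build a basic abridged period life table from age-specific death rates.

    mx:     death rate for each age interval; the last interval is open-ended
            and must have a positive rate.
    widths: width n (in years) of each interval; the last value is ignored.
    Assumes deaths occur on average halfway through each closed interval.
    Returns one dict per interval with qx, lx, dx, Lx, Tx and ex.
    """
    rows = []
    lx = float(radix)  # survivors at the start of the current interval
    last = len(mx) - 1
    for i, m in enumerate(mx):
        if i == last:
            qx = 1.0        # everyone alive at the open interval's start dies in it
            dx = lx
            Lx = lx / m     # person-years lived in the open interval
        else:
            n = widths[i]
            ax = n / 2      # assumed mean years lived by those who die
            qx = n * m / (1 + (n - ax) * m)
            dx = lx * qx
            Lx = n * (lx - dx) + ax * dx
        rows.append({"qx": qx, "lx": lx, "dx": dx, "Lx": Lx})
        lx -= dx
    # Tx sums person-years from each age upwards; ex = Tx / lx is life expectancy.
    Tx = 0.0
    for row in reversed(rows):
        Tx += row["Lx"]
        row["Tx"] = Tx
        row["ex"] = Tx / row["lx"]
    return rows
```

In a Shiny application, a function like this would sit behind the input controls, with its output rendered both as charts and as downloadable tables.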

Automating data coding for statistical classification using Natural Language Processing (NLP) and Machine Learning (ML) techniques

The Department of Statistics in Jordan collects some data fields as Arabic text, for example occupation, scientific specialisation and economic activity. These fields must be converted into a specific international or local standard coding. This is usually done manually by a specialised technical team at the data cleaning and preparation stage, which is costly and time-consuming.

The project proposes text classification using natural language processing techniques and a supervised machine learning model. The project output would be a machine learning model with an appropriate user interface.
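The project does not specify which model it will use. As one possible baseline, the sketch below trains a multinomial Naive Bayes classifier on character n-grams, which can cope with Arabic prefixes and suffixes without a dedicated stemmer, after a simple normalisation of alef variants. The training texts and the codes EDU and MED are made-up examples, not real classification codes, and the class and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Map the alef variants (U+0623, U+0625, U+0622) to a bare alef (U+0627).
ALEF_NORMALISATION = str.maketrans("\u0623\u0625\u0622", "\u0627\u0627\u0627")


def char_ngrams(text, n=3):
    """Character n-grams of a normalised string, padded with spaces at both ends."""
    cleaned = " ".join(text.translate(ALEF_NORMALISATION).split())
    padded = f" {cleaned} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesCoder:
    """Multinomial Naive Bayes over character n-grams with Laplace smoothing."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha

    def fit(self, texts, codes):
        """Count n-grams per code from manually coded training texts."""
        self.class_counts = Counter(codes)
        self.feature_counts = defaultdict(Counter)
        for text, code in zip(texts, codes):
            self.feature_counts[code].update(char_ngrams(text, self.n))
        self.vocab = {g for counts in self.feature_counts.values() for g in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def log_scores(self, text):
        """Log prior plus log likelihood of the text's n-grams for each code."""
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for code, count in self.class_counts.items():
            score = math.log(count / n_docs)
            for gram in char_ngrams(text, self.n):
                if gram in self.vocab:  # n-grams never seen in training are skipped
                    smoothed = self.feature_counts[code][gram] + self.alpha
                    score += math.log(smoothed / (self.totals[code] + self.alpha * v))
            scores[code] = score
        return scores

    def predict(self, text):
        """Return the code with the highest posterior score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up training examples; EDU and MED are placeholder codes.
    texts = ["معلم", "معلمة", "مدرس لغة عربية", "طبيب", "طبيبة أسنان", "طبيب عام"]
    codes = ["EDU", "EDU", "EDU", "MED", "MED", "MED"]
    model = NaiveBayesCoder().fit(texts, codes)
    print(model.predict("معلم رياضيات"))
```

A baseline like this gives a reference accuracy against which more complex models can be compared on the department's manually coded records.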


What do our mentors and mentees say about the programme?

Mentors

“I learned a lot about web scraping! Also, this was a new way of sharing knowledge. I tried to follow the initial instructions about letting the mentee speak and do, and that was an interesting and very new attitude for me. As a teacher, I have the habit of teaching!”

Christophe Bontemps, United Nations Statistical Institute for Asia and the Pacific

“I really enjoyed it. I enjoyed working together with my mentee from Zimbabwe and seeing him learn a lot of new skills while working on the project.”

Laurent Smeets, Ghana Statistical Service

“A great programme that promotes cross-country networking and emphasises the importance of mentoring teams rather than one mentee, at least at international level. I have enjoyed working with such a great and hardworking team. We have agreed to keep the collaboration moving forward and are looking forward to future projects.

The programme connected us to mentors and colleagues from different countries. I am looking forward to working with more regional and international teams.”

Hatem ElSherif, Federal Competitiveness and Statistics Centre, United Arab Emirates

Mentees

“The course helped me improve many skills such as project implementation, using software efficiently, guided study, data acquisition and handling. A great programme that helped me learn so much.”

Nguyen Thi Huyen, Vietnam Statistics

“I believe this programme of capacity building has been of good quality, productive and exciting. The collaboration with our designated mentor was incredibly supportive and helpful. In addition, our mentor’s flexibility in terms of arranging our weekly meetings has helped us ensure we can achieve great progress with our project.”

Ahlam Al Rosan, Department of Statistics Jordan

“I have learned very much from my mentor and I can only recommend this programme to anyone wishing to improve their data science knowledge and build on their network.”

Perkins Watambwa, Zimbabwe National Statistics Agency