
Training in machine learning using Stata

Dr Ulla Sovio (Obstetrics & Gynaecology)

Funding round: 2021–2022

What is your research about?

I study the risk of pregnancy complications by applying statistical analysis methods to large, complex datasets. My main data source is the Pregnancy Outcome Prediction (POP) study, a prospective cohort of over 4,000 unselected first pregnancies in Cambridge. The pregnancy complications I study include fetal growth abnormalities, preeclampsia, gestational diabetes, and preterm birth. I aim to predict the risk of these complications using multiple types of data, such as maternal characteristics, ultrasound measurements of fetal size, and biomarkers from maternal blood at different stages of pregnancy. The goal of my research is to identify new predictors that could improve the detection of high-risk pregnancies early enough to allow intervention and improve pregnancy outcomes. Using the POP study data, we have developed a new screening test for high-risk pregnancies, which will be evaluated in POP study 2.

What is your chosen development activity?

My current research involves the analysis of metabolomics and proteomics data, which requires me to familiarise myself with machine learning methods developed for large, complex datasets. Stata has been my main statistical analysis tool, and I would like to make the most of the packages available in Stata to perform various machine learning analyses. I am very grateful to Cambridge Reproduction SRI for funding my application to the Development fund. This will enable me to participate in a 2-day online machine learning course to learn new skills.

How will this benefit your research?

So far, my discoveries have come mainly from traditional statistical analysis methods. New skills in machine learning may help me identify more predictive biomarkers for pregnancy complications. I have recently started supervising an MPhil student whose dissertation will apply machine learning to maternal serum metabolites in relation to spontaneous preterm birth. What I learn on the course will help me supervise this student. I also expect the new knowledge and skills to benefit my current and future collaborations with other research groups.


Ulla Sovio is a Principal Research Associate in the Department of Obstetrics & Gynaecology. Her research focusses on the application of statistical analysis methods in the prediction of pregnancy complications.