15 April 2020
Assystem is participating in the Kaggle COVID-19 Open Research Dataset (CORD-19) challenge to develop text and data mining tools that can help the medical community answer high-priority scientific questions.
In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). This freely available dataset is provided to the global research community. The associated challenge was organised to develop tools that help the medical community answer high-priority scientific questions. Assystem decided to participate.
This challenge is hosted on Kaggle and involves analyzing a large body of research and data about COVID-19. It aims to help the global community better understand the disease using data science tools (NLP, data mining, etc.).
Within the framework of this challenge, the Assystem team, made up of Paolo Minelli, Aleksei Iancheruk, Ali Kabbadj, Kien Trung Dang and Zakaria Bouhoun, five of our experts in Natural Language Processing (NLP) and data mining, chose to focus its research on the following questions: What do we know about COVID-19 risk factors? What have we learned from epidemiological studies?
The objective of the team is to find out what the literature reports about:
- Data on potential risk factors
- Transmission dynamics of the virus, including the basic reproductive number, incubation period, serial interval, modes of transmission and environmental factors
- Severity of disease, including the risk of fatality among symptomatic hospitalized patients, and high-risk patient groups
- Susceptibility of populations
- Public health mitigation measures that could be effective for control
To do so, our team is working on text and data mining using natural language processing, search engines and machine learning tools that it developed for earlier projects. Reusing these existing tools should help the team address these questions.
"We felt it was important to take part in this challenge to support the scientific community. We bring our expertise in a context where mutual cooperation is more necessary than ever," says Paolo Minelli, Data Science Team Manager at Assystem.
And what is the prize? Kaggle is sponsoring a $1,000-per-task award for the winner whose submission is identified as best meeting the evaluation criteria. The winner may elect to receive this award as a charitable donation to COVID-19 relief/research efforts or as a monetary payment.
As part of the fight against this pandemic, Assystem is committed to using its digital expertise to serve society and the healthcare sector.
More information on the team's report is available here: https://www.kaggle.com/alekseiiancheruk/assystem-covid-must-die-risk-factors-analysis
To learn more about Kaggle and its challenges, visit: https://www.kaggle.com/