# Modelled Parasite Rate Maps
The Malaria Atlas Project (MAP) creates the Parasite Rate maps for the World Malaria Report country and regional profiles.
Understanding how malaria risk varies across a country and within administrative units is essential for malaria control and elimination planning. Malaria case counts are typically reported from health facilities at the district or provincial level. While this gives a broad estimate of within-country variation, it lacks precision due to i) under-reporting from facilities and ii) large-scale reporting units.
Furthermore, not all people contracting malaria seek treatment from public healthcare facilities. Some seek treatment from facilities that do not report (such as private healthcare facilities), and some do not seek treatment at all. This contributes to under-reporting of malaria cases.
To address this, MAP uses statistical methods to estimate the burden of malaria in endemic countries at fine-scale resolution (5 km x 5 km grids). Both parasite rate (prevalence) and clinical incidence (API, annual parasite incidence) are modelled and mapped. The outputs of these methods are published in scientific journals as data and maps. The parasite rate maps are used for the Plasmodium falciparum Parasite Prevalence maps seen in the World Malaria Report for African and some Asian countries.
- P. falciparum models and maps are currently available for 2015 Africa and 2010 rest of the world
- Global P. vivax models and maps are available using 2013 data
- Later in 2017, new global models will be published for P. falciparum and P. vivax, based on data from 1980-2015
This document provides an overview of the process by which the maps are created. Readers interested in details of the statistical methods used should read the papers listed in the references section.
# What data goes into the statistical models?
# 1. Published Survey Data
Researchers from governmental and non-governmental organisations conduct cross-sectional parasite rate surveys in which blood samples are taken from every person at a given location and tested. The results of these tests allow the calculation of the parasite rate: the percentage of people infected with a parasite. This is usually broken down by species and age band, for example children under 5 years old.
MAP has gathered nearly 60,000 survey records from published sources that contain data on the infection status of nearly 5 million individuals. These are largely concentrated in Africa, with survey data elsewhere in the world much scarcer. Please note that the survey data described above is distinct from the routine case data submitted by National Malaria Control Programme (NMCP) staff to the World Health Organization (WHO) for use in creating the Country Profile AFI and AVI maps. MAP does not use this routine case data in its statistical models as it has no permission to do so.
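As a small illustration of the calculation described above, the sketch below computes a parasite rate from survey counts. The function name and the survey figures are hypothetical and chosen only for illustration.

```python
def parasite_rate(positive: int, examined: int) -> float:
    """Return the parasite rate as a percentage of the people examined."""
    if examined <= 0:
        raise ValueError("examined must be a positive count")
    if not 0 <= positive <= examined:
        raise ValueError("positive must lie between 0 and examined")
    return 100.0 * positive / examined


# Hypothetical survey: 50 of 200 children under five test positive.
pr_under_fives = parasite_rate(positive=50, examined=200)
print(f"PR (under-fives): {pr_under_fives:.1f}%")
```

In practice the same calculation would be repeated for each species and age band recorded by the survey.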
# 2. NASA Satellite Data
The National Aeronautics and Space Administration (NASA) has numerous satellites acquiring imagery of the earth’s surface. Two of these satellite platforms carry MODIS sensor systems that provide daily imagery of the entire globe at moderate resolutions (e.g., 1 km x 1 km pixel sizes). By applying well-understood and thoroughly tested models to the information collected by the MODIS sensors, it is possible to create useful environmental covariates for malaria modelling (i.e., metrics known to contribute to vector habitat), such as land surface temperature and vegetation indices. While the satellites capture imagery each day, clouds often contaminate the images. As such, we use composite imagery (collected over 8- or 16-day periods) that takes only the best image from a period, thus increasing the likelihood of having a cloud-free view. However, even image composites will often contain clouds for areas such as dense forests, which we correct for using a gap-filling algorithm. After all our pre-processing steps, MAP ultimately produces global, gap-free data, with a resolution of 5 km x 5 km, for each month, for all years from 2000 to the present. These data constitute the core covariates that flow into our malaria models, thus helping us map where malaria transmission is most likely to occur.
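The idea of compositing followed by gap-filling can be sketched on a single row of pixels. This is a toy illustration only: the rule used here for the "best" observation (the first cloud-free value in the period) and the neighbour-averaging gap fill are assumptions made for the example, not the algorithms MAP uses, and all values are invented.

```python
def composite(daily_values, daily_cloud):
    """Per pixel, keep the first cloud-free observation in the period.

    Pixels that are cloudy on every day are returned as None (a gap).
    """
    n_pixels = len(daily_values[0])
    result = [None] * n_pixels
    for values, cloudy in zip(daily_values, daily_cloud):
        for i in range(n_pixels):
            if result[i] is None and not cloudy[i]:
                result[i] = values[i]
    return result


def fill_gaps(pixels):
    """Fill each gap with the mean of the nearest valid pixel on either side."""
    filled = list(pixels)
    for i, value in enumerate(pixels):
        if value is not None:
            continue
        left = next((v for v in reversed(pixels[:i]) if v is not None), None)
        right = next((v for v in pixels[i + 1:] if v is not None), None)
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


# Hypothetical land surface temperatures (deg C) for four pixels over two days.
temps = [[20.0, 21.0, 22.0, 23.0], [20.5, 21.5, 22.5, 23.5]]
cloud = [[False, True, True, False], [True, False, True, False]]

composited = composite(temps, cloud)   # third pixel is cloudy on both days
gap_free = fill_gaps(composited)
print(composited)
print(gap_free)
```

A production pipeline would work on two-dimensional rasters and would typically draw on neighbouring time periods as well as neighbouring pixels when filling gaps.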
# 3. Interventions
Household surveys such as Demographic and Health Surveys (DHS), Malaria Indicator Surveys (MIS), and Multiple Indicator Cluster Surveys (MICS), as well as country-level reports published by the WHO, provide details about the distribution of interventions such as bed nets, anti-malarial drugs, and spraying of houses with insecticides.
# How does the modelling work?
Imagine we have two locations, A and B, 50 km apart. A survey has been undertaken at location A indicating the P. falciparum PR is 25% for under-fives. We also know the covariates for location A: the temperature, humidity, wealth indicators, intervention coverage, and other variables outlined in the section above.
No survey has been undertaken at location B, so we do not have a figure for PR. However, we do have (1) all the covariate data for location B and (2) the spatial location of B relative to A and all other point locations for which we have PR data. In a simple statistical model (e.g., a least-squares regression) we could predict PR at point B using just the covariate data, but because we construct spatial models, the measured PR values from nearby locations also contribute to the final predicted value for point B.
The statisticians run their model to predict a PR value for location B, comparing the covariates for location A with those of B to inform this prediction. This predicted value for B might be (for example) a value of 30%. The statistical model is then re-run with a slight modification and a second prediction is made, this time 25%. A third run might predict 75%, a fourth run 25% again, a fifth run 22%, and so on.
This process is repeated a thousand times, with each run producing a new prediction of PR for location B. These thousand predictions form a distribution.
From this distribution, a mean PR value is calculated. This is the most likely value of PR that a survey conducted at location B would record. If the distribution is narrow, we can have confidence the predicted value is close to what a survey would find. However, the distribution can be very widely spread (from low PR values to very high ones), making the mean value less likely to accurately represent the true PR at location B. So the mean value on its own is of limited value: it needs to be presented with an indication of how confident we are it reflects the true value of PR for location B. This confidence is reflected in the 95% confidence interval for location B, which can be calculated from the distribution.
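The sketch below shows how a set of 1,000 predictions can be summarised as a mean and a 95% interval, and why two locations with the same mean can deserve very different levels of confidence. The draws are simulated from Beta distributions purely as a stand-in for real model runs, and taking the interval as the 2.5th to 97.5th percentiles of the draws is an assumption made for this example.

```python
import random
import statistics


def summarise_predictions(draws):
    """Return (mean, lower, upper), where lower and upper bound the central 95%."""
    # n=40 gives 39 cut points spaced 2.5% apart: the first is the 2.5th
    # percentile and the last is the 97.5th percentile.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.mean(draws), cuts[0], cuts[-1]


rng = random.Random(42)  # fixed seed so the example is repeatable

# Hypothetical stand-ins for 1,000 model runs at two locations. Both
# distributions have a mean PR of 25%, but very different spreads.
narrow_draws = [rng.betavariate(250, 750) for _ in range(1000)]
wide_draws = [rng.betavariate(1, 3) for _ in range(1000)]

for label, draws in (("narrow", narrow_draws), ("wide", wide_draws)):
    mean, lower, upper = summarise_predictions(draws)
    print(f"{label}: mean PR {mean:.1%}, 95% interval {lower:.1%} to {upper:.1%}")
```

Both locations would be shaded almost identically on a map of mean PR, which is why the interval needs to be read alongside it.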
In order to create maps, the statisticians divide the endemic world up into a grid of squares of approximately 5 km x 5 km (some of the maps are at 1 km x 1 km). Each of these grid squares is treated in a similar way to the example of locations A and B above. Those grid squares with surveys in them can be used to predict the PR for those squares that do not have surveys within their bounds. These grids are known as rasters and MAP’s work results in two rasters relevant to the World Malaria Report:
- The raster of PR values, which you see translated into maps on the World Malaria Report online country profiles
- The raster of the corresponding confidence intervals
Scientists using MAP’s outputs in their research use both rasters: the PR values plus the confidence that you can have that the PR value for any given area is correct.
The World Malaria Report only shows the maps of predicted PR. Often when staff from NMCPs see these maps, they dispute the predictions for certain areas. However, the corresponding map of confidence intervals is required to accurately interpret the PR maps: areas where the NMCP disputes the predicted PR values may correspond to areas where we have very low confidence that the predictions are correct.
All the predictions are dependent on the quality of data that feeds into the models. This is why MAP devotes such a large proportion of its resources to gathering, processing, and curating malariometric data. These data are greatly improved by the data contributions made directly from NMCPs.
# How are the maps made?
The modelling process described above is carried out at a global scale, to produce two world-wide raster surfaces.
Three subsequent steps are then undertaken to convert the PR raster into the Parasite Rate maps seen in the World Malaria Report:
- The application of aridity and temperature masks
  - Once the PR maps have been produced from the statistical models, they are modified to reflect those areas where it is known to be too cold or dry for the mosquito vectors to maintain transmission. These areas appear as white areas on the maps
- Cropping the raster to the region or country of interest
  - To create the regional and country profile maps, the relevant section of the PR raster is extracted using borders provided by the WHO’s legal department
- Conversion to EPS (Encapsulated PostScript) format and the addition of the legend
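The first two steps can be sketched on a tiny grid. This is an illustration under simplifying assumptions: the rasters are plain nested lists, masked cells are set to NaN, and the country extract is a rectangular window, whereas real extraction follows the border polygons. All function names and values are hypothetical.

```python
import math


def apply_masks(pr, too_cold, too_dry):
    """Set cells flagged as too cold or too dry to NaN (drawn as white)."""
    return [
        [math.nan if (cold or dry) else value
         for value, cold, dry in zip(pr_row, cold_row, dry_row)]
        for pr_row, cold_row, dry_row in zip(pr, too_cold, too_dry)
    ]


def crop(grid, row_slice, col_slice):
    """Extract a rectangular window of the grid (a stand-in for a border clip)."""
    return [row[col_slice] for row in grid[row_slice]]


# Hypothetical 2 x 3 PR raster with matching temperature and aridity masks.
pr = [[0.10, 0.20, 0.30],
      [0.40, 0.50, 0.60]]
too_cold = [[True, False, False],
            [False, False, False]]
too_dry = [[False, False, False],
           [False, True, False]]

masked = apply_masks(pr, too_cold, too_dry)
country = crop(masked, slice(0, 2), slice(1, 3))
print(country)
```

Using NaN for masked cells keeps them distinct from genuine predictions of zero PR when the raster is later coloured and exported.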
The above is a deliberately simplified explanation of the statistical approach to creating the burden estimate maps in the World Malaria Report. For a more detailed explanation, please see:
Bhatt, S., Weiss, D.J., Cameron, E., Bisanzio, D., Mappin, B., Dalrymple, U., Battle, K.E., Moyes, C.L., Henry, A., Eckhoff, P.A., Wenger, E.A., Briet, O., Penny, M.A., Smith, T.A., Bennett, A., Yukich, J., Eisele, T.P., Griffin, J.T., Fergus, C.A., Lynch, M., Lindgren, F., Cohen, J.M., Murray, C.J.L., Smith, D.L., Hay, S.I., et al. (2015). The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015. Nature 526, 207-211.
Gething PW, Patil AP, Smith DL, Guerra CA, Elyazar IR, Johnston GL, Tatem AJ, Hay SI: A new world malaria map: Plasmodium falciparum endemicity in 2010. Malar J 2011, 10:378.
Gething PW, Elyazar IR, Moyes CL, Smith DL, Battle KE, Guerra CA, Patil AP, Tatem AJ, Howes RE, Myers MF, et al: A long neglected world malaria map: Plasmodium vivax endemicity in 2010. PLoS Negl Trop Dis 2012, 6:e1814.
Battle KE, Gething PW, Elyazar IR, Moyes CL, Sinka ME, Howes RE, Guerra CA, Price RN, Baird KJ, Hay SI: The global public health significance of Plasmodium vivax. Adv Parasitol 2012, 80:1-111.
# Annual Parasite Incidence (API) Maps
MAP creates the Annual Parasite Incidence maps for the World Malaria Report country and regional profiles.
The API maps are of three types:
- API – Annual Parasite Incidence for all species
- AFI – Annual P. falciparum Incidence
- AVI – Annual P. vivax Incidence
These maps rely solely on data provided by country NMCP staff to the WHO. Unlike the Parasite Rate maps, there is no modelling process.
# What data goes into the API maps?
# 1. Routine case data
Each year, NMCP staff provide the WHO with a spreadsheet of routine case data broken down by subnational region. The subnational level varies between countries: some provide data at a high level, such as the state, while others provide it at lower district or county levels.
The figures supplied include counts of the following data for each of the subnational regions reported:
- The population
- The total number of malaria cases of all species
- The number of microscopy tests and rapid diagnostic tests (RDTs) undertaken
- The number of these tests that were positive and the species identified. This provides the number of confirmed cases of malaria by species
- The reporting completeness for the region. This is the percentage of health facilities within the region that submitted case figures
From these data, it is possible to calculate the following two important metrics:
- The slide positivity rate (SPR), calculated as: the number of positive microscopy tests divided by the total number of microscopy tests undertaken
- The number of unconfirmed cases of malaria, calculated as: the total number of malaria cases minus the number of positive microscopy and RDT tests
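These two metrics can be computed directly from the reported counts. The sketch below follows the definitions given in the list; the function names and the example figures are hypothetical.

```python
def slide_positivity_rate(positive_microscopy, total_microscopy):
    """SPR: positive microscopy tests divided by all microscopy tests."""
    if total_microscopy <= 0:
        raise ValueError("total_microscopy must be a positive count")
    return positive_microscopy / total_microscopy


def unconfirmed_cases(total_cases, positive_microscopy, positive_rdt):
    """Total reported cases minus those confirmed by microscopy or RDT."""
    return total_cases - (positive_microscopy + positive_rdt)


# Hypothetical region: 1,000 reported cases, 2,000 slides examined of which
# 300 were positive, plus 200 positive RDTs.
spr = slide_positivity_rate(positive_microscopy=300, total_microscopy=2000)
unconfirmed = unconfirmed_cases(total_cases=1000,
                                positive_microscopy=300,
                                positive_rdt=200)
print(f"SPR = {spr:.2f}, unconfirmed cases = {unconfirmed}")
```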
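API can be calculated from the following equation, which expresses incidence as cases per 1,000 population:

$$\text{API} = \frac{M}{\text{population}} \times 1000$$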
Where M is the total number of cases. However, just using the total number of malaria cases reported by the NMCP for the region does not take into account issues of under-reporting (i.e. a low reporting completeness percentage). Neither does it factor in misdiagnoses of malaria.
These two factors are addressed by using the following formula to calculate the number of cases for a sub-national region:

$$M = \frac{C + (U \times s)}{r}$$

where:
- C – The number of confirmed malaria cases (derived from the number of positive microscopy and RDT tests)
- U – The number of unconfirmed cases
- s – Slide positivity rate
- r – Reporting completeness
Slide positivity rate is applied to the number of unconfirmed cases to remove misdiagnoses. For example, if the slide positivity rate is 0.1 (i.e. only 10% of microscopy tests were positive), it is assumed that only 10% of the unconfirmed cases were genuine malaria cases. This is achieved by multiplying the number of unconfirmed cases by 0.1.
Reporting completeness is used as the denominator in order to adjust for under-reporting. For example, if only 10% of the health facilities for a region report, the denominator is 0.1, resulting in the multiplication of the confirmed and unconfirmed cases (adjusted by SPR) by a factor of 10.
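The adjustment described above can be written as a short worked example. This is a sketch rather than MAP's production code: the function names and input figures are hypothetical, and scaling API to cases per 1,000 population is the usual convention, assumed here.

```python
def adjusted_cases(confirmed, unconfirmed, spr, completeness):
    """Estimate cases as (C + U * s) / r.

    confirmed    -- C, confirmed cases (positive microscopy and RDT tests)
    unconfirmed  -- U, unconfirmed cases
    spr          -- s, slide positivity rate as a fraction between 0 and 1
    completeness -- r, reporting completeness as a fraction, 0 < r <= 1
    """
    if not 0 <= spr <= 1:
        raise ValueError("spr must be between 0 and 1")
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be greater than 0 and at most 1")
    return (confirmed + unconfirmed * spr) / completeness


def api_per_1000(cases, population):
    """Annual parasite incidence, assumed here to be per 1,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * cases / population


# Hypothetical region: 500 confirmed and 500 unconfirmed cases, an SPR of
# 0.1, half of all facilities reporting, and a population of 50,000.
cases = adjusted_cases(confirmed=500, unconfirmed=500, spr=0.1, completeness=0.5)
api = api_per_1000(cases, population=50_000)
print(f"adjusted cases = {cases:.0f}, API = {api:.1f} per 1,000")
```

Here only 50 of the 500 unconfirmed cases are counted, and the resulting 550 cases are doubled to allow for the half of facilities that did not report.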
# 2. Subnational boundary definitions
Software packages are used to create maps, and these require files that define the geometry of the national and subnational administrative boundaries of a country. One format in which these geometry definitions can be stored is the shapefile. MAP relies on shapefiles to create maps, and uses files available from a variety of online gazetteers, most notably Global Administrative Unit Layers (GAUL) and the Database of Global Administrative Areas (GADM).
# Creating the maps
The data supplied by the NMCP staff is passed to MAP staff who undertake an exercise of matching up the rows of data supplied with the appropriate shapefiles in order to be able to create the maps.
This process of data to shapefile matching is often problematic for a number of reasons, including:
- Differences in spelling, particularly from languages not using the Latin character set (for example, Thai, Arabic, and Korean)
- Differences in abbreviations used by the native language in the source data and the translations in the shapefiles
- Differences in the names used for a given area between the source data and the shapefiles
MAP undertakes to make the best possible match between the source data and the available shapefiles, consulting with the WHO to resolve issues as far as is possible.
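A first pass at this kind of matching can be automated before unresolved names are reviewed by hand. The sketch below is one possible approach, not a description of MAP's workflow: it normalises names (lower case, accents and punctuation removed) and then uses approximate string matching from Python's standard library. The place names and the 0.8 similarity cutoff are assumptions for illustration, and names in non-Latin scripts would first need transliteration, which is not handled here.

```python
import difflib
import unicodedata


def normalise(name):
    """Lower-case a name and strip accents and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def match_names(source_names, shapefile_names, cutoff=0.8):
    """Map each source name to its closest shapefile name, or None."""
    lookup = {normalise(name): name for name in shapefile_names}
    matches = {}
    for name in source_names:
        close = difflib.get_close_matches(
            normalise(name), list(lookup), n=1, cutoff=cutoff
        )
        matches[name] = lookup[close[0]] if close else None
    return matches


# Hypothetical names as they might appear in each source.
shapefile_names = ["Chiang Mai", "Nakhon Ratchasima", "Phuket"]
source_names = ["Chiangmai", "Nakorn Ratchasima", "Bangkok"]

for source, matched in match_names(source_names, shapefile_names).items():
    print(f"{source!r} -> {matched!r}")
```

Names that come back as None, such as a region missing from the shapefile, are the ones that would need manual resolution.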
Once the data is matched to shapefiles, a geographic information system (GIS) is used to create and export the maps. Adobe Illustrator is then used to adjust map colours and the visibility of disputed boundaries if necessary, and to produce EPS (Encapsulated PostScript) format maps. Map legends are also produced using Adobe Illustrator and supplied separately. The national-level boundaries used are supplied by the legal department of the WHO.