Opening Up African Farming Data
Improving the performance of African farms is enabled by access to reliable, multifaceted data that link information on farm size, farm location, input use, and technology choices to the local climate, soils, market access, and other circumstances faced by African farm families.
Dearth of Usable Data
The dearth of usable data about African farming operations is beginning to be addressed, but there is a long way to go. In 2009, the World Bank launched the LSMS-Integrated Surveys on Agriculture (LSMS-ISA). These data are beginning to make a difference, but valuable as they are, they cover a comparatively limited share of the African agricultural landscape: presently 8 countries, accounting for 25.9 percent of sub-Saharan Africa’s agricultural land in 2015, 45.9 percent of the region’s economically active population in agriculture, and 47.3 percent of its agricultural output by value in 2016, according to FAO statistics.
Complementing this effort, the University of Minnesota’s InSTePP center led an undertaking to compile spatially calibrated, subnational farming data for more than one million surveyed farm households, drawn from national agricultural censuses, reports, and surveys. These data span 25 countries, accounting for 62.8 percent of sub-Saharan Africa’s agricultural land in 2015, 75.6 percent of its economically active population in agriculture, and 78.8 percent of its agricultural output by value in 2016.
Working with G.E.M.S, InSTePP is making these spatially explicit farm data accessible via GEMSOpen in ways that make them interoperable with numerous other datasets, such as climate, terrain, market access, soils, and demographic data.
Rescuing African Agricultural Data
In 2007, the University of Minnesota, the Universities of Pretoria and Stellenbosch, and the Agricultural Research Council of South Africa launched the African Agricultural Data Rescue Initiative (AADRI), with financial support from the Bill and Melinda Gates Foundation through the HarvestChoice project. The initiative was launched with assistance from the Statistics Division of the Food and Agriculture Organization of the United Nations (FAO) and in collaboration with agricultural statistical agencies throughout the region.
AADRI’s mission is to discover, retrieve, scan, and make available online past agricultural censuses and reports from throughout the region. The initiative also develops summary representations of these data and uses them to undertake a range of policy-related analyses, often within an explicitly georeferenced framework.
African Farming Landscapes Database
Drawing on the AADRI undertaking, InSTePP developed version 2.0 of the African Farming Landscapes (AFL) database. AFL-v2.0 includes digitized statistical survey data compiled at the ADM1 level (administrative unit level 1), the first subnational level of data stratification. It contains thoroughly documented data for 55 core variables drawn from 50 primary data sources. Documentation was compiled for each country so that every entry in the database can be traced back to the specific page and/or table of the primary source from which it was obtained.
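To make that traceability concrete, each database value can be pictured as carrying its own citation. The Python sketch below is a hypothetical illustration only: the class name `ProvenancedValue`, its fields, and the `citation()` helper are assumptions rather than AFL-v2.0's actual schema, and the values are placeholders, not real survey data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProvenancedValue:
    """A single ADM1-level value plus the location it was copied from.

    Hypothetical schema for illustration; not AFL-v2.0's actual table layout.
    """
    country: str
    adm1: str                    # first subnational administrative unit
    variable: str                # harmonized variable name
    year: int
    value: float
    unit: str
    source_id: str               # key into a per-country source bibliography
    page: Optional[int] = None   # page in the primary source, if recorded
    table: Optional[str] = None  # table label in the primary source, if recorded

    def citation(self) -> str:
        """Describe where in the primary source this value was obtained."""
        parts = [self.source_id]
        if self.page is not None:
            parts.append(f"p. {self.page}")
        if self.table is not None:
            parts.append(f"table {self.table}")
        return ", ".join(parts)


# Placeholder values, not real AFL data.
entry = ProvenancedValue(
    country="Country A", adm1="Region 1", variable="fertilizer_users",
    year=2000, value=1250.0, unit="households",
    source_id="SRC-01", page=42, table="3.1",
)
print(entry.citation())  # SRC-01, p. 42, table 3.1
```

Making the record immutable (`frozen=True`) is one way to keep a value and its provenance from drifting apart after entry.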
InSTePP is working to incorporate AFL-v2.0 into GEMSOpen, providing open access to these hard-won data. The team is beginning with data on farm use of agricultural inputs, specifically improved seed, fertilizers, other farm chemicals, and mechanization inputs. As resources permit, other variables will be processed and made openly accessible via the platform.
Initially, the spatialized farm input data and associated metadata will be accessible through GEMSOpen’s “geek interface.” The GEMS team is also developing a visualization tool to streamline data access and export for more general users.
Turning reported statistics into usable data is a time-consuming and difficult process. The InSTePP team developed a set of procedures that are increasingly being incorporated as workflows within the G.E.M.S agroinformatics platform. Key steps involve using GEMSTools to:
- Harmonize variable names and definitions across disparate sources, reconciling varying categorizations and somewhat inconsistent units of measurement
- Detect and clean outliers, documenting every step in the process to ensure replicability
- Resolve the many types of spatial anomalies that arise when linking subnational statistical data to digitized boundary files
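The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GEMSTools' API or InSTePP's actual procedure: the lookup tables, the function names (`harmonize`, `flag_outliers`, `normalize`, `match_to_boundaries`), and all region names and values are hypothetical placeholders. For outlier detection it swaps in one common robust rule, the modified z-score based on the median absolute deviation (flagging scores above 3.5), and it records each decision in a log so the run can be reproduced.

```python
import statistics
import unicodedata

# Hypothetical lookup tables for illustration; not part of GEMSTools.
VARIABLE_ALIASES = {"fert_kg": "fertilizer_applied",
                    "fertiliser_applied": "fertilizer_applied"}
UNIT_FACTORS = {("kg", "t"): 0.001, ("t", "t"): 1.0}  # (from, to) -> multiplier


def harmonize(record, target_unit, log):
    """Rename a source-specific variable to its canonical name and convert units."""
    name = VARIABLE_ALIASES.get(record["variable"], record["variable"])
    factor = UNIT_FACTORS[(record["unit"], target_unit)]
    log.append(f"{record['adm1']}: {record['variable']} -> {name}, "
               f"{record['unit']} -> {target_unit} (x{factor})")
    return {**record, "variable": name, "unit": target_unit,
            "value": record["value"] * factor}


def flag_outliers(values, threshold=3.5):
    """Return indices whose MAD-based modified z-score exceeds `threshold`."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def normalize(name):
    """Casefold, strip accents, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.casefold() if c.isalnum())


def match_to_boundaries(stat_names, boundary_names, renames=None):
    """Map each statistical ADM1 name to a boundary-file name, or None if unmatched."""
    renames = renames or {}
    index = {normalize(b): b for b in boundary_names}
    return {n: index.get(normalize(renames.get(n, n))) for n in stat_names}


# Placeholder records, not real census data.
raw = [
    {"adm1": "Région Nord", "variable": "fert_kg", "value": 1200.0, "unit": "kg"},
    {"adm1": "Sud-Est", "variable": "fertiliser_applied", "value": 1.5, "unit": "t"},
    {"adm1": "Ouest", "variable": "fert_kg", "value": 1400.0, "unit": "kg"},
    {"adm1": "Centre", "variable": "fert_kg", "value": 950000.0, "unit": "kg"},
    {"adm1": "Old Province", "variable": "fert_kg", "value": 1100.0, "unit": "kg"},
    {"adm1": "Lac", "variable": "fert_kg", "value": 1300.0, "unit": "kg"},
]
log = []
clean = [harmonize(r, "t", log) for r in raw]           # step 1: harmonize
outliers = flag_outliers([r["value"] for r in clean])  # step 2: detect outliers
for i in outliers:
    log.append(f"{clean[i]['adm1']}: flagged for review ({clean[i]['value']} t)")
matches = match_to_boundaries(                          # step 3: spatial linking
    [r["adm1"] for r in clean],
    ["Region Nord", "Sud Est", "Ouest", "Centre", "New Province"],
    renames={"Old Province": "New Province"},  # e.g., a renamed unit
)
print(outliers)        # [3]
print(matches["Lac"])  # None
print("\n".join(log))
```

Flagged values are logged for review rather than silently dropped, because deciding how to clean a suspicious entry (for example, whether it reflects a unit error in the source) is itself a step that should be documented. Names left as `None` after matching mark the spatial anomalies that still need manual resolution.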