National Soil Information Systems in 5 steps

The 5 steps presented below are intended as a high-level overview of the activities needed to establish national-level soil and landscape information systems over a planning period of about 18-24 months. The timeline and cost ranges at the bottom of each step are indicative and based on standard AfSIS protocols; actual figures would depend on specific national requirements and implementation capacities. Steps 3 & 4 can be carried out in parallel over the initial 18-24 month period.

Step 1: Product identification (begin at the end!)
This is possibly the most crucial step in planning for national-level soil and land resource information systems. There is no point in collecting data that no one will use for decision-making. Hence, start by identifying key users and describing the specific products that are widely recognized as essential for making important land management decisions.
  • Develop a prioritized checklist of key users and the specific products that they request. This should be formalized in a use case document.
  • Provide worked examples of the main proposed products to allow users to visualize products. Use legacy data for this purpose, where possible and appropriate.
  • Work backwards from use case analyses to prioritize and timeline activities with requirements documents (detailed work plans).
  • Consider what level of training and capacity building is needed to produce and maintain the primary information system and its infrastructure.
  • Cost everything and decide if the thing you are producing is still a priority. Scenario and sensitivity analyses of the potential return on investment may be useful at this stage.

time: 1-2 months, cost: 30-50k U$

Step 2: Delineate the region of interest (ROI)
Clearly identify the (geographical) region of interest in any country. Considerable time, effort and money can be wasted if the specific areas of interest are not clearly identified and agreed upon at the outset. From a statistical perspective, the ROI is equivalent to the population about which analysts seek to draw inferences from data.
  • Identify a region of interest (ROI), e.g. “the bread-basket cropland area of Ghana”, or “the maize growing area of Tanzania” by consulting widely.
  • Use any data or tools (e.g., GeoSurvey) needed to help produce accurate maps of the ROI, e.g., with empirical crop distribution models or through other means.
  • Produce a map clearly delineating the ROI, and version this map regularly if it is likely to change.

time: 1-3 months, cost: 15-25k U$
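
As a minimal sketch of treating the ROI as the target population, the snippet below stores a versioned ROI outline and checks whether a candidate location falls inside it. The `ROI` class, the `contains` function, the version label and the square outline are all hypothetical placeholders rather than an actual ROI; a real system would more likely rely on a GIS library and a properly projected boundary file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ROI:
    """A named, versioned ROI outline (hypothetical structure)."""
    name: str
    version: str
    ring: tuple  # (lon, lat) vertices in order; the ring is closed implicitly


def contains(roi, lon, lat):
    """Ray-casting point-in-polygon test against the ROI outline."""
    inside = False
    pts = roi.ring
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        # Only edges that straddle the horizontal line through `lat` can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Made-up square outline, for illustration only.
demo_roi = ROI(name="demo cropland ROI", version="v1",
               ring=((0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)))
print(contains(demo_roi, 1.0, 1.0), contains(demo_roi, 3.0, 1.0))
```

Carrying the version label alongside the outline means that any sample or prediction can record which ROI version it was drawn against.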

Step 3: Sample the ROI
This is potentially the most time-consuming and expensive step in producing initial soil information content, and thus warrants careful planning and costing. Our advice is to:
  • Use power analyses to determine cost/accuracy of your sampling designs and pick one of several potential sampling plans that reflect the available budget.
  • Produce a map of the proposed sampling locations … and stick to those.
  • Differentiate core data versus optional data. This applies equally to crop yield, yield response and yield monitoring data. It is absolutely essential that all observations and measurements are georeferenced and time stamped.
  • Identify how to collect and reserve data for prediction model validation (step 5).
  • Identify and select the most appropriate protocols for labeling and tracking physical samples from the field, through lab analyses to databases. We recommend our universally unique soil sample ID protocol here.
  • Select and document standard field operating procedures (FSOPs), and where possible translate those to GPS- and time-stamp-enabled field data logging apps.
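
The record-keeping points above can be sketched as follows: each observation carries coordinates, a UTC time stamp and a unique identifier, and a fixed fraction of records is reserved for model validation in step 5. This is an illustration under stated assumptions, not the sample ID protocol referred to above: `SampleRecord`, `reserve_validation`, the use of random UUIDs and the 20% hold-out fraction are all hypothetical choices.

```python
import random
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SampleRecord:
    """One georeferenced, time-stamped field observation (hypothetical schema)."""
    lon: float
    lat: float
    observed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    # Random UUID used as a stand-in for a universally unique sample ID.
    sample_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    validation: bool = False  # True if reserved for model validation


def reserve_validation(records, fraction=0.2, seed=42):
    """Flag a random, reproducible subset of records as validation data."""
    rng = random.Random(seed)
    k = round(len(records) * fraction)
    for rec in rng.sample(records, k):
        rec.validation = True
    return records


records = [SampleRecord(lon=float(i), lat=0.0) for i in range(10)]
reserve_validation(records)
print(sum(r.validation for r in records), "of", len(records), "reserved")
```

Fixing the random seed keeps the training/validation split reproducible when the analysis is rerun.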

time: 6-12 months, fixed cost for survey outfitting: 75-135k U$, variable cost: 3-6 U$ per geopoint location
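
To illustrate how a power analysis and the indicative costs above might be weighed against each other, the sketch below computes the sample size needed to detect a given mean difference and the survey cost as a fixed amount plus a per-geopoint amount. It assumes a simple two-sided one-sample z-test with independent observations, which ignores spatial autocorrelation and stratification; the function names and the example standard deviation, difference and budget are hypothetical.

```python
import math
from statistics import NormalDist


def required_n(sd, delta, alpha=0.05, power=0.8):
    """Sample size to detect a mean shift `delta` with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sd / delta) ** 2)


def survey_cost(n, fixed, per_point):
    """Total survey cost: fixed outfitting cost plus a cost per geopoint."""
    return fixed + n * per_point


def affordable_n(budget, fixed, per_point):
    """Largest number of geopoints that fits within `budget`."""
    return max(0, math.floor((budget - fixed) / per_point))


# Hypothetical inputs: sd and delta are in the units of the soil property.
n = required_n(sd=1.0, delta=0.5)
low = survey_cost(n, fixed=75_000, per_point=3)    # lower end of the ranges above
high = survey_cost(n, fixed=135_000, per_point=6)  # upper end of the ranges above
print(n, low, high, affordable_n(100_000, 75_000, 5))
```

Comparing `required_n` with `affordable_n` for each candidate design shows quickly whether a target accuracy is reachable within the available budget.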

Step 4: Analyze samples taken from the ROI
Soil and plant samples will typically require laboratory analyses. With modern measurement techniques, this should be the second most time-consuming and expensive step in producing content and maintaining soil information systems, after step 3.
  • Identify, cost and prioritize all soil properties needed to generate key products (step 1).
  • Select and document standard laboratory operating procedures (LSOPs).
  • Specify which samples will be analyzed using conventional methods vs newer methods such as MIR, XRF and/or LDPSA. Power analyses are useful here again.
  • Describe and cost the necessary lab infrastructure and make it operational and sustainable.
  • Identify and implement options for data management, e.g. cloud-based vs local database solutions.
  • Ensure that databases are open standards compliant (e.g. OGC).
  • Archive all collected soil samples for future analyses.

time: 3-6 months, fixed cost labs: 150-280k U$, variable costs: 1-4 U$ per soil sample

Step 5: Produce space(time) predictions for the ROI
Spatial and/or space-time predictions will eventually form the basis for most of the products identified in step 1 above.
  • Identify, obtain, and collate gridded spatial data to use as covariates for spatial predictions and as mask files to identify areas for inclusion or exclusion from analysis (e.g.
  • Identify how to generate and evaluate spatial predictions and maps with the relevant model training, stacking, validation and testing procedures.
  • Decide on how to update the spatial predictions as additional data become available over time.
  • Practice reproducibility - write literate prediction code that people can modify and track over time. Generally, there is no use in writing something with software that no one can afford to buy, contribute to and/or modify.
  • If possible, use a venue such as Kaggle to ensure that all predictions receive “the best possible” attention from professional data scientists.
  • Version all code in Git or Subversion, and return to step 1 frequently.

time: 1-3 months, cost: 15-25k U$
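
As a rough sketch of the train-and-validate loop described in this step, the snippet below predicts a soil property at held-out locations from training locations and reports the root-mean-square error. Inverse distance weighting is used only as a simple stand-in for whatever prediction model is actually chosen, distances are computed on raw coordinates for brevity, and the function names and the toy data are hypothetical.

```python
import math


def idw_predict(train, lon, lat, power=2.0):
    """Inverse-distance-weighted prediction from (lon, lat, value) tuples."""
    num = den = 0.0
    for x, y, value in train:
        d = math.hypot(lon - x, lat - y)
        if d == 0.0:
            return value  # exact hit on a training location
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def rmse(train, holdout):
    """Root-mean-square error of predictions at held-out locations."""
    sq_errors = [(idw_predict(train, x, y) - value) ** 2
                 for x, y, value in holdout]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Toy data for illustration only: (lon, lat, measured value).
train = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
holdout = [(1.0, 0.0, 2.0)]
print(idw_predict(train, 1.0, 0.0), rmse(train, holdout))
```

Keeping the hold-out set fixed (as reserved in step 3) lets successive model versions be compared on the same validation data as new observations arrive.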