Soybean Uniform Regional Trials (11 US States)

Rescuing Trial Data

Soybean breeders throughout the Upper Midwest have carried out uniform field trials since the 1940s. Today, these trials are a coordinated effort among public sector breeders from 11 U.S. states and 2 Canadian provinces.

Access to a cleaned, properly documented, and well-organized set of historical field trial data is invaluable for current crop breeding efforts. However, much of the historical data for these soybean trials exists in hardcopy logbooks, on old floppy disks, and in multiple electronic spreadsheets. Professor Aaron Lorenz, the soybean breeder at the University of Minnesota, is working with G.E.M.S™ to rescue and rehabilitate these historical field trial records into a fully functional database, which is then being migrated to the USDA’s T3 (The Triticeae Toolbox) and to GEMSOpen (the open data repository of G.E.M.S). As a result, these valuable data become readily accessible to a larger research community. In GEMSOpen, the trial data are also made functionally interoperable with a growing suite of data cleaning and analytical tools in GEMSTools, along with geospatially tagged climate and related data.

PedTools—Cleaning Crop Pedigrees

Reliable crop pedigree information can inform breeding efforts. However, a universal problem besetting crop breeding programs around the world is that recorded pedigrees are often incomplete, inconsistent, and riddled with unidentified aliases, both in breeders’ logbooks and in public repositories like GRIN. Consequently, the known pedigree relationships among breeding lines are often limited, resulting in shallow and misleading genealogies.

For some time, the G.E.M.S team has been developing a computational cleanup tool to accelerate the process of correcting and expanding messy and incorrect pedigree information, in many instances enabling the genealogy of modern breeding lines to be traced back to landraces. PedTools can currently handle soybean and wheat pedigree data, and is being extended to handle corn and apple pedigrees.

What Is It, and What Does It Do?

PedTools is a semi-automated Python pipeline for pedigree cleanup and standardization with five major functions:

  • standardization of pedigree formats (e.g., Purdy notation);
  • pedigree correction – identify lines with multiple pedigrees and remove ambiguities (e.g., unpaired parentheses, which delimit a cross);
  • alias identification – identify all IDs used to refer to a soybean line and replace all occurrences of that line in the dataset with a single ID;
  • pedigree reconstruction – once names and pedigrees are corrected and standardized, PedTools uses this information to expand the genealogy of a line;
  • calculation of the coefficient of coancestry among soybean lines – offering options for inbred, non-inbred, and partially inbred lines.

In addition to these major functions, PedTools’ built-in tracking system allows new entries to be integrated into an already clean dataset, substantially reducing processing time. It also exports pedigrees in the format required by Helium, a pedigree visualization tool.
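
As a rough illustration of the pedigree correction and alias identification steps listed above, the Python sketch below flags pedigree strings with unpaired parentheses and rewrites known aliases to a single ID. This is not PedTools code: the function names, the " x " cross separator, and the line names are all assumptions made for this example.

```python
import re


def has_balanced_parentheses(pedigree: str) -> bool:
    """Return True if every '(' in the pedigree string has a matching ')'."""
    depth = 0
    for ch in pedigree:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


def replace_aliases(pedigree: str, alias_to_id: dict[str, str]) -> str:
    """Rewrite each line name in the pedigree to its single preferred ID.

    Assumes (hypothetically) that crosses are written with ' x ' and
    grouped with parentheses, e.g. 'A x (B x C)'.
    """
    # Split on the cross separator and parentheses, keeping the delimiters.
    parts = re.split(r"(\s+x\s+|[()])", pedigree)
    cleaned = []
    for part in parts:
        name = part.strip()
        # Names found in the alias table are replaced; everything else is kept.
        cleaned.append(alias_to_id.get(name, part) if name else part)
    return "".join(cleaned)


# Hypothetical alias table: two alternative names for the same line.
aliases = {"A-1": "LineA", "Line A": "LineA"}

raw = "A-1 x (LineB x Line A)"
ok = has_balanced_parentheses(raw)                          # True
broken = has_balanced_parentheses("A-1 x (LineB x Line A")  # False
clean = replace_aliases(raw, aliases)
print(clean)  # LineA x (LineB x LineA)
```

In practice the hard part is building the alias table in the first place; the sketch assumes it already exists and only shows how it might be applied once a pedigree string has passed the parenthesis check.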

Pedigrees of two soybean lines prior to PedTools. The pedigrees of two soybean samples (red dot and red box) are represented. Samples in the pedigrees with multiple names (aliases) are shown as dots of the same color (green, purple, and yellow), while all other lines are shown in gray.

Pedigrees of the same two soybean lines after PedTools. Samples with multiple IDs were reduced to a single ID (green, purple, and yellow dots). The pedigrees of these two lines now go back to five landraces originating from China and Japan. The coefficient of coancestry for the two soybean lines is estimated to be 0.02.
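
For readers unfamiliar with the coefficient of coancestry, the sketch below applies the standard textbook recursion to a tiny, made-up pedigree. It is not PedTools' implementation: the `make_coancestry` function, the dictionary layout, and the line names are assumptions for illustration, founders are assumed unrelated, and only the fully inbred and non-inbred cases are covered (not partially inbred lines).

```python
from functools import lru_cache


def make_coancestry(pedigree: dict[str, tuple[str, str]], inbred: bool = True):
    """Build a coancestry function f(x, y) for a pedigree.

    pedigree maps each line to its two parents; lines that do not appear
    as keys are treated as unrelated founders.
    """

    @lru_cache(maxsize=None)
    def depth(line: str) -> int:
        # Founders have depth 0; a cross is one level below its deepest parent.
        if line not in pedigree:
            return 0
        return 1 + max(depth(p) for p in pedigree[line])

    @lru_cache(maxsize=None)
    def f(x: str, y: str) -> float:
        if x == y:
            if inbred:
                return 1.0  # fully inbred line: coancestry with itself is 1
            if x not in pedigree:
                return 0.5  # non-inbred founder
            p1, p2 = pedigree[x]
            return 0.5 * (1.0 + f(p1, p2))
        # Expand the deeper line, which cannot be an ancestor of the other.
        if depth(x) < depth(y):
            x, y = y, x
        if x not in pedigree:
            return 0.0  # two distinct founders are assumed unrelated
        p1, p2 = pedigree[x]
        return 0.5 * (f(p1, y) + f(p2, y))

    return f


# Hypothetical pedigree: C and D share one parent, A.
toy = {"C": ("A", "B"), "D": ("A", "E")}

f_inbred = make_coancestry(toy, inbred=True)
f_outbred = make_coancestry(toy, inbred=False)
print(f_inbred("C", "D"))   # 0.25
print(f_outbred("C", "D"))  # 0.125
```

The deeper and more complete the cleaned pedigree, the more shared ancestors a recursion like this can find, which is why alias resolution and pedigree reconstruction need to happen before coancestry is estimated.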