

Cotton Research Data Standardization and Centralization
USDA-ARS Partnerships for Data Innovations works with US cotton industry and research groups to leverage cotton data from dirt to shirt.
Why standardize and centralize cotton research data?
The US cotton industry, like many other agricultural industries, is robust and continually seeks ways to grow, process, and manufacture cotton products efficiently and sustainably. Researchers from universities, federal agencies, and non-profit organizations such as Cotton Incorporated collect large amounts of data throughout the cotton growing, processing, and manufacturing phases. While this data has value to individual researchers and the questions they seek to answer, its power can be amplified when it is shared among researchers, growers, processors, and manufacturers.
The US cotton industry has more than a quarter century of history in providing its customers with fiber quality data on each bale produced, through a fee-based service of the United States Department of Agriculture Agricultural Marketing Service (AMS). Recognizing the value of standardized and centralized data, cotton affiliates within the USDA's Agricultural Research Service (USDA-ARS) and Cotton Incorporated partnered with USDA-ARS's Partnerships for Data Innovations (PDI) on an effort to standardize and centralize cotton research data.

Starting small with working groups
To facilitate quick and efficient progress, we put together three stakeholder groups, each composed of a small but diverse group of university, private, and federal researchers and industry representatives. These groups have expertise in the following areas:
- Breeding and variety trials
- Agronomics
- Fiber quality testing
These groups meet on a regular basis and collaborate with the PDI team to:
- Generate a standardized list of cotton data variables and units
- Centralize cotton data into a cloud-based database that can integrate with data exploration and visualization tools
- Produce and modernize electronic field data collection applications
- Create data query and output tools
- Design data visualization products
Over time, these groups will be expanded, and their initial efforts will be presented to a broader group of cotton researchers and representatives for feedback.
Here, we showcase several ongoing products of the Cotton-PDI collaboration.

What data do cotton researchers collect?
Our effort began by asking our stakeholder groups what types of data they collect and how they collect it.
Our stakeholders submitted spreadsheets and databases, copies of reports and field datasheets, data collection protocols, and more. We are currently consolidating these resources into a single data dictionary in which variables and standard units are defined, data variables common across research groups are identified, and variables are grouped into logical types. Once the data dictionary is developed, additional stakeholders will be asked to review it and identify any missing variables. The data dictionary will then serve as the backbone of a standardized, centralized, cloud-based cotton database.
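As a rough illustration of what a data dictionary entry might hold, the sketch below pairs each standardized variable with a unit, a logical type, and the column names under which it might appear in submitted files. The variable names, units, aliases, and function names are all hypothetical and are not taken from the actual Cotton-PDI data dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str       # standardized variable name
    unit: str       # standard unit for the variable
    group: str      # logical type the variable belongs to
    aliases: tuple  # column names as they might appear in submitted files


# Hypothetical entries for illustration only; not the project's real dictionary.
DATA_DICTIONARY = [
    Variable("lint_yield", "kg/ha", "agronomics", ("Lint Yield", "lint_yld")),
    Variable("micronaire", "micronaire units", "fiber_quality", ("Mic",)),
    Variable("fiber_strength", "g/tex", "fiber_quality", ("Str", "Strength")),
]


def build_alias_index(dictionary):
    """Map every known column name (lowercased) to its standardized variable."""
    index = {}
    for var in dictionary:
        for alias in (var.name, *var.aliases):
            index[alias.strip().lower()] = var
    return index


def standardize_columns(columns, dictionary=DATA_DICTIONARY):
    """Return (mapped, unknown): standardized names and unrecognized columns."""
    index = build_alias_index(dictionary)
    mapped, unknown = {}, []
    for col in columns:
        var = index.get(col.strip().lower())
        if var is None:
            unknown.append(col)  # candidate for stakeholder review
        else:
            mapped[col] = var.name
    return mapped, unknown
```

Columns that land in the unknown list are the kind of gap a stakeholder review could surface, either as a missing variable or as a new alias for an existing one.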
Electronic field data collection
Traditionally, much of the data collected by cotton researchers in the field has been recorded with pen and paper. However, this process is vulnerable to mistakes, such as data fields left blank and transcription errors during data collection and entry.
By developing an electronic data collection system, we hope to help researchers standardize data while minimizing data errors and increasing efficiency both in the field and in the office.
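One way an electronic form can catch the mistakes described above is to check each record at entry time. The following is a minimal sketch under assumed conditions: the field names, the required-field list, and the plausible range are hypothetical and do not describe the actual data collection applications.

```python
# Hypothetical field names and limits, chosen only for illustration.
REQUIRED_FIELDS = ("plot_id", "date", "plant_height_cm")
VALID_RANGES = {"plant_height_cm": (0.0, 300.0)}


def is_blank(value):
    """True when a field was left empty on the form."""
    return value is None or str(value).strip() == ""


def validate_record(record):
    """Return a list of problems found in one field record (empty if clean)."""
    problems = []
    # Flag required fields that were left blank.
    for field in REQUIRED_FIELDS:
        if is_blank(record.get(field)):
            problems.append(f"{field}: missing")
    # Flag numeric fields that are unparseable or outside the plausible range.
    for field, (low, high) in VALID_RANGES.items():
        value = record.get(field)
        if is_blank(value):
            continue  # already reported above if the field is required
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{field}: not a number")
            continue
        if not low <= number <= high:
            problems.append(f"{field}: outside {low}-{high}")
    return problems
```

Running the check while the researcher is still at the plot means a blank or implausible value can be corrected immediately rather than discovered later during data entry.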
We are working with researchers to modernize and expand the COTMAN plant monitoring system. The update and renovation will allow researchers to collect plant development data in-season under the COTMAN protocols with modern technology and reporting. This will include accessing site-specific temperature measurements from different public sources.
Visualizing large datasets
Historically, cotton data collected across large geographic areas and over multiple years has not commonly been aggregated. When this type of data has been consolidated, it has often taken the form of static paper reports that are both lengthy and difficult to interpret and explore.
Recognizing the need to visualize and interact with data in real time, we have designed data visualization tools to explore multi-project, multi-year data collected across US cotton growing regions.
Legacy cotton variety trial data
Here, we present legacy cotton variety trial data from 2005-2013. Data can be filtered by variables of interest and explored across varieties and locations. Graphs are interactive and respond to data filters. Cotton has at least six quality measures that affect its commercial value, so it is valuable to visualize how these measures vary across environments and to better understand how they relate to productivity (yield).
This type of data visualization product will help make cotton data accessible to a larger stakeholder audience as well as allow researchers to download data and perform data exploration to guide more rigorous data analysis.
Large-plot variety trial data
Here, we present cotton large-plot variety trial data beginning in 2020. Data can be filtered by variables of interest and explored across varieties and locations. Graphs are interactive and respond to data filters. Additionally, data can be downloaded to allow researchers to perform data exploration to guide more rigorous data analysis.
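For researchers who download trial data, the same filter-and-explore workflow can be approximated in a few lines of code. The sketch below uses made-up rows and hypothetical column and function names; it does not reflect the actual download format or the visualization tool's internals.

```python
import csv
import io
from statistics import mean

# Made-up rows for illustration; not real trial results.
TRIAL_ROWS = [
    {"variety": "Variety A", "location": "Site 1", "year": 2020, "lint_yield": 1200.0},
    {"variety": "Variety A", "location": "Site 2", "year": 2020, "lint_yield": 1000.0},
    {"variety": "Variety B", "location": "Site 1", "year": 2020, "lint_yield": 1100.0},
]


def filter_rows(rows, **criteria):
    """Keep rows whose columns equal every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def mean_by(rows, key, value):
    """Average one numeric column within each group of another column."""
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r[value])
    return {k: mean(v) for k, v in groups.items()}


def to_csv(rows):
    """Serialize the filtered rows as CSV text, as a download might."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, `filter_rows(TRIAL_ROWS, location="Site 1")` narrows the data to one location, and `mean_by(TRIAL_ROWS, "variety", "lint_yield")` summarizes yield by variety before any more rigorous analysis.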
National cotton variety trial data
Moving forward
Although the Cotton/PDI initiative has made substantial progress, many of the electronic data collection tools, data visualization and exploration products, and data standardization methods showcased here are still in development. The ARS-PDI team will continue to engage our cotton stakeholder groups to refine our data visualization and data collection tools. The cotton data dictionary will be used in the development of a standardized, centralized, cloud-based cotton database. That database and the tools for uploading current and historical datasets are works in progress and long-term goals of this initiative.