Data and Information Submission at the Virginia Coast LTER

Copyright 1997, John Porter, VCR/LTER

Introduction

Data and information are a critical part of the Virginia Coast LTER. However, data alone is not enough. There needs to be sufficient metadata (documentation, or data about data) so that someone knowledgeable in the field 20 years from now can understand and use your data. In archiving data, we are fighting entropy to keep our data from becoming unusable or disappearing entirely, as has been the rule in the past.

Preparing Data and Information

There is no one "right way" to prepare data for submission to the VCR/LTER. Provided that your data is in some regular or described form that can be used by others, how you choose to prepare it is up to you. We support a number of options:
  1. Textual or graphical data
    Typical examples are theses, papers or abstracts that are largely self-documenting. For these we prefer to receive three versions of the document. The first is a printed version, which serves as a backup for the other versions. Acid-free paper is recommended. The second is a Hypertext Markup Language (HTML) version. HTML can be produced automatically by many modern word processors. If yours cannot, an RTF (Rich Text Format) file or even an ASCII (plain text) file is a suitable stand-in. Finally, we would like to have a copy of the document in the original format of your word processor. This allows us to generate the HTML if you are unable to provide an HTML file. Graphics should be in GIF (Graphics Interchange Format), JPEG or Encapsulated PostScript (EPS) format. They can also be provided as a raw graphics file as used by your graphics software, provided that you tell us what that software is.

  2. Numerical or Coded Data includes:

    • Delimited Data
      Data are arranged in a consistent form with one observation per line with individual values separated by commas or some other "delimiter." An example of this type of file is the "CSV" (comma separated values) files created by most spreadsheet programs or "SDF" (standard delimited files) produced by many types of database software. Except where information is actually missing, each line or observation should have all the values filled in.

      For example:

      Station, Month, Year, Day, Temp, Precip
      HOGI,10,1996,1,12,0
      HOGI,10,1996,2,14,3.3
      HOGI,10,1996,3,19,0
      
      Where a text data item includes the delimiter as part of the data, that item should be enclosed in "quote" characters (typically ").

      For example:

      Station, Month, Year, Day, Temp, Precip
      "HOG ISLAND, VA",10,1996,1,12,0
      "REDBANK, VA",10,1996,2,14,3.3
      
      Note that HOG ISLAND, VA is considered to be the value of the Station variable. Without the quotes, the Station would be set to HOG ISLAND and the month would be set to VA. Most spreadsheet software does this automatically when producing CSV files.

    • Column Formatted Data
      Data are arranged in specified columns of the data file, which may or may not be separated by spaces. As with delimited files, there is typically one observation per line; however, multi-line data structures can also be used when required. Except where information is actually missing, each line or observation should have all the values filled in.

      For example:

      Station Month Year  Day Temp
      HOGI   10   1996     01 12
      HOGI   10   1996     02 14
      HOGI   11   1996     01 21
      HOGI   11   1996     02 19
      
    • Free Text or Labeled Data
      Data are not arranged in any particular structure, but each value is labeled as to its identity. (Although we can take this kind of data, we prefer column or delimited structures.)

  3. Binary Data or Specialized Data Structures
    Data are arranged in a proprietary binary or specialized export form. Examples of this type of data include ARC/INFO Export files, USGS DLG files, and ERDAS .LAN, .GIS and .IMG files. The form of the data needs to be well described so that future users will be able to understand and interpret it.
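
The quoting rule described under Delimited Data can be checked before submission with a short script. The following is a minimal sketch in Python, assuming the standard csv module is available. The function name read_delimited and the SAMPLE text are illustrative only and are not part of any VCR/LTER software. The sample rows are copied from the quoted example above.

```python
import csv
import io

# Sample copied from the quoted delimited-data example; in practice
# this text would come from the file being submitted.
SAMPLE = '''Station, Month, Year, Day, Temp, Precip
"HOG ISLAND, VA",10,1996,1,12,0
"REDBANK, VA",10,1996,2,14,3.3
'''


def read_delimited(text):
    """Return a list of dicts, one per observation, keyed by column name."""
    # skipinitialspace=True drops the blank that follows each comma
    # in the header line, so the keys are "Month" rather than " Month".
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


rows = read_delimited(SAMPLE)

# A naive split on commas ignores the quotes and yields one value too many.
naive = SAMPLE.splitlines()[1].split(",")

print(rows[0]["Station"], "|", rows[0]["Month"])
print(len(naive), "values from a naive split;", len(rows[0]), "expected")
```

A reader that honors the quote characters keeps HOG ISLAND, VA together as the Station value, while the naive split breaks it in two and shifts every later value by one column.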

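Column formatted data can be read by slicing each line at known character positions. The sketch below is a hedged illustration in Python, not a description of how the VCR/LTER processes files. The LAYOUT offsets are assumptions inferred from the spacing of the column formatted example above, and the name parse_fixed is hypothetical. A real submission should state the column positions in its documentation.

```python
# Hypothetical column layout: (name, start, end) character offsets,
# with end exclusive. These offsets are assumed, not specified by the source.
LAYOUT = [
    ("Station", 0, 7),
    ("Month", 7, 12),
    ("Year", 12, 21),
    ("Day", 21, 24),
    ("Temp", 24, 26),
]

# Data lines from the column formatted example (header line omitted).
LINES = [
    "HOGI   10   1996     01 12",
    "HOGI   10   1996     02 14",
    "HOGI   11   1996     01 21",
    "HOGI   11   1996     02 19",
]


def parse_fixed(line, layout=LAYOUT):
    """Slice one line into a dict of stripped string values."""
    record = {}
    for name, start, end in layout:
        value = line[start:end].strip()
        # An all-blank (or absent) field is reported as missing.
        record[name] = value if value else None
    return record


records = [parse_fixed(line) for line in LINES]

for record in records:
    print(record)
```

Because the values are located by position rather than by a delimiter, a blank field shows up as a missing value instead of shifting the remaining columns.
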
Variables


Data Elements

Individual data values (here called "data elements") can take many different forms, depending on the needs of the researcher. Data codes, missing values and numerical formats vary widely across datasets.