La Selva Biological Station
DESCRIBING DATASETS IN DETAIL
This document provides guidelines for the detailed description
of the variables (data columns) contained in datasets (made up
of one or more files) that will be stored and made available online
as part of the RDMCNFS system. (Portions of this document were
adapted from the Data Management Guidelines of the Luquillo LTER station
in Puerto Rico.)
The description should contain as much information as necessary
for the dataset to be usable by another researcher,
either directly (e.g., having a copy of the dataset) or through
a specialized interface.
For each file, the following information should be included:
- File name. The name of the data file.
- File format. The exact format of the data in the file
(e.g., comma-separated ASCII, DBase, Excel).
(If there are several files
with the same format or containing the same variables,
this fact can be noted instead of repeating the information
each time.)
- Variable definitions, following the order in which the variables appear
in the file.
For each variable, indicate the following:
- Variable name.
- Scientific definition.
- Measurement units.
- Measurement precision. Error bounds and an explanation of what
they refer to.
- Range or list of values. The minimum and maximum values,
or for categorical variables, a list of the possible values
or a reference to a file that lists them.
- Data type. The format specification for the datum (e.g.,
alphanumeric, integer, real, logical, date).
If necessary, specify the number of characters occupied by each
variable, and relevant observations.
- Codes for missing or null values. A list of codes used to
identify missing or null data values.
- Computational methods. Algorithms or formulas used to
derive this variable from others, if applicable.
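The checklist above can also be kept in machine-readable form. The following is only a minimal sketch in Python, not part of the guidelines themselves: the class names, field names and the `undocumented_fields` helper are hypothetical, chosen to mirror the items listed above. The sample values are taken from Example 1 of this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VariableDescription:
    """One variable (data column), with one field per item in the checklist."""
    name: str
    definition: str
    units: Optional[str] = None
    precision: Optional[str] = None  # error bounds and what they refer to
    value_range: Optional[Tuple[float, float]] = None  # (min, max), numeric variables
    allowed_values: List[str] = field(default_factory=list)  # categorical variables
    data_type: str = "alphanumeric"
    missing_codes: List[str] = field(default_factory=list)
    computation: Optional[str] = None  # formula used to derive the variable, if any


@dataclass
class FileDescription:
    """One data file and its variables, in the order they appear in the file."""
    file_name: str
    file_format: str
    variables: List[VariableDescription] = field(default_factory=list)

    def undocumented_fields(self) -> List[Tuple[str, str]]:
        """Return (variable, item) pairs that lack units or missing-value codes."""
        gaps = []
        for var in self.variables:
            if var.units is None:
                gaps.append((var.name, "units"))
            if not var.missing_codes:
                gaps.append((var.name, "missing_codes"))
        return gaps


met = FileDescription(
    file_name="MET-1-1993.XLS",
    file_format="Microsoft Excel 5.0 spreadsheet",
    variables=[
        VariableDescription(
            name="W_VEL",
            definition="Wind velocity",
            units="meters per second",
            data_type="real",
            missing_codes=["0"],
        ),
        VariableDescription(
            name="REL_HUM",
            definition="Relative humidity",
            data_type="real",
            missing_codes=[""],  # blank cell
        ),
    ],
)

print(met.undocumented_fields())
```

A structure of this kind makes it easy to spot items that a description leaves out before the dataset is submitted.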
Example 1
Meteorological dataset made up of a single file.
- File Name: MET-1-1993.XLS
- File Format: Microsoft Excel 5.0 spreadsheet.
- Variable Definitions:
- W_VEL: Wind velocity, in meters per second, represented as
a non-negative real number (greater than or equal to zero) to three decimal places.
(No information on precision is available.)
Missing values are represented as zeros.
- W_VEC_MAG: Wind vector magnitude, represented as a non-negative real
number (greater than or equal to zero) to three decimal places.
(No information on precision is available.)
Missing values are represented as zeros.
- W_VEC_DIR: Wind vector direction, in degrees, represented as a non-negative
real number (greater than or equal to zero) to two decimal places.
(No information on precision is available.)
Missing values are represented as zeros.
- SD_W_DIR: Standard deviation of wind direction, represented as a non-negative
real number (greater than or equal to zero) to two decimal places.
(No information on precision is available.)
Missing values are represented as zeros.
- RAIN_MM: Rainfall, in millimeters, represented as a non-negative
real number (greater than or equal to zero) to two decimal places.
(No information on precision is available.)
Missing values are represented as zeros.
- AVE_TEMP: Average temperature, in degrees Celsius, represented as a non-negative
real number (greater than or equal to zero) to two decimal places.
(No information on precision is available.)
Missing values are left blank.
- REL_HUM: Relative humidity, represented as a non-negative
real number (greater than or equal to zero) to two decimal places.
(No information on precision is available.)
Missing values are left blank.
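A reader of this dataset has to apply a different missing-value code to each variable. The following Python sketch shows one way this might be done. The `MISSING_CODES` table and the `decode_cell` function are hypothetical names introduced for illustration, and the sketch assumes that each cell arrives as text.

```python
from typing import Dict, Optional

# Missing-value codes per variable, as stated in Example 1.
# "0" means a zero cell is treated as missing; "" means a blank cell is.
MISSING_CODES: Dict[str, str] = {
    "W_VEL": "0",
    "W_VEC_MAG": "0",
    "W_VEC_DIR": "0",
    "SD_W_DIR": "0",
    "RAIN_MM": "0",
    "AVE_TEMP": "",
    "REL_HUM": "",
}


def decode_cell(variable: str, raw: str) -> Optional[float]:
    """Return the cell as a float, or None if it matches the missing code."""
    text = raw.strip()
    code = MISSING_CODES[variable]
    if code == "" and text == "":
        return None
    # A blank cell in a zero-coded column is not a documented code,
    # so float() is allowed to raise ValueError for it.
    value = float(text)
    # Compare numerically so that "0", "0.00" and "0.000" all match the code.
    if code != "" and value == float(code):
        return None
    return value


print(decode_cell("W_VEL", "0.000"))    # matches the missing code
print(decode_cell("AVE_TEMP", "0.00"))  # a real reading of zero
```

Note that when zero is used as the missing-value code, a true reading of zero (for example, a period without rain) cannot be told apart from a missing value. A blank cell does not have this ambiguity.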
Example 2
Dataset of researchers and visits to STRI, made up of two files:
"BIOGRAFS.DBF" and "VISITS.DBF".
- File Name: BIOGRAFS.DBF
- File Format: DBase III
- Variable definitions: Missing values in alphabetic variables are represented
using blanks.
- LNAME, FNAME, MNAME: last name, first name and middle name of visiting
researcher, alphabetic.
- TITLE: personal title of researcher; valid entries are: Ms.,
Mrs., Mr., Dr.; alphabetic.
- ID: sequential identifier assigned to visiting researchers consisting
of a positive integer value.
- BIRTH_YEAR: year of birth of researchers represented as a positive
integer with four digits.
- NATIONALITY: country of the researcher, alphabetic.
- PASSPORT: researcher passport, alphabetic.
- POSITION: type of researcher; valid entries are: STUDENT,
SCIENTIST, FILM GROUP, VISIT.
- ORGANIZATN: name of the organization of the researcher, alphabetic.
- ORGAN_TYPE: type of organization, alphabetic. Valid entries are: university,
smithsonian, research, business, institute.
- W_ADD1, W_ADD2, W_ADD3, W_CITY, W_STATE, W_ZIP, W_COUNTRY: working
address of the researcher, made up of three lines for street address, city,
state, postal zip code and country; alphabetic.
- W_TEL: telephone number of researcher at work. Numeric with 10 digits.
Missing values represented using blanks.
- H_ADD1, H_ADD2, H_ADD3, H_CITY, H_STATE, H_ZIP, H_COUNTRY: home address
of the researcher, made up of three lines for street address, city, state,
postal zip code and country; alphabetic.
- FAX: fax number of researcher. Numeric with 10 digits. Missing values
represented using blanks.
- EMAIL: email address of the researcher, alphabetic.
- DATE_ENTER: date when information about researcher was entered. No
missing values.
- DECEASED: indicates whether the researcher has died. Valid entries
are TRUE, FALSE.
- File Name: VISITS.DBF
- File Format: DBase III
- Variable definitions: Missing values in alphabetic variables are represented
using blanks.
- ID: same as ID in BIOGRAFS.DBF. The value of this ID must
correspond to the value of an ID in BIOGRAFS.DBF. One particular researcher
may have several visits; therefore, several records in this file may have
the same ID.
- ARRIVE: visitor's arrival date.
- LEAVE: visitor's departure date.
- DURATION: number of days the visitor stayed at the Station.
- ACADEMIC_S: visitor's highest academic degree; alphabetic. Valid entries
are: Undergraduate, Masters, PhD student.
- VISITOR_S: visitor's status; alphabetic. Valid entries are: research
assistant, principal, film, fellow, intern,
collaboration, producer/director, spouse.
- TITLE1, TITLE2: Title of the research project (2 lines), alphabetic.
- ADVISOR: for visitors who are students, name of the advisor, alphabetic.
- FUND_NAME: name of the supporting institution of the visitor's project, alphabetic.
- WORK_SITE1, WORK_SITE2, WORK_SITE3, WORK_SITE4, WORK_SITE5: sites visited
during this stay.
- BASE_SITE: site used as base by the visitor.
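The rule that every ID in VISITS.DBF must correspond to an ID in BIOGRAFS.DBF can be checked once both files have been read into memory. The following Python sketch assumes that each record has been loaded as a dictionary keyed by variable name. The `orphan_visits` function and the sample records are hypothetical and serve only to illustrate the check.

```python
from typing import Dict, Iterable, List


def orphan_visits(biografs: Iterable[Dict], visits: Iterable[Dict]) -> List[Dict]:
    """Return the visit records whose ID has no matching record in BIOGRAFS."""
    known_ids = {row["ID"] for row in biografs}
    return [row for row in visits if row["ID"] not in known_ids]


# Invented sample records, for illustration only.
biografs = [{"ID": 1, "LNAME": "DOE"}, {"ID": 2, "LNAME": "ROE"}]
visits = [
    {"ID": 1, "BASE_SITE": "A"},
    {"ID": 1, "BASE_SITE": "B"},  # same researcher, second visit: allowed
    {"ID": 3, "BASE_SITE": "A"},  # no matching researcher in BIOGRAFS
]

for row in orphan_visits(biografs, visits):
    print("Visit with unknown ID:", row["ID"])
```

Repeated IDs in the visits file are accepted, since one researcher may have several visits. Only IDs that are absent from the researchers file are reported.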