Metadata template

Metadata templates for the MIxS environmental packages are available from the FTP site: click here (Chrome, Firefox), or use your computer or an FTP client to connect to ftp://ftp.microbio.me as a guest and navigate to emp -> protocols -> metadata. Choose one or more of the following 15 environmental packages:

  • air
  • built environment
  • host-associated
  • human-associated
  • human-skin
  • human-oral
  • human-gut
  • human-vaginal
  • microbial mat/biofilm
  • misc environment
  • plant-associated
  • sediment
  • soil
  • wastewater/sludge
  • water

Once you find the appropriate CSV file, edit it as follows: change “Lastname” to the PI’s last name; change “99” to the study number (if you have one); add rows for additional samples and change “5” to the actual number of samples; then fill in your metadata. Information on the generally required fields is provided in the following sections, and information on fields specific to each environmental package is provided in the linked README files.

Metadata field definition file

You are strongly encouraged to add additional fields to your metadata file, e.g., additional environmental metadata and/or sample groups relevant to your study. Please follow these guidelines:

  • Create a second spreadsheet named “Lastname_field_definitions.xlsx” (fill in the PI’s last name). This file should list every field you added to your metadata file and define what the column header means, how the data were measured, and what any abbreviations mean. Please also include any detail or methodology for the required/optional metadata fields.
  • Use the naming convention “analyte_units” for anything you measured. For example, if you are reporting phosphate in units of micromolar, the field header should be “phosphate_umol_per_l”. Putting the units directly in the column header will ensure that you or anyone else using your metadata in the future will know what the units are. Without units, your measurements are not useable!
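As an illustration of the “analyte_units” convention, here is a minimal Python sketch that builds a column header from an analyte name and its units. The function name `field_header` and the rule of spelling out “/” as “per” are assumptions made for this example, not part of any official tooling.

```python
import re


def field_header(analyte: str, units: str) -> str:
    """Build a column header in the "analyte_units" style."""
    # Spell out "/" as "per" so the units survive as plain text.
    raw = f"{analyte} {units.replace('/', ' per ')}"
    # Lowercase, then collapse every run of other characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", raw.lower()).strip("_")


print(field_header("phosphate", "umol/L"))  # phosphate_umol_per_l
```

Whatever header you generate, define it in the field definitions file as well.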

Qiita and EBI required fields

These fields are required by the Qiita database or EBI.
Note: Need to add sample_type, which will be similar to env_material but needs its own ontology.

Field name | Format | Description
sample_name | {text} (restricted) | Identifies a sample and should be descriptive. It is the primary key and must be unique. Allowed characters are alphabetic [A-Za-z], numeric [0-9], and periods (.). Disallowed characters include space, _, -, #, %, /, &, and *.
taxon_id | {integer} | NCBI taxon ID for the sample. Should indicate the metagenome being investigated. Examples: 410658 for soil metagenome, 749906 for gut metagenome, 412755 for marine sediment metagenome, 408172 for marine metagenome (seawater), or 449393 for freshwater metagenome. If unspecified, use 408169.
scientific_name | {text} | Common name for the provided NCBI taxon ID (must match taxon_id above). Examples: soil metagenome, gut metagenome, marine sediment metagenome, marine metagenome.
host_subject_id | {text} | An identifier for the ‘host’. Should be specific to a host, and can be a one-to-many relationship with samples. If this is not a host-associated study, this can be an identifier for a replicate, or can be the same as sample_name.
description | {text} | Description of the sample.
physical_specimen_location | {text} | Where you would go to find the physical sample or DNA, regardless of whether it is still available or not.
physical_specimen_remaining | {boolean} | Is there still physical sample (e.g., soil, not DNA) available? True or False.
collection_timestamp | yyyy-mm-dd hh:mm | The time of sampling, either as an instant (single point in time) or interval. In case no exact time is available, the date/time can be right truncated. Examples: 2008-01-23T19:23:10+00:00 (T optional), 2008-01-23T19:23:10 (T optional), 2008-01-23, 2008-01, and 2008; all are ISO8601 compliant except 2008-01 and 2008. Date range may also be specified: 2007-2008 or 02/2011-04/2011.
latitude_deg | {decimal degrees} | Latitude where sample was collected. Positive if north of equator, negative if south of equator. Examples: 18.580 and -89.122.
longitude_deg | {decimal degrees} | Longitude where sample was collected. Positive if east of prime meridian, negative if west of prime meridian. Examples: 40.743 and -10.530.
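
Several of the rules above can be checked before a metadata file is submitted. The following Python sketch is one possible pre-check, not Qiita's or EBI's own validation. The function names are hypothetical, the coordinate limits are the standard ranges for decimal degrees, and the list of accepted timestamp layouts is an assumption covering the template format and its right-truncated forms only (date ranges and the “T” variants are not handled).

```python
import re
from datetime import datetime

# Only letters, digits, and periods are allowed in sample_name.
SAMPLE_NAME_RE = re.compile(r"[A-Za-z0-9.]+")

# Assumed layouts: full timestamp, then right-truncated forms.
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d", "%Y-%m", "%Y")


def valid_sample_name(name):
    """True if the name is non-empty and uses only allowed characters."""
    return SAMPLE_NAME_RE.fullmatch(name) is not None


def duplicate_names(names):
    """Return the sample names that occur more than once, sorted."""
    seen, dupes = set(), set()
    for name in names:
        if name in seen:
            dupes.add(name)
        seen.add(name)
    return sorted(dupes)


def valid_coordinates(latitude_deg, longitude_deg):
    """True if both values fall inside the usual decimal-degree ranges."""
    return -90.0 <= latitude_deg <= 90.0 and -180.0 <= longitude_deg <= 180.0


def valid_timestamp(text):
    """True if the text matches one of the assumed timestamp layouts."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


print(valid_sample_name("Soil.Plot1.A"))   # True
print(valid_sample_name("soil_plot-1"))    # False
print(duplicate_names(["a.1", "a.2", "a.1"]))  # ['a.1']
```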

MIMS required fields

These fields are required by the Genomic Standards Consortium MIMS standard.

Field name | Format | Description
investigation_type | {term} | Nucleic Acid Sequence Report is the root element of all MIGS/MIMS compliant reports as standardized by the Genomic Standards Consortium. This field is either eukaryote, bacteria, virus, plasmid, organelle, metagenome, mimarks-survey, or mimarks-specimen.
project_name | {text} | Name of the project within which the sequencing was organized. Also called study title.
experimental_factor | {term, text} (optional) | Experimental factors are essentially the variable aspects of an experiment design which can be used to describe an experiment, or set of experiments, in an increasingly detailed manner. This field accepts ontology terms from Experimental Factor Ontology (EFO) and Ontology for Biomedical Investigations (OBI).
geo_loc_name | GAZ:{term} | The geographical origin of the sample as defined by the country or sea name followed by specific region name (if international waters of ocean, use ocean name). Examples: United States of America, Switzerland, Red Sea, Ile Coco Marine National Park. Go to EBI’s Gazetteer Ontology Lookup Service OLSVis, select GAZ, search for the region of interest, then select the most specific and appropriate option.
env_biome | ENVO:{term} | Biomes are defined based on factors such as plant structures, leaf types, plant spacing, and other factors like climate. Biome should be treated as the descriptor of the broad ecological context of a sample. Examples: freshwater biome, desert biome, woodland biome (taiga), temperate woodland biome, marine pelagic biome, or marine coral reef biome. Go to EBI’s Environmental Ontology Lookup Service OLSVis, select ENVO, search for “biome”, then select the most specific and appropriate option. The most recent ENVO data is also on GitHub.
env_feature | ENVO:{term} | Environmental feature level includes geographic environmental features. Compared to biome, feature is a descriptor of the more local environment. Examples: harbor, sandy beach, cliff, or lake. Go to EBI’s Environmental Ontology Lookup Service OLSVis, select ENVO, search for “environmental feature”, then select the most specific and appropriate option. The most recent ENVO data is also on GitHub.
env_material | ENVO:{term} | Environmental material level refers to the material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Environmental material terms are generally mass nouns. Examples: air, sediment, stream sediment, lake sediment, soil, water, or feces. Go to EBI’s Environmental Ontology Lookup Service OLSVis, select ENVO, search for “environmental material”, then select the most specific and appropriate option. The most recent ENVO data is also on GitHub.
env_package | air, built environment, host-associated, human-associated, human-skin, human-oral, human-gut, human-vaginal, microbial mat/biofilm, misc environment, plant-associated, sediment, soil, wastewater/sludge, water | Environmental package is a MIGS/MIMS/MIMARKS extension for reporting of measurements and observations obtained from one or more of the environments where the sample was obtained. All environmental packages listed here are further defined in separate subtables. By giving the name of the environmental package, a selection of fields can be made from the subtables and can be reported.
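
Two of the fields above take a value from a fixed list, so they are easy to check automatically. The sketch below is illustrative only: the function name `check_controlled_fields` and the dictionary-per-row layout are assumptions, and the comparison is case-sensitive because the values are listed in lowercase here.

```python
# Allowed values copied from the investigation_type and env_package rows.
INVESTIGATION_TYPES = {
    "eukaryote", "bacteria", "virus", "plasmid", "organelle",
    "metagenome", "mimarks-survey", "mimarks-specimen",
}
ENV_PACKAGES = {
    "air", "built environment", "host-associated", "human-associated",
    "human-skin", "human-oral", "human-gut", "human-vaginal",
    "microbial mat/biofilm", "misc environment", "plant-associated",
    "sediment", "soil", "wastewater/sludge", "water",
}


def check_controlled_fields(row):
    """Return a list of problems found in one metadata row (a dict)."""
    problems = []
    if row.get("investigation_type") not in INVESTIGATION_TYPES:
        problems.append("investigation_type is not one of the allowed terms")
    if row.get("env_package") not in ENV_PACKAGES:
        problems.append("env_package is not one of the 15 packages")
    return problems


row = {"investigation_type": "metagenome", "env_package": "soil"}
print(check_controlled_fields(row))  # []
```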

Required depending on environmental package

Field name | Format | Description
depth_m | {value} | Depth in meters of sample below surface (earth surface if soil, sea/lake bottom if sediment, lake surface if lake, sea level if marine). Generally zero if ground level. Either depth or altitude can have a non-zero value but not both.
altitude_m | {value} | Height above surface, usually zero unless the mouse is floating in the air. Either depth or altitude can have a non-zero value but not both.
elevation_m | {value} | Height above sea level in meters (use georeferencing tool).
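
The rule that depth and altitude cannot both be non-zero can be expressed as a one-line check. This is a minimal sketch with a hypothetical function name, assuming both values have already been read as numbers.

```python
def depth_altitude_consistent(depth_m, altitude_m):
    """True unless both depth_m and altitude_m are non-zero."""
    return depth_m == 0 or altitude_m == 0


print(depth_altitude_consistent(0.1, 0))    # True: sample below the surface
print(depth_altitude_consistent(2.0, 1.5))  # False: both are non-zero
```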

Environmental package specific fields

Requirement codes

  • M — Mandatory (the item has to be reported to have MIGS, MIMS or MIMARKS compliant contextual data).
  • C — Conditionally mandatory (the item is mandatory only when applicable to the study, i.e. if this item is not applicable for the study the contextual data will still be MIGS, MIMS or MIMARKS compliant even if it is left out).
  • X — Recommended (the item is not mandatory for a compliant report, but is nice to have, or extra).
  • E — Environment-dependent (the item is only mandatory for certain samples coming from a particular environment. For example ‘depth’ is mandatory for water or sediment samples).
  • ‘-’ denotes that an item is not applicable for a given checklist.

Additional notes

For all human studies:

  • env_biome = human-associated habitat
  • env_feature = human-associated habitat (organic feature for all?)
  • env_material = feces, mucus, sebum, breast milk, urine, ear wax (or organic material?)

For all animal studies:

  • env_biome = animal-associated habitat
  • env_feature = animal-associated habitat (organic feature for all?)
  • env_material = feces (or organic material?)

For all insect studies:

  • env_biome = insect-associated habitat
  • env_feature = insecta-associated habitat (organic feature for all?)
  • env_material = feces (or organic material?)

EMP Ontology

Values for the EMP Ontology for sample type can also be provided. The relevant fields are empo_0, empo_1, empo_2, and empo_3. See the page EMP Ontology for valid ontology values.