Taxonomy

Author

Vincent Beauregard & Victor Cameron

Repo Github

Features

  • Raw observed taxonomic entries stored as-is : Storing raw taxon values as-is keeps maintenance of the taxon database minimal and avoids overhead operations (validation, correction) on taxon entries. Corrected and validated reference entries are found through fuzzy matching and stored independently.

  • Taxa accepted at all ranks : An observation may relate to an organism identified at any of several levels (e.g. species, genus, family), depending on the type of survey. Taxonomic entries of any rank can be ingested into the observed taxon table and are related to their reference taxa.

  • Fuzzy matching : Raw taxa are matched to entries in reference taxonomic databases using fuzzy matching, which corrects for spelling and casing errors.

  • Unresolved entries : If a raw taxonomic entry cannot be matched, the reference for its closest taxonomic parent is obtained and related to it where possible.

  • Multiple and conflicting taxonomic sources : Raw taxa are matched to their reference counterparts in multiple taxonomic databases, which allows the use of specialized or conflicting databases. Matches are stored without priority, so a raw taxon and its related observations can be referenced through any name obtained from any of the reference databases.

  • Parent-child taxonomic relationships : Taxa and their related observations can be searched through parent taxa, because reference taxa are stored for the parents along with their relationship to the raw entry. E.g. the parent class Aves can be related to all child species entries such as Cyanocitta cristata, Falco peregrinus, etc.

  • Revised taxa and valid synonyms : Raw taxa whose valid reference name has changed are matched to both the deprecated and the valid references, making it possible to search raw taxa and their related observations and events through either one.

  • Updates and changes in reference taxonomic databases : The validity of a taxonomic entry can be updated by periodically refreshing the references obtained from raw taxon entries. The raw taxa themselves are stored and maintained as described in the original sources and surveys.

  • Vernacular names : A list of vernacular names (fr & en) is found for each reference taxon (parents, synonyms) related to a raw taxon, across a number of reference vernacular databases.

  • Complex observations : When the taxon related to an observation is complex, i.e. multiple organisms are identified for the same observation (Species 1 | Species 2 | Species 3), a single observed taxonomic entry is inserted as such. References are obtained for each organism listed in the complex and for all related parents. References matched from complex observed taxa are flagged as such, so users can include or exclude them in their queries. Common parent taxa are also flagged and can be used to query complex observed taxa.
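
The fuzzy-matching and complex-observation features can be illustrated with a minimal Python sketch. It is not the project's implementation: it uses the standard-library difflib as a stand-in for the external reference services, and the REFERENCE lineages, the function names match_reference and resolve_complex, and the 0.8 similarity cutoff are all hypothetical choices made for the example.

```python
from difflib import get_close_matches

# Hypothetical reference data: scientific name -> lineage from class down to the name itself.
REFERENCE = {
    "Cyanocitta cristata": ["Aves", "Passeriformes", "Corvidae", "Cyanocitta", "Cyanocitta cristata"],
    "Cyanocitta stelleri": ["Aves", "Passeriformes", "Corvidae", "Cyanocitta", "Cyanocitta stelleri"],
    "Falco peregrinus": ["Aves", "Falconiformes", "Falconidae", "Falco", "Falco peregrinus"],
}


def match_reference(raw_name, cutoff=0.8):
    """Return the closest reference name (case-insensitive), or None if nothing is close enough."""
    by_lower = {name.lower(): name for name in REFERENCE}
    hits = get_close_matches(raw_name.strip().lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


def resolve_complex(raw_entry):
    """Split an 'A | B | C' entry, match each member, and find their closest common parent."""
    members = [part.strip() for part in raw_entry.split("|") if part.strip()]
    matched = [m for m in (match_reference(p) for p in members) if m is not None]
    common = None
    if matched:
        # Walk the lineages rank by rank and keep the deepest taxon shared by all members.
        for ranks in zip(*(REFERENCE[m] for m in matched)):
            if len(set(ranks)) != 1:
                break
            common = ranks[0]
    return matched, common
```

Calling resolve_complex("Cyanocitta cristata | Cyanocita stelleri") matches the misspelled second member to its reference name and reports the genus Cyanocitta as the closest common parent. The raw string itself would still be stored unchanged as a single observed entry.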

Principles

  • Raw observed taxa are stored as-is as rows in table taxa_obs; no spelling correction or validation of values is required. Its primary key id_taxa_obs is used to relate it to other tables.
  • A list of reference taxa (parents, valid synonyms) is found for each raw taxon, across a number of taxonomic reference databases, through fuzzy matching based on the Global Names and GBIF taxonomic backbone APIs. All reference taxa are stored in table taxa_ref and are related to observed taxa_obs rows through the taxa_obs_ref_lookup lookup table.
  • A list of vernacular names (fr & en) is found for each reference taxon (parents, synonyms) related to a raw taxon, across a number of reference vernacular databases, through the GBIF taxonomic backbone API. All vernacular names are stored in table taxa_vernacular and are related to observed taxa_obs rows through the taxa_obs_vernacular_lookup lookup table.
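
As a rough illustration of how the observed, reference, and lookup tables fit together, the sketch below builds simplified versions of them in an in-memory SQLite database from Python. Only the table names and the columns id, scientific_name, and id_taxa_obs come from this page; the columns id_taxa_ref, taxon_rank, and source_name, the sample rows, and the helper obs_for_ref are assumptions made for the example, and the production schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxa_obs (id INTEGER PRIMARY KEY, scientific_name TEXT NOT NULL);
-- taxon_rank and source_name are assumed columns, for illustration only.
CREATE TABLE taxa_ref (
    id INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL,
    taxon_rank TEXT,
    source_name TEXT
);
-- id_taxa_ref is an assumed column name.
CREATE TABLE taxa_obs_ref_lookup (
    id_taxa_obs INTEGER REFERENCES taxa_obs(id),
    id_taxa_ref INTEGER REFERENCES taxa_ref(id)
);
-- The raw entry keeps its original (misspelled) value.
INSERT INTO taxa_obs VALUES (1, 'Cyanocita cristata'), (2, 'Falco peregrinus');
INSERT INTO taxa_ref VALUES
    (1, 'Cyanocitta cristata', 'species', 'GBIF'),
    (2, 'Falco peregrinus', 'species', 'GBIF'),
    (3, 'Aves', 'class', 'GBIF'),
    (4, 'Corvidae', 'family', 'GBIF');
-- Each raw taxon is linked to its matched reference and to that reference's parents.
INSERT INTO taxa_obs_ref_lookup VALUES (1, 1), (1, 4), (1, 3), (2, 2), (2, 3);
""")


def obs_for_ref(connection, ref_name):
    """Return raw observed names linked to a reference taxon (itself or a parent)."""
    rows = connection.execute(
        """
        SELECT o.scientific_name
        FROM taxa_obs AS o
        JOIN taxa_obs_ref_lookup AS l ON l.id_taxa_obs = o.id
        JOIN taxa_ref AS r ON r.id = l.id_taxa_ref
        WHERE r.scientific_name = ?
        ORDER BY o.id
        """,
        (ref_name,),
    )
    return [row[0] for row in rows]
```

With these sample rows, obs_for_ref(conn, "Aves") returns both raw entries, including the misspelled one, because the lookup table links each raw taxon to its parents as well as to its direct match.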

Common workflows and procedures

IMPORTANT NOTES

  • ALTER TABLE public.obs_species DROP CONSTRAINT obs_species_taxa_name_fkey;

  •   Observed taxa with no match in taxa_obs_ref_lookup:
      select *
      from taxa_obs
      where id not in (
        select id_taxa_obs from taxa_obs_ref_lookup
      );
        id  | scientific_name |          created_at          
      ------+-----------------+------------------------------
       2380 | sphaigne verte  | 2022-04-07 17:14:09.45303-04
       2658 | pellie sp.      | 2022-04-07 17:14:09.45303-04
       5934 | Maccafertium    | 2022-08-17 18:06:51.7358-04
       5986 | Caecidota       | 2022-08-17 18:06:51.7358-04
       6277 | Callophrus      | 2022-08-17 18:06:51.7358-04
      (5 rows)
  • No index on columns from cells, sites, campaigns, etc.

  • Complex taxa are not listed through the API endpoints; their closest common parents are.

  • TODO
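
One caveat about the unmatched-taxa query in the notes above: in SQL, NOT IN returns no rows at all if the subquery yields a NULL. The sketch below reproduces this in an in-memory SQLite database from Python and compares it with a NOT EXISTS anti-join. The id_taxa_ref column and the sample rows are assumptions, and if id_taxa_obs is declared NOT NULL in the real schema, both forms return the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxa_obs (id INTEGER PRIMARY KEY, scientific_name TEXT);
-- id_taxa_ref is an assumed column name.
CREATE TABLE taxa_obs_ref_lookup (id_taxa_obs INTEGER, id_taxa_ref INTEGER);
INSERT INTO taxa_obs VALUES (1, 'Falco peregrinus'), (2380, 'sphaigne verte');
-- One orphan lookup row with a NULL id_taxa_obs, to trigger the NOT IN pitfall.
INSERT INTO taxa_obs_ref_lookup VALUES (1, 10), (NULL, 11);
""")

NOT_IN = """
SELECT scientific_name FROM taxa_obs
WHERE id NOT IN (SELECT id_taxa_obs FROM taxa_obs_ref_lookup)
"""

NOT_EXISTS = """
SELECT scientific_name FROM taxa_obs AS o
WHERE NOT EXISTS (
    SELECT 1 FROM taxa_obs_ref_lookup AS l WHERE l.id_taxa_obs = o.id
)
"""

# NOT IN compares against NULL, so the predicate is never true and no row is returned.
unmatched_not_in = [row[0] for row in conn.execute(NOT_IN)]
# NOT EXISTS only tests for a matching row, so the unmatched taxon is reported.
unmatched_not_exists = [row[0] for row in conn.execute(NOT_EXISTS)]
```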