Structure

Table: Observations

Description

The observations table is central in the Atlas database to recording ecological observations. It is is designed to support different observation types, such as abundances, occurrences, and presence/absence data of various units. The type of observation is specified by the id_variables column, which references the variables table. See the variables table documentation for more details.

It encompasses a wide range of data including geographical coordinates, observation times, taxa details, and associated variables, relying on foreign keys to other tables to store this information. See the Dependencies section below for more details and relevant tables documentation.

Partitions are used to store observations that are within or outside of Quebec. This allows for faster queries on observations that are within Quebec, which are the most commonly used. See the partitions section below for more details.

Triggers are used to automatically update the within_quebec, modified_at, dwc_event_date columns on insert or update. See the triggers section below for more details.

Dependencies

Foreign Keys:
- id_datasets references public.datasets (id)
- id_taxa_obs references public.taxa_obs (id)
- id_variables references public.variables (id)
Partitions:
- observations_partitions.outside_quebec
- observations_partitions.within_quebec
Sequence: observations_partitions.observations_id_seq

Columns

Column Name	Type	Description	Constraints
id	Bigint	A unique identifier for each observation.	Primary key (part of), Not Null, Default: nextval(‘observations_partitions.observations_id_seq’::regclass)
org_parent_event	Character Varying	Optional. Identifier of the parent event in the original source.	-
org_event	Character Varying	Optional. Identifier of the event in the original source.	-
org_id_obs	Character Varying	Optional. Original identifier of the observation in the original source.	-
id_datasets	Integer	The identifier of the dataset to which the observation belongs.	Not Null, Foreign key
id_taxa_obs	Integer	Identifier for the observed taxon.	Not Null, Foreign key
geom	Geometry(Point, 4326)	The geometry of the coordinates of the observation.	Not Null
year_obs	Integer	The year of the observation.	Not Null
month_obs	Integer	The month of the observation.	-
day_obs	Integer	The day of the observation.	-
time_obs	Time	The time of the observation.	-
id_variables	Integer	The identifier of the variable observed. Whether of type `abundance`, `occurrence`, etc. is stored in table `variables`	Not Null, Foreign key
obs_value	Numeric	The observed value for the variable. Might be integer or float in the case of abundance, or 0 or 1 in case of presence/absence	Not Null
id_taxa (deprecated)	Integer	Deprecated. The identifier of the taxa observed. From `taxa` table. Only to maintain compatibility.	Foreign key
issue	Character Varying	Optional. Any issues or remarks related to the observation.	-
created_by	Character Varying	Auto on insert. The user who inserted the observation record within Atlas DB.	Default: CURRENT_USER
modified_at	Timestamp with time zone	Auto on insert/update. The timestamp when the observation was last modified.	Not Null, Default: CURRENT_TIMESTAMP
modified_by	Character Varying	Auto on insert/update. The user who last modified the observation within Atlas DB.	Default: CURRENT_USER
within_quebec	Boolean	Auto on insert/update. Indicates whether the observation was within Quebec.	Part of primary key, Not Null, Default: false
dwc_event_date	Text	Auto on insert/update. The Darwin Core (DwC) formatted event date of the observation.	Not Null, Default: ’’ (empty string)

Additional Constraints

Primary Key Constraint: The combination of id and within_quebec serves as a unique identifier for each observation record.
Foreign Key Constraints: Links to datasets and taxa_obs tables ensure referential integrity.
Unique Constraint: The combination of geom, dwc_event_date, id_taxa_obs, obs_value, id_variables, within_quebec ensures that each observation is uniquely identifiable and retrievable. This is used to prevent duplicates using the index observations_unique_rows.

Indexes

dwc_event_date_idx on dwc_event_date
observations_geom_date_time_idx on geom, year_obs, month_obs, day_obs, time_obs
observations_geom_idx on geom
observations_id_datasets_id_taxa_idx on id_datasets, id_taxa
observations_id_datasets_idx on id_datasets
observations_id_idx on id
observations_id_taxa_obs_dwc_event_date_geom_idx on id_taxa_obs, dwc_event_date, geom
observations_id_taxa_obs_idx on id_taxa_obs
observations_unique_rows on geom, dwc_event_date, id_taxa_obs, obs_value, id_variables, within_quebec
observations_year_obs_idx on year_obs
Specific indexes on partitions (within_quebec_year_obs_idx, observations_id_taxa_obs_idx in observations_partitions.within_quebec)

Triggers

set_dwc_event_date_trggr: Before insert, sets the DwC event date.
update_modified_at: Before update, updates the modified_at timestamp.
action_user_logger: In partition observations_partitions.outside_quebec, logs user actions before insert or update.

Rationale

Use of Partitions for Performance

The use of table partitions in the observations table, particularly with the within_quebec column, is a strategic decision aimed at enhancing database performance. Partitioning is a highly effective method for managing large tables by splitting them into smaller, more manageable pieces. In this case, the observations_partitions schema segregates observation data into distinct partitions based on whether the observations occurred within Quebec (within_quebec = true) or outside it (within_quebec = false). This approach significantly improves query performance by allowing the database engine to quickly locate and retrieve data from a smaller subset of the table. Additionally, it simplifies maintenance tasks such as backups and data purges, as operations can be performed on individual partitions without affecting the entire dataset.

Partition tables observations_partitions.within_quebec and observations_partitions.outside_quebec are created within the observations_partitions schema.

Temporal Columns

The inclusion of year_obs, month_obs, day_obs, and time_obs columns in the observations table is intentionally designed to accommodate observations with varying levels of temporal resolution. This flexibility is crucial in ecological data collection, where the exact time of an observation may range from a specific moment to a broader time frame. By breaking down the observation timestamp into separate components, the database can store and process data that is only accurate to a certain level (e.g., year, month, day, or time). This structure not only caters to the diverse nature of ecological observations but also enhances the ability to perform time-based queries and analyses with varying granularities.

Duplicates and DwC Event Date

Duplicates are possible through the process of adding data from various sources, such as GBIF and eBird. Duplicates are limited through the use of a unique constraint on the combination of geom, dwc_event_date, id_taxa_obs, obs_value, id_variables, within_quebec.

The dwc_event_date column values are generated automatically using a trigger function. It is a formatted string that reflects a standardized approach to recording event dates in biodiversity data. The use of this column is required to ensure each observation is distinctly identifiable and retrievable. It also facilitates data interoperability, making it easier to share and compare ecological data across different platforms and studies.

Table: Datasets

Description

The datasets table stores comprehensive metadata about ecological observation datasets. This includes information about the source, creator, type of data, and publication details. It serves as a central repository for tracking various datasets related to ecological observations.

Dependencies

None

Columns

Column Name	Type	Description	Constraints
id	Integer	A unique identifier for each dataset.	Primary key, Not Null
original_source	Character Varying	The original source of the dataset. Where was it obtained from (e.g., ‘GBIF’, ‘eBird’, ‘MELCCFP’).	Not Null
org_dataset_id	Character Varying	Optional. An identifier used by the `original_source` or `publisher` for the dataset.	-
creator	Character Varying	Optional. The creator of the dataset (e.g., individual researcher or institution).	-
title	Character Varying	The title of the dataset.	Not Null
publisher	Character Varying	Optional. The publisher of the dataset (e.g., ‘Nature Publishing Group’, ‘Elsevier’, ‘GBIF’, ‘Données Québec’).	-
modified	Date	The date when the dataset was last modified.	Not Null
keywords	Character Varying[]	Optional. An array of keywords associated with the dataset.	-
abstract	Text	Optional. A brief abstract or summary of the dataset.	-
type_sampling	Character Varying	Optional. The type of sampling method used in the dataset.	-
type_obs	public.type_observation	Optional. The type of observation (Enum values: ‘living specimen’, ‘preserved specimen’, ‘fossil specimen’, ‘human observation’, ‘machine observation’, ‘literature’, ‘material sample’, ‘others’).	-
intellectual_rights	Character Varying	Optional. Information about the intellectual rights of the dataset.	-
license	Character Varying	Optional. The license under which the dataset is released (e.g., ‘CC BY 4.0’, ‘GPL’, ‘Entente de partage’).	-
owner	Character Varying	Optional. The owner of the dataset (e.g., a university, research institution, or individual researcher).	-
methods	Text	Optional. A detailed description of the methods used in the dataset.	-
open_data	Boolean	A boolean indicating whether the dataset is open data.	Not Null
exhaustive	Boolean	A boolean indicating whether the dataset is exhaustive, meaning obtained from a checklist survey.	Not Null
direct_obs	Boolean	A boolean indicating whether the dataset contains direct observations.	Not Null
centroid	Boolean	Optional. A boolean indicating whether the dataset contains centroid data.	Default: False
doi	Text	Optional. The Digital Object Identifier for the dataset. (e.g. https://doi.org/10.15468/ykxm8x)	Default: Empty String
citation	Text	Optional. The recommended citation for the dataset.	Default: Empty String

Table: Taxa_Obs

Description

The taxa_obs table is structured to record raw taxonomic information directly from source data. This table is essential for capturing a wide range of taxonomic details, including scientific names, authorship, and taxonomic rank. It is designed to accommodate taxa of any rank, such as species, genus, complexes, etc., without requiring corrections for grammatical accuracy or valid taxonomy. These aspects are managed by other resources within the database, like the taxa_ref table.

Dependencies

Sequence: taxa_obs_id_seq

Columns

Column Name	Type	Description	Constraints
id	Integer	A unique identifier for each taxonomic observation.	Primary key, Not Null, Default: nextval(‘taxa_obs_id_seq’::regclass), Using Index Tablespace ssdpool
scientific_name	Text	Optional. The scientific name of the taxon observed. This includes taxa at any rank, such as species or genus.	Not Null, Part of Unique Constraint
authorship	Text	Optional. The authorship of the scientific name. Defaults to an empty string if not provided.	Not Null, Default: ’’ (empty string), Part of Unique Constraint
rank	Text	Optional. The taxonomic rank of the observation. Defaults to an empty string if not provided.	Default: ’’ (empty string), Part of Unique Constraint
parent_scientific_name	Text	Optional. The scientific name of the parent taxon. Used to resolve conflicts where a scientific name corresponds to different organisms in different branches of the tree of life. Optional; if not specified, all results for the given scientific name are returned.	-
created_at	Timestamp with time zone	Auto on insert/update. The timestamp when the taxonomic observation was created.	Not Null, Default: CURRENT_TIMESTAMP
modified_at	Timestamp with time zone	Auto on insert/update. The timestamp when the taxonomic observation was last modified.	Not Null, Default: CURRENT_TIMESTAMP
modified_by	Text	Auto on insert/update. The user who last modified the taxonomic observation record.	Not Null, Default: CURRENT_USER

Additional Constraints

Unique Constraint: taxa_obs_unique_rows ensures that each combination of scientific_name, authorship, rank, parent_scientific_name is unique within the table.

Indexes

taxa_obs_scientific_name_idx on scientific_name

Triggers

update_modified_at: Before update, updates the modified_at timestamp.

Conflicts and Parent Taxa

The design of the taxa_obs table accounts for the potential conflicts in scientific naming. For instance, the same scientific name may correspond to different organisms in separate branches of the tree of life (e.g., Salix as a genus of willows in plants and a genus of tunicates in animals). To resolve such conflicts, the parent_scientific_name column allows users to specify a parent taxa name, thus restricting results to a specific branch. This feature is particularly useful for accurate data retrieval and association in cases of nomenclatural ambiguity.

Table: Variables

Description

The variables table in the database is crafted to support a diverse array of observation types and units. This versatility is key to accommodating various ecological data formats, such as counts of individuals, weight, density, occurrences, and presence/absence data, among others. Each of these observation types can be associated with different units of measurement. The table facilitates the recording and categorization of these different types and units, making it a pivotal component for data analysis and interpretation in ecological studies. The id_variables column, found in other related tables like observations, references the variables table to specify the type of observation, providing a clear link to the detailed description and unit of the variable being observed.

Dependencies

Sequence: variables_id_seq

Columns

Column Name	Type	Description	Constraints
id	Integer	A unique identifier for each variable.	Primary key, Not Null, Default: nextval(‘variables_id_seq’::regclass)
name	Character Varying	The name of the variable, indicating the type of observation (e.g., count, weight, density).	Not Null, Part of Unique Constraint
unit	Character Varying	Optional. The unit of measurement for the variable, which may vary based on the observation type.	Not Null, Default: ’’, Part of Unique Constraint
description	Text	Optional. A detailed description of the variable, explaining its significance and use in observations.	-

Additional Constraints

Unique Constraint: variables_name_unit_key ensures that each combination of name and unit is unique within the table.

Table: Efforts

Description

The efforts table is specifically designed to record the efforts associated with ecological observations. This table plays a crucial role in quantifying and categorizing the effort invested in obtaining various observations. The efforts could be in various forms like time spent, area covered, traps used, etc., and are quantified as numeric values. The link to the variables table through id_variables specifies the type of effort being recorded, ensuring a standardized and coherent approach to effort recording across different observation types.

Relationship with Observations

The efforts table is intricately linked to the observations table through the obs_efforts lookup table. This table acts as a relational bridge, establishing a connection between individual observations and the corresponding efforts involved in their acquisition. The obs_efforts table essentially serves as a many-to-many relationship facilitator, allowing each observation to be associated with one or more effort entries, and vice versa. See the obs_efforts table documentation for more details.

Dependencies

Sequence: efforts_id_seq
Foreign Key:
- id_variables references public.variables (id)

Columns

Column Name	Type	Description	Constraints
id	Integer	A unique identifier for each effort entry.	Primary key, Not Null, Default: nextval(‘efforts_id_seq’::regclass)
id_variables	Integer	The identifier of the variable associated with the effort.	Not Null, Foreign key
effort_value	Numeric	The quantitative value representing the effort.	Not Null, Part of Unique Constraint

Additional Constraints

Unique Constraint: efforts_id_variables_effort_value_key ensures that each combination of id_variables and effort_value is unique within the table, preventing redundancy in effort recording.

Foreign Key Constraints

efforts_id_variables_fkey: Links to the variables table, ensuring that each effort is associated with a valid variable type, providing context and meaning to the effort value recorded.

Table: Obs_Efforts

Description

The obs_efforts table serves as a linking mechanism between ecological observations and the efforts associated with them. It is a junction table designed to create a many-to-many relationship, capturing which efforts are related to specific observations. This design is essential for providing a comprehensive understanding of the resources and efforts expended in gathering each observation, such as the time, distance covered, or equipment used. By establishing a clear connection between observations and their corresponding efforts, this table enhances the granularity and context of ecological data analysis.

Dependencies

Foreign Keys:
- id_obs references an observation table (to be specified).
- id_efforts references public.efforts (id)

Columns

Column Name	Type	Description	Constraints
id_obs	Bigint	The identifier of the observation. Represents a link to a specific observation record.	Not Null, Part of Unique Constraint
id_efforts	Integer	The identifier of the effort. Links to a specific effort record in the `efforts` table.	Not Null, Foreign key, Part of Unique Constraint

Additional Constraints

Unique Constraint: obs_efforts_id_obs_id_efforts_key ensures that each combination of id_obs and id_efforts is unique within the table, preventing duplication in the linking of observations and efforts.

Foreign Key Constraints

obs_efforts_id_efforts_fkey: Ensures the integrity of the link between the obs_efforts and efforts tables, confirming that every effort associated with an observation is valid and exists in the efforts table.