Structure
Table: Observations
Description
The observations
table is central in the Atlas database to recording ecological observations. It is is designed to support different observation types, such as abundances, occurrences, and presence/absence data of various units. The type of observation is specified by the id_variables
column, which references the variables
table. See the variables
table documentation for more details.
It encompasses a wide range of data including geographical coordinates, observation times, taxa details, and associated variables, relying on foreign keys to other tables to store this information. See the Dependencies
section below for more details and relevant tables documentation.
Partitions are used to store observations that are within or outside of Quebec. This allows for faster queries on observations that are within Quebec, which are the most commonly used. See the partitions
section below for more details.
Triggers are used to automatically update the within_quebec
, modified_at
, dwc_event_date
columns on insert or update. See the triggers
section below for more details.
Dependencies
- Foreign Keys:
id_datasets
referencespublic.datasets (id)
id_taxa_obs
referencespublic.taxa_obs (id)
id_variables
referencespublic.variables (id)
- Partitions:
observations_partitions.outside_quebec
observations_partitions.within_quebec
- Sequence:
observations_partitions.observations_id_seq
Columns
Column Name | Type | Description | Constraints |
---|---|---|---|
id | Bigint | A unique identifier for each observation. | Primary key (part of), Not Null, Default: nextval(‘observations_partitions.observations_id_seq’::regclass) |
org_parent_event | Character Varying | Optional. Identifier of the parent event in the original source. | - |
org_event | Character Varying | Optional. Identifier of the event in the original source. | - |
org_id_obs | Character Varying | Optional. Original identifier of the observation in the original source. | - |
id_datasets | Integer | The identifier of the dataset to which the observation belongs. | Not Null, Foreign key |
id_taxa_obs | Integer | Identifier for the observed taxon. | Not Null, Foreign key |
geom | Geometry(Point, 4326) | The geometry of the coordinates of the observation. | Not Null |
year_obs | Integer | The year of the observation. | Not Null |
month_obs | Integer | The month of the observation. | - |
day_obs | Integer | The day of the observation. | - |
time_obs | Time | The time of the observation. | - |
id_variables | Integer | The identifier of the variable observed. Whether of type abundance , occurrence , etc. is stored in table variables |
Not Null, Foreign key |
obs_value | Numeric | The observed value for the variable. Might be integer or float in the case of abundance, or 0 or 1 in case of presence/absence | Not Null |
id_taxa (deprecated) | Integer | Deprecated. The identifier of the taxa observed. From taxa table. Only to maintain compatibility. |
Foreign key |
issue | Character Varying | Optional. Any issues or remarks related to the observation. | - |
created_by | Character Varying | Auto on insert. The user who inserted the observation record within Atlas DB. | Default: CURRENT_USER |
modified_at | Timestamp with time zone | Auto on insert/update. The timestamp when the observation was last modified. | Not Null, Default: CURRENT_TIMESTAMP |
modified_by | Character Varying | Auto on insert/update. The user who last modified the observation within Atlas DB. | Default: CURRENT_USER |
within_quebec | Boolean | Auto on insert/update. Indicates whether the observation was within Quebec. | Part of primary key, Not Null, Default: false |
dwc_event_date | Text | Auto on insert/update. The Darwin Core (DwC) formatted event date of the observation. | Not Null, Default: ’’ (empty string) |
Additional Constraints
- Primary Key Constraint: The combination of
id
andwithin_quebec
serves as a unique identifier for each observation record. - Foreign Key Constraints: Links to
datasets
andtaxa_obs
tables ensure referential integrity. - Unique Constraint: The combination of
geom
,dwc_event_date
,id_taxa_obs
,obs_value
,id_variables
,within_quebec
ensures that each observation is uniquely identifiable and retrievable. This is used to prevent duplicates using the indexobservations_unique_rows
.
Indexes
dwc_event_date_idx
ondwc_event_date
observations_geom_date_time_idx
ongeom
,year_obs
,month_obs
,day_obs
,time_obs
observations_geom_idx
ongeom
observations_id_datasets_id_taxa_idx
onid_datasets
,id_taxa
observations_id_datasets_idx
onid_datasets
observations_id_idx
onid
observations_id_taxa_obs_dwc_event_date_geom_idx
onid_taxa_obs
,dwc_event_date
,geom
observations_id_taxa_obs_idx
onid_taxa_obs
observations_unique_rows
ongeom
,dwc_event_date
,id_taxa_obs
,obs_value
,id_variables
,within_quebec
observations_year_obs_idx
onyear_obs
- Specific indexes on partitions (
within_quebec_year_obs_idx
,observations_id_taxa_obs_idx
inobservations_partitions.within_quebec
)
Triggers
set_dwc_event_date_trggr
: Before insert, sets the DwC event date.update_modified_at
: Before update, updates themodified_at
timestamp.action_user_logger
: In partitionobservations_partitions.outside_quebec
, logs user actions before insert or update.
Rationale
Use of Partitions for Performance
The use of table partitions in the observations table, particularly with the within_quebec column, is a strategic decision aimed at enhancing database performance. Partitioning is a highly effective method for managing large tables by splitting them into smaller, more manageable pieces. In this case, the observations_partitions schema segregates observation data into distinct partitions based on whether the observations occurred within Quebec (within_quebec = true) or outside it (within_quebec = false). This approach significantly improves query performance by allowing the database engine to quickly locate and retrieve data from a smaller subset of the table. Additionally, it simplifies maintenance tasks such as backups and data purges, as operations can be performed on individual partitions without affecting the entire dataset.
Partition tables observations_partitions.within_quebec
and observations_partitions.outside_quebec
are created within the observations_partitions
schema.
Temporal Columns
The inclusion of year_obs, month_obs, day_obs, and time_obs columns in the observations table is intentionally designed to accommodate observations with varying levels of temporal resolution. This flexibility is crucial in ecological data collection, where the exact time of an observation may range from a specific moment to a broader time frame. By breaking down the observation timestamp into separate components, the database can store and process data that is only accurate to a certain level (e.g., year, month, day, or time). This structure not only caters to the diverse nature of ecological observations but also enhances the ability to perform time-based queries and analyses with varying granularities.
Duplicates and DwC Event Date
Duplicates are possible through the process of adding data from various sources, such as GBIF and eBird. Duplicates are limited through the use of a unique constraint on the combination of geom
, dwc_event_date
, id_taxa_obs
, obs_value
, id_variables
, within_quebec
.
The dwc_event_date
column values are generated automatically using a trigger function. It is a formatted string that reflects a standardized approach to recording event dates in biodiversity data. The use of this column is required to ensure each observation is distinctly identifiable and retrievable. It also facilitates data interoperability, making it easier to share and compare ecological data across different platforms and studies.
Table: Datasets
Description
The datasets table stores comprehensive metadata about ecological observation datasets. This includes information about the source, creator, type of data, and publication details. It serves as a central repository for tracking various datasets related to ecological observations.
Dependencies
None
Columns
Column Name | Type | Description | Constraints |
---|---|---|---|
id | Integer | A unique identifier for each dataset. | Primary key, Not Null |
original_source | Character Varying | The original source of the dataset. Where was it obtained from (e.g., ‘GBIF’, ‘eBird’, ‘MELCCFP’). | Not Null |
org_dataset_id | Character Varying | Optional. An identifier used by the original_source or publisher for the dataset. |
- |
creator | Character Varying | Optional. The creator of the dataset (e.g., individual researcher or institution). | - |
title | Character Varying | The title of the dataset. | Not Null |
publisher | Character Varying | Optional. The publisher of the dataset (e.g., ‘Nature Publishing Group’, ‘Elsevier’, ‘GBIF’, ‘Données Québec’). | - |
modified | Date | The date when the dataset was last modified. | Not Null |
keywords | Character Varying[] | Optional. An array of keywords associated with the dataset. | - |
abstract | Text | Optional. A brief abstract or summary of the dataset. | - |
type_sampling | Character Varying | Optional. The type of sampling method used in the dataset. | - |
type_obs | public.type_observation | Optional. The type of observation (Enum values: ‘living specimen’, ‘preserved specimen’, ‘fossil specimen’, ‘human observation’, ‘machine observation’, ‘literature’, ‘material sample’, ‘others’). | - |
intellectual_rights | Character Varying | Optional. Information about the intellectual rights of the dataset. | - |
license | Character Varying | Optional. The license under which the dataset is released (e.g., ‘CC BY 4.0’, ‘GPL’, ‘Entente de partage’). | - |
owner | Character Varying | Optional. The owner of the dataset (e.g., a university, research institution, or individual researcher). | - |
methods | Text | Optional. A detailed description of the methods used in the dataset. | - |
open_data | Boolean | A boolean indicating whether the dataset is open data. | Not Null |
exhaustive | Boolean | A boolean indicating whether the dataset is exhaustive, meaning obtained from a checklist survey. | Not Null |
direct_obs | Boolean | A boolean indicating whether the dataset contains direct observations. | Not Null |
centroid | Boolean | Optional. A boolean indicating whether the dataset contains centroid data. | Default: False |
doi | Text | Optional. The Digital Object Identifier for the dataset. (e.g. https://doi.org/10.15468/ykxm8x) | Default: Empty String |
citation | Text | Optional. The recommended citation for the dataset. | Default: Empty String |
Table: Taxa_Obs
Description
The taxa_obs
table is structured to record raw taxonomic information directly from source data. This table is essential for capturing a wide range of taxonomic details, including scientific names, authorship, and taxonomic rank. It is designed to accommodate taxa of any rank, such as species, genus, complexes, etc., without requiring corrections for grammatical accuracy or valid taxonomy. These aspects are managed by other resources within the database, like the taxa_ref
table.
Dependencies
- Sequence:
taxa_obs_id_seq
Columns
Column Name | Type | Description | Constraints |
---|---|---|---|
id | Integer | A unique identifier for each taxonomic observation. | Primary key, Not Null, Default: nextval(‘taxa_obs_id_seq’::regclass), Using Index Tablespace ssdpool |
scientific_name | Text | Optional. The scientific name of the taxon observed. This includes taxa at any rank, such as species or genus. | Not Null, Part of Unique Constraint |
authorship | Text | Optional. The authorship of the scientific name. Defaults to an empty string if not provided. | Not Null, Default: ’’ (empty string), Part of Unique Constraint |
rank | Text | Optional. The taxonomic rank of the observation. Defaults to an empty string if not provided. | Default: ’’ (empty string), Part of Unique Constraint |
parent_scientific_name | Text | Optional. The scientific name of the parent taxon. Used to resolve conflicts where a scientific name corresponds to different organisms in different branches of the tree of life. Optional; if not specified, all results for the given scientific name are returned. | - |
created_at | Timestamp with time zone | Auto on insert/update. The timestamp when the taxonomic observation was created. | Not Null, Default: CURRENT_TIMESTAMP |
modified_at | Timestamp with time zone | Auto on insert/update. The timestamp when the taxonomic observation was last modified. | Not Null, Default: CURRENT_TIMESTAMP |
modified_by | Text | Auto on insert/update. The user who last modified the taxonomic observation record. | Not Null, Default: CURRENT_USER |
Additional Constraints
- Unique Constraint:
taxa_obs_unique_rows
ensures that each combination ofscientific_name
,authorship
,rank
,parent_scientific_name
is unique within the table.
Indexes
taxa_obs_scientific_name_idx
onscientific_name
Triggers
update_modified_at
: Before update, updates themodified_at
timestamp.
Conflicts and Parent Taxa
The design of the taxa_obs
table accounts for the potential conflicts in scientific naming. For instance, the same scientific name may correspond to different organisms in separate branches of the tree of life (e.g., Salix
as a genus of willows in plants and a genus of tunicates in animals). To resolve such conflicts, the parent_scientific_name
column allows users to specify a parent taxa name, thus restricting results to a specific branch. This feature is particularly useful for accurate data retrieval and association in cases of nomenclatural ambiguity.
Table: Variables
Description
The variables
table in the database is crafted to support a diverse array of observation types and units. This versatility is key to accommodating various ecological data formats, such as counts of individuals, weight, density, occurrences, and presence/absence data, among others. Each of these observation types can be associated with different units of measurement. The table facilitates the recording and categorization of these different types and units, making it a pivotal component for data analysis and interpretation in ecological studies. The id_variables
column, found in other related tables like observations
, references the variables
table to specify the type of observation, providing a clear link to the detailed description and unit of the variable being observed.
Dependencies
- Sequence:
variables_id_seq
Columns
Column Name | Type | Description | Constraints |
---|---|---|---|
id | Integer | A unique identifier for each variable. | Primary key, Not Null, Default: nextval(‘variables_id_seq’::regclass) |
name | Character Varying | The name of the variable, indicating the type of observation (e.g., count, weight, density). | Not Null, Part of Unique Constraint |
unit | Character Varying | Optional. The unit of measurement for the variable, which may vary based on the observation type. | Not Null, Default: ’’, Part of Unique Constraint |
description | Text | Optional. A detailed description of the variable, explaining its significance and use in observations. | - |
Additional Constraints
- Unique Constraint:
variables_name_unit_key
ensures that each combination ofname
andunit
is unique within the table.
Table: Efforts
Description
The efforts
table is specifically designed to record the efforts associated with ecological observations. This table plays a crucial role in quantifying and categorizing the effort invested in obtaining various observations. The efforts could be in various forms like time spent, area covered, traps used, etc., and are quantified as numeric values. The link to the variables
table through id_variables
specifies the type of effort being recorded, ensuring a standardized and coherent approach to effort recording across different observation types.
Relationship with Observations
The efforts table is intricately linked to the observations table through the obs_efforts lookup table. This table acts as a relational bridge, establishing a connection between individual observations and the corresponding efforts involved in their acquisition. The obs_efforts table essentially serves as a many-to-many relationship facilitator, allowing each observation to be associated with one or more effort entries, and vice versa. See the obs_efforts
table documentation for more details.
Dependencies
- Sequence:
efforts_id_seq
- Foreign Key:
id_variables
referencespublic.variables (id)
Columns
Column Name | Type | Description | Constraints |
---|---|---|---|
id | Integer | A unique identifier for each effort entry. | Primary key, Not Null, Default: nextval(‘efforts_id_seq’::regclass) |
id_variables | Integer | The identifier of the variable associated with the effort. | Not Null, Foreign key |
effort_value | Numeric | The quantitative value representing the effort. | Not Null, Part of Unique Constraint |
Additional Constraints
- Unique Constraint:
efforts_id_variables_effort_value_key
ensures that each combination ofid_variables
andeffort_value
is unique within the table, preventing redundancy in effort recording.
Foreign Key Constraints
efforts_id_variables_fkey
: Links to thevariables
table, ensuring that each effort is associated with a valid variable type, providing context and meaning to the effort value recorded.
Table: Obs_Efforts
Description
The obs_efforts
table serves as a linking mechanism between ecological observations and the efforts associated with them. It is a junction table designed to create a many-to-many relationship, capturing which efforts are related to specific observations. This design is essential for providing a comprehensive understanding of the resources and efforts expended in gathering each observation, such as the time, distance covered, or equipment used. By establishing a clear connection between observations and their corresponding efforts, this table enhances the granularity and context of ecological data analysis.
Dependencies
- Foreign Keys:
id_obs
references an observation table (to be specified).id_efforts
referencespublic.efforts (id)
Columns
Column Name | Type | Description | Constraints |
---|---|---|---|
id_obs | Bigint | The identifier of the observation. Represents a link to a specific observation record. | Not Null, Part of Unique Constraint |
id_efforts | Integer | The identifier of the effort. Links to a specific effort record in the efforts table. |
Not Null, Foreign key, Part of Unique Constraint |
Additional Constraints
- Unique Constraint:
obs_efforts_id_obs_id_efforts_key
ensures that each combination ofid_obs
andid_efforts
is unique within the table, preventing duplication in the linking of observations and efforts.
Foreign Key Constraints
obs_efforts_id_efforts_fkey
: Ensures the integrity of the link between theobs_efforts
andefforts
tables, confirming that every effort associated with an observation is valid and exists in theefforts
table.