Structure
Table: Observations
Description
The observations table is central in the Atlas database to recording ecological observations. It is is designed to support different observation types, such as abundances, occurrences, and presence/absence data of various units. The type of observation is specified by the id_variables column, which references the variables table. See the variables table documentation for more details.
It encompasses a wide range of data including geographical coordinates, observation times, taxa details, and associated variables, relying on foreign keys to other tables to store this information. See the Dependencies section below for more details and relevant tables documentation.
Partitions are used to store observations that are within or outside of Quebec. This allows for faster queries on observations that are within Quebec, which are the most commonly used. See the partitions section below for more details.
Triggers are used to automatically update the within_quebec, modified_at, dwc_event_date columns on insert or update. See the triggers section below for more details.
Dependencies
- Foreign Keys:
id_datasetsreferencespublic.datasets (id)id_taxa_obsreferencespublic.taxa_obs (id)id_variablesreferencespublic.variables (id)
- Partitions:
observations_partitions.outside_quebecobservations_partitions.within_quebec
- Sequence:
observations_partitions.observations_id_seq
Columns
| Column Name | Type | Description | Constraints |
|---|---|---|---|
| id | Bigint | A unique identifier for each observation. | Primary key (part of), Not Null, Default: nextval(‘observations_partitions.observations_id_seq’::regclass) |
| org_parent_event | Character Varying | Optional. Identifier of the parent event in the original source. | - |
| org_event | Character Varying | Optional. Identifier of the event in the original source. | - |
| org_id_obs | Character Varying | Optional. Original identifier of the observation in the original source. | - |
| id_datasets | Integer | The identifier of the dataset to which the observation belongs. | Not Null, Foreign key |
| id_taxa_obs | Integer | Identifier for the observed taxon. | Not Null, Foreign key |
| geom | Geometry(Point, 4326) | The geometry of the coordinates of the observation. | Not Null |
| year_obs | Integer | The year of the observation. | Not Null |
| month_obs | Integer | The month of the observation. | - |
| day_obs | Integer | The day of the observation. | - |
| time_obs | Time | The time of the observation. | - |
| id_variables | Integer | The identifier of the variable observed. Whether of type abundance, occurrence, etc. is stored in table variables |
Not Null, Foreign key |
| obs_value | Numeric | The observed value for the variable. Might be integer or float in the case of abundance, or 0 or 1 in case of presence/absence | Not Null |
| id_taxa (deprecated) | Integer | Deprecated. The identifier of the taxa observed. From taxa table. Only to maintain compatibility. |
Foreign key |
| issue | Character Varying | Optional. Any issues or remarks related to the observation. | - |
| created_by | Character Varying | Auto on insert. The user who inserted the observation record within Atlas DB. | Default: CURRENT_USER |
| modified_at | Timestamp with time zone | Auto on insert/update. The timestamp when the observation was last modified. | Not Null, Default: CURRENT_TIMESTAMP |
| modified_by | Character Varying | Auto on insert/update. The user who last modified the observation within Atlas DB. | Default: CURRENT_USER |
| within_quebec | Boolean | Auto on insert/update. Indicates whether the observation was within Quebec. | Part of primary key, Not Null, Default: false |
| dwc_event_date | Text | Auto on insert/update. The Darwin Core (DwC) formatted event date of the observation. | Not Null, Default: ’’ (empty string) |
Additional Constraints
- Primary Key Constraint: The combination of
idandwithin_quebecserves as a unique identifier for each observation record. - Foreign Key Constraints: Links to
datasetsandtaxa_obstables ensure referential integrity. - Unique Constraint: The combination of
geom,dwc_event_date,id_taxa_obs,obs_value,id_variables,within_quebecensures that each observation is uniquely identifiable and retrievable. This is used to prevent duplicates using the indexobservations_unique_rows.
Indexes
dwc_event_date_idxondwc_event_dateobservations_geom_date_time_idxongeom,year_obs,month_obs,day_obs,time_obsobservations_geom_idxongeomobservations_id_datasets_id_taxa_idxonid_datasets,id_taxaobservations_id_datasets_idxonid_datasetsobservations_id_idxonidobservations_id_taxa_obs_dwc_event_date_geom_idxonid_taxa_obs,dwc_event_date,geomobservations_id_taxa_obs_idxonid_taxa_obsobservations_unique_rowsongeom,dwc_event_date,id_taxa_obs,obs_value,id_variables,within_quebecobservations_year_obs_idxonyear_obs- Specific indexes on partitions (
within_quebec_year_obs_idx,observations_id_taxa_obs_idxinobservations_partitions.within_quebec)
Triggers
set_dwc_event_date_trggr: Before insert, sets the DwC event date.update_modified_at: Before update, updates themodified_attimestamp.action_user_logger: In partitionobservations_partitions.outside_quebec, logs user actions before insert or update.
Rationale
Use of Partitions for Performance
The use of table partitions in the observations table, particularly with the within_quebec column, is a strategic decision aimed at enhancing database performance. Partitioning is a highly effective method for managing large tables by splitting them into smaller, more manageable pieces. In this case, the observations_partitions schema segregates observation data into distinct partitions based on whether the observations occurred within Quebec (within_quebec = true) or outside it (within_quebec = false). This approach significantly improves query performance by allowing the database engine to quickly locate and retrieve data from a smaller subset of the table. Additionally, it simplifies maintenance tasks such as backups and data purges, as operations can be performed on individual partitions without affecting the entire dataset.
Partition tables observations_partitions.within_quebec and observations_partitions.outside_quebec are created within the observations_partitions schema.
Temporal Columns
The inclusion of year_obs, month_obs, day_obs, and time_obs columns in the observations table is intentionally designed to accommodate observations with varying levels of temporal resolution. This flexibility is crucial in ecological data collection, where the exact time of an observation may range from a specific moment to a broader time frame. By breaking down the observation timestamp into separate components, the database can store and process data that is only accurate to a certain level (e.g., year, month, day, or time). This structure not only caters to the diverse nature of ecological observations but also enhances the ability to perform time-based queries and analyses with varying granularities.
Duplicates and DwC Event Date
Duplicates are possible through the process of adding data from various sources, such as GBIF and eBird. Duplicates are limited through the use of a unique constraint on the combination of geom, dwc_event_date, id_taxa_obs, obs_value, id_variables, within_quebec.
The dwc_event_date column values are generated automatically using a trigger function. It is a formatted string that reflects a standardized approach to recording event dates in biodiversity data. The use of this column is required to ensure each observation is distinctly identifiable and retrievable. It also facilitates data interoperability, making it easier to share and compare ecological data across different platforms and studies.
Table: Datasets
Description
The datasets table stores comprehensive metadata about ecological observation datasets. This includes information about the source, creator, type of data, and publication details. It serves as a central repository for tracking various datasets related to ecological observations.
Dependencies
None
Columns
| Column Name | Type | Description | Constraints |
|---|---|---|---|
| id | Integer | A unique identifier for each dataset. | Primary key, Not Null |
| original_source | Character Varying | The original source of the dataset. Where was it obtained from (e.g., ‘GBIF’, ‘eBird’, ‘MELCCFP’). | Not Null |
| org_dataset_id | Character Varying | Optional. An identifier used by the original_source or publisher for the dataset. |
- |
| creator | Character Varying | Optional. The creator of the dataset (e.g., individual researcher or institution). | - |
| title | Character Varying | The title of the dataset. | Not Null |
| publisher | Character Varying | Optional. The publisher of the dataset (e.g., ‘Nature Publishing Group’, ‘Elsevier’, ‘GBIF’, ‘Données Québec’). | - |
| modified | Date | The date when the dataset was last modified. | Not Null |
| keywords | Character Varying[] | Optional. An array of keywords associated with the dataset. | - |
| abstract | Text | Optional. A brief abstract or summary of the dataset. | - |
| type_sampling | Character Varying | Optional. The type of sampling method used in the dataset. | - |
| type_obs | public.type_observation | Optional. The type of observation (Enum values: ‘living specimen’, ‘preserved specimen’, ‘fossil specimen’, ‘human observation’, ‘machine observation’, ‘literature’, ‘material sample’, ‘others’). | - |
| intellectual_rights | Character Varying | Optional. Information about the intellectual rights of the dataset. | - |
| license | Character Varying | Optional. The license under which the dataset is released (e.g., ‘CC BY 4.0’, ‘GPL’, ‘Entente de partage’). | - |
| owner | Character Varying | Optional. The owner of the dataset (e.g., a university, research institution, or individual researcher). | - |
| methods | Text | Optional. A detailed description of the methods used in the dataset. | - |
| open_data | Boolean | A boolean indicating whether the dataset is open data. | Not Null |
| exhaustive | Boolean | A boolean indicating whether the dataset is exhaustive, meaning obtained from a checklist survey. | Not Null |
| direct_obs | Boolean | A boolean indicating whether the dataset contains direct observations. | Not Null |
| centroid | Boolean | Optional. A boolean indicating whether the dataset contains centroid data. | Default: False |
| doi | Text | Optional. The Digital Object Identifier for the dataset. (e.g. https://doi.org/10.15468/ykxm8x) | Default: Empty String |
| citation | Text | Optional. The recommended citation for the dataset. | Default: Empty String |
Table: Taxa_Obs
Description
The taxa_obs table is structured to record raw taxonomic information directly from source data. This table is essential for capturing a wide range of taxonomic details, including scientific names, authorship, and taxonomic rank. It is designed to accommodate taxa of any rank, such as species, genus, complexes, etc., without requiring corrections for grammatical accuracy or valid taxonomy. These aspects are managed by other resources within the database, like the taxa_ref table.
Dependencies
- Sequence:
taxa_obs_id_seq
Columns
| Column Name | Type | Description | Constraints |
|---|---|---|---|
| id | Integer | A unique identifier for each taxonomic observation. | Primary key, Not Null, Default: nextval(‘taxa_obs_id_seq’::regclass), Using Index Tablespace ssdpool |
| scientific_name | Text | Optional. The scientific name of the taxon observed. This includes taxa at any rank, such as species or genus. | Not Null, Part of Unique Constraint |
| authorship | Text | Optional. The authorship of the scientific name. Defaults to an empty string if not provided. | Not Null, Default: ’’ (empty string), Part of Unique Constraint |
| rank | Text | Optional. The taxonomic rank of the observation. Defaults to an empty string if not provided. | Default: ’’ (empty string), Part of Unique Constraint |
| parent_scientific_name | Text | Optional. The scientific name of the parent taxon. Used to resolve conflicts where a scientific name corresponds to different organisms in different branches of the tree of life. Optional; if not specified, all results for the given scientific name are returned. | - |
| created_at | Timestamp with time zone | Auto on insert/update. The timestamp when the taxonomic observation was created. | Not Null, Default: CURRENT_TIMESTAMP |
| modified_at | Timestamp with time zone | Auto on insert/update. The timestamp when the taxonomic observation was last modified. | Not Null, Default: CURRENT_TIMESTAMP |
| modified_by | Text | Auto on insert/update. The user who last modified the taxonomic observation record. | Not Null, Default: CURRENT_USER |
Additional Constraints
- Unique Constraint:
taxa_obs_unique_rowsensures that each combination ofscientific_name,authorship,rank,parent_scientific_nameis unique within the table.
Indexes
taxa_obs_scientific_name_idxonscientific_name
Triggers
update_modified_at: Before update, updates themodified_attimestamp.
Conflicts and Parent Taxa
The design of the taxa_obs table accounts for the potential conflicts in scientific naming. For instance, the same scientific name may correspond to different organisms in separate branches of the tree of life (e.g., Salix as a genus of willows in plants and a genus of tunicates in animals). To resolve such conflicts, the parent_scientific_name column allows users to specify a parent taxa name, thus restricting results to a specific branch. This feature is particularly useful for accurate data retrieval and association in cases of nomenclatural ambiguity.
Table: Variables
Description
The variables table in the database is crafted to support a diverse array of observation types and units. This versatility is key to accommodating various ecological data formats, such as counts of individuals, weight, density, occurrences, and presence/absence data, among others. Each of these observation types can be associated with different units of measurement. The table facilitates the recording and categorization of these different types and units, making it a pivotal component for data analysis and interpretation in ecological studies. The id_variables column, found in other related tables like observations, references the variables table to specify the type of observation, providing a clear link to the detailed description and unit of the variable being observed.
Dependencies
- Sequence:
variables_id_seq
Columns
| Column Name | Type | Description | Constraints |
|---|---|---|---|
| id | Integer | A unique identifier for each variable. | Primary key, Not Null, Default: nextval(‘variables_id_seq’::regclass) |
| name | Character Varying | The name of the variable, indicating the type of observation (e.g., count, weight, density). | Not Null, Part of Unique Constraint |
| unit | Character Varying | Optional. The unit of measurement for the variable, which may vary based on the observation type. | Not Null, Default: ’’, Part of Unique Constraint |
| description | Text | Optional. A detailed description of the variable, explaining its significance and use in observations. | - |
Additional Constraints
- Unique Constraint:
variables_name_unit_keyensures that each combination ofnameandunitis unique within the table.
Table: Efforts
Description
The efforts table is specifically designed to record the efforts associated with ecological observations. This table plays a crucial role in quantifying and categorizing the effort invested in obtaining various observations. The efforts could be in various forms like time spent, area covered, traps used, etc., and are quantified as numeric values. The link to the variables table through id_variables specifies the type of effort being recorded, ensuring a standardized and coherent approach to effort recording across different observation types.
Relationship with Observations
The efforts table is intricately linked to the observations table through the obs_efforts lookup table. This table acts as a relational bridge, establishing a connection between individual observations and the corresponding efforts involved in their acquisition. The obs_efforts table essentially serves as a many-to-many relationship facilitator, allowing each observation to be associated with one or more effort entries, and vice versa. See the obs_efforts table documentation for more details.
Dependencies
- Sequence:
efforts_id_seq - Foreign Key:
id_variablesreferencespublic.variables (id)
Columns
| Column Name | Type | Description | Constraints |
|---|---|---|---|
| id | Integer | A unique identifier for each effort entry. | Primary key, Not Null, Default: nextval(‘efforts_id_seq’::regclass) |
| id_variables | Integer | The identifier of the variable associated with the effort. | Not Null, Foreign key |
| effort_value | Numeric | The quantitative value representing the effort. | Not Null, Part of Unique Constraint |
Additional Constraints
- Unique Constraint:
efforts_id_variables_effort_value_keyensures that each combination ofid_variablesandeffort_valueis unique within the table, preventing redundancy in effort recording.
Foreign Key Constraints
efforts_id_variables_fkey: Links to thevariablestable, ensuring that each effort is associated with a valid variable type, providing context and meaning to the effort value recorded.
Table: Obs_Efforts
Description
The obs_efforts table serves as a linking mechanism between ecological observations and the efforts associated with them. It is a junction table designed to create a many-to-many relationship, capturing which efforts are related to specific observations. This design is essential for providing a comprehensive understanding of the resources and efforts expended in gathering each observation, such as the time, distance covered, or equipment used. By establishing a clear connection between observations and their corresponding efforts, this table enhances the granularity and context of ecological data analysis.
Dependencies
- Foreign Keys:
id_obsreferences an observation table (to be specified).id_effortsreferencespublic.efforts (id)
Columns
| Column Name | Type | Description | Constraints |
|---|---|---|---|
| id_obs | Bigint | The identifier of the observation. Represents a link to a specific observation record. | Not Null, Part of Unique Constraint |
| id_efforts | Integer | The identifier of the effort. Links to a specific effort record in the efforts table. |
Not Null, Foreign key, Part of Unique Constraint |
Additional Constraints
- Unique Constraint:
obs_efforts_id_obs_id_efforts_keyensures that each combination ofid_obsandid_effortsis unique within the table, preventing duplication in the linking of observations and efforts.
Foreign Key Constraints
obs_efforts_id_efforts_fkey: Ensures the integrity of the link between theobs_effortsandeffortstables, confirming that every effort associated with an observation is valid and exists in theeffortstable.