```python
import sys
sys.path.append('./bqio/lib/')
from pipelinelib import StacCatalog
from utils import push_to_api

cat = StacCatalog.client()
# For Acer, use cat = StacCatalog.client(url="https://acer.biodiversite-quebec.ca/stac")
sg = cat.get_collection('soilgrids')
sg.description = 'SoilGrids aggregated datasets at 1km resolution, from https://www.isric.org/explore/soilgrids'
p = push_to_api(sg, 'https://io.biodiversite-quebec.ca/stac')
```

STAC Catalog
bq-stac
Basic tools for creating STAC items and catalogs for the IO repository of geospatial data at Biodiversité Québec.
The GitHub repository contains:
- A Python library in `/bqio` for loading raster data into the STAC catalog and sending it to stac-fastapi.
- A stac-fastapi setup for serving the catalogues for IO and Acer.
- A Node.js app for securing the stac-fastapi requests.
- An experimental stacItem Pipeline API for sending rasters directly to an API endpoint for automated ingestion into the STAC catalog.
Python STAC ingestion library
The Python library located in `/bqio` is used to:
- Take local or remote files and convert them into Cloud Optimized GeoTIFFs (COGs) using `gdalwarp` (see the sketch after this list).
- Send the COGs to the Digital Research Alliance object storage using `S3tools`.
- Extract information from the COGs to populate the STAC items.
- Create collections and items using the `pystac` Python library and send them through POST requests to stac-fastapi.
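As an illustration of the COG conversion step, the snippet below calls `gdalwarp` from Python. It is a minimal sketch, not code from the `/bqio` library; the file names and target projection are placeholders.

```python
import subprocess

def to_cog(src_path: str, dst_path: str) -> None:
    """Convert a raster to a Cloud Optimized GeoTIFF with gdalwarp (GDAL >= 3.1)."""
    subprocess.run(
        [
            "gdalwarp",
            "-of", "COG",               # write a Cloud Optimized GeoTIFF
            "-co", "COMPRESS=DEFLATE",  # lossless compression
            "-t_srs", "EPSG:4326",      # example target projection
            src_path,
            dst_path,
        ],
        check=True,
    )

# Example (placeholder file names):
# to_cog("soilgrids_raw.tif", "soilgrids_cog.tif")
```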
There are a number of example ingestion scripts located in `/datasets/`. Note that older scripts use an earlier version of the library. These scripts were run in a Docker container on a virtual machine with sufficient resources for the COG conversion and the data download/upload; some of these scripts take several days to run. The general shape of such a script is sketched below.
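The following is a rough, illustrative sketch of what an ingestion script does, not one of the scripts in `/datasets/`: it builds a collection and an item with `pystac` and POSTs them to the STAC API, assuming the stac-fastapi transactions extension is enabled. The collection and item IDs, extents, and asset URL are made-up placeholders.

```python
from datetime import datetime
import pystac
import requests

STAC_API = "https://io.biodiversite-quebec.ca/stac"  # endpoint used in the example above

# A minimal collection; a real script would derive the extents from the rasters.
collection = pystac.Collection(
    id="example-collection",
    description="Example collection created for illustration only",
    extent=pystac.Extent(
        spatial=pystac.SpatialExtent([[-180.0, -90.0, 180.0, 90.0]]),
        temporal=pystac.TemporalExtent([[datetime(2018, 1, 1), None]]),
    ),
    license="CC-BY-4.0",
)

# A minimal item pointing at a COG already uploaded to object storage (placeholder URL).
item = pystac.Item(
    id="example-item",
    geometry={
        "type": "Polygon",
        "coordinates": [[[-180, -90], [180, -90], [180, 90], [-180, 90], [-180, -90]]],
    },
    bbox=[-180.0, -90.0, 180.0, 90.0],
    datetime=datetime(2018, 1, 1),
    properties={},
)
item.collection_id = collection.id
item.add_asset(
    "data",
    pystac.Asset(href="https://object-storage.example/cogs/example.tif",
                 media_type=pystac.MediaType.COG),
)

# Push them to the STAC API through the transaction endpoints.
requests.post(f"{STAC_API}/collections", json=collection.to_dict()).raise_for_status()
requests.post(f"{STAC_API}/collections/{collection.id}/items", json=item.to_dict()).raise_for_status()
```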
Example query for updating a collection
Start a Python shell inside the `gdal-python` container and run a script such as the collection-update snippet shown at the top of this README:

```bash
docker compose run --rm gdal-python python
```
STAC-FASTAPI
At the moment, the STAC API is generated from the generic stac-fastapi/pgstac Docker image. A working nginx configuration section is in `nginx-stac-endpoint.txt`.
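For reference, the served catalog can be queried over plain HTTP. The snippet below is illustrative only; it uses the standard STAC API `/collections` route against the IO endpoint mentioned earlier in this README to list the available collection IDs.

```python
import requests

# The /collections route is part of the standard STAC API specification.
r = requests.get("https://io.biodiversite-quebec.ca/stac/collections", timeout=30)
r.raise_for_status()
for coll in r.json()["collections"]:
    print(coll["id"])
```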
The `db-backup.sh` file contains the script to back up the database through regular cron jobs.
EXPERIMENTAL stacItem Pipeline API
The stacItem Pipeline API allows users to send new STAC items to an existing STAC collection. A simple Flask API exposes the endpoints that make this ingestion possible; a sketch of its overall shape follows.
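The following is a minimal sketch of what such a Flask service can look like, written for illustration only; it is not the actual implementation (the real service also converts the raster to a COG, uploads it, and builds the STAC item in the background). The port and internal job store are assumptions; the endpoint names and response shape follow the documentation below.

```python
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
jobs = {}  # in-memory job store; the real service would persist and process jobs

@app.route("/newitem", methods=["POST"])
def new_item():
    params = request.get_json()
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"_id": job_id, "_param": params, "_status": "queued",
                    "_operation": "RECEIVED", "_description": "waiting to be processed"}
    # A background worker would pick the job up and build the COG and STAC item here.
    return jsonify({"id": job_id, "msg": "received"})

@app.route("/status")
def status():
    job = jobs.get(request.args.get("id"))
    return jsonify(job) if job else (jsonify({"msg": "unknown id"}), 404)

@app.route("/status/all")
def status_all():
    return jsonify(list(jobs.values()))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # placeholder port
```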
Endpoints
`{api_url}/newitem`: POST request that submits a new item to be ingested.
body: a JSON object with the following format

```
{
  "collection_id": "id of the collection where the item will be inserted",
  "date": "2018-01-01",
  "name": "name of the item",
  "filename": "filename.tif",
  "stac_api_server": "<url_stac_api>",
  "file_source_host": "<url_to_save_cog_file>",
  "properties": {
    "full_filename": "full_filename.tif",
    "description": "Description of the item",
    "otherparams...."
  }
}
```

return: a JSON object with the id of the process for your item and a message ("received" means the server has your item in the queue waiting to be processed).

```
{
  "id": "c1705f50-029c-4502-a2b6-9ffd1e8876d4",
  "msg": "received"
}
```

`{api_url}/status?id=c1705f50-029c-4502-a2b6-9ffd1e8876d4`: GET request to check the status of the process with the given id.
return: JSON

Case where the process finished properly:

```
{
  "_description": "description of different steps of the process",
  "_id": "c1705f50-029c-4502-a2b6-9ffd1e8876d4",
  "_operation": "PROCESS FINISHED",
  "_param": {
    "collection_id": "collection_id",
    "date": "2018-01-01",
    "file_source_host": "url (location of the source .tiff file)",
    "filename": "filename.tif",
    "name": "item name",
    "properties": {
      "datetime": "2018-01-01T00:00:00Z",
      "description": "TTT Other Hansen 2020",
      "full_filename": "full_filename.tif",
      "other properties...."
    },
    "other properties...."
  },
  "_status": "ok"
}
```

Case of failure:
```
{
  "_description": "Unable to read collection: \"chelsa-clim\" from server, error \n: <urlopen error [Errno 111] Connection refused>. \n Please check your conexion or congif.",
  "_id": "c1705f50-029c-4502-a2b6-9ffd1e8876d4",
  "_operation": "GETCOLLECTION",
  "_param": {
    "collection_id": "collection_id",
    "date": "2018-01-01",
    "file_source_host": "url (location of the source .tiff file)",
    "filename": "filename.tif",
    "name": "item name",
    "properties": {
      "datetime": "2018-01-01T00:00:00Z",
      "description": "TTT Other Hansen 2020",
      "full_filename": "full_filename.tif",
      "other properties...."
    },
    "other properties...."
  },
  "_status": "error"
}
```

`{api_url}/status/all`: GET request that returns a list of all items sent to the API.
return: a list of JSON objects with the same format as the previous ones.
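To tie the endpoints together, here is a minimal client-side sketch that submits an item and then polls its status with the `requests` package. It is illustrative only: the API URL and payload values are placeholders that follow the body format documented above, and it assumes that intermediate status responses also carry a `_status` field.

```python
import time
import requests

API_URL = "http://localhost:5000"  # placeholder; use your stacItem Pipeline API URL

payload = {
    "collection_id": "chelsa-clim",
    "date": "2018-01-01",
    "name": "example item",
    "filename": "filename.tif",
    "stac_api_server": "https://io.biodiversite-quebec.ca/stac",
    "file_source_host": "https://example.org/source/",
    "properties": {
        "full_filename": "full_filename.tif",
        "description": "Description of the item",
    },
}

# Submit the item; the API answers with a process id and the "received" message.
resp = requests.post(f"{API_URL}/newitem", json=payload, timeout=30)
resp.raise_for_status()
job_id = resp.json()["id"]

# Poll the status endpoint until the process finishes or fails.
while True:
    status = requests.get(f"{API_URL}/status", params={"id": job_id}, timeout=30).json()
    if status.get("_status") in ("ok", "error"):
        print(status["_operation"], status["_description"])
        break
    time.sleep(10)
```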
Docker config for stacItem Pipeline
First, create an environment variable file (`.env`) with the following variables:

```
ARBUTUS_OBJECT_ACCESS_ID=...
ARBUTUS_OBJECT_ACCESS_KEY=...
API_PORT=...
API_HOST=0.0.0.0
STAC_API_HOST=..url..
```

Note: if the container is running in the same Docker network as the STAC API server container, you might need to use that container's internal address in the STAC_API_HOST variable (e.g. STAC_API_HOST=http://172.21.0.3:8082). To get the IP address of your container, run this command:

```bash
docker inspect -f '{{.NetworkSettings.Networks.[network].IPAddress}}' [container name]
```

Run in the terminal:

```bash
docker-compose -f docker-compose-api.yml up --build   # only the first time, to build the image
docker-compose -f docker-compose-api.yml up gdal-api-python
```

Note: make sure the STAC API server is running and accessible.
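As a quick sanity check before sending items, the snippet below verifies that both the STAC API server and the pipeline API respond. The URLs are placeholders; use the values from your `.env`.

```python
import requests

STAC_API_HOST = "http://172.21.0.3:8082"  # placeholder, value of STAC_API_HOST in .env
PIPELINE_API = "http://localhost:5000"    # placeholder, host/port of the stacItem Pipeline API

# The STAC API landing page returns a JSON catalog document.
print(requests.get(STAC_API_HOST, timeout=10).json().get("stac_version"))

# The pipeline API answers on /status/all (an empty list if nothing was sent yet).
print(requests.get(f"{PIPELINE_API}/status/all", timeout=10).json())
```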
Database Backup
A backup of the catalogdb database is created automatically once a day and stored on the cloud server; the backups of the last 30 days are kept.
Restore a Backup
Our backups are stored on the cloud server (S3). To restore a database, follow these steps:
- Download the backup file from the cloud server (since it is S3 in our case, make sure your computer is configured to connect to the S3 storage).
- Unzip the backup file.
- Copy the file `docker-compose-io.yml` into the stac-fastapi repo folder on the server.
- Run `docker-compose -f docker-compose-io.yml up` inside the stac-fastapi repo folder on the server.
- Move the backup file to the backup folder inside the Docker container.
- Run `docker exec -it stac-db psql` on the server to open PostgreSQL inside the container.
- Create a new database: `create database newDB;`
- Restore the backup file into newDB: `psql -d newDB < backupfile.sql`
- Drop the catalogdb database inside the container (there are conflicts if we restore directly into catalogdb): `drop database catalogdb;`
- Rename the database newDB to catalogdb (e.g. `ALTER DATABASE newDB RENAME TO catalogdb;`).
- Restart the containers one more time: `docker-compose -f docker-compose-io.yml up`