# bq-stac: STAC Catalog

Basic tools for creating STAC items and catalogs for the IO repository of geospatial data at Biodiversité Québec.
The GitHub repository contains:

- A Python library in `/bqio` for loading raster data into the STAC catalog and sending it to the STAC FastAPI service.
- A STAC FastAPI instance for serving the catalogs for IO and Acer.
- A NodeJS app for securing the STAC FastAPI requests.
- An experimental stacItem Pipeline API for sending rasters directly to an API endpoint for automated ingestion into the STAC catalog.
## Python STAC ingestion library
The Python library located in `/bqio` is used to:

- Take local or remote files and convert them into Cloud Optimized GeoTIFFs (COGs) using `gdalwarp`.
- Send the COGs to the Digital Research Alliance object storage using S3 tools.
- Extract information from the COGs to populate the STAC items.
- Create collections and items using the `pystac` Python library and send them through POST requests to STAC FastAPI (a sketch of this last step follows the list).
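As an illustration of the last step, here is a minimal sketch, assuming `pystac` and `requests` are installed, of building one STAC item and POSTing it through the STAC FastAPI transactions endpoint. The item id, bbox, asset URL, and collection id are made-up placeholders; the real ingestion scripts in `/datasets/` derive these values from the COGs.

```python
import datetime

import pystac
import requests

STAC_API = "https://io.biodiversite-quebec.ca/stac"

# Placeholder footprint; the real scripts read the bbox from the COG itself.
bbox = [-80.0, 44.0, -56.0, 63.0]
geometry = {
    "type": "Polygon",
    "coordinates": [[
        [bbox[0], bbox[1]], [bbox[2], bbox[1]],
        [bbox[2], bbox[3]], [bbox[0], bbox[3]],
        [bbox[0], bbox[1]],
    ]],
}

item = pystac.Item(
    id="example-item",  # placeholder id
    geometry=geometry,
    bbox=bbox,
    datetime=datetime.datetime(2018, 1, 1),
    properties={"description": "Example item"},
)
item.add_asset(
    "data",
    pystac.Asset(
        href="https://object-storage.example/example.tif",  # placeholder COG URL
        media_type=pystac.MediaType.COG,
    ),
)

# POST the item into a collection through the transactions endpoint.
r = requests.post(f"{STAC_API}/collections/example-collection/items",
                  json=item.to_dict())
r.raise_for_status()
```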
There are a number of example ingestion scripts located in `/datasets/`. Note that older scripts use an older version of the library. These scripts were run in a Docker container on a virtual machine with sufficient resources for the COG conversion and the data download/upload. Some of these scripts take several days to run.
### Example query for updating a collection

```bash
docker compose run --rm gdal-python python
```
Then, in the Python interpreter:

```python
import sys
sys.path.append('./bqio/lib/')
from pipelinelib import StacCatalog
from utils import push_to_api

cat = StacCatalog.client()  # for Acer, use cat = StacCatalog.client(url="https://acer.biodiversite-quebec.ca/stac")
sg = cat.get_collection('soilgrids')
sg.description = 'SoilGrids aggregated datasets at 1km resolution, from https://www.isric.org/explore/soilgrids'
push_to_api(sg, 'https://io.biodiversite-quebec.ca/stac')
```
## STAC FastAPI

At the moment, the STAC API is generated from the generic stac-fastapi/pgstac Docker image. A functioning nginx configuration section is in `nginx-stac-endpoint.txt`.
The `db-backup.sh` file contains the script to back up the database through regular cron jobs.
## EXPERIMENTAL stacItem Pipeline API

The StacItemPipeline API allows users to send new STAC items to an existing STAC collection. A simple Flask API exposes the endpoints that make the injection possible.
### Endpoints
`{api_url}/newitem`: POST request that injects the new item.

**body**: a JSON object with the following format:

```json
{
  "collection_id": "id of the collection where the item will be inserted",
  "date": "2018-01-01",
  "name": "name of the item",
  "filename": "filename.tif",
  "stac_api_server": "<url_stac_api>",
  "file_source_host": "<url_to_save_cog_file>",
  "properties": {
    "full_filename": "full_filename.tif",
    "description": "Description of the item",
    "otherparams...."
  }
}
```
**return**: a JSON object with the id of the process for your item and a message ("received" means that the server already has your item in the queue, waiting to be processed).

```json
{
  "id": "c1705f50-029c-4502-a2b6-9ffd1e8876d4",
  "msg": "received"
}
```
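As an example, here is a hedged sketch of calling `/newitem` with Python's `requests`; `api_url` and all field values are placeholders taken from the format above, not from a real deployment.

```python
import requests

api_url = "http://localhost:8000"  # placeholder: wherever the pipeline API is served

payload = {
    "collection_id": "chelsa-clim",  # placeholder target collection id
    "date": "2018-01-01",
    "name": "name of the item",
    "filename": "filename.tif",
    "stac_api_server": "https://io.biodiversite-quebec.ca/stac",
    "file_source_host": "https://example.org/cogs",  # placeholder source host
    "properties": {
        "full_filename": "full_filename.tif",
        "description": "Description of the item",
    },
}

r = requests.post(f"{api_url}/newitem", json=payload)
r.raise_for_status()
process_id = r.json()["id"]  # keep this id to poll /status below
```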
`{api_url}/status?id=c1705f50-029c-4502-a2b6-9ffd1e8876d4`: GET request to verify the status of the process with the given `id`.
**return**: JSON.

Case where the process finished properly:

```json
{
  "_description": "description of different steps of the process",
  "_id": "c1705f50-029c-4502-a2b6-9ffd1e8876d4",
  "_operation": "PROCESS FINISHED",
  "_param": {
    "collection_id": "collection_id",
    "date": "2018-01-01",
    "file_source_host": "url (location of the source .tiff file)",
    "filename": "filename.tif",
    "name": "item name",
    "properties": {
      "datetime": "2018-01-01T00:00:00Z",
      "description": "TTT Other Hansen 2020",
      "full_filename": "full_filename.tif",
      "other properties...."
    },
    "other properties...."
  },
  "_status": "ok"
}
```
Case of failure:

```json
{
  "_description": "Unable to read collection: \"chelsa-clim\" from server, error \n: <urlopen error [Errno 111] Connection refused>. \n Please check your conexion or congif.",
  "_id": "c1705f50-029c-4502-a2b6-9ffd1e8876d4",
  "_operation": "GETCOLLECTION",
  "_param": {
    "collection_id": "collection_id",
    "date": "2018-01-01",
    "file_source_host": "url (location of the source .tiff file)",
    "filename": "filename.tif",
    "name": "item name",
    "properties": {
      "datetime": "2018-01-01T00:00:00Z",
      "description": "TTT Other Hansen 2020",
      "full_filename": "full_filename.tif",
      "other properties...."
    },
    "other properties...."
  },
  "_status": "error"
}
```
`{api_url}/status/all`: GET request that returns a list of all items sent to the API.

**return**: a list of JSON objects with the same format as the previous ones.
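Putting the two status endpoints together, here is a minimal polling sketch, again with a placeholder `api_url` and process id; the exact fields returned while a job is still queued are not specified above, so `.get()` is used defensively.

```python
import time

import requests

api_url = "http://localhost:8000"  # placeholder
process_id = "c1705f50-029c-4502-a2b6-9ffd1e8876d4"  # id returned by /newitem

# Poll the process until it reports "ok" or "error".
while True:
    status = requests.get(f"{api_url}/status", params={"id": process_id}).json()
    if status.get("_status") in ("ok", "error"):
        print(status.get("_operation"), "-", status.get("_description"))
        break
    time.sleep(30)

# List every item ever sent to the API.
all_processes = requests.get(f"{api_url}/status/all").json()
print(len(all_processes), "processes tracked")
```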
### Docker config for stacItem Pipeline

First create an environment variable file (`.env`) with the following variables:

```
ARBUTUS_OBJECT_ACCESS_ID=...
ARBUTUS_OBJECT_ACCESS_KEY=...
API_PORT=...
API_HOST=0.0.0.0
STAC_API_HOST=..url..
```

Note: if the container is running in the same network as the STAC API server container, you might need to use the container's IP address in the `STAC_API_HOST` variable (e.g. `STAC_API_HOST=http://172.21.0.3:8082`). To get the IP address of your container, run this command:

```bash
docker inspect -f '{{.NetworkSettings.Networks.[network].IPAddress}}' [container name]
```

Run in the terminal:

```bash
docker-compose -f docker-compose-api.yml up --build # only the first time, to build the image
docker-compose -f docker-compose-api.yml up gdal-api-python
```

Note: make sure the STAC API server is running and accessible.
## Database Backup

A backup of the catalogdb database is created and stored on the cloud server once a day, automatically; the backups for the last 30 days are kept.
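For orientation only, here is a rough sketch of what such a daily job could do; the container name `stac-db` comes from the restore steps below, while the user name, file layout, and S3 upload are assumptions. `db-backup.sh` is the authoritative script.

```python
import datetime
import subprocess

# Hypothetical sketch; the real db-backup.sh may differ in names and options.
outfile = f"catalogdb-{datetime.date.today().isoformat()}.sql"

# Dump the catalogdb database from the stac-db container (user name assumed).
with open(outfile, "w") as f:
    subprocess.run(
        ["docker", "exec", "stac-db", "pg_dump", "-U", "postgres", "catalogdb"],
        stdout=f,
        check=True,
    )
# The real job would also upload the dump to S3 and prune backups older than 30 days.
```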
### Restore a Backup

Our backups are stored on the cloud server (S3). In order to restore a database you need to follow these steps:

- Download the backup file from the cloud server (since it is S3 in our case, make sure your computer is configured so it can connect to S3).
- Unzip the backup file.
- Copy the file `docker-compose-io.yml` inside the stac-fastapi repo folder on the server.
- Run the command `docker-compose -f docker-compose-io.yml up` inside the stac-fastapi repo folder on the server.
- Move the backup file to the backup folder inside the Docker container.
- Run `docker exec -it stac-db psql` on the server to open PostgreSQL in the container.
- Create a new database: `create database newDB;`
- Restore the backup file into newDB: `psql -d newDB < backupfile.sql`
- Drop the catalogdb database inside the container (there are conflicts if we restore directly into catalogdb): `drop database catalogdb;`
- Rename the database newDB to catalogdb: `alter database newDB rename to catalogdb;`
- Restart the containers one more time: `docker-compose -f docker-compose-io.yml up`