| Title: | Access 'ERDA' Data Archive via SFTP and HTTP |
|---|---|
| Description: | Utilities for listing folders and reading image EXIF metadata from the Electronic Research Data Archive (ERDA) at Aarhus University. Supports both SFTP access via SSH ControlMaster and public anonymous HTTP URLs. |
| Authors: | Lars Dalby [aut, cre, cph] |
| Maintainer: | Lars Dalby <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.10.0 |
| Built: | 2026-06-10 19:38:25 UTC |
| Source: | https://gitlab.au.dk/ecos/tools/r-pkgs/erdatools |
Starts the plumber REST API bundled with erdatools. The API provides endpoints for ERDA directory indexing, sourceimages pipelines, and deployment management. Opens a Swagger UI in the browser by default.
et_api_run(port = 8000, host = "127.0.0.1", ...)

| Argument | Description |
|---|---|
| port | Port to listen on. |
| host | Host to bind to. Use '"0.0.0.0"' to listen on all interfaces. |
| ... | Additional arguments passed to [plumber::pr_run()]. |
Invisibly returns the plumber router (after the server stops).
Reads a CSV file of deployment records and returns a tibble ready for [et_db_write_deployments()]. Validates required columns, normalizes values, and optionally resolves partner names to IDs.
et_build_deployments(csv_path, projectid, partner_map = NULL)

| Argument | Description |
|---|---|
| csv_path | Character. Path to the CSV file. |
| projectid | Character. Project ID (e.g. '"ias"'). Applied to all rows. |
| partner_map | Named character vector mapping partner names to IDs (e.g. 'c("Denmark" = "5", "Sweden" = "11")'). When supplied, the 'partnerid' column is resolved and the 'partner' column is dropped. When 'NULL' (default) the 'partner' column is kept for the caller to resolve. |
The 'Partner' column must match values in 'data.partners.name' exactly.
**Required CSV columns**: 'Code', 'Partner', 'Location name', 'Year', 'Latitude', 'Longitude'.
**Optional CSV columns** (filled with 'NA' if absent): 'Elevation (m)', 'Solar or grid power', 'Trap serial number', 'Comments'.
A [tibble::tibble()] with columns 'year', 'code', 'locationname', 'elevation', 'latitude', 'longitude', 'power', 'trapserialno', 'trapstatus', 'comment', 'projectid', and either 'partnerid' (when 'partner_map' is supplied) or 'partner' (when it is not).
## Not run:
# With partner resolution
partners <- DBI::dbGetQuery(con, "SELECT id, name FROM data.partners")
pmap <- setNames(partners$id, partners$name)
df <- et_build_deployments("deployments.csv", "ias", partner_map = pmap)
# Without (caller resolves later)
df <- et_build_deployments("deployments.csv", "ias")
## End(Not run)
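The partner resolution described above can be pictured as a plain named-vector lookup. The following is a minimal base-R sketch, not the package's implementation; the 'deployments' data frame and its values are hypothetical.

```r
# Hypothetical deployment rows; only the 'partner' column matters here.
deployments <- data.frame(
  code = c("DK1", "SE1", "NO1"),
  partner = c("Denmark", "Sweden", "Norway"),
  stringsAsFactors = FALSE
)
partner_map <- c("Denmark" = "5", "Sweden" = "11")

# Indexing a named vector by name yields NA for names that are absent,
# so an unmapped partner ("Norway") surfaces as a missing ID.
deployments$partnerid <- unname(partner_map[deployments$partner])
deployments$partner <- NULL

print(deployments)
```

Because the lookup is by exact name, any spelling difference between the CSV and the map produces 'NA', which is why the 'Partner' values must match exactly.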
Aggregates a grouped data frame (from [et_group_motion_images()] or [et_group_snapshot_images()]) into one row per unique 'session_key', matching the 'data.sessions' database schema.
et_build_sessions(grouped, deploymentid, type = "motion")

| Argument | Description |
|---|---|
| grouped | Data frame as returned by [et_group_motion_images()] or [et_group_snapshot_images()]. Must contain columns 'session_key' and 'timestamp'. |
| deploymentid | Character. Deployment ID (e.g. '"IT1#2023"'). |
| type | Character. Session type: '"motion"' or '"snapshot"'. |
Data frame with columns: 'session_key' (character, retained for joining), 'date' (Date), 'deploymentid' (character), 'type' (character), 'starttime' (POSIXct), 'endtime' (POSIXct). One row per unique 'session_key'.
## Not run:
grouped <- et_group_motion_images(idx)
sessions <- et_build_sessions(grouped, deploymentid = "IT1#2023", type = "motion")
## End(Not run)
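To make the one-row-per-'session_key' aggregation concrete, here is a rough base-R sketch. It assumes that 'starttime' and 'endtime' are the earliest and latest timestamps in each group and that 'date' is parsed from the key; neither is confirmed by the documentation. 'build_sessions_sketch' is a hypothetical name.

```r
# Hypothetical grouped input with the two required columns.
grouped <- data.frame(
  session_key = c("2023_08_31", "2023_08_31", "2023_09_01"),
  timestamp = as.POSIXct(
    c("2023-08-31 22:00:00", "2023-09-01 06:00:00", "2023-09-01 13:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

build_sessions_sketch <- function(grouped, deploymentid, type = "motion") {
  keys <- sort(unique(grouped$session_key))
  rows <- lapply(keys, function(k) {
    ts <- grouped$timestamp[grouped$session_key == k]
    data.frame(
      session_key = k,
      date = as.Date(k, format = "%Y_%m_%d"),  # assumption: date comes from the key
      deploymentid = deploymentid,
      type = type,
      starttime = min(ts),                     # assumption: earliest image in the group
      endtime = max(ts),                       # assumption: latest image in the group
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

sessions <- build_sessions_sketch(grouped, "IT1#2023")
print(sessions)
```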
Transforms a filtered ERDA index (files only) into the 'data.sourceimages' schema used by the AMI database. Constructs public URLs from the full 'dir/name' path in the index and extracts timestamps from AMI or CamAlien filenames.
et_build_sourceimages(
  index,
  sharelink,
  http_base = "https://anon.erda.au.dk/share_redirect",
  deploymentid,
  type = "snapshot"
)

| Argument | Description |
|---|---|
| index | Data frame as returned by [et_index_filter()]. Must contain columns 'dir', 'name', 'is_dir', 'size', 'mtime'. Should be filtered to files only (no directories). |
| sharelink | ERDA sharelink token. |
| http_base | Base URL for the ERDA share-redirect endpoint. |
| deploymentid | Character. Deployment ID for all rows (e.g. '"IT1#2023"'). |
| type | Character or 'NULL'. Image type: '"snapshot"' or '"motion"'. When 'NULL', the type is derived per-image from AMI filename parsing and rows with unparseable filenames are dropped. Use 'NULL' when relying on the database trigger for session grouping. |
The 'index_ts' column records the time this function is called (i.e. the build timestamp), not the time the index was originally created.
Data frame with columns: 'index_ts' (character, build timestamp), 'filename', 'deploymentid', 'url', 'timestamp', 'issnapshot', 'processed', 'queued'.
## Not run:
idx <- et_index_filter("erda_index.parquet", extensions = c("jpg", "jpeg"))
si <- et_build_sourceimages(
  idx,
  sharelink = Sys.getenv("ERDA_SHARELINK"),
  http_base = "https://anon.erda.au.dk/share_redirect",
  deploymentid = "IT1#2023",
  type = "snapshot"
)
## End(Not run)
Transforms the output of [et_parse_edge_tracks()] into the 'data.tracks' database schema by mapping 'id' to 'edge_id' and adding 'sessionid', 'algorithm', and 'algorithmversion' columns.
et_build_tracks(
  parsed_tracks,
  sessionid,
  algorithm = NULL,
  algorithmversion = NULL
)

| Argument | Description |
|---|---|
| parsed_tracks | Tibble as returned by [et_parse_edge_tracks()]. |
| sessionid | Character. Session UUID to associate with all tracks. |
| algorithm | Character or 'NULL'. Algorithm name (e.g. '"yolov8"'). Default 'NULL'. |
| algorithmversion | Character or 'NULL'. Algorithm version string. Default 'NULL'. |
A data frame with columns matching the 'data.tracks' schema: 'sessionid', 'edge_id', 'algorithm', 'algorithmversion'.
## Not run:
parsed <- et_parse_edge_tracks("results/tracks/20250508TR.csv")
tracks <- et_build_tracks(parsed, sessionid = "abc-123")
## End(Not run)
Batch UPDATE that assigns 'sessionid' to sourceimages based on temporal overlap with session '(starttime, endtime)' ranges. Motion sourceimages are matched to motion sessions, and snapshot sourceimages to snapshot sessions. Only sourceimages with 'sessionid IS NULL' are updated, making this function idempotent.
et_db_assign_sessions(dbcon, deploymentid)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object (from [et_db_connect()]). |
| deploymentid | Character. Deployment ID to scope the update. |
The number of rows updated (invisibly).
## Not run:
con <- et_db_connect()
on.exit(DBI::dbDisconnect(con))
et_db_assign_sessions(con, "IT1#2023")
## End(Not run)
Wrapper around [DBI::dbConnect()] using [RPostgres::Postgres()]. Connection parameters default to the environment variables used by the existing ami-indexer container ('AMI_HOST', 'AMI_USER', 'AMI_PASSWORD', 'AMI_DB').
et_db_connect(
  host = Sys.getenv("AMI_HOST"),
  port = 5432L,
  dbname = Sys.getenv("AMI_DB", unset = "ami_ias"),
  user = Sys.getenv("AMI_USER"),
  password = Sys.getenv("AMI_PASSWORD"),
  sslmode = "verify-full",
  sslrootcert = Sys.getenv("AMI_SSLROOTCERT")
)

| Argument | Description |
|---|---|
| host | Database hostname. Defaults to 'AMI_HOST' env var. |
| port | Database port. Defaults to '5432'. |
| dbname | Database name. Defaults to 'AMI_DB' env var or '"ami_ias"'. |
| user | Database user. Defaults to 'AMI_USER' env var. |
| password | Database password. Defaults to 'AMI_PASSWORD' env var. |
| sslmode | SSL mode for the connection. Defaults to '"verify-full"' (encrypted with certificate verification). |
| sslrootcert | Path to the root CA certificate ('.pem' file) used when 'sslmode' is '"verify-full"' or '"verify-ca"'. Defaults to the 'AMI_SSLROOTCERT' env var. |
Requires the DBI and RPostgres packages at runtime.
A DBI connection object.
## Not run:
con <- et_db_connect()
on.exit(DBI::dbDisconnect(con))
## End(Not run)
Looks up a deployment's metadata and its associated project and partner folders to construct the canonical ERDA directory path. The path structure depends on the deployment's trap type:
- standalone/lepisense/mothbox: /project_folder/year/partner_folder/code/
- iot: /project_folder/partner_folder/code/ (no year level; images are organised in date-based subdirectories)

et_db_deployment_path(dbcon, deployment_id)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object. |
| deployment_id | Character. The deployment ID (primary key). |
Character string with the remote path, e.g. '"/ias/2023/italy/IT1/"' for standalone or '"/storage/denmark/DK5/"' for IoT.
## Not run:
con <- et_db_connect()
path <- et_db_deployment_path(con, "IT1#2023")
# "/ias/2023/italy/IT1/"
## End(Not run)
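The two path layouts can be illustrated without a database. This is a hedged sketch in which the folder names are passed in directly rather than looked up; 'deployment_path_sketch' is a hypothetical helper, not part of erdatools.

```r
deployment_path_sketch <- function(project_folder, partner_folder, code, year, traptype) {
  known <- c("standalone", "lepisense", "mothbox", "iot")
  if (!traptype %in% known) {
    stop("Unknown trap type: ", traptype)
  }
  parts <- if (identical(traptype, "iot")) {
    # IoT deployments have no year level.
    c(project_folder, partner_folder, code)
  } else {
    c(project_folder, year, partner_folder, code)
  }
  paste0("/", paste(parts, collapse = "/"), "/")
}

deployment_path_sketch("ias", "italy", "IT1", 2023L, "standalone")
# "/ias/2023/italy/IT1/"
deployment_path_sketch("storage", "denmark", "DK5", 2025L, "iot")
# "/storage/denmark/DK5/"
```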
Queries existing '(code, year, projectid)' triples from 'data.deployments' and returns only the rows in 'df' that do not yet exist.
et_db_diff_deployments(dbcon, df)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object. |
| df | Data frame with at least 'code', 'year', and 'projectid' columns (as returned by [et_build_deployments()]). |
A subset of 'df' containing only new deployments. All columns are preserved.
## Not run:
new <- et_db_diff_deployments(con, df)
et_db_write_deployments(con, new)
## End(Not run)
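Conceptually, the diff is an anti-join on the '(code, year, projectid)' triple. The sketch below shows that idea in base R with an in-memory stand-in for the database rows. The data are hypothetical, and the composite key assumes none of the values contain the '|' separator.

```r
# Stand-in for the triples already present in data.deployments.
existing <- data.frame(
  code = c("IT1", "DK5"),
  year = c(2023L, 2024L),
  projectid = c("ias", "ias"),
  stringsAsFactors = FALSE
)

# Candidate rows, e.g. from a CSV import; extra columns are carried along.
df <- data.frame(
  code = c("IT1", "IT2", "DK5"),
  year = c(2023L, 2023L, 2025L),
  projectid = "ias",
  locationname = c("Site A", "Site B", "Site C"),
  stringsAsFactors = FALSE
)

# Build a composite key per row and keep rows whose key is not yet known.
triple_key <- function(d) paste(d$code, d$year, d$projectid, sep = "|")
new_rows <- df[!(triple_key(df) %in% triple_key(existing)), , drop = FALSE]

print(new_rows)
```

Note that "DK5" is kept because its year differs from the existing row; all three fields take part in the comparison.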
Compares a sourceimages data frame against 'data.sourceimages' in the database and returns only the rows whose 'filename' is not yet present. Queries are batched to stay within PostgreSQL's parameter limit.
et_db_diff_sourceimages(dbcon, df, batch_size = 5000L, n_workers = 1L)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object (from [et_db_connect()]). Used for the sequential path only; parallel workers open their own connections. |
| df | Data frame with at least a 'filename' column (typically the output of [et_build_sourceimages()]). |
| batch_size | Integer. Number of filenames per query. Default '5000L'. |
| n_workers | Integer. Number of parallel mirai workers. Default '1L'. See [et_db_write_sourceimages()] for requirements. |
A subset of 'df' containing only rows not already in the database. All original columns are preserved.
## Not run:
con <- et_db_connect()
on.exit(DBI::dbDisconnect(con))
si <- et_build_sourceimages(idx, sharelink, http_base = http_base, deploymentid = "IT1#2023")
new <- et_db_diff_sourceimages(con, si)
et_db_write_sourceimages(con, new)
## End(Not run)
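The batching mentioned above can be sketched as splitting the filename vector into chunks of at most 'batch_size' and checking each chunk separately. In this illustration the database lookup is replaced by a hypothetical in-memory function ('query_existing'); the real function issues SQL instead.

```r
filenames <- sprintf("img_%05d.jpg", 1:12001)
batch_size <- 5000L

# Chunk the vector: element i goes to batch ceiling(i / batch_size).
batches <- split(filenames, ceiling(seq_along(filenames) / batch_size))
lengths(batches)
# 5000 5000 2001

# Hypothetical stand-in for one parameterised query per batch.
present_in_db <- c("img_00002.jpg", "img_07000.jpg")
query_existing <- function(batch) batch[batch %in% present_in_db]

found <- unlist(lapply(batches, query_existing), use.names = FALSE)
new_files <- filenames[!(filenames %in% found)]
length(new_files)
# 11999
```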
Queries 'data.deployments' by primary key and returns the deployment's year, code, partner ID, project ID, and trap type.
et_db_get_deployment(dbcon, deployment_id)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object. |
| deployment_id | Character. The deployment ID (primary key). |
A named list with elements 'year' (integer), 'code' (character), 'partnerid' (character), 'projectid' (character), and 'traptype' (character, e.g. '"standalone"', '"iot"', '"lepisense"', or '"mothbox"'). Aborts with an error if the deployment is not found.
## Not run:
con <- et_db_connect()
dep <- et_db_get_deployment(con, "IT1#2023")
dep$year # 2023
dep$code # "IT1"
## End(Not run)
Queries 'data.deployments' for a deployment matching the given project, partner, year, and code. The deployment ID is a character string (e.g. '"IT1#2023"').
et_db_get_deployment_id(dbcon, projectid, partnerid, year, code)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object. |
| projectid | Character. Project identifier (e.g. '"ias"'). |
| partnerid | Character. Partner ID (e.g. '"8"'). |
| year | Integer. Deployment year. |
| code | Character. Deployment code (e.g. '"IT1"'). |
Character deployment ID (e.g. '"IT1#2023"'), or 'NA_character_' if not found.
## Not run:
con <- et_db_connect()
dep_id <- et_db_get_deployment_id(con, "ias", "8", 2023L, "IT1")
## End(Not run)
Queries 'data.partners' for the folder associated with a partner ID.
et_db_get_partner_folder(dbcon, partnerid)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object. |
| partnerid | Integer. Partner ID. |
Character string with the partner folder, or 'NA_character_' if not found.
## Not run:
con <- et_db_connect()
folder <- et_db_get_partner_folder(con, 1L)
## End(Not run)
Queries 'data.projects' for the folder associated with a project ID.
et_db_get_project_folder(dbcon, projectid)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object. |
| projectid | Character. Project ID. |
Character string with the project folder, or 'NA_character_' if not found.
## Not run:
con <- et_db_connect()
folder <- et_db_get_project_folder(con, "ias")
## End(Not run)
Queries 'data.sessions' for all sessions belonging to a deployment. Used to retrieve sessions that were created by the database trigger (migration 2034) after sourceimages were inserted.
et_db_get_sessions(dbcon, deploymentid)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object (from [et_db_connect()]). |
| deploymentid | Character. Deployment ID (e.g. '"IT1#2023"'). |
Data frame with columns: 'id' (character, UUID), 'date' (Date), 'type' (character), 'starttime' (POSIXct), 'endtime' (POSIXct). Zero rows if no sessions exist.
## Not run:
con <- et_db_connect()
on.exit(DBI::dbDisconnect(con))
sessions <- et_db_get_sessions(con, "IT1#2023")
## End(Not run)
Queries 'data.deployments' for all deployments matching an optional project and/or year filter. Useful for batch operations such as running the deploy pipeline over every deployment in a given project-year.
et_db_list_deployments(dbcon, projectid = NULL, year = NULL)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object. |
| projectid | Character or 'NULL'. Project identifier (e.g. '"ias"'). If 'NULL', deployments from all projects are returned. |
| year | Integer or 'NULL'. Deployment year. If 'NULL', deployments from all years are returned. |
Character vector of deployment IDs ordered by 'id'. Returns 'character(0)' when no deployments match.
## Not run:
con <- et_db_connect()
ids <- et_db_list_deployments(con, projectid = "ias", year = 2025L)
## End(Not run)
Performs batch INSERT into 'data.deployments'. Each deployment receives a UUID generated by 'uuid_generate_v4()' in PostgreSQL. Geographic coordinates are converted to a PostGIS point via 'ST_SetSRID(ST_MakePoint(lon, lat), 4326)'.
et_db_write_deployments(dbcon, df, batch_size = 250L)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object (from [et_db_connect()]). |
| df | Data frame with columns: 'year', 'code', 'locationname', 'elevation', 'latitude', 'longitude', 'trapserialno', 'trapstatus', 'traptype', 'comment', 'power', 'partnerid', 'projectid'. |
| batch_size | Integer. Rows per INSERT statement (default 250). |
Use [et_db_diff_deployments()] beforehand to avoid inserting duplicates.
Total number of rows inserted (invisibly).
## Not run:
df <- et_build_deployments("deployments.csv", "ias", partner_map = pmap)
new <- et_db_diff_deployments(con, df)
et_db_write_deployments(con, new)
## End(Not run)
Performs batch INSERT into 'data.sessions' with ON CONFLICT upsert on '(date, deploymentid, type)'. Returns the input data frame augmented with 'id' (UUID) from the database.
et_db_write_sessions(dbcon, df, batch_size = 250L, n_workers = 1L)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object (from [et_db_connect()]). Used for the sequential path and the final ID-fetch; parallel workers open their own connections. |
| df | Data frame with columns: 'date', 'deploymentid', 'type', 'starttime', 'endtime'. Typically the output of [et_build_sessions()]. |
| batch_size | Integer. Number of rows per INSERT statement. Default '250L'. |
| n_workers | Integer. Number of parallel mirai workers. Default '1L'. See [et_db_write_sourceimages()] for requirements. |
The input data frame with an 'id' column (character UUID) populated from the database.
## Not run:
con <- et_db_connect()
on.exit(DBI::dbDisconnect(con))
sessions <- et_build_sessions(grouped, "IT1#2023", type = "motion")
sessions_with_ids <- et_db_write_sessions(con, sessions)
## End(Not run)
Performs batch INSERT into 'data.sourceimages' with ON CONFLICT upsert on '(filename, deploymentid)'. Existing rows are updated with the new values. Uses parameterized queries via [DBI::dbExecute()].
et_db_write_sourceimages(dbcon, df, batch_size = 250L, n_workers = 1L)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object (from [et_db_connect()]). Used for the sequential path only; parallel workers open their own connections via env vars. |
| df | Data frame with columns: 'filename', 'deploymentid', 'url', 'timestamp', 'issnapshot', 'processed', 'queued'. |
| batch_size | Integer. Number of rows per INSERT statement. Default '250L'. |
| n_workers | Integer. Number of parallel mirai workers. Default '1L' (sequential). When '> 1', batches are dispatched via [mirai::mirai_map()] and each worker opens and closes its own DB connection. Requires the mirai package and an installed (not just loaded) erdatools. |
The total number of rows affected (invisibly).
## Not run:
con <- et_db_connect()
on.exit(DBI::dbDisconnect(con))
et_db_write_sourceimages(con, sourceimages_df)
et_db_write_sourceimages(con, sourceimages_df, n_workers = 4L)
## End(Not run)
Performs batch INSERT into 'data.tracks' with ON CONFLICT upsert on '(sessionid, edge_id)'. Typically used to write edge-processed tracking data from IoT AMI traps.
et_db_write_tracks(dbcon, df, batch_size = 250L)

| Argument | Description |
|---|---|
| dbcon | A DBI connection object (from [et_db_connect()]). |
| df | Data frame with columns: 'sessionid', 'edge_id', 'algorithm', 'algorithmversion'. Typically the output of [et_build_tracks()]. |
| batch_size | Integer. Number of rows per INSERT statement. Default '250L'. |
Total number of rows affected (invisibly).
## Not run:
con <- et_db_connect()
on.exit(DBI::dbDisconnect(con))
parsed <- et_parse_edge_tracks("results/tracks/20250508TR.csv")
tracks <- et_build_tracks(parsed, sessionid = "abc-123")
et_db_write_tracks(con, tracks)
## End(Not run)
Returns the vector of ExifTool tag names passed to [exifr::read_exif()]. When the environment variable 'EXIF_TAGS' is set (comma-separated tag names), it is used instead of the built-in defaults. '"FileName"' is always prepended so output parquet files retain their original name.
et_exif_tags()
If 'EXIF_TAGS' is set but produces no valid tag names after splitting and trimming (e.g. 'EXIF_TAGS=" , , "'), a warning is emitted and the built-in defaults are used.
Default tags (when 'EXIF_TAGS' is unset): '"FileName"', '"Model"', '"CameraSerialNumber"', '"XResolution"', '"YResolution"', '"ImageWidth"', '"ImageHeight"', '"UserComment"', '"Duration"', '"VideoFrameRate"'.
Character vector of tag names, always starting with '"FileName"'.
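The environment-variable handling described above might look roughly like the following. This is a sketch under assumptions: 'exif_tags_sketch' is a hypothetical name, the value is passed as an argument so the behaviour can be tried without touching the environment, and de-duplicating a user-supplied '"FileName"' is my own choice rather than documented behaviour.

```r
default_tags <- c(
  "FileName", "Model", "CameraSerialNumber", "XResolution", "YResolution",
  "ImageWidth", "ImageHeight", "UserComment", "Duration", "VideoFrameRate"
)

exif_tags_sketch <- function(env_value = Sys.getenv("EXIF_TAGS", unset = NA_character_)) {
  # Unset variable: fall back to the built-in defaults.
  if (is.na(env_value)) {
    return(default_tags)
  }
  # Split on commas, trim whitespace, and drop empty entries.
  tags <- trimws(strsplit(env_value, ",", fixed = TRUE)[[1]])
  tags <- tags[nzchar(tags)]
  if (length(tags) == 0L) {
    warning("EXIF_TAGS contains no valid tag names; using defaults")
    return(default_tags)
  }
  # "FileName" always comes first (assumption: duplicates are collapsed).
  unique(c("FileName", tags))
}

exif_tags_sketch("Model, ISO")
# "FileName" "Model" "ISO"
```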
Keeps filenames ending in '.jpg', '.jpeg', '.tif', '.tiff', '.mp4', '.mov', or '.avi' (all case-insensitive).
et_filter_media_files(filenames)

| Argument | Description |
|---|---|
| filenames | Character vector of file paths or names. |
Subset of 'filenames' matching a supported extension.
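A case-insensitive extension filter of this kind can be written with a single regular expression. The sketch below is illustrative only ('filter_media_sketch' is a hypothetical name) and covers the seven extensions listed above.

```r
filter_media_sketch <- function(filenames) {
  # jpe?g matches jpg/jpeg, tiff? matches tif/tiff; anchored at end of string.
  pattern <- "\\.(jpe?g|tiff?|mp4|mov|avi)$"
  filenames[grepl(pattern, filenames, ignore.case = TRUE)]
}

filter_media_sketch(c("a.JPG", "dir/b.tiff", "notes.txt", "clip.MOV", "jpg"))
# "a.JPG" "dir/b.tiff" "clip.MOV"
```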
Removes rows from a grouped image data frame where the 'timestamp' year falls outside '[year, year + 1]'. Deployments can span a year boundary (e.g. a 2025 deployment may have images from early 2026), so 'year + 1' is included.
et_filter_stray_images(grouped, year, log_path = NULL)

| Argument | Description |
|---|---|
| grouped | Data frame as returned by [et_group_motion_images()] or [et_group_snapshot_images()]. Must contain columns 'dir', 'name', and 'timestamp' (POSIXct). |
| year | Integer. The deployment year. |
| log_path | Character or 'NULL'. If non-'NULL', full ERDA paths of filtered images are written to this file (one per line). Parent directories are created if needed. |
Optionally writes the ERDA paths of removed images to a log file.
A named list of two data frames: 'kept', the rows whose timestamps are within range, and a second element holding the rows that were filtered out.
## Not run:
grouped <- et_group_motion_images(idx)
result <- et_filter_stray_images(grouped, year = 2025)
grouped <- result$kept
## End(Not run)
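The year-window rule can be sketched in a few lines of base R. In this illustration the element name 'removed' is hypothetical (only 'kept' appears in the example above), rows with missing timestamps are treated as stray by assumption, and logging is omitted.

```r
filter_stray_sketch <- function(grouped, year) {
  ts_year <- as.integer(format(grouped$timestamp, "%Y", tz = "UTC"))
  # Keep rows whose year is the deployment year or the following year.
  in_range <- !is.na(ts_year) & ts_year >= year & ts_year <= year + 1L
  list(
    kept = grouped[in_range, , drop = FALSE],
    removed = grouped[!in_range, , drop = FALSE]  # hypothetical element name
  )
}

grouped <- data.frame(
  name = c("a.jpg", "b.jpg", "c.jpg", "d.jpg"),
  timestamp = as.POSIXct(
    c("2025-07-01 23:10:00", "2026-01-03 02:00:00",
      "1970-01-01 00:00:10", "2027-02-01 01:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

result <- filter_stray_sketch(grouped, year = 2025L)
result$kept$name
# "a.jpg" "b.jpg"
```

A camera whose clock reset to the epoch (the 1970 row) is a typical source of such stray timestamps.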
Removes rows from a sourceimages data frame where the 'timestamp' year falls outside '[year, year + 1]'. This is the sourceimages-schema equivalent of [et_filter_stray_images()], operating on character timestamps and the 'filename' column instead of grouped data frames.
et_filter_stray_sourceimages(sourceimages, year, log_path = NULL)

| Argument | Description |
|---|---|
| sourceimages | Data frame as returned by [et_build_sourceimages()]. Must contain columns 'filename' and 'timestamp' (character, format '"YYYY-MM-DD HH:MM:SS"'). |
| year | Integer. The deployment year. |
| log_path | Character or 'NULL'. If non-'NULL', filenames of removed images are written to this file (one per line). Parent directories are created if needed. |
A named list of two data frames: 'kept', the rows whose timestamps are within range, and a second element holding the rows that were filtered out.
## Not run:
si <- et_build_sourceimages(idx, sharelink = "TOKEN", deploymentid = "IT1#2023", type = NULL)
result <- et_filter_stray_sourceimages(si, year = 2023)
si <- result$kept
## End(Not run)
Takes an ERDA index data frame (as returned by [et_index_filter()]) and groups motion images into **sessions** using noon-to-noon grouping.
et_group_motion_images(index)

| Argument | Description |
|---|---|
| index | Data frame with columns 'dir', 'name', 'is_dir', 'size', 'mtime' (as returned by [et_index_filter()]). |
A session spans from noon UTC on one day to noon UTC on the next. The 'session_key' column is the calendar date of the noon boundary, so an image at 22:00 on Aug 31 and another at 06:00 on Sep 1 both receive 'session_key = "2023_08_31"'. An image at 13:00 on Sep 1 starts a new session with key '"2023_09_01"'.
Only motion images (as identified by [et_parse_ami_filename()]) are retained.
The filtered data frame with three added columns: 'session_key' (character, session date as '"YYYY_MM_DD"'), 'timestamp' (POSIXct, UTC, parsed from the AMI filename), and a character column holding the trigger counter from the filename.
## Not run:
idx <- et_index_filter("motion_index.parquet", extensions = c("jpg", "jpeg"))
grouped <- et_group_motion_images(idx)
table(grouped$session_key)
## End(Not run)
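One simple way to obtain a noon-to-noon key is to shift each UTC timestamp back by twelve hours and take the calendar date of the result. The sketch below reproduces the worked example from the description; it is an illustration of the rule, not necessarily how the package computes it, and 'session_key_sketch' is a hypothetical name.

```r
session_key_sketch <- function(timestamp) {
  # Subtracting 12 hours maps [noon, next noon) onto a single calendar day.
  format(timestamp - 12 * 3600, "%Y_%m_%d", tz = "UTC")
}

ts <- as.POSIXct(
  c("2023-08-31 22:00:00", "2023-09-01 06:00:00", "2023-09-01 13:00:00"),
  tz = "UTC"
)
session_key_sketch(ts)
# "2023_08_31" "2023_08_31" "2023_09_01"
```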
Takes an ERDA index data frame (as returned by [et_index_filter()]) and groups snapshot images into **sessions** using noon-to-noon grouping.
et_group_snapshot_images(index)

| Argument | Description |
|---|---|
| index | Data frame with columns 'dir', 'name', 'is_dir', 'size', 'mtime' (as returned by [et_index_filter()]). |
A session spans from noon UTC on one day to noon UTC on the next. The 'session_key' column is the calendar date of the noon boundary, so an image at 22:00 on Aug 31 and another at 06:00 on Sep 1 both receive 'session_key = "2023_08_31"'. An image at 13:00 on Sep 1 starts a new session with key '"2023_09_01"'.
Only snapshot images (as identified by [et_parse_ami_filename()]) are retained.
The filtered data frame with two added columns: 'session_key' (character, session date as '"YYYY_MM_DD"') and 'timestamp' (POSIXct, UTC, parsed from the AMI filename).
## Not run:
idx <- et_index_filter("snapshot_index.parquet", extensions = c("jpg", "jpeg"))
grouped <- et_group_snapshot_images(idx)
table(grouped$session_key)
## End(Not run)
Builds the HTTPS URL used to access a single image file through ERDA's anonymous share-redirect endpoint. This is a pure string operation and requires no network connection. Slashes between path components are normalised automatically.
et_img_url(
  sharelink,
  remote_dir,
  country,
  folder,
  filename,
  http_base = "https://anon.erda.au.dk/share_redirect"
)

| Argument | Description |
|---|---|
| sharelink | ERDA sharelink token. |
| remote_dir | Remote root directory, i.e. the path relative to the sharelink root (e.g. '"/storage/onestop/2025/"'). Leading and trailing slashes are stripped before joining. |
| country | Country subdirectory name (e.g. '"portugal"'). |
| folder | Dated subfolder name (e.g. '"2025_07_10-11_10"'). |
| filename | Image file name (e.g. '"img.jpg"'). |
| http_base | Base URL of the ERDA share-redirect endpoint. Defaults to '"https://anon.erda.au.dk/share_redirect"'. |
A length-1 character vector containing the full URL.
et_img_url(
  sharelink = "TOKEN",
  remote_dir = "/storage/2025/",
  country = "portugal",
  folder = "2025_07_10-11_10",
  filename = "img.jpg"
)
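The slash normalisation can be illustrated with a small string-joining helper. This sketch assumes the components are joined in the order base, sharelink, remote directory, country, folder, filename; that ordering is an assumption and is not confirmed by the documentation. 'img_url_sketch' is a hypothetical name.

```r
img_url_sketch <- function(sharelink, remote_dir, country, folder, filename,
                           http_base = "https://anon.erda.au.dk/share_redirect") {
  parts <- c(sharelink, remote_dir, country, folder, filename)
  # Strip leading and trailing slashes so each join contributes exactly one "/".
  parts <- gsub("^/+|/+$", "", parts)
  parts <- parts[nzchar(parts)]
  paste(c(sub("/+$", "", http_base), parts), collapse = "/")
}

img_url_sketch(
  sharelink = "TOKEN",
  remote_dir = "/storage/2025/",
  country = "portugal",
  folder = "2025_07_10-11_10",
  filename = "img.jpg"
)
# "https://anon.erda.au.dk/share_redirect/TOKEN/storage/2025/portugal/2025_07_10-11_10/img.jpg"
```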
Traverses the directory tree rooted at 'remote_path' using BFS, calling 'ls -la' on each directory via [et_sftp_batch()]. Results are stored in a parquet file at 'index_path' with columns 'dir', 'name', 'is_dir', 'size' (integer, file size in bytes; 'NA' for directories), and 'mtime' (character, last-modified timestamp from 'ls -la', e.g. '"Jul 10 11:10"' or '"Jan 5 2024"').
et_index_dir(
  conn,
  remote_path,
  index_path,
  checkpoint_every = 25L,
  n_workers = 1L
)

| Argument | Description |
|---|---|
| conn | Connection object returned by [et_sftp_connect()]. |
| remote_path | Remote root path to index. |
| index_path | Local path for the output parquet file. |
| checkpoint_every | Integer. Rewrite the parquet every this many directories. Default '25L'. |
| n_workers | Integer. Number of parallel mirai workers for wave-based BFS. Default '1L' (sequential). When '> 1', each wave of directories is listed concurrently; all workers share the parent ControlMaster socket. Requires the mirai package and an installed erdatools. |
**Resume**: if 'index_path' already exists, the index is loaded and any directory already present in 'unique(index$dir)' is skipped. Previously indexed directories are used to reconstruct the BFS queue without additional SFTP calls, so resuming after an interrupted run is fast.
**Checkpointing**: the parquet is rewritten every 'checkpoint_every' directories. At most 'checkpoint_every' directories of work can be lost if the connection drops between checkpoints.
The full index data frame (invisibly).
## Not run:
conn <- et_sftp_connect(Sys.getenv("ERDA_SHARELINK"))
on.exit(conn$disconnect(), add = TRUE, after = FALSE)
idx <- et_index_dir(conn, "/storage/2025", "./erda_index.parquet")
idx <- et_index_dir(conn, "/storage/2025", "./erda_index.parquet", n_workers = 4L)
## End(Not run)
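The breadth-first traversal itself can be sketched against an in-memory tree. Everything here is a stand-in: 'tree' and 'list_dir' replace the remote 'ls -la' call, and checkpointing, resume, 'size', and 'mtime' are left out to keep the queue logic visible.

```r
# Hypothetical directory tree standing in for the remote file system.
tree <- list(
  "/root"     = list(dirs = c("a", "b"), files = "readme.txt"),
  "/root/a"   = list(dirs = "c", files = c("1.jpg", "2.jpg")),
  "/root/b"   = list(dirs = character(0), files = character(0)),
  "/root/a/c" = list(dirs = character(0), files = "3.jpg")
)
list_dir <- function(path) tree[[path]]

bfs_index_sketch <- function(root, list_dir) {
  queue <- root
  rows <- list()
  while (length(queue) > 0L) {
    # Take the oldest queued directory first (FIFO gives breadth-first order).
    dir <- queue[1]
    queue <- queue[-1]
    entry <- list_dir(dir)
    entries <- c(entry$dirs, entry$files)
    if (length(entries) > 0L) {
      rows[[length(rows) + 1L]] <- data.frame(
        dir = dir,
        name = entries,
        is_dir = rep(c(TRUE, FALSE), times = c(length(entry$dirs), length(entry$files))),
        stringsAsFactors = FALSE
      )
    }
    # Newly discovered subdirectories go to the back of the queue.
    if (length(entry$dirs) > 0L) {
      queue <- c(queue, paste(dir, entry$dirs, sep = "/"))
    }
  }
  do.call(rbind, rows)
}

idx <- bfs_index_sketch("/root", list_dir)
print(idx)
```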
Removes directory rows from the index and optionally filters by file extension. Returns a filtered data frame with the same schema as the input.
et_index_filter(index, extensions = NULL)

| Argument | Description |
|---|---|
| index | A data frame as returned by [et_index_dir()], or a character string of length 1 giving the path to a parquet file. |
| extensions | Character vector of file extensions to keep, e.g. 'c("jpg", "jpeg")' or 'c(".tif")'. A leading dot is added automatically if missing. Matching is case-insensitive. 'NULL' (default) keeps all files. |
Data frame with the same columns as the input index ('dir', 'name', 'is_dir', 'size', 'mtime'), filtered to files only.
## Not run:
# From a parquet file
filtered <- et_index_filter("erda_index.parquet", extensions = c("jpg", "jpeg"))
# From an in-memory data frame
idx <- et_index_dir(conn, "/storage/2025", "erda_index.parquet")
filtered <- et_index_filter(idx, extensions = "jpg")
# All files (no extension filter)
all_files <- et_index_filter("erda_index.parquet")
## End(Not run)
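The extension handling (optional leading dot, case-insensitive match) can be sketched on an in-memory index as follows. 'index_filter_sketch' is a hypothetical helper and does not read parquet files.

```r
index_filter_sketch <- function(index, extensions = NULL) {
  # Drop directory rows first.
  files <- index[!index$is_dir, , drop = FALSE]
  if (is.null(extensions)) {
    return(files)
  }
  # Add a leading dot where missing and compare in lower case.
  ext <- tolower(ifelse(startsWith(extensions, "."), extensions, paste0(".", extensions)))
  keep <- rep(FALSE, nrow(files))
  for (e in ext) {
    keep <- keep | endsWith(tolower(files$name), e)
  }
  files[keep, , drop = FALSE]
}

index <- data.frame(
  dir = "/storage/2025",
  name = c("sub", "a.JPG", "b.jpeg", "c.tif", "notes.txt"),
  is_dir = c(TRUE, FALSE, FALSE, FALSE, FALSE),
  size = c(NA, 100L, 200L, 300L, 10L),
  mtime = "Jul 10 11:10",
  stringsAsFactors = FALSE
)

index_filter_sketch(index, extensions = c("jpg", ".jpeg"))$name
# "a.JPG" "b.jpeg"
```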
Re-runs 'ls -la' on every directory in the existing index, diffs the listings against the stored entries, and returns a named list describing the changes. Newly discovered subdirectories are recursively indexed via [et_index_dir()].
et_index_update(conn, remote_path, index_path, update = TRUE)
conn |
Connection object returned by [et_sftp_connect()]. |
remote_path |
Remote root path (used when recursively indexing new subdirectories). |
index_path |
Local path to the existing parquet index file. |
update |
Logical. If 'TRUE' (default), the index is updated in place with the detected changes. |
Named list 'list(added = <data.frame>, removed = <data.frame>)' (invisibly). Each data frame has columns 'dir', 'name', 'is_dir', 'size' (integer, bytes), 'mtime' (character timestamp).
## Not run:
conn <- et_sftp_connect(Sys.getenv("ERDA_SHARELINK"))
on.exit(conn$disconnect(), add = TRUE, after = FALSE)
delta <- et_index_update(conn, "/storage/2025", "./erda_index.parquet")
## End(Not run)
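The shape of the returned `added`/`removed` list can be illustrated with a small base-R diff. This is only a sketch under the assumption that entries are compared on the (`dir`, `name`) pair; the helper `rows_not_in()` and the toy data are hypothetical, and the package may compare additional columns.

```r
# Hypothetical helper: rows of `a` whose (dir, name) key is absent from `b`.
rows_not_in <- function(a, b) {
  key <- function(d) paste(d$dir, d$name, sep = "/")
  a[!(key(a) %in% key(b)), , drop = FALSE]
}

# `stored` stands in for the existing index, `fresh` for a new listing.
stored <- data.frame(
  dir = "/storage/2025", name = c("a.jpg", "b.jpg"),
  stringsAsFactors = FALSE
)
fresh <- data.frame(
  dir = "/storage/2025", name = c("b.jpg", "c.jpg"),
  stringsAsFactors = FALSE
)

delta <- list(
  added = rows_not_in(fresh, stored),
  removed = rows_not_in(stored, fresh)
)
```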
Creates a job in the in-memory job store with status '"queued"'.
et_job_create(type, params = list())
type |
Character. Job type (e.g. '"index_full"', '"index_update"', '"sourceimages_write"'). |
params |
Named list of job parameters. |
Character string: the job ID.
id <- et_job_create("index_full", list(remote_path = "/storage/2025"))
et_job_get(id)
If the job has a background process handle, its status is synced from the process state before returning.
et_job_get(id)
id |
Character. Job ID. |
Named list with job details, or 'NULL' if not found.
id <- et_job_create("test", list())
et_job_get(id)
Syncs status from background process handles before returning.
et_job_list()
A data frame with columns: 'id', 'type', 'status', 'created_at', 'updated_at'. Returns an empty data frame if no jobs exist.
et_job_list()
Stores a [mirai::mirai()] handle on the job and sets the status to '"running"'. The handle is polled by [et_job_get()] and [et_job_list()] to derive the current status.
et_job_set_process(id, process)
id |
Character. Job ID. |
process |
A 'mirai' object (from [mirai::mirai()]). |
The updated job list (invisibly).
Update a job's status
et_job_update(id, status, result = NULL, error = NULL)
id |
Character. Job ID. |
status |
Character. New status: '"queued"', '"running"', '"completed"', or '"failed"'. |
result |
Optional result object to attach. |
error |
Optional error message to attach. |
The updated job list (invisibly).
id <- et_job_create("test", list())
et_job_update(id, "running")
et_job_update(id, "completed", result = list(rows = 100))
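To make the job lifecycle concrete, here is a minimal sketch of an in-memory job store built on an R environment. It is an illustration only: the `*_sketch` functions, the `job_store` environment, and the ID scheme are hypothetical and are not taken from erdatools, which may store jobs differently.

```r
# Hypothetical in-memory store: one list per job, keyed by job ID.
job_store <- new.env(parent = emptyenv())

job_create_sketch <- function(type, params = list()) {
  # Assumed ID scheme for illustration only.
  id <- sprintf("job-%04d", length(ls(job_store)) + 1L)
  now <- Sys.time()
  assign(
    id,
    list(id = id, type = type, params = params, status = "queued",
         created_at = now, updated_at = now),
    envir = job_store
  )
  id
}

job_update_sketch <- function(id, status, result = NULL, error = NULL) {
  # Restrict status to the four documented values.
  status <- match.arg(status, c("queued", "running", "completed", "failed"))
  job <- get(id, envir = job_store)
  job$status <- status
  job$result <- result
  job$error <- error
  job$updated_at <- Sys.time()
  assign(id, job, envir = job_store)
  invisible(job)
}

job_get_sketch <- function(id) {
  if (exists(id, envir = job_store, inherits = FALSE)) {
    get(id, envir = job_store)
  } else {
    NULL
  }
}

id <- job_create_sketch("index_full", list(remote_path = "/storage/2025"))
initial_status <- job_get_sketch(id)$status
job_update_sketch(id, "running")
job_update_sketch(id, "completed", result = list(rows = 100))
final_job <- job_get_sketch(id)
```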
Given a character vector of folder names and a local parquet directory, returns a named logical vector indicating which folders already have a corresponding '.parquet' file. This is a pure function: it performs no network I/O.
et_parquet_folders_done(folders, parquet_dir)
folders |
Character vector of folder names. If full paths are supplied, [base::basename()] is applied before looking up the parquet file. |
parquet_dir |
Path to the local parquet directory. |
Named logical vector (names equal to 'folders'). 'TRUE' means a '.parquet' file already exists for that folder.
tmp <- tempdir()
writeLines("", fs::path(tmp, "2025_07_10-11_10.parquet"))
et_parquet_folders_done(c("2025_07_10-11_10", "2025_07_11-08_00"), tmp)
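The lookup described above amounts to a `basename()` plus a file-existence check. A base-R sketch follows, with `folders_done_sketch()` as a hypothetical stand-in rather than the package function:

```r
# Hypothetical helper mirroring the documented behaviour: basename() is
# applied to each folder, then "<name>.parquet" is looked up locally.
folders_done_sketch <- function(folders, parquet_dir) {
  files <- file.path(parquet_dir, paste0(basename(folders), ".parquet"))
  stats::setNames(file.exists(files), folders)
}

# Set up a scratch directory containing one finished folder.
tmp <- file.path(tempdir(), "parquet_sketch")
dir.create(tmp, showWarnings = FALSE)
file.create(file.path(tmp, "2025_07_10-11_10.parquet"))

folders <- c("/storage/2025/portugal/2025_07_10-11_10", "2025_07_11-08_00")
done <- folders_done_sketch(folders, tmp)
```

The result is named by the original inputs, so a full remote path can be used directly to subset the folders that still need processing, e.g. `folders[!done]`.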
AMI traps produce five filename formats:
et_parse_ami_filename(filenames)
filenames |
Character vector of filenames (with or without extension). |
- **Motion (standalone)**: '<YYYYMMDDHHmmss>-<sequence>-<trigger>.ext' (e.g. '20230831124747-00-01.jpg')
- **Motion (IoT)**: '<YYYYMMDDHHmmss>.ext' (e.g. '20250508104842.jpg'): bare 14-digit timestamp, no sequence or trigger counter
- **Snapshot (with camera id)**: '<camera_id>-<YYYYMMDDHHmmss>-<type>.ext' (e.g. '01-20230831225959-snapshot.jpg')
- **Snapshot (no camera id)**: '<YYYYMMDDHHmmss>-<type>.ext' (e.g. '20250527110000-snapshot.jpg')
- **Mothbox**: '<trap_name>_<YYYY>_<MM>_<DD>__<HH>_<MM>_<SS>_<HDR>.ext' (e.g. 'wryNinox_2026_05_29__23_02_29_HDR0.jpg'): scheduled captures with underscore-delimited date and time separated by a double underscore
This function extracts the trigger counter (or camera ID for snapshots), timestamp, and image type from each filename. All formats are detected automatically. Filenames that match none of the patterns produce 'NA' values for all columns.
A data frame with columns 'trigger' (character: trigger counter for motion images, camera ID for snapshot images that include one, or 'NA'), 'timestamp' (character, 'YYYY-MM-DD HH:MM:SS'), 'type' (character, typically '"snapshot"' or '"motion"'), and 'sequence' (character, motion sequence number or 'NA' for snapshots). One row per filename.
et_parse_ami_filename(c(
  "01-20230831225959-snapshot.jpg",
  "20250527110000-snapshot.jpg",
  "20230831124747-00-01.jpg",
  "20250508104842.jpg",
  "wryNinox_2026_05_29__23_02_29_HDR0.jpg"
))
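As an illustration of how one of these formats can be recognised, the sketch below handles only the standalone motion pattern ('<YYYYMMDDHHmmss>-<sequence>-<trigger>.ext') with a base-R regular expression. `parse_motion_sketch()` and its regex are assumptions made for this example; the package detects all five formats and may use different patterns.

```r
# Hypothetical helper covering the standalone motion format only.
parse_motion_sketch <- function(filenames) {
  pattern <- "^([0-9]{14})-([0-9]+)-([0-9]+)(\\.[^.]+)?$"
  matches <- regmatches(filenames, regexec(pattern, filenames))
  rows <- lapply(matches, function(x) {
    if (length(x) == 0) {
      # No match: every column is NA, as the documentation describes.
      return(data.frame(
        trigger = NA_character_, timestamp = NA_character_,
        sequence = NA_character_, stringsAsFactors = FALSE
      ))
    }
    ts <- as.POSIXct(x[2], format = "%Y%m%d%H%M%S", tz = "UTC")
    data.frame(
      trigger = x[4],
      timestamp = format(ts, "%Y-%m-%d %H:%M:%S"),
      sequence = x[3],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

parsed <- parse_motion_sketch(c("20230831124747-00-01.jpg", "bogus.jpg"))
```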
CamAlien filenames encode metadata as underscore-delimited key-value pairs (e.g. 'CT_1234_GT_5678_lon_12.34_lat_56.78_alt_120.5_gain_1.4.jpg'). This function strips the file extension and splits on '_', treating odd-positioned tokens as keys and even-positioned tokens as values.
et_parse_camalien_filename(filenames)
filenames |
Character vector of filenames (with or without extension). |
If the number of tokens is odd, the last value is padded with 'NA_character_' so that key-value pairing still works.
Known numeric columns ('alt', 'vel', 'expt', 'gain') are coerced to 'double'. Coordinates ('lon', 'lat') and timestamps ('CT', 'GT') are kept as 'character' to preserve their original string representation. All other columns are 'character'.
**Note:** Values containing underscores will cause incorrect key-value pairing, since '_' is also the key-value and pair delimiter.
A [tibble::tibble()] with one row per filename and one column per key found in the filenames. Columns not present in a given filename are filled with 'NA'. Column types: 'alt', 'vel', 'expt', 'gain' are 'double'; all others are 'character'.
et_parse_camalien_filename(c(
  "CT_1234_GT_5678_lon_12.34_lat_56.78_alt_120.5_vel_0.0_expt_0.01_gain_1.4.jpg",
  "CT_9999_GT_0000_lon_8.10_lat_55.40_alt_80.0_vel_2.3_expt_0.02_gain_2.0.jpg"
))
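The key-value pairing described above can be sketched for a single filename in base R. `parse_kv_sketch()` is a hypothetical helper, and the list of recognised image extensions in it is an assumption made so that a trailing value such as '1.4' is not mistaken for a file extension.

```r
# Hypothetical helper: split one CamAlien-style filename into key-value pairs.
parse_kv_sketch <- function(filename) {
  # Strip only a known image extension (assumed list).
  stem <- sub("\\.(jpe?g|png|tiff?)$", "", basename(filename), ignore.case = TRUE)
  tokens <- strsplit(stem, "_", fixed = TRUE)[[1]]
  # Pad an odd token count so the last key still gets a (missing) value.
  if (length(tokens) %% 2 == 1) {
    tokens <- c(tokens, NA_character_)
  }
  keys <- tokens[c(TRUE, FALSE)]    # odd positions
  values <- tokens[c(FALSE, TRUE)]  # even positions
  stats::setNames(as.list(values), keys)
}

kv <- parse_kv_sketch("CT_1234_GT_5678_lon_12.34_lat_56.78_alt_120.5_gain_1.4.jpg")
kv$alt <- as.numeric(kv$alt)   # numeric column, coerced to double
kv$gain <- as.numeric(kv$gain)

odd <- parse_kv_sketch("CT_1234_GT")  # odd token count: GT has no value
```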
Reads a detection CSV produced by IoT AMI trap edge processing (e.g. 'results/detect/YYYYMMDD.csv'). Parses timestamps and coerces column types.
et_parse_edge_detections(path)
path |
Character. Path to the detection CSV file. |
A [tibble::tibble()] with columns:
Integer. Capture year.
Character. Trap identifier.
POSIXct (UTC). Capture time.
Double. Object detection confidence.
Integer. Detection ID within image.
Double. Bounding box coordinates.
Character. Source image filename.
Character. Taxonomic order label.
Integer. Order ID.
Double. Order classification confidence.
Logical. Whether confidence exceeds threshold.
Integer. Detection key (links to track details).
Character. Species label.
Integer. Species ID.
Double. Species classification confidence.
## Not run:
detections <- et_parse_edge_detections("results/detect/20250508.csv")
## End(Not run)
Reads a track summary CSV produced by IoT AMI trap edge processing (e.g. 'results/tracks/YYYYMMDDTR.csv'). Parses timestamps and coerces column types.
et_parse_edge_tracks(path)
path |
Character. Path to the track summary CSV file. |
A [tibble::tibble()] with columns:
Integer. Track ID from edge processor.
POSIXct (UTC). Start of track.
POSIXct (UTC). End of track.
Integer. Duration in seconds.
Character. Species label (majority vote).
Integer. Number of frames in track.
Double. Classification consistency in percent.
Double. Mean bounding box area in pixels squared.
Double. Total pixel displacement.
## Not run:
tracks <- et_parse_edge_tracks("results/tracks/20250508TR.csv")
## End(Not run)
Finds track summary CSVs in an ERDA index, downloads each via SFTP, parses them with [et_parse_edge_tracks()], maps tracks to motion sessions using the noon-to-noon date boundary, and writes to the database via [et_db_write_tracks()].
et_process_edge_tracks(conn, dbcon, idx_raw, sessions)
conn |
SFTP connection object from [et_sftp_connect()]. |
dbcon |
A DBI connection object. |
idx_raw |
Data frame. Raw ERDA index (as returned by [et_index_dir()]). |
sessions |
Data frame. Sessions with 'id', 'date', and 'type' columns (as returned by [et_db_write_sessions()]). |
Integer. Total number of track rows written.
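One way to read the noon-to-noon boundary is that every timestamp between 12:00 on day D and 12:00 on day D+1 belongs to the session dated D. The sketch below works under that assumption, which the documentation does not spell out; `session_date_sketch()` is a hypothetical helper and the package's own mapping may differ.

```r
# Hypothetical helper: shift back 12 hours, then take the calendar date,
# so that a night spanning midnight maps to a single session date.
session_date_sketch <- function(timestamps) {
  as.Date(timestamps - 12 * 3600, tz = "UTC")
}

track_start <- as.POSIXct(
  c("2025-05-08 23:30:00", "2025-05-09 02:15:00", "2025-05-09 13:00:00"),
  tz = "UTC"
)
session_dates <- session_date_sketch(track_start)
```

Under this assumption the first two tracks, recorded either side of midnight, fall in the same session, and the third starts a new one.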
Executes one SFTP batch session over the existing ControlMaster connection returned by [et_sftp_connect()], returning standard output as a character vector.
et_sftp_batch(conn, commands, stdout = TRUE, stderr = FALSE)
conn |
Connection object returned by [et_sftp_connect()]. |
commands |
Character vector of SFTP commands, passed to the session via 'stdin'. |
stdout |
Passed to [base::system2()]. 'TRUE' (default) captures output as a character vector. |
stderr |
Passed to [base::system2()]. 'FALSE' (default) discards stderr. |
Character vector of stdout lines (when 'stdout = TRUE'), or the integer exit code (when 'stdout = FALSE').
## Not run:
conn <- et_sftp_connect(Sys.getenv("ERDA_SHARELINK"))
on.exit(conn$disconnect(), add = TRUE, after = FALSE)
lines <- et_sftp_batch(conn, paste0("ls -1 /storage/2025/"))
## End(Not run)
Establishes a persistent SSH connection to ERDA using an SSH ControlMaster socket. Authentication uses 'SSH_ASKPASS' — no 'sshpass' is required. The function waits up to 5 seconds for the socket to appear and calls [cli::cli_abort()] on timeout.
et_sftp_connect(
  sharelink,
  host = "io.erda.au.dk",
  port = "2222",
  tmp_root = fs::path(tempdir(), "sftp_imgs")
)
sharelink |
ERDA sharelink token, used as both the SSH user name and password. |
host |
ERDA SFTP hostname. Defaults to '"io.erda.au.dk"'. |
port |
ERDA SFTP port. Defaults to '"2222"'. |
tmp_root |
Local directory used for the SSH control socket and temporary downloads. Created if it does not exist. Defaults to a subdirectory of [base::tempdir()]. |
The returned connection object contains a '$disconnect()' closure that kills any tracked child SFTP processes, closes the ControlMaster, and removes the temporary directory. Register it in the calling frame with:
```r
conn <- et_sftp_connect(sharelink, host, port)
on.exit(conn$disconnect(), add = TRUE, after = FALSE)
```
A named list (connection object) with elements:
Character vector of SSH '-o' flags for multiplexing.
String '"<sharelink>@<host>"'.
Path to the temporary directory.
The SFTP port string.
'function(pid)' — register a background SFTP PID for cleanup on disconnect.
'function(pids)' — deregister finished PIDs.
Parameterless function — kill tracked children, close the ControlMaster, and remove 'tmp_root'.
## Not run:
conn <- et_sftp_connect(
  sharelink = Sys.getenv("ERDA_SHARELINK"),
  host = Sys.getenv("ERDA_SFTP_HOST", unset = "io.erda.au.dk"),
  port = Sys.getenv("ERDA_SFTP_PORT", unset = "2222")
)
on.exit(conn$disconnect(), add = TRUE, after = FALSE)
## End(Not run)
Calls [et_sftp_list_folders()] and filters results to entries whose [base::basename()] matches the 'YYYY_MM_DD-HH_MM' pattern used by CamAlien onboard systems.
et_sftp_list_dated_subfolders(conn, country_path)
conn |
Connection object returned by [et_sftp_connect()]. |
country_path |
Remote SFTP path to the country folder. |
Character vector of matching folder names (or full paths, depending on what the SFTP server returns).
## Not run:
conn <- et_sftp_connect(Sys.getenv("ERDA_SHARELINK"))
on.exit(conn$disconnect(), add = TRUE, after = FALSE)
subfolders <- et_sftp_list_dated_subfolders(conn, "/storage/2025/portugal")
## End(Not run)
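The 'YYYY_MM_DD-HH_MM' filter can be expressed as a regular expression applied to the basename of each entry. The pattern and the sample listing below are assumptions for illustration; the package's exact regex is not shown in this documentation.

```r
# Assumed pattern for 'YYYY_MM_DD-HH_MM'; matched against basename() only.
dated_pattern <- "^[0-9]{4}_[0-9]{2}_[0-9]{2}-[0-9]{2}_[0-9]{2}$"

# Hypothetical listing mixing full paths, bare names, and non-matching entries.
entries <- c(
  "/storage/2025/portugal/2025_07_10-11_10",
  "logs",
  "2025_07_11-08_00",
  "2025-07-11"
)

dated <- entries[grepl(dated_pattern, basename(entries))]
```

Entries are returned as they were listed, which is consistent with the function returning either bare names or full paths depending on the server output.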
Runs 'ls -1 <remote_path>' over the SFTP connection and returns the non-blank, non-prompt lines.
et_sftp_list_folders(conn, remote_path)
conn |
Connection object returned by [et_sftp_connect()]. |
remote_path |
Remote SFTP path to list. |
Character vector of folder/file names returned by the server.
## Not run:
conn <- et_sftp_connect(Sys.getenv("ERDA_SHARELINK"))
on.exit(conn$disconnect(), add = TRUE, after = FALSE)
folders <- et_sftp_list_folders(conn, "/storage/2025/portugal")
## End(Not run)
Each element of 'user_comment' is expected to be a JSON string of the form:
```json
{"gain": "1.4", "detections": ["1359658 Morus alba L. 9.97668 8.829123", ...]}
```
Each detection string has the layout: '<plantnet_id> <species tokens...> <probability> <score>'
parse_detections(user_comment)
user_comment |
Character vector of UserComment EXIF values (one per image row). |
The function is designed to be used with [tidyr::unnest()] on a list column. NA, empty, or malformed entries yield a zero-row data frame.
A list of [tibble::tibble()]s, one per element of 'user_comment', each with columns 'plantnet_id' (integer), 'species' (character), 'probability' (double), 'score' (double).
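The detection-string layout ('<plantnet_id> <species tokens...> <probability> <score>') can be split positionally: the first token is the ID, the last two are numbers, and everything in between is the species name. The base-R sketch below handles a single detection string only; `parse_detection_string_sketch()` is hypothetical and skips the JSON decoding step that the package performs.

```r
# Hypothetical helper: parse one detection string by token position.
parse_detection_string_sketch <- function(x) {
  tokens <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  n <- length(tokens)
  # Need at least an ID, one species token, a probability, and a score.
  stopifnot(n >= 4)
  data.frame(
    plantnet_id = as.integer(tokens[1]),
    species = paste(tokens[2:(n - 2)], collapse = " "),
    probability = as.numeric(tokens[n - 1]),
    score = as.numeric(tokens[n]),
    stringsAsFactors = FALSE
  )
}

det <- parse_detection_string_sketch("1359658 Morus alba L. 9.97668 8.829123")
```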
Strips 'sftp>' prompt lines, blank lines, and '.'/'..' entries from the raw character vector returned by [et_sftp_batch()] when running 'ls -la <path>'. Parses the permission string to determine entry type.
parse_sftp_ls_la(raw_lines)
raw_lines |
Character vector of raw SFTP stdout lines. |
Each line must have at least 9 whitespace-separated fields: '<perms> <links> <owner> <group> <size> <month> <day> <time> <name...>'. Lines with fewer fields are silently dropped.
For symlinks ('lrwxrwxrwx'), the ' -> target' suffix is stripped from the name and the entry is treated as a regular file ('is_dir = FALSE').
Data frame with columns 'name' (character), 'is_dir' (logical), 'size' (integer – file size in bytes, 'NA' for directories), 'mtime' (character – last-modified timestamp from 'ls -la', fields 6-8 joined with a space, e.g. '"Jul 10 11:10"' or '"Jan 5 2024"').
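The field layout above can be illustrated with a base-R sketch that parses a single 'ls -la' line. `parse_ls_line_sketch()` is a hypothetical helper, not the package function; as a simplification it rejoins the name with single spaces, so runs of whitespace inside a filename are not preserved.

```r
# Hypothetical helper: parse one line of 'ls -la' output.
parse_ls_line_sketch <- function(line) {
  fields <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  # Fewer than 9 fields: drop the line, as documented.
  if (length(fields) < 9) {
    return(NULL)
  }
  perms <- fields[1]
  name <- paste(fields[9:length(fields)], collapse = " ")
  # Symlinks: strip the " -> target" suffix and treat as a regular file.
  if (startsWith(perms, "l")) {
    name <- sub(" -> .*$", "", name)
  }
  is_dir <- startsWith(perms, "d")
  data.frame(
    name = name,
    is_dir = is_dir,
    size = if (is_dir) NA_integer_ else as.integer(fields[5]),
    mtime = paste(fields[6:8], collapse = " "),
    stringsAsFactors = FALSE
  )
}

file_row <- parse_ls_line_sketch("-rw-r--r--    1 user group     2048 Jul 10 11:10 my image.jpg")
dir_row <- parse_ls_line_sketch("drwxr-xr-x    2 user group     4096 Jan 5 2024 portugal")
link_row <- parse_ls_line_sketch("lrwxrwxrwx 1 user group 12 Jan 5 2024 link -> target")
short_row <- parse_ls_line_sketch("sftp> ls -la /storage")
```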