Accessing Astrato query metadata in Databricks
In Databricks, queries executed by Astrato are logged in the system.query.history table. You can use Databricks' regexp_extract function to isolate the JSON string from the statement_text, and then use the : operator (or from_json) to parse out specific fields.
Example Databricks SQL Query:
SQL
WITH RawAstratoQueries AS (
SELECT
statement_text,
execution_time_ms,
total_task_duration_ms,
spilled_local_bytes,
-- 1. Extract the JSON payload from the first line of the query comment
regexp_extract(statement_text, '^-- (\\{.*?\\})', 1) AS astrato_metadata_json
FROM system.query.history
-- Filter for queries that start with an Astrato metadata comment
WHERE statement_text LIKE '-- {"type":%'
AND start_time > current_date() - 7
)
SELECT
-- 2. Parse out specific fields using Databricks JSON extraction
astrato_metadata_json:type::string AS astrato_query_type,
astrato_metadata_json:workbookId::string AS astrato_workbook_id,
astrato_metadata_json:userId::string AS astrato_user_id,
astrato_metadata_json:workbookState::string AS workbook_state,
-- Include Databricks performance metrics for correlation
execution_time_ms,
total_task_duration_ms,
spilled_local_bytes,
statement_text
FROM RawAstratoQueries
ORDER BY execution_time_ms DESC;
Using this query, Databricks administrators can easily build out Databricks SQL dashboards monitoring Astrato cost and performance, sorted by the heaviest workloads. We recommend persisting this to improve performance.
