Performance & Usage Telemetry: Store Astrato activity in Databricks

Accessing Astrato query metadata in Databricks

In Databricks, queries executed by Astrato are logged in the system.query.history table. You can use Databricks' regexp_extract function to isolate the JSON string from the statement_text, and then use the : operator (or from_json) to parse out specific fields.

Example Databricks SQL Query:

SQL

WITH RawAstratoQueries AS (
  SELECT 
    statement_text,
    execution_time_ms,
    total_task_duration_ms,
    spilled_local_bytes,
    -- 1. Extract the JSON payload from the first line of the query comment
    regexp_extract(statement_text, ''-- (\\{.*\\})'', 1) AS astrato_metadata_json
  FROM system.query.history
  -- Filter for queries that start with an Astrato metadata comment
  WHERE statement_text LIKE '-- {"type":%' 
    AND start_time > current_date() - 7
)
SELECT 
  -- 2. Parse out specific fields using Databricks JSON extraction
  astrato_metadata_json:type::string AS astrato_query_type,
  astrato_metadata_json:workbookId::string AS astrato_workbook_id,
  astrato_metadata_json:userId::string AS astrato_user_id,
  astrato_metadata_json:workbookState::string AS workbook_state,
  
  -- Include Databricks performance metrics for correlation
  execution_time_ms,
  total_task_duration_ms,
  spilled_local_bytes,
  statement_text
FROM RawAstratoQueries
ORDER BY execution_time_ms DESC;

Using this query, Databricks administrators can easily build out Databricks SQL dashboards monitoring Astrato cost and performance, sorted by the heaviest workloads. We recommend persisting this to improve performance.