Skip to main content

Performance & Usage Telemetry: Store Astrato activity in Databricks

This solution leverages Databricks query logs to create a persistent query history dataset, enabling teams to analyse performance, cost, and analytics usage at the level of users, dashboards, and business context.

Accessing Astrato query metadata in Databricks

In Databricks, queries executed by Astrato are logged in the system.query.history table. You can use Databricks' regexp_extract function to isolate the JSON string from the statement_text, and then use the : operator (or from_json) to parse out specific fields.

Example Databricks SQL Query:

SQL

WITH RawAstratoQueries AS (
SELECT
statement_text,
execution_time_ms,
total_task_duration_ms,
spilled_local_bytes,
-- 1. Extract the JSON payload from the first line of the query comment
regexp_extract(statement_text, '^-- (\\{.*?\\})', 1) AS astrato_metadata_json
FROM system.query.history
-- Filter for queries that start with an Astrato metadata comment
WHERE statement_text LIKE '-- {"type":%'
AND start_time > current_date() - 7
)
SELECT
-- 2. Parse out specific fields using Databricks JSON extraction
astrato_metadata_json:type::string AS astrato_query_type,
astrato_metadata_json:workbookId::string AS astrato_workbook_id,
astrato_metadata_json:userId::string AS astrato_user_id,
astrato_metadata_json:workbookState::string AS workbook_state,

-- Include Databricks performance metrics for correlation
execution_time_ms,
total_task_duration_ms,
spilled_local_bytes,
statement_text
FROM RawAstratoQueries
ORDER BY execution_time_ms DESC;

Using this query, Databricks administrators can easily build out Databricks SQL dashboards monitoring Astrato cost and performance, sorted by the heaviest workloads. We recommend persisting this to improve performance.

Did this answer your question?