This approach creates a query history layer using ClickHouse’s system.query_log. It captures executed queries, performance metrics, and Astrato metadata (passed via log_comment) and makes them queryable for product analytics.
Unlike aggregate monitoring, this gives row-level visibility into:
which dashboards are used
which users are active
which objects are queried
how queries perform
How it works
Astrato sends metadata with each query using:
SETTINGS log_comment = '{"type":"query",...}'
ClickHouse stores this in:
system.query_log.log_comment
This JSON includes:
{
"type":"query",
"userId":"...",
"roleType":"owner",
"objectId":"...",
"sheetId":"...",
"workbookId":"...",
"workbookState":"draft"
}
ClickHouse already tracks:
execution time
rows read
bytes scanned
memory usage
query text
Combining the two gives you full usage and performance telemetry in one place.
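For illustration, tagging a query by hand looks like this (the table name and IDs below are placeholders; Astrato attaches the comment automatically):

```sql
-- Placeholder table and IDs; in practice Astrato sets log_comment itself
SELECT count()
FROM my_table
SETTINGS log_comment = '{"type":"query","userId":"u-123","workbookId":"wb-456"}';
```

Anything passed as log_comment lands verbatim in system.query_log.log_comment for that query.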
Base query (raw telemetry)
SELECT
event_time,
query_id,
user,
initial_user,
log_comment,
JSONExtractString(log_comment, 'type') AS meta_type,
JSONExtractString(log_comment, 'userId') AS meta_user_id,
JSONExtractString(log_comment, 'roleType') AS meta_role_type,
JSONExtractString(log_comment, 'objectId') AS meta_object_id,
JSONExtractString(log_comment, 'sheetId') AS meta_sheet_id,
JSONExtractString(log_comment, 'workbookId') AS meta_workbook_id,
JSONExtractString(log_comment, 'workbookState') AS meta_workbook_state,
JSONExtract(log_comment, 'fields', 'Array(String)') AS meta_fields,
JSONExtractRaw(log_comment, 'measures') AS meta_measures,
query_duration_ms,
read_rows,
read_bytes,
written_rows,
written_bytes,
result_rows,
result_bytes,
memory_usage,
query
FROM system.query_log
WHERE length(log_comment) > 20
AND user != 'monitoring-internal'
AND initial_user NOT LIKE 'sql-console:%'
AND type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 10000;
Optional: Persist into a table (recommended)
CREATE TABLE IF NOT EXISTS astrato.query_history
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY (event_time, meta_workbook_id, meta_sheet_id, meta_object_id)
AS
SELECT
event_time,
query_id,
user,
initial_user,
JSONExtractString(log_comment, 'type') AS type,
JSONExtractString(log_comment, 'userId') AS user_id,
JSONExtractString(log_comment, 'roleType') AS role_type,
JSONExtractString(log_comment, 'objectId') AS object_id,
JSONExtractString(log_comment, 'sheetId') AS sheet_id,
JSONExtractString(log_comment, 'workbookId') AS workbook_id,
JSONExtractString(log_comment, 'workbookState') AS workbook_state,
JSONExtract(log_comment, 'fields', 'Array(String)') AS fields,
JSONExtractRaw(log_comment, 'measures') AS measures,
query_duration_ms,
read_rows,
read_bytes,
result_rows,
result_bytes,
memory_usage
FROM system.query_log
WHERE 1 = 0;
The WHERE 1 = 0 predicate creates the table with the derived schema but copies no rows. Then insert incrementally, for example on a daily schedule:
INSERT INTO astrato.query_history
SELECT *
FROM (
SELECT
event_time,
query_id,
user,
initial_user,
JSONExtractString(log_comment, 'type') AS type,
JSONExtractString(log_comment, 'userId') AS user_id,
JSONExtractString(log_comment, 'roleType') AS role_type,
JSONExtractString(log_comment, 'objectId') AS object_id,
JSONExtractString(log_comment, 'sheetId') AS sheet_id,
JSONExtractString(log_comment, 'workbookId') AS workbook_id,
JSONExtractString(log_comment, 'workbookState') AS workbook_state,
JSONExtract(log_comment, 'fields', 'Array(String)') AS fields,
JSONExtractRaw(log_comment, 'measures') AS measures,
query_duration_ms,
read_rows,
read_bytes,
result_rows,
result_bytes,
memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
AND event_time >= now() - INTERVAL 1 DAY
AND length(log_comment) > 20
);
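Note that re-running this INSERT within the same window will duplicate rows. One option (a sketch, not part of the original setup) is an anti-join on query_id:

```sql
-- Sketch: same projection as the incremental INSERT above, plus an
-- anti-join on query_id so re-runs of the window do not duplicate rows
INSERT INTO astrato.query_history
SELECT
    event_time, query_id, user, initial_user,
    JSONExtractString(log_comment, 'type'),
    JSONExtractString(log_comment, 'userId'),
    JSONExtractString(log_comment, 'roleType'),
    JSONExtractString(log_comment, 'objectId'),
    JSONExtractString(log_comment, 'sheetId'),
    JSONExtractString(log_comment, 'workbookId'),
    JSONExtractString(log_comment, 'workbookState'),
    JSONExtract(log_comment, 'fields', 'Array(String)'),
    JSONExtractRaw(log_comment, 'measures'),
    query_duration_ms, read_rows, read_bytes,
    result_rows, result_bytes, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 DAY
  AND length(log_comment) > 20
  AND query_id NOT IN (
      SELECT query_id
      FROM astrato.query_history
      WHERE event_time >= now() - INTERVAL 1 DAY
  );
```

A ReplacingMergeTree keyed on query_id would achieve the same thing at merge time instead of insert time.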
Enrichment (important)
Export lookup tables from:
https://app.astrato.io/administration/exports
You’ll need:
workbooks
sheets
objects
users
Join them to resolve IDs → names.
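For example, assuming the workbooks export was loaded as astrato.workbooks with id and name columns (the table and column names are an assumption; match them to your export), a join looks like:

```sql
-- Assumes astrato.workbooks(id String, name String) loaded from the export
SELECT
    w.name AS workbook_name,
    count() AS queries
FROM astrato.query_history AS q
LEFT JOIN astrato.workbooks AS w ON w.id = q.workbook_id
GROUP BY workbook_name
ORDER BY queries DESC;
```

The same pattern applies to sheets, objects, and users.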
Example business questions
a) Most used workbooks
SELECT
workbook_id,
count() AS queries,
uniqExact(user_id) AS users
FROM astrato.query_history
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY workbook_id
ORDER BY queries DESC;
b) Slowest sheets
SELECT
sheet_id,
avg(query_duration_ms) AS avg_ms,
count() AS queries
FROM astrato.query_history
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY sheet_id
ORDER BY avg_ms DESC;
c) Most expensive objects (by data scanned)
SELECT
object_id,
sum(read_bytes) AS bytes_read,
count() AS queries
FROM astrato.query_history
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY object_id
ORDER BY bytes_read DESC;
d) Active users
SELECT
user_id,
count() AS queries,
sum(read_bytes) AS bytes_read
FROM astrato.query_history
WHERE event_time >= now() - INTERVAL 30 DAY
GROUP BY user_id
ORDER BY queries DESC;
Key notes
system.query_log retention depends on ClickHouse config (TTL / flush_interval)
use SYSTEM FLUSH LOGS when testing
always filter type = 'QueryFinish' to avoid partial queries
log_comment must be valid JSON for extraction to work
avoid internal/system users (monitoring-internal, sql-console:%)
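A quick end-to-end check when testing (values below are placeholders):

```sql
-- Tag a trivial query, force the log buffers to flush, then find it
SELECT 1 SETTINGS log_comment = '{"type":"query","userId":"test-user"}';
SYSTEM FLUSH LOGS;

SELECT query_id, log_comment
FROM system.query_log
WHERE type = 'QueryFinish'
  AND JSONExtractString(log_comment, 'userId') = 'test-user'
ORDER BY event_time DESC
LIMIT 1;
```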
Why this matters
You now have:
product analytics (who uses what)
performance analytics (how fast, how heavy)
full traceability (query → workbook → object → user)
All directly from your warehouse, no extra tracking needed.