Introduction

This article looks at Astrato's Word Cloud visualization.

Building a Word Cloud
Add a Word Cloud
Define a Word Cloud
Style a Word Cloud
Prepare data for use in a wordcloud

A Word cloud data object consists of a textual dimension and numeric measures,
The word cloud gives users a complete view of all the values in the dimension and highlights the significance of each one of them by giving each record a different text size according to its measure result.

Open the interactive demo

Word Cloud's common use cases are:

Social media analytics
Marketing search engine analytics
Feedback and Survey Analytics

Building a Word Cloud

Add a Word Cloud

Word cloud can be found in the visualization tile, it positioned in the bottom of the visualization list

Define a Word Cloud

Add one dimension and one measure in the Data tab of the properties panel.

Style a Word Cloud

Set Font family and style
Scaling- Select the scaling method out of these four options:
- Sqrt - Square root (default)
- Linear
- Logarithmic
- Squared
Max chars- set the maximum number of characters to show for each dimension record, 40 is the default value.
Word Layout Seed - click on the refresh icon or enter a number in the box to try different word placements.
Color - Select the method of coloring the chart out of these three options:
- Single - only one color for all words
- Categorical- color the words by the dimension order
- Gradient - color the words by the measure result from lowest to highest
Conditional coloring - add specific conditions on dimension or measure values and how to color them

Prepare data for use in a wordcloud

This guide assumes you have a long text field, and are looking to break down the text into clean, individual words.

We recommend one of two approaches, depending on your use case.

You want a wordcloud overview, most likely for static reporting, where drill-down is not needed.
You want a super-powerful wordcloud and are happy to have some extra compute and storage used, to have a limitless drill-down wordcloud. [Recommended]

Pre-aggregated wordcloud. This is more efficient but should be disconnected from the data model, the result looks like this:


select * 
FROM 
    (
                WITH 
                /* Split each message into individual words */
                word_split AS (
                    SELECT
                        VALUE::STRING AS word
                    FROM 
                        ASTRATO_DEMO.PUBLIC.ASTRATO_DEMO,   //🔴 Update the table name, currently it is `ASTRATO_DEMO.PUBLIC.ASTRATO_DEMO`
                        LATERAL FLATTEN(INPUT => SPLIT(message, ' '))  //🔴 Update the field name, currently it is `message`
                ),
                /* Remove any non-alphabetic characters from words and filter out empty strings */
                cleaned_words AS (
                    SELECT
                        REGEXP_REPLACE(word, '[^a-zA-Z]', '') AS cleaned_word
                    FROM
                        word_split
                    WHERE
                        LENGTH(word) > 0
                )
                /* Aggregate and count word occurrences */
                SELECT
                     cleaned_word,        COUNT(*) AS frequency
                FROM
                    cleaned_words
                WHERE
                    LENGTH(cleaned_word) > 0
                GROUP BY
                    cleaned_word
                ORDER BY
                    COUNT(*) DESC limit 300
    
    ) AS subquery
where cleaned_word is not null
;

Comment-level/Row-level wordcloud data. This produces many rows, can be connected to the data model, ideally as a dynamic table if you are using Snowflake. Otherwise use a view to ensure that this updates along with the parent table. The result, with the CommentID looks like this:

WITH 
    /* Split each message into individual words, including the ID */
    word_split AS (
        SELECT
            _ID as CommentID,  //🔴 Update with the actual column name for the ID
            VALUE::STRING AS word
        FROM 
            ASTRATO_DEMO.PUBLIC.ASTRATO_DEMO,   //🔴 Update the table name if needed
            LATERAL FLATTEN(INPUT => SPLIT(message, ' '))  //🔴 Update `message` if the column name is different
    ),
    /* Remove any non-alphabetic characters from words, filter out empty strings, and include the ID */
    cleaned_words AS (
        SELECT
            CommentID,
            REGEXP_REPLACE(word, '[^a-zA-Z]', '') AS cleaned_word
        FROM
            word_split
        WHERE
            LENGTH(word) > 0
    )
    /* Aggregate and count word occurrences, grouped by ID and word */
    SELECT
         CommentID,
         cleaned_word,
         COUNT(*) AS frequency
    FROM
        cleaned_words
    WHERE
        LENGTH(cleaned_word) > 0

    GROUP BY
        CommentID, cleaned_word
    ORDER BY
        CommentID, COUNT(*) DESC;

Stopwords

See below for a CSV containing words that do not represent topics or themes, example: and, the, a, I, am.

If you prefer to add it directly into a single query rather than adding in a new table for stopwords, just add the statement below into your SQL statement.

stopwords.csv

        and lower(cleaned_word) not in('dont','will','now','im','didnt','de','x', 'y', 'your', 'yours', 'yourself', 'yourselves', 'you', 'yond', 'yonder', 'yon', 'ye', 'yet', 'z', 'zillion', 'j', 'u', 'umpteen', 'usually', 'us', 'username', 'uponed', 'upons', 'uponing', 'upon', 'ups', 'upping', 'upped', 'up', 'unto', 'until', 'unless', 'unlike', 'unliker', 'unlikest', 'under', 'underneath', 'use', 'used', 'usedest', 'r', 'rath', 'rather', 'rathest', 'rathe', 're', 'relate', 'related', 'relatively', 'regarding', 'really', 'res', 'respecting', 'respectively', 'q', 'quite', 'que', 'qua', 'n', 'neither', 'neath', 'neaths', 'neath', 'nethe', 'nethermost', 'necessary', 'necessariest', 'necessarier', 'never', 'nevertheless', 'nigh', 'nighest', 'nigher', 'nine', 'noone', 'nobody', 'nobodies', 'nowhere', 'nowheres', 'no', 'noes', 'nor', 'nos', 'no-one', 'none', 'not', 'notwithstanding', 'nothings', 'nothing', 'nathless', 'natheless', 't', 'ten', 'tills', 'till', 'tilled', 'tilling', 'to', 'towards', 'toward', 'towardest', 'towarder', 'together', 'too', 'thy', 'thyself', 'thus', 'than', 'that', 'those', 'thou', 'though', 'thous', 'thouses', 'thoroughest', 'thorougher', 'thorough', 'thoroughly', 'thru', 'thruer', 'thruest', 'thro', 'through', 'throughout', 'throughest', 'througher', 'thine', 'this', 'thises', 'they', 'thee', 'the', 'then', 'thence', 'thenest', 'thener', 'them', 'themselves', 'these', 'therer', 'there', 'thereby', 'therest', 'thereafter', 'therein', 'thereupon', 'therefore', 'their', 'theirs', 'thing', 'things', 'three', 'two', 'o', 'oh', 'owt', 'owning', 'owned', 'own', 'owns', 'others', 'other', 'otherwise', 'otherwisest', 'otherwiser', 'of', 'often', 'oftener', 'oftenest', 'off', 'offs', 'offest', 'one', 'ought', 'oughts', 'our', 'ours', 'ourselves', 'ourself', 'out', 'outest', 'outed', 'outwith', 'outs', 'outside', 'over', 'overallest', 'overaller', 'overalls', 'overall', 'overs', 'or', 'orer', 'orest', 'on', 'oneself', 'onest', 'ons', 'onto', 'a', 'atween', 'at', 'athwart', 'atop', 'afore', 'afterward', 'afterwards', 'after', 'afterest', 'afterer', 'ain', 'an', 'any', 'anything', 'anybody', 'anyone', 'anyhow', 'anywhere', 'anent', 'anear', 'and', 'andor', 'another', 'around', 'ares', 'are', 'aest', 'aer', 'against', 'again', 'accordingly', 'abaft', 'abafter', 'abaftest', 'abovest', 'above', 'abover', 'abouter', 'aboutest', 'about', 'aid', 'amidst', 'amid', 'among', 'amongst', 'apartest', 'aparter', 'apart', 'appeared', 'appears', 'appear', 'appearing', 'appropriating', 'appropriate', 'appropriatest', 'appropriates', 'appropriater', 'appropriated', 'already', 'always', 'also', 'along', 'alongside', 'although', 'almost', 'all', 'allest', 'aller', 'allyou', 'alls', 'albeit', 'awfully', 'as', 'aside', 'asides', 'aslant', 'ases', 'astrider', 'astride', 'astridest', 'astraddlest', 'astraddler', 'astraddle', 'availablest', 'availabler', 'available', 'aughts', 'aught', 'vs', 'v', 'variousest', 'variouser', 'various', 'via', 'vis-a-vis', 'vis-a-viser', 'vis-a-visest', 'viz', 'very', 'veriest', 'verier', 'versus', 'k', 'g', 'go', 'gone', 'good', 'got', 'gotta', 'gotten', 'get', 'gets', 'getting', 'b', 'by', 'byandby', 'by-and-by', 'bist', 'both', 'but', 'buts', 'be', 'beyond', 'because', 'became', 'becomes', 'become', 'becoming', 'becomings', 'becominger', 'becomingest', 'behind', 'behinds', 'before', 'beforehand', 'beforehandest', 'beforehander', 'bettered', 'betters', 'better', 'bettering', 'betwixt', 'between', 'beneath', 'been', 'below', 'besides', 'beside', 'm', 'my', 'myself', 'mucher', 'muchest', 'much', 'must', 'musts', 'musths', 'musth', 'main', 'make', 'mayest', 'many', 'mauger', 'maugre', 'me', 'meanwhiles', 'meanwhile', 'mostly', 'most', 'moreover', 'more', 'might', 'mights', 'midst', 'midsts', 'h', 'huh', 'humph', 'he', 'hers', 'herself', 'her', 'hereby', 'herein', 'hereafters', 'hereafter', 'hereupon', 'hence', 'hadst', 'had', 'having', 'haves', 'have', 'has', 'hast', 'hardly', 'hae', 'hath', 'him', 'himself', 'hither', 'hitherest', 'hitherer', 'his', 'how-do-you-do', 'however', 'how', 'howbeit', 'howdoyoudo', 'hoos', 'hoo', 'w', 'woulded', 'woulding', 'would', 'woulds', 'was', 'wast', 'we', 'wert', 'were', 'with', 'withal', 'without', 'within', 'why', 'what', 'whatever', 'whateverer', 'whateverest', 'whatsoeverer', 'whatsoeverest', 'whatsoever', 'whence', 'whencesoever', 'whenever', 'whensoever', 'when', 'whenas', 'whether', 'wheen', 'whereto', 'whereupon', 'wherever', 'whereon', 'whereof', 'where', 'whereby', 'wherewithal', 'wherewith', 'whereinto', 'wherein', 'whereafter', 'whereas', 'wheresoever', 'wherefrom', 'which', 'whichever', 'whichsoever', 'whilst', 'while', 'whiles', 'whithersoever', 'whither', 'whoever', 'whosoever', 'whoso', 'whose', 'whomever', 's', 'syne', 'syn', 'shalling', 'shall', 'shalled', 'shalls', 'shoulding', 'should', 'shoulded', 'shoulds', 'she', 'sayyid', 'sayid', 'said', 'saider', 'saidest', 'same', 'samest', 'sames', 'samer', 'saved', 'sans', 'sanses', 'sanserifs', 'sanserif', 'so', 'soer', 'soest', 'sobeit', 'someone', 'somebody', 'somehow', 'some', 'somewhere', 'somewhat', 'something', 'sometimest', 'sometimes', 'sometimer', 'sometime', 'several', 'severaler', 'severalest', 'serious', 'seriousest', 'seriouser', 'senza', 'send', 'sent', 'seem', 'seems', 'seemed', 'seemingest', 'seeminger', 'seemings', 'seven', 'summat', 'sups', 'sup', 'supping', 'supped', 'such', 'since', 'sine', 'sines', 'sith', 'six', 'stop', 'stopped', 'p', 'plaintiff', 'plenty', 'plenties', 'please', 'pleased', 'pleases', 'per', 'perhaps', 'particulars', 'particularly', 'particular', 'particularest', 'particularer', 'pro', 'providing', 'provides', 'provided', 'provide', 'probably', 'l', 'layabout', 'layabouts', 'latter', 'latterest', 'latterer', 'latterly', 'latters', 'lots', 'lotting', 'lotted', 'lot', 'lest', 'less', 'ie', 'ifs', 'if', 'i', 'info', 'information', 'itself', 'its', 'it', 'is', 'idem', 'idemer', 'idemest', 'immediate', 'immediately', 'immediatest', 'immediater', 'in', 'inwards', 'inwardest', 'inwarder', 'inward', 'inasmuch', 'into', 'instead', 'insofar', 'indicates', 'indicated', 'indicate', 'indicating', 'indeed', 'inc', 'f', 'fact', 'facts', 'fs', 'figupon', 'figupons', 'figuponing', 'figuponed', 'few', 'fewer', 'fewest', 'frae', 'from', 'failing', 'failings', 'five', 'furthers', 'furtherer', 'furthered', 'furtherest', 'further', 'furthering', 'furthermore', 'fourscore', 'followthrough', 'for', 'forwhy', 'fornenst', 'formerly', 'former', 'formerer', 'formerest', 'formers', 'forbye', 'forby', 'fore', 'forever', 'forer', 'fores', 'four', 'd', 'ddays', 'dday', 'do', 'doing', 'doings', 'doe', 'does', 'doth', 'downwarder', 'downwardest', 'downward', 'downwards', 'downs', 'done', 'doner', 'dones', 'donest', 'dos', 'dost', 'did', 'differentest', 'differenter', 'different', 'describing', 'describe', 'describes', 'described', 'despiting', 'despites', 'despited', 'despite', 'during', 'c', 'cum', 'circa', 'chez', 'cer', 'certain', 'certainest', 'certainer', 'cest', 'canst', 'cannot', 'cant', 'cants', 'canting', 'cantest', 'canted', 'co', 'could', 'couldst', 'comeon', 'comeons', 'come-ons', 'come-on', 'concerning', 'concerninger', 'concerningest', 'consequently', 'considering', 'e', 'eg', 'eight', 'either', 'even', 'evens', 'evenser', 'evensest', 'evened', 'evenest', 'ever', 'everyone', 'everything', 'everybody', 'everywhere', 'every', 'ere', 'each', 'et', 'etc', 'elsewhere', 'else', 'ex', 'excepted', 'excepts', 'except', 'excepting', 'exes', 'enough')

stopwords.csv

Table Visualization