Word Cloud
Piers Batchelor avatar
Written by Piers Batchelor
Updated over a week ago

Introduction

This article looks at Astrato's Word Cloud visualization.

A Word cloud data object consists of a textual dimension and numeric measures,
The word cloud gives users a complete view of all the values in the dimension and highlights the significance of each one of them by giving each record a different text size according to its measure result.

Word Cloud's common use cases are:

  • Social media analytics

  • Marketing search engine analytics

  • Feedback and Survey Analytics
    ​


Building a Word Cloud

Add a Word Cloud

Word cloud can be found in the visualization tile, it positioned in the bottom of the visualization list
​

Define a Word Cloud

  • Add one dimension and one measure in the Data tab of the properties panel.
    ​

Style a Word Cloud

  • Set Font family and style
    ​

  • Scaling- Select the scaling method out of these four options:

    • Sqrt - Square root (default)

    • Linear

    • Logarithmic

    • Squared
      ​

  • Max chars- set the maximum number of characters to show for each dimension record, 40 is the default value.
    ​

  • Word Layout Seed - click on the refresh icon or enter a number in the box to try different word placements.
    ​

  • Color - Select the method of coloring the chart out of these three options:

    • Single - only one color for all words

    • Categorical- color the words by the dimension order

    • Gradient - color the words by the measure result from lowest to highest
      ​

  • Conditional coloring - add specific conditions on dimension or measure values and how to color them


Prepare data for use in a wordcloud

This guide assumes you have a long text field, and are looking to break down the text into clean, individual words.

We recommend one of two approaches, depending on your use case.

  1. You want a wordcloud overview, most likely for static reporting, where drill-down is not needed.

  2. You want a super-powerful wordcloud and are happy to have some extra compute and storage used, to have a limitless drill-down wordcloud. [Recommended]

Pre-aggregated wordcloud. This is more efficient but should be disconnected from the data model, the result looks like this:


select *
FROM
(
WITH
/* Split each message into individual words */
word_split AS (
SELECT
VALUE::STRING AS word
FROM
ASTRATO_DEMO.PUBLIC.ASTRATO_DEMO, //πŸ”΄ Update the table name, currently it is `ASTRATO_DEMO.PUBLIC.ASTRATO_DEMO`
LATERAL FLATTEN(INPUT => SPLIT(message, ' ')) //πŸ”΄ Update the field name, currently it is `message`
),
/* Remove any non-alphabetic characters from words and filter out empty strings */
cleaned_words AS (
SELECT
REGEXP_REPLACE(word, '[^a-zA-Z]', '') AS cleaned_word
FROM
word_split
WHERE
LENGTH(word) > 0
)
/* Aggregate and count word occurrences */
SELECT
cleaned_word, COUNT(*) AS frequency
FROM
cleaned_words
WHERE
LENGTH(cleaned_word) > 0
GROUP BY
cleaned_word
ORDER BY
COUNT(*) DESC limit 300

) AS subquery
where cleaned_word is not null
;

Comment-level/Row-level wordcloud data. This produces many rows, can be connected to the data model, ideally as a dynamic table if you are using Snowflake. Otherwise use a view to ensure that this updates along with the parent table. The result, with the CommentID looks like this:

WITH 
/* Split each message into individual words, including the ID */
word_split AS (
SELECT
_ID as CommentID, //πŸ”΄ Update with the actual column name for the ID
VALUE::STRING AS word
FROM
ASTRATO_DEMO.PUBLIC.ASTRATO_DEMO, //πŸ”΄ Update the table name if needed
LATERAL FLATTEN(INPUT => SPLIT(message, ' ')) //πŸ”΄ Update `message` if the column name is different
),
/* Remove any non-alphabetic characters from words, filter out empty strings, and include the ID */
cleaned_words AS (
SELECT
CommentID,
REGEXP_REPLACE(word, '[^a-zA-Z]', '') AS cleaned_word
FROM
word_split
WHERE
LENGTH(word) > 0
)
/* Aggregate and count word occurrences, grouped by ID and word */
SELECT
CommentID,
cleaned_word,
COUNT(*) AS frequency
FROM
cleaned_words
WHERE
LENGTH(cleaned_word) > 0

GROUP BY
CommentID, cleaned_word
ORDER BY
CommentID, COUNT(*) DESC;

Stopwords

See below for a CSV containing words that do not represent topics or themes, example: and, the, a, I, am.
​
If you prefer to add it directly into a single query rather than adding in a new table for stopwords, just add the statement below into your SQL statement.

        and lower(cleaned_word) not in('dont','will','now','im','didnt','de','x', 'y', 'your', 'yours', 'yourself', 'yourselves', 'you', 'yond', 'yonder', 'yon', 'ye', 'yet', 'z', 'zillion', 'j', 'u', 'umpteen', 'usually', 'us', 'username', 'uponed', 'upons', 'uponing', 'upon', 'ups', 'upping', 'upped', 'up', 'unto', 'until', 'unless', 'unlike', 'unliker', 'unlikest', 'under', 'underneath', 'use', 'used', 'usedest', 'r', 'rath', 'rather', 'rathest', 'rathe', 're', 'relate', 'related', 'relatively', 'regarding', 'really', 'res', 'respecting', 'respectively', 'q', 'quite', 'que', 'qua', 'n', 'neither', 'neath', 'neaths', 'neath', 'nethe', 'nethermost', 'necessary', 'necessariest', 'necessarier', 'never', 'nevertheless', 'nigh', 'nighest', 'nigher', 'nine', 'noone', 'nobody', 'nobodies', 'nowhere', 'nowheres', 'no', 'noes', 'nor', 'nos', 'no-one', 'none', 'not', 'notwithstanding', 'nothings', 'nothing', 'nathless', 'natheless', 't', 'ten', 'tills', 'till', 'tilled', 'tilling', 'to', 'towards', 'toward', 'towardest', 'towarder', 'together', 'too', 'thy', 'thyself', 'thus', 'than', 'that', 'those', 'thou', 'though', 'thous', 'thouses', 'thoroughest', 'thorougher', 'thorough', 'thoroughly', 'thru', 'thruer', 'thruest', 'thro', 'through', 'throughout', 'throughest', 'througher', 'thine', 'this', 'thises', 'they', 'thee', 'the', 'then', 'thence', 'thenest', 'thener', 'them', 'themselves', 'these', 'therer', 'there', 'thereby', 'therest', 'thereafter', 'therein', 'thereupon', 'therefore', 'their', 'theirs', 'thing', 'things', 'three', 'two', 'o', 'oh', 'owt', 'owning', 'owned', 'own', 'owns', 'others', 'other', 'otherwise', 'otherwisest', 'otherwiser', 'of', 'often', 'oftener', 'oftenest', 'off', 'offs', 'offest', 'one', 'ought', 'oughts', 'our', 'ours', 'ourselves', 'ourself', 'out', 'outest', 'outed', 'outwith', 'outs', 'outside', 'over', 'overallest', 'overaller', 'overalls', 'overall', 'overs', 'or', 'orer', 'orest', 'on', 'oneself', 'onest', 'ons', 'onto', 'a', 'atween', 'at', 'athwart', 'atop', 'afore', 'afterward', 'afterwards', 'after', 'afterest', 'afterer', 'ain', 'an', 'any', 'anything', 'anybody', 'anyone', 'anyhow', 'anywhere', 'anent', 'anear', 'and', 'andor', 'another', 'around', 'ares', 'are', 'aest', 'aer', 'against', 'again', 'accordingly', 'abaft', 'abafter', 'abaftest', 'abovest', 'above', 'abover', 'abouter', 'aboutest', 'about', 'aid', 'amidst', 'amid', 'among', 'amongst', 'apartest', 'aparter', 'apart', 'appeared', 'appears', 'appear', 'appearing', 'appropriating', 'appropriate', 'appropriatest', 'appropriates', 'appropriater', 'appropriated', 'already', 'always', 'also', 'along', 'alongside', 'although', 'almost', 'all', 'allest', 'aller', 'allyou', 'alls', 'albeit', 'awfully', 'as', 'aside', 'asides', 'aslant', 'ases', 'astrider', 'astride', 'astridest', 'astraddlest', 'astraddler', 'astraddle', 'availablest', 'availabler', 'available', 'aughts', 'aught', 'vs', 'v', 'variousest', 'variouser', 'various', 'via', 'vis-a-vis', 'vis-a-viser', 'vis-a-visest', 'viz', 'very', 'veriest', 'verier', 'versus', 'k', 'g', 'go', 'gone', 'good', 'got', 'gotta', 'gotten', 'get', 'gets', 'getting', 'b', 'by', 'byandby', 'by-and-by', 'bist', 'both', 'but', 'buts', 'be', 'beyond', 'because', 'became', 'becomes', 'become', 'becoming', 'becomings', 'becominger', 'becomingest', 'behind', 'behinds', 'before', 'beforehand', 'beforehandest', 'beforehander', 'bettered', 'betters', 'better', 'bettering', 'betwixt', 'between', 'beneath', 'been', 'below', 'besides', 'beside', 'm', 'my', 'myself', 'mucher', 'muchest', 'much', 'must', 'musts', 'musths', 'musth', 'main', 'make', 'mayest', 'many', 'mauger', 'maugre', 'me', 'meanwhiles', 'meanwhile', 'mostly', 'most', 'moreover', 'more', 'might', 'mights', 'midst', 'midsts', 'h', 'huh', 'humph', 'he', 'hers', 'herself', 'her', 'hereby', 'herein', 'hereafters', 'hereafter', 'hereupon', 'hence', 'hadst', 'had', 'having', 'haves', 'have', 'has', 'hast', 'hardly', 'hae', 'hath', 'him', 'himself', 'hither', 'hitherest', 'hitherer', 'his', 'how-do-you-do', 'however', 'how', 'howbeit', 'howdoyoudo', 'hoos', 'hoo', 'w', 'woulded', 'woulding', 'would', 'woulds', 'was', 'wast', 'we', 'wert', 'were', 'with', 'withal', 'without', 'within', 'why', 'what', 'whatever', 'whateverer', 'whateverest', 'whatsoeverer', 'whatsoeverest', 'whatsoever', 'whence', 'whencesoever', 'whenever', 'whensoever', 'when', 'whenas', 'whether', 'wheen', 'whereto', 'whereupon', 'wherever', 'whereon', 'whereof', 'where', 'whereby', 'wherewithal', 'wherewith', 'whereinto', 'wherein', 'whereafter', 'whereas', 'wheresoever', 'wherefrom', 'which', 'whichever', 'whichsoever', 'whilst', 'while', 'whiles', 'whithersoever', 'whither', 'whoever', 'whosoever', 'whoso', 'whose', 'whomever', 's', 'syne', 'syn', 'shalling', 'shall', 'shalled', 'shalls', 'shoulding', 'should', 'shoulded', 'shoulds', 'she', 'sayyid', 'sayid', 'said', 'saider', 'saidest', 'same', 'samest', 'sames', 'samer', 'saved', 'sans', 'sanses', 'sanserifs', 'sanserif', 'so', 'soer', 'soest', 'sobeit', 'someone', 'somebody', 'somehow', 'some', 'somewhere', 'somewhat', 'something', 'sometimest', 'sometimes', 'sometimer', 'sometime', 'several', 'severaler', 'severalest', 'serious', 'seriousest', 'seriouser', 'senza', 'send', 'sent', 'seem', 'seems', 'seemed', 'seemingest', 'seeminger', 'seemings', 'seven', 'summat', 'sups', 'sup', 'supping', 'supped', 'such', 'since', 'sine', 'sines', 'sith', 'six', 'stop', 'stopped', 'p', 'plaintiff', 'plenty', 'plenties', 'please', 'pleased', 'pleases', 'per', 'perhaps', 'particulars', 'particularly', 'particular', 'particularest', 'particularer', 'pro', 'providing', 'provides', 'provided', 'provide', 'probably', 'l', 'layabout', 'layabouts', 'latter', 'latterest', 'latterer', 'latterly', 'latters', 'lots', 'lotting', 'lotted', 'lot', 'lest', 'less', 'ie', 'ifs', 'if', 'i', 'info', 'information', 'itself', 'its', 'it', 'is', 'idem', 'idemer', 'idemest', 'immediate', 'immediately', 'immediatest', 'immediater', 'in', 'inwards', 'inwardest', 'inwarder', 'inward', 'inasmuch', 'into', 'instead', 'insofar', 'indicates', 'indicated', 'indicate', 'indicating', 'indeed', 'inc', 'f', 'fact', 'facts', 'fs', 'figupon', 'figupons', 'figuponing', 'figuponed', 'few', 'fewer', 'fewest', 'frae', 'from', 'failing', 'failings', 'five', 'furthers', 'furtherer', 'furthered', 'furtherest', 'further', 'furthering', 'furthermore', 'fourscore', 'followthrough', 'for', 'forwhy', 'fornenst', 'formerly', 'former', 'formerer', 'formerest', 'formers', 'forbye', 'forby', 'fore', 'forever', 'forer', 'fores', 'four', 'd', 'ddays', 'dday', 'do', 'doing', 'doings', 'doe', 'does', 'doth', 'downwarder', 'downwardest', 'downward', 'downwards', 'downs', 'done', 'doner', 'dones', 'donest', 'dos', 'dost', 'did', 'differentest', 'differenter', 'different', 'describing', 'describe', 'describes', 'described', 'despiting', 'despites', 'despited', 'despite', 'during', 'c', 'cum', 'circa', 'chez', 'cer', 'certain', 'certainest', 'certainer', 'cest', 'canst', 'cannot', 'cant', 'cants', 'canting', 'cantest', 'canted', 'co', 'could', 'couldst', 'comeon', 'comeons', 'come-ons', 'come-on', 'concerning', 'concerninger', 'concerningest', 'consequently', 'considering', 'e', 'eg', 'eight', 'either', 'even', 'evens', 'evenser', 'evensest', 'evened', 'evenest', 'ever', 'everyone', 'everything', 'everybody', 'everywhere', 'every', 'ere', 'each', 'et', 'etc', 'elsewhere', 'else', 'ex', 'excepted', 'excepts', 'except', 'excepting', 'exes', 'enough')

Attachment icon
Did this answer your question?