Keyword phrases is an advanced feature for parsing text data or working around difficult aggregations in many-to-one relationships. It goes through all of the text data provided, splits the text on the supplied delimiter, and then treats each unique resulting string (n-gram) as its own factor.
1. In Sisu, open the Edit Columns dropdown and select a string field.
2. Select Split into keyword phrases. You will be presented with input boxes for three options:
- Token Delimiter: This is the character that defines what separates the values that need to be parsed. In raw text data, this would be a space (" "). In aggregated lists, this would be whatever the defined delimiter was.
- Max Words per Phrase: This defines the maximum number of values/words that will be combined into a single Sisu factor. For example, "The quick cat ran" with a maximum of 3 words would produce "The quick cat" and "quick cat ran", as well as all 2-word and 1-word values.
- Min Words per Phrase: This defines the minimum number of values/words/n-grams that will be combined into a single Sisu factor.
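As a rough illustration of how the three options interact, here is a minimal Python sketch. It is not Sisu's implementation: the function name `keyword_phrases` and its parameters are hypothetical, and details such as whitespace trimming and case sensitivity are assumptions.

```python
def keyword_phrases(text, delimiter=" ", min_words=1, max_words=3):
    """Return the unique phrases (n-grams) in `text`, longest first.

    Hypothetical helper for illustration only; Sisu's actual behavior
    (trimming, casing, ordering) may differ.
    """
    # Split on the token delimiter and drop empty tokens (an assumption).
    tokens = [t.strip() for t in text.split(delimiter) if t.strip()]

    phrases = []
    seen = set()
    # Walk from the longest allowed phrase down to the shortest.
    for n in range(max_words, min_words - 1, -1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            phrase = delimiter.join(tokens[i:i + n])
            if phrase not in seen:
                seen.add(phrase)
                phrases.append(phrase)
    return phrases


# The example from the option description: max 3 words, min 1 word.
for phrase in keyword_phrases("The quick cat ran", " ", 1, 3):
    print(phrase)
```

Raising `min_words` to 2 in this sketch would drop the single-word values and leave only the 2-word and 3-word phrases.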
There are two situations that Keyword is designed for:
- Raw Text: Use this to parse through raw text to determine the most impactful n-grams. For example, if you have a dataset that contains ticket descriptions from users with issues about Product A, Keyword can determine that the combination "lid snapped" has a particularly strong impact on low CSAT scores.
- Many-to-one aggregations: When dealing with data of different granularity, it is difficult to aggregate string data without losing analytic capability. Keyword helps with this problem by retaining all of the lower-granularity data in the form of a list. The dataset below is an example of using LISTAGG to retain all the hypothetical shows that a user watched even though the dataset is at a higher grain. Keyword can be used to parse through all the lower-grain data and determine if a specific lower-grain factor is impacting the metric.
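The many-to-one case can be sketched in Python as well. The rows, user IDs, and show names below are made up for illustration, `listagg` is a hypothetical stand-in for a SQL LISTAGG, and the split step assumes a comma as the Token Delimiter with Max Words per Phrase set to 1 so that each list item becomes its own factor.

```python
from collections import defaultdict

# Hypothetical lower-grain data: one row per (user, show watched).
views = [
    ("u1", "Show A"),
    ("u1", "Show B"),
    ("u2", "Show B"),
    ("u3", "Show C"),
    ("u3", "Show A"),
]


def listagg(rows, delimiter=","):
    """Collapse rows to one delimited string per user (LISTAGG stand-in)."""
    grouped = defaultdict(list)
    for user_id, show in rows:
        grouped[user_id].append(show)
    return {user_id: delimiter.join(shows) for user_id, shows in grouped.items()}


def users_by_factor(aggregated, delimiter=","):
    """Split each list on the delimiter and map each value to its users."""
    result = defaultdict(set)
    for user_id, value_list in aggregated.items():
        for value in value_list.split(delimiter):
            result[value].add(user_id)
    return dict(result)


# One row per user, with the lower-grain shows retained as a list.
aggregated = listagg(views)
# Each show recovered as its own factor, with the users it applies to.
factors = users_by_factor(aggregated)
```

In this sketch the user-level rows keep every show in a single string column, and splitting on the same delimiter recovers each show as a separate factor that can be compared against the metric.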
See more: Editing an analysis