Managing Columns for a KDA

By default, Sisu analyzes every column in the dataset. In some cases, however, certain columns are not relevant to the Analysis you are building. You can exclude these columns to make the Analysis clearer and easier to manage.

 



Managing Columns for Analysis

  1. Go to the Projects module, and select the appropriate Project.
  2. In the Analysis menu, select the Analysis that contains the columns you want to manage.
  3. Click Manage Columns.
    The Edit columns screen appears, showing all available columns from your data and their data types.

    Click_Manage_Columns.png

  4. Uncheck the box for any column you wish to exclude from the Analysis.
  5. Select a column in the list to display its settings, which will differ depending on the data type as described below.
  6. When you have finished editing a column or columns, click Save, then re-run your Analysis if desired.

Tip: If you are making changes to multiple columns, you only need to click Save once when you’re done.

 

For details about changing and managing text-based columns, refer to Transforming a Column: Keyword Analysis.

For details about changing and managing numeric-based columns, refer to Transforming a Column: Binning Numerical Columns.

 

Advanced Settings

Click Top Drivers to display additional options that you can change for any column, no matter the type:

Click_Top_Drivers.png

 

Statistical model settings

Selecting “Top drivers” will include the top drivers that meet the Confidence threshold indicated in the Confidence field. Subgroups are compared against each other to select the ones that best explain the Metric.

Selecting “All subgroups” includes up to 10,000 of the top subgroups for the columns you have selected, ranked by their impact on the Metric or by the Metric's average value within the subgroup. The Sisu statistical model is disabled with this selection.

Refer to Understanding Sisu's Statistical Models for a description of each model.

Confidence

Indicate the desired confidence threshold as a percentage.
Sisu tests every subgroup in the data set for statistical significance and surfaces only those that pass the test. By default, the confidence level is set at 95%.

If you do not see enough facts in the results, this may be because:

  1. Your Metric did not change significantly, so no factors in your data explain it.
  2. Your Metric did change, but the factors in your data do not explain why.
  3. You have a smaller data set (<X Rows).

In these cases, you can lower the confidence level to see whether more facts appear (at a lower confidence).
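
To illustrate how a confidence threshold affects the number of facts that appear, here is a minimal sketch. It assumes each subgroup has a p-value and that a subgroup passes when its p-value is at most 1 minus the confidence level. This is a common convention, not a description of Sisu's actual test. The function name `surfaced_subgroups` and the p-values are hypothetical.

```python
def surfaced_subgroups(p_values, confidence_pct=95):
    """Return the names of subgroups whose p-value passes the threshold.

    Assumption: a subgroup passes when p <= (100 - confidence_pct) / 100.
    Sisu's real statistical test may differ.
    """
    alpha = (100 - confidence_pct) / 100  # 95% confidence -> alpha of 0.05
    return [name for name, p in p_values.items() if p <= alpha]


# Hypothetical p-values, invented purely for illustration.
example = {"country = Canada": 0.03, "gender = F": 0.08, "age > 18": 0.20}

print(surfaced_subgroups(example, 95))  # ['country = Canada']
print(surfaced_subgroups(example, 90))  # ['country = Canada', 'gender = F']
```

Under these assumptions, lowering the threshold from 95% to 90% lets one more subgroup through, at the cost of weaker evidence for it.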


Refer to Understanding Sisu's Statistical Models for more details on how this is determined for different types of Analysis.

Minimum subgroup size

This is the minimum size (percentage of rows) that a subgroup must have for it to show up as a fact.

  • For example, with a setting of 0.1% and 100,000 rows in the data set, a subgroup appears as a fact only if it contains at least 100 rows (0.1% of 100,000 rows).
  • For small data sets in particular (fewer than 10,000 rows), reducing this setting can help more facts appear in the Analysis result.
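
The arithmetic behind this setting can be sketched as follows. The helper name `min_rows_for_fact` is hypothetical, and rounding up to a whole row is an assumption. The source only states the 100,000-row example.

```python
import math
from fractions import Fraction


def min_rows_for_fact(total_rows, min_size_pct):
    """Smallest whole number of rows that reaches min_size_pct percent of total_rows.

    Fraction avoids floating-point drift for percentages such as 0.1.
    Rounding up is an assumption, not documented Sisu behavior.
    """
    exact = Fraction(str(min_size_pct)) * total_rows / 100
    return math.ceil(exact)


print(min_rows_for_fact(100_000, 0.1))  # 100
print(min_rows_for_fact(5_000, 0.1))    # 5
```

With only 5,000 rows, a 0.1% threshold corresponds to just 5 rows, which shows how the same percentage translates into very different absolute sizes depending on the data set.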

Maximum subgroup size

Sisu explores all possible combinations of your data columns and identifies the top subgroups that have a high impact on your Metric.

  • A fact built from one column is an “Order 1” fact (e.g., country = Canada).
  • A fact built from a combination of two columns is an “Order 2” fact (e.g., country = Canada & gender = F).
  • A fact built from a combination of three columns is an “Order 3” fact (e.g., country = Canada, gender = F, and age > 18).

The default setting is 3. You may want to decrease this for a faster analysis.
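
To see why a lower maximum order can speed things up, the sketch below counts how many column combinations exist at each order. It only counts combinations of columns and says nothing about how Sisu actually searches them. The function name and column names are hypothetical.

```python
from itertools import combinations
from math import comb


def column_combinations(columns, max_order):
    """Yield (order, combo) for every combination of 1..max_order columns."""
    for order in range(1, max_order + 1):
        for combo in combinations(columns, order):
            yield order, combo


columns = ["country", "gender", "age"]
for order, combo in column_combinations(columns, 2):
    print(order, combo)
# 1 ('country',)
# 1 ('gender',)
# 1 ('age',)
# 2 ('country', 'gender')
# 2 ('country', 'age')
# 2 ('gender', 'age')

# Number of column combinations per order for a 20-column data set.
print([comb(20, order) for order in (1, 2, 3)])  # [20, 190, 1140]
```

For 20 columns, moving from a maximum of 2 to a maximum of 3 adds 1,140 three-column combinations on top of the 210 combinations at orders 1 and 2, and each combination can contain many subgroups.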