Understanding Sisu's Statistical Models

Sisu currently supports up to three Statistical Models, depending on the type of Analysis you are running. Depending on the goal of your Analysis, you can change which model is applied to diagnose the Metric. This article describes the available models and gives example use cases for each.

 

Selecting a Statistical Model for Your Analysis

To select the Statistical Model to be used for your Analysis:

  1. Open the Analysis.
  2. Click the Top Drivers Settings button at the top.
    The Advanced Settings modal is displayed.

 

Tip: For details about other settings in this modal, refer to Tuning Analyses with Advanced Settings.

 

Each Statistical Model is described in detail in this article. The following table describes the Statistical Models available for the different Analysis and Metric type combinations:

   

 

| ANALYSIS AND METRIC TYPE | General Performance: Avg/Rate | General Performance: Sum/Count | Time & Group Comparisons: Avg/Rate | Time & Group Comparisons: Sum/Count |
|---|---|---|---|---|
| Top Drivers: High Accuracy Model | Available [default] | Not Available | Available [default] | Not Available |
| Top Drivers: Original Model | Available [deprecated by 12/31/2021] | Available [default] | Available [deprecated by 12/31/2021] | Available [default] |
| All Subgroups | Available | Available | Available | Available |



Top Drivers (High Accuracy Model)

The Top Drivers (High accuracy) Statistical Model is the default setting for General Performance Analyses with Metrics that are calculated as Averages (Numerical metrics) or Rates (Categorical metrics). 

This Statistical Model compares Subgroups to identify the statistically significant drivers of change for your metric by producing the top Subgroups that meet the Confidence threshold indicated in this settings modal.

The Confidence threshold setting defaults to 95%. This means that there is a 95% probability that all the Facts displayed are true-positive facts, and a 5% chance that there is at least one false-positive fact in the set.

Decreasing the Confidence threshold can increase the number of facts that you see, but with a lower confidence that all facts returned are true-positives.
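
To make the "all facts are true positives" guarantee concrete, the sketch below uses a Bonferroni correction, which is one common way to control the chance of any false positive across a set of tests. It is an illustration only and is not a description of how Sisu computes its results; the function name `select_facts` and the p-values are hypothetical.

```python
def select_facts(p_values, confidence=0.95):
    """Keep facts so the chance of ANY false positive is at most 1 - confidence.

    Assumption: a Bonferroni correction, used here purely for illustration.
    """
    if not p_values:
        return []
    alpha = 1.0 - confidence                 # allowed chance of at least one false positive
    per_fact_alpha = alpha / len(p_values)   # split that budget across all candidates
    return [name for name, p in p_values.items() if p <= per_fact_alpha]


# Hypothetical p-values for four candidate subgroups.
p_values = {"A": 0.001, "B": 0.02, "C": 0.04, "D": 0.30}

strict = select_facts(p_values, confidence=0.95)    # per-fact cutoff is about 0.0125
relaxed = select_facts(p_values, confidence=0.80)   # per-fact cutoff is about 0.05
print(strict)   # ['A']
print(relaxed)  # ['A', 'B', 'C']
```

Lowering the threshold from 95% to 80% admits two more facts in this toy data, at the cost of a larger chance that at least one of them is a false positive.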

If a particular subgroup explains part of the Metric (i.e., it is correlated) but overlaps with another subgroup that has a higher correlation with the Metric, Sisu concludes that the second subgroup is more likely to be a true positive and includes it in the Analysis instead of the first one.

For example, let’s assume you run an ice cream store business across the US and recently launched a coupon code that targeted people under the age of 18 (Age<18) in the state of California (State=CA). You have selected “Total Revenue” as the Metric to analyze. (Note that the Aggregation method is “Sum” in this example, since you are concerned about the total revenue.)

Your data will generate the following Subgroups (or “Facts”):

    • Potential Fact #1:  Age<18
    • Potential Fact #2:  State=CA
    • Potential Fact #3:  Age<18 AND State=CA
    • Potential Fact #4:  Gender=F
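
As a rough picture of where candidate Subgroups like these come from, the sketch below enumerates every single condition and every pair of conditions that appears in the data. The enumeration strategy, the function name `candidate_subgroups`, and the column names (including the pre-binned `age_band`) are assumptions made for illustration, not Sisu's documented behavior.

```python
from itertools import combinations


def candidate_subgroups(rows, max_factors=2):
    """Collect subgroups as frozensets of (column, value) conditions seen in the data."""
    found = set()
    for row in rows:
        conditions = sorted(row.items())
        # One-condition subgroups (e.g. State=CA) up to max_factors-condition subgroups.
        for k in range(1, max_factors + 1):
            for combo in combinations(conditions, k):
                found.add(frozenset(combo))
    return found


# Hypothetical rows; "age_band" stands in for a binned Age column.
rows = [
    {"age_band": "<18", "state": "CA", "gender": "F"},
    {"age_band": "18+", "state": "NY", "gender": "M"},
]
subgroups = candidate_subgroups(rows)
print(len(subgroups))  # 12: three single conditions and three pairs per row
```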

Let’s say that in this Analysis, the third Subgroup (Fact #3) generates all the impact. Because it is a subset of the first two Subgroups, those two Subgroups also show some impact. The Sisu algorithm checks the correlation of every Subgroup with the change in the Metric (Revenue) and selects the facts that explain it the most.

In this example, since Fact #3 shows the highest correlation, Sisu selects it to include in the Analysis output. When it comes to evaluating Facts #1 and #2, the algorithm will detect that their correlation (and impact) was already captured in Fact #3 and will not pick these facts, even though they have some non-zero impact. 

If there are other facts besides the first two (such as Fact #4 in the example above) whose impact is already captured by Fact #3, Sisu omits those facts from the results as well.

In this way, in explaining the WHY behind the change in the Metric, the Sisu Top Drivers algorithm is more likely to pick true-positive facts (Fact #3) while not selecting other facts that have similar impact but are likely false-positives.
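
The selection behavior in this example can be sketched as a greedy loop: score each Subgroup by its correlation with the still-unexplained change, keep the best one, mark the change it covers as explained, and re-score. This is a simplified illustration under stated assumptions, not Sisu's actual algorithm; the data, the `min_score` cutoff, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def pick_drivers(rows, subgroups, min_score=0.3):
    """Greedily pick subgroups that best explain the remaining change."""
    residual = [float(r["change"]) for r in rows]
    remaining = dict(subgroups)
    chosen = []
    while remaining:
        scores = {
            name: pearson([1.0 if test(r) else 0.0 for r in rows], residual)
            for name, test in remaining.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] < min_score:
            break  # nothing left explains enough of the remaining change
        chosen.append(best)
        test = remaining.pop(best)
        # Change inside the chosen subgroup is now explained; remove it.
        residual = [0.0 if test(r) else c for r, c in zip(rows, residual)]
    return chosen


# Hypothetical data: revenue changed only for customers under 18 in California.
rows = [
    {"under_18": u, "state": s, "gender": g,
     "change": 10 if (u and s == "CA") else 0}
    for u in (True, False) for s in ("CA", "NY") for g in ("F", "M")
]
subgroups = {
    "Age<18": lambda r: r["under_18"],
    "State=CA": lambda r: r["state"] == "CA",
    "Age<18 AND State=CA": lambda r: r["under_18"] and r["state"] == "CA",
    "Gender=F": lambda r: r["gender"] == "F",
}
drivers = pick_drivers(rows, subgroups)
print(drivers)  # ['Age<18 AND State=CA']
```

In this toy data, Age<18 and State=CA each correlate with the change on their own, but once the combined Subgroup is selected there is no unexplained change left for them to account for, so they are not returned.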

 

Tip: If you prefer your Analysis to show all Facts, select the All Subgroups (no Model) Statistical Model.

 

Top Drivers (Original Model)

The Top Drivers (Original Model) is the default setting for all Time and Group Comparison Analyses for all Metric calculation types (Average, Rate, Sum, Count).

This original Sisu Model compares subgroups to identify the statistically significant drivers of change for your metric by producing the top Subgroups that meet the Confidence threshold indicated in this settings modal.

The Confidence threshold setting defaults to 95%. This means that each Fact displayed individually passes a statistical test, at the 95% confidence level, of being a true-positive fact.

Decreasing the Confidence threshold can increase the number of facts that you see, but with a lower confidence that all facts returned are true-positives.
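
A per-fact test gives a weaker guarantee about the result set as a whole than a guarantee that covers all facts at once. The sketch below shows the standard calculation for that effect, assuming the tests are independent and that none of the tested subgroups has a real effect; the function name is hypothetical, and the calculation is not taken from Sisu's implementation.

```python
def chance_of_any_false_positive(num_facts, confidence=0.95):
    """P(at least one false positive) when each fact is tested on its own.

    Assumes independent tests on subgroups that have no real effect.
    """
    return 1.0 - confidence ** num_facts


for n in (1, 5, 20):
    print(n, round(chance_of_any_false_positive(n), 3))
```

With one tested subgroup the chance is 5%, but it grows quickly as more subgroups are tested individually.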

 

All Subgroups (no Model)

The All Subgroups (no Model) option uses no statistical model, and is available as a selection for all Analysis Types. Subgroups are ranked by Impact and change in metric, and the top 10,000 (maximum) are included in the Analysis results.
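
The ranking and cap described above can be sketched as a sort followed by a slice. The field names, the use of absolute values, and the tie-breaking order (Impact first, then change in metric) are assumptions for illustration; only the 10,000 limit comes from this article.

```python
MAX_SUBGROUPS = 10_000  # maximum number of subgroups included in the results


def top_subgroups(subgroups, limit=MAX_SUBGROUPS):
    """Rank by magnitude of impact, then by magnitude of change; keep the top `limit`."""
    ranked = sorted(
        subgroups,
        key=lambda s: (abs(s["impact"]), abs(s["change"])),
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical subgroup statistics.
subgroups = [
    {"name": "State=CA", "impact": 120.0, "change": 0.10},
    {"name": "Gender=F", "impact": -300.0, "change": -0.05},
    {"name": "Age<18", "impact": 120.0, "change": 0.25},
]
top = top_subgroups(subgroups, limit=2)
print([s["name"] for s in top])  # ['Gender=F', 'Age<18']
```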

Select this option when you need to see the stats or build a waterfall chart for subgroups that the Top Drivers models do not show. This can be useful for seeing how your metric changed or performed for the specific features (i.e., columns) you are interested in.

You can adjust the list of columns in the Manage Columns menu or use the Subgroup table filters to find the column(s) you are most interested in.

Since no Statistical Model is being used to select subgroups, the Confidence threshold is not used and is set to 0%.