In this article
Various aggregating functions to compute aggregated values from a set of values based on a source value and other arguments.
Aggregating functions compute aggregated values from a set of values. Aggregating functions normally take an argument specifying a source value to aggregate and possibly other arguments specific to a particular function.
Count and Some functions also have special forms where the value to aggregate can be omitted. In that case a source level must be specified instead. See the table below for details.
There are two forms of aggregation in Data Engine:
"Explicit" aggregation where a target aggregation level is specified (see common_args below). In this case a single aggregated value is evaluated for every record of the target level.
"Implicit" aggregation where a target level is not provided. In this case aggregation is performed according to the current context:
If the expression is used in a virtual level with groupings specified, then an aggregated value is computed in each group
If the expression is used in a child level definition, then an aggregated value is computed for each parent level record
Otherwise a single aggregated value is computed (across the whole data set)
All aggregating functions take common optional arguments (marked as common_args in the table below)
Filter expression. It must be an expression of boolean type. If a filter is specified only values satisfying the filter expression are included in aggregation. If no filter is required but the next argument (aggregation target) is to be specified pass True constant as a filter.
Aggregation target defining aggregation "mode". Aggregation target may be any of the following:
Level reference: Target level must be a (direct or indirect) parent of the source value level. (It is also allowed for both levels to be the same in which case no actual aggregation is performed). The resulting value is computed per record of this level. See above for "explicit aggregation". As a special case a level name "/" can be specified which indicates a "top" level (a virtual level containing only a single record with constant values). For Example: AVG(loop:q1, true, response:) evaluates an average q1 value for all loop records corresponding to a single response record.
Hierarchy reference: In this case rollup aggregation is performed for each target level record. The resulting value is computed per record of the level containing the hierarchy. For Example: AVG(response:nps, true, account:^hierarchy) evaluates an average nps value for every account including its child accounts
Any number of grouping expressions: In this case aggregation is performed in groups with the same combinations of all expressions provided. A resulting value conceptually belong to the lowest level of grouping expressions. For Example: AVG(response:salary, true, response:gender, response:region) evaluates an average salary for every combination of gender and region. For all records with the same gender and region values the resulting value is the same.
Tip
to use categorical questions with scores with functions that require a numeric value, you can convert the category question to a numeric first. E.g. to return an average on categorical question “Q1”, you can use the syntax:
average(numeric(:Q1))
Table of Aggregating Functions
Count(a, common_args) Count(level, common_args) | Any | Integer | Count of non-null values of a Count of records on the source level |
CountDistinct(a, common_args) | Any | Integer | Count of all distinct non-null values of a, duplicated values are not counted |
| CountIf(condition, common_args) | Boolean | Integer | Count of data records where the specified condition is true. |
Sum(x, common_args) | Numeric | Integer | Sum of x |
| Average(x, common_args) (alias: Avg) | Numeric | Real | Average value of x |
Min(x, common_args) | Numeric or date/time | Type of x | Minimal value of x |
Max(x, common_args) | Numeric or date/time | Type of x | Maximal value of x |
| Mode(x, common_args) | Numeric | Type of x | Mode of x |
| Median(x, common_args) | Numeric | Integer | Median value of x |
| Percentile(percentile, x, common_args) | Numeric scalar | Type of x | Percentile value of x with percentile level defined by the first argument (should be a constant in range 0-1). Supports only scalar numeric arguments. Median(x) is equivalent to Percentile(0.5, x) |
| Variance(x, common_args) (alias: Var) | Numeric | Real | Sample Variance of x |
| Stddev(x, common_args) (alias: Deviation) | Numeric | Real | Sample Standard deviation of x |
| StdErr(x, common_args) (alias: Standarderror) | Numeric | Real | Sample Standard error of x |
| Variancep(x, common_args) (alias: Varp) | Numeric | Real | Population Variance of x |
| Stddevp(x, common_args) (alias: Deviationp) | Numeric | Real | Population Standard deviation of x |
StdErrp(x, common_args) (alias: Standarderrorp) | Numeric | Real | Population Standard error of x |
Some(b, common_args) Some(level, common_args) | Boolean Level | Boolean | First version is true if any of b in current group is true. Second version evaluates to true if any record on the specified level exists in current group (equivalent to COUNT(level, common_args) > 0) |
| Every(b, common_args) | Boolean | Boolean | True if all of b is true |
| RangeGap(x, n1, n2, common_args) | Category | Real | Argument is expected to be a categorical variable with scores assigned. n1 and n2 must be integer constants specifying number of highest and lowest scores to calculate the number of promoters and detractors respectively. Computes the value: (number_of_promoters - number_of_dertactors)/total_response_count Promoters are defined as responses with the n1 highest score values. Detractors are defined as responses with n2 lowest score values. |
| Range(x,n1,n2,common_args) | Category | Real | Evaluates a proportion of answers in the specified answer range to the total number of answers. See InRange function above for the definition of a "range". |
| First(x, sorting, commom_args) | Any | Type of x | A pseudo-aggregating function evaluating the first value of the specified expression according to the specified sorting criteria. The second argument is expected to be a list of sorting criterion ("sort" node type, example shown here). Data records in each group (defined by list/cut context) are ordered according to the criteria specified and the value form the first record is returned. Note that this function is non-deterministic - if multiple records have the same sorting criteria values then the resulting value is chosen randomly from those records. |
| Last(x, sorting, commom_args) | Any | Type of x | Similar to the First function but with sorting criteria reversed. |
| correlation(x, y, common_args) (alias: corr) | numeric | real | Correlation coefficient of two values x and y |
| AggText(x, delimiter, sorting, common_args) | text | text | Evaluates concatenation of argument values in an aggregation group using the specified delimiter string. Values are concatenated in order specified by "sorting" argument. If sorting is omitted (or null), values are sorted in alphabetical order. The "sorting" argument is expected to be a list of sorting criterion ("sort" node type, example shown here). Due to SQL limitations this function is not supported in cross-tables with totals and with survey-based hierarchies of height greater than one. |
| AggTextRev(x, delimiter, sorting, common_args) | text | text | Similar to AggText but with sorting criteria reversed. |
Shortcut Functions
There are several "shortcut" functions defined for convenience, each built on other functions. These functions are summarized in the table at the end of this section.
The Net Promoter Score (NPS) is calculated based on participant responses to a question that would indicate their willingness to recommend a company's products or services to others. For example, an NPS question might ask "How likely is it that you would recommend our company / product / service to a friend or colleague?". The scoring for this response is most often based on a 0 to 10 scale.
Those who respond with a score of 9 or 10 are labeled Promoters and are considered likely to exhibit value-creating behaviors, such as buying more, remaining customers for longer, and making more positive referrals to other potential customers.
Those who respond with a score of 0 to 6 are labeled Detractors and they are believed to be less likely to exhibit the value-creating behaviors. Responses of 7 and 8 are labeled Passives and their behavior falls in the middle of Promoters and Detractors.
The NPS is usually calculated by subtracting the percentage of customers who are Detractors from the percentage of customers who are Promoters. For purposes of calculating a Net Promoter Score, Passives count towards the total number of participants, but do not directly affect the overall net score.
RangeGap(x, 2,7). So, 2 responses with highest scores will be considered promoters and 7 with lowest scores as detractors. It will work for any scale with 9 categories or more.Function | Description | Example |
|---|---|---|
NPS(x,) WNPS(x) | Calculates the NPS score for a question (promoters – Detractors). Equivalent to the expression: RangeGap(x, 2, 7) | NPS(:nps) will return the NPS score of the question “nps” |
Top(x, n) WTop(x, n) | RangeGap(x, n, 0) | For a scaled question “OSAT” of categories 1-10 Top(:OSAT, 4) Would return the proportion of those who answered the 7th -10th category |
Bottom(x, n) WBottom(x, n) | RangeGap(x, 0, n) | For a scaled question “OSAT” of categories 1-10 Bottom(:OSAT, 4) Would return the proportion of those who answered the 1st – 4th category |
Top1Percent(x) WTop1Percent(x) | The percentage of those who answered the top category score compared to all who answered the question. Equivalent to the expression: Top(x,1)*100 | For a scaled question “OSAT” of categories 1-10 Top1Percent(:OSAT) Would return the percentage of those who answered the 10th category |
Top2Percent(x) WTop2Percent(x) | The percentage of those who answered the top two category scores compared to all who answered the question. Equivalent to the expression: Top(x,2)*100 | For a scaled question “OSAT” of categories 1-10 Top2Percent(:OSAT) Would return the percentage of those who answered the 10th and 9th category |
Top3Percent(x)
WTop3Percent(x) | The percentage of those who answered the top three category scores compared to all who answered the question. Equivalent to the expression: Top(x,3)*100 | For a scaled question “OSAT” of categories 1-10 Top3Percent(:OSAT) Would return the percentage of those who answered the 10th , 9th and 8th category |
Bottom1Percent(x) WBottom1Percent(x) | The percentage of those who answered the bottom category score compared to all who answered the question. Equivalent to the expression: Bottom(x,1)*100 | For a scaled question “OSAT” of categories 1-10 Bottom1Percent(:OSAT) Would return the percentage of those who answered the 1st category
|
Bottom2Percent(x) WBottom2Percent(x) | The percentage of those who answered the bottom two category scores compared to all who answered the question. Equivalent to the expression: Bottom(x,2)*100 | For a scaled question “OSAT” of categories 1-10 Bottom1Percent(:OSAT) Would return the percentage of those who answered the 1st and 2nd category
|
Bottom3Percent(x) WBottom3Percent(x) | The percentage of those who answered the bottom three category scores compared to all who answered the question. Equivalent to the expression: Bottom(x,3)*100 | For a scaled question “OSAT” of categories 1-10 Bottom1Percent(:OSAT) Would return the percentage of those who answered the 1st , 2nd and 3rd category
|
PercentageOfAnswers(x, value1, value2...) WPercentageOfAnswers(x, value1, value2...) | The percentage of those who answered one or more specified category codes compared to all who answered the question. Equivalent to the expression: CountIf(In(x, value1, value2,...))/Count(x)*100 - percentage of the specified answers | For a question “colour” with the repsonses 1. Red 2. Green 3. Blue PercentageOfAnswers(:colour, “1”,”3”) will return the percentage of those who answered red, or blue. |
There are also "weighted" versions of some of the above functions. These versions compute weighted aggregates using weight expression(s) configured at query level. Each weighted function uses a weight specified for the argument's level in the query if any. If no weight expression is specified the result is the same as produced by the respective unweighted function. The list of weighted functions is below:
WCount
WSum
WAvg (or WAverage)
WStdev (or WDeviation)
WVar (or WVariance)
WStderr (or WStandardError)
WMode
WMedian
WRangeGap
WNps
WTop and WTop<n>Percent shortcuts
WBottom and WBottom<n>Percent shortcuts
WPercentageOfAnswers shortcut
Common Arguments for Aggregation Functions
The above aggregation functions all support a set of additional optional arguments that can be applied.
Adding Filter Criteria
If a filter is specified only values satisfying the filter expression are included in aggregation.
Example | Description |
count (:q1, :product=”1”) | Returns the count of q1 where the answers to the question “product were given as category 1 |
Note
for any subsequent arguments, if no additional filter is required, then set the filter argument to true (see examples below)
Setting the Aggregation Level
Target level must be a (direct or indirect) parent of the source value level.
The resulting value is computed per record of this level. See above for "explicit aggregation". As a special case a level name "/" can be specified which indicates a "top" level (a virtual level containing only a single record with constant values).
Example | Description |
AVG(loop:q1, true, response:) | evaluates an average q1 value for all loop records corresponding to a single response record. |
Referencing a Hierarchy Level
In this case rollup aggregation is performed for each target level record. The resulting value is computed per record of the level containing the hierarchy.
Example | Description |
AVG(response:nps, true, account:^hierarchy) | Evaluates an average nps value for every account including its child accounts |