March 15, 2014  Dipti Joshi
Windowing Functions for Analytics
Statistical windowing functions allow analysts to run statistical calculations indatabase, without needing to use an external statistical processing environment such as R. Since most statistical processing platforms are limited to the memory of the single server on which they’re run, the ability to run statistics indatabase on InfiniDB, utilizing multiple cores and the processing power of multiple nodes, is a gamechanger for Data Scientists and Business Intelligence analysts looking to do deep analytics on Big Data.
The term “windowing” refers to the fact that InfiniDB performs a calculation on a subset, or a “window”, of the data. Although these functions provide aggregate values, unlike traditional aggregate functions using GROUP BY clause in standard SQL syntax, they return multiple rows for each group. Windowing functions simplify complex analytics tasks for computing cumulative, moving, centered or reporting aggregates without running subqueries or writing stored procedures.
Traditional Aggregate Function 
Windowing Aggregate Function 

compute aggregates by creating groups 
compute aggregates via partitions and window frames 
output is one row for each group

output is one row for each input row

only one way of aggregating for each group 
different rows in the same partition can have different window frames 
only one way of grouping for each SELECT 
aggregates in the same SELECT can use different partitions 
Window functions operate on multiple rows, or groups of rows, within the result set returned from a query. Partitioning and windowing clauses define the groups of rows that a window function operates on. The order within these groups is defined by an ordering clause for the purpose of function evaluation. The general form of a windowing function is:
FUNCTION_NAME(expr) OVER [PARTITION BY value, [value,…] ][ORDER BY rule, [rule, …]] [{ROWSRANGE} {startbetween}]
As an example, to find the moving average of employees’ salaries within their department as ordered by their hiring date as well as their ranking by hiring date within their department:
mysql> select *, avg(salary) over(partition by depname order by enroll_date ROWS UNBOUNDED PRECEDING) as moving_avg, rank() over(partition by depname order by enroll_date) as rank from empsalary order by depname, rank;
+++++++
 depname  empno  salary  enroll_date  moving_avg  rank 
+++++++
 develop  8  6000  20061001  6000.0000  1 
 develop  10  5200  20070801  5600.0000  2 
 develop  11  5200  20070815  5466.6667  3 
 develop  7  4200  20080101  5150.0000  4 
 develop  9  4500  20080101  5020.0000  4 
 develop  12  7000  20080809  5350.0000  6 
 personnel  2  3900  20061223  3900.0000  1 
 personnel  5  3500  20071210  3700.0000  2 
 sales  1  5000  20061001  5000.0000  1 
 sales  3  4800  20070801  4900.0000  2 
 sales  4  4800  20070808  4866.6667  3 
+++++++
11 rows in set (0.02 sec)
Understanding Partition and Frame
 For each row, calculation for an aggregation is done over a FRAME of rows
 The PARTITION of a row is the group of rows that have a value for a specific column same as the current row
 FRAME for each row is a subset of a PARTITION for the row
Example: The table below shows Partition and Frame for each row for this SQL statement
SELECT x,y,sum(x) OVER (PARTITION BY y RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM a ORDER BY x, y
ROW NUMBER 
X 
Y 
PARTITION 
FRAME 
FRAME 
FRAME 
FRAME 

1 
1 
1 
Partition for rows 1 to 4 
Frame for row 1 sum(x) =22 

2 
4 
1 
Frame for row 2 sum(x) = 21 

3 
7 
1 
Frame for row 3 sum(x) = 17 

4 
10 
1 
Frame for row 4 sum(x) = 10 

5 
2 
2 
Partition for rows 5 to 7 
Frame for row 5 sum(x) = 15 

6 
5 
2 
Frame for row 6 sum(x) = 13 

7 
8 
2 
Frame for row 7 sum(x) = 8 

8 
3 
3 
Partition for rows 8 to 10 
Frame for row 8 sum(x) =18 

9 
6 
3 
Frame for row 9 sum(x) =15 

10 
9 
3 
Frame for row 10 sum(x) = 9 
Processing Order
A SQL statement using a windowing function in its projection list, processes the join, where, group by and having clause of the query before computing the function as the order shown below.
1. JOIN, WHERE, GROUP BY, HAVING CLAUSE
2. PARTITIONS created and function applied to each row
3. Final ORDER BY of the main query
Supported Windowing functions in InfiniDB
Aggregate
 MAX, MIN, COUNT, SUM, AVG, STD, STDDEV, STDDEV_POP, VARIANCE, VAR_POP, VAR_SAMP
Ranking
 RANK, DENSE_RANK, PERCENT_RANK, CUME_DIST, NTILE, PERCENTILE, PERCENTILE_CONT, PERCENTILE_DISC, MEDIAN, ROW_NUMBER
FIRST/LAST
 NTH_VALUE, FIRST_VALUE, LAST_VALUE
LEAD/LAG
 LAG, LEAD
Do you have a specific use case for windowing functions or another windowing function for InfiniDB that you would like to discuss? Let us know in the comments, below, via email at info@ifninidb.co, or on Twitter @infiniDB or @dipti_smg.