Analytic function in postgresql

Few function in oracle personally i like much more because of simplicity to achieve what we want, one of those function is analytic/windows functions. In postgresql it works 8.4 and later version only.

Key points:

       Analytic functions give aggregate result they do not group the result set.
They return the group value multiple times with each record. As such any
other non-"group by" column or expression can be present in the select clause.
       Analytic functions are computed after all joins, WHERE clause,
GROUP BY and HAVING are computed on the query. The main ORDER BY clause of the query
operates after the analytic functions, So analytic functions can only appear in the
select list and in the main ORDER BY clause of the query.

The general syntax of analytic function is:

function_name ([expression [, expression ... ]]) OVER ( window_definition )

where window_definition has the syntax

[ existing_window_name ]

[ PARTITION BY expression [, ...] ]

[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]

Let see simple example for each function

1 avg(salary) over (partition by dept)

avg() return (average salary of the partition and) group value multiple time for each record with average salary for partition.

SELECT dept,salary, avg(salary) over (partition by dept) FROM test ORDER BY dept,salary DESC;

dept | salary | avg

------------+--------+-----------------------

accounting | 3700 | 2508.3333333333333333

accounting | 2950 | 2508.3333333333333333

accounting | 2100 | 2508.3333333333333333

accounting | 1700 | 2508.3333333333333333

accounting | 1650 | 2508.3333333333333333

it | 3700 | 2700.0000000000000000

it | 3250 | 2700.0000000000000000

it | 2600 | 2700.0000000000000000

it | 1250 | 2700.0000000000000000

management | 3250 | 2962.5000000000000000

management | 2950 | 2962.5000000000000000

management | 2400 | 2962.5000000000000000

marketing | 1650 | 1325.0000000000000000

marketing | 1000 | 1325.0000000000000000

(16 rows)

2) rank() over(partition by dept order by salary desc)

rank() is kind-of like rownum. It returns which position given row gets when ordering by (whatever we put in “order by” clause inside “over ()”. ) . rank() is not really rownum – it’s not unique.
In case of a tie of 2 records at position N, RANK declares 2 positions N and skips position N+1 and gives position N+2 to the next record. While DENSE_RANK declares 2 positions N but does not skip position N+1.

don=# SELECT person,dept,salary, rank() over(partition by dept order by salary desc), dense_rank() over(partition by dept order by salary desc) FROM test ORDER BY dept,salary DESC;

person | dept | salary | rank | dense_rank

-----------+------------+--------+------+------------

phoebe | accounting | 3700 | 1 | 1

lincoln | accounting | 2950 | 2 | 2

rachel | accounting | 2950 | 2 | 2

benjamin | accounting | 2100 | 4 | 3

michael | accounting | 1700 | 5 | 4

paul | accounting | 1650 | 6 | 5

alexander | it | 3700 | 1 | 1

lj | it | 3250 | 2 | 2

sara | it | 2600 | 3 | 3

ross | it | 1250 | 4 | 4

theodore | management | 3250 | 1 | 1

joey | management | 3250 | 1 | 1

monica | management | 2950 | 3 | 2

chandler | management | 2400 | 4 | 3

fernando | marketing | 1650 | 1 | 1

bradley | marketing | 1000 | 2 | 2

(16 rows)

3) row_number() over(partition by deptno order by hiredate desc)

ROW_NUMBER( ) gives a running serial number to a partition of records. It is very useful in reporting, especially in places where different partitions have their own serial numbers.

don=# select dept,salary,row_number() over ( partition by dept order by salary desc) from test;

dept | salary | row_number

------------+--------+------------

accounting | 3700 | 1

accounting | 2950 | 2

accounting | 2950 | 3

accounting | 2100 | 4

accounting | 1700 | 5

accounting | 1650 | 6

it | 3700 | 1

it | 3250 | 2

it | 2600 | 3

it | 1250 | 4

management | 3250 | 1

management | 3250 | 2

management | 2950 | 3

management | 2400 | 4

marketing | 1650 | 1

marketing | 1000 | 2

(16 rows)4)

4) count(salary) over( partition by dept)

same like avg(), but it return count value for each partition

don=# select dept,salary,count(*) over ( partition by dept order by salary desc) from test;

dept | salary | count

------------+--------+-------

accounting | 3700 | 6

accounting | 2950 | 6

accounting | 2100 | 6

accounting | 1700 | 6

accounting | 1650 | 6

it | 3700 | 4

it | 3250 | 4

it | 2600 | 4

it | 1250 | 4

management | 3250 | 4

management | 2950 | 4

management | 2400 | 4

marketing | 1650 | 2

marketing | 1000 | 2

(16 rows)

5) LEAD and LAG

LEAD has the ability to compute an expression on the next rows (rows which are going to come after the current row) and return the value to the current row. The general syntax of LEAD is shown below:

LEAD (, , ) OVER ()

is the expression to compute from the leading row.
is the index of the leading row relative to the current row.
is a positive integer with default 1.
is the value to return if the points to a row outside the partition range.

SELECT deptno, empno, sal,

LEAD(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,

LAG(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL

FROM emp WHERE deptno IN (10, 20) ORDER BY deptno, sal DESC;



 DEPTNO  EMPNO   SAL NEXT_LOWER_SAL PREV_HIGHER_SAL

------- ------ ----- -------------- ---------------

     10   7839  5000           2450               0

     10   7782  2450           1300            5000

     10   7934  1300              0            2450

     20   7788  3000           3000               0

     20   7902  3000           2975            3000

     20   7566  2975           1100            3000

     20   7876  1100            800            2975

     20   7369   800              0            1100



8 rows selected.

full query

SELECT person, dept, salary, avg(salary) over (partition by dept), rank() over(partition by dept order by salary desc),row_number() over(partition by dept order by salary desc) , count(salary) over( partition by dept ) FROM test ORDER BY dept,salary DESC ;

what i learnt - Data and Analytics

Pages

Friday, September 17, 2010

Analytic function in postgresql

No comments:

Labels