what i learnt - Data and Analytics: September 2010

Friday, September 24, 2010

Analytic function in postgresql - continue.....

list of analytic function available in postgresql is

• row_number()
• rank()
• dense_rank() 
• lag(value any [, offset integer [, DEFAULT any ]])
• lead(value any [, offset integer [, DEFAULT any ]])
• first_value(value any)
• last_value(value any)
• nth_value(value any, nth integer)
• percent_rank()
• cume_dist()
• ntile(num_buckets integer)

you could find analytic function marked over green posted here,

here let look on remaining functions.

FIRST_VALUE and LAST_VALUE function

Syntax

FIRST_VALUE() OVER () or LAST_VALUE() OVER()

  The FIRST_VALUE analytic function picks the first record
from the partition after doing the ORDER BY. The  is computed
on the columns of this first record and results are returned.

  The LAST_VALUE function is used in similar context except
that it acts on the last record of the partition. But it not
picking up the right record instead its pick up same data
what salary column picks.

example

don=#  select person, dept, salary,first_value(salary)
over(partition by dept order by salary desc)-salary “ salary diff ”
from test;

 person   |    dept    | salary | diff by 1st record
-----------+------------+--------+-----------------------
phoebe    | accounting |   3700 |        0
lincoln   | accounting |   2950 |      750
rachel    | accounting |   2950 |      750
benjamin  | accounting |   2100 |     1600
michael   | accounting |   1700 |     2000
paul      | accounting |   1650 |     2050
alexander | it         |   3700 |        0
lj        | it         |   3250 |      450
sara      | it         |   2600 |     1100
ross      | it         |   1250 |     2450
theodore  | management |   3250 |        0
joey      | management |   3250 |        0
monica    | management |   2950 |      300
chandler  | management |   2400 |      850
fernando  | marketing  |   1650 |        0
bradley   | marketing  |   1000 |      650
(16 rows)

in above example first_vlaue() function return first record from
the partition value after order by clause executed.
i.e in accounting partition in would return 3700 ( first after order by desc )
always, from there we could calculate difference in salary from top salary
of the partition.

Thursday, September 23, 2010

cluster already running - error while startup postgres postmaster

cluster already running / removed stale pid file, error you may get while start up the postmaster.

postmaster daemons create pid files when start up, remove them when the process exits. some time pid file exist even after postmaster daemons exit because the process crashed before it could remove the pid file or the system crashes and processes do not get a chance to remove their pidfiles even if they want to.

--> in this case just find out & remove postmaster pid file ( you could find out pid file in data directory of postgresql installed path)

or
--> find out the the running process id (PID) using port no & kill PID

lsof | grep 5432 --(lsof | grep PortNo) and kill the pid
kill -9 PID

or
--> find out the the running process id (PID) using process name & kill PID

ps -ef | grep postmaster --(lsof | grep PortNo) and kill the pid
kill -9 PID

Once stop the running postmaster using pid, start the postgres database server.

Function for spell out a number

Oracle having in built functionality to convert number into word format by using Juline(J) and spell out (SP) Pattern Modifiers for Date/Time Formatting.
Convert Number into Word in postgresql
for example

SELECT TO_CHAR(TO_DATE(12345,'J'),'JSP') FROM dual;

output like

TWELVE THOUSAND THREE HUNDRED FORTY-FIVE

in postgresql spell out (SP) pattern modifier not implemented (upto version postgresql 9.0)

here i have posted function to convert given number into word, lets look

create or replace function fn_spellout(p_num varchar) returns varchar as
$$
declare
v_word varchar(50);
v_spell varchar(1000) := '';
v_length integer;
v_pos integer := 1;
begin
select length(p_num) into v_length;
for v_pos in 1 .. v_length loop
--while v_pos <= v_length loop
SELECT case substring(p_num,v_pos,1)
when '1' then 'ONE '
when '2' then 'TWO '
when '3' then 'THREE '
when '4' then 'FOUR '
when '5' then 'FIVE '
when '6' then 'SIX '
when '7' then 'SEVEN '
when '8' then 'EIGHT '
when '9' then 'NINE '
when '0' then 'ZERO '
ELSE 'NULL'
end into v_word;
v_spell := v_spell || v_word;
--v_pos := v_pos+1;
end loop;
return v_spell;
end;
$$ language plpgsql

while execute this function it return output like

select fn_spellout('12345')

"ONE TWO THREE FOUR FIVE "

Friday, September 17, 2010

Analytic function in postgresql

Few function in oracle personally i like much more because of simplicity to achieve what we want, one of those function is analytic/windows functions. In postgresql it works 8.4 and later version only.

Key points:

       Analytic functions give aggregate result they do not group the result set.
They return the group value multiple times with each record. As such any
other non-"group by" column or expression can be present in the select clause.
       Analytic functions are computed after all joins, WHERE clause,
GROUP BY and HAVING are computed on the query. The main ORDER BY clause of the query
operates after the analytic functions, So analytic functions can only appear in the
select list and in the main ORDER BY clause of the query.

The general syntax of analytic function is:

function_name ([expression [, expression ... ]]) OVER ( window_definition )

where window_definition has the syntax

[ existing_window_name ]

[ PARTITION BY expression [, ...] ]

[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]

Let see simple example for each function

1 avg(salary) over (partition by dept)

avg() return (average salary of the partition and) group value multiple time for each record with average salary for partition.

SELECT dept,salary, avg(salary) over (partition by dept) FROM test ORDER BY dept,salary DESC;

dept | salary | avg

------------+--------+-----------------------

accounting | 3700 | 2508.3333333333333333

accounting | 2950 | 2508.3333333333333333

accounting | 2100 | 2508.3333333333333333

accounting | 1700 | 2508.3333333333333333

accounting | 1650 | 2508.3333333333333333

it | 3700 | 2700.0000000000000000

it | 3250 | 2700.0000000000000000

it | 2600 | 2700.0000000000000000

it | 1250 | 2700.0000000000000000

management | 3250 | 2962.5000000000000000

management | 2950 | 2962.5000000000000000

management | 2400 | 2962.5000000000000000

marketing | 1650 | 1325.0000000000000000

marketing | 1000 | 1325.0000000000000000

(16 rows)

2) rank() over(partition by dept order by salary desc)

rank() is kind-of like rownum. It returns which position given row gets when ordering by (whatever we put in “order by” clause inside “over ()”. ) . rank() is not really rownum – it’s not unique.
In case of a tie of 2 records at position N, RANK declares 2 positions N and skips position N+1 and gives position N+2 to the next record. While DENSE_RANK declares 2 positions N but does not skip position N+1.

don=# SELECT person,dept,salary, rank() over(partition by dept order by salary desc), dense_rank() over(partition by dept order by salary desc) FROM test ORDER BY dept,salary DESC;

person | dept | salary | rank | dense_rank

-----------+------------+--------+------+------------

phoebe | accounting | 3700 | 1 | 1

lincoln | accounting | 2950 | 2 | 2

rachel | accounting | 2950 | 2 | 2

benjamin | accounting | 2100 | 4 | 3

michael | accounting | 1700 | 5 | 4

paul | accounting | 1650 | 6 | 5

alexander | it | 3700 | 1 | 1

lj | it | 3250 | 2 | 2

sara | it | 2600 | 3 | 3

ross | it | 1250 | 4 | 4

theodore | management | 3250 | 1 | 1

joey | management | 3250 | 1 | 1

monica | management | 2950 | 3 | 2

chandler | management | 2400 | 4 | 3

fernando | marketing | 1650 | 1 | 1

bradley | marketing | 1000 | 2 | 2

(16 rows)

3) row_number() over(partition by deptno order by hiredate desc)

ROW_NUMBER( ) gives a running serial number to a partition of records. It is very useful in reporting, especially in places where different partitions have their own serial numbers.

don=# select dept,salary,row_number() over ( partition by dept order by salary desc) from test;

dept | salary | row_number

------------+--------+------------

accounting | 3700 | 1

accounting | 2950 | 2

accounting | 2950 | 3

accounting | 2100 | 4

accounting | 1700 | 5

accounting | 1650 | 6

it | 3700 | 1

it | 3250 | 2

it | 2600 | 3

it | 1250 | 4

management | 3250 | 1

management | 3250 | 2

management | 2950 | 3

management | 2400 | 4

marketing | 1650 | 1

marketing | 1000 | 2

(16 rows)4)

4) count(salary) over( partition by dept)

same like avg(), but it return count value for each partition

don=# select dept,salary,count(*) over ( partition by dept order by salary desc) from test;

dept | salary | count

------------+--------+-------

accounting | 3700 | 6

accounting | 2950 | 6

accounting | 2100 | 6

accounting | 1700 | 6

accounting | 1650 | 6

it | 3700 | 4

it | 3250 | 4

it | 2600 | 4

it | 1250 | 4

management | 3250 | 4

management | 2950 | 4

management | 2400 | 4

marketing | 1650 | 2

marketing | 1000 | 2

(16 rows)

5) LEAD and LAG

LEAD has the ability to compute an expression on the next rows (rows which are going to come after the current row) and return the value to the current row. The general syntax of LEAD is shown below:

LEAD (, , ) OVER ()

is the expression to compute from the leading row.
is the index of the leading row relative to the current row.
is a positive integer with default 1.
is the value to return if the points to a row outside the partition range.

SELECT deptno, empno, sal,

LEAD(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,

LAG(sal, 1, 0) OVER (PARTITION BY dept ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL

FROM emp WHERE deptno IN (10, 20) ORDER BY deptno, sal DESC;



 DEPTNO  EMPNO   SAL NEXT_LOWER_SAL PREV_HIGHER_SAL

------- ------ ----- -------------- ---------------

     10   7839  5000           2450               0

     10   7782  2450           1300            5000

     10   7934  1300              0            2450

     20   7788  3000           3000               0

     20   7902  3000           2975            3000

     20   7566  2975           1100            3000

     20   7876  1100            800            2975

     20   7369   800              0            1100



8 rows selected.

full query

SELECT person, dept, salary, avg(salary) over (partition by dept), rank() over(partition by dept order by salary desc),row_number() over(partition by dept order by salary desc) , count(salary) over( partition by dept ) FROM test ORDER BY dept,salary DESC ;

Thursday, September 16, 2010

Execute procedure in oracle10G EX

writing procedure in oracle 10g EX is same as any other version of oracle database server but while executing procedure we have to call procedure in begin end block rather then EXECUTE or EXEC command

let see simple example.............

creating procedure

CREATE OR REPLACE PROCEDURE sp_addVal (var1IN NUMBER, var2 IN NUMBER, var3 OUT NUMBER)
IS
BEGIN
var3:=var1+var2;
END;

executing procedure

DECLARE
v_sum NUMBER;
BEGIN
sp_addVal(50,50,v_sum);
DBMS_OUTPUT.PUT_LINE('output of test is '||v_sum);
END;

[
other then oracle10g EX simply call the procedure like
VARIABLE v_sum NUMBER;
EXEC sp_addVal(50,50,v_sum)
]
output

output of test is 100

Sunday, September 5, 2010

explicit type converstion in postgresql

To convert value from one type to another use explicit type conversion function

1) to_timestamp -- used to convert bigint to timestamp with or with out timezone.
2) epoch -- used to convert timestamp to bigint value.

here see the example

1) to_timestamp

SELECT to_timestamp(1283437580); -- it returns the output like

to_timestamp
timestamp with time zone
----------------------
"2010-09-02 19:56:20+05:30"

want to display the timestamp with out timezone

SELECT to_timestamp(1283437580)::timestamp;

to_timestamp
timestamp with time zone
----------------------
"2010-09-02 19:56:20"

2) epoch

SELECT EXTRACT(epoch FROM current_timestamp(0)) -- it return output like

date_part
double precision
-------------------
1283753630

-- more explicit type conversion will update here...

what i learnt - Data and Analytics

Pages