extracting ranges of data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Novak
Participant
Posts: 97
Joined: Mon May 21, 2007 10:08 pm
Location: Australia

extracting ranges of data

Post by Novak »

Hi,

Was hoping for some tips on the best approach to extract the records on a date basis and load them into appropriate Oracle partitions.
Our target tables will be partitioned by month, and in our every datastage batch we will get %98 of records going into the current month's partition and the rest will go to the earlier ones (late arriving data).

Example:

Target table targetSales with partitions 2010apr, 2010may, 2010jun, etc.

With the SYSDATE = 24/JUN/2010, I get following content in the unpartitioned sourceSales table:

ID CREATED_DATE PROCESSED_FLAG PROCESSED_DATE

33 24/JUN/2010 N 24/JUN/2010
34 12/JUN/2010 N 24/JUN/2010
35 15/APR/2010 N 24/JUN/2010
36 12/MAY/2010 N 24/JUN/2010

Since we are planning to use the PARTITION EXCHANGE function of Oracle, we have to extract all of the records belonging to the same period as any of our newly arrived records. With that done we would load them into temporary table that would be used in PARTITION EXCHANGE. That part seems easy.

The challenge is to extract the records and prepare them to be loaded into the appropriate partitions? Like so:

33,34 into partition 2010jun
35 into partition 2010apr
36 into partition 2010may

Appreciate any suggestions?

Regards,

Novak
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If it's a small, finite number of partitions, multiple output links from a Transformer stage with appropriate constraints on each would seem to me to be the way to go.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
udayk_2007
Participant
Posts: 72
Joined: Wed Dec 12, 2007 2:29 am

Post by udayk_2007 »

you can achieve the desired result by using range partitioning on database table.This will ensure that data belonging to a month goes in a single partition.we have implemented in oracle,not sure if range partitioning is available in other databases

create table tablename (
ID INTEGER,
CREATED_DATE DATE
PROCESSED_FLAG VARCHAR
PROCESSED_DATE DATE
)
partition by range (CREATED_DATE)
(
partition APR2010 values less than (to_date('05/01/2010','mm/dd/yyyy')) tablespace ABC,
partition MAY2010 values less than (to_date('06/01/2010','mm/dd/yyyy')) tablespace XYZ,
partition JUN2010 values less than (to_date('07/01/2010','mm/dd/yyyy')) tablespace RST,
);

Regards
Ulhas
Novak
Participant
Posts: 97
Joined: Mon May 21, 2007 10:08 pm
Location: Australia

Post by Novak »

Hi,

The partition creation is fine, thats already been done, and in future will be taken care of by our DBA.

One of the main issues is the fact that partition number is not finite. It depends on the range of data coming in. How to best determine this number and get the data into respective partitions?

Regards,

Novak
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Try creating partitions on the oracle table dynamically as and when before the data is getting loaded. But i think you need to re-create the table before each load if a new partition date comes.

Regards
Sreeni
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

I will expect loading the remaining 2% (100 - 98) to be more challenging than partition load.

This is due to the volume of records to be scanned from old partitions to locate the required subset to extract.

Extracting from partitions can be done as

SELECT *
FROM tableName
PARTITION (partitionName)

You can directly do in Oracle with insert-append.
Novak
Participant
Posts: 97
Joined: Mon May 21, 2007 10:08 pm
Location: Australia

Post by Novak »

Got delayed a bit with some other tasks...

In the meantime I realized the 'Oracle 8 partition' could serve the purpose. And because I am not finding a great deal of information about this option - what are the risks of trying to use it when loading the tables on Oracle 11g?
The production database will be 11g but currently we are developing on 10i

Regards,

Novak
Novak
Participant
Posts: 97
Joined: Mon May 21, 2007 10:08 pm
Location: Australia

Post by Novak »

The issue has been resolved by the means of a workaround.

Instead of using the 'Oracle 8 Partition' feature of the Enterprise stage, I am using a sequence that works as follows:

1. It fetches all of the unprocessed records from the source table and determines the ranges they belong to (monthly in our case)
2. Creates a temporary table.
3. Starts a loop. Counter is a number of the months with the unprocessed records
4. Execute a job that loads the temporary table with one month worth of data and performs the partition exchange in the 'After SQL'.
5. Loop finishes
6. Rebuild the indexes. This is a must because they got invalidated during the 'Partition Exchange' command. There is a very nice package in Oracle 10 called DBMS_INDEX_UTIL that can be used for this purpose.

Regards,

Novak
Post Reply