Page 1 of 1

extracting ranges of data

Posted: Thu Jun 24, 2010 8:14 pm
by Novak
Hi,

Was hoping for some tips on the best approach to extract the records on a date basis and load them into appropriate Oracle partitions.
Our target tables will be partitioned by month, and in our every datastage batch we will get %98 of records going into the current month's partition and the rest will go to the earlier ones (late arriving data).

Example:

Target table targetSales with partitions 2010apr, 2010may, 2010jun, etc.

With the SYSDATE = 24/JUN/2010, I get following content in the unpartitioned sourceSales table:

ID CREATED_DATE PROCESSED_FLAG PROCESSED_DATE

33 24/JUN/2010 N 24/JUN/2010
34 12/JUN/2010 N 24/JUN/2010
35 15/APR/2010 N 24/JUN/2010
36 12/MAY/2010 N 24/JUN/2010

Since we are planning to use the PARTITION EXCHANGE function of Oracle, we have to extract all of the records belonging to the same period as any of our newly arrived records. With that done we would load them into temporary table that would be used in PARTITION EXCHANGE. That part seems easy.

The challenge is to extract the records and prepare them to be loaded into the appropriate partitions? Like so:

33,34 into partition 2010jun
35 into partition 2010apr
36 into partition 2010may

Appreciate any suggestions?

Regards,

Novak

Posted: Thu Jun 24, 2010 9:29 pm
by ray.wurlod
If it's a small, finite number of partitions, multiple output links from a Transformer stage with appropriate constraints on each would seem to me to be the way to go.

Posted: Thu Jun 24, 2010 11:16 pm
by udayk_2007
you can achieve the desired result by using range partitioning on database table.This will ensure that data belonging to a month goes in a single partition.we have implemented in oracle,not sure if range partitioning is available in other databases

create table tablename (
ID INTEGER,
CREATED_DATE DATE
PROCESSED_FLAG VARCHAR
PROCESSED_DATE DATE
)
partition by range (CREATED_DATE)
(
partition APR2010 values less than (to_date('05/01/2010','mm/dd/yyyy')) tablespace ABC,
partition MAY2010 values less than (to_date('06/01/2010','mm/dd/yyyy')) tablespace XYZ,
partition JUN2010 values less than (to_date('07/01/2010','mm/dd/yyyy')) tablespace RST,
);

Regards
Ulhas

Posted: Fri Jun 25, 2010 1:49 am
by Novak
Hi,

The partition creation is fine, thats already been done, and in future will be taken care of by our DBA.

One of the main issues is the fact that partition number is not finite. It depends on the range of data coming in. How to best determine this number and get the data into respective partitions?

Regards,

Novak

Posted: Fri Jun 25, 2010 1:53 am
by Sreenivasulu
Try creating partitions on the oracle table dynamically as and when before the data is getting loaded. But i think you need to re-create the table before each load if a new partition date comes.

Regards
Sreeni

Posted: Fri Jun 25, 2010 2:15 am
by Sainath.Srinivasan
I will expect loading the remaining 2% (100 - 98) to be more challenging than partition load.

This is due to the volume of records to be scanned from old partitions to locate the required subset to extract.

Extracting from partitions can be done as

SELECT *
FROM tableName
PARTITION (partitionName)

You can directly do in Oracle with insert-append.

Posted: Tue Jun 29, 2010 12:52 am
by Novak
Got delayed a bit with some other tasks...

In the meantime I realized the 'Oracle 8 partition' could serve the purpose. And because I am not finding a great deal of information about this option - what are the risks of trying to use it when loading the tables on Oracle 11g?
The production database will be 11g but currently we are developing on 10i

Regards,

Novak

Posted: Wed Aug 04, 2010 12:01 am
by Novak
The issue has been resolved by the means of a workaround.

Instead of using the 'Oracle 8 Partition' feature of the Enterprise stage, I am using a sequence that works as follows:

1. It fetches all of the unprocessed records from the source table and determines the ranges they belong to (monthly in our case)
2. Creates a temporary table.
3. Starts a loop. Counter is a number of the months with the unprocessed records
4. Execute a job that loads the temporary table with one month worth of data and performs the partition exchange in the 'After SQL'.
5. Loop finishes
6. Rebuild the indexes. This is a must because they got invalidated during the 'Partition Exchange' command. There is a very nice package in Oracle 10 called DBMS_INDEX_UTIL that can be used for this purpose.

Regards,

Novak