
need to hash the incoming data into different seq files, how?

Posted: Wed Sep 23, 2009 5:54 am
by zulfi123786
Hi,
I have a huge chunk of data which I need to put into different files based on a key. Which stage does this?

The primary aim is to sort the entire data, but UNIX sort fails because the file is huge. Hence the plan: divide the data into different files, apply the transformation to each, then concatenate the files back together.

Posted: Wed Sep 23, 2009 6:12 am
by ArndW
Use the built-in partitioning of PX: create a multinode configuration file, use the default hashing algorithm, and write to a fileset. Or write to a single sequential file and then use the UNIX "split" command to create several files.
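The second suggestion (single file plus UNIX split, then a merge sort) can be sketched roughly as follows. This is a minimal illustration with made-up file names and a tiny sample file standing in for the real data; note that `sort -m` merges files that are already individually sorted, which is what makes the divide-and-conquer approach work:

```shell
# 1. Sample data standing in for the large sequential file (hypothetical name).
seq 10 -1 1 > huge_file.txt

# 2. Split into chunks of at most 4 lines each. Note that split divides
#    by line count (or byte size), NOT by key content.
split -l 4 huge_file.txt chunk_

# 3. Sort each chunk independently (each chunk fits in memory).
for f in chunk_*; do
    sort -n "$f" > "$f.sorted"
done

# 4. Merge the pre-sorted chunks into one fully sorted file.
#    sort -m only merges, so it never needs to hold everything in memory.
sort -n -m chunk_*.sorted > final_sorted.txt
```

This avoids the out-of-memory failure of a single `sort` pass, at the cost of extra intermediate files.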

Posted: Wed Sep 23, 2009 6:19 am
by Sainath.Srinivasan
What is the purpose of the UNIX sort? Are you doing it outside DataStage?

I would avoid 'split', as it divides by size or line count rather than by content.

Create a new configuration file with resources matching your current job's needs and use it in your job.

As ArndW suggested, use PX partitioning. If you know the values you want to split by, you can run the job as multi-instance, with each instance processing one value.
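The content-based split described above (one output file per key value) can be done outside DataStage with `awk`, if a pure-UNIX fallback is acceptable. This is a hedged sketch: the pipe delimiter, the key being the first field, and all file names are assumptions, not details from the thread:

```shell
# Sample pipe-delimited input; first field is the partitioning key (assumed layout).
printf 'A|1\nB|2\nA|3\nC|4\n' > input.txt

# Route each row to a file named after its key value, so every part_<key>.txt
# contains only the rows for that key. Unlike split(1), this partitions by
# CONTENT, so each part can then be sorted and processed independently.
awk -F'|' '{ print > ("part_" $1 ".txt") }' input.txt
```

Each `part_<key>.txt` can then be transformed separately and the results concatenated, which matches the divide-transform-concatenate plan in the original question.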

Posted: Wed Sep 23, 2009 6:21 am
by zulfi123786
We are not supposed to change the config file... the current one is single-node.

Posted: Wed Sep 23, 2009 6:24 am
by Sainath.Srinivasan
zulfi123786 wrote: We are not supposed to change the config file... the current one is single-node.
Why?

What is the size in volume and rows?

What is your plan of approach?

What action causes concern?

Posted: Wed Sep 23, 2009 7:15 am
by ArndW
zulfi, not using different config files in PX is like buying a Ferrari but only driving it in first gear.

Posted: Wed Sep 23, 2009 8:52 am
by zulfi123786
Yup! I do know, but this is the current environment set up in Production, and I don't have the authority to change the config file. I need to find some way using what already exists.
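One commonly cited middle ground, if job-level settings are allowed even though the default config file is off-limits, is to point the standard DataStage environment variable $APT_CONFIG_FILE at a separate multinode configuration file for just this job, leaving the production default untouched. A rough sketch, where the path, hostname, node names, and resource directories are all placeholders that would need to match the actual server:

```shell
# Write a hypothetical 2-node config file to a job-specific location.
# All paths and the fastname below are assumptions, not real values.
cat > /tmp/2node.apt <<'EOF'
{
  node "node1" {
    fastname "myserver"
    pools ""
    resource disk "/tmp/ds_data" {pools ""}
    resource scratchdisk "/tmp/ds_scratch" {pools ""}
  }
  node "node2" {
    fastname "myserver"
    pools ""
    resource disk "/tmp/ds_data" {pools ""}
    resource scratchdisk "/tmp/ds_scratch" {pools ""}
  }
}
EOF

# Reference it for this job only (in practice, $APT_CONFIG_FILE is usually
# added as a job parameter in Designer rather than exported in a shell).
export APT_CONFIG_FILE=/tmp/2node.apt
```

Whether this is permitted is ultimately a site-policy question, but it does not modify the existing single-node configuration file.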