Hi,
I have a huge chunk of data that I need to write into different files based on a key. Which stage does this?
The primary aim is to sort the entire data, but Unix sort fails because the file is huge. Hence the plan: divide the data into different files, apply the transformation, then concatenate the files.
I need to hash the incoming data into different sequential files. How?
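Outside DataStage, the hash-into-files step described above could be sketched as follows. This is a minimal sketch, not a definitive implementation; the bucket count, the pipe delimiter, and the key being the first column are all assumptions you would adjust to your data.

```python
import zlib

NUM_BUCKETS = 4   # assumed bucket count; match it to your file/node layout
KEY_COLUMN = 0    # assumed: the key is the first pipe-delimited column

def bucket_of(key: str) -> int:
    # crc32 is a stable hash, so equal keys always land in the same file
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

def split_by_key(in_path: str) -> None:
    """Stream the big file once, appending each record to its bucket file."""
    outs = [open(f"bucket_{i}.seq", "w") for i in range(NUM_BUCKETS)]
    try:
        with open(in_path) as src:
            for line in src:
                key = line.rstrip("\n").split("|")[KEY_COLUMN]
                outs[bucket_of(key)].write(line)
    finally:
        for f in outs:
            f.close()
```

Because the hash is stable, every record with the same key ends up in the same sequential file, so each file can then be transformed independently.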
What is the purpose of the Unix sort? Are you doing it outside DataStage?
I would avoid 'split', as it divides by size and line count rather than by content.
Create a new configuration file with resources matching your current job's needs and use it in your job.
As ArndW suggested, use PX partitioning. If you know the values you want to split by, you can run the job as multi-instance, with each instance handling one value.
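If the goal really is one globally sorted output, note that concatenating independently sorted hash buckets does not give a global order; the sorted chunks must be *merged*. A hedged sketch of that chunk-sort-then-merge idea (file names and delimiter are assumptions, not from the thread):

```python
import heapq

def sort_chunk(in_path: str, out_path: str) -> None:
    """Sort one manageable chunk entirely in memory."""
    with open(in_path) as f:
        lines = f.readlines()
    lines.sort()
    with open(out_path, "w") as f:
        f.writelines(lines)

def merge_sorted(chunk_paths, out_path: str) -> None:
    """K-way merge of already-sorted chunk files (like Unix `sort -m`)."""
    files = [open(p) for p in chunk_paths]
    try:
        with open(out_path, "w") as out:
            # heapq.merge streams the inputs, so memory stays small
            for line in heapq.merge(*files):
                out.write(line)
    finally:
        for f in files:
            f.close()
```

The same effect is available from GNU sort itself: sort each chunk separately, then combine them with `sort -m`, which merges pre-sorted files without re-sorting.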