Need to hash the incoming data into different seq files, how?

zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Hi,
I have a huge volume of data that I need to write to different files based on a key. Which stage does this?

The primary aim is to sort the entire data set, but UNIX sort fails because the file is huge. Hence the plan is to divide the data into different files, apply the transformation to each, and then concatenate the files.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Use the built-in partitioning of PX: create a multinode configuration file, use the default hashing algorithm, and write to a fileset. Alternatively, write to a single sequential file and then use the UNIX "split" command to create several files.
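
To illustrate what hash partitioning on a key does, here is a minimal sketch outside DataStage, in plain Python. It is not how PX implements partitioning; the function names (bucket_for, hash_split), the pipe delimiter, the key column position and the part_NNN.txt file names are all assumptions made for the example. CRC32 is used only because it gives the same bucket for the same key on every run.

```python
import csv
import zlib
from pathlib import Path


def bucket_for(key: str, n_buckets: int) -> int:
    """Map a key to a bucket number using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % n_buckets


def hash_split(src: Path, out_dir: Path, key_col: int, n_buckets: int,
               delimiter: str = "|") -> list:
    """Write each row of src to one of n_buckets files, chosen by key hash.

    Assumptions for this sketch: src is a delimited text file and
    key_col is the zero-based position of the key column.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{i:03d}.txt" for i in range(n_buckets)]
    handles = [p.open("w", newline="") for p in paths]
    try:
        writers = [csv.writer(h, delimiter=delimiter, lineterminator="\n")
                   for h in handles]
        with src.open(newline="") as f:
            for row in csv.reader(f, delimiter=delimiter):
                if not row:  # skip blank lines
                    continue
                writers[bucket_for(row[key_col], n_buckets)].writerow(row)
    finally:
        for h in handles:
            h.close()
    return paths
```

All rows with the same key land in the same file, so each file can be processed on its own. Note that the files are not ordered relative to each other: sorting each one and simply concatenating them does not give a globally sorted result, so a final merge step would still be needed for that.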
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

What is the purpose of the Unix sort? Are you doing it outside DataStage?

I would avoid 'split', as it splits by size or line count rather than by content.

Create a new configuration file that uses resources according to your current job's needs, and use it in your job.

As ArndW suggested, use PX partitioning. If you know the values you want to split by, you can run the job as multi-instance, with each instance handling one value.
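
To make the difference between splitting by content and splitting by line count concrete, here is a rough sketch in plain Python, outside DataStage. The function names (split_by_value, concat_in_key_order), the pipe delimiter and the key_<value>.txt file names are assumptions for illustration only. It also assumes the key values are safe to use in file names and that there are few enough distinct values to keep one file open per value.

```python
import csv
from pathlib import Path


def split_by_value(src: Path, out_dir: Path, key_col: int,
                   delimiter: str = "|") -> dict:
    """Write rows to one file per distinct key value (content-based split).

    Assumptions: key values are valid in file names, and the number of
    distinct values stays below the open-file limit of the system.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    handles = {}  # key value -> (file handle, csv writer)
    try:
        with src.open(newline="") as f:
            for row in csv.reader(f, delimiter=delimiter):
                if not row:  # skip blank lines
                    continue
                key = row[key_col]
                if key not in handles:
                    h = (out_dir / f"key_{key}.txt").open("w", newline="")
                    w = csv.writer(h, delimiter=delimiter, lineterminator="\n")
                    handles[key] = (h, w)
                handles[key][1].writerow(row)
    finally:
        for h, _ in handles.values():
            h.close()
    return {key: out_dir / f"key_{key}.txt" for key in handles}


def concat_in_key_order(parts: dict, dest: Path) -> None:
    """Concatenate the per-value files in sorted (string) key order."""
    with dest.open("w", newline="") as out:
        for key in sorted(parts):
            out.write(parts[key].read_text())
```

Because each file holds exactly one key value, concatenating the files in key order yields output that is ordered by key, which is something a split by line count cannot guarantee. Each per-value file could also be handed to a separate job instance.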
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

We are not supposed to change the config file. The current one is a single-node configuration.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

zulfi123786 wrote: we are not supposed to change the config file ... the current one is a single node one.....

Why?

What is the data size, in volume and rows?

What is your plan of approach?

What action causes concern?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

zulfi, not using different config files in PX is like buying a Ferrari but only using 1st gear.
zulfi123786
Premium Member
Posts: 730
Joined: Tue Nov 04, 2008 10:14 am
Location: Bangalore

Post by zulfi123786 »

Yes, I do know, but this is the current environment as set up in Production, and I don't have the authority to change the config file. I need to find some way using what already exists.