Loading Huge Data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ds_teg
Premium Member
Posts: 51
Joined: Tue Aug 11, 2009 6:53 am
Location: Chicago

Post by ds_teg »

I have 10 files in the same format, each holding 25 GB of data, so I need to load 250 GB into a Teradata table. The table is not a MULTISET table and has one unique primary index.

I need to do some quality checks, such as whether the date of birth is a valid date. These checks can be performed either in DataStage or in basic Teradata SQL.

Which of the options below is better, and why?

a) Read the files with a file pattern in a Sequential File stage and load them into Teradata using the MultiLoad method.


b) Directly write a MultiLoad script and load into the Teradata table.

Thanks
ds_teg
Premium Member
Posts: 51
Joined: Tue Aug 11, 2009 6:53 am
Location: Chicago

Post by ds_teg »

Any ideas on this post? :roll: :roll:
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Try reading it through DataStage, as it gives you more control over metadata, easier access to transform functions, and a reject link for bad-quality data. You may need to set the multiple readers option in the Sequential File stage to speed up the sequential file read.
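
To make the reject-link idea concrete outside of DataStage, here is a minimal Python sketch of the same pattern: rows with a valid date of birth go one way, everything else goes to a reject list. The field name `date_of_birth`, the `YYYY-MM-DD` format, and the "not in the future" rule are assumptions for illustration only; they are not taken from the original files.

```python
from datetime import date, datetime


def split_valid_rejects(rows, dob_field="date_of_birth", fmt="%Y-%m-%d"):
    """Split rows into (valid, rejects) by checking the date-of-birth field.

    A row is rejected when the field is missing, does not parse as a real
    calendar date in the assumed format, or lies in the future.
    """
    valid, rejects = [], []
    today = date.today()
    for row in rows:
        raw = (row.get(dob_field) or "").strip()
        try:
            dob = datetime.strptime(raw, fmt).date()
        except ValueError:
            # Not a real calendar date (e.g. 1981-02-29) or wrong format.
            rejects.append(row)
            continue
        if dob > today:
            rejects.append(row)
        else:
            valid.append(row)
    return valid, rejects
```

In a real job the valid rows would continue on to the Teradata load and the rejects would be written somewhere for review, which is roughly what the reject link gives you without extra code.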
ds_teg
Premium Member
Posts: 51
Joined: Tue Aug 11, 2009 6:53 am
Location: Chicago

Post by ds_teg »

Thanks Vincent for your suggestion. I am planning to use a file pattern to read the files in parallel. I believe the multiple readers per node option is not available when the file pattern option is used.

Also, I would like to know how restartability works with the two options I described in the post.

Thanks in advance .
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There is no "restartability" out of the box; that's something you need to design into your jobs based on your job's architecture and requirements.
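
As one possible way to design that in, here is a rough Python sketch of file-level restart: a small checkpoint file records which input files have finished loading, so a rerun skips them. The `load_one` callable and the checkpoint path are hypothetical placeholders for whatever actually runs the load; this only illustrates the bookkeeping and does not clean up a partially loaded file in the target table.

```python
import json
from pathlib import Path


def load_with_checkpoint(files, load_one, checkpoint_path):
    """Load each file at most once across runs.

    `load_one(name)` is a hypothetical callable that loads one file and
    raises an exception on failure. Completed file names are persisted to
    `checkpoint_path` after every successful load.
    """
    checkpoint = Path(checkpoint_path)
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for name in files:
        if name in done:
            continue  # already loaded in an earlier run
        load_one(name)  # an exception here leaves the checkpoint untouched
        done.add(name)
        checkpoint.write_text(json.dumps(sorted(done)))
    return done
```

If the run dies on the seventh of ten files, the next run starts again at the seventh. What to do about rows from the failed file that already reached the table is a separate design decision.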
-craig

"You can never have too many knives" -- Logan Nine Fingers
ds_teg
Premium Member
Posts: 51
Joined: Tue Aug 11, 2009 6:53 am
Location: Chicago

Post by ds_teg »

OK, thanks Craig for the response.