
Loading Huge Data

Posted: Mon Sep 20, 2010 4:41 pm
by ds_teg
I have 10 files of the same format, each containing 25 GB of data, so I need to load 250 GB in total into a Teradata table. The table is not a multiset table and has one unique primary index.

I need to do some quality checks, such as whether a date of birth is a valid date. These checks can be performed either in DataStage or with basic Teradata SQL.
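For example, on the Teradata side I am thinking of a coarse first-pass check along these lines, assuming the date of birth lands in a staging table as CHAR(10) in YYYY-MM-DD form (the table and column names here are just placeholders):

/* Flags rows whose dob_txt is not shaped like YYYY-MM-DD, or whose
   month/day parts are out of range. A coarse check only: it will not
   catch impossible combinations such as 2010-02-30. */
SELECT cust_id, dob_txt
FROM   stg_customer
WHERE  dob_txt NOT LIKE '____-__-__'
   OR  SUBSTR(dob_txt, 6, 2) NOT BETWEEN '01' AND '12'
   OR  SUBSTR(dob_txt, 9, 2) NOT BETWEEN '01' AND '31';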

Which of the two options below is better, and why?

a) Read the files with a file pattern in a Sequential File stage and load into Teradata using the MultiLoad method.


b) Write a MultiLoad script directly and load into the Teradata table, roughly like the sketch below.
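By option (b) I mean a standalone script along these lines, with one .IMPORT per input file (the TDPID, database, table, layout, and file names are all placeholders):

/* Placeholder names throughout; the target DATE column is dob. */
.LOGTABLE sandbox.mld_customer_log;
.LOGON tdpid/loaduser,password;

.BEGIN IMPORT MLOAD
    TABLES sandbox.customer
    WORKTABLES sandbox.wt_customer
    ERRORTABLES sandbox.et_customer sandbox.uv_customer;

.LAYOUT cust_layout;
.FIELD cust_id * VARCHAR(10);
.FIELD dob_txt * VARCHAR(10);

.DML LABEL ins_cust;
INSERT INTO sandbox.customer (cust_id, dob)
VALUES (:cust_id, :dob_txt);

.IMPORT INFILE /data/file01.txt
    FORMAT VARTEXT '|'
    LAYOUT cust_layout
    APPLY ins_cust;

.END MLOAD;
.LOGOFF;

As I understand it, rows that fail the implicit VARCHAR-to-DATE conversion would land in the ET error table rather than aborting the load.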

Thanks

Posted: Wed Sep 22, 2010 8:09 am
by ds_teg
Any ideas on this post? :roll: :roll:

Posted: Wed Sep 22, 2010 8:27 am
by vmcburney
Try reading it through DataStage, as that gives you more control over metadata, easier access to transform functions, and a reject link for bad-quality data. You may need to set the multiple readers option in the Sequential File stage to speed up the read.

Posted: Wed Sep 22, 2010 4:57 pm
by ds_teg
Thanks, Vincent, for your suggestion. I am planning to use a file pattern to read the files in parallel. I believe the multiple readers per node option is not available when the file pattern option is used.

Also, I would like to know how restartability works with the two options that I specified in the original post.

Thanks in advance.

Posted: Wed Sep 22, 2010 7:11 pm
by chulett
There is no "restartability" out of the box; that's something you need to design into your jobs based on their architecture and your requirements.
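That said, for option (b) the standalone MultiLoad utility is checkpoint-restartable on its own: it records progress in the .LOGTABLE, and if the load dies you resubmit the same script and it resumes from the last checkpoint. A minimal fragment (placeholder names again; as I recall, CHECKPOINT values under 60 mean minutes, 60 or more mean record counts):

.LOGTABLE sandbox.mld_customer_log;
.BEGIN IMPORT MLOAD
    TABLES sandbox.customer
    CHECKPOINT 15;

For option (a), the DataStage job itself is what needs the restart design.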

Posted: Thu Sep 23, 2010 11:43 am
by ds_teg
OK, thanks Craig for the response.