Data initialization

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dubuku_01
Participant
Posts: 79
Joined: Fri Nov 18, 2011 2:18 pm
Location: chennai

Data initialization

Post by dubuku_01 »

Hi all,

I have used dataset for staging area and first time I will do the historical load and the data will be in the staging dataset (option: overwrite) . Once after after the dimensions and facts are loaded ,I am taking max (updated_ts) in the dataset and load into a sequential file and via routine am passing it for the next run.

My question is,

1) Whats the case if one incremental load doesnt have data and zero records will come from the source and during that time what will be value in the dataset, will it be initialized if zero records goes into it.

2) even if the dataset is initialized, there wont be any record in it and its not possible to maintain the same updated timestamp in the target sequential file, because this job will run in a schedule.

When there is no records in the dataset for incremental load, how to maintain the previous maximum updated ts value


Thanks,
dubuku
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Possible options:

1) Don't run the job if the dataset is empty (a job sequence can control this)
2) Retrieve the timestamp from the sequential file and pass it into the job (job parm, copy of the sequential file, etc.), creating a dummy record with that value and combining it with the records from the dataset (funnel).

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
dubuku_01
Participant
Posts: 79
Joined: Fri Nov 18, 2011 2:18 pm
Location: chennai

Post by dubuku_01 »

Thanks jwiles,

I understand even if the dataset is empty it will be there in the file folder and how i can stop running the incremental timestamp finding job, when the dataset is empty
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

A job sequence can control this. Use the orchadmin command line utility to examine the dataset. Hint: "orchadmin dump n -1 dataset_name" will dump one row from each partition of the dataset to standard output. You could redirect that to a file, then use the unix/linux command wc to count the rows in that file. If there are 0 rows, the dataset is empty.

The job sequence activity stages Execute Command, User Variables and Nested Condition can be used to execute the commands, capture the results and conditionally execute the timestamp job if needed.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Data initialization

Post by SURA »

Hi Dubukku

Lot of gray areas in your post.

I may not handle the approach in this way at all! Anyhow good luck with your approach!

Your question may lead to ask more questions! If there are no records for that run means, it will not have no record at all for that date?

If the job hasn't run for two days and doing the catch up. In that case also are you going to run day by day? Or Get data from X date to Y date?

Before arrive a solution, try to find the possibilities and then try to find the solution.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
dubuku_01
Participant
Posts: 79
Joined: Fri Nov 18, 2011 2:18 pm
Location: chennai

Re: Data initialization

Post by dubuku_01 »

My question is simple, when there is no record in dataset, how to identify that in sequence job and based on that condition I can stop running the incremental timestamp job for that run

Thanks,
Dubuku
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Data initialization

Post by SURA »

So all you need is to know the number of records in the .ds file to decide!

The below link may lead you to drive further.

viewtopic.php?p=260565
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... or you can just give what James posted a try.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The number of records in a Data Set is given by

Code: Select all

dsrecords myDataSet.ds
If you want just the number use the -n option

Code: Select all

dsrecords -n myDataSet.ds
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

I always forget about that one....
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, so pursue the Wiles/Wurlod variant then. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply