Data initialization
Moderators: chulett, rschirm, roy
Data initialization
Hi all,
I have used dataset for staging area and first time I will do the historical load and the data will be in the staging dataset (option: overwrite) . Once after after the dimensions and facts are loaded ,I am taking max (updated_ts) in the dataset and load into a sequential file and via routine am passing it for the next run.
My question is,
1) Whats the case if one incremental load doesnt have data and zero records will come from the source and during that time what will be value in the dataset, will it be initialized if zero records goes into it.
2) even if the dataset is initialized, there wont be any record in it and its not possible to maintain the same updated timestamp in the target sequential file, because this job will run in a schedule.
When there is no records in the dataset for incremental load, how to maintain the previous maximum updated ts value
Thanks,
dubuku
I have used dataset for staging area and first time I will do the historical load and the data will be in the staging dataset (option: overwrite) . Once after after the dimensions and facts are loaded ,I am taking max (updated_ts) in the dataset and load into a sequential file and via routine am passing it for the next run.
My question is,
1) Whats the case if one incremental load doesnt have data and zero records will come from the source and during that time what will be value in the dataset, will it be initialized if zero records goes into it.
2) even if the dataset is initialized, there wont be any record in it and its not possible to maintain the same updated timestamp in the target sequential file, because this job will run in a schedule.
When there is no records in the dataset for incremental load, how to maintain the previous maximum updated ts value
Thanks,
dubuku
Possible options:
1) Don't run the job if the dataset is empty (a job sequence can control this)
2) Retrieve the timestamp from the sequential file and pass it into the job (job parm, copy of the sequential file, etc.), creating a dummy record with that value and combining it with the records from the dataset (funnel).
Regards,
1) Don't run the job if the dataset is empty (a job sequence can control this)
2) Retrieve the timestamp from the sequential file and pass it into the job (job parm, copy of the sequential file, etc.), creating a dummy record with that value and combining it with the records from the dataset (funnel).
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
A job sequence can control this. Use the orchadmin command line utility to examine the dataset. Hint: "orchadmin dump n -1 dataset_name" will dump one row from each partition of the dataset to standard output. You could redirect that to a file, then use the unix/linux command wc to count the rows in that file. If there are 0 rows, the dataset is empty.
The job sequence activity stages Execute Command, User Variables and Nested Condition can be used to execute the commands, capture the results and conditionally execute the timestamp job if needed.
Regards,
The job sequence activity stages Execute Command, User Variables and Nested Condition can be used to execute the commands, capture the results and conditionally execute the timestamp job if needed.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
Re: Data initialization
Hi Dubukku
Lot of gray areas in your post.
I may not handle the approach in this way at all! Anyhow good luck with your approach!
Your question may lead to ask more questions! If there are no records for that run means, it will not have no record at all for that date?
If the job hasn't run for two days and doing the catch up. In that case also are you going to run day by day? Or Get data from X date to Y date?
Before arrive a solution, try to find the possibilities and then try to find the solution.
Lot of gray areas in your post.
I may not handle the approach in this way at all! Anyhow good luck with your approach!
Your question may lead to ask more questions! If there are no records for that run means, it will not have no record at all for that date?
If the job hasn't run for two days and doing the catch up. In that case also are you going to run day by day? Or Get data from X date to Y date?
Before arrive a solution, try to find the possibilities and then try to find the solution.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Re: Data initialization
My question is simple, when there is no record in dataset, how to identify that in sequence job and based on that condition I can stop running the incremental timestamp job for that run
Thanks,
Dubuku
Thanks,
Dubuku
Re: Data initialization
So all you need is to know the number of records in the .ds file to decide!
The below link may lead you to drive further.
viewtopic.php?p=260565
The below link may lead you to drive further.
viewtopic.php?p=260565
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The number of records in a Data Set is given by
If you want just the number use the -n option
Code: Select all
dsrecords myDataSet.ds
Code: Select all
dsrecords -n myDataSet.ds
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.