Page 1 of 1

Data initialization

Posted: Wed Jun 12, 2013 11:28 am
by dubuku_01
Hi all,

I have used dataset for staging area and first time I will do the historical load and the data will be in the staging dataset (option: overwrite) . Once after after the dimensions and facts are loaded ,I am taking max (updated_ts) in the dataset and load into a sequential file and via routine am passing it for the next run.

My question is,

1) Whats the case if one incremental load doesnt have data and zero records will come from the source and during that time what will be value in the dataset, will it be initialized if zero records goes into it.

2) even if the dataset is initialized, there wont be any record in it and its not possible to maintain the same updated timestamp in the target sequential file, because this job will run in a schedule.

When there is no records in the dataset for incremental load, how to maintain the previous maximum updated ts value


Thanks,
dubuku

Posted: Wed Jun 12, 2013 12:36 pm
by jwiles
Possible options:

1) Don't run the job if the dataset is empty (a job sequence can control this)
2) Retrieve the timestamp from the sequential file and pass it into the job (job parm, copy of the sequential file, etc.), creating a dummy record with that value and combining it with the records from the dataset (funnel).

Regards,

Posted: Wed Jun 12, 2013 4:52 pm
by dubuku_01
Thanks jwiles,

I understand even if the dataset is empty it will be there in the file folder and how i can stop running the incremental timestamp finding job, when the dataset is empty

Posted: Thu Jun 13, 2013 12:04 am
by jwiles
A job sequence can control this. Use the orchadmin command line utility to examine the dataset. Hint: "orchadmin dump n -1 dataset_name" will dump one row from each partition of the dataset to standard output. You could redirect that to a file, then use the unix/linux command wc to count the rows in that file. If there are 0 rows, the dataset is empty.

The job sequence activity stages Execute Command, User Variables and Nested Condition can be used to execute the commands, capture the results and conditionally execute the timestamp job if needed.

Regards,

Re: Data initialization

Posted: Mon Jun 24, 2013 6:15 pm
by SURA
Hi Dubukku

Lot of gray areas in your post.

I may not handle the approach in this way at all! Anyhow good luck with your approach!

Your question may lead to ask more questions! If there are no records for that run means, it will not have no record at all for that date?

If the job hasn't run for two days and doing the catch up. In that case also are you going to run day by day? Or Get data from X date to Y date?

Before arrive a solution, try to find the possibilities and then try to find the solution.

Re: Data initialization

Posted: Mon Jun 24, 2013 6:46 pm
by dubuku_01
My question is simple, when there is no record in dataset, how to identify that in sequence job and based on that condition I can stop running the incremental timestamp job for that run

Thanks,
Dubuku

Re: Data initialization

Posted: Mon Jun 24, 2013 7:02 pm
by SURA
So all you need is to know the number of records in the .ds file to decide!

The below link may lead you to drive further.

viewtopic.php?p=260565

Posted: Mon Jun 24, 2013 9:30 pm
by chulett
... or you can just give what James posted a try.

Posted: Mon Jun 24, 2013 11:29 pm
by ray.wurlod
The number of records in a Data Set is given by

Code: Select all

dsrecords myDataSet.ds
If you want just the number use the -n option

Code: Select all

dsrecords -n myDataSet.ds

Posted: Tue Jun 25, 2013 10:02 pm
by jwiles
I always forget about that one....

Posted: Tue Jun 25, 2013 10:36 pm
by chulett
OK, so pursue the Wiles/Wurlod variant then. :wink: