need unix script

bobbysridhar · Post by **bobbysridhar** » Thu Apr 05, 2012 7:48 am

Hi,
I am loading rejected records into a sequential file. Once the data is loaded, I want to separate the records based on timestamp. The reject file has a timestamp column.
Could anyone help me write a Unix script to partition the file based on timestamp after the DataStage job has finished?

Thanks,
Sridhar

qt_ky · Post by **qt_ky** » Thu Apr 05, 2012 8:17 am

Need more info:

Please give an example of a reject record including the timestamp column.

How are you wanting to decide on the partitions? Based on timestamp range? If so, how are you deciding ranges? Based on row counts, etc.?

bobbysridhar · Post by **bobbysridhar** » Thu Apr 05, 2012 8:37 am

Hi,
Thanks for your reply.
I want to partition it based on timestamp.
After the Lookup, the records that fail the lookup go to a Transformer through the reject link. In the Transformer I create a timestamp and send it to the reject file.
In DataStage we don't have file rollover based on size, so as far as I know we have to load the file and can only partition it with a Unix script after the job has finished.
So, I want to partition the file based on that timestamp.

Thanks,
Sridhar

chulett · Post by **chulett** » Thu Apr 05, 2012 8:54 am

Related to this post I assume:

viewtopic.php?p=419960

Not sure the timestamp is really going to help you chunk this up, why not look into something like the UNIX split command?
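
As a rough illustration of the split suggestion above, here is a minimal sketch that chunks a file by line count. The file name reject.txt, the output prefix reject_part_ and the 10-line chunk size are assumptions made for the example, not details from the thread.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 25-line sample reject file (hypothetical layout).
i=1
while [ "$i" -le 25 ]; do
    echo "row$i,2012-04-05 07:48:00"
    i=$((i + 1))
done > reject.txt

# Split into chunks of at most 10 lines each.
# Output files are named reject_part_aa, reject_part_ab, ...
split -l 10 reject.txt reject_part_

ls reject_part_*
```

split only looks at line counts (or byte counts with -b), so it ignores the timestamp column entirely.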

qt_ky · Post by **qt_ky** » Thu Apr 05, 2012 9:15 am

You still have to decide and explain what you mean by partitioning on the timestamp.

Example:

- All timestamps ending in an odd number go to file1, ending in an even number go to file2... (like modulus partitioning in DataStage).

- All timestamps from hour 1:00am to 2:00am go to file1, 2:00am to 3:00am go to file2, etc. (like range partitioning in DataStage).

Read my first reply again and give an example.

bobbysridhar · Post by **bobbysridhar** » Thu Apr 05, 2012 9:34 am

Hi,
I am passing the job start timestamp to the reject file.
I want to split the file based on that job start timestamp.
If the job runs now and then runs again after a few minutes, I want to split the records into two files, because the job ran twice.
That is, I want a separate file for each job run.
I have DSJobStartTimestamp to keep track of each job run.
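
If all runs really do end up in one file, a post-job split on the timestamp column could look something like the sketch below. It assumes a comma-delimited file called rejects.txt with the job start timestamp as the last column; the sample rows and the output naming are made up for the example.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data: two rows from one run, one row from a later run (assumed layout).
cat > rejects.txt <<'EOF'
101,bad key,2012-04-05 09:30:00
102,bad key,2012-04-05 09:30:00
103,bad key,2012-04-05 09:34:10
EOF

# Write each row to a file named after its timestamp, digits only.
awk -F',' '{
    ts = $NF
    gsub(/[^0-9]/, "", ts)      # drop dashes, spaces and colons
    out = "reject_" ts ".txt"
    print >> out                # append, so reopening the file is safe
    close(out)                  # avoid running out of file handles
}' rejects.txt

ls reject_*.txt
```

With this sample input, rows from the 09:30:00 run and the 09:34:10 run land in separate files.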

qt_ky · Post by **qt_ky** » Thu Apr 05, 2012 9:48 am

In that case, you don't really need to take the timestamp into consideration. You could call an after-job script to rename the reject file immediately after each job run completes. It could increment the file name's extension by a number or it could name the file so the file name contains a timestamp. In fact, you can do that within the job itself with no need for an after-job script or unix script.

Try including the DSJobStartTimestamp macro within the reject file name:

Code:

/path/reject_#DSJobStartTimestamp#.txt

bobbysridhar · Post by **bobbysridhar** » Thu Apr 05, 2012 11:09 am

Hi,
Thanks for your reply.
Do we need to pass DSJobStartTimestamp as a job parameter in order to use it in the reject file name?
Please explain how the file name gets the timestamp at runtime and what needs to be done to achieve this.

bobbysridhar · Post by **bobbysridhar** » Thu Apr 05, 2012 11:39 am

Could somebody please provide an after-job script to rename the file immediately after each job run?

chulett · Post by **chulett** » Thu Apr 05, 2012 12:13 pm

You don't need a script for that, just the 'move' command does a rename in UNIX. At its most basic:

mv <old_name> <new_name>

You'd have to be more specific as to exactly how you want it renamed to get more specific help. You could also pass in the 'timestamp' to use as a Job Parameter and then include it in the output filename of the Sequential File stage rather than renaming it after job.

What happened to splitting up the file?

bobbysridhar · Post by **bobbysridhar** » Thu Apr 05, 2012 12:47 pm

Thank you guys,
I was able to resolve it by putting the timestamp in the file name.
For every job run it creates a new file with an updated timestamp.
I set the file name to \pathaname\#DSJobStartTimeStamp#

thanks again

chulett · Post by **chulett** » Thu Apr 05, 2012 12:59 pm

Are you sure you're ok with that? From what I recall, the output from that macro has spaces in it, which can confuse things when used raw like that in a filename. The colons can also cause issues... but if it works for you like that, then we're good.

bobbysridhar · Post by **bobbysridhar** » Thu Apr 05, 2012 1:19 pm

It would be great if we could remove the spaces. Right now I am not able to view the data in the Unix environment; I can only view it from the Sequential File stage in the job after the job has run.

chulett · Post by **chulett** » Thu Apr 05, 2012 2:21 pm

You can view it in UNIX, you'd just need to enclose the filename (spaces and all) in single quotes to do so. In order to remove the spaces, you'd need something to retrieve the current timestamp before the job starts and then use a routine to remove everything but the numbers. A Sequence job could be leveraged to capture and format the timestamp and then pass it to the job as a parameter, then you'd use that parameter in the filename rather than the macro.
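
Both points above can be sketched from the command line. The timestamp format shown (YYYY-MM-DD HH:MM:SS) and the file name are assumptions for illustration, and the tr call simply stands in for the "remove everything but the numbers" routine mentioned above.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A file whose name contains a space and colons (assumed macro-style value).
name='reject_2012-04-05 12:47:00.txt'
echo "101,bad key" > "$name"

# Quoting the whole name lets ordinary commands read the file.
cat 'reject_2012-04-05 12:47:00.txt'

# Keep only the digits of the timestamp: -c complements the set, -d deletes.
clean=$(printf '%s' '2012-04-05 12:47:00' | tr -cd '0-9')

# Rename to a name without spaces or punctuation.
mv "$name" "reject_${clean}.txt"
ls
```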

You could also write to a static filename and then rename it after job to include a timestamp and it's simple to build that one without any 'punctuation' in it. Sorry, but I don't have the syntax for that off the top of my head but I'm sure someone does. The only 'issue' with that approach is you must do all your file viewing / validation from UNIX as View Data from inside the job would never find the file it wrote to, seeing as how that name no longer exists.

qt_ky · Post by **qt_ky** » Thu Apr 05, 2012 7:46 pm

From a UNIX script, an after-job command, or a Sequence job Execute Command stage, you can rename (move) the file using UNIX date command syntax. For example, to rename file.txt to a new file name formatted as file_YYYYMMDD.txt, put the date command within backticks (the date command is executed, and its output is substituted in place of the backticked text):

Code:

mv file.txt file_`date +%Y%m%d`.txt

file_20120405.txt

You can include time format options as well if you want to. Read the UNIX manual page for the date command. UNIX command line:

Code:

man date
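
For a file per job run, the date alone is not enough when the job runs more than once a day. A minimal sketch that adds the time of day, assuming the static name file.txt from the example above (the underscore-separated format is just one choice):

```shell
#!/bin/sh
# Work in a scratch directory so the sketch is self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "101,bad key" > file.txt

# Capture the stamp once so the same value can be reused later in the script.
# %H%M%S adds hours, minutes and seconds; no spaces or colons in the result.
stamp=$(date +%Y%m%d_%H%M%S)

mv file.txt "file_${stamp}.txt"
ls
```

Note that this stamps the file with the time of the rename, not the job start time, so it only approximates the DSJobStartTimestamp value.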