Page 1 of 1

Is this possible in DataStage? Setting the data format

Posted: Tue Jul 18, 2006 8:01 am
by nkreddy
I have 25 custom fields in the incoming file. The file we get can have date fields in any of the 25 custom fields. It is not defined that Custom field 1 can only have date fields....The fields keep changing for every file...

I need to check if the incoming value is a date eg: (20060107) and change the format to 2006-01-07 while loading into the Oracle table.

How do I put a condition to check if the incoming value is a date? The incoming value can be just any other number.

Please advise...

p.s: If this can't be done in a Transformer, I have another file which would tell which custom field is what?

Posted: Tue Jul 18, 2006 8:22 am
by Krazykoolrohit
are you sure they are integers or EPOCH values which come in?

If its just a integer(the example you specified) then you will have to write a custome routine to seperate the date,year,month and check for their validity.

If its a EPOCH value, then use tag.to.date routine and coonvert it to date, an inproper value will be turned into null.

Posted: Tue Jul 18, 2006 8:38 am
by nkreddy
Rohit,

In my parallel job, I am reading all the custom fields with varchar 50 as the datatype...

As mentioned in my previous post, Field 1 can have a date value (20060713) or some other value like (+10) or (5788416)..I would have to determine if that is a date field. In the above example, I will have to load the table with 2006-07-13. If there is another value like +10, I just load it as is..

Re: Is this possible in DataStage? Setting the data format

Posted: Tue Jul 18, 2006 8:55 am
by Krazykoolrohit
nkreddy wrote: p.s: If this can't be done in a Transformer, I have another file which would tell which custom field is what?
Does this file says which field is date or which field is integer clearly?

If yes, then you can first seperate out the date and integers using a flag. Then use this flag to convert the dates later. Ofcourse you may drop the flag before you write it to dataset.

Re: Is this possible in DataStage? Setting the data format

Posted: Tue Jul 18, 2006 9:06 am
by nkreddy
Nope...That is the challenge...If it says, then it could have been done by using some of the IsValid functions.

There could be any value in the field. The code has to be intelligent enough to figure out it is a date and reformat it to the specified YYYY-MM-DD

Posted: Tue Jul 18, 2006 9:09 am
by ray.wurlod
It would be easy to do with a server job (or a BASIC Transfomer stage). The Iconv() function is largely agnostic to the format of a date, except for the order of the components (see Date Conversion with ICONV for more information).

Posted: Tue Jul 18, 2006 9:19 am
by Krazykoolrohit
If we convert +10 with Iconv and try to convert it to date, wont it convert it to null?

In this case he wants to retain all the invalid values as well.

Posted: Tue Jul 18, 2006 9:27 am
by ray.wurlod
You don't need to throw away the original value. Apply an Iconv() function and test with Status() to see whether it was a valid date. If it was, use Oconv to convert that converted value into the desired format; otherwise pass through the original result.

Code: Select all

FUNCTION ConvertIfDate(Arg1)
If UnAssigned(Arg1) Or IsNull(Arg1)
Then
   Ans = @NULL
End
Else
* If Arg1 is a date then generate a date in YYYY-MM-DD format
   Ans = Arg1
   Test = Iconv(Arg1, "DYMD")
   If Status() = 0 Then Ans = Oconv(Test, "D-YMD[4,2,2]")
End
RETURN(Ans)

Posted: Tue Jul 18, 2006 12:09 pm
by nkreddy
Thank you for the message...

But the thing is I am dealing with more than 40 million rows in the file and is it advisable to use BASIC transformer in a parallel job.

If that is the only recommended solution, then I will have to use it.

I have also tried with your solution of using the Transform function. It is treating the value 200604 as 2020-06-04 and at the same time 199901 is output as 199901 itself.

Please advise..

Posted: Tue Jul 18, 2006 5:49 pm
by ray.wurlod
199901 is not a valid date in YMD format. There is no month 99. Threrefore it is returning the original data. The routine was written according to your original specification, which showed dates in YMD order. Adapt it to suit. You could do other comparisons for other formats in the same routine, but may introduct ambiguities by doing so.
You may also need to adjust your century pivot date, given that you are using two digit years (also not in the original specification). If you don't want two digit years to be considered, adjust the routine to explicity test the first eight characters of Arg1 as matching "8N".