Page 1 of 1

URL parsing in Datastage

Posted: Tue Dec 13, 2005 11:28 am
by maheshsada
we are looking to load URL data from files, but should not load any detail information held in the final part of the URL (e.g. personal information, chat messages in the URL parameters).
Effectively what we need to do is load the URL string up to (and including) the final / in the url - this should be unaffected by the number of / in the url.
For example:
http://www.url1.co.uk/message_new.php?s ... Text=Hello World&cNickname=John Smith

should be loaded as http://www.url1.co.uk/

http://www.url2.com/tones/wallpreview.a ... na1Picture

should be loaded as http://www.url2.com/tones/

http://www.url3.com/images/icon/news.gif

should be loaded as http://www.url3.com/images/icon/

http://www.url4.net/mqa/m/_UK18_/-6643847943/9593.3gp

should be loaded as http://www.url4.net/mqa/m/_UK18_/-6643847943/

Is there a facility to do this in Datastage?

Many Thanks for your help

Regards
Magesh S

Posted: Tue Dec 13, 2005 11:40 am
by ArndW
This is an example of the powerful string handling functions available in Server. This solves your question:

Code: Select all

FIELD(In.URL,'/',1,DCOUNT(In.URL,'/')-1)
The DCOUNT() function returns the number of fields delimited by '/' and the FIELD(string,delimiter,from[,to]) function gives you all fields except the last. It's easy and efficient.

Posted: Tue Dec 13, 2005 12:07 pm
by chulett
If you want to throw some money at the problem, there is always the dreaded Click Pack option from Ascential. It adds specific URL and weblog stages, from what I recall, plus the ability to use Perl in Server jobs.

Posted: Tue Dec 13, 2005 3:42 pm
by ray.wurlod
ArndW wrote:This is an example of the powerful string handling functions available in Server. This solves your question:

Code: Select all

FIELD(In.URL,'/',1,DCOUNT(In.URL,'/')-1)
The DCOUNT() function returns the number of fields delimited by '/' and the FIELD(string,delimiter,from[,to]) function gives you all fields except the last. It's easy and efficient.
Slightly more efficient is

Code: Select all

Field(In.URL, "/", 1, Count(In.URL, "/"))

Posted: Tue Dec 13, 2005 7:42 pm
by vmcburney
The click pack web log reader stage is very efficient and it comes with some server routines that help parse things such as URLS. Are they still charging extra money for it? Thought it would be free by now.

Posted: Tue Dec 13, 2005 9:01 pm
by chulett
Not positive, but I think it's still a chargable option. Also heard it is no longer being actively supported and will be phased out in the next release. :?

Posted: Wed Dec 14, 2005 3:52 am
by maheshsada
Hi
I have used Field(DSLink4.col1,"/",1,Count(DSLink4.col1,"/")), iam gettting the following warning

DataStage Job 97 Phantom 9010
Program "JOB.1124520702.DT.1386333214.TRANS2": Line 43, Improper data type.
Attempting to Cleanup after ABORT raised in stage test1..CTransformerStage5
DataStage Phantom Aborting with @ABORT.CODE = 3

but the job gets completed and iam getting the result what i required. If i remove the count and use some number like 2 or 3, the job is getting completed without warning.

any updates

regards
Magesh S

Posted: Wed Dec 14, 2005 1:32 pm
by ArndW
Most likely you are getting a count() returned value of 0, which the Field() function doesn't like. If you reset the job you should be able to see your exact error in the "from previous run" log entry.

Posted: Fri Dec 16, 2005 6:37 am
by maheshsada
Hi

I have checked individually the count function its giving valid values (i.e. count of the "/"). When i put the count in the field function its giving the phantom error.

any updates

regards
Magesh S

Posted: Fri Dec 16, 2005 8:35 am
by chulett
Look closely at your data, as simply adding the count function inside the field function is not the issue. Did you try resetting the job so we can get more specific information about the phantom?

You can also do the count outside of the field function first to determine if there are any delimiters in the string before actually attempting the field function.