URL parsing in Datastage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 69
- Joined: Tue Jan 18, 2005 12:15 am
URL parsing in Datastage
we are looking to load URL data from files, but should not load any detail information held in the final part of the URL (e.g. personal information, chat messages in the URL parameters).
Effectively what we need to do is load the URL string up to (and including) the final / in the url - this should be unaffected by the number of / in the url.
For example:
http://www.url1.co.uk/message_new.php?s ... Text=Hello World&cNickname=John Smith
should be loaded as http://www.url1.co.uk/
http://www.url2.com/tones/wallpreview.a ... na1Picture
should be loaded as http://www.url2.com/tones/
http://www.url3.com/images/icon/news.gif
should be loaded as http://www.url3.com/images/icon/
http://www.url4.net/mqa/m/_UK18_/-6643847943/9593.3gp
should be loaded as http://www.url4.net/mqa/m/_UK18_/-6643847943/
Is there a facility to do this in Datastage?
Many Thanks for your help
Regards
Magesh S
Effectively what we need to do is load the URL string up to (and including) the final / in the url - this should be unaffected by the number of / in the url.
For example:
http://www.url1.co.uk/message_new.php?s ... Text=Hello World&cNickname=John Smith
should be loaded as http://www.url1.co.uk/
http://www.url2.com/tones/wallpreview.a ... na1Picture
should be loaded as http://www.url2.com/tones/
http://www.url3.com/images/icon/news.gif
should be loaded as http://www.url3.com/images/icon/
http://www.url4.net/mqa/m/_UK18_/-6643847943/9593.3gp
should be loaded as http://www.url4.net/mqa/m/_UK18_/-6643847943/
Is there a facility to do this in Datastage?
Many Thanks for your help
Regards
Magesh S
This is an example of the powerful string handling functions available in Server. This solves your question:
The DCOUNT() function returns the number of fields delimited by '/' and the FIELD(string,delimiter,from[,to]) function gives you all fields except the last. It's easy and efficient.
Code: Select all
FIELD(In.URL,'/',1,DCOUNT(In.URL,'/')-1)
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Slightly more efficient isArndW wrote:This is an example of the powerful string handling functions available in Server. This solves your question:
The DCOUNT() function returns the number of fields delimited by '/' and the FIELD(string,delimiter,from[,to]) function gives you all fields except the last. It's easy and efficient.Code: Select all
FIELD(In.URL,'/',1,DCOUNT(In.URL,'/')-1)
Code: Select all
Field(In.URL, "/", 1, Count(In.URL, "/"))
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
The click pack web log reader stage is very efficient and it comes with some server routines that help parse things such as URLS. Are they still charging extra money for it? Thought it would be free by now.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 69
- Joined: Tue Jan 18, 2005 12:15 am
Hi
I have used Field(DSLink4.col1,"/",1,Count(DSLink4.col1,"/")), iam gettting the following warning
DataStage Job 97 Phantom 9010
Program "JOB.1124520702.DT.1386333214.TRANS2": Line 43, Improper data type.
Attempting to Cleanup after ABORT raised in stage test1..CTransformerStage5
DataStage Phantom Aborting with @ABORT.CODE = 3
but the job gets completed and iam getting the result what i required. If i remove the count and use some number like 2 or 3, the job is getting completed without warning.
any updates
regards
Magesh S
I have used Field(DSLink4.col1,"/",1,Count(DSLink4.col1,"/")), iam gettting the following warning
DataStage Job 97 Phantom 9010
Program "JOB.1124520702.DT.1386333214.TRANS2": Line 43, Improper data type.
Attempting to Cleanup after ABORT raised in stage test1..CTransformerStage5
DataStage Phantom Aborting with @ABORT.CODE = 3
but the job gets completed and iam getting the result what i required. If i remove the count and use some number like 2 or 3, the job is getting completed without warning.
any updates
regards
Magesh S
Most likely you are getting a count() returned value of 0, which the Field() function doesn't like. If you reset the job you should be able to see your exact error in the "from previous run" log entry.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 69
- Joined: Tue Jan 18, 2005 12:15 am
Look closely at your data, as simply adding the count function inside the field function is not the issue. Did you try resetting the job so we can get more specific information about the phantom?
You can also do the count outside of the field function first to determine if there are any delimiters in the string before actually attempting the field function.
You can also do the count outside of the field function first to determine if there are any delimiters in the string before actually attempting the field function.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers