I'm confused

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

So basically this method will not work... I have not seen any way to put an array into a stage variable, but I could be wrong.
I'm just confirming that this will not give me the answer that I need, correct?
Jim

quote:Originally posted by kcbland
[br]The usage of COUNT is wrong. It will give substring matches, whereas you need whole-word matches. In addition, you are counting the occurrences of values within a string that looks like this:

"'VAL1','VAL2','VAL3'" etc.

You need each code value in a list, and a command that does not hit on substrings, such as LOCATE combined with an array in the form of:

VAL1
VAL2
VAL3
VAL4

Kenneth Bland
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

The beauty of a DS user function is that you can build test cases that stay with the function. You can do very thorough testing to make sure it does what you want by using the FANTASTIC capability built into Manager's function test button.

If you build logic such as this into stage variables, you have to debug and develop by running your job. You can use your own imagination to decide if running a job and comparing output, setting up test cases in source data, etc, is easier than doing test cases in a function.

Kenneth Bland
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

You make a good point. So for argument's sake, let's say that I have a routine that does what I want (which is almost true, because a user of this forum was kind enough to give me a sample built on my own code). Do I put that call into the derivation of a stage? How do I use the routine once it's created?
Jim

quote:Originally posted by kcbland
[br]The beauty of a DS user function is that you can build test cases that stay with the function. You can do very thorough testing to make sure it does what you want by using the FANTASTIC capability built into Manager's function test button.

If you build logic such as this into stage variables, you have to debug and develop by running your job. You can use your own imagination to decide if running a job and comparing output, setting up test cases in source data, etc, is easier than doing test cases in a function.

Kenneth Bland
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Create your function in DS Manager. Make sure you compile it. In your DataStage job, go to the derivation field for your desired column. Double-click to edit the field, then do a right click. Under DS Routine, you will see a list of DS user defined functions, simply walk the tree and pick your function. It will automatically drop it into the derivation. Now simply put the appropriate link.columns into the arguments.

Kenneth Bland
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

I think that I am finally getting a handle on all of this... Now just to learn DS Basic... I am reading the manual now... Fun reading, I tell you. Thanks for all the help.
Jim

quote:Originally posted by kcbland
[br]Create your function in DS Manager. Make sure you compile it. In your DataStage job, go to the derivation field for your desired column. Double-click to edit the field, then do a right click. Under DS Routine, you will see a list of DS user defined functions, simply walk the tree and pick your function. It will automatically drop it into the derivation. Now simply put the appropriate link.columns into the arguments.

Kenneth Bland
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

While I have a spare minute, I'll elaborate on why your problem should be dealt with via a table. My guess from looking at your SQL is that you are attempting to categorize items by certain conditions that exist today. It looks geographic, so I'll refer to it as a sales territory example. So, depending on today's categorization, a fact falls into a particular sales territory. However, tomorrow you may re-align your categories, and the particular fact from yesterday now falls under a new sales territory. This is a classic restating of history, solved via atomic-level tables or slowly changing dimensions.

Somehow, you should be able to create a table that maps the current alignment with a beginning and ending effective date range. Therefore, a realignment will keep the history, and also, you can use surrogate keys instead of embedding textual information. You gain the ability to tie facts to the version of categorization effective at the time of the fact, plus the ability to show facts in a restated form without updating the physical rows. I should think you can put together a matrix that can be referenced to determine the exact categorization.
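The effective-dated mapping described here can be sketched as follows; the table contents, key names, and dates below are invented purely for illustration:

```python
from datetime import date

# Hypothetical effective-dated territory map: each row carries a natural key,
# a category, and a begin/end effective date range (all values are made up).
TERRITORY_MAP = [
    ("US-WEST", "TERRITORY A", date(2003, 1, 1), date(2003, 6, 30)),
    ("US-WEST", "TERRITORY B", date(2003, 7, 1), date(9999, 12, 31)),
]

def category_at(natural_key, business_date):
    """Return the category in effect for natural_key on business_date."""
    for key, category, begin, end in TERRITORY_MAP:
        if key == natural_key and begin <= business_date <= end:
            return category
    return None  # no categorization was in effect on that date
```

A realignment then becomes closing one date range and adding a new row, so yesterday's facts still join to yesterday's territory.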

Good luck!
Kenneth Bland
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

WOW!! There is something to think about... I'll get back to you on this
:)
Jim

quote:Originally posted by kcbland
[br]While I have a spare minute, I'll elaborate on why your problem should be dealt with via a table. My guess from looking at your SQL is that you are attempting to categorize items by certain conditions that exist today. It looks geographic, so I'll refer to it as a sales territory example. So, depending on today's categorization, a fact falls into a particular sales territory. However, tomorrow you may re-align your categories, and the particular fact from yesterday now falls under a new sales territory. This is a classic restating of history, solved via atomic-level tables or slowly changing dimensions.

Somehow, you should be able to create a table that maps the current alignment with a beginning and ending effective date range. Therefore, a realignment will keep the history, and also, you can use surrogate keys instead of embedding textual information. You gain the ability to tie facts to the version of categorization effective at the time of the fact, plus the ability to show facts in a restated form without updating the physical rows. I should think you can put together a matrix that can be referenced to determine the exact categorization.

Good luck!
Kenneth Bland
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

Next time you have a spare minute, could you please answer a few questions about your table suggestion, seeing as it makes a lot of business sense?
Our logic requires a different number of fields to be checked based on the trade, i.e. Pac needs Ultcode, Ultname, and USCode, but another trade may only need Ultname. How would the table look in that instance? Would I just leave the fields blank? Also, could you help me understand how this is applied? I'd put in the key and use the table as a lookup? I think that's what you were getting at. I'm also guessing that I would use the time frame in the WHERE clause so that I get the correct name.
Thank you for your help here, it does mean a lot.
Jim

quote:Originally posted by kcbland
[br]While I have a spare minute, I'll elaborate on why your problem should be dealt with via a table. My guess from looking at your SQL is that you are attempting to categorize items by certain conditions that exist today. It looks geographic, so I'll refer to it as a sales territory example. So, depending on today's categorization, a fact falls into a particular sales territory. However, tomorrow you may re-align your categories, and the particular fact from yesterday now falls under a new sales territory. This is a classic restating of history, solved via atomic-level tables or slowly changing dimensions.

Somehow, you should be able to create a table that maps the current alignment with a beginning and ending effective date range. Therefore, a realignment will keep the history, and also, you can use surrogate keys instead of embedding textual information. You gain the ability to tie facts to the version of categorization effective at the time of the fact, plus the ability to show facts in a restated form without updating the physical rows. I should think you can put together a matrix that can be referenced to determine the exact categorization.

Good luck!
Kenneth Bland
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

By looking at your previously posted SQL, you have a minimum of (5) distinct category tables that are arranged in a hierarchical manner:

1. COUNTRY
2. ULTCODE
3. USCODE
4. ULTPORT
5. SLINE


I may not have the order correct, but you will get the idea. What you have to do is begin traversing the lowest level of the hierarchy upwards. When you get a hit, then you use that row. You have some very specific cases:

When Sline = 'ECUA'
and [Name] = 'PACIFIC FRUIT'
Then 'PACIFIC FRUIT'

I don't know if the order of the case statements in your SQL gives priority to the specific cases, so I'll pretend that it works out to be exclusive. I would guess your business rules state that an explicitly named item goes into a category no matter the country.

So, I suggest you create a table with a natural key of SLINE and NAME. The first attribute is the category. In the above example you'd have a row "ECUA" + "PACIFIC FRUIT" and the category would be "PACIFIC FRUIT". I would put a beginning and ending effective date and assign a unique surrogate key. I would do this for every situation where you name an explicit SLINE and NAME.

Next, I'd create a table with a natural key of ULTPORT. The first attribute is the category. I would put a beginning and ending effective date and assign a unique surrogate key. I would do this for every situation where you name an explicit ULTPORT.

I'm hoping you get the idea. You would resolve a category by first trying an SLINE lookup, using a natural-key join with a business date falling between the beginning and ending effective dates. If you don't get a row, then you move to an ULTPORT lookup. If you don't get a row on the ULTPORT lookup, you move to an ULTCODE lookup. And so on, so that the very last thing you can do is a COUNTRY lookup.
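That lookup cascade can be sketched like this; the lookup tables here are tiny in-memory stand-ins with invented contents, and the effective-date filtering is omitted for brevity:

```python
# Most-specific to least-specific lookup tables (contents are illustrative).
SLINE_MAP   = {("ECUA", "PACIFIC FRUIT"): "PACIFIC FRUIT"}
ULTPORT_MAP = {"KWAJALEIN": "PACIFIC ISLANDS"}
ULTCODE_MAP = {}
COUNTRY_MAP = {"588": "PACIFIC TOTAL"}

def resolve_category(sline, name, ultport, ultcode, country):
    """Walk the hierarchy from most specific to least specific; first hit wins."""
    for hit in (SLINE_MAP.get((sline, name)),
                ULTPORT_MAP.get(ultport),
                ULTCODE_MAP.get(ultcode),
                COUNTRY_MAP.get(country)):
        if hit is not None:
            return hit
    return None  # fell all the way through the hierarchy
```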

This traversal technique is a common approach. Coupled with beginning and ending effective dates, you can have different categorizations over time. You will be able to do historical reporting without restating history, state the categorization changes exactly, and manage all of this in a more scalable way. You will also be able to generate reports, in case there are questions regarding the categorizations. The categorizations move into the business analysts' hands for maintenance, instead of the developers'.

Good luck!

Kenneth Bland
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

That makes all the sense in the world to me. I do understand that, but what happens when I have to mix the criteria, i.e. these ultports, this sline, and this country all together make up the trade? I do have instances where I have to address that.
Jim


quote:Originally posted by kcbland
[br]By looking at your previously posted SQL, you have a minimum of (5) distinct category tables that are arranged in a hierarchical manner:

1. COUNTRY
2. ULTCODE
3. USCODE
4. ULTPORT
5. SLINE


I may not have the order correct, but you will get the idea. What you have to do is begin traversing the lowest level of the hierarchy upwards. When you get a hit, then you use that row. You have some very specific cases:

When Sline = 'ECUA'
and [Name] = 'PACIFIC FRUIT'
Then 'PACIFIC FRUIT'

I don't know if the order of the case statements in your SQL gives priority to the specific cases, so I'll pretend that it works out to be exclusive. I would guess your business rules state that an explicitly named item goes into a category no matter the country.

So, I suggest you create a table with a natural key of SLINE and NAME. The first attribute is the category. In the above example you'd have a row "ECUA" + "PACIFIC FRUIT" and the category would be "PACIFIC FRUIT". I would put a beginning and ending effective date and assign a unique surrogate key. I would do this for every situation where you name an explicit SLINE and NAME.

Next, I'd create a table with a natural key of ULTPORT. The first attribute is the category. I would put a beginning and ending effective date and assign a unique surrogate key. I would do this for every situation where you name an explicit ULTPORT.

I'm hoping you get the idea. You would resolve a category by first trying an SLINE lookup, using a natural-key join with a business date falling between the beginning and ending effective dates. If you don't get a row, then you move to an ULTPORT lookup. If you don't get a row on the ULTPORT lookup, you move to an ULTCODE lookup. And so on, so that the very last thing you can do is a COUNTRY lookup.

This traversal technique is a common approach. Coupled with beginning and ending effective dates, you can have different categorizations over time. You will be able to do historical reporting without restating history, state the categorization changes exactly, and manage all of this in a more scalable way. You will also be able to generate reports, in case there are questions regarding the categorizations. The categorizations move into the business analysts' hands for maintenance, instead of the developers'.

Good luck!

Kenneth Bland
cwong
Charter Member
Posts: 26
Joined: Tue Apr 30, 2002 8:02 am
Location: Canada

Post by cwong »

Jim,

Let's see if this helps you a bit further...
In the example, the values assigned to the "matched" strings (CntryGroup1/ULTGroup1 ...) could be retrieved from a table or the like, rather than being updated in this routine on a continuous basis, since we know the sales org/territory gets redrawn very frequently. However, if no better approach is available, this still works.

I "converted" one of the cases from your big SQL statement as below.
Note that I used "+" to enclose each value string to ensure an exact match; this is similar to Ken's trick.
Index ... > 0 ==> the searched-for string exists
Index ... = 0 ==> the searched-for string does NOT exist

====================================================================
COUNTRY = "+":Arg1:"+"
ULTPORT = "+":Trim(Arg2):"+"

CntryGroup1= "+588+570+580+582+583+560+559+549+565+557+552+538+555+579+566+546+574+561+684+553+"
ULTGroup1= "+MARSHAL ISL+KWAJALEIN+"
CntryGroup2= "+462+"
ULTGroup2= "+ABAKAN+KHABAROVSK+KORSAKOV+MAGADAN+PETROPAVLOVSK+VLADIVOSTOK+VOSTOCHNY+YUZHNO SAKHAL+YUZHNO SAKHALINSK+"

Begin Case
* Case When -- (PACIFIC TRADE GENERAL):

Case Index(CntryGroup1,COUNTRY,1) > 0 AND Index(ULTGroup1,ULTPORT,1) = 0
Sub_Trade = 'PACIFIC TOTAL'

Case Index(CntryGroup2,COUNTRY,1) > 0 AND Index(ULTGroup2,ULTPORT,1) > 0
Sub_Trade = 'PACIFIC TOTAL'

End Case

* For testing purposes, you may wish to confirm the input and output;
* this is quite often done in the DS Routine development panel.
*Ans="Country=":COUNTRY:"Ultport=":ULTPORT:"SubTrade=":Sub_Trade

Ans=Sub_Trade

================================================================================
==> Your first case, taken from your SQL:
Set Sub_Trade =

(COUNTRY IN ('588','570','580','582','583','560','559','549','565','557','552',
'538','555','579','566','546','574','561','684','553')
AND ULTPORT NOT IN ('MARSHAL ISL','KWAJALEIN'))
OR
-- (PACIFIC RUSSIA):
(COUNTRY = 462 AND ULTPORT IN ('ABAKAN','KHABAROVSK','KORSAKOV','MAGADAN',
'PETROPAVLOVSK','VLADIVOSTOK','VOSTOCHNY',
'YUZHNO SAKHAL','YUZHNO SAKHALINSK'))
Then 'PACIFIC TOTAL'
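The point of the "+" delimiters is worth seeing concretely; here is a quick sketch with invented values:

```python
# Without delimiters, "559" would falsely match inside "5590"; wrapping every
# value (and the search key) in "+" restricts hits to whole values only.
cntry_group = "+588+570+580+5590+"

def in_group(value, group):
    """Exact-match membership test using the '+' delimiter trick."""
    return "+" + value + "+" in group
```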


cwong
Optico
Participant
Posts: 23
Joined: Thu Sep 18, 2003 9:32 am

Post by Optico »

Hi everybody.

It seems that there are many ways to solve this problem.

We at Optico have solved it by writing a short routine that emulates the SQL 'IN' functionality.

E.g., in a constraint we have the following to check whether a specific user is in the list of users to run for: In(TrimB(Input.cluser), "CHAR", User) = 0

The DS function In looks as follows :

*********************************************************************
*
* This routine emulates 'IN' from SQL, so the same functionality can
* be used directly in DataStage
* Input is given as three arguments :
* The 1st argument is the field to be tested.
* The 2nd argument is the datatype (right now only CHAR and NUM)
* The 3rd argument is the list of testvalues separated by commas.
* If the datatype is CHAR the individual values are
* surrounded by quotes as in the real SQL IN paragraph.
*
* The function resolves the 3rd argument in single values to test for
* A test is done on each individual value in the list, and if a match
* is found, zero is returned to the caller. Otherwise 99 is returned
* If the datatype passed as argument2 is not implemented, -1 is
* returned.
*
*
* NB ! Right now all character testing is done solely on UPPERCASE as
* this is what we needed. The function can be changed to be case
* sensitive by changing the line :
* If DatTyp = "CHAR" Then WorkField = UpCase(InField) etc.
* to : WorkField = InField
*
*********************************************************************

Ans = 99

DatTyp = UpCase(DataType)

If DatTyp # "NUM" And DatTyp # "CHAR" Then Return(-1)

If DatTyp = "CHAR" Then WorkField = UpCase(InField) Else WorkField = InField


Antal = Dcount(TestValues, ",")


For x = 1 to Antal

If DatTyp = "CHAR"
Then
Value = UpCase(Field(Field(TestValues, ",", x), "'", 2))
End
Else
Value = Field(TestValues, ",", x)
End

If WorkField = Value Then Return(0)

Next x


Return(Ans)
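For comparison, the same 'IN' emulation as a sketch; the return codes mirror the routine above (0 = match, 99 = no match, -1 = unsupported datatype):

```python
def sql_in(in_field, data_type, test_values):
    """Emulate SQL 'IN': test in_field against a comma-separated value list."""
    dat_typ = data_type.upper()
    if dat_typ not in ("NUM", "CHAR"):
        return -1
    if dat_typ == "CHAR":
        # CHAR values arrive quoted, as in 'A','B'; compare in uppercase.
        values = [v.strip().strip("'").upper() for v in test_values.split(",")]
        work_field = in_field.upper()
    else:
        values = [v.strip() for v in test_values.split(",")]
        work_field = in_field
    return 0 if work_field in values else 99
```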

B. Sorensen,
Optico IT ApS
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Well, right, Kenneth.
But (there is always a but) nothing prevents us from formatting the tested value link.val to the full length of the set of strings we check, say FMT(link.val, "3'0'R") for numerics, or with spaces instead of zeros for strings.
So in fact we will check it correctly [:)]
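Roy's padding idea, sketched with invented values: pad every list entry and the tested value to one fixed width, then compare in fixed-width steps so a straddling substring can never match:

```python
WIDTH = 3

def pad(value):
    """Right-justify to WIDTH characters, zero-filled, like FMT(val, "3'0'R")."""
    return str(value).rjust(WIDTH, "0")

# "588" + "057" + "008" -> "588057008"; note "880" occurs as a raw substring.
codes = "".join(pad(c) for c in ["588", "57", "8"])

def in_padded_list(value, padded):
    """Whole-value membership test over the fixed-width concatenation."""
    return any(padded[i:i + WIDTH] == pad(value)
               for i in range(0, len(padded), WIDTH))
```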

quote:Originally posted by kcbland
[br]The usage of COUNT is wrong. It will give substring matches, whereas you need whole-word matches. In addition, you are counting the occurrences of values within a string that looks like this:

"'VAL1','VAL2','VAL3'" etc.

You need each code value in a list, and a command that does not hit on substrings, such as LOCATE combined with an array in the form of:

VAL1
VAL2
VAL3
VAL4

Kenneth Bland


IHTH (I Hope This Helps),

Roy R.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Roy, in one of my previous posts I stated that your choices for this type of string locating should be:

INDEX
LOCATE

Usage of COUNT means that the entire array of values has to be scanned, even if the value required is found in the first cell. This is why INDEX is optimal: it can stop on the first find. The problem is substring matching, which both INDEX and COUNT will errantly find. If you look up a couple of posts, you'll see that Cristal shows a technique of putting "+" around the values in the array. Your recommendation to pad means that every cell in the array has to be padded to the same width, and that the find value has to have the same padding. Your statement is true, but you're adding bytes to an already inefficient function choice.

I'm still puzzling over the SQL, because I believe this logic should be in a table and handled via a hierarchical walk. There are probably business rules that vastly simplify what needs to be done. In my experience, these types of solutions do not belong in embedded logic, but in tables.

Kenneth Bland
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Post by JDionne »

I also believe that it should be in a table, but I did post a few questions regarding that, as follows.

Also, I have been plugging through this code and have a question.
This is a bit of code that I have written:

quote:
*-- (ATLANTIC FRANCE):
If COUNTRY = '427' Then FOUND = @TRUE Else FOUND = @FALSE
If COUNTRY = '427' AND (ULTCODE <> '42776':@AM:'46905':@AM:'46961':@AM:'48945':@AM:'46931':@AM:'47527':@AM:'50220':@AM:'46994':@AM:'27900':@AM:'47500':@AM:'47531':@AM:'47300':@AM:'47507':@AM:'47537':@AM:'47125':@AM:'47500':@AM:'46935':@AM:'48939':@AM:'50801':@AM:'48452':@AM:'42782') Then FOUND1 = @TRUE Else FOUND1 = @FALSE
If FOUND or FOUND1 Then
Ans = 'TRANS-ATLANTIC TOTAL'
Goto Fini
End


In the line
quote:
If COUNTRY = '427' AND (ULTCODE <> '42776':@AM:'46905':@AM:'46961':@AM:'48945':@AM:'46931':@AM:'47527':@AM:'50220':@AM:'46994':@AM:'27900':@AM:'47500':@AM:'47531':@AM:'47300':@AM:'47507':@AM:'47537':@AM:'47125':@AM:'47500':@AM:'46935':@AM:'48939':@AM:'50801':@AM:'48452':@AM:'42782')

should I have a LOCATE in front of ULTCODE?
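For reference, the whole-element LOCATE-style test being asked about can be sketched like this (only the first few codes from the list are reproduced; a Python list stands in for the @AM-delimited dynamic array):

```python
# A dynamic array in DS BASIC is @AM-delimited; a Python list plays that role.
ultcode_list = ["42776", "46905", "46961", "48945"]

def locate(value, array):
    """Return the 1-based position of value, or 0 if absent (LOCATE-style)."""
    try:
        return array.index(value) + 1
    except ValueError:
        return 0
```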
Jim

quote:Originally posted by JDionne
[br]That makes all the sense in the world to me. I do understand that, but what happens when I have to mix the criteria, i.e. these ultports, this sline, and this country all together make up the trade? I do have instances where I have to address that.
Jim