Look for available number - routine question

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

kudaka
Premium Member
Premium Member
Posts: 37
Joined: Thu Apr 22, 2004 2:14 pm

Look for available number - routine question

Post by kudaka »

This may be crazy. But, I am trying to solve the following problem.
This is what I need to do:
1. I will call a routine from DataStage passing a number (int1)
2. The routine has to open a seq/hash file consisting of numbers.
3. check if "int1" is in this file.
4. If not found, return "int1"
5. If found, add 1 to "int1" search the file again
6. Repeat step 5 until not found then return "int1"

The seq/hash file is very big (14mil) consisting of one column of numbers.
Don't ask me why do I want to do this. It is customer's requirement.

What is the approach to do this?

Thanks.
Kud
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Put it in a hash file. It would run lots faster than looping through a sequential file over and over.
Mamu Kim
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

A hash file could be difficult to create as it sounds like you have one non-unique column of numbers. Writing to a hash file with that column as a key removes duplicates, writing it without a key takes away the lookup speed of hashing.

How are you using this routine? Are you calling it from DataStage jobs?

If you wanted to turn this into a derivation routine to be called from a transformer I would pre-build a hash file with colunn 1 - the lookup number (a key field), column 2 - count (the number of instances of that number). The hash file would be created by sending the file through an aggregation stage into a hash file. You could then do a lookup for the number and get back either a NULL (number does not exist) or the number of times it exists.

If it is a standalone routine, for example run from a sequence job, I would consider a Unix awk script that did the search and count or a DataStage server job that searched for the number and kept a count in a transformer and outputted those rows that matched that number. I've found processing a large text file via a server job is faster then the routine file processing commands as it has better row buffering and memory management.
kudaka
Premium Member
Premium Member
Posts: 37
Joined: Thu Apr 22, 2004 2:14 pm

Post by kudaka »

The file consists of one column of unique numbers.

The routine has to return a number that is not in the file.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Then a hash file would be perfect and step 5 does not make any sense.
5. If found, add 1 to "int1" search the file again
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

It sounds like what you are doing is "filling in the gaps" in a surrogate key assignment sequence. Methinks you have a table who's primary key has reached it's maximum assignment value, so you have to find the unused numbers.

If I'm right, you have quite a good little problem here.

I'd turn the problem around. How about you build a sorted list of AVAILABLE numbers. Therefore, all you do is keep fetching the next available number off a stack. This is very easy to do.

Now, if your table has the keys assigned, you need to pick out the unused numbers. There's a lot of ways to do this with SQL. For your next available assignment function, all you need to do is write a DS BASIC function to read the text file of available numbers into a COMMON memory variable array and a indexing variable on initialization. Then, simply keep returning the next available number incrementing the indexing variable with each call.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I'm with Ken. Alternatively, re-educate your customer.

Surrogate keys (assuming that's what you're doing) are SOLELY about uniqueness; it doesn't matter if there are gaps.

At the very least have them justify their requirement. Be prepared to tell them that the requirement is nonsensical. Or stupid, if you're passionate about it. Being prepared to do this has won as many deals as it has lost.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kudaka
Premium Member
Premium Member
Posts: 37
Joined: Thu Apr 22, 2004 2:14 pm

Post by kudaka »

kcbland, You are correct. I am trying to find the unused numbers.
The system is AS400, we are using Oracle plug-in to bridge to it and read AS400 files.
What is the method to prepare a list of unused numbers?
As ray.wurlod said this does not make any sense to handle this problem in datastage, but customer does not want to do any enhancements to AS400.(Also they do not have people to support it). Soon this system will go away.
Please help me building a list of unused numbers.
Thanks.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You can do this many ways. One method may be to select all of the primary keys in use in the table and write it to a hash file. Then, use a pre-generated file containing one row of all numbers from 1 to the max in your table. Use a DataStage job to read that file of numbers, referencing the hash file, and writing output to another file of all numbers not in the hash file. This output file would be your file of unused number from which your function should assign.

To optimize this process, you will have to periodicaly recreate your sequential series file from some number other than 1 so that you don't constantly rescan from 1 when you've already filled in gaps up to 1000000. You're going to have to think about this.

Another method would be to create a table of that sequential series file of numbers from 1 to the maximum allowed on the table. Now do an outer join to this table of your target, and constrain the query to return all rows not in the target table and spool this to your surrogate key unused number file.

I'm sure others could contribute some thoughts as well.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kudaka
Premium Member
Premium Member
Posts: 37
Joined: Thu Apr 22, 2004 2:14 pm

Post by kudaka »

Creating pre-generated file containing numbers in a column from 1 to the max number has become a challenge for me.
Any ideas? Can we make a simple sql stmt that return 1 thru max number?
Or any shell script?
Please share any ideas.
Thanks.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sure, just find a big table somewhere and select the rownumber from it. In Oracle, I'd do:

Code: Select all

select rownum from bigtable where rownum <= 10000000
This will return a list of numbers from 1 to 10000000.

You could actually get the list of unused numbers in a SQL statement like this:

Code: Select all

select rownum from bigtable where rownum not in (select key from target)
Any big table will do, as long as it has more rows in it than what's allowed in your target table.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kudaka
Premium Member
Premium Member
Posts: 37
Joined: Thu Apr 22, 2004 2:14 pm

Post by kudaka »

I came up with this script. Is it good?
number=1
while [ $number -le 100 ]
do
echo $number >> a.txt
number=`expr $number + 1`
done

Thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sure but it's slow.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kudaka
Premium Member
Premium Member
Posts: 37
Joined: Thu Apr 22, 2004 2:14 pm

Post by kudaka »

Yes it is vey slow. On the other hand we do not have any big table.

If I am success in building unused number file, how to fetch each number to assign it to my record? How to build stack mechanism in to data stage?
Looks like I am lost.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Okay, you don't have a big table. How about a medium table? This creates a virtual big table:

Code: Select all

select rownum from
   (select rownum from mediumtable
    union all
    select rownum from mediumtable
    union all
    select rownum from mediumtable
    union all
    select rownum from mediumtable
   )
where rownum not in (select key from target)
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply