Thank you for your code!!......This sounds what I need.....!! :D
Sorry for taking a long time to contact you but I was directed into another project and just now I am able to work on this one. I have a couple questions for your, pardon for my ignorance but I am new with Ascential and my questions might be very, very, basic.
1) The Unix command that needs to be translated to dos - Can you describe what that command(s) do or intends to do?
2) The temp hashfiles that are created, if I have 10 columns in my file to delete duplicate values, does it create 10 hashfiles? and are the hashfiles deleted after the program is finished?
Thanks,
Emilio
rleishman wrote:Emilio,
I can't believe the most interesting topic of the week was posted when I was asleep. Well now you Yanks are going home to bed and it's Australia's turn...
Here's my understanding of your requirement: given an input file with n columns, you want to produce an output file where no single value of any of the n columns is repeated.
What is not clear to me is whether the number of output rows is important. For example, would you accept the following as an alternative output for your example above?
If so, then I think I have a solution. I have attached a routine below that I used for a different problem; I just wanted a generic method of removing duplicates. Here's what you do:Code: Select all
NONE, 100, , 200, ,,AUCTION
- Copy the code below and create a DS Routine called IsUnique with 4 parameters: TEMPDIR, HashFile, Initialize, and Val
- In your DS Job, create a stage variable for each output column with expressions of the form:- Create another stage variable called Initialize BELOW the other stage variables. Set the default value to 1, and the expression to 0. This means that it will be 1 for the first row processed (causing the IsUnique hash file mechanism to be initialized) and then 0 for each subsequent row.Code: Select all
If IsUnique("X:\xxx\yyy\temp", DSJOB : "<col_name>", Initialize, In_link.col) Then In_link.col Else @NULL
- Map each of the stage variables to their respective columns.
How does it work? For each column, a different hash file is created in a temp directory of your choosing. You pass a value into the routine (and tell it which hash file to look in) and it tells you whether it has seen the value before (success) or not (failure). If the routine returns success, then you send the column value to the output link, otherwise send @NULL.
A couple of notes:
- I wrote this for Unix only. There is a Unix command I have highlighted with stars that you will have to translate into DOS. It is basically this:although in Unix I separated the commands in such a way that if one failed the next would not run.Code: Select all
CD <TEMPDIR> DEL <TEMPDIR>\<HashFile> %DSHOME%\bin\mkdbfile <HashFile> 30 1 4 20 50 80 1628
- I'm a UV BASIC beginner. Hopefully some bigger brains than mine will critique this routine (nicely of course )so that I can improve it - listen to them.
- I originally designed it without the ability handle NULLs in the Val param. I made a change in this version that should work but didn't test it.
- I had to remove some references to other custom routines in our library. It should still compile, but...
Good luck.
[/code]Code: Select all
Common /CDMUniq/ Initialized, UniqFileName, UniqFileHandle EQUATE RoutineName TO 'IsUnique' Ans = 0 HashPath = TEMPDIR : "" : HashFile * If the Initialize parameter is set, remove and rebuild the hash file If Initialize Then Initialized = 0 * The Unix command does 3 things (and fails immediately if any of them fail): * : Go to the TEMP Directory * : Remove the existing hash file if it exists * : Create a new hash file with mkdbfile * I had to use this method, because it would fail with "unable to create ..." error if I created it with * full path name from another directory. Dont know why. **************************** StrCommand = "cd " : TEMPDIR : " && rm -rf " : HashFile : " && $DSHOME/bin/mkdbfile " : HashFile : " 30 1 4 20 50 80 1628" **************************** Call DSExecute("UNIX", StrCommand , OutPut, RetCode) if RetCode <> 0 then Call DSLogWarn("Unix error (" : RetCode : ") in ": StrCommand : " ::::: " : OutPut, "IsUnique") end * Attempt to open the file. Openpath HashPath TO UniqFileHandle Then Initialized = 1 UniqFileName = HashPath End Else Call DSLogWarn("Unable to open hash file " : HashPath, "IsUnique") Ans = -1 End End If Initialized Then * If the currently open file is a different file, then open the correct file If UniqFileName <> HashPath Then Openpath HashPath TO UniqFileHandle Then Initialized = 1 UniqFileName = HashPath End Else Call DSLogWarn("Unable to open hash file " : HashPath, "IsUnique") Ans = -1 End End If Ans = 0 And Not(IsNull(Val)) Then * Check if Val already exists Readu ValExists From UniqFileHandle, Val Then * Return FALSE if exists Ans = 0 End Else * Return TRUE and add to file if not exists Writeu Val On UniqFileHandle, Val Then Ans = 1 End Else Call DSLogWarn("Unable to write hash file " : HashFile, "IsUnique") Ans = -1 End End End End