
Split 'group by' result into files

Posted: Wed Jul 23, 2003 12:01 am
by idocrm
Gurus,
I have an extract job that uses GROUP BY to produce a sequential file containing all the groups. How can I split these groups into separate sequential files so that each file contains just one group of data, from inside DataStage? (I know there are ways to do it outside of DS.) Thank you.

Posted: Wed Jul 23, 2003 5:41 am
by Peytot
I hope I understand your question.
Use a Transformer and, in the constraints, determine the conditions to split the data; in the output, create as many files as you want.
If you have PX and version 7, you can use the stage called 'Filter' (I think).

Pey

Posted: Wed Jul 23, 2003 6:28 am
by vmcburney
You could do this from a DataStage routine: read in the sequential file containing the full result set and use a loop to write each group's lines out to a separate file using BASIC commands. The commands for handling sequential files are very easy to use within DataStage.

Have a look at the Server Job Developer's Guide for the sequential file commands.

Vincent McBurney
Data Integration Services
www.intramatix.com

Posted: Wed Jul 23, 2003 7:31 am
by kcbland
You have to give us more information about how you want the data split. Do you want N number of files based on N number of groups? Do you have a single column that determines the grouping?

DataStage job designs are fixed, meaning that if you have a single output link from a Transformer you only get a single target, in your case a file. If you have a fixed number of groups, then you can have a fixed number of output files. That works for only a small number of files (20 output links is about the most manageable in the Designer).

So, if you have an indeterminate number of output groups that fluctuates according to the data in the current run, your only solution is a programmatic one. I have an after job routine that splits my data along month lines so that I can produce a single pure insert file for each of my target table's partitions (by month). This routine separates and creates the appropriate files as needed. I have a generic sqlldr script that is called from a controlling script that looks at the partition load files in my load directory and launches the appropriate sqlldr sessions for each partition. The enabler to this approach is the splitting routine.
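A rough sketch of what such a controlling script might look like, written as a shell function. The load directory layout, the *.dat naming convention, the generic control file name, and the sqlldr arguments are all assumptions for illustration, not the actual script described above; the loader command is parameterized so the sketch can be exercised without Oracle.

```shell
#!/bin/sh
# Hypothetical controlling script: launch one loader session per
# partition load file found in the load directory. File layout and
# sqlldr arguments are assumptions made for this sketch.
load_partitions() {
    load_dir=$1
    loader=${2:-sqlldr}              # real runs use sqlldr
    for f in "$load_dir"/*.dat; do
        [ -e "$f" ] || continue      # no load files produced this run
        part=$(basename "$f" .dat)   # partition name from the file name
        "$loader" control=generic.ctl data="$f" log="$load_dir/$part.log" &
    done
    wait                             # let all loader sessions finish
}
```

Each partition loads in its own background session, so a run with many partitions overlaps the loads rather than serializing them.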

My suspicion, from my interpretation of your question's motivation, is that you're going to have to do something like this.

Good luck!

Kenneth Bland

Posted: Wed Jul 23, 2003 8:13 am
by idocrm
Thank you for all of your replies. Yes, there will be N files based on N groups, so the number is dynamic. I do have a single column that determines the grouping, so my incoming file will be something like this:
a|1
a|2
a|3
b|1
b|2
c|1
c|2
and my result files should be a, b and c. Thank you all again.

Posted: Wed Jul 23, 2003 10:54 am
by kduke
idocrm

Sometimes Perl is a good solution for something like this. You can either create a lot of output links or write a very short Perl or shell script.
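For the sample data above, such a script could be as short as the following awk sketch. It assumes a pipe delimiter, that the first field is the grouping column, and that the input is already sorted by that column (as a GROUP BY extract would be); the file name extract.txt is made up for illustration.

```shell
# Recreate the sample extract from the post, then split it on the first
# pipe-delimited field so each group lands in its own file (a, b, c).
printf 'a|1\na|2\na|3\nb|1\nb|2\nc|1\nc|2\n' > extract.txt

awk -F'|' '
    prev != "" && $1 != prev { close(prev) }  # finished previous group file
    { print > $1; prev = $1 }                 # append row to its group file
' extract.txt
```

Because the input is sorted, only one output file is open at a time, so the number of groups can be arbitrarily large without hitting open-file limits.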

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com

Posted: Wed Jul 23, 2003 5:04 pm
by ray.wurlod
If there is a known and finite number of distinct values of the grouping column, a really efficient mechanism would be a Transformer stage that splits the input stream into multiple outputs using constraint expressions based on the grouping column. For example:
GroupingColumn = "a"
GroupingColumn = "b"
GroupingColumn = "c"

Search elsewhere on this site for how to prevent creating sequential files when no rows are processed (the 'omit last new line' option).

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

Posted: Fri Nov 13, 2009 6:25 pm
by Sreedhar
Hi,
Not to hijack the thread, but I searched for a post on how to prevent creating a flat file when there are no records processed. Can you direct me to one, or tell me the search terms? I tried a lot of combinations, but to no avail.

I am appending to a file (using the FTP stage in a server job), and when no records are processed the file still gets a blank new line appended, which I need to prevent. I want the new line to be appended only when there is data.

I have set the 'omit new line...' parameter to No, and I tried using INROWNUM > 1 in the transformer constraint, but it looks like whenever the job runs the file is touched and a new line is appended!

Posted: Fri Nov 13, 2009 7:40 pm
by chulett
Sorry, this is a hijack plain and simple and you should have just started a new post for your issue. And FYI, there's no way to prevent creating a flat file when no records are processed. None.

Posted: Tue Nov 17, 2009 3:07 pm
by DSguru2B
I am not sure if you are in a Unix environment, but if you are, this will help.
viewtopic.php?t=101739&highlight=shell+script

Posted: Tue Nov 17, 2009 3:38 pm
by Kryt0n
chulett wrote:Sorry, this is a hijack plain and simple and you should have just started a new post for your issue. And FYI, there's no way to prevent creating a flat file when no records are processed. None. ...
A hijack Dick Turpin would be proud of! A completely unrelated post from 2003... impressive!

Posted: Tue Nov 17, 2009 4:01 pm
by chulett
Wow... a reference back to the 1700s, fitting somehow. And I love how it starts out - I don't mean to hijack this, but... [whips out pistol] fly this plane to Cuba! Oh, and could you answer this DataStage question while we're in the air? :wink:

Posted: Tue Nov 17, 2009 4:57 pm
by Kryt0n
Well, Dick Turpin was just so misunderstood, what he meant was:
"Stand and deliver... your answer to this DataStage question"

He only used the gun because he needed the answer urgently... and those support costs... just sheer highway... oh... nevermind!

Posted: Tue Nov 17, 2009 6:15 pm
by ray.wurlod
In "Good Omens", by Terry Pratchett and Neil Gaiman, one of the characters calls his car Dick Turpin, because it's always holding up traffic.

Are we far enough off topic yet?

Please don't hijack threads unless yours is precisely the same issue (which it rarely is).