Split 'group by' result into files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
idocrm
Participant
Posts: 17
Joined: Wed Jul 23, 2003 9:41 pm

Split 'group by' result into files

Post by idocrm »

Gurus,
I have an extract job to extract data using group by and produce a sequential file that contains all the groups. How can I split these groups into sequential files so that each file contains just one group of data? How can I do this inside Datastage? (I know there are ways to do it outside of DS.) Thank you.
Peytot
Participant
Posts: 145
Joined: Wed Jun 04, 2003 7:56 am
Location: France

Post by Peytot »

I hope I interestand your question.
Use a transformer and in the constraint, determine the conditions to split the data and in the output, create as many file as you want.
If you have PX and the version 7, you can use the stage called 'Filter' (I think).

Pey
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

You would do this from a DataStage routine where you read in the sequential file containing the full result set and use a loop to write out each groupby line into a seperate file using BASIC commands. The programming commands for handling sequential files are very easy to use within DataStage.

Have a look at the Server Job Developers Guide for the sequential file commands.

Vincent McBurney
Data Integration Services
www.intramatix.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You have to give us more information about how you want the data split. Do you want N number of files based on N number of groups? Do you have a single column that determines the grouping?

DataStage job designs have fixed designs, meaning if you have a single output link from a transformer you only get a single target, in your case a file. If you have a fixed number of groups, then you can have a fixed number of output files. That works for only a small number of files (20 output links is about the most manageable in the designer).

So, if you have an indeterminate number of output groups that fluctuates according to the data in the current run, your only solution is a programmatic one. I have an after job routine that splits my data along month lines so that I can produce a single pure insert file for each of my target table's partitions (by month). This routine separates and creates the appropriate files as needed. I have a generic sqlldr script that is called from a controlling script that looks at the partition load files in my load directory and launches the appropriate sqlldr sessions for each partition. The enabler to this approach is the splitting routine.

You're going to have to do something like this is my suspicion from my interpretation of your questions motivation.

Good luck!

Kenneth Bland
idocrm
Participant
Posts: 17
Joined: Wed Jul 23, 2003 9:41 pm

Post by idocrm »

Thank you for all of your replies. Yes, the N number of files will be based on N number of groups, so it is dynamic. I do have a single column that determines the grouping. So my incoming file will be something like this:
a|1
a|2
a|3
b|1
b|2
c|1
c|2
and my result files should be a, b and c. Thank you all again.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

idocrm

Sometimes Perl is a good solution for something like this. You can either create a lot of output links or write a very short Perl or shell script.

Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If there is a known and finite number of distinct values of the grouping column, a really efficient mechanism would be a Transformer stage that splits the input stream into multiple outputs using constraint expressions based on the grouping column. For example:
GroupingColumn = "a"
GroupingColumn = "b"
GroupingColumn = "c"

Find elsewhere on this site how to prevent creating sequential files where no rows are processed (omit last new line).

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
Sreedhar
Participant
Posts: 187
Joined: Mon Oct 30, 2006 12:16 am

Post by Sreedhar »

Hi ,
Not to hijack the thread, I searched for post on how to prevent creating a flat file when there are no records processed. Can you direct me to one or tell me the search terms, I used lot of combinations but of no use.

I am appending to a file (using ftp stage in server job) and when there are no records processed , still the file gets a new blank line appended which I need to prevent. I want the new blank line to be appeneded only when there is data.

I have selected the option to No to omit nwe line...parameter, I tried to use the INROWNUM>1 in the transformer constraint but looks like if the job runs the file si touched and a new line is appended!
Regards,
Shree
785-816-0728
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, this is a hijack plain and simple and you should have just started a new post for your issue. And FYI, there's no way to prevent creating a flat file when no records are processed. None.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I am not sure if you are sitting in a unix env. but if you are then this will help.
viewtopic.php?t=101739&highlight=shell+script
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

chulett wrote:Sorry, this is a hijack plain and simple and you should have just started a new post for your issue. And FYI, there's no way to prevent creating a flat file when no records are processed. None. ...
A hijack Dick Turpin would be proud of! A completely unrelated post from 2003... impressive!
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wow... a reference back to the 1700s, fitting somehow. And I love how it starts out - I don't mean to hijack this, but... [whips out pistol] fly this plane to Cuba! Oh, and could you answer this DataStage question while we're in the air? :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Well, Dick Turpin was just so misunderstood, what he meant was:
"Stand and deliver... your answer to this DataStage question"

He only used the gun because he needed the answer urgently... and those support costs... just sheer highway... oh... nevermind!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In "Good Omens", by Terry Pratchett and Neil Gaiman, one of the characters calls his car Dick Turpin, because it's always holding up traffic.

Are we far enough off topic yet?

Please don't hijack threads unless yours is precisely the same issue (which it rarely is).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply