Hi,

A job is causing me a lot of grief because of the sorts it uses. Let me explain:

The job reads an input with a huge number of rows, already ordered by col1. It then performs an inner join and an aggregation on col1,col2,col3,col4, so I use a Sort stage (sort keys col1,col2,col3,col4) with col1 marked as previously sorted. At that point the data is sorted by col1,col2,col3,col4.

After that, I need to sort by col2,col3,col4 only. Is there a way to do this without breaking the dataflow? Do I have to land all the data in a dataset and then sort it?

Thanks,
Martin.
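A side note on why the existing order cannot be reused: once the leading key col1 is dropped, rows sorted on col1,col2,col3,col4 are in general no longer sorted on col2,col3,col4, so a full re-sort is unavoidable. A tiny coreutils sketch (the two-column data is invented purely for illustration):

```shell
# Two rows sorted on col1 (and therefore on col1,col2): the order breaks as
# soon as col1 is ignored, so checking the order on col2 alone fails.
printf 'a,2\nb,1\n' |
  sort -t, -k2,2 -c 2>/dev/null && echo "still sorted on col2" \
                                || echo "not sorted on col2: full re-sort needed"
```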
sort key problem
For a better design of the job:

1. There is a Join stage in the job. With huge unsorted data you could use a Lookup stage instead (of course, the reference link must hold few enough rows to fit in physical memory, or performance will degrade again).
2. Do not use a stable sort; it is much more expensive.
3. Use the restrict-memory option on the Sort stage, which can improve performance.

You mentioned writing the data to a dataset and then sorting; are you using database stages currently?
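As a rough coreutils analogy to point 1 (not DataStage itself, and the file names are invented): a Join stage behaves like `join`, which needs both inputs sorted on the key, while a Lookup stage behaves like an awk hash, which accepts unsorted input but must hold the whole reference file in memory:

```shell
# Hash-style lookup: build a table from the reference file, then probe it with
# the (unsorted) input rows. No sort needed, but r[] lives entirely in memory.
printf '1,x\n2,y\n' > ref.csv          # reference: key,value
printf '2,b\n1,a\n' > in.csv           # input: key,data (unsorted)
awk -F, 'NR==FNR { r[$1] = $2; next } { print $0 "," r[$1] }' ref.csv in.csv
# prints: 2,b,y  then  1,a,x
rm -f ref.csv in.csv
```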
-------
MK
Martin,

This might not be the answer you're looking for, but whenever we have to sort and rearrange huge amounts of data, we do it via a UNIX script rather than the ETL tool. UNIX handles these operations quickly and efficiently, and we have never had any memory issues. The only catch is that you will have to add a few extra steps to your Sequence, such as calling the shell script from the master Sequence.

After all, no ETL tool is built to handle every situation efficiently.
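A minimal sketch of the kind of script such a Sequence step might call, using GNU coreutils sort (the file contents, delimiter, and memory/temp settings are assumptions, not from this thread):

```shell
#!/bin/sh
# Sort a comma-separated extract on col2,col3,col4 (fields 2-4).
# -S caps the in-memory buffer so sort spills to disk gracefully;
# -T points the spill files at a directory with enough space.
set -e
tmp=$(mktemp -d)
printf '1,b,x,2\n1,a,y,1\n2,a,x,3\n' > "$tmp/in.csv"   # toy stand-in for the real extract
sort -t, -k2,2 -k3,3 -k4,4 -S 64M -T "$tmp" "$tmp/in.csv" > "$tmp/out.csv"
cat "$tmp/out.csv"
rm -rf "$tmp"
```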
Datawarehouse Consultant
Thanks for your response, mk_ds09. Replying to your points one by one:
1. The join key in my Join stage is col1,col2, and since my inputs are already sorted on col1,col2, the dataflow is not broken.
2. I don't use a stable sort; in fact, I have never run into a situation that needed one.
3. I must admit I hadn't considered that parameter; I always leave it at the default value of 20 MB. Can you tell me how you size it?

No, I don't use a database, except to extract the data.