An issue that came up in this answer: Say, I have a large range of numerical values (doubles in this case), say from 0.0 - 50,000.0, in a column x. They are not evenly distributed in this range. Now I want to build partitions to group them. Obviously a GROUP BY x is useless as it will typically group for each and every value. So I will have to define some intervals to group over - in my case, these intervals can be equally wide. So I guess I will have to build categories by rounding x to them, say by -- group by interval of ]x - 50, x + 50] select round(x / 100.0, 0) * 100.0 as xRounded, count(*) from MyTable group by xRounded order by xRounded This will group by intervals of 100 integers, each. In order to sum up extrem intervals, I could use a further case expression, say select case when xRounded <= 1000.0 then 1000.0 when xRounded >= 20000.0 then 20000.0 else xRounded end as xRounded2, count(*) from (select round(x / 100.0, 0) * 100.0 as xRounded from MyTable) S group by xRounded2 order by xRounded2 Is there a better way to do so? asked 16 Jun '11, 11:05 Volker Barth |
I'm sure there are better ways to do this but you could create a join with a table for your partitions. The below shows an example for ranges of 100 from 0 to 5000 using the sa_rowgenerator to create the partitions (and generate test data). BEGIN //create data SELECT RAND()*5000 as myVal INTO #myTemp FROM sa_rowgenerator(0,3000); //sort data to view distribution SELECT srg.row_num as lower_bound, srg.row_num + 100 as upper_bound, count(myVal) as cnt FROM sa_rowgenerator(0, 4900, 100) srg, #myTemp mt WHERE mt.myVal >= lower_bound AND mt.myVal < upper_bound GROUP BY lower_bound ORDER BY lower_bound; END go answered 16 Jun '11, 12:37 Tyson Lewis Volker Barth Thanks, the join with sa_rowgenerator seems rather smart - and somewhat better readable than the "rounding" approach, particularly when not using multiples of 10 as intervals. Performance is not really a concern so I guess that will work.
(17 Jun '11, 04:15)
Volker Barth
|
select round ( x, -2 ) etcetera
Oh yes, indeed, that would be easier (and I forget the 2nd round parameter) - in my real sample, I've build groups by 200 or 500, so I need the addional division/multiplication, methinks...