How to GROUP over a large range of numerical values?

An issue that came up in this answer:

Say, I have a large range of numerical values (doubles in this case), say from 0.0 - 50,000.0, in a column x. They are not evenly distributed in this range.

Now I want to build partitions to group them. Obviously a GROUP BY x is useless as it will typically group for each and every value. So I will have to define some intervals to group over - in my case, these intervals can be equally wide.

So I guess I will have to build categories by rounding x to them, say by

-- group by interval of ]x - 50, x + 50] 
select round(x / 100.0, 0) * 100.0 as xRounded, count(*)
from MyTable
group by xRounded
order by xRounded

This will group by intervals of 100 integers, each.

In order to sum up extrem intervals, I could use a further case expression, say

select case when xRounded <= 1000.0 then 1000.0
            when xRounded >= 20000.0 then 20000.0
            else xRounded end as xRounded2, count(*)
from (select round(x / 100.0, 0) * 100.0 as xRounded from MyTable) S
group by xRounded2
order by xRounded2

Is there a better way to do so?

group-by olap sa-12

asked 16 Jun '11, 11:05

Volker Barth
40.5k●365●556●827
accept rate: 34%

edited 16 Jun '11, 11:31

select round ( x, -2 ) etcetera

(16 Jun '11, 11:22) Breck Carter

Replies hidden

Oh yes, indeed, that would be easier (and I forget the 2nd round parameter) - in my real sample, I've build groups by 200 or 500, so I need the addional division/multiplication, methinks...

(16 Jun '11, 11:30) Volker Barth

flat view

One Answer:

active answers oldest answers newest answers popular answers

I'm sure there are better ways to do this but you could create a join with a table for your partitions. The below shows an example for ranges of 100 from 0 to 5000 using the sa_rowgenerator to create the partitions (and generate test data).

BEGIN

//create data
SELECT RAND()*5000 as myVal INTO #myTemp FROM sa_rowgenerator(0,3000);

//sort data to view distribution
SELECT srg.row_num as lower_bound, srg.row_num + 100 as upper_bound, count(myVal) as cnt
FROM sa_rowgenerator(0, 4900, 100) srg, #myTemp mt
WHERE mt.myVal >= lower_bound AND mt.myVal < upper_bound
GROUP BY lower_bound
ORDER BY lower_bound;

END
go

permanent link

answered 16 Jun '11, 12:37

Tyson Lewis
2.2k●16●41
accept rate: 22%

edited 17 Jun '11, 04:10

Volker Barth
40.5k●365●556●827

Thanks, the join with sa_rowgenerator seems rather smart - and somewhat better readable than the "rounding" approach, particularly when not using multiples of 10 as intervals. Performance is not really a concern so I guess that will work.

(17 Jun '11, 04:15) Volker Barth

flat view

Your answer

toggle preview

community wiki:

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

*italic* or _italic_
**bold** or __bold__
link:[text](http://url.com/ "title")
image?![alt text](/path/img.jpg "title")
numbered list: 1. Foo 2. Bar
to add a line break simply add two spaces to where you would like the new line to be.
basic HTML tags are also supported

learn more about Markdown

Question tags:

sa-12 ×439
olap ×19
group-by ×11

question asked: 16 Jun '11, 11:05

question was seen: 2,693 times

last updated: 17 Jun '11, 04:15

SAP SQL Anywhere

How to GROUP over a large range of numerical values?

Follow this question

Related questions