The forum will experience an outage sometime between February 10 at 7:00pm EST and February 12 at 11:59 EST for installation of security updates. The actual time and duration of the outage are unknown but attempts will be made to minimize the downtime. We apologize for any inconvenience.

An issue that came up in this answer:

Say, I have a large range of numerical values (doubles in this case), say from 0.0 - 50,000.0, in a column x. They are not evenly distributed in this range.

Now I want to build partitions to group them. Obviously a GROUP BY x is useless as it will typically group for each and every value. So I will have to define some intervals to group over - in my case, these intervals can be equally wide.

So I guess I will have to build categories by rounding x to them, say by

-- group by interval of ]x - 50, x + 50] 
select round(x / 100.0, 0) * 100.0 as xRounded, count(*)
from MyTable
group by xRounded
order by xRounded

This will group by intervals of 100 integers, each.

In order to sum up extrem intervals, I could use a further case expression, say

select case when xRounded <= 1000.0 then 1000.0
            when xRounded >= 20000.0 then 20000.0
            else xRounded end as xRounded2, count(*)
from (select round(x / 100.0, 0) * 100.0 as xRounded from MyTable) S
group by xRounded2
order by xRounded2

Is there a better way to do so?

asked 16 Jun '11, 11:05

Volker%20Barth's gravatar image

Volker Barth
accept rate: 32%

edited 16 Jun '11, 11:31

select round ( x, -2 ) etcetera

(16 Jun '11, 11:22) Breck Carter
Replies hidden

Oh yes, indeed, that would be easier (and I forget the 2nd round parameter) - in my real sample, I've build groups by 200 or 500, so I need the addional division/multiplication, methinks...

(16 Jun '11, 11:30) Volker Barth

I'm sure there are better ways to do this but you could create a join with a table for your partitions. The below shows an example for ranges of 100 from 0 to 5000 using the sa_rowgenerator to create the partitions (and generate test data).


//create data
SELECT RAND()*5000 as myVal INTO #myTemp FROM sa_rowgenerator(0,3000);

//sort data to view distribution
SELECT srg.row_num as lower_bound, srg.row_num + 100 as upper_bound, count(myVal) as cnt
FROM sa_rowgenerator(0, 4900, 100) srg, #myTemp mt
WHERE mt.myVal >= lower_bound AND mt.myVal < upper_bound
GROUP BY lower_bound
ORDER BY lower_bound;

permanent link

answered 16 Jun '11, 12:37

Tyson%20Lewis's gravatar image

Tyson Lewis
accept rate: 22%

edited 17 Jun '11, 04:10

Volker%20Barth's gravatar image

Volker Barth

Thanks, the join with sa_rowgenerator seems rather smart - and somewhat better readable than the "rounding" approach, particularly when not using multiples of 10 as intervals. Performance is not really a concern so I guess that will work.

(17 Jun '11, 04:15) Volker Barth
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here



Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text]( "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:


question asked: 16 Jun '11, 11:05

question was seen: 1,090 times

last updated: 17 Jun '11, 04:15