I'm hoping that some of you can point me to some information that might be able to help me out with this question.

I'm getting ready to start a consulting gig with a company that monitors meter data. Collects that data and more or less acts as a third party to other companies that need that data.

I'm going to be writing stored procedures and doing some schema modifications for them. Currently they are running on SQL Server 2005 and 2008. One of their complaints is the raw speed of getting the data from the devices into SQL Server. The other complaint they had was the licensing cost.

Apparently the licensing cost is quite steep with SQL Server. Especially since the Enterprise edition is required for some of the things they need to do.

I would like to be able to determine if SQL Anywhere could possibly be a fit in a scenario such as this.

I'm still trying to get my arms wrapped around the entire architecture, but I would love to be able to come in and say "Here is how SQL Anywhere can handle it, and the cost is nowhere near SQL Server.”

The biggest thing that stands out to me though, is the raw number of rows that get fed into their database. It is in the millions of rows and approaches a billion.

Is there any reason that SQL Anywhere wouldn’t be able to physically handle a number of rows like this quickly?? Their tables are very narrow. No more then 15 or 20 columns in a table.

Any thoughts or suggestions would be greatly appreciated!!

TIA

Jeff Gibson
Intercept Solutions – Sybase SQL Anywhere OEM Partner
Nashville, TN

asked 05 Jan '12, 18:23

Jeff%20Gibson's gravatar image

Jeff Gibson
1.4k284654
accept rate: 21%

edited 15 Mar '13, 21:29

Mark%20Culp's gravatar image

Mark Culp
23.3k9132274

Comment Text Removed

Obviously the answer is "it depends" but let me provide a few comments.

  1. We are aware of production SQL Anywhere databases with > 1 billion rows in a single table.
  2. When you dealing with that kind of scale, the choice of data type and the encoding used for each value, along with physical database design issues, can become critically important. With 1 billion rows, every wasted byte per row costs you 1GB, which you also have to backup, recover, and index.
  3. Any sequential scan of the data is probably a non-starter. Consequently, you have think hard about the application's access requirements, and the indexes you will need to support those access paths. Each additional index that's required will result in a significant amount of overhead.
  4. Think hard about the archival of historical data, and how you will offload or delete it. With a single table, TRUNCATE of only a portion of the records isn't an option. However, mass DELETEs, particularly with multiple indexes, can cause difficult contention problems with concurrent insertions.
  5. You'll probably need 3 times the amount of disk you think you'll need.
permanent link

answered 06 Jan '12, 14:15

Glenn%20Paulley's gravatar image

Glenn Paulley
10.7k571104
accept rate: 43%

Thanks for the info on this Glenn.

Based on the conversations I have had with this company, I really think there may be an opportunity to showcase the abilities of SQL Anywhere. Especially since the licensing cost is SO much lower than SQL Server.

Can you possibly point me to any white papers that would cover the encoding aspect of a database?

Is there a particular page size that is better when you are working with this amount of data that needs to be inserted very quickly into the database?

Thanks for the follow up on this Glenn. Once I get into their environment and start to understand it better, I'll see if I can follow up with some more information.

(06 Jan '12, 14:32) Jeff Gibson
Replies hidden
2

You're certainly going to want to use 8K pages. In terms of encoding, I didn't mean anything fancy. What I meant was simple things like storing tinyint values (0,1,2,..) rather than using strings for columns such as "status" and the like. With the number of rows involved, you'll want to maximize the entropy of each and every bit - because every byte you waste is costly with a table that large.

(06 Jan '12, 17:02) Glenn Paulley

As you are asking for personal experience, we have several productive databases with xx million rows and so far are happy with the performance. The only obstacle we have seen is, that the optimizer might switch out of a sudden its usually choosen strategy and then you will have to influence your statements accordingly (e.g. by introducing a user estimate) Furthermore point 3 of Glenns List (sequential scan) is really a situation which you want to avoid. It led me to the suggestion in the past to introduce an option with which you can prevent any statement from running if a sequential scan is choosen, but so far this option doesn't exists.

permanent link

answered 09 Jan '12, 04:20

Martin's gravatar image

Martin
8.6k118151237
accept rate: 14%

2

"an option with which you can prevent any statement from running if a sequential scan is chosen" ... I like it! (possibly a whole class of "don't do this" directives)

(09 Jan '12, 16:22) Breck Carter

These links might help but I look forward to more complete answers from others:

Capacity Planning Whitepaper

Improving Performance Whitepaper

Collection of Benchmarks (last updated 2008)

Personally I am curious as to the details of the situation. When you say millions/billions are fed into the database, is that per second, minute, hour, day etc.? Does the database start with that many and then add that many each period, or is it always growing by that much each period. Is there any cleanup to delete old records? Does the database ever start empty or has it always been growing? I think these will help identify whether it is a storage or performance or both type of labor for the database.

permanent link

answered 06 Jan '12, 11:15

Siger%20Matt's gravatar image

Siger Matt
3.2k496997
accept rate: 13%

1

I'm curious, too...

FWIW, the second whitepaper seems to be from 2003, so it might be more appropriate for older versions (at least w.r.t. some points). The SQL Anywhere folks will know better:)

@Jeff: For the "billion" rows - I assume you're using short scale - i.e. a thousand millions, right?

(06 Jan '12, 11:28) Volker Barth
Replies hidden

Short scale would be correct (I had to look that up, I wasn't sure what you were talking about). Honestly it would probably only be in the hundreds of millions of rows at the most, but I wanted to put it out there that it might approach a billion rows.

(06 Jan '12, 13:31) Jeff Gibson

I can't answer that question at this point Matt. If I were to pull a number out of my rear end, I think that they are looking at somewhere around 50,000 rows an hour. So maybe they would be sent to them in packages of 8,000 to 12,000 rows every 10 to 15 minutes.

I think from my initial discussions, they are looking at the fact that they hit a bottle neck at that point.

I'll know more when I start getting more information from them. I just wanted to make sure that I wasn't going in the wrong direction with thinking SQL Anywhere could do this when it may require ASE to handle those amounts of data.

Personally I don't see why SQL Anywhere couldn't, but just wanted to see what other developer had seen in this environment.

(06 Jan '12, 13:58) Jeff Gibson
Your answer
toggle preview

Follow this question

By Email:

Once you sign in you will be able to subscribe for any updates here

By RSS:

Answers

Answers and Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "title")
  • image?![alt text](/path/img.jpg "title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Question tags:

×246
×22
×3

question asked: 05 Jan '12, 18:23

question was seen: 2,288 times

last updated: 15 Mar '13, 21:29