I'm familiar with building SQL Anywhere text indexes, obtaining scores and accessing the index with the sa_text_index_vocab system procedure, but I would really like to get more than the term and its frequency of use. In particular, I want to provide some context for each result: usually just half a dozen "words" or so before and after the search term that is found. The objective is to give the user enough information to make a more informed decision as to whether one result may be of greater value than another, even though its Sybase "score" value may be lower.

Most general search engines do this, but I believe that to do so the text index has to record "positional" information, i.e., that this search term was found at this offset into the particular record, not just that it was found somewhere in the record. Positional information is also valuable if you're doing proximity searching (i.e., matching when term1 is immediately prior to term2 rather than further away).
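To illustrate what I mean by "positional" information, here is a minimal Python sketch of a positional index and an "immediately prior" proximity check. It is only an illustration of the general idea, not a description of how SQL Anywhere stores its text indexes; the function names and the simple `\w+` tokenizer are my own assumptions.

```python
import re
from collections import defaultdict

def build_positional_index(docs):
    """Map each lowercased term to {doc_id: [word offsets]}.

    `docs` is a hypothetical {doc_id: text} dictionary.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Record the word offset of every occurrence, not just presence.
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][doc_id].append(pos)
    return index

def immediately_before(index, term1, term2):
    """Return the ids of documents where term1 occurs directly before term2."""
    hits = set()
    for doc_id, positions in index.get(term1, {}).items():
        following = set(index.get(term2, {}).get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.add(doc_id)
    return hits
```

With offsets recorded like this, both the proximity test and a context lookup become cheap, because the matching position is known without rescanning the record.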

In my context, I'm also using the sample ifilter to index PDF and other binary content, so opening each record in my code when a result is being presented is not a great approach; "offset" data from the index would be very helpful. My current approach requires a document "preview" to be created before the index is built, which is quite a lot of extra work. Any other suggestions? Thanks!

asked 07 Feb '12, 11:41

DougMWat


Having the context available would be an excellent enhancement. There could be an option for the size of context required, or perhaps the size could be requested at query time if that was not too inefficient, e.g.:

SELECT ID, ct.score, ct.context
    FROM MarketingInformation CONTAINS ( MarketingInformation.Description, 'stretch* | comfort*' ) AS ct 
    ORDER BY ct.score DESC;

or perhaps

SELECT ID, ct.score, ct.context(20)  
    FROM MarketingInformation CONTAINS ( MarketingInformation.Description, 'stretch* | comfort*' ) AS ct 
    ORDER BY ct.score DESC;

for twenty words either side of the match(es). Clearly, with multiple matches ct.context(20) would be quite big, but that would be the application's problem to deal with.
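Until something like that exists, the same result can be approximated in the application. The following is a rough Python sketch, assuming the plain text of each record is already available (for example the stored document "preview" mentioned in the question); `context_snippets` is a hypothetical helper, not a SQL Anywhere feature, and its handling of a trailing `*` as a prefix match only loosely imitates the CONTAINS query syntax.

```python
import re

def context_snippets(text, term, width=20):
    """Return one snippet per match: up to `width` words either side."""
    words = text.split()
    # Compare on a normalized form so that "Comfort," matches "comfort".
    normalized = [re.sub(r"\W+", "", w).lower() for w in words]
    # Treat a trailing * as a prefix match, as in 'stretch*'.
    prefix = term.endswith("*")
    needle = term.rstrip("*").lower()
    snippets = []
    for i, w in enumerate(normalized):
        matched = w.startswith(needle) if prefix else w == needle
        if matched:
            start = max(0, i - width)
            snippets.append(" ".join(words[start:i + width + 1]))
    return snippets
```

For example, `context_snippets(description, 'stretch*', 20)` would give a list with one entry per match, which the application can trim or merge as it sees fit.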

answered 07 Feb '12, 14:33

Justin Willey
