I'm familiar with building SQL Anywhere text indexes, obtaining scores, and accessing the index with the sa_text_index_vocab system procedure, but I would really like to get more than the term and its frequency of use. In particular, I want to provide some context for each result: usually just half a dozen "words" or so before and after the search term that is found. The objective is to give the user enough information to make a more informed decision as to whether one result may be of greater value than another, even though its Sybase "score" value may be lower.
Most general search engines do this, but I believe that the text index has to record "positional" information - i.e., this search term was found at this offset into the particular record, not just that it was found somewhere in the record. It's also valuable if you're doing proximity searching (i.e., when term1 is immediately prior to term2 rather than further away).
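To illustrate what I mean by "positional" information, here is a minimal Python sketch of an index that stores word offsets per record and uses them for an "immediately prior" proximity check. This is only my assumption of how such a structure could look, not how SQL Anywhere stores its text index; the names `tokenize`, `build_positional_index` and `immediately_before` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercased word tokens; a real text index would apply its own term breaker.
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """Map term -> {doc_id: [word offsets]} (hypothetical structure)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def immediately_before(index, term1, term2):
    """Return the ids of documents where term1 occurs directly before term2."""
    hits = []
    for doc_id, positions in index.get(term1, {}).items():
        following = set(index.get(term2, {}).get(doc_id, []))
        # term1 at offset p is adjacent to term2 if term2 sits at p + 1.
        if any(p + 1 in following for p in positions):
            hits.append(doc_id)
    return sorted(hits)
```

With offsets stored like this, the same lookup that answers the proximity question also says where in the record to cut a context snippet from.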
In my context, I'm also using the sample ifilter to index PDF and other binary content, so opening each record in my code when a result is being presented is not a great approach; "offset" data from the index would be very helpful. My current approach requires a document "preview" to be created before the index is built, which is quite a lot of extra work. Any other suggestions? Thanks!
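For reference, this is roughly what my application-side workaround does with the stored preview text, reduced to a minimal Python sketch. It assumes the preview is already plain text extracted from the document; the function name `context_snippets` and the `width` parameter are hypothetical, and it only matches whole words (no prefix terms such as stretch*).

```python
import re


def context_snippets(text, term, width=6):
    """Return each match of `term` with up to `width` words on either side."""
    words = text.split()
    target = term.lower()
    snippets = []
    for i, word in enumerate(words):
        # Compare on letters/digits only so trailing punctuation still matches.
        if re.sub(r"\W+", "", word).lower() == target:
            start = max(0, i - width)
            snippets.append(" ".join(words[start:i + width + 1]))
    return snippets
```

It works, but it means scanning the whole preview for every result shown, which is exactly the cost that offset data from the index would avoid.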
asked 07 Feb '12, 11:41
Having the context available would be an excellent enhancement. There could be an option for the size of context required, or perhaps the size could be requested at query time, if that were not too inefficient, e.g.:
SELECT ID, ct.score, ct.context FROM MarketingInformation CONTAINS ( MarketingInformation.Description, 'stretch* | comfort*' ) AS ct ORDER BY ct.score DESC;
SELECT ID, ct.score, ct.context(20) FROM MarketingInformation CONTAINS ( MarketingInformation.Description, 'stretch* | comfort*' ) AS ct ORDER BY ct.score DESC;
for twenty words either side of the match(es). Clearly, with multiple matches ct.context(20) could be quite big, but that would be the application's problem to deal with.
answered 07 Feb '12, 14:33