Is there any function in Sybase to convert RTF text to plain text? We use SA17.

asked 16 Oct, 04:12

Rolle

FWIW, are you sure the RTF text does not contain embedded images, or could they be ignored during the conversion?

(19 Oct, 17:16) Volker Barth

Yes, it's only RTF text that I want to convert to plain text.

(20 Oct, 02:20) Rolle

I found this procedure and modified it a little. It almost works, but the line breaks are missing: they disappear from the output. Does anyone have an idea?

CREATE OR REPLACE FUNCTION GetTextFromRTF(in @LineIn long varchar)
returns long varchar
deterministic
begin
  declare @BraceCount integer = 0;  -- current nesting depth of { } groups
  declare @StartCount integer = 0;  -- position where the current text run starts
  declare @EndCount integer = 0;    -- position of the backslash that ends the run
  declare @LoopCount integer = 1;   -- string positions are 1-based
  declare @InText integer = 0;      -- 1 while inside a text run
  declare @S long varchar = '';     -- accumulated plain text

  while @LoopCount <= Length(@LineIn) loop
    -- track the group nesting depth
    if substring(@LineIn, @LoopCount, 1) = '{' then
      set @BraceCount = @BraceCount + 1;
    end if;
    if substring(@LineIn, @LoopCount, 1) = '}' then
      set @BraceCount = @BraceCount - 1;
    end if;
    -- a space at the top level starts a text run
    if(substring(@LineIn, @LoopCount, 1) = ' ') and
      (@BraceCount = 1) and
      (@InText = 0) then
      set @StartCount = @LoopCount + 1;
      set @InText = 1;
    end if;
    -- the next backslash at the top level ends the run
    if(substring(@LineIn, @LoopCount, 1) = '\\') and
      (@BraceCount = 1) and
      (@InText = 1) then
      set @EndCount = @LoopCount;
      set @S = @S + substring(@LineIn, @StartCount, @EndCount - @StartCount);
      set @InText = 0;
    end if;
    set @LoopCount = @LoopCount + 1;
  end loop;
  return(@S);
end
(26 Oct, 05:53) Rolle

I found the same function and modified it as you did. This version has two other modifications: one for paragraphs and one for characters in languages other than English, i.e. accented characters. I haven't tried it with Russian or anything, but it was fine for a customer with Danish, Norwegian and Swedish.

I don't remember why it has the bit at the top referencing rtf1 but see if it is of any use to you.

ALTER FUNCTION "spaceman"."GetTextFromRTF"(in @LineIn long varchar)
returns long varchar
deterministic
begin
  declare @BraceCount integer;
  declare @StartCount integer;
  declare @EndCount integer;
  declare @LoopCount integer;
  declare @SpecialPosition integer;
  declare @InText integer;
  declare @S long varchar;
  declare @SpecialText long varchar;
  declare @P1 integer;
  set @BraceCount=0;
  set @StartCount=0;
  set @EndCount=0;
  set @LoopCount=1;
  set @SpecialPosition=0;
  set @S='';
  set @SpecialText='';
  set @InText=0;

  -- input that does not start with the RTF header is returned unchanged
  if substr(@LineIn,1,6) != '{\rtf1' then
    return(@LineIn);
  else
    -- rename \pard first so the next replace only touches \par,
    -- then put a space after every \par so the following text starts a run
    set @LineIn = replace(@LineIn, '\pard','\perd');
    set @LineIn = replace(@LineIn, '\par','\par ');

    -- replace each \'hh escape (backslash, apostrophe, two hex digits) with its character
    while locate(@LineIn, string(char(92), char(39)))>0 loop
      set @SpecialPosition = locate(@LineIn, string(char(92), char(39)));
      if @SpecialPosition >=1 then
        set @SpecialText = substr(@LineIn, @SpecialPosition, 4);
        set @LineIn = replace(@LineIn, @SpecialText, char(hextoint(substr(@SpecialText, 3, 2))));
      end if;
    end loop;

    -- collect the text runs found at the top brace level
    while @LoopCount <= Length(@LineIn) loop
      if SUBSTRING(@LineIn,@LoopCount,1) = '{' then
        set @BraceCount=@BraceCount+1;
      end if;
      if SUBSTRING(@LineIn,@LoopCount,1) = '}' then
        set @BraceCount=@BraceCount-1;
      end if;
      if(SUBSTRING(@LineIn,@LoopCount,1) = ' ') and (@BraceCount = 1) and (@InText = 0) then
        set @StartCount=@LoopCount+1;
        set @InText=1;
      end if;
      if(SUBSTRING(@LineIn,@LoopCount,1) = '\') and (@BraceCount = 1) and (@InText = 1) then
        set @EndCount=@LoopCount;
        set @S=@S+SUBSTRING(@LineIn,@StartCount,@EndCount-@StartCount);
        set @InText=0;
      end if;
      set @LoopCount=@LoopCount+1;
    end loop;
    return(@S);
  end if;
end

(27 Oct, 08:14) RADicalSYS

No, SQL Anywhere does not have any functions to convert documents from one format to another. You will need to use some external application to do this type of operation.

answered 16 Oct, 08:27

Mark Culp

OK, thanks for your reply. I have googled and tried to find some help, but it seems to be sparse. I thought I might be able to do it with a procedure or something similar. Have you done this recently with an external application?

(16 Oct, 08:31) Rolle

Depending on the expected content of the rich text, there are several examples posted on the internet that use regular expressions that can do this for most rich text.

There are also several examples using Java and C# (.NET CLR). The C# version uses a MS RichTextBox control text property which contains the text of the control excluding the rich text format codes. You could then use the external environment feature to access the Java or .NET CLR code like a procedure.

(16 Oct, 10:30) Chris Keating
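
For simple RTF, the regular-expression approach mentioned in the comment above could look roughly like the following. This is only a minimal Java sketch, not code from this thread: the class name RtfRegexStripper and the method strip are hypothetical, \'hh escapes are assumed to be single Latin-1 (ISO-8859-1) bytes, and, like the SQL function posted earlier in the thread, it drops everything inside nested groups. Escaped braces (\{ and \}), Unicode escapes and embedded objects are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of SQL Anywhere or any library.
final class RtfRegexStripper {

    // \'hh escape: one byte, ASSUMED here to be ISO-8859-1 (Latin-1).
    private static final Pattern HEX_ESCAPE =
            Pattern.compile("\\\\'([0-9a-fA-F]{2})");

    static String strip(String rtf) {
        if (rtf == null) {
            return null;
        }
        String s = rtf.trim();
        if (!s.startsWith("{\\rtf")) {
            return rtf; // not RTF: return the input unchanged
        }
        // Drop the outermost braces; raw CR/LF in RTF source are not content.
        s = s.substring(1, s.endsWith("}") ? s.length() - 1 : s.length());
        s = s.replaceAll("[\\r\\n]", "");

        // Remove nested groups (font table, colour table, ...), innermost first.
        String previous;
        do {
            previous = s;
            s = s.replaceAll("\\{[^{}]*\\}", "");
        } while (!s.equals(previous));

        // \par and \line become line breaks; \pard is not matched here.
        s = s.replaceAll("\\\\(par|line)(?![a-zA-Z]) ?", "\n");

        // Decode \'hh escapes into characters.
        Matcher m = HEX_ESCAPE.matcher(s);
        StringBuffer decoded = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(decoded, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(decoded);

        // Remove the remaining control words, with their optional numeric
        // parameter and the single space that delimits them.
        return decoded.toString().replaceAll("\\\\[a-zA-Z]+-?\\d* ?", "");
    }
}
```

With this sketch, strip("{\\rtf1\\ansi Hello\\par World}") returns "Hello", a line break, then "World". Anything beyond such simple input is where the regular expressions start to fall short of a real parser.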

I would rather try to use SA to solve this. Where can I find an example using regular expressions? Or do you have any examples?

(17 Oct, 02:19) Rolle

Mark, do you have an example that solves this with regular expressions?

(18 Oct, 13:47) Rolle

FYI here is one version of the RTF specifications... are you sure you want regular expressions to deal with that?

(18 Oct, 15:13) Breck Carter

Thanks! I'm open to other suggestions if they work with SA17.

(18 Oct, 15:37) Rolle

As Breck clarified with the RTF spec and I hinted in my initial response, this will be difficult to achieve with regular expressions except in some specific cases. Most of the threads that I read online on this topic suggested that a parser would be the only real solution. I proposed taking advantage of Java or .NET because they have existing objects that can do the heavy lifting and can be used as an external environment in SQL Anywhere.

(18 Oct, 16:10) Chris Keating
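
As one possible illustration of the Java route, the JDK ships an RTF reader in Swing, javax.swing.text.rtf.RTFEditorKit, which can do the parsing. The sketch below is an assumption about how it might be used, not something posted in this thread: the class name RtfToText and the method convert are hypothetical, the input is assumed to be 8-bit RTF, and the SQL wrapper in the comment is untested.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical wrapper class. To load it into a database you would probably
// declare it public in its own file RtfToText.java.
//
// ASSUMED, untested SQL Anywhere wrapper for the Java external environment:
//   CREATE FUNCTION RtfToText(IN rtf LONG VARCHAR) RETURNS LONG VARCHAR
//   EXTERNAL NAME 'RtfToText.convert(Ljava/lang/String;)Ljava/lang/String;'
//   LANGUAGE JAVA;
final class RtfToText {

    // Parses the RTF with the JDK's RTFEditorKit and returns the document text.
    public static String convert(String rtf) {
        if (rtf == null) {
            return null;
        }
        try {
            RTFEditorKit kit = new RTFEditorKit();
            Document doc = kit.createDefaultDocument();
            // RTF is normally 7/8-bit text, so the bytes are passed through as Latin-1.
            byte[] bytes = rtf.getBytes(StandardCharsets.ISO_8859_1);
            kit.read(new ByteArrayInputStream(bytes), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalArgumentException("Could not parse RTF input", e);
        }
    }
}
```

Unlike a hand-written scan at the top brace level, the parser keeps text inside nested formatting groups such as {\b bold}. Embedded images and other objects are a separate matter and are not addressed by this sketch.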

Hm, I see a good option: textutil (BSD, macOS): http://osxdaily.com/2014/02/20/batch-convert-docx-to-txt-mac/ It is not easy to find a console library that converts from RTF to TXT (conversion to HTML is possible, but not to TXT).
I'd better close this question as "use 3rd-party tools".

(19 Oct, 05:12) Vlad
question asked: 16 Oct, 04:12

question was seen: 153 times

last updated: 27 Oct, 08:14