No, SQL Anywhere does not have any functions to convert documents from one format to another. You will need to use an external application for this type of operation.
OK, thanks for your reply. I have googled and tried to find some help, but it seemed to be sparse. I was hoping I could do it with a procedure or something similar. Have you done this recently with any external application?
(16 Oct '17, 08:31)
Rolle
Depending on the expected content of the rich text, there are several examples posted on the internet that use regular expressions and can handle most rich text. There are also several examples using Java and C# (.NET CLR). The C# version uses the Text property of a Microsoft RichTextBox control, which contains the control's text without the rich text format codes. You could then use the external environment feature to call the Java or .NET CLR code like a procedure.
(16 Oct '17, 10:30)
Chris Keating
I would rather try to use SA to solve this. Where can I find an example using regular expressions? Or do you have any examples?
(17 Oct '17, 02:19)
Rolle
Mark, do you have an example that solves this with regular expressions?
(18 Oct '17, 13:47)
Rolle
FYI here is one version of the RTF specifications... are you sure you want regular expressions to deal with that?
(18 Oct '17, 15:13)
Breck Carter
Thanks! I'm open to other suggestions if they work with SA17.
(18 Oct '17, 15:37)
Rolle
As Breck clarified with the RTF spec, and as I hinted in my initial response, this will be difficult to achieve with regular expressions except in some specific cases. Most of the threads I read online on this topic suggested that a parser would be the only real solution. I proposed taking advantage of Java or .NET because they have existing objects that can do the heavy lifting and can be used as an external environment in SQL Anywhere.
(18 Oct '17, 16:10)
Chris Keating
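As a rough illustration of the Java route mentioned above, here is a minimal sketch that lets the JDK's own RTF parser (`javax.swing.text.rtf.RTFEditorKit`) do the parsing and then reads back the plain text. The class name `RtfToText` and the method `toPlainText` are made up for this example, and `RTFEditorKit` does not cover every RTF feature, so treat it as a starting point rather than a finished converter.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; the names are illustrative only.
public class RtfToText {

    // Returns the plain text of an RTF string.
    // Input that does not look like RTF is returned unchanged.
    public static String toPlainText(String rtf)
            throws IOException, BadLocationException {
        if (rtf == null || !rtf.startsWith("{\\rtf")) {
            return rtf;
        }
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        // Parse the RTF into the document model; format codes are consumed here.
        kit.read(new StringReader(rtf), doc, 0);
        // Read back only the text content of the document.
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        String rtf = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}"
                   + "\\f0 Hello\\par World\\par }";
        System.out.println(toPlainText(rtf));
    }
}
```

To call something like this from SQL Anywhere, the compiled class would have to be installed in the database and wrapped in a function declared with `LANGUAGE JAVA`; the exact statements depend on your setup, so check the external environment documentation for your version.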
Hm, I see a good option: textutil (BSD, MacOS): http://osxdaily.com/2014/02/20/batch-convert-docx-to-txt-mac/
It is not easy to find a console tool that converts from RTF to TXT (converting to HTML is possible, but not to TXT).
(19 Oct '17, 05:12)
Vlad
FWIW, are you sure the RTF text does not contain embedded images, or could they be ignored during the conversion?
Yes, it's only RTF text that I want to convert to plain text.
I have found this procedure, which I modified a little. It almost works. What is missing is line breaks: they disappear. Does anyone have an idea?
I found the same function and modified it as you did. I have two other modifications in this one, related to paragraphs and to characters in languages other than English, i.e. accented characters. I haven't tried it with Russian or anything similar, but it was fine for a customer with Danish, Norwegian and Swedish.
I don't remember why it has the bit at the top referencing rtf1 but see if it is of any use to you.
ALTER FUNCTION "spaceman"."GetTextFromRTF"(in @LineIn long varchar)
returns long varchar
deterministic
begin
  declare @BraceCount integer;
  declare @StartCount integer;
  declare @EndCount integer;
  declare @LoopCount integer;
  declare @SpecialPosition integer;
  declare @InText integer;
  declare @S long varchar;
  declare @SpecialText long varchar;
  declare @P1 integer;
  set @BraceCount=0;
  set @StartCount=0;
  set @EndCount=0;
  set @LoopCount=1;
  set @SpecialPosition=0;
  set @S='';
  set @SpecialText='';
  set @InText=0;
  -- Input that does not start with an RTF header is returned unchanged.
  if substr(@LineIn,1,6) != '{\rtf1' then
    return(@LineIn)
  else
    -- Rename \pard first so the next replace only touches \par, then put a
    -- space after every \par so the scan below sees a text start there.
    set @LineIn = replace(@LineIn, '\pard','\perd');
    set @LineIn = replace(@LineIn, '\par','\par ');
    -- Replace each \'hh escape (e.g. accented characters) with the
    -- character that has that hex code.
    while locate(@LineIn, string(char(92), char(39)))>0 loop
      set @SpecialPosition = locate(@LineIn, string(char(92), char(39)));
      if @SpecialPosition >=1 then
        set @SpecialText = substr(@LineIn, @SpecialPosition, 4);
        set @LineIn = replace(@LineIn, @SpecialText, char(hextoint(substr(@SpecialText, 3, 2))));
      end if;
    end loop;
    -- Scan character by character. At brace depth 1, text starts after a
    -- space and ends at the next backslash; each such run is appended to @S.
    while @LoopCount <= Length(@LineIn) loop
      if SUBSTRING(@LineIn,@LoopCount,1) = '{' then
        set @BraceCount=@BraceCount+1
      end if;
      if SUBSTRING(@LineIn,@LoopCount,1) = '}' then
        set @BraceCount=@BraceCount-1
      end if;
      if (SUBSTRING(@LineIn,@LoopCount,1) = ' ') and (@BraceCount = 1) and (@InText = 0) then
        set @StartCount=@LoopCount+1;
        set @InText=1
      end if;
      if (SUBSTRING(@LineIn,@LoopCount,1) = '\') and (@BraceCount = 1) and (@InText = 1) then
        set @EndCount=@LoopCount;
        set @S=@S+SUBSTRING(@LineIn,@StartCount,@EndCount-@StartCount);
        set @InText=0
      end if;
      set @LoopCount=@LoopCount+1
    end loop;
    return(@S)
  end if;
end