<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="RSS_xslt_style.asp" version="1.0" ?>
<rss version="2.0" xmlns:WebWizForums="https://syndication.webwiz.net/rss_namespace/">
 <channel>
  <title>Codejock Developer Community : Manual: How to add UTF-8 support</title>
  <link>http://forum.codejock.com/</link>
  <description><![CDATA[This is an XML content feed of; Codejock Developer Community : Syntax Edit : Manual: How to add UTF-8 support]]></description>
  <copyright>Copyright (c) 2006-2013 Web Wiz Forums - All Rights Reserved.</copyright>
  <pubDate>Sat, 23 May 2026 03:48:12 +0000</pubDate>
  <lastBuildDate>Sun, 18 Jul 2010 00:28:31 +0000</lastBuildDate>
  <docs>http://blogs.law.harvard.edu/tech/rss</docs>
  <generator>Web Wiz Forums 12.04</generator>
  <ttl>360</ttl>
  <WebWizForums:feedURL>forum.codejock.com/RSS_post_feed.asp?TID=16970</WebWizForums:feedURL>
  <image>
   <title><![CDATA[Codejock Developer Community]]></title>
   <url>http://forum.codejock.com/forum_images/codejock-logo.gif</url>
   <link>http://forum.codejock.com/</link>
  </image>
  <item>
   <title><![CDATA[Manual: How to add UTF-8 support : Hello  I was very surprised...]]></title>
   <link>http://forum.codejock.com/forum_posts.asp?TID=16970&amp;PID=59373&amp;title=manual-how-to-add-utf8-support#59373</link>
   <description>
    <![CDATA[<strong>Author:</strong> <a href="http://forum.codejock.com/member_profile.asp?PF=6154">elmue</a><br /><strong>Subject:</strong> 16970<br /><strong>Posted:</strong> 18 July 2010 at 12:28am<br /><br />Hello<br><br>I was very surprised that the Syntax Edit control cannot display UTF-8 files!<br>It supports ANSI and Unicode (UTF-16) but not UTF-8, although UTF-8 is more widely used than UTF-16!<br><br>Adding UTF-8 support is not very difficult if your project is compiled as Unicode.<br>If you still use the MBCS compiler switch, some more work will be necessary due to some design flaws in the code.<br><br>___________________________________<br><br>Add this line to XTPSyntaxEditBufferManager.h:<br><br>&nbsp;&nbsp;&nbsp; BOOL IsUnicodeFile(CFile *pFile);<br>___________________________________<br><br><br>In XTPSyntaxEditBufferManager.cpp:<br><br>Convert the function<br>BOOL IsUnicodeFile(CFile *pFile)<br>into a member function and add the following bold lines:<br><br><br>BOOL CXTPSyntaxEditBufferManager::IsUnicodeFile(CFile *pFile)<br>{<br>&nbsp;&nbsp;&nbsp; pFile-&gt;SeekToBegin();<br><br>&nbsp;&nbsp;&nbsp; WORD wPrefix;<br>&nbsp;&nbsp;&nbsp; UINT uReaded = pFile-&gt;Read(&amp;wPrefix, 2);<br>&nbsp;&nbsp;&nbsp; if (uReaded == 2 &amp;&amp; wPrefix == 0xFEFF)<br>&nbsp;&nbsp;&nbsp; {<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; return TRUE;<br>&nbsp;&nbsp;&nbsp; }<br><br><b>&nbsp;&nbsp;&nbsp; // Check if the UTF-8 byte order mark exists (bytes EF BB BF, shown as "ï»¿" in an ANSI editor)<br>&nbsp;&nbsp;&nbsp; pFile-&gt;SeekToBegin();<br><br>&nbsp;&nbsp;&nbsp; BYTE u8_Buf&#091;3&#093;;<br>&nbsp;&nbsp;&nbsp; UINT u32_Read = pFile-&gt;Read(u8_Buf, 3);<br>&nbsp;&nbsp;&nbsp; if&nbsp; (u32_Read == 3 &amp;&amp; u8_Buf&#091;0&#093; == 0xEF &amp;&amp; u8_Buf&#091;1&#093; == 0xBB &amp;&amp; u8_Buf&#091;2&#093; == 0xBF)<br>&nbsp;&nbsp;&nbsp; {<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; m_nCodePage = CP_UTF8;<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; return FALSE;<br>&nbsp;&nbsp;&nbsp; 
}<br></b><br><br>_________________________________________________<br><br><br>Then in <br><br>void CXTPSyntaxEditBufferManager::SerializeEx() add the bold lines:<br><br><br>&nbsp;&nbsp;&nbsp; else if (ar.IsStoring())<br>&nbsp;&nbsp;&nbsp; {<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; if (bUnicode == -1)<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; bUnicode = m_bUnicodeFileFormat;<br><br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; if (bUnicode &amp;&amp; bWriteUnicodeFilePrefix)<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; {<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ar &lt;&lt; (BYTE)0xFF;<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ar &lt;&lt; (BYTE)0xFE;<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; }<br><br><b>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; if (!bUnicode &amp;&amp; m_nCodePage == CP_UTF8)<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; {<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ar &lt;&lt; (BYTE)0xEF;<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ar &lt;&lt; (BYTE)0xBB;<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; ar &lt;&lt; (BYTE)0xBF;<br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; }<br></b><br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; CByteArray arBuffer;<br><br>&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; int nCRLFStyle = GetCurCRLFType();<br><br>____________________________________________<br><br><br>And last but not least, in XTPSyntaxEditCtrl.cpp:<br><br>&nbsp;&nbsp;&nbsp; int nBytes = ::WideCharToMultiByte(uCodePage, 0, (LPWSTR)lpSource, -1, lpMBCSSource, nLen, NULL, NULL);<br><br><b>&nbsp;&nbsp;&nbsp; // <strike>ASSERT(nBytes &lt;= (int)dwBytes);</strike>&nbsp; // removed: Nonsense</b><br><br>&nbsp;&nbsp;&nbsp; lpMBCSSource&#091;nBytes&#093; = _T('\0');<br><br><br><br>This ASSERT is complete nonsense.<br>It is normal that a conversion from Unicode to a multibyte codepage such as UTF-8 can make the string longer.<br>There is no reason to check the length of the string and assert that it is shorter than the Unicode 
string.<br>In UTF-8 one character may take up to 4 bytes.<br><br>_______________________________________________<br><br>The buffer manager has several design flaws.<br>Instead of working with a lot of fixed-size buffers and calling WideCharToMultiByte several times <br>it would have been nicer to write a class that encapsulates this stuff and automatically converts from<br><br>Unicode --&gt; ANSI<br>ANSI --&gt; Unicode<br>UTF-8 --&gt; Unicode<br>Unicode --&gt; UTF-8<br>ANSI --&gt; UTF-8 <br>etc.<br>This class should also take care of allocating the buffer.<br><br>This would result in much cleaner code and fewer coding flaws.<br><br>Syntax highlighting stops working if the user enters lines longer than 128 characters.<br>Why do the Codejock programmers not use a dynamic buffer that is allocated once the file gets parsed?<br><br>I hope to see UTF-8 support in the next version.<br><br>Elmü<br><br>]]>
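    <![CDATA[<br>_______________________________________________<br><br>The two checks discussed above (detecting the byte order mark, and why a UTF-8 result can be longer than its UTF-16 source) can be tried outside the toolkit with a minimal, portable sketch. The names BomKind, DetectBom and Utf8Length are hypothetical and are not part of the Codejock sources; the sketch works on a plain byte buffer instead of a CFile and does not call WideCharToMultiByte.<br><br>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for illustration only; none of them exist in the Codejock sources.
enum class BomKind { None, Utf16LE, Utf8 };

// Inspect the first bytes of a file image for a byte order mark.
BomKind DetectBom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    // UTF-8 BOM: EF BB BF
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return BomKind::Utf8;

    // UTF-16 little-endian BOM: FF FE (reads as the WORD 0xFEFF on x86)
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return BomKind::Utf16LE;

    return BomKind::None;
}

// Count the bytes needed to store a UTF-16 string as UTF-8 (terminator not included).
std::size_t Utf8Length(const std::u16string& text)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        const char16_t c = text[i];
        if (c < 0x80)
        {
            bytes += 1;                      // ASCII
        }
        else if (c < 0x800)
        {
            bytes += 2;                      // e.g. Latin-1 supplement, Greek, Cyrillic
        }
        else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < text.size()
                 && text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF)
        {
            bytes += 4;                      // surrogate pair: one character, 4 bytes
            ++i;                             // skip the low surrogate
        }
        else
        {
            bytes += 3;                      // rest of the BMP; unpaired surrogates are counted as 3 here
        }
    }
    return bytes;
}
```

For example, the euro sign (U+20AC) occupies 2 bytes in UTF-16 but 3 bytes in UTF-8. Assuming dwBytes holds the byte count of the UTF-16 source, a check like nBytes &lt;= dwBytes can therefore fail on perfectly valid input.<br>]]>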
   </description>
   <pubDate>Sun, 18 Jul 2010 00:28:31 +0000</pubDate>
   <guid isPermaLink="true">http://forum.codejock.com/forum_posts.asp?TID=16970&amp;PID=59373&amp;title=manual-how-to-add-utf8-support#59373</guid>
  </item> 
 </channel>
</rss>