Manual: How to add UTF-8 support
elmue
Groupie | Joined: 05 June 2010 | Location: Germany
Posted: 18 July 2010 at 12:28am
Hello
I was very surprised that the Syntax Edit control is not able to show UTF-8 files! It supports ANSI and Unicode but not UTF-8, although UTF-8 is more widely used than Unicode. It is not very difficult to add UTF-8 support if your project is compiled for Unicode. If you still use the MBCS compiler switch, some more work will be necessary due to some design flaws in the code.

___________________________________

Add this line to XTPSyntaxEditBufferManager.h:

BOOL IsUnicodeFile(CFile *pFile);

___________________________________

In XTPSyntaxEditBufferManager.cpp, convert the function BOOL IsUnicodeFile(CFile *pFile) into a member function and add the bold lines:

BOOL CXTPSyntaxEditBufferManager::IsUnicodeFile(CFile *pFile)
{
    pFile->SeekToBegin();

    WORD wPrefix;
    UINT uReaded = pFile->Read(&wPrefix, 2);
    if (uReaded == 2 && wPrefix == 0xFEFF)
    {
        return TRUE;
    }

    // Check if the UTF-8 file identifier (BOM: EF BB BF) exists
    pFile->SeekToBegin();

    BYTE u8_Buf[3];
    UINT u32_Read = pFile->Read(u8_Buf, 3);
    if (u32_Read == 3 && u8_Buf[0] == 0xEF && u8_Buf[1] == 0xBB && u8_Buf[2] == 0xBF)
    {
        m_nCodePage = CP_UTF8;
        return FALSE;
    }

_________________________________________________

Then in void CXTPSyntaxEditBufferManager::SerializeEx() add the bold lines:

else if (ar.IsStoring())
{
    if (bUnicode == -1)
        bUnicode = m_bUnicodeFileFormat;

    if (bUnicode && bWriteUnicodeFilePrefix)
    {
        ar << (BYTE)0xFF;
        ar << (BYTE)0xFE;
    }

    if (!bUnicode && m_nCodePage == CP_UTF8)
    {
        ar << (BYTE)0xEF;
        ar << (BYTE)0xBB;
        ar << (BYTE)0xBF;
    }

    CByteArray arBuffer;
    int nCRLFStyle = GetCurCRLFType();

____________________________________________

And last but not least, in XTPSyntaxEditCtrl.cpp:

int nBytes = ::WideCharToMultiByte(uCodePage, 0, (LPWSTR)lpSource, -1, lpMBCSSource, nLen, NULL, NULL);
// lpMBCSSource[nBytes] = _T('\0');

The ASSERT in this code is complete nonsense. It is normal that a conversion from Unicode to another code page makes the string longer. There is no reason to check the length of the converted string and assert that it is not longer than the Unicode string: in UTF-8 one character may be represented by up to 4 bytes.

_______________________________________________

The buffer manager has several design flaws. Instead of working with a lot of fixed-size buffers and calling WideCharToMultiByte several times, it would have been nicer to write a class that encapsulates this stuff and automatically converts

Unicode --> ANSI
ANSI --> Unicode
UTF-8 --> Unicode
Unicode --> UTF-8
ANSI --> UTF-8
etc.

and this class should take care of allocating the buffer itself (a minimal sketch of such a class follows below). This would result in much cleaner code and fewer coding flaws.

The syntax highlighting also stops working if the user enters lines that are longer than 128 characters. Why do the Codejock programmers not use a dynamic buffer that is allocated once the file gets parsed?

I hope to see UTF-8 support in the next version.

Elmü
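For illustration, here is a minimal sketch of the kind of conversion helper described above. This is my own sketch, not Codejock code: the class and function names are made up, and it uses std::string/std::wstring instead of the library's buffers. The point is that the destination buffer is allocated dynamically by calling the Win32 conversion functions twice (first to query the required size), so no fixed-size buffers are needed.

#include <windows.h>
#include <string>

// Hypothetical helper class (not part of the Codejock sources).
// Converts between UTF-16 ("Unicode"), ANSI and UTF-8 and allocates
// the destination buffer dynamically.
class CTextConverter
{
public:
    // UTF-16 --> multi-byte (ANSI or UTF-8, depending on nCodePage)
    static std::string WideToMulti(const std::wstring& sWide, UINT nCodePage)
    {
        if (sWide.empty())
            return std::string();

        // First call: ask how many bytes the converted string needs.
        int nBytes = ::WideCharToMultiByte(nCodePage, 0, sWide.c_str(), (int)sWide.size(),
                                           NULL, 0, NULL, NULL);
        std::string sMulti(nBytes, '\0');

        // Second call: convert into the dynamically sized buffer.
        ::WideCharToMultiByte(nCodePage, 0, sWide.c_str(), (int)sWide.size(),
                              &sMulti[0], nBytes, NULL, NULL);
        return sMulti;
    }

    // multi-byte (ANSI or UTF-8) --> UTF-16
    static std::wstring MultiToWide(const std::string& sMulti, UINT nCodePage)
    {
        if (sMulti.empty())
            return std::wstring();

        int nChars = ::MultiByteToWideChar(nCodePage, 0, sMulti.c_str(), (int)sMulti.size(),
                                           NULL, 0);
        std::wstring sWide(nChars, L'\0');

        ::MultiByteToWideChar(nCodePage, 0, sMulti.c_str(), (int)sMulti.size(),
                              &sWide[0], nChars);
        return sWide;
    }

    // ANSI --> UTF-8 (and the reverse) simply go through UTF-16.
    static std::string AnsiToUtf8(const std::string& sAnsi)
    {
        return WideToMulti(MultiToWide(sAnsi, CP_ACP), CP_UTF8);
    }
};

Example: CTextConverter::WideToMulti(L"Elmü", CP_UTF8) returns a 5-byte string for 4 input characters, because the 'ü' needs two bytes in UTF-8 -- exactly the case where an "output must not be longer than input" assert would fire.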