Hello
I was very surprised that the Syntax Edit control is not able to show UTF-8 files!
It supports ANSI and Unicode (UTF-16) but not UTF-8, although UTF-8 is more widely used than UTF-16!
It is not very difficult to add UTF-8 support if your project is compiled as Unicode.
If you still use the MBCS compiler switch, more work will be necessary due to some design flaws in the code.
___________________________________
Add this declaration to the CXTPSyntaxEditBufferManager class in XTPSyntaxEditBufferManager.h:
BOOL IsUnicodeFile(CFile *pFile);
___________________________________
In XTPSyntaxEditBufferManager.cpp, convert the function BOOL IsUnicodeFile(CFile *pFile) into a member function and add the lines marked "// added":

BOOL CXTPSyntaxEditBufferManager::IsUnicodeFile(CFile *pFile)
{
    pFile->SeekToBegin();

    WORD wPrefix;
    UINT uReaded = pFile->Read(&wPrefix, 2);
    if (uReaded == 2 && wPrefix == 0xFEFF)
    {
        return TRUE;
    }

    // added: check whether the UTF-8 byte order mark (EF BB BF) exists
    pFile->SeekToBegin();

    BYTE u8_Buf[3];
    UINT u32_Read = pFile->Read(u8_Buf, 3);
    if (u32_Read == 3 && u8_Buf[0] == 0xEF && u8_Buf[1] == 0xBB && u8_Buf[2] == 0xBF)
    {
        m_nCodePage = CP_UTF8;
        return FALSE;   // not UTF-16: load as multibyte text using code page UTF-8
    }

    // ... remainder of the original function ...
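The BOM check above can also be tried outside MFC. The following is a minimal portable sketch, assuming the file content is already in memory; the names DetectBom, BomLength and TextEncoding are hypothetical and not part of the Codejock library. It mirrors the patched IsUnicodeFile(): the bytes FF FE (read as the little-endian WORD 0xFEFF) mean UTF-16, EF BB BF means UTF-8, and anything else is treated as ANSI.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical enum, not part of the Codejock API.
enum class TextEncoding { Ansi, Utf16LE, Utf8 };

// Classifies a buffer by its leading byte order mark (BOM).
// Buffers without a BOM are reported as Ansi, like the patched IsUnicodeFile().
TextEncoding DetectBom(const std::uint8_t* data, std::size_t size)
{
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return TextEncoding::Utf16LE;   // FF FE = UTF-16 little endian
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return TextEncoding::Utf8;      // EF BB BF = UTF-8
    return TextEncoding::Ansi;
}

// Number of BOM bytes that precede the actual text.
std::size_t BomLength(TextEncoding enc)
{
    switch (enc)
    {
    case TextEncoding::Utf16LE: return 2;
    case TextEncoding::Utf8:    return 3;
    default:                    return 0;
    }
}
```

BomLength() gives the number of bytes to skip so that the BOM itself is not treated as part of the text.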
_________________________________________________
Then add the lines marked "// added" in void CXTPSyntaxEditBufferManager::SerializeEx():

else if (ar.IsStoring())
{
    if (bUnicode == -1)
        bUnicode = m_bUnicodeFileFormat;

    if (bUnicode && bWriteUnicodeFilePrefix)
    {
        ar << (BYTE)0xFF;   // UTF-16 LE byte order mark
        ar << (BYTE)0xFE;
    }

    // added: write the UTF-8 byte order mark
    if (!bUnicode && m_nCodePage == CP_UTF8)
    {
        ar << (BYTE)0xEF;
        ar << (BYTE)0xBB;
        ar << (BYTE)0xBF;
    }

    CByteArray arBuffer;
    int nCRLFStyle = GetCurCRLFType();
    // ...
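For comparison, here is what the storing branch is meant to produce, as a minimal portable sketch outside MFC: the three bytes EF BB BF followed by the UTF-8 text. The function name MakeUtf8FileBytes and its writeBom flag are hypothetical; the flag plays a role similar to the existing bWriteUnicodeFilePrefix parameter, which can matter because some tools do not expect a BOM in UTF-8 files.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the bytes of a UTF-8 file.
// 'utf8Text' is assumed to be UTF-8 encoded already.
std::vector<unsigned char> MakeUtf8FileBytes(const std::string& utf8Text, bool writeBom)
{
    std::vector<unsigned char> out;
    out.reserve(utf8Text.size() + 3);
    if (writeBom)
    {
        // The same three bytes the patched SerializeEx() writes.
        out.push_back(0xEF);
        out.push_back(0xBB);
        out.push_back(0xBF);
    }
    out.insert(out.end(), utf8Text.begin(), utf8Text.end());
    return out;
}
```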
____________________________________________
And last but not least, in XTPSyntaxEditCtrl.cpp:
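int nBytes = ::WideCharToMultiByte(uCodePage, 0, (LPWSTR)lpSource, -1, lpMBCSSource, nLen, NULL, NULL);
// ASSERT(nBytes <= (int)dwBytes); // removed: nonsense, see below
// Note: with cchWideChar == -1 the returned byte count already includes the terminating null,
// so the following write lands one byte after it and needs room in the buffer.
lpMBCSSource[nBytes] = _T('\0');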
This ASSERT is complete nonsense. It is normal that converting from Unicode to a multibyte code page yields more bytes than the source string has characters, so there is no reason to assert that the converted string is shorter than the Unicode string. In UTF-8 one character may be represented by up to 4 bytes.
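A small sketch makes the size growth concrete. The hypothetical helper AppendUtf8 below encodes one code point using the standard UTF-8 rules and returns the number of bytes written: 1 byte up to U+007F, 2 up to U+07FF, 3 up to U+FFFF and 4 above that. A character such as € (U+20AC) takes 2 bytes in UTF-16 but 3 bytes in UTF-8.

```cpp
#include <string>

// Hypothetical helper: appends the UTF-8 encoding of one code point to 'out'
// and returns how many bytes were written (1 to 4).
// Assumes 'cp' is a valid scalar value (<= 0x10FFFF and not a surrogate).
int AppendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
        return 1;
    }
    if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
        return 3;
    }
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return 4;
}
```

Since a surrogate pair (2 UTF-16 units) becomes 4 UTF-8 bytes and every other UTF-16 unit at most 3, a target buffer of 3 bytes per UTF-16 unit plus the terminator is always large enough.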
_______________________________________________
The buffer manager has several design flaws. Instead of working with lots of fixed-size buffers and calling WideCharToMultiByte several times, it would have been nicer to write a class that encapsulates this and automatically converts
Unicode --> ANSI
ANSI --> Unicode
UTF-8 --> Unicode
Unicode --> UTF-8
ANSI --> UTF-8
etc. This class should also take care of allocating the buffers.
This would result in much cleaner code and fewer bugs.
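The converter class suggested above could look roughly like the following portable C++ sketch. The class name Utf8Converter and its methods are hypothetical, and it covers only UTF-16 <-> UTF-8 (UTF-16 being what Windows calls "Unicode"). ANSI conversions depend on the code page and would, on Windows, go through MultiByteToWideChar / WideCharToMultiByte; with those APIs the usual pattern is to call them once with an output size of 0 to get the required buffer size, allocate that, and call again.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical converter class (not part of the Codejock library).
// Results live in std::string / std::u16string, which allocate exactly
// as much memory as needed, so callers never manage fixed-size buffers.
class Utf8Converter {
public:
    // UTF-16 -> UTF-8. Unpaired surrogates are replaced with U+FFFD.
    static std::string ToUtf8(const std::u16string& in) {
        std::string out;
        out.reserve(in.size() * 3);   // upper bound: 3 bytes per UTF-16 unit
        for (std::size_t i = 0; i < in.size(); ++i) {
            char32_t cp = in[i];
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
                in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // High + low surrogate -> one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = 0xFFFD;
            }
            AppendUtf8(out, cp);
        }
        return out;
    }

    // UTF-8 -> UTF-16. Throws std::runtime_error on invalid lead bytes,
    // truncated sequences, bad continuation bytes or values above U+10FFFF.
    // Overlong encodings are not rejected in this sketch.
    static std::u16string ToUtf16(const std::string& in) {
        std::u16string out;
        out.reserve(in.size());   // never more UTF-16 units than UTF-8 bytes
        std::size_t i = 0;
        while (i < in.size()) {
            const unsigned char lead = static_cast<unsigned char>(in[i]);
            const std::size_t len = lead < 0x80 ? 1
                                  : (lead >> 5) == 0x06 ? 2
                                  : (lead >> 4) == 0x0E ? 3
                                  : (lead >> 3) == 0x1E ? 4 : 0;
            if (len == 0 || i + len > in.size())
                throw std::runtime_error("malformed UTF-8");

            char32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
            for (std::size_t k = 1; k < len; ++k) {
                const unsigned char c = static_cast<unsigned char>(in[i + k]);
                if ((c & 0xC0) != 0x80)
                    throw std::runtime_error("malformed UTF-8");
                cp = (cp << 6) | (c & 0x3F);
            }
            if (cp > 0x10FFFF)
                throw std::runtime_error("code point out of range");
            i += len;

            if (cp >= 0x10000) {   // needs a surrogate pair
                cp -= 0x10000;
                out += static_cast<char16_t>(0xD800 + (cp >> 10));
                out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            } else {
                out += static_cast<char16_t>(cp);
            }
        }
        return out;
    }

private:
    static void AppendUtf8(std::string& out, char32_t cp) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};
```

The design choice here is that the class owns all allocation: callers pass and receive string objects and never size a buffer themselves.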
The syntax highlighting stops working if the user enters lines longer than 128 characters. Why don't the Codejock programmers use a dynamic buffer that is allocated once the file gets parsed?
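What a dynamic buffer means in practice can be shown with a minimal sketch, assuming the text is held in a std::wstring. The hypothetical SplitLines() stores each line in its own std::wstring, which grows as needed, so a 1000-character line is handled like a 10-character one. This does not reflect how the Codejock parser is organized internally; it only illustrates the alternative to a fixed 128-character buffer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits text into lines with no per-line length limit.
// Handles "\r\n", "\n" and "\r" line endings.
std::vector<std::wstring> SplitLines(const std::wstring& text)
{
    std::vector<std::wstring> lines;
    std::wstring current;   // grows to whatever length the line needs
    for (std::size_t i = 0; i < text.size(); ++i) {
        const wchar_t ch = text[i];
        if (ch == L'\r' || ch == L'\n') {
            lines.push_back(current);
            current.clear();
            if (ch == L'\r' && i + 1 < text.size() && text[i + 1] == L'\n')
                ++i;        // treat "\r\n" as a single line break
        } else {
            current += ch;
        }
    }
    lines.push_back(current);   // last line (possibly empty)
    return lines;
}
```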
I hope to see UTF-8 support in the next version.
Elmü