
Manual: How to add UTF-8 support

elmue
Groupie
Joined: 05 June 2010
Location: Germany
Status: Offline
Points: 24
    Posted: 18 July 2010 at 12:28am
Hello

I was very surprised that the Syntax Edit control is not able to show UTF-8 files!
It supports ANSI and Unicode (UTF-16), but not UTF-8, although UTF-8 is more widely used than UTF-16!

It is not very difficult to add UTF-8 support if your project is compiled as Unicode.
If you still use the MBCS compiler switch, some more work will be necessary due to some design flaws in the code.

___________________________________

Add this line to XTPSyntaxEditBufferManager.h (in the class CXTPSyntaxEditBufferManager):

    BOOL IsUnicodeFile(CFile *pFile);
___________________________________


In XTPSyntaxEditBufferManager.cpp:

Convert the function BOOL IsUnicodeFile(CFile *pFile) into a member function and add the lines that check for the UTF-8 byte-order mark (marked "// New" below):


BOOL CXTPSyntaxEditBufferManager::IsUnicodeFile(CFile *pFile)
{
    pFile->SeekToBegin();

    // Existing check: UTF-16 LE byte-order mark (bytes FF FE)
    WORD wPrefix;
    UINT uReaded = pFile->Read(&wPrefix, 2);
    if (uReaded == 2 && wPrefix == 0xFEFF)
    {
        return TRUE;
    }

    // New: check if the UTF-8 byte-order mark (bytes EF BB BF) exists
    pFile->SeekToBegin();

    BYTE u8_Buf[3];
    UINT u32_Read = pFile->Read(u8_Buf, 3);
    if  (u32_Read == 3 && u8_Buf[0] == 0xEF && u8_Buf[1] == 0xBB && u8_Buf[2] == 0xBF)
    {
        m_nCodePage = CP_UTF8;
        return FALSE;   // not UTF-16, but the code page is now CP_UTF8
    }

    // No BOM found: rewind and treat the file as plain ANSI (rest of the function unchanged)
    pFile->SeekToBegin();
    return FALSE;
}


_________________________________________________


Then in void CXTPSyntaxEditBufferManager::SerializeEx() add the lines that write the UTF-8 byte-order mark when storing (marked "// New" below):


    else if (ar.IsStoring())
    {
        if (bUnicode == -1)
            bUnicode = m_bUnicodeFileFormat;

        // Existing code: write the UTF-16 LE byte-order mark (FF FE)
        if (bUnicode && bWriteUnicodeFilePrefix)
        {
            ar << (BYTE)0xFF;
            ar << (BYTE)0xFE;
        }

        // New: write the UTF-8 byte-order mark (EF BB BF)
        if (!bUnicode && m_nCodePage == CP_UTF8)
        {
            ar << (BYTE)0xEF;
            ar << (BYTE)0xBB;
            ar << (BYTE)0xBF;
        }

        CByteArray arBuffer;

        int nCRLFStyle = GetCurCRLFType();

____________________________________________


And last but not least, in XTPSyntaxEditCtrl.cpp:

    int nBytes = ::WideCharToMultiByte(uCodePage, 0, (LPWSTR)lpSource, -1, lpMBCSSource, nLen, NULL, NULL);

    // ASSERT(nBytes <= (int)dwBytes);  // removed: Nonsense

    lpMBCSSource[nBytes] = _T('\0');



This ASSERT is complete nonsense.
It is normal that a conversion from Unicode (UTF-16) to another code page makes the string longer in bytes.
There is no reason to assert that the converted string is not longer than the Unicode source.
In UTF-8 one character may be represented by up to 4 bytes; the euro sign '€' alone is one WCHAR in UTF-16 but three bytes in UTF-8.
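
A small standalone demo of the effect (not Codejock code; passing 0 as the destination buffer size makes WideCharToMultiByte return the required byte count, which is also the clean way to size such a buffer instead of asserting):

#include <windows.h>
#include <stdio.h>

int main()
{
    const WCHAR* pwsz = L"a\u00E4\u20AC";   // 'a', 'ä', '€' -> 3 characters, 3 WCHARs
    // First call with destination size 0: returns the required number of bytes.
    int nBytes = ::WideCharToMultiByte(CP_UTF8, 0, pwsz, -1, NULL, 0, NULL, NULL);
    printf("3 characters -> %d UTF-8 bytes (including the terminating zero)\n", nBytes);
    // prints 7: 1 byte for 'a' + 2 bytes for 'ä' + 3 bytes for '€' + 1 for '\0'
    return 0;
}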

_______________________________________________

The buffer manager has several design flaws.
Instead of working with a lot of fixed-size buffers and calling WideCharToMultiByte several times,
it would have been nicer to write a class that encapsulates this stuff and automatically converts

Unicode --> ANSI
ANSI --> Unicode
UTF8 --> Unicode
Unicode --> UTF8
ANSI --> UTF8
etc.
and this class should take care of allocating the buffer itself (see the sketch below).

This would result in much cleaner code and fewer coding flaws.
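
As an illustration only - the class name CTextConverter and its methods are my own suggestion, nothing that exists in the toolkit - such a helper could look like this:

// Illustration only (invented names): lets Windows report the required size,
// allocates the buffer itself and returns a CString, so no caller ever has
// to guess a buffer size again.
#include <windows.h>
#include <atlstr.h>   // CStringA / CStringW (also available in any MFC project)

class CTextConverter
{
public:
    // ANSI or UTF-8 (any code page) --> Unicode (UTF-16)
    static CStringW ToWide(LPCSTR pszSrc, UINT nCodePage)
    {
        int nChars = ::MultiByteToWideChar(nCodePage, 0, pszSrc, -1, NULL, 0);
        CStringW s;
        if (nChars > 0)
        {
            ::MultiByteToWideChar(nCodePage, 0, pszSrc, -1, s.GetBuffer(nChars), nChars);
            s.ReleaseBuffer();
        }
        return s;
    }

    // Unicode (UTF-16) --> ANSI or UTF-8 (any code page)
    static CStringA ToMultiByte(LPCWSTR pszSrc, UINT nCodePage)
    {
        int nBytes = ::WideCharToMultiByte(nCodePage, 0, pszSrc, -1, NULL, 0, NULL, NULL);
        CStringA s;
        if (nBytes > 0)
        {
            ::WideCharToMultiByte(nCodePage, 0, pszSrc, -1, s.GetBuffer(nBytes), nBytes, NULL, NULL);
            s.ReleaseBuffer();
        }
        return s;
    }

    // ANSI --> UTF-8 simply goes through UTF-16
    static CStringA AnsiToUtf8(LPCSTR pszSrc)
    {
        return ToMultiByte(ToWide(pszSrc, CP_ACP), CP_UTF8);
    }
};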

Syntax highlighting will no longer work if the user enters lines that are longer than 128 characters.
Why don't the Codejock programmers use a dynamic buffer that is allocated once the file gets parsed?
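
A dynamic buffer does not even need a new allocation per line; it only has to grow when a longer line shows up. Again only a sketch with invented names, not Codejock code:

// Sketch (invented names): one reusable buffer that grows on demand,
// instead of a fixed 128 byte array on the stack.
#include <windows.h>
#include <vector>

class CGrowingLineBuffer
{
public:
    // Converts one UTF-16 line into the given code page; the buffer is kept
    // between calls and only reallocated when a longer line is encountered.
    LPCSTR Convert(LPCWSTR pszLine, UINT nCodePage)
    {
        int nBytes = ::WideCharToMultiByte(nCodePage, 0, pszLine, -1, NULL, 0, NULL, NULL);
        if (nBytes <= 0)
            return "";

        if (nBytes > (int)m_buf.size())
            m_buf.resize(nBytes);               // grow only when necessary

        ::WideCharToMultiByte(nCodePage, 0, pszLine, -1, &m_buf[0], (int)m_buf.size(), NULL, NULL);
        return &m_buf[0];
    }

private:
    std::vector<char> m_buf;
};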

I hope to see UTF8 support in the next version.

Elmü
