Sai A
Updated date Mar 12, 2024
In this blog, we will learn how to convert strings to UTF-8 encoding in C++. Explore methods using both standard C++ libraries and the ICU library, ensuring compatibility and proper representation of text data.

Introduction:

Handling character encodings correctly matters whenever a program reads, stores, or exchanges text. One of the most common encodings is UTF-8, a variable-width scheme that represents every Unicode code point in one to four bytes and stays backward compatible with ASCII. In this blog, we will walk through converting a string to UTF-8 in C++, exploring two methods along the way.

Method 1: Using std::wstring_convert

#include <iostream>
#include <codecvt>
#include <string>

int main() {
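    // Note: std::wstring_convert and <codecvt> are deprecated since C++17.
    // On Windows (16-bit wchar_t), prefer std::codecvt_utf8_utf16<wchar_t>.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;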
    std::wstring wide_string = L"Hello, 你好";
    std::string utf8_string = converter.to_bytes(wide_string);

    std::cout << "UTF-8 String: " << utf8_string << std::endl;

    return 0;
}

Output:

UTF-8 String: Hello, 你好

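Here, we use std::wstring_convert with std::codecvt_utf8 to perform the conversion. The to_bytes function converts the wide character string (std::wstring) to a UTF-8 encoded string (std::string) and throws std::range_error if the conversion fails. Keep in mind that std::wstring_convert and the <codecvt> header have been deprecated since C++17, so compilers may emit warnings. Also, on Windows, where wchar_t is 16 bits wide, std::codecvt_utf8<wchar_t> cannot handle characters outside the Basic Multilingual Plane; std::codecvt_utf8_utf16<wchar_t> is the appropriate facet there.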

Method 2: Using ICU Library

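#include <iostream>
#include <string>
#include <unicode/ucnv.h>

int main() {
    UErrorCode status = U_ZERO_ERROR;
    UConverter *converter = ucnv_open("UTF-8", &status);
    if (U_FAILURE(status)) {
        std::cerr << "ucnv_open failed: " << u_errorName(status) << std::endl;
        return 1;
    }

    // ICU's UChar is a UTF-16 code unit, so start from a UTF-16 string.
    // (wchar_t is 32 bits wide on Linux and macOS and must not be cast to UChar.)
    std::u16string utf16_string = u"Hello, 你好";
    const UChar *source = reinterpret_cast<const UChar*>(utf16_string.data());
    const int32_t source_length = static_cast<int32_t>(utf16_string.size());
    std::string utf8_string;

    // Preflight: with a zero-capacity buffer, ICU returns the required length
    // and sets status to U_BUFFER_OVERFLOW_ERROR, which is expected here.
    int32_t target_length = ucnv_fromUChars(converter, nullptr, 0,
                                             source, source_length, &status);
    if (status == U_BUFFER_OVERFLOW_ERROR) {
        utf8_string.resize(target_length);
        status = U_ZERO_ERROR;
        ucnv_fromUChars(converter, &utf8_string[0], target_length,
                        source, source_length, &status);
    }

    if (U_SUCCESS(status)) {
        std::cout << "UTF-8 String: " << utf8_string << std::endl;
    } else {
        std::cerr << "Conversion failed: " << u_errorName(status) << std::endl;
    }

    ucnv_close(converter);

    return U_SUCCESS(status) ? 0 : 1;
}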

Output:

UTF-8 String: Hello, 你好

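In this method, we use the ICU (International Components for Unicode) library. We first open a UTF-8 converter using ucnv_open, then call ucnv_fromUChars twice. The first call, with a null buffer and zero capacity, only measures the output: it returns the required length and sets the status to U_BUFFER_OVERFLOW_ERROR, which is expected rather than a real failure. The second call performs the conversion into a buffer of that size. Note that ucnv_fromUChars expects UTF-16 code units (UChar), so the input must be UTF-16; a std::wstring can only be reinterpreted as UChar on platforms where wchar_t is 16 bits wide, such as Windows.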

Conclusion:

In this blog, we have discussed two methods for converting a string to UTF-8 in C++. std::wstring_convert needs no external dependencies but has been deprecated since C++17, while ICU adds a library dependency in exchange for an actively maintained, full-featured conversion API. Which one fits depends on the project's requirements. Understanding both makes it easier to handle text correctly across platforms and applications.
