Introduction:
In programming, handling character encodings correctly is important whenever you work with strings. The most widely used encoding is UTF-8, a variable-length encoding of Unicode that stores each code point in one to four bytes and represents ASCII characters as single bytes. In this blog, we will look at how to convert a wide-character string to a UTF-8 encoded std::string in C++, using two different methods.
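To make "variable-length" concrete, here is a small sketch that computes how many bytes UTF-8 needs for each code point. The helper names utf8_length and utf8_size are hypothetical, and the code assumes its input contains only valid Unicode scalar values (no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>
#include <string>

// Number of bytes UTF-8 uses for one code point.
// Assumes cp is a valid Unicode scalar value.
std::size_t utf8_length(char32_t cp) {
    if (cp < 0x80) return 1;     // U+0000..U+007F (ASCII)
    if (cp < 0x800) return 2;    // U+0080..U+07FF
    if (cp < 0x10000) return 3;  // U+0800..U+FFFF (includes common CJK)
    return 4;                    // U+10000..U+10FFFF
}

// Total UTF-8 size of a UTF-32 string (one code point per element).
std::size_t utf8_size(const std::u32string &text) {
    std::size_t total = 0;
    for (char32_t cp : text) total += utf8_length(cp);
    return total;
}
```

For U"Hello, 你好", the seven ASCII characters ("Hello, " including the space) take one byte each and the two Chinese characters take three bytes each, so utf8_size returns 13.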
Method 1: Using std::wstring_convert
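#include <codecvt>
#include <iostream>
#include <locale>  // std::wstring_convert
#include <string>
int main() {
    // Deprecated since C++17, but still shipped by most standard libraries.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    std::wstring wide_string = L"Hello, 你好";
    // to_bytes() encodes the wide string as UTF-8; it throws std::range_error on failure.
    std::string utf8_string = converter.to_bytes(wide_string);
    std::cout << "UTF-8 String: " << utf8_string << std::endl;
    return 0;
}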
Output:
UTF-8 String: Hello, 你好
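Here, we use std::wstring_convert together with the std::codecvt_utf8 facet to perform the conversion. The to_bytes function converts the wide-character string (std::wstring) to a UTF-8 encoded string (std::string). Keep in mind that std::wstring_convert and std::codecvt_utf8 have been deprecated since C++17 and were removed in C++26. Also, on Windows, where wchar_t is 16 bits wide, std::codecvt_utf8<wchar_t> treats the input as UCS-2, so characters outside the Basic Multilingual Plane (such as most emoji) need std::codecvt_utf8_utf16<wchar_t> instead.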
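Because std::wstring_convert is deprecated, it can be useful to see how little code a UTF-8 encoder actually needs. The following is a minimal sketch of a hand-written encoder, a swapped-in technique rather than a standard library facility. It assumes the input is a std::u32string, where each element is already one Unicode code point; the helper names append_utf8 and to_utf8 are hypothetical.

```cpp
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of one code point to out.
// Throws std::invalid_argument for surrogates and values above U+10FFFF.
void append_utf8(std::string &out, char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a UTF-32 string to a UTF-8 encoded std::string.
std::string to_utf8(const std::u32string &text) {
    std::string out;
    out.reserve(text.size());  // at least one byte per code point
    for (char32_t cp : text) append_utf8(out, cp);
    return out;
}
```

Calling to_utf8(U"Hello, 你好") yields the same UTF-8 bytes as Method 1 produces for that text. On Windows, a std::wstring holds UTF-16, so its surrogate pairs would first have to be combined into code points before using this helper.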
Method 2: Using the ICU Library
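#include <iostream>
#include <string>
#include <unicode/ucnv.h>  // ICU converter API; link with -licuuc
int main() {
    UErrorCode status = U_ZERO_ERROR;
    UConverter *converter = ucnv_open("UTF-8", &status);
    if (U_FAILURE(status)) {
        std::cerr << "ucnv_open failed: " << u_errorName(status) << std::endl;
        return 1;
    }
    // UChar is a UTF-16 code unit, so use a UTF-16 string rather than
    // std::wstring (wchar_t is 32 bits wide on Linux and macOS).
    std::u16string utf16_string = u"Hello, 你好";
    const UChar *source = reinterpret_cast<const UChar*>(utf16_string.c_str());
    std::string utf8_string;
    // Preflight: with no buffer, ICU returns the required length and sets
    // status to U_BUFFER_OVERFLOW_ERROR, which is expected here.
    int32_t target_length = ucnv_fromUChars(converter, nullptr, 0, source, -1, &status);
    if (status == U_BUFFER_OVERFLOW_ERROR || U_SUCCESS(status)) {
        status = U_ZERO_ERROR;
        utf8_string.resize(target_length);
        // Second pass writes the UTF-8 bytes; std::string supplies its own terminator.
        ucnv_fromUChars(converter, &utf8_string[0], target_length, source, -1, &status);
    }
    if (U_SUCCESS(status)) {
        std::cout << "UTF-8 String: " << utf8_string << std::endl;
    } else {
        std::cerr << "Conversion failed: " << u_errorName(status) << std::endl;
    }
    ucnv_close(converter);
    return 0;
}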
Output:
UTF-8 String: Hello, 你好
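In this method, we use the ICU (International Components for Unicode) library. We first open a UTF-8 converter with ucnv_open, then call ucnv_fromUChars twice: once with an empty buffer to learn the required length (ICU reports this together with U_BUFFER_OVERFLOW_ERROR, which is expected rather than a real failure), and once to write the UTF-8 bytes into the string. Note that ucnv_fromUChars expects UTF-16 input (UChar), such as the contents of a std::u16string. Reinterpreting a std::wstring as UChar only works where wchar_t is 16 bits wide (Windows); on Linux and macOS, wchar_t is 32 bits, so the cast hands the converter the wrong data. The program must also be linked against ICU's common library, for example with -licuuc.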
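Because ucnv_fromUChars consumes UTF-16, a character outside the Basic Multilingual Plane arrives as a surrogate pair that has to be combined into a single code point before it can be encoded as UTF-8. The sketch below does this by hand purely to illustrate the idea; it is not how ICU is implemented, the helper names append_utf8 and utf16_to_utf8 are hypothetical, and unlike ICU (which handles malformed input through configurable error callbacks) it simply throws on an unpaired surrogate.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of a valid code point to out.
void append_utf8(std::string &out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 (the encoding of ICU's UChar) to UTF-8.
// Surrogate pairs are combined into one code point before encoding.
std::string utf16_to_utf8(const std::u16string &text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t unit = text[i];
        if (unit >= 0xD800 && unit <= 0xDBFF) {  // high (leading) surrogate
            if (i + 1 >= text.size() || text[i + 1] < 0xDC00 || text[i + 1] > 0xDFFF) {
                throw std::invalid_argument("unpaired high surrogate");
            }
            char32_t low = text[++i];  // low (trailing) surrogate
            append_utf8(out, 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        } else {
            append_utf8(out, unit);  // BMP character: one unit is one code point
        }
    }
    return out;
}
```

For example, u"\U0001F600" is stored as the pair 0xD83D 0xDE00; the function combines them into U+1F600 and emits the four bytes F0 9F 98 80, while u"Hello, 你好" contains no surrogates and converts unit by unit.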
Conclusion:
In this blog, we have discussed two methods for converting a string to UTF-8 in C++. std::wstring_convert needs nothing beyond the standard library, but it is deprecated and its behavior depends on how wide wchar_t is on the target platform. ICU adds an external dependency, but it is actively maintained and supports many encodings besides UTF-8. Knowing both lets you choose the approach that fits your project and keep text intact as it moves between platforms and applications.