Introduction:
In programming, handling character encoding can be tricky. One of the most widely used character encodings is UTF-8, which can represent every character in the Unicode standard using one to four bytes per code point. In this post, we will explore UTF-8 encoding and learn how to convert UTF-8 encoded bytes into human-readable strings using C++.
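Because UTF-8 is a variable-length encoding, std::string::length() reports bytes, not characters. The following is a minimal sketch, assuming the input is already valid UTF-8, that counts code points by skipping continuation bytes (bytes of the form 10xxxxxx). The helper name count_code_points is hypothetical and not part of any library.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string. Every code point starts with exactly
// one byte that is NOT a continuation byte (10xxxxxx), so counting those
// bytes counts code points. Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {  // ASCII byte or lead byte of a multi-byte sequence
            ++count;
        }
    }
    return count;
}
```

For example, the six bytes "\xE4\xBD\xA0\xE5\xA5\xBD" encode the two characters 你好, so count_code_points returns 2 for them while length() returns 6.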
Method 1: Using Standard Libraries
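The first method uses the C++ standard library to convert UTF-8 encoded bytes into strings. The std::wstring_convert class template, together with a conversion facet from the <codecvt> header, can be used for this purpose. Note that std::wstring_convert and the <codecvt> facets have been deprecated since C++17, so compilers may emit deprecation warnings.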
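#include <codecvt>
#include <iostream>
#include <locale>
#include <string>
int main() {
    // Assumes a UTF-8 source file and a UTF-8 execution charset (pass /utf-8 to MSVC).
    // No u8 prefix: since C++20, u8"" literals are char8_t arrays and cannot initialize std::string.
    std::string utf8String = "Hello, 你好, नमस्ते";
    // std::wstring_convert and std::codecvt_utf8 are deprecated since C++17.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    std::wstring wideString = converter.from_bytes(utf8String);  // throws std::range_error on invalid UTF-8
    // std::wcout needs the environment's (UTF-8) locale to output non-ASCII characters.
    std::locale::global(std::locale(""));
    std::wcout.imbue(std::locale());
    std::wcout << wideString << std::endl;
    return 0;
}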
Output (in a terminal using a UTF-8 locale):
Hello, 你好, नमस्ते
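In this method, we first define the UTF-8 encoded string utf8String. We then use std::wstring_convert with std::codecvt_utf8 to decode it into a wide string (std::wstring); from_bytes() throws std::range_error if the input is not valid UTF-8. Finally, we switch to the environment's locale so that std::wcout can output the non-ASCII characters, and print the wide string to the console. On Windows, where wchar_t is 16 bits, std::codecvt_utf8 produces UCS-2, so std::codecvt_utf8_utf16<wchar_t> is needed for characters outside the Basic Multilingual Plane.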
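Since std::wstring_convert is deprecated, it can help to see roughly what from_bytes() does internally. The following is a minimal sketch of a hand-written UTF-8 decoder, a swapped-in technique rather than part of the method above, that produces UTF-32 code points in a std::u32string. The name decode_utf8 is hypothetical, and unlike from_bytes() it reports malformed input by returning std::nullopt instead of throwing.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Decodes UTF-8 into UTF-32 code points. Returns std::nullopt on malformed
// input: stray continuation bytes, invalid lead bytes, truncated sequences,
// overlong encodings, surrogates (U+D800..U+DFFF), or values above U+10FFFF.
std::optional<std::u32string> decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        char32_t min_cp = 0;  // smallest code point allowed for this length (rejects overlongs)
        if (lead < 0x80)                { len = 1; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
        else return std::nullopt;                      // continuation byte or invalid lead byte
        if (i + len > in.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char byte = static_cast<unsigned char>(in[i + k]);
            if ((byte & 0xC0) != 0x80) return std::nullopt;  // expected a 10xxxxxx byte
            cp = (cp << 6) | (byte & 0x3F);                  // append 6 payload bits
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return std::nullopt;
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, decoding the bytes "\xE4\xBD\xA0\xE5\xA5\xBD" yields the two code points U+4F60 and U+597D (你好). Returning std::u32string also sidesteps the platform-dependent width of wchar_t.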
Method 2: Using ICU Library
Another approach is to use the International Components for Unicode (ICU) library, which provides comprehensive support for Unicode-related operations, including conversion between character encodings.
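#include <iostream>
#include <locale>
#include <string>
#include <vector>
#include <unicode/ucnv.h>
#include <unicode/ustring.h>
int main() {
    // Assumes a UTF-8 source file and a UTF-8 execution charset (pass /utf-8 to MSVC).
    std::string utf8String = "Hello, 你好, नमस्ते";
    UErrorCode status = U_ZERO_ERROR;
    UConverter *conv = ucnv_open("UTF-8", &status);
    if (U_FAILURE(status)) {
        std::cerr << "Error opening converter: " << u_errorName(status) << std::endl;
        return 1;
    }
    // UTF-16 never needs more code units than the UTF-8 input has bytes;
    // the extra slot leaves room for the terminating NUL.
    std::vector<UChar> uBuffer(utf8String.length() + 1);
    int32_t uLength = ucnv_toUChars(conv, uBuffer.data(), static_cast<int32_t>(uBuffer.size()),
                                    utf8String.c_str(), static_cast<int32_t>(utf8String.length()), &status);
    ucnv_close(conv);
    if (U_FAILURE(status)) {
        std::cerr << "Error converting string: " << u_errorName(status) << std::endl;
        return 1;
    }
    // UChar holds UTF-16 code units, which std::wcout cannot print directly,
    // so convert them to a wchar_t string first.
    std::wstring wideString(uLength + 1, L'\0');
    int32_t wLength = 0;
    u_strToWCS(&wideString[0], static_cast<int32_t>(wideString.size()), &wLength,
               uBuffer.data(), uLength, &status);
    if (U_FAILURE(status)) {
        std::cerr << "Error converting to wchar_t: " << u_errorName(status) << std::endl;
        return 1;
    }
    wideString.resize(wLength);
    // std::wcout needs the environment's (UTF-8) locale to output non-ASCII characters.
    std::locale::global(std::locale(""));
    std::wcout.imbue(std::locale());
    std::wcout << wideString << std::endl;
    return 0;
}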
Output (in a terminal using a UTF-8 locale):
Hello, 你好, नमस्ते
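Here, we include the necessary ICU headers and call ucnv_open() to open a converter for UTF-8. We then use ucnv_toUChars() to convert the UTF-8 bytes into a buffer of UChar values, which are UTF-16 code units. Because std::wcout cannot print UChar data directly, we convert the buffer to a std::wstring with u_strToWCS() and print it after switching std::wcout to the environment's locale. The program must be linked against ICU's common library (for example, with -licuuc).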
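Because UChar holds UTF-16 code units rather than whole characters, uLength counts code units, and a character outside the Basic Multilingual Plane (such as an emoji) takes two of them. The following minimal sketch, written in standard C++ without ICU, shows that surrogate-pair encoding step. The helper append_utf16 is hypothetical and only for illustration; it is not an ICU function.

```cpp
#include <string>

// Appends one Unicode scalar value to a UTF-16 string. Code points above
// U+FFFF are split into a high and a low surrogate, so one character can
// occupy two UTF-16 code units (two UChar values in ICU).
// Assumes cp is a valid scalar value (<= U+10FFFF and not a surrogate).
void append_utf16(std::u16string& out, char32_t cp) {
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));  // BMP: a single code unit
    } else {
        cp -= 0x10000;  // leaves a 20-bit value
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate: top 10 bits
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate: bottom 10 bits
    }
}
```

For example, U+1F600 is encoded as the pair 0xD83D 0xDE00. Since that character takes four bytes in UTF-8 but only two UTF-16 code units, and every shorter UTF-8 sequence maps to a single code unit, a UTF-16 buffer sized from the UTF-8 byte count is always large enough.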
Conclusion:
In this post, we explored two methods for converting UTF-8 encoded strings into human-readable strings using C++. The first method used the C++ standard library, while the second used the ICU library. Both decode the example string correctly. The standard-library approach needs no external dependencies but relies on facilities deprecated since C++17, while ICU requires linking an external library in exchange for actively maintained, comprehensive Unicode support.