Get UTF-8 string from array of bytes in Node.JS
- Date: 26 Oct 2011
- Category: development/node.js
- Tagged with: utf8, decode, string, and javascript
While working on js-yaml, a JavaScript port of PyYAML (still a work in progress at the time of writing), I found that I needed a way to convert a stream of bytes into a string. So here is a quick and simple example of how to get a UTF-8 string from a stream of bytes.
Let's start with the problem definition: “We need to get a string from its representation in bytes”. In Python 3 it would be something akin to this (assuming codes is a list of integers):
bytes(codes).decode('utf-8')
Now let's encode and then decode a Russian word that describes my feeling about using JavaScript on the server: какашка. We can get the array of UTF-8 bytes that represents a string with the following snippet:
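function getBytes(str) {
  var bytes = [], char;
  // encodeURI turns every non-ASCII character into %XX escapes of its UTF-8 bytes
  str = encodeURI(str);
  while (str.length) {
    char = str.slice(0, 1);
    str = str.slice(1);
    if ('%' !== char) {
      // plain ASCII character: its char code is the byte value
      bytes.push(char.charCodeAt(0));
    } else {
      // escaped byte: the next two hex digits are the byte value
      char = str.slice(0, 2);
      str = str.slice(2);
      bytes.push(parseInt(char, 16));
    }
  }
  return bytes;
}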
The function above returns an array of integers: for example, it returns [90] for 'Z', [208, 175] for 'Я', and [90, 208, 175] for 'ZЯ'. Now let's get the byte array for our “magic word”…
var bytes = getBytes('какашка');
// -> [ 208, 186, 208, 176, 208, 186, 208, 176, 209, 136, 208, 186, 208, 176 ]
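As a cross-check, Node can produce the same byte array directly. The sketch below is an assumption-laden aside rather than part of the original approach: `getBytesViaBuffer` is a hypothetical helper name, and `Buffer.from` is assumed to be available (it exists in current Node.js releases, but did not when this post was written).

```javascript
// Hypothetical helper: let Buffer do the UTF-8 encoding,
// then copy the bytes into a plain array of integers.
// Assumes a Node.js version that provides Buffer.from.
function getBytesViaBuffer(str) {
  return Array.from(Buffer.from(str, 'utf8'));
}

console.log(JSON.stringify(getBytesViaBuffer('какашка')));
// -> [208,186,208,176,208,186,208,176,209,136,208,186,208,176]
```

The result matches the array returned by getBytes, which suggests the encodeURI trick yields plain UTF-8 bytes.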
And now! Ladies and Gentlemen! *drum roll* Here is our snippet to get the string back from its bytes:
var buff = Buffer.from(bytes); // new Buffer(bytes) is deprecated in current Node.js
console.log(buff.toString('utf8'));
// -> какашка
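Wrapped up as a reusable function, the decoding step might look like the sketch below. `bytesToString` is a hypothetical name introduced for illustration, and `Buffer.from` again assumes a current Node.js release. One thing to keep in mind: decoding this way does not throw on malformed input, it substitutes the replacement character U+FFFD instead.

```javascript
// Hypothetical helper: decode an array of byte values (0-255) as UTF-8.
// Assumes a Node.js version that provides Buffer.from.
function bytesToString(bytes) {
  return Buffer.from(bytes).toString('utf8');
}

var bytes = [208, 186, 208, 176, 208, 186, 208, 176, 209, 136, 208, 186, 208, 176];
console.log(bytesToString(bytes));
// -> какашка

// A truncated sequence (208 is only the first half of a two-byte character)
// does not raise an error; the decoded string contains U+FFFD instead.
var truncated = bytesToString([90, 208]);
console.log(truncated.indexOf('\uFFFD') !== -1);
// -> true
```

If silent replacement is not acceptable for your input, you would need to validate the bytes separately before decoding.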