Copyright © 2007-2010, 2011, 2012 Tony Garnock-Jones and 2007-2010 LShift Ltd.
Authors: Tony Garnock-Jones (tonygarnockjones@gmail.com), LShift Ltd. (query@lshift.net).
References
An implementation of RFC 4627 (JSON, the JavaScript Object Notation) for Erlang.
The basic API consists of the encode/1 and decode/1 functions.
The data type mapping I've implemented follows Joe Armstrong's message http://www.erlang.org/ml-archive/erlang-questions/200511/msg00193.html; see json().
When serializing a string, if characters with codepoint >127 are found, we rely on the Unicode encoder to build the proper byte sequence for transmission. We still use the \uXXXX escape for control characters, other than the ones for which the RFC defines a dedicated escape (\b, \f, \n, \r and \t).
decode/1 will autodetect the Unicode encoding used, and any strings returned in the result as binaries will contain UTF-8 encoded byte sequences for codepoints >127. Object keys containing codepoints >127 will be returned as lists of codepoints, rather than being UTF-8 encoded. If you have already transformed the text to parse into a list of Unicode codepoints, perhaps by your own use of unicode_decode/1, then use decode_noauto/1 to avoid redundant and erroneous double-Unicode-decoding.
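To make the double-decoding hazard concrete, here is a minimal sketch that uses OTP's stdlib unicode module rather than this module's unicode_decode/1; the module and function names (utf8_decode_sketch, decode_once/1, decode_twice/1) are hypothetical. Decoding the UTF-8 bytes of "é" once yields a single codepoint; feeding that codepoint back in as if it were a byte no longer forms valid UTF-8.

```erlang
-module(utf8_decode_sketch).
-export([decode_once/1, decode_twice/1]).

%% Decode a list of UTF-8 bytes into a list of Unicode codepoints,
%% using the stdlib unicode module (a stand-in for unicode_decode/1).
decode_once(Bytes) when is_list(Bytes) ->
    unicode:characters_to_list(list_to_binary(Bytes), utf8).

%% The mistake described above: decoding text that is already a list
%% of codepoints, treating each codepoint as if it were a byte.
decode_twice(Bytes) ->
    decode_once(decode_once(Bytes)).
```

With the input [195, 169], decode_once/1 returns [233], while decode_twice/1 returns an {incomplete, ...} tuple instead of text, because 233 on its own is only the first byte of a three-byte UTF-8 sequence.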
Similarly, encode/1 produces text that is already UTF-8 encoded. To get raw codepoints, use encode_noauto/1 and encode_noauto/2. You can use unicode_encode/1 to UTF-encode the results, if that's appropriate for your application.
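The difference between the two kinds of encoder output is just the UTF-8 encoding of codepoints. A minimal sketch, again using the stdlib unicode module as a stand-in for unicode_encode/1 (the module name utf8_encode_sketch is hypothetical): a codepoint such as 233 is a single list element in a raw-codepoint result but becomes two bytes in a UTF-8 one. Under that reading of the two functions, encode/1 presumably corresponds to applying this step to the output of encode_noauto/1.

```erlang
-module(utf8_encode_sketch).
-export([to_utf8_bytes/1]).

%% Convert a list of raw Unicode codepoints (the kind of output
%% encode_noauto/1 describes) into a list of UTF-8 bytes.
to_utf8_bytes(Codepoints) when is_list(Codepoints) ->
    binary_to_list(unicode:characters_to_binary(Codepoints, unicode, utf8)).
```

So a raw result such as [$", 233, $"] corresponds to the byte list [34, 195, 169, 34].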
I'm lenient in the following ways during parsing:
byte() = integer()
An integer >=0 and =<255.
json() = jsonobj() | jsonarray() | jsonnum() | jsonstr() | true | false | null
An Erlang representation of a general JSON value.
jsonarray() = [json()]
A JSON array value.
jsonkey() = string()
A field-name within a JSON "object".
jsonnum() = integer() | float()
A JSON numeric value.
jsonobj() = {obj, [{jsonkey(), json()}]}
A JSON "object" or "struct".
jsonstr() = binary()
A JSON string value.
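As an illustration of the mapping, the following hypothetical predicate (json_type_sketch:is_json/1, not part of this module) accepts exactly the terms described by the types above, reading jsonkey() strictly as string(). It is a sketch for checking hand-built terms, not a statement of what encode/1 accepts; encode/1 also takes atom and binary keys.

```erlang
-module(json_type_sketch).
-export([is_json/1]).

%% Hypothetical check that a term matches the json() type above.
is_json(true) -> true;
is_json(false) -> true;
is_json(null) -> true;
is_json(N) when is_integer(N); is_float(N) -> true;   % jsonnum()
is_json(B) when is_binary(B) -> true;                  % jsonstr()
is_json({obj, Props}) when is_list(Props) ->           % jsonobj()
    lists:all(fun({K, V}) -> is_key(K) andalso is_json(V);
                 (_) -> false
              end, Props);
is_json(L) when is_list(L) ->                          % jsonarray()
    lists:all(fun is_json/1, L);
is_json(_) -> false.

%% jsonkey() = string(), i.e. a list of character codepoints.
is_key(K) when is_list(K) ->
    lists:all(fun(C) -> is_integer(C) andalso C >= 0 end, K);
is_key(_) -> false.
```

Note that a plain Erlang string such as "abc" is a list of integers, so under this mapping it is a JSON array of numbers; JSON strings have to be binaries.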
decode/1 | Decodes a JSON value from an input binary or string of Unicode-encoded text. |
decode_noauto/1 | As decode/1, but does not perform Unicode decoding on its input. |
digit_hex/1 | Returns the number corresponding to Hexchar. |
encode/1 | Encodes the JSON value supplied, first into Unicode codepoints, and then into UTF-8. |
encode_noauto/1 | Encodes the JSON value supplied into raw Unicode codepoints. |
encode_noauto/2 | As encode_noauto/1, but prepends reversed text to the supplied accumulator string. |
equiv/2 | Tests equivalence of JSON terms. |
exclude_field/2 | Exclude a named field from a JSON "object". |
from_record/3 | Used by the ?RFC4627_FROM_RECORD macro in rfc4627.hrl. |
get_field/2 | Retrieves the value of a named field of a JSON "object". |
get_field/3 | Retrieves the value of a named field of a JSON "object", or a default value if no such field is present. |
hex_digit/1 | Returns the character code corresponding to Nibble. |
mime_type/0 | Returns the IANA-registered MIME type for JSON data. |
set_field/3 | Adds or replaces a named field with the given value. |
to_record/3 | Used by the ?RFC4627_TO_RECORD macro in rfc4627.hrl. |
unicode_decode/1 | Autodetects and decodes using the Unicode encoding of its input. |
unicode_encode/1 | Encodes the given characters to bytes, using the given Unicode encoding. |
decode(Input::binary() | [byte()]) -> {ok, json(), Remainder} | {error, Reason}
Decodes a JSON value from an input binary or string of Unicode-encoded text.
Given a binary, converts it to a list of bytes. Given a list/string, interprets it as a list of bytes.
Uses unicode_decode/1 on its input, which results in a list of codepoints, and then decodes a JSON value from that list of codepoints.
Returns {ok, Result, Remainder}, where Remainder is the remaining portion of the input that was not consumed in the process of decoding Result, or {error, Reason}.
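Because decode/1 can stop before the end of its input, callers that expect exactly one JSON value may want to inspect Remainder. A minimal sketch, assuming only the return shape documented above; decode_all/2 is a hypothetical helper that takes the decoder as a fun, so it can be called as decode_all(fun rfc4627:decode/1, Input) or exercised with a stub. Tolerating trailing whitespace is a choice of this sketch.

```erlang
-module(decode_result_sketch).
-export([decode_all/2]).

%% Hypothetical helper. DecodeFun is expected to return the shape
%% documented for decode/1: {ok, Json, Remainder} | {error, Reason}.
%% Trailing whitespace is tolerated; any other leftover input is an error.
decode_all(DecodeFun, Input) ->
    case DecodeFun(Input) of
        {ok, Json, Remainder} ->
            Rest = string:trim(Remainder),
            case string:is_empty(Rest) of
                true -> {ok, Json};
                false -> {error, {trailing_input, Rest}}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```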
decode_noauto(Input::string()) -> {ok, json(), string()} | {error, any()}
As decode/1, but does not perform Unicode decoding on its input.
digit_hex(Hexchar::char()) -> integer()
Returns the number corresponding to Hexchar.
Hexchar must be one of the characters $0 through $9, $A through $F or $a through $f.
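The hex conversions behind digit_hex/1 and hex_digit/1 are simple enough to sketch. The module below is a hypothetical re-implementation, not the module's source; it also shows how such helpers could build the \uXXXX escape mentioned in the overview. Emitting lowercase letters from hex_digit/1 is an assumption of this sketch.

```erlang
-module(hex_sketch).
-export([digit_hex/1, hex_digit/1, u_escape/1]).

%% Hex character -> value, for $0..$9, $A..$F and $a..$f.
digit_hex(C) when C >= $0, C =< $9 -> C - $0;
digit_hex(C) when C >= $A, C =< $F -> C - $A + 10;
digit_hex(C) when C >= $a, C =< $f -> C - $a + 10.

%% Nibble (0..15) -> character code; lowercase is an assumption here.
hex_digit(N) when N >= 0, N =< 9 -> $0 + N;
hex_digit(N) when N >= 10, N =< 15 -> $a + N - 10.

%% Build a "\uXXXX" escape (backslash, u, four hex digits) for a
%% codepoint below 16#10000, most significant nibble first.
u_escape(C) when C >= 0, C < 16#10000 ->
    [$\\, $u | [hex_digit((C bsr Shift) band 16#F) || Shift <- [12, 8, 4, 0]]].
```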
encode(X::json()) -> [byte()]
Encodes the JSON value supplied, first into Unicode codepoints, and then into UTF-8.
The resulting string is a list of byte values that should be interpreted as UTF-8 encoded text.
During encoding, atoms and binaries are accepted as keys of JSON objects (type jsonkey()) as well as the usual strings (lists of character codepoints).
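One way an encoder could handle the three accepted key forms is to normalise every key to a string before writing it out. The helper below is hypothetical and not taken from the module's source; treating binary keys as UTF-8 text is an assumption.

```erlang
-module(json_key_sketch).
-export([key_to_string/1]).

%% Hypothetical normalisation of an object key to a list of codepoints.
key_to_string(K) when is_atom(K) -> atom_to_list(K);
%% Assumption: binary keys contain UTF-8 encoded text.
key_to_string(K) when is_binary(K) -> unicode:characters_to_list(K, utf8);
%% Already a string (list of codepoints).
key_to_string(K) when is_list(K) -> K.
```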
encode_noauto(X::json()) -> string()
Encodes the JSON value supplied into raw Unicode codepoints.
The resulting string may contain codepoints with value >=128. You can use unicode_encode/1 to UTF-encode the results, if that's appropriate for your application.
During encoding, atoms and binaries are accepted as keys of JSON objects (type jsonkey()) as well as the usual strings (lists of character codepoints).
encode_noauto(Str::json(), Acc::string()) -> string()
As encode_noauto/1, but prepends reversed text to the supplied accumulator string.
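"Prepends reversed text" is the usual Erlang reversed-accumulator idiom: each step conses onto the front of the accumulator, and a single lists:reverse/1 at the end restores the order. A minimal sketch with hypothetical names (rev_acc_sketch, emit/2, render/1). Presumably lists:reverse(encode_noauto(X, [])) gives the same text as encode_noauto(X), but that is an inference from the description above, not something stated by it.

```erlang
-module(rev_acc_sketch).
-export([emit/2, render/1]).

%% Prepend Text, reversed, onto Acc. Costs O(length(Text)) per call,
%% regardless of how long Acc has grown.
emit(Text, Acc) -> lists:reverse(Text, Acc).

%% Build the output piece by piece, then reverse once at the end.
render(Pieces) ->
    lists:reverse(lists:foldl(fun emit/2, [], Pieces)).
```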
equiv/2
Tests equivalence of JSON terms.
After Bob Ippolito's equiv predicate in mochijson.
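A minimal sketch of what "equivalence" can mean for these terms, assuming it ignores field order within objects and compares numbers by value (so 1 and 1.0 match) while keeping arrays order-sensitive. These semantics are assumptions about equiv/2, and the module below is hypothetical.

```erlang
-module(equiv_sketch).
-export([equiv/2]).

%% Objects: same number of fields, and every field of A has an
%% equivalent value under the same key in B (field order ignored).
equiv({obj, A}, {obj, B}) ->
    length(A) =:= length(B) andalso
        lists:all(fun({K, V}) ->
                          case lists:keyfind(K, 1, B) of
                              {K, W} -> equiv(V, W);
                              false -> false
                          end
                  end, A);
%% Arrays: same length, elements equivalent pairwise, in order.
equiv(A, B) when is_list(A), is_list(B) ->
    length(A) =:= length(B) andalso
        lists:all(fun({X, Y}) -> equiv(X, Y) end, lists:zip(A, B));
%% Numbers compare by value, so 1 and 1.0 are equivalent here.
equiv(A, B) when is_number(A), is_number(B) -> A == B;
%% Binaries and the atoms true/false/null must match exactly.
equiv(A, B) -> A =:= B.
```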
exclude_field/2
Exclude a named field from a JSON "object".
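A hypothetical re-implementation, assuming the argument order (Object, Key) and that every property with a matching key is removed; the module name is illustrative only.

```erlang
-module(exclude_field_sketch).
-export([exclude_field/2]).

%% Keep only the properties whose key differs from Key.
exclude_field({obj, Props}, Key) ->
    {obj, [P || {K, _} = P <- Props, K =/= Key]}.
```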
from_record(R::Record, RecordName::atom(), Fields::[any()]) -> jsonobj()
Used by the ?RFC4627_FROM_RECORD macro in rfc4627.hrl.
Given -record(myrecord, {field1, field2}), and a value V = #myrecord{}, the code ?RFC4627_FROM_RECORD(myrecord, V) will return a JSON "object" with fields corresponding to the fields of the record. The macro expands to a call to the from_record function.
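A minimal sketch of the idea behind from_record/3, assuming the macro passes the record's field names as obtained from record_info(fields, Name). The code below is a stand-in, not the module's implementation, and it sets explicit field values rather than relying on record defaults.

```erlang
-module(from_record_sketch).
-export([example/0]).

-record(myrecord, {field1, field2}).

%% Hypothetical stand-in for from_record/3: pair each field name with the
%% corresponding record element, using string keys for the JSON object.
from_record(R, RecordName, Fields) when element(1, R) =:= RecordName ->
    Values = tl(tuple_to_list(R)),   % drop the record tag
    {obj, [{atom_to_list(F), V} || {F, V} <- lists:zip(Fields, Values)]}.

example() ->
    V = #myrecord{field1 = 123, field2 = 234},
    %% record_info/2 is resolved at compile time to [field1, field2].
    from_record(V, myrecord, record_info(fields, myrecord)).
```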
get_field/2
Retrieves the value of a named field of a JSON "object".
get_field/3
Retrieves the value of a named field of a JSON "object", or a default value if no such field is present.
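A hypothetical sketch of the field accessors. The return shapes ({ok, Value} or not_found for get_field/2) are assumptions, since the signatures are not shown here; set_field/3 is included for comparison and follows its description (new value plus all unmodified fields).

```erlang
-module(get_field_sketch).
-export([get_field/2, get_field/3, set_field/3]).

%% Look up Key; the {ok, Value} | not_found shape is an assumption.
get_field({obj, Props}, Key) ->
    case lists:keyfind(Key, 1, Props) of
        {_, Value} -> {ok, Value};
        false -> not_found
    end.

%% Same lookup, falling back to Default when the field is absent.
get_field(Obj, Key, Default) ->
    case get_field(Obj, Key) of
        {ok, Value} -> Value;
        not_found -> Default
    end.

%% Replace (or add) Key, keeping every other field unchanged.
set_field({obj, Props}, Key, Value) ->
    {obj, [{Key, Value} | lists:keydelete(Key, 1, Props)]}.
```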
hex_digit(Nibble::integer()) -> char()
Returns the character code corresponding to Nibble.
Nibble must be >=0 and =<15.
mime_type() -> string()
Returns the IANA-registered MIME type for JSON data.
set_field/3
Adds or replaces a named field with the given value.
Returns a JSON "object" that contains the new field value as well as all the unmodified fields from the first argument.
to_record(JsonObject::jsonobj(), DefaultValue::Record, Fields::[atom()]) -> Record
Used by the ?RFC4627_TO_RECORD macro in rfc4627.hrl.
Given -record(myrecord, {field1, field2}), and a JSON "object" J = {obj, [{"field1", 123}, {"field2", 234}]}, the code ?RFC4627_TO_RECORD(myrecord, J) will return a record #myrecord{field1 = 123, field2 = 234}.
The macro expands to a call to the to_record function.
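The reverse direction can be sketched the same way. to_record/3 below is a hypothetical stand-in that starts from DefaultValue and overwrites each field whose name appears as a string key in the object, leaving the others at their defaults; how the real function treats missing or extra keys is not stated above.

```erlang
-module(to_record_sketch).
-export([example/0]).

-record(myrecord, {field1, field2}).

%% Hypothetical stand-in for to_record/3.
to_record({obj, Props}, Default, Fields) ->
    [Name | Values] = tuple_to_list(Default),
    NewValues = [case lists:keyfind(atom_to_list(F), 1, Props) of
                     {_, V} -> V;        % field present in the object
                     false -> Old        % keep the default value
                 end || {F, Old} <- lists:zip(Fields, Values)],
    list_to_tuple([Name | NewValues]).

example() ->
    J = {obj, [{"field1", 123}, {"field2", 234}]},
    to_record(J, #myrecord{}, record_info(fields, myrecord)).
```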
unicode_decode(C::[byte()]) -> [char()]
Autodetects and decodes using the Unicode encoding of its input.
From RFC4627, section 3, "Encoding":
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8
Interestingly, the BOM (byte-order mark) is not mentioned. We support it here by using it to detect our encoding, discarding it if present, even though RFC4627 explicitly notes that the first two characters of a JSON text will be ASCII.
If a BOM (http://unicode.org/faq/utf_bom.html) is present, we use that; if not, we use RFC4627's rules (as above). Note that UTF-32 is the same as UCS-4 for our purposes (but see also http://unicode.org/reports/tr19/tr19-9.html). Note that UTF-16 is not the same as UCS-2!
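The detection rules above translate naturally into Erlang binary pattern matching. This is a sketch, not the module's xmerl-based code: detect/1 is hypothetical, and it returns encoding names in the form accepted by the stdlib unicode module, so the result can be passed straight to unicode:characters_to_list/2. Clause order matters: the UTF-32LE BOM (FF FE 00 00) must be tested before the UTF-16LE BOM (FF FE), and 00 00 00 xx before 00 xx 00 xx.

```erlang
-module(detect_encoding_sketch).
-export([detect/1]).

%% Returns {Encoding, Body}, where Body has any BOM stripped.
%% 1. Byte-order marks (longest first, so UTF-32LE beats UTF-16LE).
detect(<<16#00, 16#00, 16#FE, 16#FF, Rest/binary>>) -> {{utf32, big}, Rest};
detect(<<16#FF, 16#FE, 16#00, 16#00, Rest/binary>>) -> {{utf32, little}, Rest};
detect(<<16#FE, 16#FF, Rest/binary>>) -> {{utf16, big}, Rest};
detect(<<16#FF, 16#FE, Rest/binary>>) -> {{utf16, little}, Rest};
detect(<<16#EF, 16#BB, 16#BF, Rest/binary>>) -> {utf8, Rest};
%% 2. RFC 4627 null patterns over the first four octets.
detect(<<0, 0, 0, _, _/binary>> = B) -> {{utf32, big}, B};
detect(<<0, _, 0, _, _/binary>> = B) -> {{utf16, big}, B};
detect(<<_, 0, 0, 0, _/binary>> = B) -> {{utf32, little}, B};
detect(<<_, 0, _, 0, _/binary>> = B) -> {{utf16, little}, B};
%% 3. Default: UTF-8.
detect(B) when is_binary(B) -> {utf8, B}.
```

For example, {Enc, Body} = detect(Bin) followed by unicode:characters_to_list(Body, Enc) yields the list of codepoints.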
Note that I'm using xmerl's UCS/UTF support here. There's another UTF-8 codec in asn1rt, which works on binaries instead of lists.
unicode_encode(EncodingAndCharacters::{Encoding, [char()]}) -> [byte()]
Encodes the given characters to bytes, using the given Unicode encoding.
For convenience, we supply a partial inverse of unicode_decode; if a BOM is requested, we more or less arbitrarily pick the big-endian variant of the encoding, since big-endian is network byte order. We don't support UTF-8 with a BOM here.
Generated by EDoc, Nov 21 2012, 14:49:55.