Module rfc4627

An implementation of RFC 4627 (JSON, the JavaScript Object Notation) for Erlang.

Authors: Tony Garnock-Jones (tonygarnockjones@gmail.com), LShift Ltd. (query@lshift.net).

References

RFC 4627, the JSON RFC
JSON in general
Joe Armstrong's message describing the basis of the JSON data type mapping that this module uses

Description

The basic API consists of the encode/1 and decode/1 functions.

Data Type Mapping

The data type mapping I've implemented follows Joe Armstrong's message (http://www.erlang.org/ml-archive/erlang-questions/200511/msg00193.html); see json().

When serializing a string, if characters are found with codepoint >127, we rely on the unicode encoder to build the proper byte sequence for transmission. We still use the \uXXXX escape for control characters (other than the RFC-specified specially recognised ones).

decode/1 will autodetect the unicode encoding used, and any strings returned in the result as binaries will contain UTF-8 encoded byte sequences for codepoints >127. Object keys containing codepoints >127 will be returned as lists of codepoints, rather than being UTF-8 encoded. If you have already transformed the text to parse into a list of unicode codepoints, perhaps by your own use of unicode_decode/1, then use decode_noauto/1 to avoid redundant and erroneous double-unicode-decoding.

Similarly, encode/1 produces text that is already UTF-8 encoded. To get raw codepoints, use encode_noauto/1 and encode_noauto/2. You can use unicode_encode/1 to UTF-encode the results, if that's appropriate for your application.
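The string-escaping rule described above can be illustrated with a small, self-contained sketch. This is not the module's own code: the module name escape_sketch and its functions are hypothetical, and the UTF-8 step uses OTP's unicode module purely for illustration. Control characters without a short RFC escape become \uXXXX, while codepoints >127 pass through untouched and are left to the UTF-8 encoder.

```erlang
-module(escape_sketch).
-export([quote/1, quote_utf8/1]).

%% Quote a list of Unicode codepoints as a JSON string (result is codepoints).
quote(Codepoints) ->
    [$" | escape(Codepoints)].

%% As quote/1, then UTF-8 encode the result into a list of bytes.
%% Assumes valid codepoints; invalid input makes binary_to_list/1 fail.
quote_utf8(Codepoints) ->
    Bin = unicode:characters_to_binary(quote(Codepoints), unicode, utf8),
    binary_to_list(Bin).

escape([]) -> [$"];
%% Short escapes recognised by RFC 4627.
escape([$" | T])  -> [$\\, $" | escape(T)];
escape([$\\ | T]) -> [$\\, $\\ | escape(T)];
escape([$\b | T]) -> [$\\, $b | escape(T)];
escape([$\f | T]) -> [$\\, $f | escape(T)];
escape([$\n | T]) -> [$\\, $n | escape(T)];
escape([$\r | T]) -> [$\\, $r | escape(T)];
escape([$\t | T]) -> [$\\, $t | escape(T)];
%% Any other control character gets the generic \uXXXX form.
escape([C | T]) when C < 32 -> "\\u" ++ hex4(C) ++ escape(T);
%% Everything else, including codepoints >127, is copied unchanged.
escape([C | T]) -> [C | escape(T)].

%% Four hexadecimal digits, zero padded.
hex4(C) -> lists:flatten(io_lib:format("~4.16.0B", [C])).
```

With this sketch, quote/1 mirrors the role of a codepoint-level encoder such as encode_noauto/1, and quote_utf8/1 mirrors the extra UTF-8 step that encode/1 performs.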

Differences from the specification

I'm lenient in the following ways during parsing:

repeated commas in arrays and objects collapse to a single comma
any character =<32 is considered whitespace
leading zeros for numbers are accepted
the top-level value is not restricted to an object or array; any JSON value can be used at the top level

Data Types

byte()

byte() = integer()

An integer >=0 and =<255.

json()

An Erlang representation of a general JSON value.

jsonarray()

jsonarray() = [json()]

A JSON array value.

jsonkey()

jsonkey() = string()

A field-name within a JSON "object".

jsonnum()

jsonnum() = integer() | float()

A JSON numeric value.

jsonobj()

jsonobj() = {obj, [{jsonkey(), json()}]}

A JSON "object" or "struct".

jsonstr()

jsonstr() = binary()

A JSON string value.
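To make the mapping concrete, here is a hedged sketch of a sample term and a structural check written against the type definitions above. The module json_types_sketch is hypothetical. The listing does not spell out json() itself, so the sketch assumes it is the union of the four types above plus the atoms true, false and null; treat that as an assumption rather than a statement about the module.

```erlang
-module(json_types_sketch).
-export([sample/0, is_json/1]).

%% A sample value: an object holding a string, an array, and two numbers.
sample() ->
    {obj, [{"name", <<"rfc4627">>},
           {"tags", [<<"json">>, <<"erlang">>]},
           {"stars", 42},
           {"ratio", 0.5}]}.

%% jsonobj(): {obj, [{jsonkey(), json()}]}
is_json({obj, Fields}) when is_list(Fields) ->
    lists:all(fun({K, V}) -> is_key(K) andalso is_json(V);
                 (_)      -> false
              end, Fields);
%% jsonarray(): a list of JSON values. Note that a plain Erlang string
%% such as "abc" is therefore an array of numbers, not a JSON string.
is_json(L) when is_list(L) -> lists:all(fun is_json/1, L);
%% jsonstr(): a binary.
is_json(B) when is_binary(B) -> true;
%% jsonnum(): an integer or a float.
is_json(N) when is_number(N) -> true;
%% Assumed literals (not listed in the type definitions above).
is_json(A) when A =:= true; A =:= false; A =:= null -> true;
is_json(_) -> false.

%% jsonkey(): a string, that is, a list of integers.
is_key(K) -> is_list(K) andalso lists:all(fun erlang:is_integer/1, K).
```

The practical consequence worth remembering is the one noted in the comments: string values must be binaries, because an ordinary Erlang string is indistinguishable from an array of integers.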

Function Index

decode/1	Decodes a JSON value from an input binary or string of Unicode-encoded text.
decode_noauto/1	As decode/1, but does not perform Unicode decoding on its input.
digit_hex/1	Returns the number corresponding to Hexchar.
encode/1	Encodes the JSON value supplied, first into Unicode codepoints, and then into UTF-8.
encode_noauto/1	Encodes the JSON value supplied into raw Unicode codepoints.
encode_noauto/2	As encode_noauto/1, but prepends reversed text to the supplied accumulator string.
equiv/2	Tests equivalence of JSON terms.
exclude_field/2	Exclude a named field from a JSON "object".
from_record/3	Used by the ?RFC4627_FROM_RECORD macro in rfc4627.hrl.
get_field/2	Retrieves the value of a named field of a JSON "object".
get_field/3	Retrieves the value of a named field of a JSON "object", or a default value if no such field is present.
hex_digit/1	Returns the character code corresponding to Nibble.
mime_type/0	Returns the IANA-registered MIME type for JSON data.
set_field/3	Adds or replaces a named field with the given value.
to_record/3	Used by the ?RFC4627_TO_RECORD macro in rfc4627.hrl.
unicode_decode/1	Autodetects and decodes using the Unicode encoding of its input.
unicode_encode/1	Encodes the given characters to bytes, using the given Unicode encoding.

Function Details

decode/1

decode(Input::binary() | [byte()]) -> {ok, json(), Remainder} | {error, Reason}

Remainder = string()
Reason = any()

Decodes a JSON value from an input binary or string of Unicode-encoded text.

Given a binary, converts it to a list of bytes. Given a list/string, interprets it as a list of bytes.

Uses unicode_decode/1 on its input, which results in a list of codepoints, and then decodes a JSON value from that list of codepoints.

Returns either {ok, Result, Remainder}, where Remainder is the remaining portion of the input that was not consumed in the process of decoding Result, or {error, Reason}.

decode_noauto/1

decode_noauto(Input::string()) -> {ok, json(), string()} | {error, any()}

As decode/1, but does not perform Unicode decoding on its input.

Expects a list of codepoints - an ordinary Erlang string - rather than a list of Unicode-encoded bytes.
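Because the Remainder is returned as a string of codepoints, it can be fed back into a codepoint-level decoder to read several concatenated values. The sketch below shows one way to do that. The module decode_all_sketch is hypothetical, and the decoder is passed in as a fun so that the example stays self-contained; it only assumes the fun has the return shape documented for decode_noauto/1.

```erlang
-module(decode_all_sketch).
-export([decode_all/2]).

%% Decode is expected to behave like decode_noauto/1:
%%   fun((string()) -> {ok, Json, Remainder :: string()} | {error, Reason})
%% Returns {ok, [Json]} once only whitespace is left, or the first error.
decode_all(Decode, Codepoints) ->
    decode_all(Decode, Codepoints, []).

decode_all(Decode, Codepoints, Acc) ->
    case skip_ws(Codepoints) of
        [] ->
            {ok, lists:reverse(Acc)};
        Rest ->
            case Decode(Rest) of
                {ok, Value, Remainder} ->
                    decode_all(Decode, Remainder, [Value | Acc]);
                {error, Reason} ->
                    {error, Reason}
            end
    end.

%% Treat any character =<32 as whitespace, as the parser itself does.
skip_ws([C | T]) when C =< 32 -> skip_ws(T);
skip_ws(Rest) -> Rest.
```

With the library on the code path, a call would look like decode_all_sketch:decode_all(fun rfc4627:decode_noauto/1, Codepoints). Passing the Remainder back to decode/1 instead would run Unicode decoding a second time, which is the double-decoding problem described under Data Type Mapping.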

digit_hex/1

digit_hex(Hexchar::char()) -> integer()

Returns the number corresponding to Hexchar.

Hexchar must be one of the characters $0 through $9, $A through $F or $a through $f.

encode/1

encode(X::json()) -> [byte()]

Encodes the JSON value supplied, first into Unicode codepoints, and then into UTF-8.

The resulting string is a list of byte values that should be interpreted as UTF-8 encoded text.

During encoding, atoms and binaries are accepted as keys of JSON objects (type jsonkey()) as well as the usual strings (lists of character codepoints).

encode_noauto/1

encode_noauto(X::json()) -> string()

Encodes the JSON value supplied into raw Unicode codepoints.

The resulting string may contain codepoints with value >=128. You can use unicode_encode/1 to UTF-encode the results, if that's appropriate for your application.

During encoding, atoms and binaries are accepted as keys of JSON objects (type jsonkey()) as well as the usual strings (lists of character codepoints).

encode_noauto/2

encode_noauto(Str::json(), Acc::string()) -> string()

As encode_noauto/1, but prepends reversed text to the supplied accumulator string.

equiv/2

equiv(A::json(), B::json()) -> bool()

Tests equivalence of JSON terms.

After Bob Ippolito's equiv predicate in mochijson.

exclude_field/2

exclude_field(JsonObject::jsonobj(), Key::atom()) -> jsonobj()

Exclude a named field from a JSON "object".

from_record/3

from_record(R::Record, RecordName::atom(), Fields::[any()]) -> jsonobj()

Record = tuple()

Used by the ?RFC4627_FROM_RECORD macro in rfc4627.hrl.

Given a record type definition of

-record(myrecord, {field1,
                   field2}).

and a value V = #myrecord{}, the code ?RFC4627_FROM_RECORD(myrecord, V) will return a JSON "object" with fields corresponding to the fields of the record. The macro expands to a call to the from_record function.

get_field/2

get_field(JsonObject::jsonobj(), Key::atom()) -> {ok, json()} | not_found

Retrieves the value of a named field of a JSON "object".

get_field/3

get_field(Obj::jsonobj(), Key::atom(), DefaultValue::json()) -> json()

Retrieves the value of a named field of a JSON "object", or a default value if no such field is present.

hex_digit/1

hex_digit(Nibble::integer()) -> char()

Returns the character code corresponding to Nibble.

Nibble must be >=0 and =<15.
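The relationship between hex_digit/1 and digit_hex/1 can be sketched as a pair of inverse conversions. The module hex_sketch and its function names are hypothetical, and the choice of uppercase output letters is an assumption of the sketch; the documentation above does not say which case hex_digit/1 produces.

```erlang
-module(hex_sketch).
-export([to_char/1, to_nibble/1]).

%% Nibble (0..15) to character code. Uppercase is an arbitrary choice here.
to_char(N) when N >= 0,  N =< 9  -> $0 + N;
to_char(N) when N >= 10, N =< 15 -> $A + (N - 10).

%% Character code to number; accepts both upper and lower case letters.
to_nibble(C) when C >= $0, C =< $9 -> C - $0;
to_nibble(C) when C >= $A, C =< $F -> C - $A + 10;
to_nibble(C) when C >= $a, C =< $f -> C - $a + 10.
```

Arguments outside the stated ranges match no clause, so the sketch fails with a function_clause error rather than returning a wrong value.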

mime_type/0

mime_type() -> string()

Returns the IANA-registered MIME type for JSON data.

set_field/3

set_field(JsonObject::jsonobj(), Key::atom(), NewValue::json()) -> jsonobj()

Adds or replaces a named field with the given value.

Returns a JSON "object" that contains the new field value as well as all the unmodified fields from the first argument.
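Since a jsonobj() is just a tagged property list, the field operations (get_field/2, get_field/3, set_field/3 and exclude_field/2) can be pictured as thin wrappers around the lists key functions. The following is a sketch under assumptions, not the module's implementation: the module obj_fields_sketch and its function names are hypothetical, keys are compared exactly as stored (strings here, following jsonkey()), and the position of a newly added field is a choice made by this sketch.

```erlang
-module(obj_fields_sketch).
-export([lookup/2, lookup/3, store/3, drop/2]).

%% Look a field up; mirrors the {ok, Value} | not_found result shape.
lookup({obj, Props}, Key) ->
    case lists:keyfind(Key, 1, Props) of
        {_, Value} -> {ok, Value};
        false      -> not_found
    end.

%% Look a field up, falling back to Default when it is absent.
lookup(Obj, Key, Default) ->
    case lookup(Obj, Key) of
        {ok, Value} -> Value;
        not_found   -> Default
    end.

%% Add or replace a field; all other fields are kept unchanged.
store({obj, Props}, Key, NewValue) ->
    {obj, lists:keystore(Key, 1, Props, {Key, NewValue})}.

%% Remove a field if it is present.
drop({obj, Props}, Key) ->
    {obj, lists:keydelete(Key, 1, Props)}.
```

All four operations return new terms; the original object is never modified, which is the behaviour the set_field/3 description above implies.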

to_record/3

to_record(JsonObject::jsonobj(), DefaultValue::Record, Fields::[atom()]) -> Record

Record = tuple()

Used by the ?RFC4627_TO_RECORD macro in rfc4627.hrl.

Given a record type definition of

-record(myrecord, {field1,
                   field2}).

and a JSON "object"

J = {obj, [{"field1", 123},
           {"field2", 234}]}

the code ?RFC4627_TO_RECORD(myrecord, J) will return a record #myrecord{field1 = 123, field2 = 234}. The macro expands to a call to the to_record function.
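The record conversions in both directions can be sketched without the macros. The code below is illustrative only: the module record_sketch and its functions are hypothetical, and the use of record_info(fields, myrecord) to obtain the field list is an assumption about how such macros are typically written, not something stated above.

```erlang
-module(record_sketch).
-export([to_obj/2, from_obj/3, demo/0]).

-record(myrecord, {field1, field2}).

%% Record -> JSON "object": pair each field name (as a string) with its value.
to_obj(Record, Fields) ->
    [_Tag | Values] = tuple_to_list(Record),
    {obj, lists:zip([atom_to_list(F) || F <- Fields], Values)}.

%% JSON "object" -> record: start from Default and override every field
%% that is present in the object.
from_obj({obj, Props}, Default, Fields) ->
    [Tag | Defaults] = tuple_to_list(Default),
    Values = [case lists:keyfind(atom_to_list(F), 1, Props) of
                  {_, V} -> V;
                  false  -> D
              end || {F, D} <- lists:zip(Fields, Defaults)],
    list_to_tuple([Tag | Values]).

%% Round trip for the myrecord example.
demo() ->
    Fields = record_info(fields, myrecord),
    J = to_obj(#myrecord{field1 = 123, field2 = 234}, Fields),
    R = from_obj(J, #myrecord{}, Fields),
    {J, R}.
```

The field list has to be supplied by the caller because record field names exist only at compile time, which is presumably why the library exposes these operations through macros.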

unicode_decode/1

unicode_decode(C::[byte()]) -> [char()]

Autodetects and decodes using the Unicode encoding of its input.

From RFC4627, section 3, "Encoding":

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.

00 00 00 xx  UTF-32BE
00 xx 00 xx  UTF-16BE
xx 00 00 00  UTF-32LE
xx 00 xx 00  UTF-16LE
xx xx xx xx  UTF-8

Interestingly, the BOM (byte-order mark) is not mentioned. We support it here by using it to detect our encoding, discarding it if present, even though RFC4627 explicitly notes that the first two characters of a JSON text will be ASCII.

If a BOM (http://unicode.org/faq/utf_bom.html) is present, we use that; if not, we use RFC4627's rules (as above). Note that UTF-32 is the same as UCS-4 for our purposes (but see also http://unicode.org/reports/tr19/tr19-9.html). Note that UTF-16 is not the same as UCS-2!
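The detection order described above (BOM first, then the RFC's null-byte patterns) can be sketched with plain list pattern matching. This is a minimal illustration, not the module's code: the module detect_sketch is hypothetical, it only names the encoding rather than decoding the input or stripping the BOM, and the atoms are borrowed from the Encoding type of unicode_encode/1.

```erlang
-module(detect_sketch).
-export([encoding/1]).

%% 1. Byte-order marks. The UTF-32LE BOM begins with the UTF-16LE BOM,
%%    so the UTF-32 clauses must come first.
encoding([16#00, 16#00, 16#FE, 16#FF | _]) -> 'utf-32be';
encoding([16#FF, 16#FE, 16#00, 16#00 | _]) -> 'utf-32le';
encoding([16#FE, 16#FF | _])               -> 'utf-16be';
encoding([16#FF, 16#FE | _])               -> 'utf-16le';
encoding([16#EF, 16#BB, 16#BF | _])        -> 'utf-8';
%% 2. No BOM: pattern of nulls in the first four octets (RFC 4627, section 3).
encoding([0, 0, 0, D | _]) when D =/= 0          -> 'utf-32be';
encoding([0, B, 0, D | _]) when B =/= 0, D =/= 0 -> 'utf-16be';
encoding([A, 0, 0, 0 | _]) when A =/= 0          -> 'utf-32le';
encoding([A, 0, C, 0 | _]) when A =/= 0, C =/= 0 -> 'utf-16le';
%% 3. Anything else, including inputs shorter than four octets, is UTF-8.
encoding(_) -> 'utf-8'.
```

One known ambiguity remains in any scheme of this kind: the octets FF FE 00 00 could also be a UTF-16LE BOM followed by a NUL character. The sketch resolves it in favour of UTF-32LE.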

Note that I'm using xmerl's UCS/UTF support here. There's another UTF-8 codec in asn1rt, which works on binaries instead of lists.

unicode_encode/1

unicode_encode(EncodingAndCharacters::{Encoding, [char()]}) -> [byte()]

Encoding = 'utf-32' | 'utf-32be' | 'utf-32le' | 'utf-16' | 'utf-16be' | 'utf-16le' | 'utf-8'

Encodes the given characters to bytes, using the given Unicode encoding.

For convenience, we supply a partial inverse of unicode_decode/1. If a BOM is requested, we more-or-less arbitrarily pick the big-endian variant of the encoding, since big-endian is network order. We don't support UTF-8 with BOM here.
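The behaviour described above can be approximated with OTP's standard unicode module, as a hedged sketch. Two assumptions are made and should be read as such: that the encodings named without a byte order ('utf-16' and 'utf-32') are the ones that request a BOM, and that OTP's unicode functions are an acceptable stand-in for the xmerl-based codec the module itself uses. The module name encode_sketch is hypothetical.

```erlang
-module(encode_sketch).
-export([encode/1]).

%% Explicit encodings: no BOM is emitted.
encode({'utf-8', Chars})    -> to_bytes(Chars, utf8);
encode({'utf-16be', Chars}) -> to_bytes(Chars, {utf16, big});
encode({'utf-16le', Chars}) -> to_bytes(Chars, {utf16, little});
encode({'utf-32be', Chars}) -> to_bytes(Chars, {utf32, big});
encode({'utf-32le', Chars}) -> to_bytes(Chars, {utf32, little});
%% No byte order given (assumed to mean "with BOM"): choose big-endian.
encode({'utf-16', Chars}) -> bom({utf16, big}) ++ to_bytes(Chars, {utf16, big});
encode({'utf-32', Chars}) -> bom({utf32, big}) ++ to_bytes(Chars, {utf32, big}).

%% Assumes valid codepoints; invalid input makes binary_to_list/1 fail.
to_bytes(Chars, Encoding) ->
    binary_to_list(unicode:characters_to_binary(Chars, unicode, Encoding)).

bom(Encoding) ->
    binary_to_list(unicode:encoding_to_bom(Encoding)).
```

As in the description above, there is deliberately no clause that produces UTF-8 with a BOM.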

Generated by EDoc, Nov 21 2012, 14:49:55.