What's new in version 1.1

Compatible with both wxWidgets 2.8 and 2.9

The new version is compatible with both wxWidgets 2.8 and 2.9. Tried using wxGTK 2.8.10 and 2.9.1 (daily snapshot wxWidgets-2009-10-17)

A new AsXxxxxx() function

The wxJSONValue::AsXxxxx() function can be used to get the value of a JSON value but you have first to check if it is of the expected type. So you would probably write code like this one:

  wxJSONValue v["key"] = 100;
  int i;
  if ( v["key"].IsInt() ) {
    i = v["key"].AsInt();
  }
  else {
    cout << "Error: value is not of the expected type";
  }

This release adds a new version of all overloaded AsXxxxxx() function which stores the value in the provided argument and returns TRUE if the value stored in the JSON value object is of the correct type. This is the function prototype for integer value:

  bool AsInt( int& i );

Now you can get the value and check if it is of the expected type in only one call:

  wxJSONValue v["key"] = 100;
  int i;
  if ( !v["key"].AsInt( i ) ) {
    cout << "Error: value is not of the expected type";
  }

The new reader and writer organizaion

Until version 1.0 the wxJSON reader and writer had some issues mostly related to speed. The problem was that both the reader and the writer performed a character conversion from / to UTF-8 and unicode for every char read from / written to streams. Worst, in ANSI builds, every char undergoes to a double conversion for both the reader and the writer (the following is for the reader):

from UTF-8 to wchar_t
from wchar_t to locale dependent char

Also note that such a conversion is, for most characters, not needed at all because those chars are in the US-ASCII charset (0x00..0x7F).

In version 3.0 of the GUI framework, developers have introduced a radical change to Unicode support and the wxString class has totally changed in its internal organisation. In particular, the wxString class now stores strings in UTF-16 encoding on Windows and in UTF-8 on unix systems. The drawback is that on *nix systems the usual character access using subscripts such as:

    wxString s;
    s[n];

is VERY inefficient because of the UTF-8 encoding. The conseguence is that in wxJSON there is a speed issue also when the JSON text input is from wxString and not only from streams.

What are the goals of the new 1.1 version

In order to find the best organization for the reader and the writer I have to first point out what are the goals of this new release of wxJSON:

compatibility with wxWidgets 2.8 and 2.9

compatibility with wxJSON version 1.0: I do not want to break the compatibility with 1.0 version otherwise I will have to change the major version number

speed improvements: I will try to speed-up both the reader and the writer. The conversion of each character is very slow; there are better solutions as pointed out by Piotr Likus in his e-mail of november 2008

simplicity: JSON format is very easy to read and write for humans but it is also easy for machines to parse and generate. The wxJSON library has to be simple in the processing of JSON text.

The new wxJSON organisation

The wxJSON library allows you to write / read JSON text to / from two different types of objects:

a string of type wxString
a stream of type wxInput/OutputStream

These two kinds of I/O classes are very different because of the internal representation of the JSON text: in particular, wxString uses UTF-16 on windows and UTF-32 on *nix systems up to wxWidgets 2.8. UTF-8 is used on *nix systems in wxWidgets 2.9. For streams the encoding is alwasy UTF-8. A further different encoding is used in ANSI mode: locale dependent one-byte characters.

Encoding formats in the different wxWidget's modes / versions /platforms

These encoding differences complicates very much the organization of the writer and the reader because character read from / written to JSON text has to be converted to a unique type for processing. Actually, each char is converted to a wchar_t type and it occurs in ANSI mode, too. This conversion slows down the processing very much. A further complication is that wxWidgets 2.9 does no more return a char or wchar_t type when accessing string objects but a helper class: wxUniChar which has its own encoding format so that it has to be further converted to wchar_t.

The solution is to use only one encoding format for all types of I/O, build mode and wxWidget's versions: UTF-8 is the only one applicable to all these cases. Using UTF-8 as the unique I/O format has several advantages:

UTF-8 does not have endianness or byte order issues

the pocessing of characters is byte-oriented so there is no need to deal with wchar_t or wxUniChar: special JSON characters, literal and numbers lie in the US-ASCII character set (one UTF-8 code unit).

the read operation of string values is easy: when a double-quote byte is encontered, the UTF-8 stream is simply copied to a temporary memory buffer until the next unescaped double-quote char (we just process escaped chars). When the buffer has been read, it is converted in one single step to wxString using wxString::FromUTF8() function.

the write operation of strings is easy: the wxString value is converted to a temporary UTF-8 memory buffer using the static wxString::ToUTF8() function. The bytes are then written to the UTF-8 stream: we only have to process control chars and escaped characters.

from the point of view of the processing, there is no difference between ANSI and Unicode because the processing is byte-oriented.

The only drawback is when input / output is not from / to a stream (which is in UTF-8 format) but from / to a wxString object. The solution I found is:

when input is from a string object, the reader first convert the wxString JSON text input in a temporary UTF-8 memory buffer which is used to construct a wxMemoryInputStream object

when output is to be sent to a string object, the writer first construct a temporary empty wxMemoryOutputStream which holds the JSON text output. When the write operation is done, the temporary memory buffer is converted to a wxString object using the wxString::FormUTF8() function.

So, as opposed to the previous versions, the read / write operations are faster on streams and slower on strings because of the construction of the temporary UTF-8 memory buffers.

New feature in ANSI mode

The default behaviour of the wxJSON library in ANSI builds when reading from / writing to streams is to use UTF-8 encoded text so that the writer produces valid JSON text and the reader gives some limited Unicode support (see Unicode support: ANSI builds for details).

In version 1.1 I introduced a new feature for ANSI builds: both the reader and the writer can be constructed with a special flag:

  wxJSONREADER_NOUTF8_STREAM
  wxJSONWRITER_NOUTF8_STREAM

that suppress UTF-8 conversion:

the writer produces ANSI JSON text which can be correctly read by the application that wrote it provided that it is localized in the same locale
the reader does not try to convert the JSON text input from UTF-8 because it suppose that the text is stored in ANSI format.

The advantage of this ANSI mode is speed: string's data is simply copied from the stream to wxString by the reader; JSON values containing string are simply copied from the wxString object to the stream by the writer.

Also note that this feature can be used also if the stream is in UTF-8 format. By suppressing UTF-8 conversion the reader copies all UTF-8 code units in the wxString object: naturally, the meaning of those code-units are misunderstood by the wxString object which treats them as locale dependent one-byte characters: the advantage is that writing them back in ANSI mode just preserve the UTF-8 encoding because each UTF-8 code-unit is simply copied to the stream. The drawback is that the UTF-8 stream read by the wxJSON reader may contain characters that can be represented in the current locale. If UTF-8 conversion is disabled in the reader, no conversion takes place.

For example: suppose that in a UTF-8 input stream there is the following string:

  C3 A0 C3 A8 C3 AC C2 A9 C2 AE

which has to following meaning (five characters, ten UTF-8 code-units)

  àèì©®

The character's meaning could be correcly interpreted in an ANSI application localized in a Latin-1 charset If UTF-8 conversion is not disabled, the wxJSON reader reads the five Latin-1 characters correctly. On the other hand, if UTF-8 conversion is disabled, the ten UTF-8 code-units are simply copied to the wxString object which contains 10 characters whose meaning (in ths ISO-8859-1 charset) is the following:

  Ã Ã¨Ã¬Â©Â®