wxJSON internals

Introduction

The wxJSONValue class is much like a variant type that can hold values of several types (see the documentation of the wxWidgets wxVariant class). Unlike wxVariant, however, it cannot hold whatever type you want but only a fixed set of JSON value types.

The type of the JSON value contained in a wxJSONValue object is an enumerated constant of type wxJSONType. Starting from version 0.5 the wxJSON library supports 64-bits integers which introduced new JSON types. For more info see 64-bits and 32-bits integers.

There is no need to specify the type of a wxJSONValue object: it is automatically set when you construct the object or when a value is assigned to it. For example:

 wxJSONValue v;            // a 'null' value
 wxJSONValue v1( 10 );     // signed integer type
 wxJSONValue v2( 12.90);   // double type
 v1 = "some string";       // now 'v1' is of type string

 wxJSONValue v3;           // a 'null' value
 v3.Append( 10 );          // 'v3' is now of type wxJSONTYPE_ARRAY

The only exception is when you want to set the wxJSONTYPE_INVALID type explicitly. Note that you should cast the wxJSONTYPE_INVALID constant to wxJSONType because some compilers may otherwise treat the constant as an int:

 wxJSONValue value( (wxJSONType) wxJSONTYPE_INVALID );

The wxJSONRefData structure
The wxJSONValue class does not actually hold data. It only holds a pointer to a data structure of type wxJSONRefData. The latter actually holds the data: the type of the value, the value itself, comments, etc. This is used to implement copy-on-write (see wxJSON internals: reference counting for details).

All data is stored in the wxJSONRefData class which is just a simple structure: the class does not define an interface for accessing data: it only defines the data members and the ctors and dtor. The interface is totally defined in the wxJSONValue class which, in turn, does not contain any data (with the exception of the pointer to referenced data).

The data structure holds the type of the JSON value, the JSON value itself, the comment lines, if any, etc. To know more about the individual data members defined in the class see the documentation of wxJSONRefData. The data structure holds data in two different ways: primitive types are stored in a union, whereas the complex types (strings, arrays and objects) are stored in separate data members.

The union is defined as follows:

  union wxJSONValueHolder  {
    int              m_valInt;
    unsigned int     m_valUInt;
    short int        m_valShort;
    unsigned short   m_valUShort;
    long int         m_valLong;
    unsigned long    m_valULong;
    double           m_valDouble;
    const wxChar*    m_valCString;
    bool             m_valBool;

 #if defined( wxJSON_64BIT_INT )
    wxInt64          m_valInt64;
    wxUint64         m_valUInt64;
 #endif
  };

The wxJSONRefData structure also holds the three complex objects that represent the three JSON value types strings, arrays and objects (the latter refers to JSON objects, not to instances of C++ classes):

    wxString             m_valString;
    wxJSONInternalArray  m_valArray;
    wxJSONInternalMap    m_valMap;

Note that primitive types are stored in a union and not in a structure: this means that when you store a value in one of the data members, all the others are affected as well. An example makes this clearer.

Integers are stored using the widest storage size: (unsigned) long int by default and wx(U)Int64 on platforms that support 64-bits integers (to know more about 64-bits integer support read 64-bits and 32-bits integers). So if you store an int of value -1, every other data member gets a value that depends on its own data type. Below is a screenshot of the memory dump of a JSON value object which was assigned the integer value -1:

intern01.png

A value of -1 is stored as all binary '1' in the union but the value returned by the wxJSONValue class depends on the type you want. In other words, if you get the value as an integer you get -1 but if you get the value as an unsigned integer you get different values depending on the size of the requested type. Also note that when the same value is returned as a double, the wxJSONValue::AsDouble() function does not promote the int to a double: the function just returns the bits as they are stored and interpreted as a double thus returning a NaN.
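
The effect of reading the same storage through different types can be reproduced outside the library. The following is a minimal, standalone sketch: ReadBitsAs() and AllOnesCell() are hypothetical helpers, not part of wxJSON, and std::memcpy stands in for reading a different union member. It assumes a two's complement platform with an 8-byte IEEE 754 double.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of wxJSON): reinterpret the leading bytes
// of a 64-bit storage cell as another type, without any value conversion.
template <typename T>
T ReadBitsAs(const std::int64_t& storage)
{
    static_assert(sizeof(T) <= sizeof(std::int64_t), "T must fit in the cell");
    T out;
    std::memcpy(&out, &storage, sizeof(T));   // copy the raw bits only
    return out;
}

// Storing -1 in the widest integer member sets every bit of the cell.
inline std::int64_t AllOnesCell()
{
    return -1;
}
```

Read as a signed 32-bit integer the cell still yields -1, read as an unsigned 16-bit integer it yields 65535, and read as a double the all-ones pattern is a NaN, because no promotion takes place.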

wxJSON internals: reference counting

Starting from version 0.4.0, the internal representation of a JSON value changed completely with the introduction of reference counting, a technique also known as copy-on-write. Now the wxJSONValue class does not actually contain any data: every instance of this class holds a pointer to the actual data structure defined in the wxJSONRefData class. The structure contains a special data member that counts the number of wxJSONValue objects that share the data.

If you look at the example memory dump seen above, you will note the wxJSONValue::m_refData data member that points to the actual data structure and the wxJSONRefData::m_refCount data member that counts how many JSON value objects share the data structure (one, in the example).

Reference counting is very simple: if you copy an instance of a wxJSONValue object, the data contained in the wxJSONRefData structure is not really copied; instead, it is shared by the two JSON value objects, whose data pointers point to the same memory area. Here is an example:

  wxJSONValue v1( 12 );
  wxJSONValue v2( v1 );

cow02.png
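
The sharing described above can be sketched in a few lines of standard C++. This is only an illustration under stated assumptions: Handle and RefData are hypothetical stand-ins for wxJSONValue and wxJSONRefData, and std::shared_ptr plays the role of the library's own reference counter.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for wxJSONRefData: it only holds the data.
struct RefData {
    long value = 0;
};

// Hypothetical stand-in for wxJSONValue: it only holds a pointer.
class Handle {
public:
    explicit Handle(long v) : m_refData(std::make_shared<RefData>())
    {
        m_refData->value = v;
    }

    // Copying a Handle copies the pointer only; the data block is shared.
    Handle(const Handle&) = default;
    Handle& operator=(const Handle&) = default;

    long Get() const { return m_refData->value; }

    // Writing first detaches from any other owner (copy-on-write).
    void Set(long v)
    {
        if (m_refData.use_count() > 1) {
            m_refData = std::make_shared<RefData>(*m_refData);
        }
        m_refData->value = v;
    }

    long RefCount() const { return m_refData.use_count(); }
    bool SharesDataWith(const Handle& other) const
    {
        return m_refData == other.m_refData;
    }

private:
    std::shared_ptr<RefData> m_refData;
};
```

After Handle v2( v1 ) both handles report a reference count of 2; the first call to Set() on either of them creates a private copy and the count of the original data drops back to 1.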

Reference counting is implemented in many wxWidgets classes such as wxBitmap, wxImage, etc., but the wxJSONValue class is a bit different because objects of this class may contain other wxJSONValue objects, nested to a virtually unlimited depth. As a result, references are not propagated through the hierarchy. Also, because values are accessed using the subscript operators, which are non-const functions, COW for wxJSONValue objects is not as efficient as one might expect.

In the following paragraphs I try to explain what happens when you make a copy of a wxJSONValue and then call some non-const functions on one of the two instances.

Making a copy of an array type.

In the following example I create an array and assign a value to the fourth element. The subscript operator automatically creates the first four elements and initializes them to a null value. Then the integer value is assigned to the fourth element by the assignment operator. Note that the first three array elements share the same data: this is because the subscript operator automatically creates all the items up to the requested index, and the needed items are created by copying (using COW) a temporary NULL JSON value:

  wxJSONValue v1;
  v1[3] = 12;           // set the value of the fourth item

cow08.png

Writing the value to a JSON text document we get the following:

 [
    null,
    null,
    null,
    12
 ]

Now copy the v1 JSON value to a v3 value. Note that the root JSON data structure is shared by the two instances.

  wxJSONValue v1;
  v1[3] = 12;           // set the value of the fourth item
  wxJSONValue v3(v1);   // make a copy of 'v1'

cow09.png

We already noted that the three null values in the array share the same data structure but because the root value is shared we only have a reference count of THREE for the NULL values. In fact, the data is shared by SIX JSON value objects: 3 items in v1 plus 3 items in v3 (six values in total) but as the parent object is shared, the wxJSONRefData::m_refCount data member only counts 3 shares.

Writing to a shared data

  wxJSONValue v1;
  v1[3] = 12;           // set the value of the fourth item
  wxJSONValue v3(v1);   // makes a copy of 'v1'

  v3[1] = 2;            // change the value of the second array's element

When we change the value of an array element, we would expect a real copy to be made of that element only, which is then assigned the new value.

We would be wrong. In fact, the wxJSONValue object makes a copy of the whole root object v3. Why? At first sight it seems that the assignment operator is called on the second array element of v3, which would cause a real copy of that element only. In reality, before operator= can be called, the expression must obtain a reference to the second element of v3's array. This reference is returned by operator[] (the subscript operator), which is itself a non-const member function. So the subscript operator of the root value makes a real copy of the referenced data, and all the array's elements are copied from v1's instance to v3. You may notice from the memory dump that the elements themselves are not really copied: COW is used for them. Below you find the memory dump of the two objects after one array element was changed. As you can see, each root value now has an exclusive copy of the array:

cow10.png
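
The behaviour can be reproduced with a small standalone model. The sketch below is not the wxJSON implementation: Node, NodeData and ConstItem() are hypothetical names, and std::shared_ptr replaces the library's reference counter. It only assumes what the text states, namely that the non-const subscript operator detaches the value it is called on before returning a reference to the element.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct NodeData;   // defined below, once Node is a complete type

// Hypothetical copy-on-write value: a number or an array of Nodes.
class Node {
public:
    Node();                                    // a 'null' value
    Node& operator[](std::size_t index);       // non-const: detaches first
    Node& operator=(long number);              // detaches, then stores
    const Node& ConstItem(std::size_t index) const;   // never detaches
    long AsLong() const;
    long RefCount() const { return m_data.use_count(); }
    bool SharesDataWith(const Node& other) const { return m_data == other.m_data; }

private:
    void Unshare();
    std::shared_ptr<NodeData> m_data;
};

struct NodeData {
    long number = 0;
    std::vector<Node> items;
};

inline Node::Node() : m_data(std::make_shared<NodeData>()) {}

inline void Node::Unshare()
{
    if (m_data.use_count() > 1) {
        // Copies this level only; the child Nodes are copied as handles.
        m_data = std::make_shared<NodeData>(*m_data);
    }
}

inline Node& Node::operator[](std::size_t index)
{
    Unshare();   // done before we know whether the caller reads or writes
    if (m_data->items.size() <= index) {
        // All missing items are copies of one temporary null value.
        m_data->items.resize(index + 1, Node());
    }
    return m_data->items[index];
}

inline Node& Node::operator=(long number)
{
    Unshare();
    m_data->items.clear();
    m_data->number = number;
    return *this;
}

inline const Node& Node::ConstItem(std::size_t index) const
{
    return m_data->items.at(index);
}

inline long Node::AsLong() const { return m_data->number; }
```

With this model, v1[3] = 12 leaves three null items sharing one data block; copying v1 to v3 does not change that count; and v3[1] = 2 gives v3 its own array while the untouched items of both arrays keep sharing their data.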

In order to avoid the copy of the top-level array we would have to use a const member function to access the second array element. Note that we cannot use the wxJSONValue::ItemAt() function because it returns a copy of the data, not a reference to it:

  wxJSONValue v1;
  v1[3] = 12;           // set the value of the fourth item
  wxJSONValue v2(v1);   // makes a copy of 'v1'

  // does not work!!! what we do is to change a temporary copy
  // of the second array's element
  v2.ItemAt( 1 ) = "value changed";

The only suitable function is the wxJSONValue::Find() function which is, unfortunately, protected so it cannot be called from outside the class.

Another drawback of non-const subscript operators is that the copy is made not only when we write to the JSON value object but also when we read from it. This is an example:

  wxJSONValue v1;
  v1[3] = 12;           // set the value of the fourth item
  wxJSONValue v2(v1);   // makes a copy of 'v1'

  int i = v1[3].AsInt();   // read from 'v1'

Because the operator[] member function is non-const, the read operation causes the wxJSONValue class to make an exclusive copy of shared data even when the value is accessed only for reading. The good news is that in this case we can use wxJSONValue::ItemAt(), thus avoiding the copy of the shared data (tested: see function Test51() in samples/test11.cpp):

  wxJSONValue v1;
  v1[3] = 12;           // set the value of the fourth item
  wxJSONValue v2(v1);   // makes a copy of 'v1'

  int i = v1.ItemAt( 3 ).AsInt();

The problem is that we can use ItemAt() for only one level in the JSON value's hierarchy.
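
The difference between the two kinds of read access can be made visible by counting deep copies. The following standalone sketch uses a hypothetical CowArray class (a flat array of integers, not a wxJSONValue) whose ReadAt() function plays the role that the text assigns to a const accessor.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write array of integers (not a wxJSON class).
class CowArray {
public:
    CowArray() : m_data(std::make_shared<std::vector<long>>()) {}

    // Non-const access: the caller may write, so shared data is detached.
    long& operator[](std::size_t i)
    {
        if (m_data.use_count() > 1) {
            m_data = std::make_shared<std::vector<long>>(*m_data);
            ++m_deepCopies;
        }
        if (m_data->size() <= i) {
            m_data->resize(i + 1, 0);
        }
        return (*m_data)[i];
    }

    // Const access returning a copy of the element: never detaches.
    long ReadAt(std::size_t i) const { return m_data->at(i); }

    // Number of deep copies this instance has had to make.
    int DeepCopies() const { return m_deepCopies; }

private:
    std::shared_ptr<std::vector<long>> m_data;
    int m_deepCopies = 0;
};
```

Reading through ReadAt() leaves the counter at zero, whereas reading the same element through the subscript operator of a shared instance forces one deep copy even though nothing is written.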

So is COW totally useless? No, it is not!

Even when using subscript operators, the real copy of shared data is made only down to the parent of the requested level; all other JSON value objects at the same level and at lower levels are not really copied: COW is used for all of them. In the following image you see that, in the above example of a four-element array, the JSON array value v1 is copied to v3 but the individual items are not really copied, because 3 items of v1 and 2 items of v3 refer to the same referenced data (the NULL value):

cow11.png

In this example the array's items are NULL values, so the time saved by COW is not much; but remember that an array's item may contain another array, which may contain one or more key/value hashmaps, which may contain one or more arrays, and so on.

wxJSON internals: the C string type

  wxJSONValue( const wxChar* str );
  wxJSONValue( const wxString& str );

You may ask yourself why there are 2 different constructors for strings. For convenience, you may think, in order to save an implicit conversion from const wxChar* to wxString. The answer is: NO. The two constructors store the string in a very different way.

Both ctors store strings and they could be stored just as wxString objects. In fact, this is the default behaviour of the class if the WXJSON_USE_CSTRING macro is not defined.

If this macro is defined, however, the first ctor stores the pointer in the wxJSONRefData structure assuming that the string is statically allocated and it does NOT copy the string. This behaviour saves a string's copy which can be time-consuming but, on the other hand, you must be sure that the pointed-to buffer is not freed / deallocated for the lifetime of the wxJSONValue object (this is always true for static strings). The following code fragment is an example on how to use the static string type:

  wxJSONValue aString( _T("This is a static string"));

The code above is correct, because the pointed-to string is really statically allocated (and, on most platforms, static strings are allocated in the code segment so that they cannot be changed for the lifetime of the application).

The following code is not correct and it would probably result in a SEGFAULT when you try to access the wxJSONValue data. The problem is that the string is constructed on the stack which will be deallocated when the function returns. So, the returned JSON object contains a pointer to a deallocated memory area.

  // Example 1
  wxJSONValue MyFunction()
  {
    char buffer[128];
    snprintf( buffer, 128, "This is a string constructed on the stack");
    wxJSONValue aString( buffer );
    return aString;
  }

The code above should be written as follows:

  // Example 2
  wxJSONValue MyFunction()
  {
    char buffer[128];
    snprintf( buffer, 128, "This is a string constructed on the stack");
    wxJSONValue aString( wxString( buffer));
    return aString;
  }

Now it is correct because the wxString object holds a copy of the buffer memory area. Note that if the WXJSON_USE_CSTRING macro is not defined, there is no need to construct a temporary wxString object in order to force the wxJSONValue class to create an instance of the wxString object: it is automatically created by the wxJSONValue( const wxChar*) ctor. This means that you can use the code in Example 1 without worrying about C-strings. By default, the WXJSON_USE_CSTRING macro is not defined.
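
The difference between storing the pointer and storing a copy can be shown without any dangling access. In the standalone sketch below PointerHolder, CopyHolder and OverwriteBuffer() are hypothetical names, and std::string stands in for wxString; the buffer is overwritten rather than deallocated so that the example stays well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical holder that only remembers the pointer (C-string mode).
struct PointerHolder {
    const char* text;
    explicit PointerHolder(const char* s) : text(s) {}
};

// Hypothetical holder that copies the characters (string-object mode).
struct CopyHolder {
    std::string text;
    explicit CopyHolder(const char* s) : text(s) {}
};

// Simulates the buffer being reused after the holders were constructed.
inline void OverwriteBuffer(char* buffer, std::size_t size, const char* newText)
{
    std::strncpy(buffer, newText, size - 1);
    buffer[size - 1] = '\0';
}
```

Once the buffer is overwritten, the pointer holder sees the new content while the copy holder still has the original text; with a stack buffer that goes out of scope, the pointer holder would instead refer to memory that is no longer valid.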

If your application uses many static strings that never change, you can save time by defining the symbol when compiling the wxJSON library.

NOTES: the static C-string type is probably useless and, in fact, it is never used in wxJSONValue by default. The only reason for using it is speed: time is saved when no string copy is performed. But the wxString object uses copy-on-write to avoid unnecessary copies, so it is much wiser (and SAFER) never to use C-strings.

64-bits and 32-bits integers

Starting from version 0.5, the wxJSON library supports 64-bits integers but only on those platforms that have native support for 64-bits integers such as, for example, Win32 and GNU/Linux.

Starting from version 1.0 the wxJSONValue also handles long int and short int data types.

By default, the library checks if the wxLongLong_t macro is defined by wxWidgets and, if it is, the library enables 64-bits integer support. The wxLongLong_t macro is the wxWidgets platform-independent data type for representing a 64-bits integer, and it is defined by the GUI framework as a placeholder for the underlying compiler / platform specific data type: __int64 on Windows and long long on GNU/Linux systems. To know more about 64-bits integers in wxWidgets read the documentation of the wxLongLong class. If the system / platform does not support 64-bits integers, integer values are stored in a long int or an unsigned long int, depending on the sign.

The user can disable 64-bits integer support by defining the:

  wxJSON_NO_64BIT_INT

macro in the include/wx/json_defs.h header file (just uncomment the line where the macro is defined).

All integer values are stored in the widest storage size: wx(U)Int64 or long int, depending on the platform. The m_type data member of the JSON value is set to the generic integer type, wxJSONTYPE_INT or wxJSONTYPE_UINT, regardless of the size of the original value: the only thing that matters is whether the value is signed or unsigned.

  wxJSONValue i( 100 );               // an int
  wxJSONValue s( (short) 100 );       // a short int
  wxJSONValue l( (long) 100 );        // a long int
  wxJSONValue i64( (wxInt64) 100 );   // a long long int

All the above integer values are stored in the wxJSONValueHolder::m_valInt64 or in the wxJSONValueHolder::m_valLong data member. The JSON value type is set to wxJSONTYPE_INT in all cases. As the storage area of all primitive types is the same (it is a union), it is very easy to return an integer value in different sizes, provided that the requested integer type has enough bits to store the value.

How can the user know the storage needs of an integer value? First of all, ask yourself whether you really need this information. If your application only uses the int data type (for integers) and only reads its own JSON data file, it is improbable that an integer stored in the JSON value class will hold anything other than an int. On the other hand, if your application communicates with other applications over a network connection, the JSON value class may hold integers that are too large to fit in a simple int.

In order to know the storage needs of the value stored in the class, call the wxJSONValue::GetType() function, which returns different constants depending on the magnitude of the numeric value.

The GetType() function relies on the definition of the SHORT_MAX, SHORT_MIN, USHORT_MAX, LONG_MAX, LONG_MIN and ULONG_MAX macros to check whether the value fits in a particular data type. If the macros are not defined (I do not know if this could happen), the wxJSON library defines them itself according to the rules of the C99 standard (see the include/wx/json_defs.h header file):

   C99 type      width (bits)         limits
   --------      ------------         ------
   short            16                -32.768 .. +32.767
   ushort           16                0 .. 65.535
   long             32                -2.147.483.648 .. +2.147.483.647
   ulong            32                0 .. +4.294.967.295

Note that the C99 standard only defines the minimum width of these types; in addition, the C++ language does not define a minimum size for these integer types.
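
The range check can be sketched as follows. This is a standalone illustration, not the code of GetType(): the IntFit and UIntFit enumerations and the two Classify functions are hypothetical, and the limits are the fixed 16-bit and 32-bit bounds of the table above, taken here from the <cstdint> macros.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result codes; they are not the wxJSONType constants.
enum class IntFit  { Short, Long, Int64 };
enum class UIntFit { UShort, ULong, UInt64 };

// Smallest signed storage (16, 32 or 64 bits) that can hold 'value'.
inline IntFit ClassifySigned(std::int64_t value)
{
    if (value >= INT16_MIN && value <= INT16_MAX) {
        return IntFit::Short;
    }
    if (value >= INT32_MIN && value <= INT32_MAX) {
        return IntFit::Long;
    }
    return IntFit::Int64;
}

// Smallest unsigned storage (16, 32 or 64 bits) that can hold 'value'.
inline UIntFit ClassifyUnsigned(std::uint64_t value)
{
    if (value <= UINT16_MAX) {
        return UIntFit::UShort;
    }
    if (value <= UINT32_MAX) {
        return UIntFit::ULong;
    }
    return UIntFit::UInt64;
}
```

For example, 100 fits in a short, 32768 needs a 32-bit long, and 2147483648 needs 64 bits. Because the value is always held in the widest storage, the classification is just a pair of comparisons.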

Also note that the wxJSONValue::GetType() function never returns wxJSONTYPE_INT. This is because the int data type has a variable bit-width that depends on the platform: on Win32 and GNU/Linux, the int type is the same as long (32 bits wide) but on other platforms it may be only 16 bits wide, because the minimum width of int is 16 in the C99 standard. For this reason, it is good practice in C/C++ to prefer the long data type over int whenever 32 bits are needed, because long guarantees at least 32 bits.

The wxJSONValue class lets you use int as the returned data type because it defines the Is(U)Int member function, which returns the correct result depending on the definition of the INT_MAX, INT_MIN and UINT_MAX macros.

The array of values.

An object of this type holds an array of wxJSONValue objects. This means that you can have an array of integers, doubles, strings, arrays and key/value maps, too. Moreover, a single array can mix all these types: the first element can be an integer, the second another array, and the third a key/value map.

The type is implemented using a wxObjArray class which stores wxJSONValue objects. The declaration of this type follows the wxWidgets convention for declaring arrays of objects:

  class wxJSONValue;
  WX_DECLARE_OBJARRAY( wxJSONValue, wxJSONInternalArray )

Note that the name of the type contains the word internal. This means that the type is used internally by the wxJSONValue class and should not be used by the application. However, the class's API defines a member function that can be used to get the internal array type:

  const wxJSONInternalArray* AsArray() const;

which returns a pointer to the array stored in the wxJSONValue::m_value.m_valArray data member. There is no need for the application to access the internal representation of the JSON array type. Use wxJSONValue::Item, wxJSONValue::ItemAt and the subscript operator for retrieving the array's values.

The map of key/value pairs.

An object of this type is a map of key / value pairs where the key is a string and the value is a wxJSONValue object: it can hold bools, integers, strings, arrays and key/value maps, too.

This type is implemented using the wxHashMap class which is a simple, type-safe, and reasonably efficient hash map class whose interface is a subset of the interface of STL containers.

The definition of the hashmap for wxJSONValue objects is as follows:

  WX_DECLARE_STRING_HASH_MAP( wxJSONValue, wxJSONInternalMap );

Note that the name of the type contains the word internal. This means that the type is used internally by the wxJSONValue class and should not be used by the application. However, the wxJSONValue API defines a member function that can be used to get this object:

  const wxJSONInternalMap* AsMap() const;

There is no need for the application to access the internal representation of the JSON hashmap type. Use wxJSONValue::Item(const wxString&), wxJSONValue::ItemAt and the subscript operator for retrieving the hashmap's values.

The comparison function and operator

You may have noticed that the wxJSONValue class does not define a comparison operator (the operator==() function). This is not an oversight but a precise design choice, because comparing wxJSONValue objects may be time-consuming and the notion of equality is not clear-cut for JSON objects. Consider the following two JSON objects:

 // first object
 {
   "font" : {
     "size" : 12,
     "face" : "Arial",
     "bold" : true
   }
 }

 // second object
 {
   "font" : {
     "face" : "Arial",
     "bold" : true,
     "size" : 12
   }
 }

Note that, strictly speaking, the two objects are not equal because the order of the key/value pairs is not the same, although the values that the two objects contain are the same.

For this reason the wxJSONValue class does not define the comparison operator but a similar function, wxJSONValue::IsSameAs(), which returns TRUE if the two objects contain the same values even if they are not in the same order. This applies only to key/value maps and not to arrays, because the latter are ordered collections of values.

The comparison function is very time-consuming because it is recursive. Items are compared pairwise with IsSameAs() until the first pair returns FALSE.

If the two objects are very complex, the comparison function is very slow and you are discouraged from using it unless it is strictly necessary. I have defined this function only for debugging purposes.
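
The rule described above (member order is ignored for key/value maps but significant for arrays) can be sketched with a miniature value type. Everything in this block is hypothetical and independent of wxJSON: the Value structure keeps scalars as text, keeps object members in insertion order, and assumes that keys are unique within an object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical miniature JSON value (not wxJSONValue).
struct Value {
    enum Kind { Scalar, Array, Object } kind = Scalar;
    std::string scalar;               // scalar value rendered as text
    std::vector<Value> items;         // array elements (order matters)
    std::vector<std::string> keys;    // object keys, in insertion order
    std::vector<Value> members;       // object values, same index as keys
};

inline Value MakeScalar(const std::string& text)
{
    Value v;
    v.scalar = text;
    return v;
}

inline Value MakeObject()
{
    Value v;
    v.kind = Value::Object;
    return v;
}

inline void AddMember(Value& object, const std::string& key, const Value& value)
{
    object.keys.push_back(key);
    object.members.push_back(value);
}

inline const Value* FindMember(const Value& object, const std::string& key)
{
    for (std::size_t i = 0; i < object.keys.size(); ++i) {
        if (object.keys[i] == key) return &object.members[i];
    }
    return nullptr;
}

// Recursive comparison that stops at the first pair that differs.
inline bool IsSameAs(const Value& a, const Value& b)
{
    if (a.kind != b.kind) return false;
    if (a.kind == Value::Scalar) return a.scalar == b.scalar;
    if (a.kind == Value::Array) {
        if (a.items.size() != b.items.size()) return false;
        for (std::size_t i = 0; i < a.items.size(); ++i) {
            if (!IsSameAs(a.items[i], b.items[i])) return false;
        }
        return true;
    }
    // Objects: look every key up by name, so member order is irrelevant.
    if (a.keys.size() != b.keys.size()) return false;
    for (std::size_t i = 0; i < a.keys.size(); ++i) {
        const Value* other = FindMember(b, a.keys[i]);
        if (other == nullptr || !IsSameAs(a.members[i], *other)) return false;
    }
    return true;
}
```

The two "font" objects shown earlier compare as the same with this function, whereas two arrays holding the same elements in a different order do not. The linear key lookup keeps the sketch short; it also illustrates why the cost grows quickly with the size and depth of the values.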

Comparing different types

A problem in the interpretation of IsSameAs arises when comparing different types that can be converted or promoted to another type. Consider the two following JSON values:

  wxJSONValue v1( 100 );
  wxJSONValue v2( 100.0 );
  bool r = v1.IsSameAs( v2 );  // should return TRUE

The above values are stored as different types, the first as an integer and the second as a double, but they are, in fact, the same value and the function should return TRUE. Up to and including release 0.2.1, the wxJSON library had a bug that caused the IsSameAs() function to return FALSE in the above example. This was because the function first compared the types and, if they differed, immediately returned FALSE without trying a type conversion.

Starting from release 0.2.2 this bug is fixed and the wxJSONValue::IsSameAs() function correctly compares compatible types; for now, these are only the numeric types. In other words, a string that contains the same value as a number is not the same as that number. Example:

  wxJSONValue v1( 100 );
  wxJSONValue v2( _T("100"));
  bool r = v1.IsSameAs( v2 );  // returns FALSE

The comparison function compares different numeric types by promoting them to double and then comparing the double values. In this way the function correctly handles this situation:

  wxJSONValue v1( -1 );            // this is -1
  wxJSONValue v2( (unsigned) -1);  // this is 4.294.967.295
  bool r = v1.IsSameAs( v2 );      // returns FALSE
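
The promotion rule can be sketched as follows. The Number structure and the helper functions are hypothetical and do not reproduce the wxJSON storage layout; the point is only that each operand is converted by value to double (unlike the raw-bits reading discussed for the union) before the two are compared.

```cpp
#include <cassert>

// Hypothetical tagged number (not the wxJSON storage layout).
struct Number {
    enum Kind { Signed, Unsigned, Double };
    Kind               kind;
    long long          i;   // used when kind == Signed
    unsigned long long u;   // used when kind == Unsigned
    double             d;   // used when kind == Double
};

inline Number FromInt(long long v)
{
    Number n = { Number::Signed, v, 0, 0.0 };
    return n;
}

inline Number FromUInt(unsigned long long v)
{
    Number n = { Number::Unsigned, 0, v, 0.0 };
    return n;
}

inline Number FromDouble(double v)
{
    Number n = { Number::Double, 0, 0, v };
    return n;
}

// Value conversion to double, according to the stored kind.
inline double Promote(const Number& n)
{
    switch (n.kind) {
    case Number::Signed:   return static_cast<double>(n.i);
    case Number::Unsigned: return static_cast<double>(n.u);
    default:               return n.d;
    }
}

// Two numbers are the same if their promoted values are equal.
inline bool SameNumber(const Number& a, const Number& b)
{
    return Promote(a) == Promote(b);
}
```

With this rule 100 and 100.0 are the same, while -1 and (unsigned) -1 are not, because the sign is taken into account before the conversion. One caveat of any double-based comparison is that 64-bit integers larger than 2^53 cannot all be represented exactly as a double.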

C/C++ comments in JSON text

Starting with release 0.2, the wxJSON library recognizes and stores C/C++ comments in JSON value objects. See wxJSON internals: C/C++ comments storage for a detailed implementation.

The wxJSONReader class

The wxJSONWriter class


Generated on Fri Nov 13 22:52:29 2009 for wxJSON by  doxygen 1.5.5