JSON alternatives for data

JSON is one of the most mainstream, universal data formats out there. It is loved because of its simplicity and because it is human-readable, making it easy to use. It is also criticized because it is verbose, it doesn’t support comments, it has limited data types, and it doesn’t enforce a data structure using a schema.

So, is there anything better than JSON? Before we can answer that question, we have to understand that JSON is used in two very different cases. In this second article of a series of two we discuss JSON alternatives for data.

JSON alternatives for configuration files (the first article in this series)
JSON alternatives for data transfer and storage (this article)

These two categories come with different needs. For configuration files, it is important that the data format is easy to read and supports comments for documentation. For data transfer and storage, it is important that the data is concise and can be serialized and deserialized quickly.

We’ll first discuss how well JSON itself does for data, and then discuss a number of alternatives.

JSON

When to use: in most cases JSON is a great choice for data.

Website: https://www.json.org/

JSON is excellent for data exchange and storage. It hits a sweet spot between being an extremely simple format and being powerful and flexible enough to store any kind of structure. Because it is a text-based format, it is easy to work with. Today it is one of the most ubiquitous data formats, which is an extra reason to choose it. In many cases JSON is simply the best choice, but there are interesting alternatives that can be a good fit depending on your use case.

There are three main criticisms of JSON for data. It is verbose, because it repeats the field names for every object and requires double quotes around every key and string. It doesn’t enforce a data structure (though you can describe one separately with a JSON schema). And it has a limited number of data types: for example, it has no built-in support for dates.

Here is an example of a JSON array containing a set of objects:

[
  { "name": "Chris", "age": 23, "city": "New York" },
  { "name": "Emily", "age": 19, "city": "Atlanta" },
  { "name": "Joe", "age": 32, "city": "New York" },
  { "name": "Kevin", "age": 19, "city": "Atlanta" },
  { "name": "Michelle", "age": 27, "city": "Los Angeles" },
  { "name": "Robert", "age": 45, "city": "Manhattan" },
  { "name": "Sarah", "age": 31, "city": "New York" }
]
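To make the point about limited data types concrete, here is a minimal Python sketch using only the standard `json` and `datetime` modules. Python is our choice for illustration, and the `birthday` field is a made-up example that is not part of the data above. Storing dates as ISO 8601 strings is a common convention, not something JSON itself defines.

```python
import json
from datetime import date

records = [
    {"name": "Chris", "age": 23, "city": "New York"},
    {"name": "Emily", "age": 19, "city": "Atlanta"},
]

# Round trip: Python objects -> JSON text -> Python objects
text = json.dumps(records)
assert json.loads(text) == records

# JSON has no date type, so serializing a date object raises TypeError
try:
    json.dumps({"birthday": date(2001, 5, 17)})
    date_supported = True
except TypeError:
    date_supported = False

# Workaround (a convention, assumed here): store the date as an ISO 8601
# string and convert it back by hand after parsing.
encoded = json.dumps({"birthday": date(2001, 5, 17).isoformat()})
decoded = date.fromisoformat(json.loads(encoded)["birthday"])

print(date_supported)  # False
print(encoded)
print(decoded)
```

The receiving side has to know that `birthday` holds a date, because nothing in the JSON text says so. That is the kind of knowledge a schema would normally carry.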

CSV

When to use: CSV is perfect for tabular data.

Website: https://en.wikipedia.org/wiki/Comma-separated_values

The CSV data format is even simpler and older than JSON. CSV stands for “comma-separated values”, and it is a great fit for tabular data. It is popular because the format is so simple, universal, concise, and easy to read and parse. Its use case is limited, though: tabular data only.

You can read a more in-depth comparison in the article “JSON vs CSV: what is the difference and what should I use?”.

The above JSON data example can be represented in CSV as follows:

name,age,city
Chris,23,New York
Emily,19,Atlanta
Joe,32,New York
Kevin,19,Atlanta
Michelle,27,Los Angeles
Robert,45,Manhattan
Sarah,31,New York
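As a rough illustration of this conversion and of the size difference, here is a small Python sketch using the standard `csv` and `json` modules. The helper name `to_csv` is our own, and the sketch assumes every record is flat and has the same keys.

```python
import csv
import io
import json

records = [
    {"name": "Chris", "age": 23, "city": "New York"},
    {"name": "Emily", "age": 19, "city": "Atlanta"},
    {"name": "Joe", "age": 32, "city": "New York"},
    {"name": "Kevin", "age": 19, "city": "Atlanta"},
    {"name": "Michelle", "age": 27, "city": "Los Angeles"},
    {"name": "Robert", "age": 45, "city": "Manhattan"},
    {"name": "Sarah", "age": 31, "city": "New York"},
]

def to_csv(rows):
    """Serialize flat dicts to CSV text; the first row's keys become the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_csv(records)
json_text = json.dumps(records, separators=(",", ":"))  # most compact JSON form

print(csv_text)
print(len(csv_text), "characters as CSV,", len(json_text), "as compact JSON")

# Reading the CSV back: every value is a string, including the ages
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0])
```

Note the trade-off when reading the data back: CSV carries no type information, so `age` arrives as the string `"23"` and has to be converted by the reader.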

Note that CSV does not really support nested structures. It is possible, however, to encode nesting in the field names, like "user.address.city", where each dot marks a nested object.
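The dotted-name convention can be sketched in a few lines of Python. The helpers `flatten` and `unflatten` are our own illustrative names, and the sketch assumes that keys never contain a dot and that values are either nested objects or plain values (arrays are not handled).

```python
def flatten(obj, prefix=""):
    """Turn nested dicts into one flat dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

def unflatten(flat):
    """Rebuild the nested dicts from dotted keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

record = {"user": {"name": "Chris", "address": {"city": "New York"}}, "age": 23}
flat = flatten(record)

print(flat)
print(unflatten(flat) == record)  # True
```

The flat keys can then serve directly as CSV column names.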

BSON

When to use: BSON is a good fit for MongoDB data, and for data that requires advanced data types and fast scanning.

Website: https://www.mongodb.com/basics/bson

BSON is a binary data format that was developed for MongoDB. It was designed to support more advanced data types such as Date, 64-bit Long, Regex, and more. It is optimized for efficient storage and scanning.

A downside of BSON is that it is more complex and less universal than JSON. Since it is a binary format, you can’t inspect a BSON file with a simple text editor; you need special tooling. BSON is not necessarily faster or smaller than JSON (especially if you combine JSON with compression such as zip). If you intend to use BSON, first run an experiment to see how it performs on your own data, so you can weigh the pros and cons.

Here is a BSON example which encodes the JSON document {"hello":"world"}:

\x16\x00\x00\x00           // total document size
\x02                       // 0x02 = type String
hello\x00                  // field name
\x06\x00\x00\x00world\x00  // field value
\x00                       // 0x00 = type EOO ('end of object')
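To check the byte layout above, here is a minimal Python sketch that builds the same document by hand with the standard `struct` module. It is a toy encoder for string fields only, with function names of our own choosing. It is not the MongoDB driver’s API, and real code should use a BSON library.

```python
import json
import struct

def encode_string_element(name, value):
    """One BSON string element: type byte, key as C string, length-prefixed value."""
    key = name.encode("utf-8") + b"\x00"
    data = value.encode("utf-8") + b"\x00"
    # "<i" is a little-endian signed 32-bit integer; the length counts the trailing null
    return b"\x02" + key + struct.pack("<i", len(data)) + data

def encode_document(fields):
    """Encode a dict of string values as a BSON document (strings only in this sketch)."""
    body = b"".join(encode_string_element(k, v) for k, v in fields.items()) + b"\x00"
    # The size prefix counts itself (4 bytes) plus the body
    return struct.pack("<i", len(body) + 4) + body

doc = {"hello": "world"}
bson_bytes = encode_document(doc)
json_bytes = json.dumps(doc, separators=(",", ":")).encode("utf-8")

print(bson_bytes)
print(len(bson_bytes), "bytes as BSON,", len(json_bytes), "bytes as compact JSON")
```

For this tiny document the BSON form takes 22 bytes and the compact JSON form 17, which illustrates that BSON is not automatically the smaller of the two.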

Protocol Buffers and FlatBuffers

When to use: for strongly typed, high performance communication between servers.

Website: https://protobuf.dev/ and https://flatbuffers.dev/

Both Protocol Buffers and FlatBuffers offer a solution for serializing structured data in a binary form. They let you define a strongly typed schema for your data first, and then generate optimized code to read and write that data in the language you’re working in (C++, C#, Go, Java, Kotlin, Python, and more). They are typically considerably faster than JSON serialization and deserialization, and are well suited to server-to-server communication.

It is important to realize that the serialized data is not directly readable: you need the corresponding schema definition file and tooling to deserialize and read it. This also adds a compile step to generate the classes that do the reading and writing.

Here is an example of a proto definition:

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;
}

With this definition, you can generate a class for use in your programming language, for example to write data from a Java application:

// Java code
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
// Write the serialized message to the file named by the first argument
try (FileOutputStream output = new FileOutputStream(args[0])) {
  john.writeTo(output);
}

And then read the data in a C++ application:

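// C++ code
Person john;
fstream input(argv[1], ios::in | ios::binary);
// ParseFromIstream returns false when the input is not a valid message
if (!john.ParseFromIstream(&input)) {
  cerr << "Failed to parse person." << endl;
  return -1;
}
int id = john.id();
string name = john.name();
string email = john.email();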
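To see why the serialized bytes are not readable without the definition, it helps to look at what ends up on the wire. The Python sketch below hand-encodes the Person message defined above, following the Protocol Buffers wire format (a tag built from the field number and wire type, then a varint or a length-prefixed value). It is an illustration only, with helper names of our own, and it handles only non-negative integers and strings. Real code should use the generated classes.

```python
VARINT, LENGTH_DELIMITED = 0, 2  # the two wire types used by Person

def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)

def tag(field_number, wire_type):
    """A field's tag is (field_number << 3) | wire_type, itself a varint."""
    return encode_varint((field_number << 3) | wire_type)

def encode_string_field(field_number, text):
    data = text.encode("utf-8")
    return tag(field_number, LENGTH_DELIMITED) + encode_varint(len(data)) + data

def encode_int_field(field_number, value):
    return tag(field_number, VARINT) + encode_varint(value)

def encode_person(name, person_id, email):
    # Field numbers come from the proto definition: name = 1, id = 2, email = 3
    return (encode_string_field(1, name)
            + encode_int_field(2, person_id)
            + encode_string_field(3, email))

payload = encode_person("John Doe", 1234, "jdoe@example.com")
print(payload)
print(len(payload), "bytes")
```

The payload is 31 bytes and contains the field numbers but none of the field names, which is why a reader needs the proto definition to know that field 2 is called `id`.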

So what is the difference between Protocol Buffers and FlatBuffers? It is explained as follows on the site of FlatBuffers:

Protocol Buffers is indeed relatively similar to FlatBuffers, with the primary difference being that FlatBuffers does not need a parsing/ unpacking step to a secondary representation before you can access data, often coupled with per-object memory allocation. The code is an order of magnitude bigger, too. Protocol Buffers has no optional text import/export.

Internet Object

When to use: not yet. It is a proof of concept, but it looks promising.

Website: https://docs.internetobject.org

Internet Object is an interesting concept. It describes itself as a “thin, schema-first and robust data-interchange object format for Internet”. It looks like the best of CSV and JSON combined. It is a plain text format with a header in which you define the data structure as a schema. This makes the format as compact as CSV, without limiting it to tabular data: like JSON, it supports rich data structures such as nested objects and arrays. It keeps the number of data types limited, in the spirit of JSON.

Here is an Internet Object example:

# The address definition
~ $address: {street:string, zip:{string, maxLength:5}, city:string}
#
# The person definition
~ $schema: {
  name:string,               # The person name
  age:int,                   # The person age
  homeAddress?: $address,    # The home address
}
---
# Records
~ Chris, 23, {Park Avenue, 50010, New York}
~ Emily, 19, {2nd St, 22018, Atlanta}
~ Joe, 32, {Wall Street, 4089, New York}
~ Kevin, 19, {Balerma St, 22018, Atlanta}
~ Michelle, 27, {Broadway, 3019, Los Angeles}
~ Robert, 45, {Franklin Ave, 44200, Brooklyn}
~ Sarah, 31, {Fifth avenue, 10027, New York}

XML

When to use: in general, don’t use XML; JSON is almost always the better choice.

Website: https://www.xml.com/

XML deserves a mention too as an alternative to JSON for data. JSON was developed (or better: discovered by Douglas Crockford) as an alternative to XML. JSON is a much simpler format, and it is generally faster to parse, much more concise, and less ambiguous. In most cases JSON is the better choice. XML does, however, support comments and has built-in options for providing a schema alongside the data, and it is similar to HTML.

Here is an XML example:

<?xml version="1.0"?>
 <ProductList title="Electronics">
  <Product>
    <Name>Smartphone X</Name>
    <Description>Fast and with a great battery life.</Description>
    <Cost>$395.00</Cost>
    <Shipping>$8.95</Shipping>
  </Product>
  <Product>
    <Name>Product Two</Name>
    ...
  </Product>
  <Product>
    <Name>Smartphone Y</Name>
    <Description>Good value for your money.</Description>
    <Cost>$199.00</Cost>
    <Shipping>$8.95</Shipping>
  </Product>
  ...
 </ProductList>
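To get a feel for the difference in verbosity, here is a small Python sketch that writes the first product from the example both as XML and as compact JSON, using the standard `xml.etree.ElementTree` and `json` modules. The dict layout is our own assumption about how the same record would look in JSON.

```python
import json
import xml.etree.ElementTree as ET

product = {
    "Name": "Smartphone X",
    "Description": "Fast and with a great battery life.",
    "Cost": "$395.00",
    "Shipping": "$8.95",
}

# Build <Product><Name>...</Name>...</Product> from the dict
element = ET.Element("Product")
for field, value in product.items():
    ET.SubElement(element, field).text = value

xml_text = ET.tostring(element, encoding="unicode")
json_text = json.dumps(product, separators=(",", ":"))

# Parse the XML back into a dict to confirm both forms carry the same data
parsed = {child.tag: child.text for child in ET.fromstring(xml_text)}

print(xml_text)
print(json_text)
print(len(xml_text), "characters as XML,", len(json_text), "as compact JSON")
```

XML repeats every field name in both the opening and the closing tag, so the XML form comes out longer for the same data.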

Conclusion about JSON alternatives for data

In general, the simplest solutions are the most robust and ubiquitous. Both JSON and CSV are popular because of their simplicity. It makes them easy to use in any environment. CSV is more limited since it only supports tabular data. But on the other hand, it is a more concise format than JSON.

There is one interesting concept, Internet Object, which combines the best of CSV and JSON in a new data format. This is interesting to keep an eye on.

There are binary formats such as BSON, Protocol Buffers, and FlatBuffers, which may offer better performance and more advanced data types. This comes at a cost though: binary formats are harder to use and inspect, and they require more advanced tooling and knowledge.
