JSON Query: a small, flexible, and expandable JSON query language

| 7 min read

This article presents a new query language: JSON Query: a small, flexible, and expandable JSON query language. A core feature of JSON Query is its intermediate JSON format. This is basically an abstract syntax tree (or AST). This makes it which is easy to integrate with it and implement it in any environment.

JSON Query has a human friendly text format. This text format can be converted into an intermediate JSON format, and vice versa. The JSON format is compiled and can then be executed against a JSON document.

JSON Query Overview

The JavaScript implementation of JSON Query has a compiler supporting 40+ functions which only weights 1.3 kB when minified and gzipped. Adding the parser and stringifier of the text format brings the total weight of the library to just 2.9 kB. This is extremely lightweight, which matters when running in a browser environment.

Try out JSON Query on JSON Editor Online

In the following sections we’ll go into the motivation and reasoning behind JSON Query and explore its possibilities.

The challenge with query language syntax

There is a rich variety of query languages for JSON available like jq, jmespath, and others. The difficulty that I have with them is that I do not need them on a daily basis, and I find their syntax hard to remember. Most of them use special characters like @ and ? and special notations like .[]. Writing a query using them requires a lot of back and forth between the error I get whilst writing the query and the online documentation of the language. And even then I regularly hit limitations in the query language. Often, I resort to writing a small script in JavaScript+Lodash because that is so expressive and powerful.

For people who only sometimes need query a JSON document, a graphical user interface helps a lot. JSON Editor Online comes with a “Wizard” which helps create simple queries containing filtering, sorting, and basic transform:

JSON Editor Online Query Wizard

This Wizard is limited though, and only works one way: you can generate a query using the Wizard, but you can’t populate the Wizard from a given query. As soon as you need something different, you need to consult the documentation and be on your own again.

The idea behind JSON Query

How about defining a query language that is extremely easy to parse for machines? Then, it will be straightforward to build a graphical user interface on top of it. It will also be easy to create an implementation in various languages and environments, like the browser, a Python backend, a command line interface, and more.

In order to keep the language simple, we can utilize JSON. The simplest possible way to describe operations is defining a function call as an array with the function name as first argument, followed by its arguments:

[name, argument1, argument2, ...]

Would this be enough though to describe all we need for a query language? I think the following features are essential for a powerful query language:

  1. Have a small and consistent API with as little special cases and special syntaxes as possible.
  2. Be able to write a “pipe” or a “chain” of multiple operations like filter(...) | sort(...) | pick(...), as opposed to nested function calls reading in the opposite order like pick(sort(filter(data, ...), ...), ...).
  3. Be able to write multiple queries in parallel, for example outputting in an object like { topScorers: ..., averageScore: ... }.
  4. Have a rich set of built-in functions like map, filter, groupBy, uniq, max, etc. And have a rich set of operators, including relational operators, mathematical operators. And also: support for regular expressions is a must for rich filtering.

The interesting thing is that all of this can be described by composing function calls. The API itself straightforward since it is just [name, arg1..., arg2, ...]. Of course, the arguments of each function has to be documented. A function pipe can be implemented to chain methods in series: ["pipe", query1, query2, ...]. And a function object can be implemented to execute queries in parallel: ["object", {"prop1": query1, "prop2": query2, ...}]. Let’s try out what it looks like for the following example JSON data:

{
  "friends": [
    { "name": "Emily", "age": 19, "city": "Atlanta" },
    { "name": "Kevin", "age": 19, "city": "Atlanta" },
    { "name": "Chris", "age": 23, "city": "New York" },
    { "name": "Michelle", "age": 27, "city": "Los Angeles" },
    { "name": "Sarah", "age": 31, "city": "New York" },
    { "name": "Joe", "age": 32, "city": "New York" },
    { "name": "Robert", "age": 45, "city": "Manhattan" }
  ]
}

Let’s write a query where we get the array with friends, filter by friends living in New York, sort the friends by name, and just return the name and age of the friends, and not the city. This can look as follows:

[
  "pipe",
  ["get", "friends"],
  ["filter", ["eq", ["get", "city"], "New York"]],
  ["sort", ["get", "name"]],
  ["map", ["get", "name"]]
]

The output of this query will be:

["Chris", "Joe", "Sarah"]

A few things to note here:

  • We use a function get to get a property from an object, like ["get", "name"]. It would be possible to write for example sort in a simpler way, like ["sort", "name"] since we know that the first argument will be treated as a property name. However, this doesn’t work for conditions in the filter function for example: from just a query like ["eq", "New York", "city"] it is not possible to know which of the arguments was intended as a property name and which is a static string. Therefore, it’s best to consistently use a property getter like ["get", "city"] everywhere to describe getting property or nested path in JSON objects.
  • The functions filter, map, and sort will apply the query they received as first argument to each element in the array that they are operating on.

We can go a step further by introducing a function object which executes multiple queries in parallel and will output a JSON object:

[
  "object",
  {
    "newYorkFriends": [
      "pipe",
      ["get", "friends"],
      ["filter", ["eq", ["get", "city"], "New York"]],
      ["sort", ["get", "name"]],
      ["map", ["get", "name"]]
    ],
    "totalFriends": ["pipe", ["get", "friends"], ["size"]]
  }
]

Output JSON data:

{
  "newYorkFriends": ["Chris", "Joe", "Sarah"],
  "totalFriends": 7
}

All in all, we can conclude that this query language is simple, flexible and extensible. A downside is that it is quite verbose. Let’s look into that next.

Adding a human friendly syntax

Writing function calls using JSON is very straightforward for a machine to read, but it quite verbose due to all the brackets, quotes, and commas. The JSON Query language is perfect as an intermediate format though, and we can build a human friendly syntax on top of it.

The following table describes a set of transforms to make the functions more compact and better readable for humans:

Type Text format JSON format
Function ["name", argument1, argument2, ...] name(argument1, argument2, ...)
Operator ["operator", left, right] (left operator right)
Pipe ["pipe", query1, query2, ...] query1 | query2 | ...
Object ["object", {
  "prop1": query1,
  "prop2": query2,
  ...
}]
{prop1: query1, prop2: query2, ...}
Array ["array", item1, item2, ...] [item1, item2, ...]
Property ["get", "prop1"]
["get", "prop1", "prop2"]
["get", "prop three"]
.prop1
.prop1.prop2
."prop three"

Now, when taking the same JSON Query again:

[
  "pipe",
  ["get", "friends"],
  ["filter", ["eq", ["get", "city"], "New York"]],
  ["sort", ["get", "name"]],
  ["map", ["get", "name"]]
]

We can convert that in a straightforward way to the text format:

.friends
  | filter(.city == "New York")
  | sort(.name)
  | map(.name)

This is much easier to read, and this syntax looks quite similar to query languages like jq and jmespath and others. So it will be quite familiar to most programmers, and it’s easy to learn.

Conclusion

JSON Query is a small but powerful query language. It offers both a human friendly syntax and a machine friendly JSON syntax. The intermediate JSON format makes it easy to interoperate with. This adds a lot of value and opens up new possibilities. It allows for example to build a visual user interface on top of it, implement it in different environments like a backend and execute queries there, compile it into some optimized binary code for performance, or built a command line interface (CLI) on top of it. The query language is extensible too: since the compiler basically only looks up functions in a map and then executes them, adding new functions is a matter of adding them to the lookup map.

You can now use JSON Query on JSON Editor Online for querying JSON, and it is the new default query language. It will allow to improve the query Wizard in the future. And unlike the previous default query language (JavaScript+Lodash), JSON Query can be safely stored and shared. So, stay tuned for improvements in JSON Editor Online in that regard.