Leveraging Scripted Fields In Elasticsearch

Summary

There are plenty of comments on the internet saying that scripted-field searches are slow and CPU intensive and should be avoided. However, sometimes you really need them (particularly for ad-hoc analysis). This post provides some examples of scripted fields and explains how to use these queries in Vega-Lite visualisations.

Setup

I have a docker-compose file with Kibana and Elasticsearch in the following GitHub repo: https://github.com/swarmee/projects/tree/master/elastic-stack-6.3

This docker-compose file includes a short-lived container that loads a nested index template and some sample data.

So you should just be able to run docker-compose up to bring everything up.

Remember that Elasticsearch expects the host's vm.max_map_count to be at least 262144, so set it with:

sudo sysctl -w vm.max_map_count=262144
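If you want to check the setting before running docker-compose up, the Python sketch below reads the current value from /proc/sys/vm/max_map_count (Linux only) and compares it with 262144. The function names are hypothetical helpers, not part of the repo. Note that sysctl -w only lasts until the next reboot; to make it permanent, add vm.max_map_count=262144 to /etc/sysctl.conf.

```python
from pathlib import Path

# Minimum value Elasticsearch expects for vm.max_map_count.
REQUIRED_MAX_MAP_COUNT = 262144


def is_sufficient(raw_value: str) -> bool:
    """Return True if the raw sysctl value (e.g. '65530') meets the minimum."""
    return int(raw_value.strip()) >= REQUIRED_MAX_MAP_COUNT


def check_host(path: str = "/proc/sys/vm/max_map_count") -> bool:
    """Read the current kernel setting (Linux only) and check it."""
    return is_sufficient(Path(path).read_text())
```

On a host still using a typical default such as 65530, check_host() returns False.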

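Often you need to search for fields containing specific text (for example, three dashes '---'). This can be achieved using .contains() within a script query. The example below finds nested addresses whose suburb contains 'stone' and returns the matching suburbs as inner hits.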

GET /real-estate-sales/_search?filter_path=hits.hits.inner_hits.*.hits.hits._source
{
  "query": {
    "nested": {
      "path": "role.party.address",
      "query": {
        "script": {
          "script": "doc['role.party.address.suburb.keyword'].size() > 0 && doc['role.party.address.suburb.keyword'].value.contains('stone')"
        }
      },
      "inner_hits": {
        "_source": "role.party.address.suburb"
      }
    }
  },
  "_source": false
}
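Hard-coding the search text into the script source means every new term produces a different script that Elasticsearch must compile, and script compilations are rate limited. A common alternative is to pass the text through params so that one compiled script is reused. The Python sketch below builds that variant of the request body. contains_query and SUBURB_FIELD are hypothetical names, and the size() > 0 guard skips nested addresses without a suburb rather than calling .contains on a missing value. Paste the printed JSON after GET /real-estate-sales/_search (with the same filter_path) in the Kibana console.

```python
import json

# Hypothetical constant: the keyword sub-field searched by the script.
SUBURB_FIELD = "role.party.address.suburb.keyword"


def contains_query(text: str) -> dict:
    """Build the nested 'contains' search, passing the text as a script param."""
    script_source = (
        f"doc['{SUBURB_FIELD}'].size() > 0 && "
        f"doc['{SUBURB_FIELD}'].value.contains(params.text)"
    )
    return {
        "query": {
            "nested": {
                "path": "role.party.address",
                "query": {
                    "script": {
                        "script": {
                            "source": script_source,
                            "lang": "painless",
                            # Changing the search text only changes params,
                            # so the compiled script can be reused.
                            "params": {"text": text},
                        }
                    }
                },
                "inner_hits": {"_source": "role.party.address.suburb"},
            }
        },
        "_source": False,
    }


print(json.dumps(contains_query("---"), indent=2))
```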

Substring Aggregation

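Sometimes we need to aggregate on part of a keyword field (e.g. the first two digits of a phone number). This can be achieved using .substring(n, n+x) within a scripted field. The example below buckets documents on the first four characters of a datetime string (the year, for ISO-style dates).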

GET /abns/_search
{
  "aggs": {
    "substringAggregation": {
      "terms": {
        "script": {
          "source": "doc['australianBusinessRegistration.lastUpdateDatetime.keyword'].value.substring(0,4)"
        },
        "size": 100
      }
    }
  },
  "size": 0
}
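To sanity-check which buckets a substring aggregation will produce before running it against the cluster, the Python sketch below mimics the terms aggregation locally on a list of values, with each item standing in for one document's .value. It is an approximation: Painless' String.substring throws StringIndexOutOfBoundsException when the value is shorter than the end index, whereas this sketch simply skips such values (an assumption made for illustration). substring_terms is a hypothetical helper and the sample values are made up.

```python
from collections import Counter


def substring_terms(values, start, end, size=100):
    """Approximate a terms aggregation over value[start:end] for each value."""
    counts = Counter()
    for value in values:
        # Skip missing or too-short values (Painless would throw instead).
        if value is None or len(value) < end:
            continue
        counts[value[start:end]] += 1
    # Order by doc_count descending, then key ascending, and keep `size` buckets.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


# Made-up sample data: datetime strings and phone numbers.
print(substring_terms(["2018-03-01T10:00:00", "2018-07-12T08:30:00", "2017-11-30T23:59:59"], 0, 4))
print(substring_terms(["0412 000 000", "0298 765 432", "0412 111 222"], 0, 2))
```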

Basic Length Scripted Field Visualisation

Sometimes you need to analyse the distribution of a field's length. The Vega-Lite visualisation below graphs the length of each party's full name, computed by a scripted field inside a nested terms aggregation.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "title": "Name Length Scripted Field",
  "data": {
    "url": {
      "index": "real-estate-sales",
      "body": {
        "aggs": {
          "nested-level": {
            "nested": {"path": "role.party.name"},
            "aggs": {
              "name-length": {
                "terms": {
                  "script": "doc['role.party.name.fullName.keyword'].value.length()",
                  "size": 20
                }
              }
            }
          }
        },
        "size": 0
      }
    },
    "format": {"property": "aggregations.nested-level.name-length.buckets"}
  },
  "mark": "line",
  "encoding": {
    "x": {
      "field": "key",
      "type": "quantitative",
      "sort": {"field": "key"},
      "axis": {"title": "party fullName length"}
    },
    "y": {
      "field": "doc_count",
      "type": "quantitative",
      "axis": {"title": "number of names"}
    },
    "size": {"value": 4},
    "shape": {"value": "circle"}
  }
}
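Because the terms aggregation returns its top size buckets ordered by doc_count, the lengths arrive in count order rather than length order, and any lengths outside the top 20 are missing from the line. The Python sketch below walks the same dotted path used in format.property and sorts the buckets numerically, which can help when checking the aggregation output from the Dev Tools console. buckets_at and length_series are hypothetical helpers and the response is made up; depending on how the script values are typed, bucket keys may come back as strings such as "12", so the sketch converts them with int().

```python
def buckets_at(response: dict, property_path: str):
    """Follow a dotted path (as in Vega's format.property) into a response dict."""
    node = response
    for part in property_path.split("."):
        node = node[part]
    return node


def length_series(response: dict) -> list:
    """Return (name length, doc_count) pairs sorted by length."""
    buckets = buckets_at(response, "aggregations.nested-level.name-length.buckets")
    # Keys may be strings, so convert before sorting numerically.
    return sorted((int(b["key"]), b["doc_count"]) for b in buckets)


# Made-up response in the shape returned by the aggregation above.
sample_response = {
    "aggregations": {
        "nested-level": {
            "doc_count": 6,
            "name-length": {
                "buckets": [
                    {"key": "12", "doc_count": 3},
                    {"key": "5", "doc_count": 2},
                    {"key": "20", "doc_count": 1},
                ]
            },
        }
    }
}
print(length_series(sample_response))
```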