Loading Data Into Elasticsearch With Python (Bulk API)⚓︎

Executive Summary⚓︎

Sometimes Logstash does not give you the flexibility you need to massage source documents into the required format, and you need to write a little code. Python provides this flexibility, as well as a simple wrapper around the bulk API, which means you can load data into Elasticsearch quickly (versus loading documents one at a time).


The helpers.bulk function does all the hard work; we just need to create the input it expects and it will push our data into Elasticsearch.
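Each item in that input is a plain dict describing one operation. As a minimal sketch (the index name and field values here are illustrative), keys that start with an underscore are action metadata for the helper, and the document body goes under '_source':

```python
# A hypothetical document body; field names are just examples.
doc = {'size': 42, 'duration': 7, 'direction': 'in'}

# One bulk action in the shape helpers.bulk expects.
action = {
    '_op_type': 'index',   # index, create, update or delete
    '_index': 'events',    # target index (assumed name)
    '_id': 0,              # optional; Elasticsearch generates an id if omitted
    '_source': doc,        # the document body itself
}
```

helpers.bulk is then given the client plus an iterable of these dicts, and handles serialising them into the bulk request format for you.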

The code below illustrates how to leverage this capability. At a high level the steps are:

  • Import the required packages
  • Set up some environment variables
  • Create a list called actions and start adding documents to the list with the required structure.
  • When the list reaches the required length, push the data to Elasticsearch with the helper function, empty the list, and repeat.
from elasticsearch import Elasticsearch, helpers
import random

# set variables
elastichost = 'localhost'
port        = 9200
direction   = ['in', 'out']
outputIndex = 'events'
counter     = 0
saveSize    = 3
eventCount  = 50
es = Elasticsearch([{'host': elastichost, 'port': port}])

actions = []

for lp in range(eventCount):
    source = {'size'     : random.randint(1, 101),
              'duration' : random.randint(1, 101),
              'direction': random.choice(direction),
              'id'       : counter}
    action = {
        '_index': outputIndex,
        '_op_type': 'index',
        '_type': '_doc',
        '_id': counter,
        '_source': source
    }
    actions.append(action)
    counter += 1
    # flush the batch once it reaches the configured size
    if len(actions) >= saveSize:
        helpers.bulk(es, actions)
        actions.clear()

# push any remaining documents
if len(actions) > 0:
    helpers.bulk(es, actions)

print('All Finished')
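Note that helpers.bulk accepts any iterable of actions, so the manual saveSize batching above can alternatively be replaced by a generator: the helper batches internally via its chunk_size parameter. A sketch (the function name and index name are illustrative, not part of the library):

```python
import random

def generate_actions(count, index_name='events'):
    """Yield bulk actions one at a time instead of batching by hand."""
    for i in range(count):
        yield {
            '_op_type': 'index',
            '_index': index_name,
            '_id': i,
            '_source': {'size': random.randint(1, 101),
                        'duration': random.randint(1, 101),
                        'direction': random.choice(['in', 'out']),
                        'id': i},
        }

# helpers.bulk(es, generate_actions(50), chunk_size=500) would consume
# this lazily; here we just peek at the first generated action.
first = next(generate_actions(50))
```

This avoids holding all the actions in memory at once, which matters when loading large datasets.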

Credit to Mark Harwood, who provided the example that I used to create this snippet.