Loading Data Into Elasticsearch With Python (eland)

Executive Summary

There is a new pandas-style Python interface to Elasticsearch called eland. If you are familiar with pandas, it makes Elasticsearch very approachable. The examples below illustrate how to complete some simple tasks in eland.

Load Data Into Elasticsearch

The code below illustrates how to use eland to load data into Elasticsearch. At a high level, the steps are:

* Import the required packages.
* Set up some variables for the connection.
* Create an Elasticsearch connection.
* Pull a dataset from the internet and format it as required.
* Save the data into Elasticsearch.

from elasticsearch import Elasticsearch
import eland as ed
import pandas as pd
from re import sub

## Setup Variables

elasticHost      = 'localhost'
elasticUsername  = 'elastic'
elasticPassword  = 'elastic'
elasticScheme    = 'http'
elasticPort      =  9200
elasticTimeout   =  100
elasticOpaqueId  = 'python-eland-requests'      

## Setup Connection to Elasticsearch
es = Elasticsearch(
    [elasticHost],
    http_auth=(elasticUsername, elasticPassword),
    scheme=elasticScheme,
    port=elasticPort,
    request_timeout=elasticTimeout,
    opaque_id=elasticOpaqueId
)

## Create a Pandas Dataframe of Data to be Loaded into Elasticsearch
df = pd.read_json('https://raw.githubusercontent.com/vega/vega/master/docs/data/cars.json')

## Replace NaN (null) Values with Zero 
df.fillna(0, inplace=True)

## Rename the Columns to Camel Case
def camel_case_string(string):
    string =  sub(r"(_|-)+", " ", string).title().replace(" ", "")
    string = string[0].lower() + string[1:]
    return string
df.columns = [camel_case_string(x) for x in df.columns]

## Save the Data into Elasticsearch
df = ed.pandas_to_eland(
    pd_df=df,
    es_client=es,
    # Where the data will live in Elasticsearch
    es_dest_index="cars",
    # Type overrides for certain columns; string columns default to keyword.
    # name has been set to free text and year to a date field.
    es_type_overrides={
        "name": "text",
        "year": "date"
    },
    # If the index already exists replace it
    es_if_exists="replace",
    # Wait for data to be indexed before returning
    es_refresh=True,
)
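
The type overrides above use the keys "name" and "year" because the columns have already been renamed by camel_case_string. As a minimal sketch of what that helper produces, the snippet below applies it to a few column names. The sample names are an assumption about the cars dataset, since the original listing does not show them.

```python
from re import sub


def camel_case_string(string):
    # Collapse runs of "_" or "-" into spaces, title-case each word,
    # then join the words and lower-case the first character.
    string = sub(r"(_|-)+", " ", string).title().replace(" ", "")
    return string[0].lower() + string[1:]


# Assumed sample column names (not shown in the original listing)
columns = ["Name", "Miles_per_Gallon", "Weight_in_lbs", "Year"]

renamed = {column: camel_case_string(column) for column in columns}
for old, new in renamed.items():
    print(f"{old} -> {new}")
# Name -> name
# Miles_per_Gallon -> milesPerGallon
# Weight_in_lbs -> weightInLbs
# Year -> year
```

Note that the helper indexes the first character, so an empty column name would raise an IndexError.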

Extract Data From Elasticsearch

The code below illustrates how to use eland to extract data from Elasticsearch. At a high level, the steps are:

* Import the required packages.
* Set up some variables for the connection.
* Create an Elasticsearch connection.
* Create an eland DataFrame and perform some filtering.
* Create a pandas DataFrame from the eland DataFrame and save it to CSV.
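
The listings on this page hard-code the connection settings. A minimal sketch of reading them from the process environment instead is shown here. The ELASTIC_* variable names and both helper functions are hypothetical and are not part of eland or the Elasticsearch client. The defaults mirror the values used in the listings.

```python
import os


def connection_settings(env=None):
    """Read connection settings from environment variables.

    The ELASTIC_* names are hypothetical; pick whatever names suit
    your deployment. Defaults match the hard-coded values in the listings.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("ELASTIC_HOST", "localhost"),
        "username": env.get("ELASTIC_USERNAME", "elastic"),
        "password": env.get("ELASTIC_PASSWORD", "elastic"),
        "scheme": env.get("ELASTIC_SCHEME", "http"),
        "port": int(env.get("ELASTIC_PORT", "9200")),
        "timeout": int(env.get("ELASTIC_TIMEOUT", "100")),
    }


def base_url(settings):
    # Combine scheme, host and port into a single URL string
    return f"{settings['scheme']}://{settings['host']}:{settings['port']}"


# Example with an explicit mapping instead of the real environment
settings = connection_settings(
    {"ELASTIC_HOST": "es.example.internal", "ELASTIC_PORT": "9201"}
)
print(base_url(settings))  # http://es.example.internal:9201
```

The resulting values can be passed to the Elasticsearch constructor in place of the hard-coded variables. The combined URL form may also be useful with newer client releases, which, as far as I am aware, expect a full URL rather than separate scheme and port arguments.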

from elasticsearch import Elasticsearch
import eland as ed
import pandas as pd

## Setup Variables
elasticHost      = 'localhost'
elasticUsername  = 'elastic'
elasticPassword  = 'elastic'
elasticScheme    = 'http'
elasticPort      =  9200
elasticTimeout   =  100
elasticOpaqueId  = 'python-eland-requests'      

## Setup Connection to Elasticsearch
es = Elasticsearch(
    [elasticHost],
    http_auth=(elasticUsername, elasticPassword),
    scheme=elasticScheme,
    port=elasticPort,
    request_timeout=elasticTimeout,
    opaque_id=elasticOpaqueId
)

## Create the eland DataFrame Configuration
edf = ed.DataFrame(es_client=es, es_index_pattern='cars')

## Filter Using Pandas Syntax
## (call edf.es_info() to see what is being submitted to Elasticsearch)
edf = edf[edf['cylinders'] >= 7]

## Filter eland Dataframe (using es query)
## This expresses the same condition as the pandas-style filter; either one alone is sufficient
edf = edf.es_query({"bool": {"must": [{"range": {"cylinders": {"gte": 7}}}]}})

## Create a Pandas Dataframe from eland Dataframe
df  = ed.eland_to_pandas(edf)

df.to_csv('output.csv', index=False)
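
The dictionary passed to es_query above is ordinary Elasticsearch query DSL, so it can be built programmatically when the field or threshold changes. The snippet below is a small sketch of that idea. The range_query helper is hypothetical and is not part of eland.

```python
import json


def range_query(field, gte):
    """Build a bool/must query with a single 'greater than or equal' range clause.

    Hypothetical helper for illustration; it only constructs a plain dict.
    """
    return {"bool": {"must": [{"range": {field: {"gte": gte}}}]}}


# Same query as the one in the listing: cylinders >= 7
query = range_query("cylinders", 7)
print(json.dumps(query, indent=2))
```

The result can then be passed as edf.es_query(query), in the same way as the literal dictionary in the listing.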

For a detailed introduction to eland, check out this video on YouTube by Seth Michael Larson.