Elasticsearch

Certified

Important Capabilities

Capability | Status | Notes
Platform Instance | ✅ | Enabled by default

This plugin extracts the following:

  • Metadata for indexes
  • Column types associated with each index field

CLI based Ingestion

Install the Plugin

pip install 'acryl-datahub[elasticsearch]'
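
To confirm the source was installed, you can list the plugins known to the DataHub CLI and check that elasticsearch shows up as enabled:

datahub check plugins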

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "elasticsearch"
  config:
    # Coordinates
    host: 'localhost:9200'

    # Credentials
    username: user # optional
    password: pass # optional

    # SSL support
    use_ssl: False
    verify_certs: False
    ca_certs: "./path/ca.cert"
    client_cert: "./path/client.cert"
    client_key: "./path/client.key"
    ssl_assert_hostname: False
    ssl_assert_fingerprint: "./path/cert.fingerprint"

    # Options
    url_prefix: "" # optional url_prefix
    env: "PROD"
    index_pattern:
      allow: [".*some_index_name_pattern*"]
      deny: [".*skip_index_name_pattern*"]
    ingest_index_templates: False
    index_template_pattern:
      allow: [".*some_index_template_name_pattern*"]

sink:
  # sink configs
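
Once the recipe is saved to a file (elasticsearch_recipe.yml below is just an illustrative name), run it with the DataHub CLI:

datahub ingest -c elasticsearch_recipe.yml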

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
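For example, index_pattern.allow in the table refers to the allow list nested under index_pattern in the recipe:

index_pattern:
  allow: [".*some_index_name_pattern*"]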

Field [Required] | Type | Description | Default
ca_certs | string | Path to a certificate authority (CA) certificate. |
client_cert | string | Path to the file containing the private key and the certificate, or cert only if using client_key. |
client_key | string | Path to the file containing the private key if using separate cert and key files. |
host | string | The Elasticsearch host URI. | localhost:9200
ingest_index_templates | boolean | Ingests ES index templates if enabled. | False
password | string | The password credential. |
platform_instance | string | The instance of the platform that all assets produced by this recipe belong to. |
ssl_assert_fingerprint | string | Verify the supplied certificate fingerprint if not None. |
ssl_assert_hostname | boolean | Use hostname verification if not False. | False
url_prefix | string | An enterprise may run multiple Elasticsearch clusters behind a single endpoint; url_prefix is used to route requests to the right cluster. |
use_ssl | boolean | Whether to use SSL for the connection or not. | False
username | string | The username credential. |
verify_certs | boolean | Whether to verify SSL certificates. | False
env | string | The environment that all assets produced by this connector belong to. | PROD
index_pattern | AllowDenyPattern | Regex patterns for indexes to filter in ingestion. | {'allow': ['.*'], 'deny': ['^_.*', '^ilm-history.*'], 'ignoreCase': True}
index_pattern.allow | array(string) | |
index_pattern.deny | array(string) | |
index_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True
index_template_pattern | AllowDenyPattern | The regex patterns for filtering index templates to ingest. | {'allow': ['.*'], 'deny': ['^_.*'], 'ignoreCase': True}
index_template_pattern.allow | array(string) | |
index_template_pattern.deny | array(string) | |
index_template_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True
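
As a sketch of how url_prefix might be used (the gateway host and prefix below are hypothetical), all clusters sit behind one endpoint and the prefix selects the cluster to query:

host: 'es-gateway.example.com:9200'
url_prefix: '/cluster-a'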

Code Coordinates

  • Class Name: datahub.ingestion.source.elastic_search.ElasticsearchSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Elasticsearch, feel free to ping us on our Slack.