Usage
For a minimal investment of time, Django Simple Elasticsearch offers a number of perks. Implementing a class with the ElasticsearchTypeMixin lets you:
- initialize your Elasticsearch indices and mappings via the included es_manage management command
- perform Elasticsearch bulk indexing via the same es_manage management command
- perform Elasticsearch bulk indexing as well as individual index/delete requests on demand in your code
- connect the available ElasticsearchTypeMixin save and delete handlers to Django’s available model signals (i.e. post_save, post_delete)
Let’s look at an example implementation of ElasticsearchTypeMixin. Here are a couple of blog-related models in a models.py file:
    from django.db import models


    class Blog(models.Model):
        name = models.CharField(max_length=50)
        description = models.TextField()


    class BlogPost(models.Model):
        blog = models.ForeignKey(Blog)
        slug = models.SlugField()
        title = models.CharField(max_length=50)
        body = models.TextField()
        created_at = models.DateTimeField(auto_now_add=True)
To start with simple_elasticsearch, you’ll need to tell it that the BlogPost class implements the ElasticsearchTypeMixin mixin, so in your settings.py set the ELASTICSEARCH_TYPE_CLASSES setting:
    ELASTICSEARCH_TYPE_CLASSES = [
        'blog.models.BlogPost'
    ]
If you do not add this setting, everything will still work except for the es_manage command - it won’t know which indices to create, which type mappings to set, or which objects to index. As you add additional ElasticsearchTypeMixin-based index handlers, add them to this list.
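Each entry in ELASTICSEARCH_TYPE_CLASSES is a dotted path string rather than the class itself. As a rough illustration of how such a path can be turned into a class, here is a minimal sketch. The package's own loading code may differ, load_class is a hypothetical helper, and a standard-library class stands in for blog.models.BlogPost so the snippet runs outside a Django project.

```python
from importlib import import_module


def load_class(dotted_path):
    """Split 'package.module.ClassName' and import the named class.

    Hypothetical helper for illustration; not part of simple_elasticsearch.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    module = import_module(module_path)
    return getattr(module, class_name)


# Stand-in for a settings value such as ['blog.models.BlogPost'].
TYPE_CLASSES = ["collections.OrderedDict"]

loaded = [load_class(path) for path in TYPE_CLASSES]
```

A mistyped path fails at import time with ImportError or AttributeError, which is usually the first thing to check if a management command cannot find a listed class.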
All right, let’s add in ElasticsearchTypeMixin to the BlogPost model. Only pertinent changes from the above models.py are shown:
    from simple_elasticsearch.mixins import ElasticsearchTypeMixin

    ...

    class BlogPost(models.Model, ElasticsearchTypeMixin):
        blog = models.ForeignKey(Blog)
        slug = models.SlugField()
        title = models.CharField(max_length=50)
        body = models.TextField()
        created_at = models.DateTimeField(auto_now_add=True)

        @classmethod
        def get_queryset(cls):
            return BlogPost.objects.all().select_related('blog')

        @classmethod
        def get_index_name(cls):
            return 'blog'

        @classmethod
        def get_type_name(cls):
            return 'posts'

        @classmethod
        def get_type_mapping(cls):
            return {
                "properties": {
                    "created_at": {
                        "type": "date",
                        "format": "dateOptionalTime"
                    },
                    "title": {
                        "type": "string"
                    },
                    "body": {
                        "type": "string"
                    },
                    "slug": {
                        "type": "string"
                    },
                    "blog": {
                        "properties": {
                            "id": {
                                "type": "long"
                            },
                            "name": {
                                "type": "string"
                            },
                            "description": {
                                "type": "string"
                            }
                        }
                    }
                }
            }

        @classmethod
        def get_document(cls, obj):
            return {
                'created_at': obj.created_at,
                'title': obj.title,
                'body': obj.body,
                'slug': obj.slug,
                'blog': {
                    'id': obj.blog.id,
                    'name': obj.blog.name,
                    'description': obj.blog.description,
                }
            }
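Because get_type_mapping() and get_document() are both written by hand, the two can drift apart as fields are added. One way to catch that early is a small consistency check, sketched below. The helper name missing_from_mapping is hypothetical and not part of simple_elasticsearch, and the sample mapping and document are trimmed-down stand-ins rather than the full BlogPost versions.

```python
def missing_from_mapping(document, mapping):
    """Return dotted paths of document keys that the mapping does not declare."""
    properties = mapping.get("properties", {})
    missing = []
    for key, value in document.items():
        if key not in properties:
            missing.append(key)
        elif isinstance(value, dict):
            # Recurse into nested objects such as the 'blog' sub-document.
            nested = missing_from_mapping(value, properties[key])
            missing.extend(f"{key}.{path}" for path in nested)
    return missing


# Trimmed-down stand-ins for get_type_mapping() and get_document() output.
mapping = {
    "properties": {
        "title": {"type": "string"},
        "blog": {"properties": {"id": {"type": "long"}}},
    }
}
document = {"title": "Hello", "blog": {"id": 1, "name": "News"}, "slug": "hello"}

undeclared = missing_from_mapping(document, mapping)
```

Here undeclared lists the fields present in the document but absent from the mapping, which could be asserted empty in a unit test.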
With this mixin implementation, you can now use the es_manage management command to bulk reindex all BlogPost items. Note that there are additional @classmethods you can override to customize functionality. Sane defaults have been provided for these - see the source for details.
Of course, our BlogPost implementation doesn’t ensure that your Elasticsearch index is updated every time you save or delete - for this, you can use the ElasticsearchTypeMixin built-in save and delete handlers:
    from django.db.models.signals import post_save, pre_delete

    ...

    post_save.connect(BlogPost.save_handler, sender=BlogPost)
    pre_delete.connect(BlogPost.delete_handler, sender=BlogPost)
Awesome - Django’s magic is applied.
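To make the idea behind the handlers concrete, here is a minimal sketch of the pattern. It is not the package's implementation: a plain dict stands in for the Elasticsearch index, InMemoryIndexedType and its handler bodies are hypothetical, and the handlers are called directly with the keyword arguments Django passes to signal receivers (sender, instance, and extras such as created).

```python
from types import SimpleNamespace


class InMemoryIndexedType:
    """Hypothetical stand-in; a dict plays the role of the Elasticsearch index."""

    index = {}

    @classmethod
    def get_document(cls, obj):
        return {"title": obj.title, "slug": obj.slug}

    @classmethod
    def save_handler(cls, sender, instance, **kwargs):
        # Index (or re-index) the saved object under its primary key.
        cls.index[instance.pk] = cls.get_document(instance)

    @classmethod
    def delete_handler(cls, sender, instance, **kwargs):
        # Remove the object's document; ignore objects never indexed.
        cls.index.pop(instance.pk, None)


post = SimpleNamespace(pk=1, title="Hello", slug="hello")

# Called the way a signal would call a connected receiver.
InMemoryIndexedType.save_handler(sender=InMemoryIndexedType, instance=post, created=True)
after_save = dict(InMemoryIndexedType.index)

InMemoryIndexedType.delete_handler(sender=InMemoryIndexedType, instance=post)
after_delete = dict(InMemoryIndexedType.index)
```

Accepting **kwargs matters because Django sends additional arguments with each signal, and a receiver that does not accept them raises an error.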
Notes
Prior to version 2.2.0 of this package, only models with numerical primary keys could be indexed properly due to the way the queryset_iterator() utility function was implemented. This has been changed and the primary key no longer matters.

Ordering the bulk queryset is important because records may be added during the indexing process (indexing data can take a long time); if the results are ordered properly, the indexing process will catch the most recent records. For most cases, the default bulk ordering of pk will suffice (Django’s default primary key field is an auto-incrementing integer).

If a model uses a UUIDField as its PK, however, things change: UUIDs are randomly generated, so ordering by a UUIDField PK will most likely result in newly created items being missed in the indexing process. Overriding the ElasticsearchTypeMixin class method get_bulk_ordering() addresses this issue - set it to order by a DateTimeField on the model.
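The ordering problem can be illustrated with a small simulation. This is a simplified model, not the package's queryset_iterator(): it assumes each batch resumes after the last ordering key seen, and every name in it (simulate_bulk_index, the sample rows, the fixed UUID values) is made up for the illustration.

```python
import uuid
from datetime import datetime, timedelta


def simulate_bulk_index(rows, late_row, order_key, batch_size=2):
    """Index rows batch by batch; late_row is inserted after the first batch."""
    table = list(rows)
    indexed = []
    last_key = None
    first_batch = True
    while True:
        # Fetch the next batch: rows whose ordering key sorts after the last one seen.
        pending = sorted(
            (r for r in table if last_key is None or order_key(r) > last_key),
            key=order_key,
        )
        batch = pending[:batch_size]
        if not batch:
            break
        indexed.extend(r["title"] for r in batch)
        last_key = order_key(batch[-1])
        if first_batch:
            table.append(late_row)  # a record created while indexing is running
            first_batch = False
    return indexed


start = datetime(2020, 1, 1)
rows = [
    {"title": t, "pk": uuid.UUID(int=n), "created_at": start + timedelta(minutes=i)}
    for i, (t, n) in enumerate([("a", 100), ("b", 200), ("c", 300), ("d", 400)])
]
# The new row's UUID happens to sort before the rows already processed.
late = {"title": "late", "pk": uuid.UUID(int=50), "created_at": start + timedelta(minutes=10)}

by_pk = simulate_bulk_index(rows, late, order_key=lambda r: r["pk"])
by_created = simulate_bulk_index(rows, late, order_key=lambda r: r["created_at"])
```

Ordering by the UUID pk skips the row added mid-run, while ordering by created_at picks it up in a later batch, which is the reason for pointing get_bulk_ordering() at a DateTimeField.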
TODO:
- add examples for more complex data situations
- add examples of using es_manage management command options
- add examples/scenarios when to use post_indices_create and post_indices_rebuild signals (i.e. adding percolators to new indices)