Usage
For a minimal investment of time, Django Simple Elasticsearch offers a number of perks. Implementing a class
with the ElasticsearchTypeMixin lets you:
- initialize your Elasticsearch indices and mappings via the included es_manage management command
- perform Elasticsearch bulk indexing via the same es_manage management command
- perform Elasticsearch bulk indexing as well as individual index/delete requests on demand in your code
- connect the available ElasticsearchTypeMixin save and delete handlers to Django’s available model signals (e.g. post_save, post_delete)
Let’s look at an example implementation of ElasticsearchTypeMixin. Here are a couple of blog-related models
in a models.py file:
from django.db import models


class Blog(models.Model):
    name = models.CharField(max_length=50)
    description = models.TextField()


class BlogPost(models.Model):
    blog = models.ForeignKey(Blog)
    slug = models.SlugField()
    title = models.CharField(max_length=50)
    body = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)
To start with simple_elasticsearch, you’ll need to tell it that the BlogPost class implements the
ElasticsearchTypeMixin mixin, so in your settings.py set the ELASTICSEARCH_TYPE_CLASSES setting:
ELASTICSEARCH_TYPE_CLASSES = [
    'blog.models.BlogPost',
]
If you do not add this setting, everything will still work except for the es_manage command - it won’t know
what indices to create, type mappings to set or what objects to index. As you add additional
ElasticsearchTypeMixin-based index handlers, add them to this list.
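Each entry in ELASTICSEARCH_TYPE_CLASSES is a dotted path ending in a class name. The sketch below shows one common way such a string can be turned into a class using only the standard library. It is an illustration of the general pattern, not necessarily how simple_elasticsearch resolves the setting; load_class and TYPE_CLASSES are hypothetical names, and standard-library classes stand in for 'blog.models.BlogPost' so the snippet runs without a Django project.

```python
from importlib import import_module


def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object (hypothetical helper)."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError("%r is not a dotted path" % dotted_path)
    module = import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(
            "module %r has no attribute %r" % (module_path, class_name)
        ) from exc


# Stand-ins for entries such as 'blog.models.BlogPost'.
TYPE_CLASSES = ["collections.OrderedDict", "decimal.Decimal"]
loaded = [load_class(path) for path in TYPE_CLASSES]
print([cls.__name__ for cls in loaded])  # ['OrderedDict', 'Decimal']
```

A misspelled module or class name surfaces as an ImportError, which is a useful first thing to check if a type class listed in the setting does not seem to be picked up.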
All right, let’s add in ElasticsearchTypeMixin to the BlogPost model. Only pertinent changes from the
above models.py are shown:
from simple_elasticsearch.mixins import ElasticsearchTypeMixin

...

class BlogPost(models.Model, ElasticsearchTypeMixin):
    blog = models.ForeignKey(Blog)
    slug = models.SlugField()
    title = models.CharField(max_length=50)
    body = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

    @classmethod
    def get_queryset(cls):
        return BlogPost.objects.all().select_related('blog')

    @classmethod
    def get_index_name(cls):
        return 'blog'

    @classmethod
    def get_type_name(cls):
        return 'posts'

    @classmethod
    def get_type_mapping(cls):
        return {
            "properties": {
                "created_at": {
                    "type": "date",
                    "format": "dateOptionalTime"
                },
                "title": {
                    "type": "string"
                },
                "body": {
                    "type": "string"
                },
                "slug": {
                    "type": "string"
                },
                "blog": {
                    "properties": {
                        "id": {
                            "type": "long"
                        },
                        "name": {
                            "type": "string"
                        },
                        "description": {
                            "type": "string"
                        }
                    }
                }
            }
        }

    @classmethod
    def get_document(cls, obj):
        return {
            'created_at': obj.created_at,
            'title': obj.title,
            'body': obj.body,
            'slug': obj.slug,
            'blog': {
                'id': obj.blog.id,
                'name': obj.blog.name,
                'description': obj.blog.description,
            }
        }
With this mixin implementation, you can now use the es_manage management command to bulk reindex all BlogPost
items. Note that there are additional @classmethods you can override to customize functionality. Sane defaults
have been provided for these - see the source for details.
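In the BlogPost example, get_type_mapping() and get_document() describe the same structure twice, so the two can drift apart as fields are added. The following is a minimal sketch of a consistency check you could run in a test. undeclared_fields is a hypothetical helper, not part of simple_elasticsearch, and the mapping and document here are trimmed copies of the ones above with one field deliberately left out of the mapping.

```python
from datetime import datetime


def undeclared_fields(mapping, document, prefix=""):
    """Return dotted paths found in `document` but not declared in `mapping`."""
    properties = mapping.get("properties", {})
    missing = []
    for key, value in document.items():
        path = prefix + key
        if key not in properties:
            missing.append(path)
        elif isinstance(value, dict):
            # Nested objects carry their own "properties" block.
            missing.extend(undeclared_fields(properties[key], value, path + "."))
    return missing


mapping = {
    "properties": {
        "created_at": {"type": "date", "format": "dateOptionalTime"},
        "title": {"type": "string"},
        "blog": {
            "properties": {
                "id": {"type": "long"},
                "name": {"type": "string"},
                # "description" intentionally omitted for the demonstration
            }
        },
    }
}

document = {
    "created_at": datetime(2024, 1, 1),
    "title": "Hello",
    "blog": {"id": 1, "name": "News", "description": "Daily news"},
}

print(undeclared_fields(mapping, document))  # ['blog.description']
```

An undeclared field is not necessarily an error, since Elasticsearch can map fields dynamically, but an explicit list makes it easier to notice when a document field was added without a deliberate mapping choice.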
Of course, our BlogPost implementation doesn’t ensure that your Elasticsearch index is updated every time you
save or delete. For this, you can connect the ElasticsearchTypeMixin built-in save and delete handlers to Django’s model signals:
from django.db.models.signals import post_save, pre_delete
...
post_save.connect(BlogPost.save_handler, sender=BlogPost)
pre_delete.connect(BlogPost.delete_handler, sender=BlogPost)
Awesome - Django’s magic is applied.
Notes
- Prior to version 2.2.0 of this package, only models with numerical primary keys could be indexed properly due to the way the queryset_iterator() utility function was implemented. This has been changed and the primary key type no longer matters.
- Ordering the bulk queryset is important because records may be added during the indexing process (indexing data can take a long time); if the results are ordered properly, the indexing process will catch the most recent records. For most cases, the default bulk ordering of pk will suffice (Django’s default primary key field is an auto-incrementing integer).
- If a model uses a UUIDField as its PK, however, things change: UUIDs are randomly generated, so ordering by a UUIDField PK will most likely result in newly created items being missed in the indexing process. Overriding the ElasticsearchTypeMixin class method get_bulk_ordering() addresses this issue - set it to order by a DateTimeField on the model.
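The ordering issue can be reproduced without a database. The sketch below is a simplified model of a chunked scan that re-runs an ordered query with an offset for each chunk; that paging strategy is an assumption made for illustration and is not taken from the package's queryset_iterator() source. make_table, chunked_scan and the row layout are hypothetical, and fixed UUID values are used so the result is deterministic.

```python
import uuid
from datetime import datetime, timedelta

START = datetime(2024, 1, 1)


def make_table():
    """Four existing rows whose UUID order is unrelated to creation order."""
    ids = [30, 10, 40, 20]
    return [
        {"id": uuid.UUID(int=n), "created_at": START + timedelta(minutes=i)}
        for i, n in enumerate(ids)
    ]


def chunked_scan(table, order_key, chunk_size, late_row):
    """Scan `table` in ordered chunks, re-sorting before each chunk.

    `late_row` is appended after the first chunk to mimic a record created
    while a long-running bulk index is still in progress.
    """
    seen = []
    offset = 0
    pending = [late_row]
    while True:
        chunk = sorted(table, key=order_key)[offset:offset + chunk_size]
        if not chunk:
            return seen
        seen.extend(row["id"] for row in chunk)
        offset += chunk_size
        if pending:
            table.append(pending.pop())


# The late row is the newest record, but its UUID happens to sort first.
late_row = {"id": uuid.UUID(int=5), "created_at": START + timedelta(hours=1)}

by_uuid = chunked_scan(make_table(), lambda row: row["id"], 2, late_row)
by_created = chunked_scan(make_table(), lambda row: row["created_at"], 2, late_row)

print(late_row["id"] in by_uuid)     # False: the new row was skipped
print(late_row["id"] in by_created)  # True: the new row lands in the final chunk
```

Ordering by creation time places new rows after everything already scanned, which is why the note recommends having get_bulk_ordering() order by a DateTimeField such as created_at when the primary key is a UUID. In this model the UUID-ordered scan also visits one existing row twice, because the insert shifts later rows across a chunk boundary.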
TODO:
- add examples for more complex data situations
- add examples of using es_manage management command options
- add examples/scenarios when to use post_indices_create and post_indices_rebuild signals (i.e. adding percolators to new indices)