Usage

For a minimal investment of time, Django Simple Elasticsearch offers a number of perks. Implementing a class with the ElasticsearchTypeMixin lets you:

  • initialize your Elasticsearch indices and mappings via the included es_manage management command
  • perform Elasticsearch bulk indexing via the same es_manage management command
  • perform Elasticsearch bulk indexing as well as individual index/delete requests on demand in your code
  • connect the ElasticsearchTypeMixin save and delete handlers to Django’s model signals (e.g. post_save, pre_delete)

Let’s look at an example implementation of ElasticsearchTypeMixin. Here are a couple of blog-related models in a models.py file:

from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=50)
    description = models.TextField()

class BlogPost(models.Model):
    # on_delete is mandatory from Django 2.0 onward
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    slug = models.SlugField()
    title = models.CharField(max_length=50)
    body = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

To get started with simple_elasticsearch, tell it that the BlogPost class implements ElasticsearchTypeMixin by setting ELASTICSEARCH_TYPE_CLASSES in your settings.py:

ELASTICSEARCH_TYPE_CLASSES = [
    'blog.models.BlogPost'
]

If you do not add this setting, everything will still work except for the es_manage command - it won’t know which indices to create, which type mappings to set, or which objects to index. As you add more ElasticsearchTypeMixin-based index handlers, add them to this list.
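Each entry in ELASTICSEARCH_TYPE_CLASSES is a dotted import path. As a rough illustration of how such a path can be turned into a class, here is a minimal sketch using the standard library. The load_class helper is hypothetical and is not taken from simple_elasticsearch, which may resolve paths differently (Django itself ships django.utils.module_loading.import_string for the same job).

```python
from importlib import import_module


def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object it names."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = import_module(module_path)  # raises ImportError if the module is missing
    return getattr(module, class_name)   # raises AttributeError if the name is missing


# A standard-library path stands in for 'blog.models.BlogPost' here.
type_classes = [load_class(path) for path in ["collections.OrderedDict"]]
```

A typo in the setting would surface as an ImportError or AttributeError when the path is resolved, so it is worth double-checking the module and class names.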

All right, let’s add ElasticsearchTypeMixin to the BlogPost model. Only the pertinent changes to the above models.py are shown:

from simple_elasticsearch.mixins import ElasticsearchTypeMixin

...

class BlogPost(models.Model, ElasticsearchTypeMixin):
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    slug = models.SlugField()
    title = models.CharField(max_length=50)
    body = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

    @classmethod
    def get_queryset(cls):
        return BlogPost.objects.all().select_related('blog')

    @classmethod
    def get_index_name(cls):
        return 'blog'

    @classmethod
    def get_type_name(cls):
        return 'posts'

    @classmethod
    def get_type_mapping(cls):
        return {
            "properties": {
                "created_at": {
                    "type": "date",
                    "format": "dateOptionalTime"
                },
                "title": {
                    "type": "string"
                },
                "body": {
                    "type": "string"
                },
                "slug": {
                    "type": "string"
                },
                "blog": {
                    "properties": {
                        "id": {
                            "type": "long"
                        },
                        "name": {
                            "type": "string"
                        },
                        "description": {
                            "type": "string"
                        }
                    }
                }
            }
        }

    @classmethod
    def get_document(cls, obj):
        return {
            'created_at': obj.created_at,
            'title': obj.title,
            'body': obj.body,
            'slug': obj.slug,
            'blog': {
                'id': obj.blog.id,
                'name': obj.blog.name,
                'description': obj.blog.description,
            }
        }

With this mixin implementation, you can now use the es_manage management command to bulk reindex all BlogPost items. Note that there are additional @classmethods you can override to customize functionality. Sane defaults have been provided for these - see the source for details.
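Because get_document() only reads attributes from the object it is given, its output can be sanity-checked without Django or Elasticsearch. The sketch below is a standalone copy of the method body applied to stand-in objects; the SimpleNamespace instances and the MAPPING_FIELDS set are illustrative assumptions, not part of the package. It checks that the document's top-level keys line up with the properties declared in get_type_mapping().

```python
from datetime import datetime
from types import SimpleNamespace

# Top-level property names from the get_type_mapping() example.
MAPPING_FIELDS = {"created_at", "title", "body", "slug", "blog"}


def get_document(obj):
    """Standalone copy of BlogPost.get_document(), for illustration only."""
    return {
        "created_at": obj.created_at,
        "title": obj.title,
        "body": obj.body,
        "slug": obj.slug,
        "blog": {
            "id": obj.blog.id,
            "name": obj.blog.name,
            "description": obj.blog.description,
        },
    }


# Stand-ins for model instances; real code would pass a BlogPost.
blog = SimpleNamespace(id=1, name="Tech", description="Notes on code")
post = SimpleNamespace(
    created_at=datetime(2024, 1, 1, 12, 0),
    title="Hello",
    body="First post",
    slug="hello",
    blog=blog,
)

doc = get_document(post)
missing = MAPPING_FIELDS - set(doc)  # mapped fields the document never sets
extra = set(doc) - MAPPING_FIELDS    # document fields with no explicit mapping
```

Both sets being empty means the document and the mapping agree on field names, which is an easy thing to break when one of the two is edited later.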

Of course, our BlogPost implementation doesn’t ensure that your Elasticsearch index is updated every time you save or delete - for this, you can use the ElasticsearchTypeMixin built-in save and delete handlers.

from django.db.models.signals import post_save, pre_delete

...

post_save.connect(BlogPost.save_handler, sender=BlogPost)
pre_delete.connect(BlogPost.delete_handler, sender=BlogPost)

With these connected, Django’s signals keep the index in sync: every BlogPost save or delete triggers the matching handler.
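To make the signal wiring less magical, here is a simplified stand-in for what a pair of save and delete handlers conceptually does. It is not the library's implementation: the FAKE_INDEX dictionary replaces Elasticsearch, and both functions are hypothetical. The only part taken from Django is the receiver signature, where post_save and pre_delete pass the affected object as the instance keyword argument.

```python
from types import SimpleNamespace

# In-memory stand-in for an Elasticsearch index, keyed by document id.
FAKE_INDEX = {}


def save_handler(sender, instance, **kwargs):
    """Receiver shape used by post_save: (re)index the saved object."""
    FAKE_INDEX[instance.pk] = {"title": instance.title, "slug": instance.slug}


def delete_handler(sender, instance, **kwargs):
    """Receiver shape used by pre_delete: drop the object's document."""
    FAKE_INDEX.pop(instance.pk, None)


# Simulate what Django does when the signals fire.
post = SimpleNamespace(pk=7, title="Hello", slug="hello")
save_handler(sender=None, instance=post, created=True)
indexed_after_save = dict(FAKE_INDEX)

delete_handler(sender=None, instance=post)
indexed_after_delete = dict(FAKE_INDEX)
```

The **kwargs catch-all matters in real receivers too, since Django sends extra arguments such as created, raw and using.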

Notes

  • Prior to version 2.2.0 of this package, only models with numerical primary keys could be indexed properly due to the way the queryset_iterator() utility function was implemented. This has been changed and the primary key no longer matters.

    Ordering the bulk queryset is important because records may be added during the indexing process (indexing data can take a long time); if the results are ordered properly, the indexing process will catch the most recent records. For most cases, the default bulk ordering of pk will suffice (Django’s default primary key field is an auto-incrementing integer).

    If a model uses a UUIDField as its primary key, however, things change: UUIDs are randomly generated, so ordering by a UUIDField PK will most likely result in newly created items being missed in the indexing process. Overriding the ElasticsearchTypeMixin class method get_bulk_ordering() addresses this issue - have it order by a DateTimeField on the model.
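The effect of the ordering can be reproduced with a small simulation. The sketch below assumes the bulk scan reads the table in ordered, offset-based chunks and re-runs the query for each chunk; that is an assumption about the general technique, not a description of queryset_iterator() itself. Short strings stand in for random UUIDs, and a row whose key happens to sort first is inserted after the first chunk has been read.

```python
from datetime import datetime, timedelta

START = datetime(2024, 1, 1)


def make_record(pk, seq):
    """Build a stand-in row; created_at increases with insertion order."""
    return {"pk": pk, "created_at": START + timedelta(seconds=seq)}


def chunked_scan(table, order_key, chunk_size, on_first_chunk=None):
    """Read `table` in ordered chunks, re-sorting (re-querying) for each chunk."""
    seen = []
    offset = 0
    while True:
        ordered = sorted(table, key=lambda row: row[order_key])
        chunk = ordered[offset:offset + chunk_size]
        if not chunk:
            return seen
        seen.extend(row["pk"] for row in chunk)
        offset += len(chunk)
        if on_first_chunk is not None:
            on_first_chunk(table)  # simulate a concurrent insert mid-scan
            on_first_chunk = None  # only insert once


def run(order_key):
    table = [make_record(pk, seq) for seq, pk in enumerate(["b1", "c2", "d3", "e4"])]

    def late_insert(rows):
        # Newest row, but its random-looking key sorts before all others.
        rows.append(make_record("a0", len(rows)))

    return chunked_scan(table, order_key, chunk_size=2, on_first_chunk=late_insert)


by_pk = run("pk")                  # "a0" lands before the offset and is skipped
by_created_at = run("created_at")  # "a0" lands at the end and is picked up
```

Ordered by pk, the late row is never visited and an earlier row is read twice; ordered by created_at, every row is visited exactly once. That is the behaviour an overridden get_bulk_ordering() pointing at a DateTimeField is meant to restore.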

TODO:

  • add examples for more complex data situations
  • add examples of using es_manage management command options
  • add examples/scenarios of when to use the post_indices_create and post_indices_rebuild signals (e.g. adding percolators to new indices)