Localized Search Implementation with Elasticsearch

Is Elasticsearch great for localized search? Let's find out. It is an engine that gives you most of the standard search features out of the box: fast, indexed document search; scoring documents with configurable relevance formulas; autocomplete; context suggestions; localized text comparison through analyzers; and more!

In this post, I discuss implementing localized search for less common regional languages, whether or not ES ships an analyzer for them, and how to get good results (for starters), if not the best.

I will use Node.js and ES as the technical stack. Let’s define some standard types for our index schema. There are three cases to consider:

  1. English Analyzer
  2. Hindi Analyzer (ships with ES; see: Language Analyzers)
  3. Standard Analyzer (use this if your language does not have a built-in analyzer in ES)


// Reusable field type definitions for the index mapping
const standardTypes = {
  KEYWORD_ASCII: {
    type: 'keyword',
    ignore_above: 256
  },
  STANDARD_TEXT: {
    type: 'text',
    analyzer: 'standard'
  },
  ENGLISH_TEXT: {
    type: 'text',
    analyzer: 'english'
  },
  HINDI_TEXT: {
    type: 'text',
    analyzer: 'hindi'
  }
};

module.exports = standardTypes;

Next, we define the schema so that it supports all the standard types. I have chosen three languages to demonstrate search: English, Hindi (a native Indian language), and Telugu (a regional South Indian language with no default analyzer in ES).


// schema.js
'use strict';
const standardTypes = require('./standardTypes'); // the types module above (path assumed)
// …
"descriptors": {
  "properties": {
    "english": {
      "properties": {
        "description": standardTypes.ENGLISH_TEXT,
        "name": standardTypes.ENGLISH_TEXT
      }
    },
    "hindi": {
      "properties": {
        "description": standardTypes.HINDI_TEXT,
        "name": standardTypes.HINDI_TEXT
      }
    },
    "telugu": {
      "properties": {
        "description": standardTypes.STANDARD_TEXT,
        "name": standardTypes.STANDARD_TEXT
      }
    }
  }
}
// …
module.exports = schema;


Telugu goes under the standard analyzer because that analyzer is based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29, and works well for most languages. The simple analyzer is another option: it divides text into tokens whenever it encounters a character that is not a letter.
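If you want to see what the standard analyzer actually does with Telugu text, the _analyze API makes it easy to check. A minimal sketch, assuming the official @elastic/elasticsearch client (v7-style API) against a local node; the sample text is just an illustration:

'use strict';
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function inspectTokens() {
  // Run Telugu text through the standard analyzer without indexing anything
  const { body } = await client.indices.analyze({
    body: {
      analyzer: 'standard',
      text: 'తెలుగు భాష' // "Telugu language"
    }
  });
  // UAX #29 segmentation yields one token per word: ['తెలుగు', 'భాష']
  console.log(body.tokens.map(t => t.token));
}

inspectTokens().catch(console.error);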

Now we have a schema defined. Next, create an index with the schema and populate it with related documents. I am not sharing the actual documents used in my testing, but text resources to populate an index are easy to find online. From Node.js, you can use the official ES client, or an easier way would be the ES REST API.
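As a rough sketch of that step, here is how index creation and document indexing could look with the official @elastic/elasticsearch client (v7-style API). The index name products and the sample document are hypothetical, and I am assuming schema.js exports the complete mappings object:

'use strict';
const { Client } = require('@elastic/elasticsearch');
const schema = require('./schema'); // the mappings defined above

const client = new Client({ node: 'http://localhost:9200' });

async function setup() {
  // Create the index with the localized mappings
  await client.indices.create({
    index: 'products',
    body: { mappings: schema }
  });

  // Index one sample document; each language block carries its own fields
  await client.index({
    index: 'products',
    refresh: 'true', // make it searchable immediately (fine for a demo)
    body: {
      descriptors: {
        english: { name: 'Mango', description: 'A sweet tropical fruit' },
        hindi: { name: 'आम', description: 'एक मीठा फल' },
        telugu: { name: 'మామిడి', description: 'తీపి పండు' }
      }
    }
  });
}

setup().catch(console.error);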

There is a whole variety of searches one can perform on documents with the above schema, since every field goes through its own analyzer. [ Full-Text Queries in ES ]
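For instance, a multi_match query can search the name and description fields of one language at a time, letting the language-specific analyzer do its work. A minimal sketch, reusing the hypothetical products index and client from the previous snippet:

async function searchLocalized(term, language) {
  const { body } = await client.search({
    index: 'products',
    body: {
      query: {
        multi_match: {
          query: term,
          // Boost matches on the name over matches in the description
          fields: [
            `descriptors.${language}.name^2`,
            `descriptors.${language}.description`
          ]
        }
      }
    }
  });
  return body.hits.hits;
}

// e.g. searchLocalized('आम', 'hindi') analyzes the term with the hindi analyzer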

I was able to get great search results for English and Hindi, and the results for Telugu were not far below the bar. The ease with which one can create an almost real-time search engine is something unbelievable. I have not gone into the technical details of analyzers and how they function by combining the appropriate character filters, tokenizer, and token filters. A standard analyzer can be expected to give merely acceptable results; of course, that is only for starters. To get more accurate results for a regional language, an Elasticsearch user should implement a full-fledged custom analyzer. Moreover, ES provides a few plugins for Asian languages such as Korean, Chinese, etc.
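To give an idea of what such a custom analyzer might look like, here is a hedged sketch of index analysis settings for Telugu. The icu_tokenizer comes from the analysis-icu plugin, which must be installed separately, and the stopword list is a tiny hypothetical placeholder, not a shipped resource:

// Hypothetical analysis settings for a Telugu custom analyzer
const teluguAnalysisSettings = {
  settings: {
    analysis: {
      filter: {
        telugu_stop: {
          type: 'stop',
          stopwords: ['మరియు', 'కానీ'] // placeholder list: "and", "but"
        }
      },
      analyzer: {
        telugu_custom: {
          type: 'custom',
          tokenizer: 'icu_tokenizer', // Unicode-aware word segmentation
          filter: ['lowercase', 'telugu_stop']
        }
      }
    }
  }
};

module.exports = teluguAnalysisSettings;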

So, we can conclude that Elasticsearch is indeed a great way to boost your product’s localization and accessibility at a small time cost and with a high return.