BookSearchES

A Java demonstration application that uses the Amazon Elasticsearch Service, Spring Boot and Spring MVC. This application was developed by Topstone Software Consulting (www.topstonesoftware.com)

Introduction

This article discusses how the open source Elasticsearch database and the Amazon Web Services (AWS) Elasticsearch Service can be used as the foundation for the type of product search that is used by on-line shopping sites.

To illustrate this application of Elasticsearch, this article discusses a demonstration application, written in Java. This application is implemented using the Spring framework (Spring Boot and Spring MVC).

The Java code for this demonstration application is published on GitHub (https://github.com/IanLKaplan/BookSearchES) under the Apache 2 software license.

This application is an expanded version of a similar demonstration application that uses the AWS DynamoDB database (see "Spring and DynamoDB" and the GitHub repository https://github.com/IanLKaplan/booksearch).

Software Highlights

This application uses the AWS Elasticsearch Service for storing and searching the Book Info data. In this application, the AWS Elasticsearch Service is accessed via signed HTTP. The HTTP code and the Elasticsearch code are written to be independent of the Book Search application.

The documentation that exists to help a Java programmer develop code for Elasticsearch is scattered and can be difficult to find. My hope is that the code provided in this application will ease the path for other Java programmers developing Elasticsearch applications.

The signed HTTP code uses the AWSRequestSigningApacheInterceptor.java class, which has been released by Amazon as open source.

The Elasticsearch code and the signed HTTP code are packaged as service objects; see HttpService.java and ElasticsearchService.java. An additional class, HttpGetWithEntity.java, extends the HTTP GET operation so that a GET request can carry an entity (a request body).

Faceted Search

When you shop at a web site like Amazon or Lands End you are usually using search operations to find the products that you may be interested in purchasing. For example, if you are thinking of purchasing a tablet computer you might follow the Amazon categories

Electronics ==> Computers & Accessories ==> Computers & Tablets ==> Tablets

You can further select the tablet you are interested in by the operating system and the tablet size (in inches).

These search categories (e.g., 10 to 10.9 inch tablets) are sometimes referred to as facets. Search that displays these facets is referred to as faceted search.

Some web sites display the categories (facets) with an associated count. This can be seen in the screen capture from an on-line clothing retailer. If a category has a large number of items associated with it, this tells the user that they need to use more detailed selection to find the items they are interested in.
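To make the idea of facet counts concrete, the sketch below counts how many items fall under each facet value in a small in-memory list. The class name and the sample values are invented for illustration; a production site would normally ask its search engine to compute these counts rather than doing it in application code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class FacetCounts {
    // Count how many items carry each facet value (for example, each genre).
    // A TreeMap keeps the facet values in alphabetical order for display.
    static Map<String, Long> count(List<String> facetValues) {
        return facetValues.stream()
                .collect(Collectors.groupingBy(v -> v, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical facet values, one per item in the catalog
        List<String> genres = List.of("Fiction", "Science Fiction", "Fiction", "History");
        count(genres).forEach((genre, n) -> System.out.println(genre + " (" + n + ")"));
    }
}
```

A facet with a large count is the signal to the user, described above, that a narrower selection is needed.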

Often unstructured search is used to find an item on a retail web site. For example, if you are searching for a "G8" LED bulb to replace a halogen bulb, you might search for "g8 led bulb" instead of trying to find the Amazon category for LED light bulbs.

When designing the system architecture for a shopping site, a database should be chosen that supports both faceted search and unstructured search (the search for the "G8" LED bulb). Elasticsearch is built on top of the Apache Lucene library, which is designed to support text search.

Elasticsearch

Elasticsearch is an open source (Apache license) database that is based on the Lucene full text indexing system.

Elasticsearch is designed to support large scale data sets. The Elasticsearch index can be "sharded" across multiple processors. This allows the Elasticsearch processing load to be distributed over multiple Elasticsearch "nodes".

An Elasticsearch instance will have one or more indices and each index will have an associated data type. The Elasticsearch data type is equivalent to a database table schema for a relational database. The Elasticsearch type definition is referred to as a mapping.

An Elasticsearch mapping is flexible and additional fields can be added to a mapping without affecting the existing data (although existing data elements will not have data defined for the new field).
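As a rough illustration of adding a field, the sketch below builds the path and body of a put-mapping request. It assumes the Elasticsearch 6.x REST conventions, where the type name is part of the path, and it uses the bookindex index and bookinfo type from this application. The "isbn" field is hypothetical and is not part of the Book Search mapping.

```java
final class AddMappingField {
    // Path of the put-mapping endpoint, assuming Elasticsearch 6.x (type name in the path).
    static String mappingPath(String index, String type) {
        return "/" + index + "/_mapping/" + type;
    }

    // Request body that adds one new field to an existing mapping.
    // Field names and types are assumed to contain no characters that need JSON escaping.
    static String addFieldBody(String fieldName, String fieldType) {
        return "{\"properties\":{\"" + fieldName + "\":{\"type\":\"" + fieldType + "\"}}}";
    }

    public static void main(String[] args) {
        // "isbn" is a hypothetical field used only for this example
        System.out.println("PUT " + mappingPath("bookindex", "bookinfo"));
        System.out.println(addFieldBody("isbn", "keyword"));
    }
}
```

Documents indexed before the field was added simply have no value for it, as noted above.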

The mapping (schema definition) associated with the Elasticsearch bookindex/bookinfo index/type is shown below.

The text fields can be searched for arbitrary strings. For example, searching for the word "venice" in the bookinfo "title" field will return all of the bookinfo entries that contain the word "venice" in the title.

The keyword type defines a field that is searched by exact match. Searches on the "genre" field must exactly match the strings in that field to return a bookinfo element. For example, a search on "Science Fiction" and a search on "Fiction" return different data elements. For keyword fields, capitalization matters. For text fields, capitalization is ignored.

Some fields, like "publisher" are stored as both text fields and keyword fields. When data is stored in the "publisher" field, it will update both the text and the keyword parts of the field.

{
  "bookindex": {
    "mappings": {
      "bookinfo": {
        "_all": {
          "enabled": false
        },
        "properties": {
          "author": {
            "type": "text"
          },
          "author_last_name": {
            "type": "keyword"
          },
          "genre": {
            "type": "keyword"
          },
          "price": {
            "type": "float"
          },
          "publisher": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "year": {
            "type": "date",
            "format": "YYYY"
          }
        }
      }
    }
  }
}
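Given this mapping, the difference between keyword and text fields shows up in the kind of query used. The sketch below builds three query bodies as plain strings: a term query for an exact match on a keyword field, a match query for a text field, and a terms aggregation that returns per-value counts of the kind used for facets. The class and method names are hypothetical and are not taken from the Book Search source; the query shapes follow the Elasticsearch query DSL.

```java
final class BookQueries {
    // Exact-match query for a keyword field; the value must equal the stored
    // string, including capitalization. Values are assumed to need no JSON escaping.
    static String term(String field, String value) {
        return "{\"query\":{\"term\":{\"" + field + "\":\"" + value + "\"}}}";
    }

    // Full-text query for a text field; the search text is analyzed before matching.
    static String match(String field, String text) {
        return "{\"query\":{\"match\":{\"" + field + "\":\"" + text + "\"}}}";
    }

    // Facet counts: a terms aggregation returns one bucket per distinct value,
    // each with a document count. "size": 0 suppresses the search hits themselves.
    static String facetCounts(String aggName, String keywordField) {
        return "{\"size\":0,\"aggs\":{\"" + aggName + "\":{\"terms\":{\"field\":\"" + keywordField + "\"}}}}";
    }

    public static void main(String[] args) {
        System.out.println(term("genre", "Science Fiction"));
        System.out.println(match("title", "venice"));
        // The keyword part of a multi-field is addressed as <field>.keyword
        System.out.println(facetCounts("publishers", "publisher.keyword"));
    }
}
```

Aggregating on "publisher.keyword" rather than "publisher" is the reason a field such as "publisher" is stored in both forms: the text part serves free-text search and the keyword part serves exact matching and counting.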

The Amazon Elasticsearch Service

The Amazon Elasticsearch Service provides hosted instances of Elasticsearch. This allows Elasticsearch to be configured from the Elasticsearch Service web page. This avoids the complexity of directly configuring an Elasticsearch cluster.

The Elasticsearch Service runs a recent version of Elasticsearch and allows the user to upgrade the version on demand.

The Elasticsearch Service comes with an instance of Kibana. The Kibana tool allows you to test Elasticsearch queries against the data in your Elasticsearch Service instance. This feature was very useful when developing the queries used by the Book Search application.

Elasticsearch is a REST Service

Elasticsearch is a Web service and all communication with Elasticsearch takes place over HTTP using the REST operations GET, PUT, POST and DELETE. For example, the Elasticsearch query below uses a GET operation.

GET index/type/_search
{
  "query": {
     "bool": {
        "filter": {
           "match": {
              "author": "gibson"
           }
        }
     }
   }
}

The query result is returned in the HTTP response as a JSON structure.

HTTP POST operations have an entity (e.g., a string) argument associated with the operation. GET operations were originally designed to fetch data associated with a URL (URI). The GET operations used to query Elasticsearch are an extension of the standard HTTP GET and include an entity (see the HttpGetWithEntity object in the associated Book Search application GitHub source code).
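For readers who want to see a GET request with an entity outside of the Apache HTTP Client, the sketch below builds one with the JDK 11 java.net.http API. This is a different library from the one the Book Search application uses, and the class name and endpoint URL are placeholders. The request is only constructed, not sent, and it is unsigned, so it would only be accepted by an end-point that does not require signed requests.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class GetWithBody {
    // Build (but do not send) a GET request that carries a JSON body.
    static HttpRequest searchRequest(String url, String jsonQuery) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                // Unlike GET(), method() accepts a verb together with a body publisher
                .method("GET", HttpRequest.BodyPublishers.ofString(jsonQuery, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder end-point; substitute the URL of your own Elasticsearch domain
        String url = "https://search-example.us-east-1.es.amazonaws.com/bookindex/bookinfo/_search";
        String query = "{\"query\":{\"bool\":{\"filter\":{\"match\":{\"author\":\"gibson\"}}}}}";
        HttpRequest request = searchRequest(url, query);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Elasticsearch also accepts search requests sent with POST, which is an option when an HTTP library or proxy will not pass a body on a GET.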

Elasticsearch and Security

An Amazon Elasticsearch Service instance (domain) can be configured as either a public access end-point, which can be accessed from the Internet, or as an Amazon Virtual Private Cloud (VPC) accessible service.

A VPC service has higher security, but debugging and monitoring can be more difficult. When the Elasticsearch Service is configured within a VPC, the HTTP transactions are simpler, since they do not have to be signed and authorized.

The Book Search application is designed to run with an Internet accessible Elasticsearch Service end-point. This makes debugging and testing easier and allows the Book Search application to run on either my local computer system or an Amazon Elastic Beanstalk server. Access to the Elasticsearch Service end-point can be "locked down" to a single IP address or an IP address range for increased security.

Signed HTTP

The Book Search application uses the Java Apache HTTP Client library for communication with the Elasticsearch Service.

Amazon has published documentation on how to build signed HTTP transactions based on the Apache HTTP Client. Unfortunately, this documentation can be difficult to find. The Book Search application includes the AWSRequestSigningApacheInterceptor class, which is at the core of these transactions. The Java code below shows how this class is used to build signed HTTP objects.

    // Build an HTTP client whose outgoing requests are signed by the AWS interceptor.
    protected static CloseableHttpClient signedClient() {
        AWS4Signer signer = new AWS4Signer();
        signer.setServiceName( SERVICE_NAME );
        signer.setRegionName( region.getName() );
        AWSCredentials credentials = getCredentials(ES_ID, ES_KEY);
        AWSCredentialsProvider credProvider = new AWSStaticCredentialsProvider( credentials );
        HttpRequestInterceptor interceptor = new AWSRequestSigningApacheInterceptor(SERVICE_NAME, signer, credProvider);
        return HttpClients.custom()
                .addInterceptorLast(interceptor)
                .build();
    }

    // Send the request and return the response body as a string (null if the transaction fails).
    protected static String sendHTTPTransaction( HttpUriRequest request ) {
        String httpResult = null;
        // try-with-resources closes the reader, the response and the client, releasing the connection
        try (CloseableHttpClient httpClient = signedClient();
             CloseableHttpResponse response = httpClient.execute(request);
             BufferedReader bufferedReader = new BufferedReader(
                     new InputStreamReader(response.getEntity().getContent(), StandardCharsets.UTF_8))) {
            httpResult = IOUtils.toString(bufferedReader);
        } catch (IOException e) {
            logger.error("HTTP Result error: " + e.getLocalizedMessage());
        }
        return httpResult;
    }
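The signing itself follows AWS Signature Version 4 and is handled by the AWS classes used above, so application code does not normally implement it. To give a feel for what is involved, the sketch below shows one step of that scheme, the derivation of the signing key, using only JDK classes. It is an illustration under stated assumptions rather than a replacement for the SDK: the class name, secret key, date and region are placeholders, and "es" is the signing service name for the Amazon Elasticsearch Service (assumed here to be the value behind SERVICE_NAME).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SigV4KeySketch {
    // HMAC-SHA256 of the data using the given key.
    static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    // Derive the signing key: each HMAC is keyed with the result of the previous step,
    // which scopes the key to one date, one region and one service.
    static byte[] signingKey(String secretKey, String yyyymmdd, String region, String service) {
        byte[] kSecret = ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmacSha256(kSecret, yyyymmdd);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    // Lowercase hexadecimal rendering of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder secret; never hard-code real credentials
        byte[] key = signingKey("EXAMPLE-SECRET-KEY", "20181001", "us-east-1", "es");
        System.out.println(hex(key));
    }
}
```

Because the derived key depends on the date, region and service, a signature produced for one region or service is not valid for another.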

The code below shows how the sendHTTPTransaction() function is used to send an HTTP GET with an entity argument (e.g., the type of GET that is used for a search operation).

    // Send a GET request that carries a JSON payload (for example, a search query)
    // and return the response body.
    public static String getDocument(final String index, final String type, final String suffix, final String jsonPayload) {
        String responseString = "";
        String url = buildURL(index, type, suffix);
        try {
            HttpGetWithEntity get = new HttpGetWithEntity( url );
            get.setHeader("Content-type", "application/json");
            StringEntity stringEntity = new StringEntity( jsonPayload, StandardCharsets.UTF_8);
            get.setEntity(stringEntity);
            responseString = sendHTTPTransaction( get );
        } catch (Exception e) {
            logger.error("HttpGet with entity failed: " + e.getLocalizedMessage());
        }
        return responseString;
    }

For more details, please refer to the Java source code in the associated GitHub repository.

Elasticsearch Documentation

One of the challenges that you will face if you decide to use Elasticsearch is the documentation. In developing the Book Search application and the associated Elasticsearch support code, I relied on three documentation sources:

  1. Elasticsearch in Action by Radu Gheorghe, Matthew Lee Hinman, and Roy Russo, Manning Publications, November 2015.
  2. The Elasticsearch Reference published on the elastic.co web site.
  3. Lots of web searches to answer questions that I could not find answers to in the above references.

The book Elasticsearch in Action is very useful in understanding the capabilities of Elasticsearch and its architecture. As with most Manning books, the writing quality is high and I recommend reading the first five chapters of this book.

There are several problems with Elasticsearch in Action when it comes to writing software that uses Elasticsearch. The book is based on Elasticsearch version 2.X. At the time this web page was written, Elasticsearch was at version 6.X.

There have been significant changes in Elasticsearch between 2.X and 6.X. Some of the queries and other operations described in the book do not work with Elasticsearch version 6.X.

The Elasticsearch architecture has also changed. In Elasticsearch in Action the authors state that there can be multiple types per index. Later versions of Elasticsearch allow one type per index, making indices and types equivalent.

For the Java developer there are few resources to guide the development of Java code for Elasticsearch outside of the Amazon documentation (which is often incomplete and fragmented). In Elasticsearch in Action most operations are described in terms of command-line curl operations. A search operation, from Elasticsearch in Action, is shown below.

% curl 'localhost:9200/get-together/group/_search?pretty' -d '{
  "query": {
    "query_string": {
      "query": "elasticsearch"
    }
  }
}'

I hope that this article and the associated code on GitHub will provide a useful resource for Java developers. The HTTP and Elasticsearch code is independent of the Book Search demonstration application and you may freely use it in your own code.

Spring Boot and Spring MVC

The Book Search application is built using the Spring framework. You should be able to clone the code and import it as a Spring Tool Suite project (Spring Tool Suite is a version of Eclipse customized for Spring). The project uses Maven to load the necessary Java libraries.

The Book Search Application

The Book Search application can be run on your local system. Before you do this, however, you will need to configure an AWS Elasticsearch Service domain and obtain the ID and secret key for accessing your domain. The ID and secret key can then be added to the IElasticsearch.java interface. You will also need to add your Elasticsearch end-point URL to the interface code.

When you run the Book Search application it will create an index and load a mapping into your Elasticsearch domain.

The books.json file contains sample data that can be loaded into the application. The JSON file can be loaded with the LoadESFromJSON utility program.

Amazon Cloud Architecture, Spring and Elasticsearch Consulting

Topstone Software has extensive experience building scalable web applications on Amazon Web Services. We can help you design your AWS application architecture to provide scalability and optimized cost.

We designed and built the nderground social network. nderground is a social network designed for privacy and security. nderground has been live with close to 100% uptime for over three years.

At Topstone Software we have experience building Spring framework applications that utilize a variety of AWS services. These include Elasticsearch, DynamoDB, S3, the Simple Email Service and Elastic Beanstalk. We can provide the consulting help to speed your application development or we can develop applications to your specification.

Ian Kaplan, Topstone Software Consulting, October 2018
