How to Integrate Elasticsearch with MongoDB

Recently, I faced an unusual requirement during the implementation of a project. My task was to implement a web crawler to index the content of a few websites and save that data into an Elasticsearch index for further analysis. The risky part of this decision was that I had no strong reason to keep the extracted data anywhere else, since all user interaction with this data would be done using a web application that connects directly to Elasticsearch. But if the Elasticsearch index mapping changed at any time in the future, I would be forced to re-index part or all of the data, which means extracting the same data from the websites again.

Adopting a relational database to address this need seemed to me an unjustified implementation effort. It would drastically increase the time, cost and complexity of implementing and maintaining the project, just to avoid a future risk of changes in my index mapping. Dealing with database modeling, choosing a persistence framework, implementing extra tests, … I feel tired just thinking about it. So, talking with my friend Paulo about this problem, he told me about the elasticsearch-river-mongodb project, an Elasticsearch plugin that propagates data changes from a MongoDB collection to an Elasticsearch index.

Using MongoDB seemed to be a good idea. The data extracted from the websites is not well structured and is very likely to change frequently. A schema-free, document-oriented database fits well in this case, since it is flexible enough to accommodate changes in the data structure with minimal impact.

But how to integrate Elasticsearch with MongoDB?

Despite the fact that the elasticsearch-river-mongodb project seems to be awesome, offering filtering and transformation capabilities, it is deprecated, with Elasticsearch 1.7.3 and MongoDB 3.0.0 as the most recent supported versions. You can find more information about the deprecation decision in the article “Deprecating Rivers”.

It is a shame, but all is not lost. The MongoDB team offers the mongo-connector project, which creates a pipeline from MongoDB to target systems and provides a document manager for Elasticsearch. Great! And I am so happy with the final result of this solution that I want to share my experience with you. My intention throughout this post is to show what I found useful, what was tricky and what limitations I found during the implementation of this solution.
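To give a rough idea of what this looks like in practice before diving into the details, here is a minimal sketch of running mongo-connector against a local setup. The hosts, ports and the elastic2-doc-manager package are illustrative assumptions: the document manager must match your Elasticsearch version, and MongoDB has to run as a replica set so that mongo-connector can read its oplog.

# Install mongo-connector and a document manager for Elasticsearch (2.x assumed here)
pip install mongo-connector elastic2-doc-manager

# Start the pipeline: tail the MongoDB oplog and propagate changes to Elasticsearch
mongo-connector -m localhost:27017 -t localhost:9200 -d elastic2_doc_manager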

Monitoring Java applications with ELK

Monitoring Java applications with ELK (Elasticsearch, Logstash and Kibana) will show you, step by step, how to properly centralize your Java application logs by sending all of their log messages to a remote ELK server. Using this approach you can have all the information generated by Java applications running across multiple servers in a centralized place. This way you can easily create dashboards and start to analyze your applications in a more high-level and practical manner.

You know it’s sad but true

Let’s think about a very common scenario in many companies: many Java applications running across multiple application servers, each application performing many operations per day and logging thousands and thousands of lines that generally nobody checks unless some problem occurs in one of the applications. Sounds familiar, doesn’t it? The biggest issue here is that, unless we are debugging a production problem, the logs have no value at all. They are not telling us anything about aspects we should care about, such as business process performance. There is gold within these logs!

How about building a better scenario?

Think about the sad story I just told you. Now imagine all your Java applications producing the same amount of logs, but sending them to a centralized place where all the received data is properly analyzed, transformed and finally presented in a truly accessible way. Would you like to know how many payments your system processed in the last minute, day or week? What about how many times a specific exception was thrown? The possibilities are endless.

Let’s see how to achieve this desired scenario using the ELK stack.

Proposed solution

Our proposed solution combines a Java application configured to use Logback (the successor of the famous Log4j), the specialized log appender class “LogstashTcpSocketAppender” (provided by the logstash-logback-encoder library) and an ELK server.
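To make the application side more concrete, here is a minimal logback.xml sketch. The destination elk-server:5000 is a placeholder, and the example assumes the logstash-logback-encoder library (which provides the appender and encoder classes) is on the classpath.

<!-- logback.xml: minimal sketch; "elk-server:5000" is a placeholder destination -->
<configuration>
  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <!-- Address of the Logstash TCP input on the ELK server -->
    <destination>elk-server:5000</destination>
    <!-- Serialize each log event as a JSON document -->
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="LOGSTASH"/>
  </root>
</configuration>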

Tutorial – Monitoring Java applications with ELK

Step 1 – Set up the ELK stack

We have two detailed articles about how to set up the ELK stack on Ubuntu and Windows; please check them by following the links below:

Step 2 – Configure Logstash to receive our logs

Within the ELK server, create a new configuration file (/etc/logstash/conf.d/logback-listener.conf on Ubuntu 16.04, or D:\ELK\logstash-2.3.4\conf.d\logback-listener.conf on Windows) and insert the following content:
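As an illustration, a listener for the JSON-encoded Logback events could look like the sketch below; the port, Elasticsearch host and index name are assumptions and must match the destination configured in the application’s Logback appender.

# logback-listener.conf: minimal sketch (port, host and index name are assumptions)
input {
  tcp {
    # Must match the destination port configured in LogstashTcpSocketAppender
    port => 5000
    # The appender sends one JSON document per line
    codec => json_lines
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "java-logs-%{+YYYY.MM.dd}"
  }
}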

How to install ELK on Windows

In this tutorial we will provide you with detailed instructions on how to install ELK (Elasticsearch, Logstash and Kibana) on Windows.

A short introduction about the ELK stack

ELK is a powerful and versatile stack for collecting, analyzing and exploring data in real time.

The components of the ELK stack are:

Elasticsearch – Search and analyze data in real time.

Logstash – Collect, enrich, and transport data.

Kibana – Explore and visualize data.

Tutorial – How to install ELK on Windows

Step 1 – Install Java 8

This is a mandatory step, since both Elasticsearch and Logstash require Java. We recommend Java 8 because, so far, it is the most recent stable version.

While a JRE can be used for the Elasticsearch service, its usage is discouraged due to its use of a client VM (as opposed to a server JVM, which offers better performance for long-running applications), and a warning will be issued.

Download JDK installer

Access the Java download page (http://www.oracle.com/technetwork/pt/java/javase/downloads/jdk8-downloads-2133151.html), click on “Accept License Agreement” and then select the “Windows x64” option. At the time of writing, the newest version is jdk-8u101-windows-x64.exe.

Install JDK

Just execute the JDK installer and follow the wizard instructions.
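Elasticsearch and Logstash locate the JDK on Windows through the JAVA_HOME environment variable, so it is worth setting it once the installer finishes. A minimal sketch, assuming the default install path of JDK 8u101:

REM Set JAVA_HOME for the current user (add /M to set it machine-wide);
REM adjust the path if the JDK was installed somewhere else
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_101"
REM Open a new command prompt afterwards so the new variable is picked up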

Step 2 – Create a folder to keep the ELK components grouped

Create a directory “D:\ELK”. This directory will be used to keep all ELK components grouped in the same folder.
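If you prefer the command line, this can be done from a command prompt:

REM Create the base folder for all ELK components
mkdir D:\ELK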

Step 3 – Download and configure Elasticsearch 2.3.5

Download Elasticsearch

Download the Elasticsearch ZIPPED package from here: https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/zip/elasticsearch/2.3.5/elasticsearch-2.3.5.zip

Extract its content to the “D:\ELK” folder. The result will be “D:\ELK\elasticsearch-2.3.5”.
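At this point you can already run a quick smoke test; the sketch below assumes the default HTTP port 9200 has not been changed.

REM Start Elasticsearch from the extracted folder
D:\ELK\elasticsearch-2.3.5\bin\elasticsearch.bat

REM Then open http://localhost:9200 in a browser: a JSON response containing
REM the version number "2.3.5" indicates the node is up and running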


How to install ELK on Ubuntu 16.04

In this tutorial we will provide you with detailed instructions on how to install ELK (Elasticsearch, Logstash and Kibana) on Ubuntu 16.04.

A short introduction about the ELK stack

ELK is a powerful and versatile stack for collecting, analyzing and exploring data in real time. The components of the ELK stack are:

Elasticsearch – Search and analyze data in real time.

Logstash – Collect, enrich, and transport data.

Kibana – Explore and visualize data.

Tutorial – How to install ELK on Ubuntu 16.04

Step 1 – Install Java 8

This is a mandatory step, since both Elasticsearch and Logstash require Java. We recommend Java 8 because, so far, it is the most recent stable version.

First of all we need to add the Oracle Java PPA:
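A typical command for this step (the webupd8team PPA name is an assumption, as it was the usual source of these packages at the time):

# Add the PPA that provides the oracle-java8-installer package
sudo add-apt-repository ppa:webupd8team/java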

Then just update the apt package database and install the package oracle-java8-installer:
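For example:

# Refresh the package index and install Oracle Java 8
sudo apt-get update
sudo apt-get install oracle-java8-installer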
