Last updated at Tue, 14 Dec 2021 14:53:31 GMT
UPDATE POSTED 12.12.21: If you are using Log4j, please be aware that on December 10, 2021, Apache released version 2.15.0 of their Log4j framework, which included a fix for CVE-2021-44228, a critical (CVSSv3 10) remote code execution (RCE) vulnerability affecting Apache Log4j 2.14.1 and earlier versions. This is a critical vulnerability, and we strongly urge you to update your Log4j instances immediately. You can learn more about this on our blog post on Log4Shell, our response to the impact to Rapid7 solutions, and our blog on using InsightVM to detect Log4Shell.
Over the last few months I’ve been writing about the benefit of using self-describing logging to make your log data easier and faster to process. I covered .NET and NodeJS. In this article I am going to apply my thinking to Java and Log4J. I am going to show you a technique for creating self-describing logging in the JSON format.
In order to get full benefit from reading this piece you need to have a working knowledge of Java in general and the Maven package management framework in particular. (The demonstration code that accompanies this article is written under Maven.) In addition, understanding the use of Appenders under Log4J is useful too. Yet, regardless of familiarity with the stated technologies, the concepts covered will serve you well nonetheless.
So, let’s get down to business.
The Value of Self-Describing Data Formats
Consider the following log entry.
canCompareLess result is Less
Now consider this log entry.
{ "message": "canCompareLess result is Less",
"createTime": "Jun 26, 2016 11:26:35 AM",
"loggedBy": "NumberComparerTest.canCompareLess",
"ipAddress": "192.168.1.131"
}
The first entry doesn’t tell you a lot, other than something about canCompareLess.
The second entry tells you a lot more. You know that there is a message and the time that the message was created. You know that entry was logged by NumberComparerTest.canCompareLess, Also, you know the IP address from where the entry was made. Why do you know these things? You know because the entry was made in the JSON format. JSON not only reports information, the format reports implicitly the metadata about the information. JSON is what we call a self-describing format. There are other data formats that are self-describing, XML and key-value pairs, for example.
Self-describing formats provide a lot of benefits when it comes to logging data. How many times have you tried to make sense of log files that are nothing more than strings of data that seem to be entered without rhyme or reason? Without prior knowledge as to what the various log entries mean you are lost. Self-describing formats address this problem. As mentioned above, not only does the format report the information, it also reports what the information is about. Let’s take a look at how we can use JSON, an implicitly self-describing data format, when entering log data using Log4J.
Where’s the code?
You can find the code for this article on GitHub, here.
The code sample is a Maven project that uses TestNG to test a simple class, NumberComparer, within the Maven project. The unit tests running against NumberComparer log information to a Log4J Console Appender and a Log4J Logentries Appender. The Logentries Appender sends log information directly to a log on the Logentries site.
Logentries provides a nice set of instructions for installing the Logentries Log4J Appender as a Maven Dependency. You can find the instructions here.
Working with Log4J Appenders
Log4J is the common logging technology that is used to manage logging behavior in Java. The way that Log4J works is that you include the Log4J package in your Java project. In this case, because I have built my Java project using Maven, I include an entry for Log4J in the Dependencies section of the pom.xml file, like so (Listing 1.):
<dependencies>
.
.
.
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
.
.
.
</dependencies>
**Listing 1: Adding Log4J as a dependency in a Maven **pom.xml file
Then I defined Appenders in the log4j.xml configuration file as shown in Listing 2.
.
.
<appender name="le" class="com.logentries.log4j.LogentriesAppender">
<param name="Token" value="MY_LOGENTRIES_TOKEN" />
<param name="Ssl" value="false" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%m" />
</layout>
</appender>
<appender name="console" class="org.apache.log4j.ConsoleAppender">
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern"value="%d{EEE MMM dd HH:mm:ss ZZZ yyyy},(%F:%L) %-5p: %m%n" />
</layout>
</appender>
.
.
Listing 2: Configuring Log4J Appenders in the log4j.xml file.
Each Appender in the Log4J.xml configuration defines a Conversion Pattern template that is used to format log entry output in a way that is relevant to the intended destination associated with the Appender. Once the Appender configuration is in place, when a log entry is made from within your application, Log4J intercepts the log entry and routes the entry to its Appenders. (Please see Figure 1 below.)
The benefit of using Appenders in Log4J is that log entry behavior is separated from log reporting behavior. A programmer decides what needs to be logged. A System Admin responsible for the operation of the application in production can add, remove and modify Appenders to meet the needs at hand. Thus, as shown in Figure 1 above, you can see that the System Admin configured a Console Appender to make a log entry according to the following Conversion Pattern definition:
<param name="ConversionPattern" value="%d{EEE MMM dd HH:mm:ss ZZ
Z yyyy},(%F:%L) %-5p: %m%n" />
That results in the output:
Sun Jun 26 12:05:05 -0700 2016,(NumberComparerTest.java:55) INFO : canCompareLess result is Less
What is a Conversion Pattern?
A Conversion Pattern is a template by which log information is rendered. The occasion when a developer is making a log output is called a log event. The term used for the information that the developer creates is called, the message. But, Log4J knows a lot more. Log4J knows the time of the log event, the class from which the event was raised as well as the line number in the class file. Also, Log4J knows the priority of the event, WARNING, ERROR, INFO, for example.
All this information is available when the Appender is about to output to the log. The format by which all this information is rendered is called the Conversion Pattern. Anytime you add an Appender to log4J.xml, you configure a Conversion Pattern that the Appender uses to render the entirety of the log information.
You can learn more about the details of Log4J Conversion Patterns here.
So, the good news is that a programmer can output log information that is relevant to the application and the System Admin can determine the format by which the information is reported according to the purpose of the Appender being used. Separation of concerns is usually a good thing. But, in this case there is a problem. We’re hoping the log message created by the programmer makes sense. But, what if it doesn’t. What if it is nothing more than, canCompareLess result is Less
. This is where self-description takes place.
Using JSON to Create Self-Describing Log Entries
Logging data that is relevant to the internal operation of an application is a decision that is made in development. Once that application code is in production, internal application logging cannot be changed. Thus, if we agree that logging using a self-describing format is a good thing to do, the most effective place to start is when the developer logs a message.
The word on the street is that developers are intrinsically lazy. Lazy is good! Lazy is why developers make so many good tools, so that they don’t have to do a lot of work. Therefore, in this spirit of laziness, wouldn’t it be nice if there was a way to make it easy for a developer to write a self-describing log message? Allow me to offer the following.
Consider the class, Logentry
, shown below in Listing 3:
package demo.reselbob.logging;
import java.util.Date;
import java.net.InetAddress;
import java.net.UnknownHostException;
import com.google.gson.Gson;
public class Logentry {
private String message = null;
private Date createTime = null;
private String loggedBy = null;
private String ipAddress = null;
/**
* The constructor for the utility class to create a log message in
* JSON format
*
* @param msg , the message for the log entry
* @param logBy, the entity or party making the log entry
*/
public Logentry(String msg, String logBy) {
message = msg;
createTime = new Date();
loggedBy = logBy;
try {
ipAddress = InetAddress.getLocalHost().getHostAddress();
} catch (UnknownHostException e) {
ipAddress ="UNKNOWN";
}
}
/**
* @return a log entry in JSON format
*/
public String toString(){
Gson gson = new Gson();
return gson.toJson(this);
}
}
Listing 3: A simple utility class used for creating log messages in data format
The class, LogEntry
is a simple utility class that takes two parameters in its constructor, the message (msg
) to log and the entity or person creating the log entry (logBy
). The class publishes one method, an override of the method, toString()
. toString()
emits in JSON format the values of the private member variables as shown in Listing 4, below.
{"message":"canCompareLess result is
true","createTime":"Jun 26, 2016 2:18:28 PM","loggedBy":"NumberComparerTest.canCompareLess","ipAddress":"192.168.1.131"}
Listing 4: The method, LogEntry.toString()
outputs raw JSON
LogEntry imports a package, Gson. Gson is a library that you use to convert Java objects into JSON objects. Thus, using Gson we convert the class, LogEntry
into a JSON object exposing each private member’s name and value. Then we go one step further. Within 2
we convert the JSON object into a string representation of itself. Presto! We have self-describing log data.
Not only does the class, LogEntry
report in a self-describing manner the intended log message as well as who or what the message was logged by, the class also reports the time the message was logged and the IP address of the machine from which it was logged. All the developer needs to do is call logger.info
like so:
String str = "canCompareLess result is " + (result == CompareState.Less);
logger.info(new Logentry(str,this.getClass().getSimpleName() + "." + name));
Thus, for the cost of adding an expression instead of a string to the logger.info()
statement, the code reaps the benefit of producing self-describing JSON as the log message in a consistent manner. But, wait there’s more. Let’s take a look at what happens to log data entered as JSON when it reaches the Logentries site.
Figure 2, below shows the posting of our log data made in JSON format in Logentries, in raw view, line by line for unindented JSON.
<
Now, let’s take a look at the same data on the Logentries site in Table View, as shown below in Figure 3:
Notice that Logentries provides the intelligence to segment data submitted as JSON into named columns. And, because the data can be segmented automatically, it is indexed automatically. As we can see, for the cost of using a simple class and using an expression rather than a string to create a log message, we pick up a whole lot of power and intelligence downstream.
Pretty cool, I’d say. And, it’s all because we are using a self-describing data format.
Putting It All Together
We’ve covered a lot in this article. I showed you some basic concepts about Log4J and Log4J Appenders. I showed you the benefits of using JSON to describe your data. Also, I showed you a Java class I created that makes logging JSON data a consistent, easy undertaking. To get a better understanding of the details behind all this, take a look at the links about Log4J Appenders and using Conversion Patterns that I provided throughout the text.
Creating self-describing log data using JSON provides not only a consistent, informative way to log your data, but also provides the structure you need to make your log data easy to understand and easy to search. Having developers use JSON as a standard format for log messages avoids confusing situations downstream. And, when you use the Logentries’s Log4J Appender you’ll be able to not only store your data easily and reliably, but the automatic intelligence that Logentries provides brings an added level of power to indexing your data and thus making inspection and analysis even easier.
There is little downside to logging data in self-describing formats. So, go ahead, be lazy and do it.