Last updated at Fri, 11 Mar 2022 17:00:11 GMT
While it could be true that life is more about seeking than finding, log searches are all about getting results. You need to get the right data back as quickly as possible. In this blog, let’s explore how to make the best use of InsightIDR’s Log Search capabilities to get the correct data returned back to you as fast as possible.
First, you need an understanding of how Rapid7’s InsightIDR Log Search feature works. You may even want to review this doc to familiarize yourself with some of the newer search functionality that has been recently released and to understand some of the Log Search nuances discussed here.
The basics
Let’s begin by looking at how the Rapid7 InsightIDR search engine extracts data. The search engine processes both structured and unstructured log data to extract out valuable fields into key-value-pairs (KVPs) whenever it is possible for it to do so. These normalized fields of data or KVPs allow you to search the data more efficiently.
While the normalized fields of data are typically the same for similar types of logs, in InsightIDR they are not normalized across the product. That is, you’ll see the same extracted fields, or keys, pulled out for logs in the same Log Set, but the extracted fields and key names used in other Log Sets may be different.
As everyone who has spent any amount of time looking at log data knows, individual log entries can be all over the place. Some vendors have great logs that contain structured data with all the valuable information that you need, but not all products do this. Sometimes the logs consist, at least in part, of unstructured data. That is, the logs are not in KVP or cannot be easily broken into distinct fields of data.
The Rapid7 search engine automatically identifies the keys in structured data, as well as from unstructured data, and automatically identifies the KVPs to make the data searchable. This allows you to search for any value or text appearing in your log lines without creating a dedicated index. That is, you can search by specifying either just text like “ksmith” or “/.*smith.*/”, or you can search with the KVP specified – for example “destination_account=ksmith” – with equal ease in the search engine. However, is one of these searches better than the other? Let’s keep going to answer that question.
As InsightIDR is completely cloud-native, the architecture is designed to take advantage of many cloud-native search optimizations, including shared resources and auto-scaling. Therefore, in terms of search performance, a number of specialized algorithms are used that are designed to search for data across millions of log lines. These include optimizations to find needle-in-a-haystack entries, statistical algorithms for specific functions (e.g. for percentile), parallelization for aggregate operations, and regular expression optimizations. How quickly the results are returned can vary based on the number of logs, the number of (matching) loglines in that time range, the particular query, and the nature of the data – e.g. the number of keys and values in a logline.
Did you know that statistics on the last 100 searches that have been performed in your InsightIDR instance are available in the Settings section of InsightIDR? Go to Settings -> Log Search -> Log Search Statistics to view them. In addition to basic information, such as when the query completed and how long it took, you can also use the “index factor” that is provided to determine how efficient your query is. The index factor is a value from 0 to 100 that represents how much the indexing technology was used during the search. The higher the index value, the more efficient the search is. The Log Search Statistics page is especially helpful if you want to optimize a query you will be running against a large data set or using frequently.
How to improve your searches
As you can see, the Log Search query performance can be influenced by a number of factors. While we discuss some general considerations in this section, keep in mind that for queries that run frequently, you may want to test out different options to find what works best for your logs.
General recommendations
Here are some of the best ways to speed up your log searches:
- Specify smaller time ranges. Longer time ranges add to the amount of time the query will take because the search query is analyzing a larger number of logs. This can easily be several hundred million records, such as with firewall logs.
- Search across a single Log Set at a time. Searching across different Log Sets usually slows down the search, as different types of logs may have different key-value pairs. That is, these searches often cannot be optimally indexed.
- Add functions only as needed. Searches with only a where() search specified are faster than searches with groupby() and calculate().
- Use simple queries when possible. Simple queries return data faster. Complex queries with many parts to calculate are often slower.
- Consider the amount of data being searched. Both the number of log entries that are being searched and the size of the log should be considered. As the logs are stored in key-value pairs, the more keys that the logs have, the slower they are to search.
The old adage about deciding if you want your result to be fast, cheap, or good applies here, too — except that with log searches, the triad that influences your results is fast, amount of data to be searched, and complexity of the query. With log searches, these tradeoffs are important. If you are searching a Log Set with large logs, such as process start events, then you may have to decide which optimization makes the most sense: Should you run your search against a smaller time range but still use a complex query with functions? Or would you rather search a longer time range but forgo the groupby() or calculate()? Or would you rather search a long time range using a complex query and then just wait for the search to complete?
If you need to search across Log Sets for a user, computer, IP address, etc., then maybe it makes more sense to build a dashboard with cards for the data points that you need instead of using Log Search. Use the filter option on the dashboard to specify the user, computer, etc. on which you need to filter. In fact, a great dashboard collection might just be your iced dirty chai latte, the combination that solves most of your log search challenges all at once. If you haven’t already done so, you may want to check out the new Dashboard Libraries, as more are being added every month, and they can make building out a useful dashboard almost instantaneous.
Specifying keys vs. free text
It is an interesting paradox of Log Search that specifying a key as part of the search does not always improve the search speed. That is, it is sometimes faster to use a key like “destination_account=ksmith,” but not always. When you specify a key-value-pair to search, then the log entries must be parsed for the search to complete, and this can be more time-consuming than just doing a text search.
In general, when the term appears infrequently, running a text-based search (e.g. /ksmith/) is usually faster.
Also, you may get better results searching for only a text value instead of searching a specific key. That is, this query:
where(FAILED_OTHER)
… might be more efficient and run faster than this query:
where(result=FAILED_OTHER)
Of course, this only applies if the value will not be part of any of the other fields in the log entry. If the value might be part of other fields, then you will need to specify the key in order for the results to be accurate.
Expanding on this further, the more specific you can be with the value, the faster the results will be returned. Be specific, and specify as much of the text as possible. A search that contains a very specific value with no key specified is often the fastest way to search, although you should test this with your particular logs to see what works best with them.
The corollary to this is that partial-match-type searches tend to be slower than if a full value is specified. That is, searching for /adm_ksmith/ will be faster than /adm_.*/. Finally, case-insensitive searches are only slightly slower than when the case is specified. “Loose” searches — those that are for partial and case-insensitive searches — are slower, largely because partial match searches are slower. However, these types of searches are usually not so slow that you should try to avoid them.
Contradictorily, it is also sometimes the case that specifying a key to search rather than free text can greatly improve indexing and therefore reduce the search times. This is particularly true if the term you are searching appears frequently in the logs.
Additional log search tips
Here are some other ways to improve your search.
- Check to see if a field exists before grouping on it. Some fields (the key part of the key-value pair) do not exist in every log entry. If you run groupby on a field and it doesn’t always exist, the query will run faster if you first verify that the field is part of the logs that are being grouped on. Example:
where(direction)groupby(direction)
- Which logical operators you use can make a big difference in your search results. AND is recommended, as it filters the data, resulting in fewer logs that need to be searched. In other words, AND improves the indexing factor of the search. OR should be avoided if possible, as it will match more data and slow down the search. In general, less data can be indexed when the search includes an OR logical operator. You do need to use common sense, because depending on your search criteria, it may be that you need to use OR.
- Avoid using a no-equal whenever possible. In general, when you are searching for specific text, the indexer is able to skip over chunks of log data and work efficiently. In order to search for a “not equal to,” every entry must be checked. The “no equal” expressions are NOT, !=, and !==, and they should be avoided whenever possible. Again, use common sense, because your query may not work unless you use a “no-equal.”
- The order that you specify text in the query is not important. That is, the queries are not evaluated left-to-right — rather, the entire query is first evaluated to determine how it can be best indexed.
- Using regular expression is usually not slower than using a native LEQL search.
For example, a search like
where(/vpn asset.*/i)
… is a perfectly fine search.
However, using logical operators in the regular expression will make the search slower for exactly the same reason that they can make the regular search slower. In addition, using the logical operators — especially the (“|”), which is logical OR — can be more impactful in regular expression searches, as they disable the use of indexing the logs. For example, a query like this:
where(geoip_country_name=/China|India/)
… should be avoided if possible. Instead, use this query:
where(geoip_country_name=/China/ OR geoip_country_name=/India/)
You could also use the functions IN or IIN:
where(geoip_country_name IN [China,India])
To summarize how the indexing works, let’s look at a Log Search query that I have constructed:
where(direction=OUTBOUND AND connection_status!=DENY AND destination_port=/21|22|23|25|53|80|110|111|135|139|143|443|445|993 |995|1723|3306 |3389|5900|8080/)groupby(geoip_organization)
Should this query be optimized? The first thing that the log search evaluator will do is to determine if any of the search can be indexed.
In looking at the components of the search, it has three computations that are all being ANDed together: “direction=OUTBOUND,” “connection_status!=DENY,” and then the port evaluation. Remember, AND is good since it can reduce the amount of data that must be evaluated. “direction=OUTBOUND” can be indexed and will reduce the amount of data against which the other computations must be run. “connection_status!=DENY” cannot be indexed since it contains “not equal” — in other words, every log entry must be checked to determine if it contains this KVP. However, the connection_status computation is vital to how this query works, and it cannot be removed.
Is there a way to optimize this part of the query? The “connection_status” key has only two possible values, so it can easily be changed to an equal statement instead of a no-equal. Also, not all firewall logs have this field so we can add verifying that the field exists to the query. Finally, the destination_port search is not optimal, as it contains a long series of OR computations. This computation is also an important criteria for the search, and it cannot be removed. However, it could be improved by replacing the regular expression with the IN function.
where(direction=OUTBOUND AND connection_status AND connection_status=ACCEPT AND destination_port IN [21,22,23,25,53,80,110,111,135,139,143,443,445,993,995,1723,3306, 3389,5900,8080])groupby(geoip_organization)
Will this change improve the search greatly? The best way to find out is to test the searches with your own log data. However, keep in mind that “direction=OUTBOUND” will be evaluated first, because it can be indexed. In addition, since in these particular logs (firewall logs), this first computation greatly reduces the amount of log entries left to be evaluated, other optimizations to the query will not greatly enhance the speed of the search. That is, in this particular case, both queries take about the same amount of time to complete.
However, the search might run faster without any keys specified. Could I remove them and speed up my search? Given the nature of the search, I do need to keep “connection_status” and “destination_port” as the values in these fields can occur in other parts of the logs. However, I could remove “direction” and run this search:
where(OUTBOUND AND connection_status!=DENY AND destination_port IN [21,22,23,25,53,80,110,111,135,139,143,443,445,993,995,1723,3306, 3389,5900,8080])groupby(geoip_organization)
In fact, this query runs about 30% faster than those with “direction=” key specified.
Let’s look at a second example. I want to find all the failed authentications for all the workstations on my 10.0.2.0 subnet. I can run one of these three searches:
where(source_asset_address=/10\.0\.2\..*/ AND result!=SUCCESS)groupby(source_asset_address)
where(source_asset_address=/10\.0\.2\..*/ AND result=/FAILED.*/)groupby(source_asset_address)
where(source_asset_address=/10\.0\.2\..*/ AND result IN [FAILED_BAD_LOGIN,FAILED_BAD_PASSWORD,FAILED_ACCOUNT_LOCKED, FAILED_ACCOUNT_DISABLED,FAILED_OTHER])groupby(source_asset_address)
Which one is better? Since the first one uses a “not equal” as part of the computation, the percentage of the search data that can be indexed will be less than the other two searches. However, the second search has a partial match (/FAILED.*/) versus the full match of the first search. Partial searches are slower than specifying all the text to be matched. Finally, the third search avoids both the “no-equal” and a partial match by using the IN function to list all the possible matches that are valid.
As you might have guessed, the third search is the winner, completing slightly faster than the first search but more than twice as fast as the second one. If you are searching a large set of data over a long period of time, the third search is definitely the best one to use.
How data is returned to the Log Search UI
Finally, although it is not related to log search speed, you might be curious about how data gets returned into the Log Search UI. As the log search query runs, as long as there are no errors, it will continue to pull back data to be returned for the search. For searches that do not contain groupby() or calculate(), results will be returned to the UI as the search runs. However, if groupby() or calculate() are part of the query, these functions are evaluated against the entire search period. Therefore, partial results are not possible.
If the search results cannot be returned because of an error, such as a search that cannot be computed or a rate-limiting error with a groupby() or calculate() function, then instead of the data being returned, you will see an error in the Log Search UI.
Hopefully, this blog has given you a better sense of how the Log Search search engine works and provided you with some practical tips, so you can start running faster searches.
Additional reading:
- Demystifying XDR: How Curated Detections Filter Out the Noise
- 2021 Cybersecurity Superlatives: An InsightIDR Year in Review
- What's New in InsightIDR: Q4 2021 in Review
- The End of the Cybersecurity Skills Crisis (Maybe?)