Tuesday, November 22, 2011

ELSA Beta Available


After two months of solid coding, I'm proud to announce that Enterprise Log Search and Archive (ELSA) is available in beta quality and ships with an auto-installer for many Linux systems. The backend and query code have been rewritten from the ground up with many enhancements. Most importantly, query speed is an order of magnitude faster in many cases, and you can now report on string fields like any other field. The query interface has been streamlined, along with other parts of the interface.


The rewrite focused on making the platform more robust by removing many moving parts and streamlining a lot of background processes. The result is agentless, fully independent log nodes which do not need to communicate with any other nodes. This allows for easy distribution of nodes in customer sites. In fact, the nodes can be accessed from as many different web frontends as you like, as they are nothing more than a MySQL query interface to external clients, so customers can have their own interface if they prefer. The web server now handles asynchronously parallelizing queries and aggregating the results, which has improved performance and reliability.

I've also added parsers and classes for the major Bro IDS logs, so ELSA is now a viable frontend for Bro. Below is a screenshot of a source IP address with a report by DNS hostname lookup. This search covered several billion logs and returned in less than a second.
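If you want to reproduce that kind of report yourself, here is a minimal sketch of the query in ELSA syntax. The srcip field is one of ELSA's standard parsed fields, but the hostname field name for the DNS class is an assumption and may differ in your installation:

srcip:10.0.0.1 groupby:hostname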

There are many more features, and I will refer interested readers to the new documentation. I'm very interested in any issues encountered during install and operation, so please let me know about them or file a bug report on the project page.

Monday, September 26, 2011

Bro Quickstart: Cluster Edition

In my last post, I described the basic commands you need to get a minimal Bro IDS instance up and running for proof-of-concept purposes. In this post, I'll show you how to take Bro a step further with Bro cluster.

Bro Cluster
There are several reasons for running Bro in a cluster.  A single Bro instance will be overwhelmed when it encounters more than about 80 Mb/sec of traffic.  You can run multiple Bro instances on the same machine (to take advantage of multiple cores) or multiple instances spread across distributed machines to cut the load each instance sees down to the 80 Mb/sec mark.

Another reason for using Bro cluster is the ease of managing Bro processes and logging.  There is a comprehensive log rotation framework integrated into Bro cluster, as well as seamless configuration changes on-the-fly.

Installation
This install quickstart will be written for Ubuntu Server 10.04 LTS, but aside from the prerequisite installation, the steps should be largely the same on any Linux distribution.  To run multiple Bro instances on the same local machine, we will use PF_RING's amazing ability to perform efficient, per-flow software pcap load-balancing.

# install prereqs
sudo apt-get install swig python-dev libmagic-dev libpcre3-dev libssl-dev cmake git-core subversion ruby-dev libgeoip-dev flex bison
# uninstall conflicting tcpdump
sudo apt-get remove tcpdump libpcap-0.8
cd ~/
# install PF_RING
svn export https://svn.ntop.org/svn/ntop/trunk/PF_RING/ pfring-svn
cd pfring-svn/kernel
make && sudo make install
cd ../userland/lib
./configure --prefix=/usr/local/pfring && make && sudo make install
cd ../libpcap-1.1.1-ring
./configure --prefix=/usr/local/pfring && make && sudo make install
echo "/usr/local/pfring/lib" >> /etc/ld.so.conf
cd ../tcpdump-4.1.1
./configure --prefix=/usr/local/pfring && make && sudo make install
# Add PF_RING binaries to the PATH
echo 'PATH=$PATH:/usr/local/pfring/bin:/usr/local/pfring/sbin' | sudo tee -a /etc/bash.bashrc
cd ~/
# Get and make Bro
mkdir brobuild && cd brobuild
git clone --recursive git://git.bro-ids.org/bro
cd bro
BRODIR=/usr/local/bro-$(date +"%Y%m%d")
./configure --prefix=$BRODIR --with-pcap=/usr/local/pfring && cd build && make -j8 && sudo make install
sudo ln -s $BRODIR /usr/local/bro
cd /usr/local/bro


Now we need to edit the Bro config for your environment.  For the purposes of this quickstart, I will assume that you are monitoring a single interface, eth2.  Edit /usr/local/bro/etc/node.cfg to look like this (assuming your local host is 192.168.1.1 and you want to run 4 instances):
[manager]
type=manager
host=192.168.1.1

[proxy-0]
type=proxy
host=192.168.1.1

[worker-0]
type=worker
host=192.168.1.1
interface=eth2
lb_method=pf_ring
lb_procs=4

Now we need to edit share/bro/site/local.bro to add some options that I recommend to prevent some of the more verbose logs from filling up the drive.  Append the following text:


event bro_init()
    {
    Log::disable_stream(HTTP::LOG);
    Log::disable_stream(Syslog::LOG);
    Log::disable_stream(Conn::LOG);
    Log::disable_stream(DNS::LOG);
    Log::disable_stream(Weird::LOG);
    }
Now we're all ready to start Bro cluster:
/usr/local/bro/bin/broctl
> install
> start
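To confirm that the manager, proxy, and workers all came up, you can check from the same broctl prompt (status is a standard broctl command, though its exact output varies by version):
> status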

Finally, in an enterprise, you will want to get these logs into your log management or SIEM solution.  You can do this very easily out of the box in Ubuntu, which uses rsyslog by default.  Create /etc/rsyslog.d/60-bro.conf:

# load the file input module so rsyslog can tail the Bro logs
$ModLoad imfile
$InputFileName /usr/local/bro/logs/current/ssl.log
$InputFileTag bro_ssl:
$InputFileStateFile stat-bro_ssl
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/smtp.log
$InputFileTag bro_smtp:
$InputFileStateFile stat-bro_smtp
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/smtp_entities.log
$InputFileTag bro_smtp_entities:
$InputFileStateFile stat-bro_smtp_entities
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/notice.log
$InputFileTag bro_notice:
$InputFileStateFile stat-bro_notice
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/ssh.log
$InputFileTag bro_ssh:
$InputFileStateFile stat-bro_ssh
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
$InputFileName /usr/local/bro/logs/current/ftp.log
$InputFileTag bro_ftp:
$InputFileStateFile stat-bro_ftp
$InputFileSeverity info
$InputFileFacility local7
$InputRunFileMonitor
# check for new lines every second
$InputFilePollingInterval 1
local7.* @central_syslog_server

Apply changes:

sudo restart rsyslog
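As a quick sanity check (the paths below are assumptions; adjust them for your Bro log location and for wherever your central server writes local7 messages), confirm the monitored files exist on the sensor and that the tagged events are arriving centrally:

# on the sensor: the monitored Bro logs should exist and be growing
ls -l /usr/local/bro/logs/current/
# on the central syslog server: watch for the bro_* tagged messages
tail -f /var/log/syslog | grep bro_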

Thursday, August 18, 2011

Monitoring SSL Connections with Bro: Quickstart

Updated (8/20/2011) based on more info from Seth.
 
Introduction
Bro (www.bro-ids.org) is an amazing suite of software which can do things that no other IDS on the planet can come close to.  In this post, I want to cover one such feature: SSL monitoring.  Bro has a true understanding of the SSL being used on your network and will efficiently process certificates on the wire for a variety of purposes.  Out of the box, Bro can very efficiently and accurately identify invalid and self-signed certificates, going so far as to actually walk the certificate chain using the certs that ship with Mozilla browsers for a true test.  In addition, Bro will extract all of the relevant details from certificates for logging purposes, which can provide a handy historical record of the sites and companies involved in SSL, which is the next best thing to performing proxy/MITM SSL inspection.

Installing Bro
This quickstart guide will show how to get up and running with Bro on Ubuntu.  I hope that most of the commands and tips will apply to other operating systems and Linux distros, but there will surely be some differences.

Begin by making sure we've got our prerequisites in order:
apt-get install git libssl-dev swig libmagic-dev libgeoip-dev
Grab the latest Bro from the git repository.  Beware, this is cutting edge code, and you may need to download the latest stable tarball from www.bro-ids.org if the git build fails:
git clone --recursive git://git.bro-ids.org/bro
Now you will have bro and auxiliary files in a directory named "bro."
cd bro
I have discovered that on some Linux distros (SuSE, for one), the packaged version of CMake is older than 2.6.3, so it needs to be downloaded from www.cmake.org and installed manually, as Bro requires 2.6.3 or better.
(edited: "--enable-brov6" apparently has memory leaks right now.)
./configure --prefix=/usr/local/bro-git
There are a fair number of options here, but the configure script does a pretty good job of finding out if you've got things installed already and adjusting accordingly. Since we're looking to do SSL inspection, at a minimum you'll need to make sure you've got the OpenSSL development libraries installed, which we've done above with apt-get. If all goes well, we do the make:
make && cd build && sudo make install
Now we will add a custom Bro script, written by Seth Hall, which will print to STDOUT any SSL certificates that became valid less than 30 days ago.
cd /usr/local/bro-git/share/bro/site/
vi young-ssl.bro
Paste in the following (edited: removed "@load protocols/ssl"):
event SSL::log_ssl(rec: SSL::Info)
       {
       # We have to check if there is a not_valid_before field because not
       # all SSL transactions actually exchange certificates (i.e. resumed session).
       if ( rec?$not_valid_before && rec$not_valid_before >= network_time() - 30 days &&
            rec$not_valid_before <= network_time() )
               {
               print fmt("%s is using a certificate that just became valid in the last 30 days (%T) (%s)",
                       rec$id$resp_h, rec$not_valid_before, rec$subject);
               }
       }
Now we activate it in the config:
echo "@load young-ssl" >> local.bro
Create some basic log directories for a test run:
mkdir /tmp/bro-logs
cd /tmp/bro-logs
Start bro (assuming we want to monitor eth1):
sudo /usr/local/bro-git/bin/bro -i eth1 local
Let it run for a while, then have a look at the various logs created. ssl.log will contain a list of all SSL certificates observed. Here's an example:

# ts    uid    id.orig_h    id.orig_p    id.resp_h    id.resp_p    version    cipher    server_name    subject    not_valid_before    not_valid_after    validation_status
1313897881.475569    QUVGS5xx9ea    192.168.1.121    36804    199.59.148.87    443    TLSv10    TLS_DHE_RSA_WITH_AES_256_CBC_SHA    api.twitter.com    CN=api.twitter.com,OU=Twitter Platform,O=Twitter\, Inc.,L=San Francisco,ST=California,C=US    1274158800.000000    1337317199.000000    ok
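If you want a quick ad-hoc report before wiring the logs into anything else, an awk one-liner like the sketch below prints the server name, validation status, and subject of every certificate that did not validate cleanly. The column positions are taken from the header above and may shift between Bro versions:

awk -F'\t' '!/^#/ && $13 != "ok" { print $9, $13, $10 }' ssl.log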

So there you have it!  A fully functional Bro installation in just a few easy steps.  In a future post, I will show you how to get Bro output into various collection mechanisms like syslog and databases.

Monday, July 25, 2011

Running a load-balanced Snort in a PF_RING cluster

Even though Snort itself is single threaded, PF_RING has software load-balancing capabilities which will allow you to run it as if it were multi-threaded.  Here's the glossed-over version of the howto:

Note: By default, PF_RING ships with CLUSTER_LEN=8, which means only 8 processes can participate in a cluster.  If you have more than 8 cores and want to increase this amount, you will need to edit the source code for the PF_RING kernel module (<PF_RING_SRC>/kernel/linux/pf_ring.h) and change #define CLUSTER_LEN 8 to 16 (or however many cores you have).  Then re-install the module (make && make install) and rmmod pf_ring && modprobe pf_ring to activate the new one.
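Here is a minimal sketch of that edit, assuming the checkout from step 1 below lives in ~/PF_RING and you want 16 slots:

cd ~/PF_RING/kernel
# change "#define CLUSTER_LEN 8" to 16 (whitespace in the real header may vary)
sed -i 's/#define CLUSTER_LEN\([[:space:]]*\)8/#define CLUSTER_LEN\116/' linux/pf_ring.h
make && sudo make install
# reload the module so the new limit takes effect
sudo rmmod pf_ring && sudo modprobe pf_ring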


1. Get PF_RING with the snort daq included
  svn co https://svn.ntop.org/svn/ntop/trunk/PF_RING/
2. Compile the daq (assuming PF_RING installed to /opt/PF_RING)
  ./configure --with-pic --with-libpcap-includes=/opt/PF_RING/include CFLAGS="-lpthread -lpfring -lpcap -D_GNU_SOURCE" && make && make install
3. Add the following to your snort.conf (the clusterid can be any number less than 255):
config daq: pfring
config daq_dir: /usr/local/lib/daq
config daq_var: clusterid=44
4. Start snort with a shell script wrapper like this (assuming you have 8 CPU's and you are sniffing eth2):
#!/bin/sh
# launch one Snort instance per CPU; PF_RING load-balances flows across them
for COUNTER in 0 1 2 3 4 5 6 7; do
  mkdir -p /tmp/snort$COUNTER
  # stop any instance left over from a previous run (harmless on the first run)
  kill $(cat /tmp/snort$COUNTER/snort_eth2.pid 2>/dev/null) 2>/dev/null
  sleep 5
  /usr/local/snort/bin/snort -c /etc/snort/snort.conf -i eth2 --pid-path=/tmp/snort$COUNTER -l /tmp/snort$COUNTER --daq-var bindcpu=$COUNTER -D &
done
5. Profit

Thursday, July 7, 2011

ELSA VMware Appliance Available

Peter over at Balabit has graciously offered a place to host a VM for ELSA.  You can download it at http://spike2.fa.gau.hu/~mcholste/elsa_vm.tar.gz .  It is a fully-functional ELSA installation running on Ubuntu 10.04 LTS.  It will start all necessary services to begin recording and viewing logs and provides a good way to see what ELSA is all about without a major time investment.  Please note that, performance-wise, a VM will not be ideal, but it should be enough for interested readers to get a look at what ELSA can do.  The user name for the VM is "elsa" and the password is "biglog".  I've included an SVN update script in the tarball now, so if you want to make sure the ELSA installation is current, you can run /usr/local/elsa/contrib/update_from_svn.sh and then execute "service elsa restart."  Please let me know if you run into any issues or have any comments!

Monday, April 11, 2011

Network Intrusion Detection Systems


Incident response starts with a seed event which triggers an investigation. In my organization, this usually starts with a network intrusion detection system (NIDS) generating an alert based on predefined signatures. There are two major open-source NIDS: Snort and Suricata. There are lots of FAQ's and descriptions on the respective project web sites, so I will not cover the basics here. Instead, I will discuss some of the differences between them as well as performance guidelines.

Architecture Overview
The general flow of information is the same for both Snort and Suricata, though their internals differ substantially. At a high level, the network inspection looks like this:
Network packets → Packet header parsing → Payload normalization → Payload inspection → Alert generation
Generally, the processing effort for each phase is distributed as 10% for parsing, 10-20% for normalization, and 70-80% for payload inspection. Alert generation is usually negligible. If the sum total of effort it takes to complete these phases for a given packet exceeds the resources available, the packet will be “dropped,” meaning it goes unnoticed. Therefore, performance tuning is critical to reliably detecting intrusions.

Capacity Planning
Clearly, the vast majority of performance is dictated by the effort it takes to complete the payload inspection. This means that the reliability of a sensor is ultimately decided by whether or not it has enough resources to inspect the payload of the traffic it is assigned. (Some IDS events are generated solely based on IP addresses, but they are an exception.) When sizing hardware to create an IDS, here is my rule of thumb:
1 CPU = (1000 signatures) * (500 megabits of network traffic)
That is, you need one CPU for every thousand signatures inspecting 500 Megabits of network traffic. So if your rule set has 4000 signatures and your Internet gateway has 300 Megabits of network traffic, you will need at least ((4000/1000) = 4) * ((300/500) = .6) = 2.4 CPU's, meaning you'll need to spread the traffic across three CPU's. I should take a moment to point out that this formula applies to standard traffic for most organizations in which web makes up 80-90% of the traffic, followed by email. In a server hosting environment, you will need to find your own benchmarks.
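If you want to script the rule of thumb, here is a one-liner that reproduces the example above; the 1000-signature and 500-megabit constants come straight from the rule of thumb, so just swap in your own rule count and traffic level:

awk -v sigs=4000 -v mbps=300 'BEGIN { printf "CPUs needed: %.1f\n", (sigs / 1000) * (mbps / 500) }'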

Sizing Preprocessor Overhead
After you've acquired a box to use but before you start with finding out how many signatures you can run with your resources, you need to get a baseline of the performance when only running the payload normalization, usually referred to as preprocessors. Doing so is very simple: run the sensor with no rules loaded at peak traffic periods (around noon in an office environment). If you run the NIDS without daemonizing using the time command, you can get a pretty accurate reading on how much CPU time it takes. Here's an example:
time snort -c /etc/snort/snort-norules.conf
Let it run for around five minutes and kill it with Ctrl+C. The time command will then print out stats on the run that look something like this:
real 4m33.143s
user 0m36.218s
sys 0m14.937s
“User” and “sys” refer to how much time was spent performing the inspection and how much time was spent moving packets from the network card into RAM and then into the NIDS, respectively. Add up “user” and “sys” and divide by “real” to get the percentage of CPU required. In this run, it would be (36.2 + 14.9)/273.1 = .19, or about 19%. That is the percentage of CPU it takes to normalize the payloads. Keep in mind that during off-peak hours, most packets will not have large payloads, and so the percentage can be higher than during peak usage periods.

Detecting Packet Drops
Any libpcap-based IDS like Suricata and Snort will give you packet drop statistics. However, these numbers are unreliable, especially in high-drop situations. One way you can quickly find out if you are dropping packets is to run the above test but with your rules loaded. If your total CPU percentage is over 75%, there is a very good chance that you are occasionally dropping packets. If it's 95% or more, you are frequently dropping packets.

However, despite the above math, detecting packet loss when it really counts is still something of an art, as it's very difficult to account for a given packet. The best way to really determine whether or not your sensor is generally catching what it is supposed to catch is to setup a “heartbeat” signature. This consists of two parts: a script that will make a web request to a test site at a regular interval, and a signature designed to alert on that request. Here's an example Perl command:
perl -MLWP::UserAgent -e 'LWP::UserAgent->new()->get("http://example.com/testheartbeat123");'
That will make a web request to example.com. You should replace example.com with a site you have permission to make requests against.

The second part is to write a signature that will detect the heartbeat. Here's a corresponding Snort/Suricata signature that will detect the heartbeat request:
alert tcp any any -> any 80 (msg:"Heartbeat"; content:"/testheartbeat123"; http_uri; classtype:not-suspicious; sid:1;)
Put an entry in your sensor's cron tab (this is assuming you're using Linux) to make the request every minute:
* * * * * perl -MLWP::UserAgent -e 'LWP::UserAgent->new()->get("http://example.com/testheartbeat123");' > /dev/null 2>&1

Now you should get an alert every minute, on the minute for your heartbeat signature. If there is ever a missing entry in your alert log, then you know the sensor had a lapse in coverage at that moment. If you log your alerts to a database, you can then create graphs using a spreadsheet to plot the times when your sensor was overloaded. You may also consider setting up a monitoring script that feeds a program like Nagios to detect if an entry is missing. The really nice thing about this setup is that it is a true check for the entire chain: traffic sourcing, detection, and alert output. If any part of the chain breaks, you'll be able to tell.

The caveat, of course, is that it is very possible to get a heartbeat alert in the same minute that the sensor is overloaded, so this is better for establishing trends, e.g. “the sensor got all 60 heartbeats last hour” versus “there's no way we could've missed a packet that minute—we got a heartbeat.” The absolute, definitive test is to record all traffic to pcap for a given amount of time and replay it as a readfile through the NIDS. If the alerts from the readfile match the alerts you got live, then you're all set.
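Here is a minimal sketch of that definitive test with Snort (the interface name and file paths are assumptions, and Suricata can read the same capture back with its own -r option):

# capture full packets from the monitored interface, then stop with Ctrl+C
sudo tcpdump -i eth2 -s 0 -w /tmp/validation.pcap
# replay the capture offline through the same config and compare the alerts
mkdir /tmp/readfile-test
snort -c /etc/snort/snort.conf -r /tmp/validation.pcap -l /tmp/readfile-test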

Multi-CPU Setups
Snort, unlike Suricata, is single-threaded. This means that anytime a single CPU cannot handle the load, packets will be lost. Suricata, by contrast, will attempt to use all of the CPU's on the sensor and will load-balance the traffic across all of the CPU's, so there is little tuning needed in this regard. In order to inspect more traffic than a single CPU can handle with Snort, you will need to run multiple Snort instances. In addition to the extra management overhead, this means that you will have to find a way to split the traffic evenly across those instances. The easiest way to do this is to use Luca Deri's PF_RING module to create a pfring DAQ for Snort. Details can be seen on Luca's blog.

It should be noted that when running Snort with multiple instances, each instance will have to normalize the traffic. Suricata improves on this by normalizing the traffic once (depending on configuration), then pushing the normalized traffic to worker threads which perform the payload inspection. Therefore, you will incur a 10-20% CPU utilization penalty per Snort instance.

Advanced Performance Enhancement
I stated above that about 10% of the CPU is devoted to the initial packet header parsing. This can be reduced or even eliminated through several means. Using PF_RING, as mentioned above, the overhead can be drastically reduced to more like 1-2%. This can be very important if the link carries gigabit or higher traffic but you only want to inspect some of it. Filtering high-speed networks can be CPU intensive unless you use something like PF_RING to offload the packet filtering.

Another alternative is to purchase a pcap acceleration card, like the DAG cards manufactured by Endace Corporation. These cards range in price from a few thousand to tens of thousands of dollars, depending on the 1 or 10 gigabit configuration. They offload the entire burden of packet header parsing and filtering from the CPU onto the built-in packet processor.

Lastly, you can purchase a pcap load balancer, such as those made by Datasys and Gigamon. The feature sets range from basic replication to appliance-based pcap filtering. This option is the most effective but also the most costly.

Conclusion
If at all possible, I recommend diversifying your security portfolio by running both Snort and Suricata. Rapid development continues on Suricata (as well as improvements to Snort), so I hesitate to make any claims right now regarding which performs better overall. It is clear that Suricata has some major advantages when it comes to IP-only rules as well as detecting protocols on non-standard ports, but Snort has a solid track record and a mature codebase.

Saturday, April 2, 2011

Lizamoon: Knowing is Half the Battle

In my last post, I showed the different log sources that are readily available for collection, including a custom one: httpry_logger. Having a detailed record of every URL visited that is instantly accessible via ELSA makes incident response for well-publicized events very simple. This week, many news sites reported on a new attack made famous by Websense. Across the globe, IT security analysts were asking themselves (and, in many cases, being asked by management) if anyone had been affected by the malicious links the attack had scattered over hundreds of thousands of web sites. Depending on the tools available to the analyst, this can be a time-consuming task. With ELSA being fed by httpry_logger, however, this question can be answered while your boss is still standing at your desk!

The question: Did anyone visit the malicious web page, lizamoon.com (now offline), and if so, what happened? In ELSA query language, which is intentionally very similar to Google query language, the query is:
site:lizamoon.com

There are lots of hits, and your boss looks worried! Looks like a lot of cross-site scripting (XSS) in there. However, Websense reported that the site had already been taken offline. As you can see, many of these requests did not receive a response. So, let's drill down a bit and ask: who visited the malicious site and got data back? ELSA:
site:lizamoon.com content_length>=1

That looks better, but there are still a few hits. We need to drill down to see what the malicious content was. For this, we'll need to use a different tool, StreamDB, which I will describe in detail in a later post. For now, let's have a look at the StreamDB output for our query:

Returning 2 of 2 at offset 0 from Mon Mar 28 17:57:17 2011 to Mon Mar 28 17:57:17 2011

2011-03-28 17:57:17 x.x.x2.4:57986 -> 95.64.9.18:80 0s 594 bytes RST GET lizamoon.com/ur.php
oid=12447-2986004740-594-0
GET /ur.php
Connection: Keep-Alive
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-us
Host: lizamoon.com
Referer: http://www.designbasics.com/search/cdl_template/cdl-photo.asp?sPlanNo=24059&Exposure=33a-400&Path=http://www.designbasics.com/designs/24000/&ViewType=UPPER&IsPhoto=False&lPath=http://www.designbasics.com/designs/24000/&HomeTour=False&PlanName=Chatham&SquareFeet=1593&MaxWidth=&MaxDepth=
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 1.0.3705; InfoPath.1; .NET CLR 2.0.50727; MS-RTC LM 8)
X-HTTP-Version: 1.1

2011-03-28 17:57:17 x.x.x2.4:57986 <- 95.64.9.18:80 0s 975 bytes RST 200 ASCII text, with very long lines, with no line terminators
oid=12447-2986005334-975-0
200 OK
Connection: Keep-Alive
Date: Mon, 28 Mar 2011 22:56:18 GMT
Server: Apache/2.2.17 (FreeBSD) mod_ssl/2.2.17 OpenSSL/0.9.8n DAV/2 PHP/5.3.3
Content-Length: 650
Content-Type: text/html
Keep-Alive: timeout=5, max=100
Set-Cookie: click888=1; expires=Wed, 27-Apr-2011 22:56:18 GMT
X-HTTP-Version: 1.1
X-Powered-By: PHP/5.3.3

document.location = 'http://defender-nrpr.in/scan1b/237?sessionId=0500(snip)

So, content did indeed come down, and obviously, the goal is to set the browser's location to the next malicious site, defender-nrpr.in. We can then use ELSA to see if the client did follow to that site:
site:defender-nrpr.in
No results found, so our client is ok! So what did our client do? We do an ELSA query for that client IP for the few seconds around the request and see that there were no other suspicious requests. Looks like everything is fine, and management can rest easy.

Let's dig a little deeper in ELSA to see what other information we can find out about lizamoon. What are the hacked sites that are funneling traffic to the malicious drop site? ELSA:
site:lizamoon.com groupby:referer

This shows us the top unique page URI's that are linking to the malicious site. What IP addresses are serving lizamoon.com? ELSA:
site:lizamoon.com groupby:dstip
Who went there?
site:lizamoon.com groupby:srcip

Because each one of these queries finishes in a second or two, we can answer all of these questions in under thirty seconds—short enough to do while your boss is right there! This process of broad queries followed by drill-downs is the staple of efficient investigation. A lot of tools (Websense, for instance) will give you the ability to do the broad queries, and to some extent, drill-down, but the difference is that the follow-up queries take far more time in large organizations, which can make your analysts less likely to follow every lead as they have to constantly prioritize their time. Queries that take almost no time are much more likely to be made. So if knowing is half the battle, then asking the questions is the other half.

Thursday, March 31, 2011

Comprehensive Log Collection

In my last post, I described the importance of comprehensive logging in an enterprise and how you can use the open-source ELSA to get your logs collected and indexed. In this post, I'll describe the various things you can use to generate logs so that you have something to collect.

The Haystack

The classic dilemma with log collection is that the volume of ordinary logs drowns out important ones. ELSA solves this problem by allowing an analyst to cut through a massive amount of irrelevant logs and immediately find the sought-after logs. This allows the organization to enable extremely verbose logging from all of their devices without sacrificing the ability to find important logs. That in turn allows verbose logs to assist in investigations when they normally would have been sacrificed for efficiency. As a secondary benefit, it reduces the amount of time spent managing logs because no one is tasked with the difficult choices of which logs should be filtered and which should be kept.

Historically, network devices and UNIX/Linux devices have been the main source of syslogs. Network logs are critical to detecting attacks and conducting incident response. In addition to providing network connection records for both allowed and denied connections, other important logs are sent by network devices. For instance, denial of service attacks can produce logs from firewalls indicating that they have reached their open connection limit. A Cisco FWSM will generate logs like “Connection limit exceeded” and should be alerted on using ELSA. Other logs may not be errors, but are anomalies. Specifically, logs regarding configuration changes are helpful for detecting unauthorized access or improper change management.

ELSA can help zero-in on these kinds of logs by providing the negative operator in a search query. If most logs from a device contain a certain string like “connection,” then the query can be altered with “-connection” to exclude all of those. These searches happen so quickly that you can work through adding a half-dozen negations in a few seconds to uncover a new anomaly. The interesting string representing the anomaly can then be added as an alert for the future. In the screenshot below, you can see a series of queries, each with a decreasing number of results (the number in parenthesis on the tab) with an increasing amount of negation.
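As a made-up illustration of that workflow (the device IP and negated strings are hypothetical), each successive query trims the result set further until only the anomalies remain:

host:192.168.10.1
host:192.168.10.1 -connection
host:192.168.10.1 -connection -teardown -built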

Collecting Network Logs

Let's start with an example for configuring a Cisco router to log all network connection records to syslog. There's a great example of setting up logging from both Cisco Catalyst switches and routers, but in a nutshell, it involves a single line added for your log host (for example 192.168.10.10):

logging 192.168.10.10

Almost all network vendors provide a way to export logs as syslog. If possible, use TCP to prevent log loss. ELSA will handle either TCP or UDP logs.

Collecting Linux Server Logs

Setting up logging on UNIX and Linux is generally simple, but there are differences in the logging clients used. Standard Linux boxes up until a few years ago used the venerable syslogd agent to perform logging. To forward all logs to a syslog server of 192.168.10.10, you would add this to /etc/syslog.conf:

*.* @192.168.10.10

Then just restart syslogd:

/etc/init.d/syslogd restart

Different Linux distributions may change the restart command or the location of syslog.conf. A similar syntax is used for the newer rsyslog.
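For rsyslog the forwarding line itself is identical; a minimal sketch, assuming your distribution uses the /etc/rsyslog.d drop-in directory, is to create a file such as /etc/rsyslog.d/50-forward.conf containing:

# forward everything over UDP; use @@192.168.10.10 for TCP
*.* @192.168.10.10

Then restart rsyslog.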

For Syslog-NG, adding a remote logging destination is a bit more involved, but is still not overly complicated. A typical syslog-ng.conf file (usually located in /etc/syslog-ng) will have a source entry like this:

source src {
    internal();
    unix-dgram("/dev/log");
};

This is the entry that allows it to listen on the /dev/log socket for incoming local logs. To forward all of these logs to a remote server, we want to add a log destination to our remote server 192.168.10.10.

destination d_remote { udp("192.168.10.10"); };

Then, we add a statement that forwards all logs:

log { source(src); destination(d_remote); };

Restart syslog-ng:

/etc/init.d/syslog restart

Now the server should be forwarding all of its local logs to the remote syslog server.

Collecting Windows Server Logs

Windows Server 2008 introduced a new feature in which servers can subscribe to logs from another server. Unfortunately, Microsoft implemented this in a proprietary way which means that it is not syslog compatible. Luckily, there is a good solution to this: Eventlog-to-syslog. Evtsys works on all Windows versions and is available for 32- and 64-bit. Installation could not be simpler: download the executable from the site and run it from a command-line:

evtsys.exe -i -h my.syslog.server.address

Done! There are also a number of enterprise options for configuring backup syslog servers, as well as fine-tuning, via the registry, which events are sent. See the evtsys documentation for more details.

The great thing about evtsys, in addition to its very small footprint and ease of install, is that it will by default log all eventlog categories, including application-specific categories like SQL Server. ELSA has a built-in parser for events forwarded by evtsys and will parse them so that reports on event ID and other characteristics are possible.

For ultra-verbose logging, you can enable Windows process accounting, which will create a log for every process created. This creates a veritable torrent of logs, but with ELSA's horsepower, it will take them in stride, making them available in case of a breach. It's nearly impossible for an attacker to infiltrate a server and do damage without starting any new processes. Logging Active Directory account creations alone makes this a worthwhile endeavor.
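On Windows Vista/Server 2008 and later, you can turn process accounting on from an elevated command prompt with auditpol (older versions use the "Audit process tracking" setting in Local Security Policy instead):

auditpol /set /subcategory:"Process Creation" /success:enable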

Evtsys works on Windows desktops just as well as servers. Malware hunting is much easier when you have a log of all the processes created on the machine by the installation of a rootkit.

Collecting Miscellaneous Logs

Applications on servers often generate very helpful, verbose logs which provide a critical view into the business logic of the app. The only way to catch particularly sophisticated attacks is through monitoring of the business logic, because no observable exploits or attacks will be used. Unfortunately, most apps log to flat files instead of the system's built-in logging facility, and forwarding flat files is often more challenging than it should be. However, there are a few tricks for sending flat-file logs from a server and streaming them as syslog which I will detail below.

On Windows, you will need to install yet another third-party program to perform the logging. It's called Epilog and it's available from Intersect Alliance. This small program will run as a Windows service and stream all files that match a pattern in configured directories as syslog.

Linux makes this much easier if you have a recent version of Syslog-NG. Check out this excellent post from Syslog-NG's makers, Balabit, on how to set up Syslog-NG 3.2 for forwarding flat files. Of particular interest is the ability to use wildcards to specify intermediate directories like this:

/var/log/apache2/*/*.log

which would allow a web server with a lot of virtual hosts on it to easily forward all of their logs.

Creating Your Own Log Generators

Sometimes you will find that there isn't a good existing source for the information that you want to get into ELSA. I wanted an easy, efficient way to record URL's on a network into ELSA to correlate with IDS alerts. Unfortunately, we didn't use a web proxy, so there was no easy way of logging this. So, I created httpry_logger to address this issue. It will forward all requests with the response code and size as a log like this:

10.124.19.12|66.35.45.157|GET|isc.sans.org|/diary.html?storyid=10501&rss|-|Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.14) Gecko/20110221 Ubuntu/10.10 (maverick) Firefox/3.6.14|,org,sans.org,isc.sans.org|301|260|8583

ELSA parses this output and creates fields like this:

host=10.68.2.28 program=url class=URL srcip=10.124.19.12 dstip=66.35.45.157 status_code=301 content_length=260 country_code=US method=GET site=isc.sans.org uri=/diary.html?storyid=10501&rss referer=- user_agent=Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.14) Gecko/20110221 Ubuntu/10.10 (maverick) Firefox/3.6.14 domains=,org,sans.org,isc.sans.org

Notice how the domains field includes the comma-separated list of possible subdomains for easy searching. So, an ELSA search for “sans.org” will return all results for web requests to sites under the sans.org domain.

Log What You Can

Even if you're unable to get every log source you want to stream logs, don't let that stop you from getting the quick wins under your belt by enabling logging on what you can. Remember, the benefits are linear, so the more you're logging the more benefit you're getting. Ignore perfection and concentrate on progress!


Monday, March 28, 2011

Fighting APT with Open-source Software, Part 1: Logging

Just because Advanced Persistent Threats (FUD) is a marketing buzzword doesn't mean that it isn't a problem. The Cisco Security Blog had a fantastic post detailing what APT is, what it is not, and what it takes to defend against it. From the post: “The state of the art in response to APT does not involve new magic software hardware solution divining for APT, but relies more on asking the right questions and being able to effectively use the existing detection tools (logging, netflow, IDS and DPI).”

The article then goes on to detail exactly what you need to combat APT. As they stated, it is in fact NOT a product. It is a collection of information and tools which provides a capability utilized by a team in a perpetual investigatory process. They are dead-on as they describe what you need. Here is my paraphrased reproduction:

  1. A comprehensive collection of logs and the ability to search and alert on them.

  2. Network intrusion detection.

  3. A comprehensive collection of network connection records.

  4. Information sharing and collaboration with other orgs.

  5. The ability to understand the malware you collect.

I'm going to add another requirement of my own:

  6. The ability to quickly view prior network traffic to gain context for a network event and collect network forensic data.

These items shouldn't be a huge shock to anyone, and are probably already on a to-do list somewhere in your organization. It's like asking a doctor what you should do to be healthy: she'll say to exercise and eat right. She will certainly not prescribe diet pills. But much like some people find a workout schedule that works for them, I'm going to detail the implementations and techniques that work for us and will probably work for you.

There is a lot of ground to cover here, so I am going to address solutions to these tasks in a series of posts which detail what is needed to fulfill the above requirements and how it can be done with 100% open-source software.

In this introductory post, I'll tackle the biggest, most important, and perhaps most familiar topic: logs.

Enterprise Log Management (Omniscience Through Logging)

Producing and collecting logs is a crucial part of fighting APT because it allows individual assets and applications to record events that by themselves may be insignificant, but may be an indicator of malicious activity. There is no way to know ahead of time what logs are important, so all logs must be generated and collected. APT will not generate “error” logs unless you are very lucky—it's the “info” (and sometimes “debug”) logs that have the good stuff.

The first major hurdle for most organizations is collecting all of the relevant information and putting it in one place (at least from a query standpoint). There are a lot of reasons why this task is so difficult. The first is that, historically, log collection is just not sexy. It's hard for people to get excited about it, and it takes a herculean effort to do it effectively. Unless you have a passion for it, it's not going to get done. Sure, it's easy enough to get a few logs collected, but for effective defense, you're going to need comprehensive logging. This is generally accomplished by enabling logging on every asset and sending it all to a SIEM or log management solution. This is a daunting task, and this is one of the biggest reasons why fighting APT is so hard. Omniscience does not come easily or cheaply.

If you have the money, there are a lot of commercial SIEM and log management solutions that can do the job out there. Balabit makes a log collection product with a solid framework, ArcSight has an excellent reputation with its SIEM, and I can personally vouch for Splunk as being a terrific log search and reporting product. However, large orgs will have massive amounts of logs, and large-scale commercial deployments are going to be extremely expensive (easily six figures or more). There are a number of free, open-source solutions out there which will provide a means for log collection, searching, and alerting, but they are not designed to scale to collecting all events from a large organization, while still making that data full-text searchable with millisecond response times. That kind of functionality costs a lot of money.

Building Big

Almost two years ago, I set out to create a log collection solution that would allow Google-fast searching on a massively large set of logs. The problem was two-fold: find a syslog server that could receive, normalize, and index logs at a very high rate, and find a database that could query those logs at Google speeds, all with massive scalability. I have to say that when I first started, I believed that this task was impossible, but I was glad to prove myself wrong.

The first breakthrough was finding Syslog-NG version 3, which includes support for the pattern-db parser. It allows Syslog-NG to be given an XML file specifying log patterns to normalize into fields which can be inserted into a database. It does this with a high-speed pattern matching algorithm (Aho-Corasick) instead of a traditional regular expression. This allows it to parse logs at over 100k logs/second on commodity hardware. Combined with MySQL's ability to bulk load data at very high rates (over 100k rows/second), I had an extremely efficient mechanism for getting the logs from the network, parsed, and stored in a database.

The second task, finding an efficient indexing system, was much more challenging. After trying about a half-dozen different indexing techniques and technologies, including native MySQL full-text, MongoDB, TokuDB, HBase, Lucene, and CouchDB, I found that none of them were even close to being fast enough to keep up with a sustained log stream of more than a few thousand logs per second when indexing each word in the message. I was especially surprised when HBase proved too slow, as it's the open-source version of what Google uses.

Then I found Sphinxsearch.com, which specializes in open-source, full-text search for MySQL. Sphinx was able to index log tables at rates of 50k logs/second, and it provided a huge added feature: distributed group-by functionality. So, armed with Syslog-NG, MySQL, and Sphinx, I was able to put together a formal Perl framework to manage bulk loading log files written by Syslog-NG into MySQL and indexing the new rows.

That all proved to be the easy part. Writing a web frontend and middleware server around the whole distributed system proved to be the tougher challenge. Many thousands of lines of Perl and Javascript later, I had a web app that used the industry standard Yahoo User Interface (YUI) to manage searching, reporting, and alerting on the vast store of logs available.

Introducing Enterprise Log Search and Archive (ELSA)


ELSA collects and indexes syslog data as described above, archives and deletes old logs when they reach configured ages, and sends out alerts when preset searches match new logs. It is 100% web-based for both using and administering.

Main features:

  • Full-text search on any word in a message or parsed field.

  • Group by any field and produce reports based on results.

  • Schedule searches.

  • Alert on search hits on new logs.

  • Save searches, email saved search results.

  • Create incident tickets based on search results (with plugin).

  • Complete plugin system for results.

  • Export results as permalink or in Excel, PDF, CSV, and HTML.

  • Full LDAP integration for permissions.

  • Statistics for queries by user and log size and count.

  • Fully distributed architecture, can handle n nodes with all queries executing in parallel.

  • Compressed archive with better than 10:1 ratio.



One of the biggest differences in requirements between a large-scale, enterprise logging solution and your average log collector is assigning permissions to the logs so that users receive only the logs they are authorized for. ELSA accomplishes this by assigning logs a class when they are parsed and allowing administrators to assign permissions based on a combination of log class, sending host, and generating program. The permissions can be either local database users for small implementations, or LDAP group names if an LDAP directory is configured.

Permissions are a crucial and powerful component of any comprehensive logging solution. They give security departments the power to grant web developers access to the logs specific to their web site so they can look for problems, without allowing them access to sensitive logs. The site authors may be the most qualified to notice suspicious activity because they will have the most knowledge of what is normal. The same goes for administrators and developers in other areas of the enterprise.

However, the biggest win for the security department is that log queries finish quickly. Ad-hoc searches on billions of logs finish in about 500-2000 milliseconds. This is critical, because it allows security analysts to explore hunches and build context for the incident they are analyzing without having to decide if the query is worthwhile before running it. That is, they are free to guess, hypothesize, and explore without being penalized by having to wait around for results. This means that the data from a seed incident may quickly blossom into supporting data for several other, tangentially related incidents because of a common piece of data. It means that the full extent and context of an incident becomes apparent quickly, allowing the analyst to decide if the incident warrants further action or can be set aside as a false-positive.

Getting ELSA

ELSA is available under GPLv2 licensing at http://code.google.com/p/enterprise-log-search-and-archive/ . Please see the INSTALL doc for specifics, but the basic components, as mentioned above, are Linux (untested on *BSD), Syslog-NG 3.1, MySQL 5.1, Sphinx search, Apache, and Perl. It is a complex system and will require a fair amount of initial configuration, but once it is up and running, it will not need much maintenance or tuning. If you run into issues, let me know and I will try to help you get up and running.