Stephen Reese

This is a quick post about one of the many ways to parse Microsoft DNS server logs. In this case, I simply wanted to know the top talkers. We use shell and Python on a Linux host, and follow up with an all-inclusive Python script if you want to skip to the end.

Here is the example data, or you can follow along with your own:

DNS Server log file creation at 6/15/2014 6:11:48 PM UTC
Log file wrap at 6/15/2014 5:00:23 PM

Message logging key (for packets - other items use a subset of these fields):
        Field #  Information         Values
        -------  -----------         ------
           1     Date
           2     Time
           3     Thread ID
           4     Context
           5     Internal packet identifier
           6     UDP/TCP indicator
           7     Send/Receive indicator
           8     Remote IP
           9     Xid (hex)
          10     Query/Response      R = Response
                                     blank = Query
          11     Opcode              Q = Standard Query
                                     N = Notify
                                     U = Update
                                     ? = Unknown
          12     [ Flags (hex)
          13     Flags (char codes)  A = Authoritative Answer
                                     T = Truncated Response
                                     D = Recursion Desired
                                     R = Recursion Available
          14     ResponseCode ]
          15     Question Type
          16     Question Name

20140816 16:08:57 588 PACKET  019B99F0 UDP Rcv 192.168.0.2 80fd   Q [0001   D   NOERROR] A     (3)www(1)l(6)google(3)com(0)

20140816 16:08:57 588 PACKET  019CEFF0 UDP Snd 192.168.0.2 622d   Q [0001   D   NOERROR] A     (3)www(1)l(6)google(3)com(0)

20140816 16:08:57 588 PACKET  01C61480 UDP Rcv 192.168.0.2 622d R Q [8081   DR  NOERROR] A     (3)www(1)l(6)google(3)com(0)

20140816 16:08:57 588 PACKET  01C61480 UDP Snd 192.168.0.2 80fd R Q [8081   DR  NOERROR] A     (3)www(1)l(6)google(3)com(0)

20140816 15:51:47 588 PACKET  02131B00 UDP Snd 192.168.0.2 1b77   Q [0001   D   NOERROR] A     (9)messaging(9)microsoft(3)com(0)

20140816 15:51:47 588 PACKET  0242BD70 UDP Rcv 192.168.0.2 1b77 R Q [8081   DR  NOERROR] A     (9)messaging(9)microsoft(3)com(0)

20140816 16:28:56 588 PACKET  02447E50 UDP Rcv 192.168.0.2 6a24   Q [0001   D   NOERROR] A     (10)akamaiedge(3)net(0)

20140816 16:28:56 588 PACKET  01E8B070 UDP Snd 192.168.0.2 f11d   Q [0001   D   NOERROR] A     (10)akamaiedge(3)net(0)

20140816 16:28:56 588 PACKET  01BDA5A0 UDP Rcv 192.168.0.2 f11d R Q [8081   DR  NOERROR] A     (10)akamaiedge(3)net(0)

20140816 16:28:56 588 PACKET  01BDA5A0 UDP Snd 192.168.0.2 6a24 R Q [8081   DR  NOERROR] A     (10)akamaiedge(3)net(0)
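
To make the field key above a little more concrete, here is a minimal sketch that splits one packet line into named fields. The names `PACKET_RE` and `parse_packet_line` are my own, and the pattern assumes the layout of the sample lines shown here. A regular expression is used instead of a plain `split()` because field 10 is blank for queries, so the number of whitespace-separated tokens varies from line to line.

```python
import re

# Assumes the layout of the sample lines above; other DNS server versions or
# locales may format the date and time differently.
PACKET_RE = re.compile(
    r'^(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<thread_id>\S+)\s+(?P<context>\S+)\s+'
    r'(?P<packet_id>\S+)\s+(?P<protocol>UDP|TCP)\s+(?P<direction>Snd|Rcv)\s+'
    r'(?P<remote_ip>\S+)\s+(?P<xid>[0-9A-Fa-f]+)\s+'
    r'(?P<qr>R?)\s*(?P<opcode>[QNU?])\s+'
    r'\[(?P<flags_hex>[0-9A-Fa-f]+)\s+(?P<flags>[A-Z]*)\s+(?P<rcode>\w+)\]\s+'
    r'(?P<qtype>\S+)\s+(?P<qname>\S+)\s*$'
)


def parse_packet_line(line):
    """Return a dict of named fields, or None if the line is not a packet record."""
    match = PACKET_RE.match(line.strip())
    return match.groupdict() if match else None
```

Calling `parse_packet_line()` on a header line or a blank line returns `None`, so it can double as a filter while you experiment.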

Since there is a header, cut the 29 header lines and save the result:

$ sed '1,29d' log > log.noheader

Convert the log from Windows to Unix format to handle the pesky carriage returns:

$ awk '{ sub("\r$", ""); print }' log.noheader > log.wintounix

Get rid of blank lines:

$ sed '/^$/d' log.wintounix > log.nolines
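
If you would rather do the cleanup in Python, the shell steps above can be sketched as a single generator. This is only an illustration: `clean_lines` is a name I made up, and the default of 29 header lines mirrors the sed range above, so adjust it if your header differs.

```python
def clean_lines(lines, header_lines=29):
    """Yield log records without the header, trailing CR/LF, or blank lines."""
    for number, line in enumerate(lines, start=1):
        # Skip the header block at the top of the file.
        if number <= header_lines:
            continue
        # Drop the Windows carriage return and the newline.
        line = line.rstrip("\r\n")
        # Skip blank lines; unlike sed '/^$/d' this also skips
        # lines that contain only whitespace.
        if line.strip():
            yield line
```

Usage would look like `with open('log') as f: records = list(clean_lines(f))`.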

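Here is the Python code we are going to use to parse the file we have cleaned up. It counts the last two labels of each queried name:

import re
from collections import Counter

# Each label in the question name is prefixed by its length, e.g. (6)google(3)com(0).
label = re.compile(r'\(\d+\)([^()]+)')

with open('log.nolines') as f:
    # Key on the last two labels of the name in the final column, e.g. google.com.
    c = Counter('.'.join(label.findall(line.split()[-1])[-2:]) for line in f)

# most_common() returns the pairs ordered from highest to lowest count.
for domain, count in c.most_common():
    print('{} {}'.format(domain, count))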
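
The question name in the last column is stored as length-prefixed labels, so (3)www(1)l(6)google(3)com(0) is www.l.google.com. Here is a small sketch for decoding it. The helper names `decode_qname` and `last_labels` are hypothetical, and keeping the last two labels is only a rough stand-in for the registered domain (it will not do the right thing for names under suffixes such as co.uk).

```python
import re

# A label is a parenthesised length followed by the label text.
LABEL_RE = re.compile(r'\(\d+\)([^()]+)')


def decode_qname(raw):
    """Turn '(3)www(1)l(6)google(3)com(0)' into 'www.l.google.com'."""
    return ".".join(LABEL_RE.findall(raw))


def last_labels(raw, labels=2):
    """Keep only the trailing labels, e.g. 'google.com' for labels=2."""
    return ".".join(LABEL_RE.findall(raw)[-labels:])
```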

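Save the output of the Python script above to a file (named parsed here) and sort it by count. The count is the second space-separated field, so modify the key as needed if your output differs.

$ sort -t" " -k2,2 -n -r parsed > parsed.sorted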

That was a lot of work to parse a file. Let's make it a little easier. Run the following with an input file: parseMSDNS.py log

#!/usr/bin/env python
import re
import sys
import time

# Capture the last two labels of the question name on packet lines,
# e.g. (6)google(3)com(0) yields ('google', 'com').
pattern = re.compile(r'Q \[.+\].+\(\d+\)([^\(]+)\(\d+\)([^\(]+)')

filename = sys.argv[1]
ret = {}

start_time = time.time()
with open(filename, 'r') as theFile:
    for line in theFile:
        # strip() removes surrounding whitespace, including Windows
        # carriage returns; search() returns a match object or None
        match = pattern.search(line.strip())
        if match is not None:
            # if a match, build the key from the captured labels
            key = ' '.join(match.groups())
            # count the occurrences of each key
            ret[key] = ret.get(key, 0) + 1

# print the keys ordered by count, highest first
for k in sorted(ret, key=lambda k: ret[k], reverse=True):
    print("{:15} - {}".format(k, ret[k]))

print("{} seconds".format(time.time() - start_time))
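
The scripts above count the names being queried. If by top talkers you mean the clients sending the most queries, a variation could count field 8 (Remote IP) instead. This is a rough sketch under a few assumptions: `QUERY_RE` and `top_clients` are names I made up, and it only counts lines that were received (Rcv) and have a blank Query/Response field, that is, incoming queries.

```python
import re
from collections import Counter

# Match received packets with no 'R' before the opcode (incoming queries)
# and capture the remote IP. Assumes the layout of the sample lines above.
QUERY_RE = re.compile(
    r'\bPACKET\s+\S+\s+(?:UDP|TCP)\s+Rcv\s+(\S+)\s+[0-9A-Fa-f]+\s+[QNU?]\s+\['
)


def top_clients(lines, limit=10):
    """Return (remote_ip, query_count) pairs, highest count first."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(limit)
```

Usage would look like `with open(sys.argv[1]) as f: print(top_clients(f))`. Header and blank lines never match the pattern, so the raw log can be passed in without cleaning it first.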

That should do it. Leave a comment if something is not working as expected.

