Parsing Microsoft DNS Server Logs

December 14, 2014 · By Stephen Reese

This is a quick post about one of the many ways you may want to parse Microsoft DNS server logs. In this case, I simply wanted to know the top talkers. We use shell tools and Python on a Linux host in this entry. We follow up with an all-inclusive Python script, if you want to skip to the end.

Here is the example data, or you can follow along with your own:

```text
DNS Server log file creation at 6/15/2014 6:11:48 PM UTC
Log file wrap at 6/15/2014 5:00:23 PM

Message logging key (for packets - other items use a subset of these fields):
    Field #  Information                 Values
    -------  -----------                 ------
       1     Date
       2     Time
       3     Thread ID
       4     Context
       5     Internal packet identifier
       6     UDP/TCP indicator
       7     Send/Receive indicator
       8     Remote IP
       9     Xid (hex)
      10     Query/Response              R = Response
                                         blank = Query
      11     Opcode                      Q = Standard Query
                                         N = Notify
                                         U = Update
                                         ? = Unknown
      12     [ Flags (hex)
      13     Flags (char codes)          A = Authoritative Answer
                                         T = Truncated Response
                                         D = Recursion Desired
                                         R = Recursion Available
      14     ResponseCode ]
      15     Question Type
      16     Question Name

20140816 16:08:57 588 PACKET 019B99F0 UDP Rcv 192.168.0.2 80fd Q [0001 D NOERROR] A (3)www(1)l(6)google(3)com(0)
20140816 16:08:57 588 PACKET 019CEFF0 UDP Snd 192.168.0.2 622d Q [0001 D NOERROR] A (3)www(1)l(6)google(3)com(0)
20140816 16:08:57 588 PACKET 01C61480 UDP Rcv 192.168.0.2 622d R Q [8081 DR NOERROR] A (3)www(1)l(6)google(3)com(0)
20140816 16:08:57 588 PACKET 01C61480 UDP Snd 192.168.0.2 80fd R Q [8081 DR NOERROR] A (3)www(1)l(6)google(3)com(0)
20140816 15:51:47 588 PACKET 02131B00 UDP Snd 192.168.0.2 1b77 Q [0001 D NOERROR] A (9)messaging(9)microsoft(3)com(0)
20140816 15:51:47 588 PACKET 0242BD70 UDP Rcv 192.168.0.2 1b77 R Q [8081 DR NOERROR] A (9)messaging(9)microsoft(3)com(0)
20140816 16:28:56 588 PACKET 02447E50 UDP Rcv 192.168.0.2 6a24 Q [0001 D NOERROR] A (10)akamaiedge(3)net(0)
20140816 16:28:56 588 PACKET 01E8B070 UDP Snd 192.168.0.2 f11d Q [0001 D NOERROR] A (10)akamaiedge(3)net(0)
20140816 16:28:56 588 PACKET 01BDA5A0 UDP Rcv 192.168.0.2 f11d R Q [8081 DR NOERROR] A (10)akamaiedge(3)net(0)
20140816 16:28:56 588 PACKET 01BDA5A0 UDP Snd 192.168.0.2 6a24 R Q [8081 DR NOERROR] A (10)akamaiedge(3)net(0)
```
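
To make the field key concrete, here is a minimal sketch that maps one packet line onto named fields. The names `PACKET_RE` and `parse_packet` are my own, not part of any DNS tooling, and the pattern assumes lines shaped like the sample data: date and time as single tokens, and a blank Query/Response field for queries.

```python
import re

# One PACKET line, following the 16-field key in the log header.
# Assumption: date and time are single tokens, as in the sample data.
PACKET_RE = re.compile(
    r'^(?P<date>\S+)\s+(?P<time>\S+)\s+(?P<thread_id>\S+)\s+(?P<context>PACKET)\s+'
    r'(?P<packet_id>\S+)\s+(?P<protocol>UDP|TCP)\s+(?P<direction>Snd|Rcv)\s+'
    r'(?P<remote_ip>\S+)\s+(?P<xid>[0-9A-Fa-f]+)\s+'
    r'(?P<response>R)?\s*(?P<opcode>[QNU?])\s+'
    r'\[(?P<flags_hex>[0-9A-Fa-f]+)\s+(?P<flags>[A-Z]*)\s+(?P<rcode>\w+)\]\s+'
    r'(?P<qtype>\S+)\s+(?P<qname>\S+)\s*$'
)


def parse_packet(line):
    """Return a dict of named fields, or None if the line is not a PACKET line."""
    match = PACKET_RE.match(line)
    return match.groupdict() if match else None


query = parse_packet(
    '20140816 16:08:57 588 PACKET 019B99F0 UDP Rcv 192.168.0.2 80fd Q '
    '[0001 D NOERROR] A (3)www(1)l(6)google(3)com(0)'
)
print(query['remote_ip'], query['direction'], query['qname'])
```

For a query, the `response` key is `None`; for a response line it is `'R'`. Header lines and blank lines simply return `None`, so the function can be run over an uncleaned file as well.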

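Since the file begins with a header, delete those lines first. The command below removes lines 1 through 29 in place (GNU sed keeps the original as log.bak), so the later steps read the trimmed file; adjust the range to match your own log.

$ sed -i.bak '1,29d' log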

Convert the log from Windows to Unix line endings to strip the pesky carriage returns:

$ awk '{ sub("\r$", ""); print }' log > log.wintounix

Get rid of blank lines:

$ sed '/^$/d' log.wintounix > log.nolines
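
The three shell steps above can also be done in a single pass in Python. This is only a sketch: `clean_lines` is a name I made up, and the default of 29 header lines is taken from the sed command, so change it to match your log.

```python
from itertools import islice


def clean_lines(lines, header_lines=29):
    """Yield log lines with the header, trailing CR/LF, and empty lines removed."""
    # Skip the header block, like the sed delete range.
    for line in islice(lines, header_lines, None):
        # Drop the Windows line ending, like the awk sub().
        line = line.rstrip('\r\n')
        # Drop empty lines, like sed '/^$/d'.
        if line:
            yield line


if __name__ == '__main__':
    import sys
    # newline='' keeps the raw \r\n so the stripping above is explicit.
    with open(sys.argv[1], newline='') as f:
        for cleaned in clean_lines(f):
            print(cleaned)
```

Because it is a generator, the cleaned lines can be fed straight into a counter without writing the intermediate log.wintounix and log.nolines files.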

Here is the Python code we are going to use to parse the file we have cleaned up:

import re
from collections import Counter

with open('log.nolines') as f:
    # count the last two labels of the question name (the last field on each line)
    c = Counter('.'.join(re.findall(r'(\w+\(\d+\))', line.split()[-1])[-2:]) for line in f)

for domain, count in c.most_common():
    print('{} {}'.format(domain, count))
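
The question name is logged with each label preceded by its length in parentheses, and the regex above keeps the following length marker, so its keys look like `google(3).com(0)`. If you would rather have plain dotted names, here is one possible variation. The helper names are mine, and "last two labels" is only a rough stand-in for the registered domain (it would be wrong for names under suffixes such as co.uk).

```python
import re
from collections import Counter

# A label is the text that follows a "(length)" marker.
LABEL_RE = re.compile(r'\(\d+\)([^()]+)')


def decode_qname(encoded):
    """Turn '(3)www(6)google(3)com(0)' into 'www.google.com'."""
    return '.'.join(LABEL_RE.findall(encoded))


def last_two_labels(encoded):
    """Return the last two labels in dotted form, e.g. 'google.com'."""
    return '.'.join(LABEL_RE.findall(encoded)[-2:])


def count_domains(lines):
    """Count the last two labels of the final field on each non-blank line."""
    return Counter(
        last_two_labels(line.split()[-1]) for line in lines if line.strip()
    )


print(decode_qname('(3)www(1)l(6)google(3)com(0)'))
```

The trailing `(0)` has no label after it, so it drops out of the result on its own.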

Save the output of the Python script above to a file (parsed here) and sort it by count. The count is the second space-separated field; modify the key as needed.

$ sort -t" " -k2,2 -n -r parsed > parsed.sorted

That was a lot of work to parse a file. Let's make it a little easier. Run the following with an input file: parseMSDNS.py log

:::python
#!/usr/bin/env python
import re
import sys
import time

ret = {}
filename = sys.argv[1]
start_time = time.time()

with open(filename, 'r') as the_file:
    for line in the_file:
        # strip() drops the trailing CR/LF, so no separate newline normalization is needed
        # re.search returns a match object, or None when the pattern is not found
        match = re.search(r'Q \[.+\].+\(\d+\)([^\(]+)\(\d+\)([^\(]+)', line.strip())
        if match is not None:
            # build the key from the two captured labels
            key = ' '.join(match.groups())
            # count how many times each key appears
            ret[key] = ret.get(key, 0) + 1

# print the keys from most to least frequent
for k in sorted(ret, key=lambda k: ret[k], reverse=True):
    print("{:15} - {}".format(k, ret[k]))

print("{} seconds".format(time.time() - start_time))
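
The script above ranks domain names. If "top talkers" should instead mean the clients sending the most queries, the same counting idea works on field 8 (Remote IP). This is a rough sketch rather than a finished tool: `top_clients` is a hypothetical name, and it assumes whitespace-separated lines laid out like the sample data, where the token after the Xid is `R` only for responses.

```python
from collections import Counter


def top_clients(lines, n=10):
    """Count received queries per remote IP and return the n busiest clients."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip header, blank, and non-packet lines.
        if len(fields) < 10 or fields[3] != 'PACKET':
            continue
        is_received = fields[6] == 'Rcv'   # field 7: Send/Receive indicator
        is_response = fields[9] == 'R'     # field 10: blank for queries
        if is_received and not is_response:
            counts[fields[7]] += 1         # field 8: Remote IP
    return counts.most_common(n)


if __name__ == '__main__':
    import sys
    with open(sys.argv[1]) as f:
        for ip, count in top_clients(f):
            print("{:15} - {}".format(ip, count))
```

Counting only received queries avoids counting each lookup several times, since the log records the query, the forwarded query, and both responses.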