Tag Archives: python

So I’ve spent about 8 hours or so whipping this bad boy up.  It’s actually fairly functional, but at this point it only parses the PCAP file header and the PACKET header (as in the PCAP per-packet header, not the IP packet header).

Link to files (comes with test script and pcap): pcap_parser_0.01b.zip

As I mentioned in my last programming post, I’m really fed up with how inflexible the pure-python pcap libraries are when it comes to deciding exactly what part of a packet gets parsed.  So…

 

#! /usr/local/bin/python3
import pcap
p = pcap.Pcap_file("test.pcap")
pkt = p.next_packet()
while pkt:
    print(pkt)
    pkt = p.next_packet()

 

This bad boy does a lot. With the default settings, we get…

 

PACKET( ts_sec=1374367283, ts_usec=, incl_len=66, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=125, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=66, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=491, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=66, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=342, orig_len=, endian=<,)

 

but if we add a few lines…

 

#! /usr/local/bin/python3
import pcap
import packet

# Which record-header fields get read from the file at all
pConfig = packet.PARSE_CONFIG(ts_sec=True,
                              ts_usec=True,
                              incl_len=True,
                              orig_len=True)
# Which of the fields that were read get unpacked from binary
upConfig = packet.UNPACK_CONFIG(ts_sec=True,
                                ts_usec=True,
                                incl_len=True,
                                orig_len=True)

p = pcap.Pcap_file("test.pcap")
pkt = p.next_packet(pConfig=pConfig, upConfig=upConfig)
while pkt:
    print(pkt)
    pkt = p.next_packet(pConfig=pConfig, upConfig=upConfig)

 

we get…

PACKET( ts_sec=1374367283, ts_usec=850337, incl_len=125, orig_len=125, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=850478, incl_len=66, orig_len=66, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=850810, incl_len=491, orig_len=491, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=850857, incl_len=66, orig_len=66, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=960149, incl_len=342, orig_len=342, endian=<,)

 

That doesn’t look like a huge difference, but the magic is really going on behind the scenes.  The difference between this and something like Scapy or dpkt is that if you don’t set values to True in the config classes, it doesn’t even read those bytes. It just moves the fuck on.

If you set them to parse but not to unpack, then they’ll stay as binary. You don’t always need to unpack, so it’s a waste of resources.
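
To make the parse-versus-unpack split concrete, here is a minimal sketch of the idea. It is not the code from the zip: `FIELDS` and `parse_record_header` are names made up for illustration, and the only thing taken as given is the standard 16-byte PCAP per-packet header (four 32-bit fields).

```python
import struct

# Assumed layout of the 16-byte PCAP per-packet header: name -> byte offset.
# Every field is a 32-bit unsigned integer.
FIELDS = {"ts_sec": 0, "ts_usec": 4, "incl_len": 8, "orig_len": 12}


def parse_record_header(buf, parse, unpack, endian="<"):
    """Return only the requested fields of one record header.

    parse  -- field names to slice out of the buffer
    unpack -- subset of `parse` to convert to int; the rest stay as raw bytes
    """
    out = {}
    for name in parse:
        offset = FIELDS[name]
        raw = buf[offset:offset + 4]
        if name in unpack:
            out[name] = struct.unpack(endian + "I", raw)[0]
        else:
            out[name] = raw  # left as binary, no unpack cost
    return out


# A hand-built little-endian header using values from the output above.
header = struct.pack("<IIII", 1374367283, 850337, 125, 125)
print(parse_record_header(header, {"ts_sec", "incl_len"}, {"incl_len"}))
```

Fields that are not in `parse` never get touched, and fields that are parsed but not in `unpack` come back as 4 raw bytes. In a real reader you would still want `incl_len` unpacked, since that is what tells you where the next record starts.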

I’ll probably switch from a config class to some kind of bitwise flags to handle configurations. So you’ll pass configuration parameters like…

pktCfg = TS_SEC | INCL_LEN | TS_SEC_UPK | INCL_LEN_UPK # 1 | 4 | 32 | 128
pkt = p.next_packet(cfg=pktCfg)

Each parser would need a default parse value, which would likely be the bare minimum needed to reach the next header (for example, the IP header parser would only unpack the header length and next-protocol value by default). This also gives a developer flexible control over what is parsed, as he can simply OR/AND/XOR his default config with a new value on the fly before calling the parsing function.
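
A rough sketch of what that flag scheme could look like using `enum.IntFlag` from the standard library. The `PktCfg` class, the `wants` helper, and most of the numeric values are assumptions; only the four values in the comment above (1, 4, 32, 128) come from the snippet, and the rest are filled in to be consistent with them.

```python
from enum import IntFlag


class PktCfg(IntFlag):
    # Parse flags: read these bytes from the header at all.
    TS_SEC = 1
    TS_USEC = 2        # assumed value
    INCL_LEN = 4
    ORIG_LEN = 8       # assumed value
    # Unpack flags: convert the parsed bytes to integers.
    TS_SEC_UPK = 32
    TS_USEC_UPK = 64   # assumed value
    INCL_LEN_UPK = 128
    ORIG_LEN_UPK = 256  # assumed value


# Hypothetical default: just enough to find the next record.
DEFAULT = PktCfg.INCL_LEN | PktCfg.INCL_LEN_UPK

# Adjust the default on the fly with plain bitwise operators.
cfg = DEFAULT | PktCfg.TS_SEC | PktCfg.TS_SEC_UPK   # add the timestamp
cfg_raw_ts = cfg & ~PktCfg.TS_SEC_UPK               # keep ts_sec, but as bytes


def wants(cfg, flag):
    """True if `flag` is set in `cfg`."""
    return bool(cfg & flag)


print(int(cfg))                                # 165, i.e. 1 | 4 | 32 | 128
print(wants(cfg_raw_ts, PktCfg.TS_SEC))        # True
print(wants(cfg_raw_ts, PktCfg.TS_SEC_UPK))    # False
```

The appeal over a config class is that the whole configuration is a single integer, so checking a flag inside the parsing loop is one AND.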

For now…

[GIF: “It’s happening” (Ron Paul)]

This is something I’ve been wondering about for a long while.

Why don’t any Pure-Python Pcap parsing APIs have the functionality I want?

I’ve been involved in building tools that parse PCAP looking for “bad things” for about 6 years, and something I’ve noticed regarding every available packet parsing library (Scapy, dpkt, pcaputils, blah blah blah) is that they all have certain problems in common:

  • Inflexibility:

    • I don’t get to decide what parts of a header get parsed, it just farts it all out, sometimes even unpacked.
    • Sometimes, when you go to print the data, it just decides to unpack that shit for you, but really it’s still binary when calling the actual primitives.  What kind of shit is that?!

 

  • Speed & Scalability:

    • This is actually a symptom of inflexibility.
    • If you read every single byte in a header, and unpack every single thing you read, you’re gonna have a bad slow time.

 

  • Endianness Failures:

    • This is primarily an issue I have with dpkt and PCAP creation tools that don’t use standard network byte order.
    • In the case where PCAPs are stored in little endian, dpkt shits a brick when handling certain headers.  Furthermore, attempting to swap the default endianness can cause even worse brick shitting.
    • Endianness flexibility may sound pointless, but tell me that when you start integrating into mixed-endian environments. You will curse the existence of little endian PCAP.
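
For what it’s worth, the byte order of a classic PCAP file can be read straight off its magic number, so a parser doesn’t have to guess. A minimal sketch, assuming the classic microsecond-resolution format (magic `0xA1B2C3D4`); `detect_endian` is a made-up helper name:

```python
import struct

MAGIC_USEC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def detect_endian(global_header):
    """Return the struct prefix ('<' or '>') matching the file's byte order."""
    magic = global_header[:4]
    if struct.unpack("<I", magic)[0] == MAGIC_USEC:
        return "<"
    if struct.unpack(">I", magic)[0] == MAGIC_USEC:
        return ">"
    raise ValueError("not a classic microsecond pcap file")


# Two hand-built 24-byte global headers, one per byte order.
fields = (MAGIC_USEC, 2, 4, 0, 0, 65535, 1)
little = struct.pack("<IHHiIII", *fields)
big = struct.pack(">IHHiIII", *fields)

print(detect_endian(little))  # <
print(detect_endian(big))     # >
```

The returned prefix only applies to the PCAP file header and per-packet headers. The packet bytes themselves are stored as captured, so protocol headers inside them keep their own byte order.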

 

  • Filter Support

    • Why aren’t there built in filtering functions that can be applied in such a way that I can auto-ignore certain IP addresses or IP ranges?  This supports speed and flexibility and scalability and… omg just why.  WHY.
    • To those making the argument “well, that’s part of the thing you need to program into your product”:  These are extremely common features in any parsing program, which seems like a good reason to add them to your API.
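
As an illustration of how small such a filter can be, here is a sketch built on the standard `ipaddress` module. `make_ignore_filter` is a hypothetical helper, not part of any existing library, and it assumes the caller has already sliced the 4-byte source and destination addresses out of the IP header.

```python
import ipaddress


def make_ignore_filter(cidrs):
    """Build a predicate that is True when a packet should be skipped."""
    networks = [ipaddress.ip_network(c) for c in cidrs]

    def ignored(src_bytes, dst_bytes):
        # ip_address() accepts packed 4-byte (IPv4) or 16-byte (IPv6) input.
        src = ipaddress.ip_address(src_bytes)
        dst = ipaddress.ip_address(dst_bytes)
        return any(src in net or dst in net for net in networks)

    return ignored


ignored = make_ignore_filter(["10.0.0.0/8", "192.168.1.0/24"])
print(ignored(bytes([10, 1, 2, 3]), bytes([8, 8, 8, 8])))     # True
print(ignored(bytes([172, 16, 0, 1]), bytes([8, 8, 8, 8])))   # False
```

A speed-focused version would probably compare raw integers against precomputed masks instead of building address objects per packet, but the interface could stay the same.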

 

  • Stream/Flow Carving

    • When it comes to security, more often than not, I’m going to care more about the content of HTTP or DNS than I am going to care about Layers 2-4.
    • I definitely care about what protocols are going over the wire, but as adversaries become more complex (and as for-sale toolkits become more sophisticated) there are simply going to be fewer indicators in the raw flow data.
    • So, why can’t I combine an individual session, give it a unique ID, and save that ID and flow data for later use?  There are tools that do recombine sessions, which you can then pass upper-layer data over to something like dpkt’s HTTP parsers… but maybe I don’t want to do that right this second, maybe I want to move on to the next packet and not hold up everything so I can decide whether or not that HTTP session is interesting.
    • Maybe I want to put that off until later, which brings us to…
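
One way to get that unique ID is to hash the 5-tuple with the two endpoints sorted, so both directions of a session land on the same key. This is only a sketch of the idea; `flow_id` and its string-based arguments are assumptions made for readability, not an existing API.

```python
import hashlib


def flow_id(proto, src_ip, src_port, dst_ip, dst_port):
    """Direction-independent ID for a session, derived from its 5-tuple."""
    # Sort the endpoints so A->B and B->A produce the same key.
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = "%d|%s|%d|%s|%d" % (proto, lo[0], lo[1], hi[0], hi[1])
    return hashlib.sha1(key.encode("ascii")).hexdigest()[:16]


request = flow_id(6, "10.0.0.1", 51234, "93.184.216.34", 80)
response = flow_id(6, "93.184.216.34", 80, "10.0.0.1", 51234)
print(request == response)  # True: both directions share one ID
```

The ID can be stored next to the packet’s file offset and the deep inspection deferred, which is the point of the next section.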

 

  • File Carving/Session Recombination

    • I have 2 PCAP files, they are 2GB each (a manageable filesize on x86).  They contain packet data from one consecutive capture event.  This means there are sessions that will start in File A that END in File B.
    • If I have pre-parsed these PCAPs and have the header data for each packet/flow, and I save the offsets for each packet, I should have an easy way to re-access those sessions later without having to keep the packet data in memory (or worse, re-parse the entire PCAP).
    • No one has built a readily available carving function that allows us to zip through a PCAP file efficiently and get 1 session based on packet offsets.  This is easily the most important function of multi-tiered packet inspection on scaled systems.  Parse OSI Layers 2-4, save session identifiers, and move the fuck on.  Do the deep dive in a separate process that targets packets intelligently.
    • The worst part about this is that it takes like 10 damn lines of code to write the carving function, it’s just a matter of aggregating the offsets.
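
Roughly what I mean, as a sketch under stated assumptions: a classic PCAP file with a 24-byte file header and 16-byte per-packet headers, and two made-up helpers, `index_offsets` and `carve`. The demo builds a tiny capture in memory instead of reading a real file.

```python
import io
import struct

GLOBAL_HDR_LEN = 24   # classic pcap file header
RECORD_HDR_LEN = 16   # per-packet header


def index_offsets(f, endian="<"):
    """Yield (offset, incl_len) per record without reading packet bodies."""
    f.seek(GLOBAL_HDR_LEN)
    while True:
        offset = f.tell()
        hdr = f.read(RECORD_HDR_LEN)
        if len(hdr) < RECORD_HDR_LEN:
            return
        incl_len = struct.unpack(endian + "I", hdr[8:12])[0]
        f.seek(incl_len, io.SEEK_CUR)  # skip the packet data entirely
        yield offset, incl_len


def carve(f, offsets, endian="<"):
    """Return the packet bytes stored at each saved record offset."""
    packets = []
    for offset in offsets:
        f.seek(offset)
        hdr = f.read(RECORD_HDR_LEN)
        incl_len = struct.unpack(endian + "I", hdr[8:12])[0]
        packets.append(f.read(incl_len))
    return packets


# Build a three-packet little-endian capture in memory for the demo.
buf = io.BytesIO()
buf.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
for payload in (b"aaaa", b"bb", b"cccccc"):
    buf.write(struct.pack("<IIII", 0, 0, len(payload), len(payload)))
    buf.write(payload)

index = list(index_offsets(buf))
print(index)                                      # [(24, 4), (44, 2), (62, 6)]
print(carve(buf, [index[0][0], index[2][0]]))     # [b'aaaa', b'cccccc']
```

For a session that spans File A and File B, the saved entries would need to be (file, offset) pairs rather than bare offsets, but the carving step stays the same.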

 

I actually know why these things haven’t been addressed, and it’s commendable to be honest.  The available libraries do their best to make the functionality they provide as easy to use as possible while also trying to minimize the amount of knowledge the developer needs to create something useable.

With the exception of scalability/speed, I know my issues are pretty specific.  They really only apply to someone who wants to write a packet inspection program that will scale well while still enabling deep inspection of sessions, and enabling end users (analysts) to pull back intelligently targeted sessions with a meta-data search application.

I mean… that’s pretty niche… I think…

You know what, I’ll make my own parsing library!  With beer! And Hookers!

End. Rant.

I’ll update again when I can at least go from packet to packet and record offsets.  I have to write some basic iptools for packing/unpacking and the packet header parser.  After that I can start plagiarizing other libraries for good ethernet/ipv4/tcp/udp parsing.