This is something I’ve been wondering about for a long while.
Why don’t any Pure-Python Pcap parsing APIs have the functionality I want?
I’ve been involved in building tools that parse PCAP looking for “bad things” for about 6 years, and something I’ve noticed about every available packet parsing library (Scapy, dpkt, pcaputils, blah blah blah) is that they all have certain problems in common:
- Inflexibility:
  - I don’t get to decide which parts of a header get parsed; it just farts it all out, sometimes even unpacked.
  - Sometimes, when you go to print the data, it just decides to unpack that shit for you, but really it’s still binary when you call the actual primitives. What kind of shit is that?!
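To make the "let me pick the fields" complaint concrete, here is a rough Python 3 sketch of selective parsing. The `IPV4_FIELDS` table and `parse_fields` are made-up names for illustration, not part of any existing library; the offsets are the fixed 20-byte IPv4 header layout. Addresses come back as raw bytes on purpose, so the caller decides if and when to unpack them.

```python
import struct

# Hypothetical field table: name -> (byte offset in the IPv4 header, struct format).
# Offsets follow the fixed 20-byte part of the IPv4 header.
IPV4_FIELDS = {
    "ver_ihl": (0, "!B"),
    "total_len": (2, "!H"),
    "ttl": (8, "!B"),
    "proto": (9, "!B"),
    "src": (12, "!4s"),   # left as 4 raw bytes
    "dst": (16, "!4s"),   # left as 4 raw bytes
}


def parse_fields(buf, wanted, offset=0, table=IPV4_FIELDS):
    """Unpack only the requested fields; the rest of the header is never read."""
    out = {}
    for name in wanted:
        rel, fmt = table[name]
        (out[name],) = struct.unpack_from(fmt, buf, offset + rel)
    return out
```

Calling `parse_fields(header, ["proto", "src"])` touches five bytes of the header instead of all twenty, which is the whole point.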
- Speed & Scalability:
  - This is actually a symptom of inflexibility.
  - If you read every single byte in a header, and unpack every single thing you read, you’re gonna have a slow time.
- Endianness Failures:
  - This is primarily an issue I have with dpkt and PCAP creation tools that don’t use standard network byte order.
  - In the case where PCAPs are stored in little endian, dpkt shits a brick when handling certain headers. Furthermore, attempting to swap the default endianness can cause even worse brick shitting.
  - Endianness flexibility may sound pointless, but tell me that when you start integrating into mixed-endian environments. You will curse the existence of little endian PCAP.
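For what it's worth, the classic pcap format does tell you which byte order the file and record headers were written in: the first four bytes are a magic number (`0xa1b2c3d4`, or `0xa1b23c4d` for nanosecond timestamps) stored in the writer's byte order. A minimal Python 3 sketch of detecting it, with `detect_byte_order` and `read_global_header` as illustrative names only:

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4      # microsecond-resolution timestamps
PCAP_MAGIC_NS = 0xA1B23C4D   # nanosecond-resolution timestamps


def detect_byte_order(global_header):
    """Return the struct prefix ('<' or '>') matching the file's byte order."""
    raw = global_header[:4]
    for prefix in ("<", ">"):
        (magic,) = struct.unpack(prefix + "I", raw)
        if magic in (PCAP_MAGIC, PCAP_MAGIC_NS):
            return prefix
    raise ValueError("not a classic pcap file: bad magic %r" % raw)


def read_global_header(global_header):
    """Decode the 24-byte global header using the detected byte order."""
    order = detect_byte_order(global_header)
    names = ("magic", "version_major", "version_minor",
             "thiszone", "sigfigs", "snaplen", "network")
    values = struct.unpack(order + "IHHiIII", global_header[:24])
    return order, dict(zip(names, values))
```

The returned prefix only applies to the pcap file and per-record headers. The captured packet bytes themselves (Ethernet, IP, TCP and so on) are still in network byte order regardless.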
- Filter Support:
  - Why aren’t there built-in filtering functions that can be applied in such a way that I can auto-ignore certain IP addresses or IP ranges? This supports speed and flexibility and scalability and… omg just why. WHY.
  - To those making the argument “well, that’s part of the thing you need to program into your product”: these are extremely common features in any parsing program, which seems like a good reason to add them to your API.
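As a sketch of what such a built-in filter could look like, here is a small Python 3 example using the standard `ipaddress` module. `make_ip_filter` is a hypothetical helper, and it assumes the parser hands over the source and destination addresses as raw bytes straight out of the header:

```python
import ipaddress


def make_ip_filter(ignored):
    """Build a predicate from a list of addresses or CIDR ranges to ignore."""
    # A bare address such as "192.168.1.5" becomes a single-host network.
    nets = [ipaddress.ip_network(item, strict=False) for item in ignored]

    def should_skip(src, dst):
        # src and dst are packed addresses (4 bytes for IPv4, 16 for IPv6).
        for raw in (src, dst):
            addr = ipaddress.ip_address(raw)
            if any(addr.version == net.version and addr in net for net in nets):
                return True
        return False

    return should_skip
```

A parsing loop would call `should_skip(src, dst)` right after reading the IP header and bail out before touching anything above layer 3. The linear scan over networks is fine for a handful of ranges; a big ignore list would want something smarter.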
- Stream/Flow Carving:
  - When it comes to security, more often than not, I’m going to care more about the content of HTTP or DNS than about Layers 2-4.
  - I definitely care about what protocols are going over the wire, but as adversaries become more complex (and as for-sale toolkits become more sophisticated) there are simply going to be fewer indicators in the raw flow data.
  - So, why can’t I combine an individual session, give it a unique ID, and save that ID and flow data for later use? There are tools that do recombine sessions, and you can then pass the upper-layer data over to something like dpkt’s HTTP parsers… but maybe I don’t want to do that right this second. Maybe I want to move on to the next packet and not hold up everything so I can decide whether or not that HTTP session is interesting.
  - Maybe I want to put that off until later, which brings us to…
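One possible way to do the "unique ID per session" part, sketched in Python 3: hash the 5-tuple with the two endpoints sorted, so both directions of a conversation land on the same ID. `flow_id` and `record_packet` are invented names, and truncated SHA-1 is just one arbitrary choice of ID.

```python
import hashlib
import struct
from collections import defaultdict


def flow_id(proto, src, sport, dst, dport):
    """Direction-independent ID for a 5-tuple; src and dst are raw address bytes."""
    a, b = (src, sport), (dst, dport)
    lo, hi = (a, b) if a <= b else (b, a)  # sort endpoints so A->B equals B->A
    key = (struct.pack("!B", proto)
           + lo[0] + struct.pack("!H", lo[1])
           + hi[0] + struct.pack("!H", hi[1]))
    return hashlib.sha1(key).hexdigest()[:16]


def record_packet(flows, fid, file_index, offset):
    """Remember where a packet lives so the session can be revisited later."""
    flows[fid].append((file_index, offset))


# flow ID -> list of (file index, byte offset) pairs, in capture order
flows = defaultdict(list)
```

Nothing here holds packet data. The table only stores IDs and locations, so the decision about whether a session is interesting can wait for a later pass.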
- File Carving/Session Recombination:
  - I have 2 PCAP files; they are 2GB each (a manageable file size on x86). They contain packet data from one continuous capture event. This means there are sessions that will start in File A and END in File B.
  - If I have pre-parsed these PCAPs and have the header data for each packet/flow, and I save the offsets for each packet, I should have an easy way to re-access those sessions later without having to keep the packet data in memory (or worse, re-parse the entire PCAP).
  - No one has built a readily available carving function that lets us zip through a PCAP file efficiently and pull out one session based on packet offsets. This is easily the most important function of multi-tiered packet inspection on scaled systems. Parse OSI Layers 2-4, save session identifiers, and move the fuck on. Do the deep dive in a separate process that targets packets intelligently.
  - The worst part about this is that it takes like 10 damn lines of code to write the carving function; it’s just a matter of aggregating the offsets.
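A carving function along those lines might look roughly like this in Python 3. `carve_session` is a hypothetical name, and the sketch assumes each saved offset points at the start of a 16-byte pcap record header and that all files share one byte order:

```python
import struct

RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len (4 bytes each)


def carve_session(locations, byte_order="<"):
    """Yield (ts_sec, ts_usec, packet_bytes) for saved (path, offset) pairs.

    Assumes every offset points at a pcap record header and that all files
    were written with the same byte order ('<' or '>').
    """
    handles = {}
    try:
        for path, offset in locations:
            fh = handles.get(path)
            if fh is None:
                fh = handles[path] = open(path, "rb")  # open each file once
            fh.seek(offset)
            ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
                byte_order + "IIII", fh.read(RECORD_HEADER_LEN))
            yield ts_sec, ts_usec, fh.read(incl_len)
    finally:
        for fh in handles.values():
            fh.close()
```

Because the locations are (path, offset) pairs, a session that starts in File A and ends in File B is just a list that happens to mention two paths.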
I actually know why these things haven’t been addressed, and it’s commendable to be honest. The available libraries do their best to make the functionality they provide as easy to use as possible while also trying to minimize the amount of knowledge the developer needs to create something useable.
With the exception of scalability/speed, I know my issues are pretty specific. They really only apply to someone who wants to write a packet inspection program that will scale well while still enabling deep inspection of sessions, and enabling end users (analysts) to pull back intelligently targeted sessions with a meta-data search application.
I mean… that’s pretty niche… I think…
You know what, I’ll make my own parsing library! With beer! And Hookers!
End. Rant.
I’ll update again when I can at least go from packet to packet and record offsets. I have to write some basic iptools for packing/unpacking and the packet header parser. After that I can start plagiarizing other libraries for good ethernet/ipv4/tcp/udp parsing.
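The packet-to-packet offset walk could start out something like the Python 3 sketch below. `index_offsets` is a placeholder name, and it only handles the classic pcap format (24-byte global header, 16-byte record headers), not pcapng:

```python
import struct

PCAP_MAGICS = (0xA1B2C3D4, 0xA1B23C4D)  # microsecond and nanosecond variants


def index_offsets(path):
    """Walk a classic pcap file record by record, yielding (offset, incl_len)."""
    with open(path, "rb") as fh:
        header = fh.read(24)
        if len(header) < 24:
            raise ValueError("file too short for a pcap global header")
        if struct.unpack("<I", header[:4])[0] in PCAP_MAGICS:
            order = "<"
        elif struct.unpack(">I", header[:4])[0] in PCAP_MAGICS:
            order = ">"
        else:
            raise ValueError("unrecognized pcap magic number")
        rec = struct.Struct(order + "IIII")
        while True:
            offset = fh.tell()
            raw = fh.read(rec.size)
            if len(raw) < rec.size:
                break  # end of file, or a truncated record header
            _ts_sec, _ts_frac, incl_len, _orig_len = rec.unpack(raw)
            fh.seek(incl_len, 1)  # skip the packet body without reading it
            yield offset, incl_len
```

Each yielded offset points at a record header, which is exactly what a later carving pass needs in order to seek straight back to that packet.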
2 Comments
Hi, maybe this project helps you: https://bitbucket.org/camp0/aiengine
Regards,
Luis
NEAT! THANKS!