Skip navigation

Category Archives: Programming

The DefCon 22 quals have come and gone, and I was a new participant this year.  Thankfully, I didn’t leave with a big fat goose egg on my forehead, but I didn’t do so great either.  Most of my teammates answered all the questions before I got anywhere near the 3rd or 4th step of the challenge, and most of the reversing proved to be beyond me.

I did solve one challenge all on my lonesome however.  I solved 3DTTT before any of my teammates had done it, which surprised me as I found it relatively simple (and just a matter of prioritization).

The challenge itself consisted of connecting to a server that wanted to play a game of 3D Tic-Tac-Toe.  There was about a 3-second time limit on moves, which made it obvious that the play had to be automated.  Furthermore, you had to win an unknown number of rounds, and if you ever had -10 wins it disconnected you.  So the idea was to 1) Play fast and 2) Play good.

Difficult on the surface, but much easier than you’d expect.  Classic Tic-Tac-Toe is what is called a “Solved Game”.   A “Solved Game” means that the players can force a pre-determined outcome by making certain moves, regardless of their opponent’s strategy.  In classic tic-tac-toe, specifically, it’s possible to force a draw in a EVERY GAME, regardless of which player starts – and if the second player doesn’t make the correct move every turn, it’s a guaranteed win for player 1.

3D tic-tac-toe is a little more difficult, but still within the realms of being solved.  Here is an example game board

Top Level
-------------
|   |   |   |
-------------
|   |   |   |
-------------
|   |   |   |
-------------

Middle Level
-------------
|   |   |   |
-------------
|   |   |   |
-------------
|   |   |   |
-------------

Bottom Level
-------------
|   |   |   |
-------------
|   |   |   |
-------------
|   |   |   |
-------------

Now, 3DTT is a little different.  It works on points, and each complete line is 1 point.  Whoever has the most lines when all boards are filled wins.

So who has the advantage, player 1 or player 2?  Well, what’s the most advantageous place on the board?

In normal Tic-Tac-Toe, the center square is the most valuable square, because it can be used to make the most lines (4, two X’s, and 2 +’s).  Every other square on the board can only be used to make TWO or THREE lines (Corners- Three, Middles- Two).

This holds for 3DTTT as well!

Corners on the top and bottom levels can be used to make 2 diagonals (1 on the same level, 1 across all 3 levels), 4 straights (2 on each axis, on the same board and across all boards), and 1 completely vertical across all boards.  7 total potential wins.  These are Very strong places to hit!

Centers on the Top and Bottom level can be used to complete: 1 vertical, 2 straight on the same level, and 2 straight across all levels.  5 Total potential wins!  These are moderate assets, but still quite strong. They actually enable you to set up a constant string of wins, regardless of where you move, so these are really important.

Corners on the center level can be used to make 1 vertical, 2 straights, and 1 diagonal.  4 total potential wins. WEAK place to hit!

Middles on the top and bottom can be used to make 1 vertical, 2 straights on the same level (think T), and 1 straight across all 3 boards. 4 Total potential wins. WEAK!

Middles in the center can only 1 vertical and 2 straight lines.  3 potential wins! WEAK WEAK!

Oh but that leaves one last place we haven’t discussed yet…

The center square on the center board… 1 vertical, 2 straight on-level, 2 diagonal on-level, 4 diagonal all-levels, 4 straight all-levels… This single square enables you to obtain a whopping THIRTEEN POTENTIAL WINS.

So player one has a SERIOUS advantage, because he can immediately REMOVE 13 wins from his opponent’s potential pot, while enabling himself to hit a win every other move (which we will cover shortly).

 

So cool… move 1: Center Board, Center Square.  Check.

move 2: Center square of either the top/bottom board, whichever has not been taken

move 3: (maybe) center square of top/bottom board, if it has not already been taken.

Now what?  Well… the 3d nature of the game enables player 1 to have constant check-mate situations.  In a single player game, this board is an instant win for the X’s:

 

Middle Level
-------------
| X |   | X |
-------------
|   | O |   |
-------------
| O |   | X |
-------------

No matter WHERE the O’s move next, it’s an instant lose, right?  Well, if you own the center square on the center board, and the center square of another board – on your next move, you’ll either you complete an on-level line, or a cross-level line.  This is very important early game.

Looking at our list of squares, we see our most powerful spaces are the corners, so we prioritize those for the next set of moves

move: corner a
move: corner opposite of a on same board OR corner on opposite layer (1->3 or 3->1) and opposite end.  INSTANT POINT!

We do this until the corners are gone, and move on to the middles!

You can also see that middles on the top and bottom are as powerful as corners in the center board, so we can simplify our algorithm a little by just playing every possible corner first.  Not completely optimal, but way easier to implement.

Basically, you repeat this exact same strategy for the middles, and when you’re finished, you’ve completed the entire game and start the next one.

 

At this point, we COULD add some logic to prioritize wins or blocking your opponent, but the fact is, you’ve already removed enough potential point’s from your opponent (the defcon computer) to secure victory about 70-75% of the time.  So at this point… just let the thing run and start working on the next challenge! Woo!

The following code managed to win 50 games in a matter of about 5 minutes.  Granted, i was running on a university server with a very strong business connection, so take that for what it is.

Also I got lazy with my “prioritized wins” algorithms, so the corners and middles don’t play EXACTLY as mentioned above, but close enough.  Still enough to get the flag very very quickly.

ticbrute

GO GO SIMPLE COMBINATIONS/PERMUTATIONS!

 

I explained here that the most optimal squares to pick were based on the number of potential wins they unlocked, and that’s all very much in the spirit of tic-tac-toe… but how does it apply to netsec/hacking/exploitation at all?  At first glance, this is just a silly game of competing algorithms.

Well, think about all the protocols we use on a day-to-day basis, and how many rules they run by. Race conditions are a good example. If you can get your computer to return spoofed DNS responses faster than 8.8.8.8, you’re playing better and faster than your opponent.

But do you always have to play perfect? No. Personally I would have liked to see the defcon computer detect perfect players and refuse to play against them, forcing them to slow down/run imperfect games in order to bypass the security system.

If you’re loud about your attack, you get caught. If you make yourself look normal, you blend in.

Just some food for thought.

I’ve been having some fun over the past weekend getting my chops back up to speed with some wargames at overthewire.org.  So far i’m up to Vortex 9, and it’s been a bunch of fun :D.

Instead of going back and documenting 0-7 (which you can look up elsewhere), i’m going to start documenting from 8+ (which is nowhere on the web).  Here is the work i’ve done :]

First things first we need hacking music.  I recommend this series I wrote a while ago:

 

Ok, this is a reverse engineering challenge, so lets download a copy of vortex8.bin and boot up IDA to take a look at what’s going on.

We find the following execution

1. Start a new thread
2. Drop permissions from Vortex9 to Vortex8.

messthread

ccode

The only thing this new thread does is a really silly while(1) loop.

while1

while2

 

So this thread is just gonna print “0” to the screen every second.   I think I see where this is going.  Lets keep the fact that we’re calling “sleep” in mind.

The next thing the main thread does is make another call, which ends up providing us an opportunity to overflow a buffer..

callunsafe

strcpy

strcopy2

 

So overrun this 1032 byte buffer and we’ll hit the stored stack pointer & return address.  Easy enough… but we’ve got an issue

We can’t just straight execute from the stack, because we’ll be executing at “Level 8” permissions.  Instead, we need to get the thread running at “Level 9” permissions to run code that we put in the buffer.

So the other function makes a couple function calls, including printf and sleep.   Why don’t we try to overwrite the sleep function offset in the Global Offset Table, and get that other thread to start executing from this one’s stack (neat!).  We can do this because multiple threads in a single process share linear memory space.  If we had opened a separate process, we would have needed to find a way to open a remote thread/memory and write to it.  Good thing we don’t have to worry about that.

So first thing we should do is use gdb to get the GOT entry for sleep

sleepoffset

 

We’re going to overwrite 0x0804a008 to point to our shellcode on the stack.  To save time and space, my shellcode resides at 0xFFFFD26E.  You can use gdb to print out the value of ESP when you’re inside the unsecurecode function.  From there, just use the size of your shellcode to calculate exactly where it’s going to start (and how many nops you’ll need).

Lets write the assembly that’s going to replace this GOT entry and assemble it.  And then this quick little python function to display it on the screen for me to copy/paste (I use this for any thing i have to assemble & print, it’s a handy little tool imo)

sleepreplace

displaybin

displaybin2

 

cool.  We’ve got everything we need to make our final script

shell

 

I glossed over a few things.  So lets touch on that:

1. NO \x00 ALLOWED!  Your entire payload CANNOT have a \x00 in it (until the end).  strcopy() terminates when it hits a null byte, so you must ensure anything you assemble does NOT have a null byte.

2. You need to find your stack pointer and base pointer.  If you don’t realign the stack basepointer, when you RET onto the stack you’re gonna throw segfaults.  Make sure you overrun the stored basepointer with some place in memory that’s writable.

3. We’re going to execute the code to overwrite the GOT entry for sleep with the pointer for our shellcode on the stack, and then we’re going to \xEB\xFE in order to make that thread spin.   Remember, if we let the main thread die, then the program will terminate before we get to do anything with our shell in the other thread!

4. Now that we’ve overwritten the entry and started spinning, the other thread in the while loop should take care of the rest.  The next time the while loop comes around to call the “sleep” function, we should get execution of our shellcode.

lets try it out

winnerwinner

 

NEAT!

 

 

So I’ve spent about 8 hours or so whipping this bad boy up.  It’s actually fairly functional, but it only parses through the PCAP header and PACKET header at this point (as in the PCAP packet header, not the IP packet header)

Link to files (comes with test script and pcap): pcap_parser_0.01b.zip

As I mentioned in my last programming post, I’m really fed up with how inflexible the pure-python pcap libraries are when it comes to deciding exactly what part of a packet gets parsed.  So…

 

#! /usr/local/bin/python3
import pcap
p = pcap.Pcap_file("test.pcap")
pkt = p.next_packet()
while pkt:
    print(pkt)
    pkt = p.next_packet()

 

This bad boy does alot. With the default settings, we get…

 

PACKET( ts_sec=1374367283, ts_usec=, incl_len=66, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=125, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=66, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=491, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=66, orig_len=, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=, incl_len=342, orig_len=, endian=<,)

 

but if we add a few lines…

 

#! /usr/local/bin/python3
import pcap
import packet

pConfig = packet.PARSE_CONFIG(ts_sec=True, 
				ts_usec=True, 
				incl_len=True,
				orig_len=True)
upConfig = packet.UNPACK_CONFIG(ts_sec=True, 
				ts_usec=True, 
				incl_len=True,
				orig_len=True)

p = pcap.Pcap_file("test.pcap")
pkt = p.next_pack(pConfig=pConfig, upConfig=upConfig)
while pkt:
    print(pkt)
    pkt = p.next_pack(pConfig=pConfig, upConfig=upConfig)

 

we get…

PACKET( ts_sec=1374367283, ts_usec=850337, incl_len=125, orig_len=125, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=850478, incl_len=66, orig_len=66, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=850810, incl_len=491, orig_len=491, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=850857, incl_len=66, orig_len=66, endian=<,)
PACKET( ts_sec=1374367283, ts_usec=960149, incl_len=342, orig_len=342, endian=<,)

 

That doesn’t look like a huge difference, but the magic is really going on behind the scenes.  The difference between this and something like Scappy and DPKT is that if you don’t set values to True in the config classes, it doesn’t even read those bytes. It just moves the fuck on.

If you set them to parse but not to unpack, then they’ll stay as binary. You don’t always need to unpack, so it’s a waste of resources.

I’ll probably switch from a config class to some kind of Bitwise operations to handle configurations. So you’ll pass configuration parameters like…

pktCfg = TS_SEC | INCL_LEN | TS_SEC_UPK | INCL_LEN_UPK # 1 | 4 | 32 | 128
pkt = p.next_packet(cfg=pktCfg)

Each parser would need a default parse value, which would likely be a minimalistic approach at reaching the next header (for example, IP header would only unpack header length and next protocol value by default). Furthermore this lets a developer have control of what is parsed with flexibility, as he can simply OR/AND/XOR his default config with a new value on the fly prior to calling the parsing function.

For now…

its-happening-ron-paul-gif

This is something I’ve been wondering about for a long while.

Why don’t any Pure-Python Pcap parsing APIs have the functionality I want?

I’ve been involved in building tools that parse PCAP looking for “bad things” for about 6 years, and something I’ve noticed regarding every available packet parsing library (Scappy, Dpkt, pcaputils, blah blah blah) is that they all have certain problems in common:

  • Inflexibility:

    • I don’t get to decide what parts of a header gets parsed, it just farts it all out, sometimes even unpacked.
    • Sometimes, when you go to print the data, it just decides to unpack that shit for you, but really it’s still binary when calling the actual primitives.  What kind of shit is that?!

 

  • Speed & Scalability:

    • This is actually symptom of inflexibility.
    • If you read every single byte in a header, and unpack every single thing you read, you’re gonna have a bad slow time.

 

  • Endianness Failures:

    • This is primarily an issue I have with DPKT and PCAP creation tools that don’t use standard network byte order.
    • In the case where PCAPs are stored in little endian, DPKT shits a brick when handling certain headers.  Furthermore, attempting to swap the default endianness can even worse brick shitting.
    • Endianness flexibility may sound pointless, but tell me that when you start integrating into mixed-endian environments.You will curse the existence of little endian PCAP.

 

  • Filter Support

    • Why aren’t there built in filtering functions that can be applied in such a way that I can auto-ignore certain IP addresses or IP ranges?  This supports speed and flexibility and scalability and… omg just why.  WHY.
    • To those making the argument “well, that’s parse of the thing you need to program into your product”:  These are extremely common features in any parsing program, seems like a good reason to  add it to your API.

 

  • Stream/Flow Carving

    • When it comes to security, more often than not, i’m going to care more about the content of HTTP or DNS than I am going to care about Layers 2-4.
    • I definitely care about what protocols are going over the wire, but as adversaries become more complex (and as for-sale toolkits become more sophisticated) there are simply going to be less indicators in the raw flow data.
    • So, why can’t I combine an individual session, give it a unique ID, and save that ID and flow data for later use?  There are tools that do recombine sessions, which you can then pass upper-layer data over to something like DPKTs HTTP parsers… but maybe i don’t want to do that right this second, maybe i want to move on to the next packet and not hold up everything so i can decide whether or not that HTTP session is interesting.
    • Maybe I want to put that off until later, which brings us to…

 

  • File Carving/Session Recombination

    • I have 2 PCAP files, they are 2GB each (a manageable filesize on x86).  They contain packet data from one consecutive capture event.  This means there are sessions that will start in File A that END in File B.
    • If i have pre-parsed these PCAPs and have the header data for each packet/flow, and I save the offsets for each packet, I should have an easy way to re-access those sessions later without having to keep the packet data in memory (or worse, re-parse the entire PCAP).
    • No one has built a readily available carving function that allows us zip through a PCAP file efficiently and get 1 session based on packet offsets.  This is easily the most important function of multi-tiered packet inspection on scaled systems.  Parse OSI Layers 2-4, save session identifiers, and move the fuck on.  Do deep-dive in a separate process that targets packets intelligently.
    • The worst part about this is that it takes like 10 damn lines of code to write the carving function, it’s just a matter of aggregating the offsets.

 

I actually know why these things haven’t been addressed, and it’s commendable to be honest.  The available libraries do their best to make the functionality they provide as easy to use as possible while also trying to minimize the amount of knowledge the developer needs to create something useable.

With the exception of scalability/speed, I know my issues are pretty specific.  They really only apply to someone who wants to write a packet inspection program that will scale well while still enabling deep inspection of sessions, and enabling end users (analysts) to pull back intelligently targeted sessions with a meta-data search application.

I mean… that’s pretty niche…. i think…

You know what, I’ll make my own parsing library!  With beer! And Hookers!

End. Rant.

I’ll update again when I can at least go from packet to packet and record offsets.  I have to write some basic iptools for packing/unpacking and the packet header parser.  After that I can start plagiarizing other libraries for good ethernet/ipv4/tcp/udp parsing.