OnionRunner, ElasticSearch & Maltego

Last week Justin Seitz over at automatingosint.com released OnionRunner which is basically a python wrapper (because Python is awesome) for the OnionScan tool (https://github.com/s-rah/onionscan).

At the bottom of Justin’s blog post he wrote this:

For bonus points you can also push those JSON files into Elasticsearch (or modify onionrunner.py to do so on the fly) and analyze the results using Kibana!

Always being up for a challenge I’ve done just that. The onionrunner.py script outputs each scan result as a json file, you have two options for loading this into ElasticSearch. You can either load your results after you’ve run a scan or you can load them into ElasticSearch as a scan runs. Now this might sound scary but it’s not, lets tackle each option separately.

On the fly:

To send the results to ElasticSearch (and to a file), you just need to make some small changes to the onionrunner.py script (10 lines of code). Firstly we need to import a couple of extra python libraries. At the top of onionrunner.py add the following lines of code.

from elasticsearch import Elasticsearch
import datetime

If you haven’t used the elasticsearch python library before you will need to install it which you can easily do using:

pip install elasticsearch
sudo pip install elasticsearch

The elasticsearch library is essentially what we are going to use to load the results into ElasticSearch, as the results as nicely formatted as json, ElasticSearch happily accepts the data without any formatting changes. The datetime library is used to create the correct timestamp format for ElasticSearch so we can track when the data was imported (important if you are running multiple scans).

The second (and equally easy part) is to create a python function to send the results where they need to go. Create this function towards the bottom of the script (I added it after the add_new_onions function):


def send_to_elastic(data):
es = Elasticsearch()
data['timestamp'] = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S.%fZ')
es.index(index='osint', doc_type='onion', body=data)

You will need to make some slight tweaks to the code to match your ElasticSearch environment. The first part of the function is where you define your ElasticSearch instance to connect to, the default is localhost:9200. If your server is called ‘bob’ then you need to change the line of code to:

es = ElasticSearch('bob')

The next line of code adds the timestamp we mentioned earlier, you don’t need to change this but if you use Kibana to visualise the data you will need to add this as the field to index. The next step is to change the index and doc_type variables to meet your requirements. So for example if you wanted to store the results in your ‘research’ index, with a doc_type of ‘darkweb’ you would simply change the code to this:

es.index(index='research', doc_type='darkweb', body=data)

The final step is to add the line of code into the script to send the data once it’s been collected. To do this you need to find the def process_results function, and then after this segment of code:

# look for additional .onion domains to add to our scan list
scan_result = ur"%s" % json_response.decode("utf8")
scan_result = json.loads(scan_result)

You just need to add this line of code:

# look for additional .onion domains to add to our scan list
scan_result = ur"%s" % json_response.decode("utf8")
scan_result = json.loads(scan_result)

And that’s it, job done. Now when you run the script the results will get saved to a json file and also added into ElasticSearch (fingers crossed).

From results files:

Adding existing scan results into ElasticSearch is also just as easy, well in fact it’s easier as I’ve written the code for you. Head over to https://github.com/catalyst256/MyJunk/blob/master/loadresults.py and you will find a pre-made script all ready for you to use. Well actually you will need to change the ElasticSearch parameters at the top of the script but otherwise it’s good to go.

To run the script all you need to do is specify the results folder as a variable when you run the script. So for example if you take the default output location for the onionrunner.py script you would just run the following.

./loadresults.py onionscan_results/ (NOTE: you need to have the trailing ‘/’ in there)

This will then load each file and send it to ElasticSearch, now how simple was that.

Maltego all the things:

I’ve been messing around with ElasticSearch a lot lately for various things and one thing I really like about it is how easy it is to plug Maltego (via transforms) into it and get some lovely visualisations out. So with that in mind I spent a bit of time writing some transforms for the data from OnionRunner.

The awesome thing (well one of them) about ElasticSearch is that it is free text searching, so essentially type some words in and it will return any record that matches, or you can be more precise and specify certain key: value searches to run. If we look at the data collected by OnionScan/OnionRunner we can start to work out what we want to search for. For example if you want to find all servers that run Apache you can search one of two ways.

1. “Apache”
2. serverVersion: Apache

Both will give you the results you need, the second option will only return matches to that specific “key” whereas the first one will find any references to apache in any field.

Using this search method it’s easy to create some Maltego transforms. The flow of the ones I (quickly) created work like this.

1. Create a phrase with your search query
2. Transform returns hiddenServer addresses
3. Search for open ports
4. Search for SSH keys

These are just some quick examples, and here are some screenshots.

Screen Shot 2016-08-01 at 09.01.01

Screen Shot 2016-08-01 at 09.03.34

Screen Shot 2016-08-01 at 09.07.57

The Maltego transforms aren’t quite ready for release as I need to tweak them and make them production ready but once they are I will release them. They are all local transforms so you will need to install them into Maltego yourself (I will provide instructions) but it’s a painless process.

Massive thanks to Justin for creating OnionRunner and if you want to learn Python I’ve heard great things about the Python courses he provides.

PS Sorry about the rubbish code blocks, WordPress hosted doesn’t like Python…😦

Scapy: Sessions (or Streams)

I’m in process of rewriting sniffMyPackets version 2 (and yes I’m actually doing it this time), part of work involves doing a complete rewrite of the Scapy based code that I used in the original version. This isn’t to say anything is wrong with the original version but my coding “abilities” has changed so I wanted to make it more streamline and hopefully more modular.

In the original sniffMyPackets I used tshark to extract TCP and UDP streams, which wasn’t the ideal solution however it worked so.. This time around I’m hoping to cut out using tshark completely so I’ve looking at what else Scapy can do. There is a “hidden” (well I didn’t know it was there) function called Sessions, this takes a pcap file and breaks it down into separate sessions (or streams). The plus side of this is that it’s actually really quick and doesn’t limit itself to just TCP & UDP based protocols.

I’m still playing with the function at the moment but I’m hoping that I can use this for sniffMyPackets, so I thought I would share my findings with you.

First off lets load a pcap file into Scapy.

>>> a = rdpcap('blogexample.pcap')

This is just a sample pcap file with a ICMP ping, DNS lookup and SSL connection to google.co.uk.

If you want to check the pcap has loaded correctly you can now just run.

>>> a.summary()

Cool so lets load the packets into a new variable to hold each the sessions.

>>> s = a.sessions()

This loads the packets from the pcap file into the new variable as a dictionary object, which I confirmed by during.

>>> print type(s)
<type 'dict'>

Now lets get a look at what we have got to work with now.

>>> s
{'UDP >': ...SNIP...

That’s just the first session object from the pcap and it contains two parts (as required for a dictionary object). The first is a summary of the traffic, it contains the protocol, source IP address and port (split by a colon) and then the destination IP address and port (again split by a colon). The second part of the dictionary object is the actual packets that make up the session.

Lets have a look at how to make use of the first part of the dictionary object (useful if you want a summary of the sessions a pcap file).

>>> for k, v in s.iteritems():
>>> print k

This is a nice and easy iteration of the dictionary object to get all the key within it (the summary details). You can of course slice and dice that as you see fit to make it more presentable. The other part of the sessions dictionary is the actual packets, using a similar method we can iterate through the sessions dictionary and pull out all the raw packets.

>>> for k, v in s.iteritems():
>>> print v

This will output a whole dump of all the packets for all of the sessions which isn’t that great. Lets see if we can make it a bit more friendly on the eye.

>>> for k, v in s.iteritems():
>>> print k
>>> print v.summary()
Ether / IP / TCP > SA
Ether / IP / TCP > A
Ether / IP / TCP > A / Raw
Ether / IP / TCP > A / Raw
Ether / IP / TCP > PA / Raw
Ether / IP / TCP > PA / Raw
Ether / IP / TCP > PA / Raw
Ether / IP / TCP > A
Ether / IP / TCP > PA / Raw
Ether / IP / TCP > A
Ether / IP / TCP > PA / Raw
Ether / IP / TCP > PA / Raw
Ether / IP / TCP > A
Ether / IP / TCP > PA / Raw
Ether / IP / TCP > PA / Raw

Now isn’t that much better, as you can see from the snippet above the session it’s captured starts at the SYN/ACK flag (probably because I started my capture after the SYN was sent) so in reality (and with a bit more testing) it looks like the Scapy Sessions function does allow for extracting TCP streams (well all streams really).

Lets have a go at something else.

>>> for k, v in s.iteritems():
>>> print v.time
AttributeError: 'list' object has no attribute 'time'

Oh well I was hoping that I could get the time stamp for each packet but it seems that you need to do some extra work first.

>>> for k, v in s.iteritems():
>>> for p in v:
>>> print p.time, k
1417505523.55 UDP >
1417505522.45 UDP >

Oh that works much better, because the packets in the second half of the dictionary are a list object we need to iterate over them to get access to any of the values within them (such as the packet time).

So there you go, I’m off to see what else I can do with this..


Scapy: Heartbleed

So I might be a bit late to the game but I post this code on Twitter a while back but then forgot to blog about so here you go…

I’ve written a little snippet of Python code that uses Scapy to search through a pcap file looking for Heartbleed requests and responses. Due to Scapy not having a layer for TLS connections this have been done by slicing and dicing the RAW layer and pulling out the information we need. A lot of the time that I’m writing Scapy code to analyse pcap files I use Wireshark to look at the packets and match that up in Scapy.

This is the output from Wireshark;

Screen Shot 2014-07-25 at 07.24.03

And this is what Scapy sees;

Raw load='\x18\x03\x02\x00\x03\x01@\x00'

Currently the script just looks for traffic on port 443, but I will tweak it over the next week or two to handle any port.

I’ve removed the port restrictions so it should be protocol agnostic.. (hopefully)..

You can find the code HERE:

To run it, just type

./pcap-heartbleed.py [pcapfile]

So for example:

./pcap-heartbleed.py heartbleed.pcap

If it finds anything it thinks is a Heartbleed packet you get an output similar to the one below (I’ve changed the IP addresses):

Heartbleed Request: src: dst:
Heartbleed Response: src: dst:
Heartbleed Request: src: dst:
Heartbleed Response: src: dst:
Heartbleed Request: src: dst:
Heartbleed Response: src: dst:


Gobbler: Eating its way through your pcap files…

So a while back I blogged about the future of sniffMyPackets and how I was looking at building components for it that would make better use of existing systems you may be using (not over tooling things). Gobbler is the first step of that journey and is now available for your amusement..

Gobbler can be found HERE:

Essentially you feed Gobbler a pcap file and tell it how you want the data outputted. The current supported outputs are:

1. Splunk (via TCP)
2. Splunk (via UDP)
3. JSON (just prints to screen at the moment).

The tool is based on Scapy (as always) and once the initial file read is done it’s actually quite quick (well I think it is).

Moving forward I am going to add dumping into databases (mongodb to start with) and Elastic search. On top of this I’m going to start writing protocol parsers for Scapy that aren’t “available”. I’ve found an HTTP one written by @steevebarbeau that is included in Gobbler so you can now get better Scapy/HTTP dumps.

To use with Splunk is easy, first in your Splunk web console create a new TCP or UDP listener (see screenshots below).



NOTE: UDP Listeners are better for any pcap with over 10,000 packets. I’ve had issues with TCP listeners past that point.

Once your Splunk Listener is ready, just “tweak” the gobbler.conf file to meet your needs:

server = ‘localhost’
port = 10000
protocol = ‘udp’

If you created a TCP listener, change the protocol to ‘tcp’ and gobbler will work out the rest for you.

To run gobbler is easy (as it should be), from the gobbler directory run this command:

./gobbler.py -p [pcapfile] -u [upload type]

I’ve included a 1 packet pcap in the repo so you can straight away test using:

./gobbler.py -p pcaps/test.pcap -u splunk

or if you want to see the results straight away try the json option:

./gobbler.py - pcaps/test.pcap -u json

If you load the packets into Splunk, then they will look something like this:


This means if you were to search Splunk for any packet from a Source IP (of your choice) you can just use (test_index is where I am storing them):

index="test_index" "ip_src="

This means you can then use the built-in GeoIP lookup to find out the City etc using this Splunk Search:

index="test_index" | iplocation ip_src

Which would give you back the additional fields so you can do stuff like this:


The benefit of using Gobbler is that I’ve done all the hard work on the formatting, I was going to write a Splunk app but the people at Splunk have been ignoring me so…

I’ve successfully managed to import about 40,000 packets via a UDP listening in just a few minutes, it’s not lightning quick but it’s a start.


PS. A “rebranding” of my core projects will be underway soon so stay tuned for more updates..🙂

Scapy – Iterating over DNS Responses

So while doing my Scapy Workshop at BSides London the other week, I stated that iterating over DNS response records with Scapy is a bit of a ball ache. Well I will be honest, I was kind of wrong. It’s not that difficult it’s just not that pretty.

This is an example of a DNS response (Answer) packet when running a dig against http://www.google.com

Screen Shot 2014-05-12 at 08.14.07

You will see that there are 5 DNSRR layers in the packet, now when you ask Scapy to return the rdata for those layers you will only get the first one (in the Scapy code below pkts[1] refers to the second packet in the pcap which is the response packet).


In order to get the rest, you need to iterate over the additional layers.


In the example above the [5] and [6] are the layer “numbers” so to make that a bit easier to understand.

pkts[1][0] = Ether
pkts[1][1] = IP
pkts[1][2] = UDP
pkts[1][3] = DNS
pkts[1][4] = DNSQR
pkts[1][5-9] = DNSRR

So the code below is a quick way to iterate over the DNS responses by using the ancount field to determine the number of responses and then working backwards through the layers to show all the values.

#!/usr/bin/env python

from scapy.all import *

pcap = 'dns.pcap'
pkts = rdpcap(pcap)

for p in pkts:
if p.haslayer(DNSRR):
a_count = p[DNS].ancount
i = a_count + 4
while i > 4:
print p[0][i].rdata, p[0][i].rrname
i -= 1

So there you go, quick and dirty Scapy/Python code.


BSides London 2014 – Scapy Workshop

So this week (Tuesday) was the 4th annual BSides London event held at the Kensington and Chelsea Town Hall (same venue as last year). For the last 3 years I’ve attended the event as not only a participant but also as a crew member, helping make the event awesome (which it is every year) and making it a tradition not to see ANY of the talks.. For me BSides is more about taking part and just meeting loads of cool people rather than going to all the talks and snagging all the free stuff, well ok apart from the MWR t-shirts but they are awesome.

This year however was slightly different, a new twist on an already awesome day. Leading up to the event I was busy helping (I use the word loosely) keep the website up-to-date (oh how I hate HTML) and generally just counting down the days.

This year I had planned on attending one workshop on a subject very close to my heart.. Scapy, which was due to be run by Matt Erasmus (@undeadsecurity). However the Thursday before BSides Matt had to pull out which left a 2 hour slot in the BSides schedule free (Can you guess where this is going??).

Friday morning I get an email from Iggy (@GeekChickUK) asking if I fancied running the workshop instead. No my initial panic fuelled response was going to be “God no” but then I thought “Why not, what’s the worse that can happen).

The next 3 days were rammed with me writing a new workshop for a group (well I was hoping at least 1) of people that would most likely either professional infosec ninja’s (they are all ninja’s right??) or at least be able to point out my mistakes when I made them.

On the day I think about 18 people attended my workshop, most of them laughed at my jokes and most of them (hopefully) learnt something new about how awesome Scapy is. I’ve just found out the online feedback form for the whole BSides London event contains questions about my workshop so I will leave judging it’s success till I see those..

If nothing else it’s made me want to run more workshops, not just on Scapy but on other areas as I learn them, so I just want to say a BIG THANK YOU to Iggy for giving me that nudge and of course all the people that attended my workshop on the day.

The GitHub repo is HERE
The Slide Pack is HERE
The Scapy Cheat Card (pdf version) is HERE


Scapy: pcap 2 streams

Morning readers, I thought I would start Monday morning with another piece of Scapy/Python coding goodness. This time though for an added treat I’ve thrown in a bit of tshark not because Scapy isn’t awesome but for this piece of code tshark works much better.

The code today, takes a pcap file and extracts all the TCP and UDP streams into a folder of your choice. If you’ve ever used Wireshark then you know this is a useful feature and it’s something that I’ve put into sniffmypackets as it’s a good way to break down a pcap file for easier analysis.

The code isn’t perfect (when is it ever), at the moment you will get the standard “Running as user “root” and group “root”. This could be dangerous.” error message if like me you run this on Kali. If you are using “normal” Linux you should be fine. I am looking at how to suppress the messages but the Python subprocess module isn’t playing nice at the moment.

The code has 2 functions, one for handling TCP streams, the other for UDP streams. For TCP streams we use the tshark “Fields” option to run through the pcap file and list of the stream indexes, we save that to a list (if they don’t already exist) and then we re-run tshark through the pcap file pulling out each stream at a time using the index and then write it to another pcap file.

For the UDP streams it’s a bit more “special”, UDP streams don’t have an index the same as TCP streams do, so we have to be a bit more creative. For UDP streams we use Scapy to list all the conversations based on source ip, source port, destination ip, destination port we then add that to a list, we then create the reverse of that (I called it duplicate) and check the list, if the duplicate exists we delete it from the list.

Why do we do that?? Well because we can’t filter on stream, parsing the pcap file will list both sides of the UDP conversation which mean when we output it we end up with twice as many UDP streams as we should have. By creating the duplicate we can prevent that from happening (took me a while to figure that out originally).

Once we have created our list of UDP streams we then use tshark to re-run the pcap file but this time with a filter (the -R switch). The filter looks a bit like this:

tshark -r ' + pcap + ' -R "(ip.addr eq ' + s_ip + ' and ip.addr eq ' + d_ip + ') and (udp.port eq ' + str(s_port) + ' and udp.port eq ' + str(d_port) + ')" -w ' + dumpfile

The parts in bold are from the list we created when we pulled out all the UDP conversations. Each UDP stream is then saved to a separate file in the same directory as the TCP streams.

To run the code (which can be found HERE) you need to provide two command line variables. The first is the pcap file, the second is your folder to store the output. The code will create the folder if it doesn’t already exist.

./pcap-streams.py /tmp/test.pcap /tmp/output

When you run it, it will look something like this:


If you then browse to your output folder you should see something like this (depending on the number of streams etc.


So there you go, enjoy.