cyber.egd.io

Write ups on what I do in the cybers

Those that follow me may know that I was raised Mormon but no longer practice the religion. As such, I participate in quite a few online forums with other people in various places on the Mormon spectrum. In these forums, private text messages or social media posts from believing and orthodox family or friends are often shared as a way to vent frustration. Most of the forums that I participate in are private, but a few are public. Either way, I'm always shocked at how little the person sharing these things does to redact the text in order to protect the privacy of the other party. The most common situation is that a name is redacted but a profile picture is not. Occasionally, nothing will be redacted at all. It is rare that I see a post that meets what I personally consider acceptable. I thought I'd take the time to share those personal criteria in hopes that they catch on and make the internet a safe place for everyone. After all, I have some experience in doxxing.

Why does it matter?

You may be wondering why you should care. The answer: for a lot of reasons. While your intentions in sharing may not be malicious, but rather therapeutic or cautionary, there is always the risk that the comments shared will be triggering to someone in your audience. If there is information that can be used to potentially identify this person, it could be maliciously exploited to find further information. A good example is the aftermath of Charlottesville, when many of the white supremacists were identified via photography from the event. Subsequently, others around the Internet were able to track down many of their employers, contact them, and advocate for the person's termination. Now, I'm not arguing that this situation is or isn't justifiable, but simply using it to illustrate the possibilities. If your friend makes one ignorant statement that you can get past but someone else can't, and you don't redact their information sufficiently, that person could get fired or otherwise have their life ruined. This is a very important topic. For more information, I suggest reading the Wikipedia entry on doxxing. Please also note my disclaimer in the footnotes here.

The Criteria

Screenshots of social media posts (e.g., Facebook, Instagram, Twitter, etc.)

The most common form of sharing these posts I have seen is to post screenshots. It is typically of something that the person sharing completely disagrees with or finds offensive. In order for it to be acceptable to share, I believe the screenshot must have the following things redacted completely:

  • All names (except your own if you don't care)
  • All identifying pictures
  • Any other identifying information such as phone number, address, neighborhood and possibly even city and state.

To illustrate, I will use screenshots of a couple recent public Facebook posts I have made. For the sake of demonstration, imagine these are being posted to a private forum by a friend of mine.

Total fail

The example below is a total fail because absolutely nothing is redacted. This should almost never happen.

Nothing is redacted. Absolutely not acceptable.

Better than nothing

The next example is only slightly better because of the redacted name. However, you'll notice my profile picture is still exposed. This is still very identifying, especially because it is so close up.

Better than nothing. Profile picture is still unredacted.

Not much better

The only difference between this example and the previous is that my eyes are redacted. This isn't much different because there are still other identifying features in my profile pic such as my hat, beard, and smile. Somebody that was friends with me on Facebook could easily identify me. I know this for a fact because this exact scenario happened to me personally.

Not much better. Still identifying.

Perfect

The below example is the only acceptable way to redact a public social media post and reasonably protect the anonymity and privacy of the person involved. Notice that the exposed names of those that have liked or otherwise reacted to the post have also been redacted. This is important as well, especially if the post is particularly controversial and could potentially spread to outside the group for which it was originally intended.

Perfect. Do this every time!

Also note that if the post had contained my location, address, phone number, or anything else potentially identifying, that should also be redacted.

Post with comments

Lastly, if the comments of a post are shared, apply the same principles outlined above to the comments section as well.

Apply same principles to comment section.

*Note: This post contains a link to my personal blog. For the sake of demonstration, assume that it's to a normal media outlet. Otherwise, it is very identifying.

Text and other private messages

Another fairly popular thing to share is Facebook and Twitter direct messages or text messages. Oftentimes these are conversations that did not go the way the person sharing them had expected, and the other party probably said some things they regret. In my example, I simply engaged in a generic conversation with one of my Facebook friends. This is the extent to which these conversations should be redacted.

Name, profile picture, address, phone number, and time of meetup all redacted.

Note the following things that were redacted:

  • Name of the conversation at the very top of screen
  • Time of the meeting
  • Address
  • Phone number
  • Name in line with conversation
  • Profile picture (in multiple places)

It is crucial to redact all these things in order to protect the person's identity. The exact same rules should be applied to text messages as well.
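
When a conversation is shared as copied text rather than a screenshot, part of the same checklist can be automated. The following is only a minimal sketch under stated assumptions: the function name `redact_message` and both patterns are hypothetical, they only cover US-style phone numbers and clock times, and addresses still have to be removed by hand.

```python
import re

# Illustrative patterns only (assumptions, not a complete PII detector).
PHONE = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")
TIME = re.compile(r"\b\d{1,2}:\d{2}(?:\s?[AaPp][Mm])?\b")


def redact_message(text, names=()):
    """Replace the given names, phone numbers, and clock times with placeholders."""
    for name in names:
        # Names are supplied by the person sharing; nothing is guessed.
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    text = PHONE.sub("[PHONE]", text)
    text = TIME.sub("[TIME]", text)
    return text


if __name__ == "__main__":
    sample = "Jane: call me at (801) 555-0100, see you at 6:30 pm"
    print(redact_message(sample, names=["Jane"]))
```

Anything the patterns miss (nicknames, street addresses, workplaces) still needs a manual pass before posting.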

Scope: public vs. private posts

Often, people will argue that if someone makes a tweet, Instagram, or Facebook post public for the whole world to see, redactions are not needed. I tend to fall into this school of thought, but a valid counterpoint is that the person could later regret the decision and either delete the post or make it private. The forum or group in which you are participating may have specific rules around this topic and it is always good to check. One of the forums that I most frequent, the r/exmormon subreddit, does have rules specifically requiring the redaction of all shared social media posts whether public or not. If no rules are specified, it is a personal decision, but it may be better to err on the side of caution and follow the suggestions I outline above.

When making this decision, it is important to consider the scope of what you are sharing. By scope I mean two specific things: the scope of the privacy settings and the scope of the audience with whom it will be shared. For instance, if you can't believe what your cousin said about Trump's newest executive order in a post only shared with Facebook friends, it is probably ok to screenshot the post and share it with your sibling who is also this cousin's Facebook friend. Or if it's a public tweet from someone you don't know but you're only going to share it with a friend because it supports a point you made in a recent debate, that's also likely fine. But if ever you are explicitly sharing something set to a strict privacy setting with an audience for which it obviously was not intended, it is crucial that you follow the suggestions above.

Please note that major media outlets will often share public tweets concerning a certain topic they are writing about, such as BuzzFeed in this hilarious article. In it, they are sharing tweets mostly from students at Brigham Young University the day the school finally allowed caffeinated beverages to be sold on campus. The article embeds the tweets in line with the rest of the article's text rather than sharing an image of each tweet. This is important, because if the tweets were later deleted or made private, they would no longer appear in the article, thus respecting the user who initially made the comment. There are some instances when this is not necessary, such as when the person is a public figure. That said, think twice before saying something potentially controversial online. The Internet is typically not a forgiving place and not everyone has the same ethics as you or other respectable media organizations.[^1]

Conclusion

Please feel free to share this standard with your online forums and groups. The more these are implemented, the better the Internet will be. Also, don't worry about the redactions looking as clean as my examples here. As long as the sensitive information outlined above is obfuscated, whether by emojis, scribbles, or simply cropping it out, that is what matters. Lastly, please feel free to contact me via email or in the comment section below if you disagree or would like to see something added.

Thank you

egd

[^1]: The ethics and guidelines for doxxing are not nearly as black and white and clean cut as I make them out to be here. This article should not be used as an ethical guide, but rather be seen as suggestions for protecting the innocent when sharing private conversations outside of their intended audience. The example cited of Charlottesville brings attention to a fascinating ethical and moral debate about whether this sort of vigilante activity is justifiable in certain situations.

#doxxing #privacy

BSidesLV 2016 and Def Con 24 comprised my first experience of “Hacker Summer Camp”. I’ve now been working in information security for four years, have attended a handful of conferences, and have spoken at a number of both conferences and meetups. I personally feel that I have a good handle and understanding of the culture of the industry and mostly went into the week knowing what to expect. There were some great things and some not so great things. Here are just a few.

Old Friends and New Friends

I enjoyed seeing people who I knew of from the Internet but had never met in person. For example, @Andrew___Morris gave a fantastic talk about how he got the Bitcoin blockchain into a queryable database of transactions. He was the one who taught me how to reverse engineer simple malware using gdb, yet I had never met the guy in real life. That was cool. Or one morning as we were walking from our room in the Tuscany to BSides, we were stopped by a woman asking if she could follow us because she didn’t know where she was going. Who was it? None other than Katie Moussouris. Ironically I didn’t see Jack Daniel at BSidesLV, but I did at BSidesSLC. Go figure.

I have been told that “hallway con” is the best part of any conference and I totally agree. Essentially, hallway con is networking, talking to people waiting in the same line as you, learning about what they work on, what their aspirations are, etc. I would not be at my current job if it weren’t for some good old fashioned hallway con. Reconnecting with old friends and making new ones is probably the best thing you can do at any conference. It is sometimes easier to do that at certain events than at others, largely due to the organization of a conference. A good conference will take advantage of its venue and orient its speaking and workshop rooms to facilitate mingling.

Venue

Both venues had their pros and cons. BSidesLV was organized and laid out in a very sensical manner that was easy to get around. The large vendor and keynote room provided a nice gateway to the majority of the tracks. I loved that as you walked out of talks you immediately felt like you were in the middle of everything else happening at the conference again. However, the Underground track was not in a room near the vendor area. The hallway in which it was found was fairly narrow. Every talk in that track had a serpentine line up and down the hallway making it very hard and uncomfortable for those not waiting in line to pass. As BSides grows I don’t see the Tuscany being able to accommodate many more participants than it already is, but don’t quote me on that ;)

Bally’s and Paris were just short of a nightmare: crowded bathrooms, restaurants, elevators, hallways, workshops, lines, and talks all with a slightly different brand of post-deodorant body odor. Granted, holding a conference of over 30,000 people is going to be tough in any hotel on the Strip. Which raises the question: why is Def Con still held in a hotel? Why don’t they move it to an actual convention center that is made to handle that type of attendance? I am willing to bet that the Def Con committee is sick of hearing complaints like this, but I am also willing to bet that Bally’s and Paris give them a killer deal every year. Next year they’re moving it to Caesars Palace, which may not be prone to the same stand-still foot traffic in every hallway, but I am not holding my breath. Plus, having it at a bigger venue is sure to relieve stress on conference security, more affectionately referred to as “Goons”.

Goons

Ok, I’m sorry but some of the Goons at Def Con need to step off their pedestal. The first day of Def Con we wanted to get into some workshops, but just like everyone else, we didn’t get our badges in time to make workshop registration. We were told by a friendly Goon that we could try and get in on standby but after we crossed Bally’s casino and got to the bottom of the escalator to go up to the workshops on the third floor, we were met by a very unfriendly Goon who rudely told us there was no way we were going to get into a workshop. I verbally pushed back in a similar tone insisting that we go up and try. After a bit of not-so-playful banter, he gave in. On our way up he reiterated in a very condescending tone that we weren’t going to get in. Once we got to the third floor they asked us what workshop we wanted to go to and let us right in.

The next day we were leaving the Social Engineering Village and heading towards Bally’s Casino when we heard yelling. I immediately thought it was a Goon. I looked up to see a very petite, short woman yelling something to the effect of “If you cut the line I will grab you and personally escort you to the back of it!” I started to chuckle at the thought of her little figure dragging someone to the back of the line when someone behind me said, “You need to check your attitude lady”, with which I completely agreed. I mean she was yelling at the top of her lungs in a very aggressive tone for what appeared to be no reason at all. I had not seen anyone push or try to cross the orange line they had to partition lanes of foot traffic. The same guy then followed up by asking her “I have Xanax in my backpack, would you like one?”

Don’t get me wrong, there were some very nice Goons that were authoritative, direct, and effective in getting people to where they need to go. But there were a good amount that definitely needed to take a little bit of something to calm their nerves. Just because you are a Goon helping direct affairs at the largest infosec conference doesn’t mean you’re any better, smarter, cooler, or 1337er than anyone else there. It definitely doesn’t mean you can disrespect the attendees.

Villages

I am not sure I have anything negative to say about any of the villages I attended. Admittedly, I spent the most time in the Packet Village (PV) and the Social Engineering Village (SEV) and both were run beautifully. The PV had four or five challenges running simultaneously as well as a speaking track. That track got so popular that they were breaking fire code and had to quickly move the stage, A/V equipment, and chairs just outside of their booked room in order to accommodate everyone. I was impressed.

The SEV is an obvious favorite of many at Def Con, especially when they are running their CTF. Who doesn’t love watching security professionals trying to take advantage of poorly trained employees and get as much information out of them as possible? I only had the opportunity to listen in on three contestants’ calls and only one was particularly entertaining, but that was just luck of the draw. I still enjoyed it!

I didn’t realize just how many villages and CTFs there actually are at Def Con. I had no idea there was an Intel CTF; that is my fault for not reading the program in its entirety. I believe this was its first year and you better believe I will be competing next year! At other conferences I have attended, I have typically found more value in the CTFs and villages than in the talks. I do not think Def Con and BSidesLV are any different. You only get to experience the CTFs and village challenges in the moment, while most talks will be on YouTube within a few weeks after the con. In addition, hallway con is much easier in a village or CTF environment where collaboration is a must, rather than a talk where talking is rude. However, there are always a few talks that pique my interest so much that I do my best to attend them live.

Underground and Skytalks Tracks

These were two tracks that I was very excited for. Researchers and professionals talking about their discoveries and opinions off the record. The idea that they feel that what they are going to say is so sensitive that it shouldn’t be recorded immediately makes the topic intriguing. However, I must say that I was fairly disappointed with the content of most of the presentations I attended.

I went to three Underground talks at BSides and two Skytalks at Def Con. Of the five, I felt only one was sensitive enough to warrant an off-the-record setting. For another, I believe the situation was a bit touchy and I could see why the presenter wouldn’t want proof that he revealed what he did, but the content of his talk wasn’t anything that I couldn’t go learn myself on the Internet. The other three, while containing interesting content, didn’t contain any sort of sensitive information at all. In fact, in the last one I attended, the presenter wanted to record the talk himself, but Skytalks rightfully told him not to. Like any organization, I think it’s important that they don’t compromise what they originally organized to accomplish.

My experience in the two tracks got me thinking that perhaps the CFP process should be a little more strict in both cases. Granted, I have never been on a CFP committee nor do I know the circumstances of either track’s committee this year, but just because you are an off-the-record track at a three-day conference doesn’t mean that you have to provide content for all three days. I feel that the committees would be doing a greater service to their attendees and the community by more strictly screening the content of each applicant’s talk in order to verify that it really should be delivered in such a private setting. That’s not to say that the rejected talks would lack great content, but rather that they don’t belong in a setting provided by the Underground or Skytalks tracks, which implies a certain level of sensitivity.

If any member of either the Skytalks or BSidesLV CFP committees is reading this, I would love to get involved. Your mission is definitely something I relate to and support. If you’re interested, feel free to email me or reach out to me on Twitter.

Talk Titles

Everyone hates scrolling through their Facebook or Twitter feeds to see a title such as “What she says next will shock you” and, for the most part, we are all conditioned to not pay attention to such titles anymore. I feel that infosec conferences have a similar problem with the titles of their talks. The problem is not nearly as blatant as that, but there were several talks I attended whose content was very different than what the title — and in some cases even the abstract — led me to believe.

This is also partially due to the fact that not everyone in the industry is the best public speaker, but that’s ok. Sometimes presenters assume that they are conveying what they want to, when in reality their audience is very lost. A good speaker takes time to explain things and make sure the audience is understanding by asking questions. Honestly, audience engagement is key to any successful speaking opportunity. I feel the quality of all conferences would improve tremendously if they provided workshops for their speakers in both title/abstract writing and public speaking. I’m sure many conference committee members are rolling their eyes at me right now. I know, you’re on a budget and your volunteers are already stretched thin. Maybe take the same approach BSidesLV does with their Proving Ground track and assign volunteer mentors to beginning speakers or others who request it. If you do have the time and resources to offer a couple of public speaking workshops before your event, do it!

Again, if you are on a conference committee and looking for help, I’d love to get involved!

Women Professionals

You may be wondering why in much of this post I have been referring to a “we” rather than “I”. No, I don’t have an alternate personality. Rather, my wife and I attended both conferences together. She is a software engineer with interests in security and privacy. Even though it was the first time for both of us, she was treated extremely differently than I was. I’m not talking about your normal getting hit on at all the Black Hat and Def Con parties, which did happen (she was sitting right next to me too). But rather, everywhere we went she was treated as a tag along. I won’t go into too much detail because she wrote her own post explaining it herself. But I will say that her experience inspired me to help women feel more included in the industry.

I do think that the infosec industry is beginning to make significant progress on this, but we still have quite a ways to go. WISP, Women in Security and Privacy, is a newer organization dedicated to this cause. They had a booth in the vendor area at Def Con for the first time this year. In talking to them, it was obvious that this was an organization that I want to throw my support to. If you feel the same way about women in security, check them out and get involved!

Next Year

Overall, summer camp was a very positive experience with a few bad side effects. The parties were fun, I met a lot of smart people, and I left inspired to be a better security professional and community member. That said, I will be doing many things differently next year. I hope to present a demo in Def Con’s new Demo Labs area. I’m always working on a project at home and that would be a great venue to show something off. As stated above, I also hope to participate in a couple of CTFs next year. And, who knows, perhaps I’ll even submit to present at BSidesLV and/or Def Con next year. We’ll just have to see.

I hope my goals to get involved with WISP and CFP committees can help improve the security community as a whole. I’m still young and have a long time to spend in the industry, and while I love it in its current state, there is much room for improvement.

Thanks for a great week BSidesLV and Def Con!

#defcon #bsides

Sean Cassidy, in a recent blog post, explained the workings of LostPass, a phishing framework specifically targeting the popular password manager LastPass. In it, he perfectly articulated an idea that has been bouncing around my mind for a couple of months:

The standard refrain is that we need better user training. That is simply not good enough.

I couldn’t agree more with this statement. We can train them about best practices and cyber threats until we cannot talk, but they will still mess up and the bad guys will still find a way! Sean goes on to say:

The real solution is designing software to be phishing resistant. Just like we have anti-exploitation techniques, we need anti-phishing techniques built into more software.

It’s easy to patch software, but it’s extremely hard to patch human error. However, I believe that we may be able to patch the human by helping them remember best practices, as Sean suggests, via software.

In an average internal phishing engagement, there are always going to be users who click on the link that, in a real-life phishing attack, would compromise the entire company, exposing trade secrets, financial data, or the memes posted in the company IRC. No matter how many times they sign their annual security awareness training certification, they love their phishy links! Or perhaps you have a fantasy user base and no one clicks on any links or downloads any attachments in their emails.

There will always be exploits like LostPass and your entire user population will go down in flames, including you. How can we protect against it? My answer: a whitelist.

I know, you immediately rolled your eyes. Ain’t nobody got time to manage a whitelist! But here is my proposal on how we can patch the human with a minimal stress whitelist:

In an effort to avoid as much disruption as possible, take all the domains that anyone in your organization has hit in the last three months and take out all the known and perhaps not-so-well-known malicious ones. The remaining good domains will be your whitelist. If you are worried that your user base will notice, thus causing too much disruption to their workflow, expand and take the past six months’ worth of domains.
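
As a rough sketch of that first step, the snippet below builds the whitelist from visit records. Everything in it is an assumption for illustration: the record format of `(date, domain)` tuples, the names `build_whitelist` and `normalize`, and the idea that you already have a list of known-bad domains from some threat feed.

```python
from datetime import date, timedelta


def normalize(domain):
    """Lowercase a domain and drop surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def build_whitelist(records, known_bad, today, window_days=90):
    """Domains seen within the window, minus the known-malicious ones.

    records   -- iterable of (date, domain) tuples, e.g. from DNS or proxy logs
    known_bad -- iterable of domains to exclude
    """
    cutoff = today - timedelta(days=window_days)
    bad = {normalize(d) for d in known_bad}
    seen = {normalize(domain) for seen_on, domain in records if seen_on >= cutoff}
    return seen - bad


if __name__ == "__main__":
    logs = [
        (date(2016, 1, 10), "Example.com"),
        (date(2015, 6, 1), "old.example"),
        (date(2016, 1, 12), "evil.test"),
    ]
    print(build_whitelist(logs, ["EVIL.test"], today=date(2016, 1, 20)))
```

Widening `window_days` is the equivalent of going from three months of history to six.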

Enforce the whitelist via DNS or a browser extension; it doesn’t matter what layer you do it at, as long as it is effective. When a user decides to visit a site that isn’t on the whitelist, present them with a warning page similar to the one they see for a site with an untrusted TLS certificate and give them the options to continue or leave. If they have a legitimate purpose for visiting a non-whitelisted site, they can continue. That way if your CEO is in an important meeting and wants to show everyone that new hot company’s website, they don’t look like an idiot. However, for those click-happy employees, when they see the warning they’ll immediately think to themselves: Did I mean to go here? How did I get here? They will then remember that it was by a link in an email that they weren’t expecting. They will be more likely to remember their security awareness training in that moment and get out of there.
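
The decision itself is small, whichever layer enforces it. Here is a minimal sketch, assuming the whitelist is a set of lowercase domains; the names `is_whitelisted` and `decide` are hypothetical, and treating subdomains of a whitelisted domain as allowed is a design choice of this sketch rather than part of the proposal.

```python
def is_whitelisted(hostname, whitelist):
    """True if the hostname or one of its parent domains is on the whitelist."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" against the whitelist.
    return any(".".join(labels[i:]) in whitelist for i in range(len(labels)))


def decide(hostname, whitelist):
    """Return "allow" for whitelisted hosts, otherwise "warn".

    "warn" means: show the interstitial page and let the user choose
    whether to continue or leave.
    """
    return "allow" if is_whitelisted(hostname, whitelist) else "warn"


if __name__ == "__main__":
    allowed = {"lastpass.com"}
    for host in ("www.lastpass.com", "lastpass.com.evil.test"):
        print(host, decide(host, allowed))
```

Matching on the end of the hostname means a lookalike such as `lastpass.com.evil.test` still triggers the warning, because none of its parent domains is on the list.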

We cannot expect our users to remember everything they hear in user training, but we can help them remember. Will this stop every phishing attack? Nope, in fact there’s a good chance it wouldn’t stop LostPass. In that situation, some users might think that LastPass was taken off the whitelist by mistake and choose to continue. Others would become suspicious and hopefully check with security. But if we tell users in training to do the latter, a big warning page makes it more likely that they will.

Is this what Sean had in mind when suggesting to build anti-phishing techniques into software? I don’t know, but I do think it’s a great start. Approaching user problems with this mindset will enable us to better protect our data and organizations. User training is not the whole answer.

If you have another idea on how to “patch the human”, I’d love to hear it! Feel free to reach out on Twitter.

TL;DR

User training is not going to be enough in protecting against advanced adversaries. Don’t assume your users are going to do exactly what they are taught to do in security training. Plan for the users that do not listen.

One way to supplement training would be to implement a whitelist of domains and URLs your users are allowed to visit. When a user hits a blocked site, give them the option to continue on or to leave. This will remind them of what they learned and help them realize that link was a little more shady than they thought.

Approaching user problems with this mindset could enable us to better protect our data and organizations. User training is not the whole answer.

#phishing #ideas #training

This article was originally posted on nullsecure.org and has been republished with permission.

I’ve been pretty busy lately with updating Tango to version 2.0 and working on threatnote, but another project I started on recently is something @__eth0 and I are calling Gavel. Gavel is a set of Maltego transforms that query traffic records in each state. This project started out really ambitiously: we wanted to cover all 50 states. However, we ran into several problems. Our goal was to provide a way to look up certain data that are available in the traffic records, including:

  • Address
  • Height
  • Weight
  • Age
  • License Plate Number
  • Car Make/Model

This is great Open-Source Intelligence (OSINT) information, and we wanted to make it easy for researchers to obtain using Maltego. As mentioned above, we ran into several problems that are preventing us from releasing it as a full-blown set of transforms.

Roadblocks

The first problem we hit was that some states require you to pay for each query you make against the database. If we hosted this transform on a server, we wouldn’t be able to cover the cost of each of these queries, and even if we provided the code to the users, I’m not sure we could code out a good solution to handle the payment information for each query.

The next problem was that some states are broken out by county. This would create so much extra work for us, and by the time we finished one county, another one might have changed their code, so it’s a ton of maintenance work to get them all working. Also, some states/counties use CAPTCHA codes for each query, and I’ve had no experience getting around them.

So, with those problems at hand, we decided to open-source this tool to the community with the hopes that any people that would benefit from this OSINT tool can code out their own county and/or state. We’re aware we may never have all the states and counties covered, however, we’d like to get as many done as we can.

Currently, only Maryland is complete, so if you live there, you’re in luck! The code isn’t that difficult; it just took a little bit of work with the requests to get the exact responses we needed. The worst part is trying to parse the HTML, which I have no problem saying... I suck at.
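
For anyone coding their own county or state, the HTML step can be handled with nothing but the standard library. The snippet below is only a sketch: it is written for Python 3, the `SAMPLE` markup is made up and does not reflect any real court site, and `TableRows` and `parse_rows` are hypothetical names, not part of Gavel.

```python
from html.parser import HTMLParser

# Made-up markup standing in for a search results page (an assumption).
SAMPLE = """
<table>
  <tr><th>Case Number</th><th>Name</th></tr>
  <tr><td>0AB12345</td><td>SMITH, JOHN</td></tr>
  <tr><td>0CD67890</td><td>SMITH, JOHN Q</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row)
```

Real pages are rarely this tidy, so expect to adjust which tags and rows you keep for each site.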

How It Works

To use Gavel, you’ll simply download the code we provide and import the transforms into Maltego. Once all the code is set up and in the right place, you would then just add a “Person” entity in your Maltego graph like so…

Next, you would right-click on the entity and run “Gavel — Get Names”. This transform searches through the state’s traffic records and gives you the names of individuals on record that match your search. For example, if your name was John Smith, there would probably be a ton of case records for that name. That’s why we give you the names, since it’s easier to narrow down to the specific person you are looking for. This step also adds properties to each entity holding the case IDs that it will need to query in the next step.

Next, you would right-click on the person of interest and select “Gavel — Get Addresses”. It will then iterate through the case IDs in that entity’s properties and return location and vehicle entities based on the information it finds.
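
To make the hand-off between the two transforms concrete, here is a rough sketch of the data flow. It is not Gavel's code: it builds the response XML with the standard library instead of the Maltego library, it assumes the usual local transform convention of extra fields arriving as a `key=value#key2=value2` string, and the field name `gavel.caseids`, the `lookup` callable, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_fields(raw):
    """Parse 'key=value#key2=value2' into a dict (escaping is ignored here)."""
    fields = {}
    for pair in raw.split("#"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            fields[key] = value
    return fields


def build_response(entities):
    """Build response XML from (entity_type, value, fields_dict) tuples."""
    message = ET.Element("MaltegoMessage")
    response = ET.SubElement(message, "MaltegoTransformResponseMessage")
    container = ET.SubElement(response, "Entities")
    for entity_type, value, fields in entities:
        entity = ET.SubElement(container, "Entity", Type=entity_type)
        ET.SubElement(entity, "Value").text = value
        extra = ET.SubElement(entity, "AdditionalFields")
        for name, field_value in fields.items():
            field = ET.SubElement(extra, "Field", Name=name, DisplayName=name)
            field.text = field_value
    ET.SubElement(response, "UIMessages")
    return ET.tostring(message, encoding="unicode")


def get_names(query, lookup):
    """Step 1: one Person entity per matching name, carrying its case IDs."""
    return [
        ("maltego.Person", name, {"gavel.caseids": ",".join(case_ids)})
        for name, case_ids in lookup(query)
    ]


def case_ids_from(raw_fields):
    """Step 2 input: recover the case IDs that step 1 stored on the entity."""
    stored = parse_fields(raw_fields).get("gavel.caseids", "")
    return [case_id for case_id in stored.split(",") if case_id]


if __name__ == "__main__":
    # Stand-in for the real records query (made-up data).
    fake_lookup = lambda query: [("SMITH, JOHN", ["0AB12345", "0CD67890"])]
    print(build_response(get_names("John Smith", fake_lookup)))
    print(case_ids_from("gavel.caseids=0AB12345,0CD67890"))
```

Storing the case IDs on the Person entity is what lets the second transform run without repeating the name search.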

Here’s a screenshot of what the end result would be.

In the image above, you can see at the top the original entity we added, “Brian Warehime”. Below that are the case records it found that match the name, which hold all the case IDs in their properties. Below the names are all the addresses and vehicle information we could discover. (This is made-up data, since my last traffic stop was so many years ago that it aged out.)

You’ll notice on the right-hand side in the “Property View” section, we added additional properties to the person entity. We added the height, weight and DOB for each target, which will help validate if this is your target.

With regard to the vehicle entities, we display the license plate number; however, if you select the entity, you will find the year, make and possible body style for the vehicle in the properties area on the right-hand side. See below for a screenshot:

[Screenshot: properties of a vehicle entity]

Looking at the screenshot above, we can see it’s a 2000 GMC with a possible body style/make of “05”. I’m not sure where we could look that number up to find what model it corresponds to, so if you know, please let me know!

Installation

On the GitHub page, you’ll find a few files to download: a few Python scripts, the Maltego library we use and an .mtz file.

First up, place the Python scripts in a location on your computer, like /Users/<yourname>/Maltego/Transforms or wherever. Next, place the Maltego library in the same directory as the two Python scripts you just moved.

Next, open up Maltego. Click on “Manage” in the titlebar, followed by clicking on the “Import Config” button. Locate the .mtz file you downloaded and click next. Make sure the “Local Transforms” and “Transform Sets” buttons are checked and click next. Once installed, click on “Finish”.

To make sure these transforms run correctly, we’ll need to set up your environment. Click on “Manage Transforms” in the menubar and it’ll open the “Transform Manager”. Next, scroll down until you find the Gavel transforms. Click on the first one and look at the bottom right of the window. You’ll see a few options like so:

[Screenshot: transform settings in the Transform Manager]

First up, make sure the “Command Line” points to your correct Python interpreter; for instance, I put /usr/bin/python for mine. Next, change the “Working Directory” to the location where you saved the transforms earlier.

Repeat the steps above for both Gavel transforms and you should be all set. One last thing before you go, though: I believe you need to download an Entity expansion pack to use one of the entities I added (the car), which can be found here. It’ll still work without this; however, the entity will show up as a chess piece if the entity type is not found.

That should cover it, however, if those instructions don’t work, please feel free to email me or reach out to me on Twitter or something.

Future Development

With Maryland being the only state, we definitely want to expand this as far as we can. We’ll try to do other states as time allows, but, that’s why we need your help!

@__eth0 has done a lot of work for Delaware and just needs to do some minor tweaking. Once that’s done, we’ll require users to add a “State” property when they create the person entity, so we know which state to query.

A minor thing that I’ll most likely complete this week is adding to each entity the date of the record it came from. So, each address and vehicle will have a month/year attribute so you can know how current the data is. One thing we thought would be useful as well is to correlate this information against state property records for validation. Anyone can go into the state’s property records and look up an address to see the current owners, so this would be an excellent way to validate the data from the case records.

You can find all the code on my Github page for what we have currently, and if you have any comments or questions, please feel free to reach out to us on Twitter at @brian_warehime or @__eth0.

#osint #maltego #privacy #projects

This was originally published on nullsecure.org. It has been republished with permission.

Hey everyone, today we're doing something different. This is going to be a joint blog post from Ethan Dodge and me in which we talk about phishing defense coverage by the Alexa Top 100 domains, which will also expose the best attack vectors for phishing against these domains.

We're going to be using a combination of the new DNS reconnaissance tool DNStwist as well as some custom Python scripts to gather and analyze all the information we find, which we'll include in this post if you want to follow along or do your own research.

Overview

Here's a rundown of what we'll be doing to get all the information we need. We'll start by pulling down the Alexa Top 100 domains, then we'll create a script to run them through a modified version of DNStwist to give us each permutated domain as well as the type of permutation (Bitsquatting, Insertion, Omission, Replacement, etc.). We'll then do a host lookup on each permutated domain to get the IP address hosting it. Lastly, we'll do a Whois lookup and a Reverse DNS lookup on that IP and compare the Registrar/Pointer Record information against the original domain to see if they match up.

After we have the comparison data, we'll be able to calculate what types of permutations are the most covered against attacks (meaning the original domain registered the permutated domain to possibly prevent phishing attempts), as well as the types of permutations that are least covered.

Grabbing the Data

First thing we need to do is get the Alexa Top One Million sites and just narrow that down to our scope of 100.

wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Then we'll unzip it and cut that down to 100 domains.

unzip top-1m.csv.zip && awk -F ',' '{print $2}' top-1m.csv | head -n 100 > alexatop100.txt

This will give us something that looks like this:

https://i.imgur.com/4BMakl1.png
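If you'd rather stay in Python for this step, here is a minimal sketch of the same cut. It assumes each line of the CSV is `rank,domain` (which is what the awk command above relies on); the function name `top_domains` is ours, not part of any tool.

```python
import csv
from itertools import islice


def top_domains(csv_lines, limit=100):
    """Return the domain column of the first `limit` rank,domain rows."""
    reader = csv.reader(csv_lines)
    # Skip blank or malformed rows so row[1] is always present.
    rows = (row for row in reader if len(row) >= 2)
    return [row[1] for row in islice(rows, limit)]
```

You could feed it an open file handle, for example `top_domains(open("top-1m.csv", newline=""))`, and write the result out one domain per line.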

Getting Permutations of Domains

Starting out, we'll need to use DNStwist to get a list of permutations. We went ahead and modified the original script so it doesn't print extra information we don't need; we only wanted the type of permutation and the resulting domain of that permutation.

Here's a screenshot of the resulting script being run using google.com as an example.

https://i.imgur.com/ewhsmic.png

If you want the modified version of dnstwist we used, you can grab it here.

Now that we have our list of domains, we'll use this bash one-liner to loop through each of the domains, run it through our modified dnstwist, and output the results into its own file in a new directory:

mkdir -p ~/Desktop/alexatop100; while read -r domain; do python dnstwist.py "$domain" > ~/Desktop/alexatop100/"$domain"; done < ~/Desktop/alexatop100.txt

Running the above takes about 5 seconds and gives us a directory looking like this:

https://i.imgur.com/wLbeUqU.png

Host Lookup

Next on our plate is a host lookup on the resulting permutated domains. We want to end up with a text file containing the permutated domain, the permutation type, and the IP address if the domain resolves, or the string “NXDOMAIN” if it doesn't.

Here's the bash one-liner we used to look through all the permutations for each domain and run the host lookup on each, and then add it to a new file for future analysis.

for file in *; do python hostlookup.py "$file"; done

After letting the above command run for about 30 minutes or so, we're left with a directory that looks like this:

https://i.imgur.com/A5nLMyr.png

You'll notice the directory now has another 100 files suffixed with _hostlookup. Inside each file, we see each permutated domain with the IP address it resolved to.

https://i.imgur.com/dOoIYtA.png
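For anyone following along without our script, here is a rough sketch of what a host lookup step like this could look like. This is not the actual hostlookup.py. It assumes each input line is "permutation-type domain" and writes "domain ip permtype", which are our guesses at a workable layout, and the `resolver` parameter exists only so the function can be tried without touching the network.

```python
import socket


def resolve(domain, resolver=socket.gethostbyname):
    """Return the IPv4 address for a domain, or "NXDOMAIN" if it does not resolve."""
    try:
        return resolver(domain)
    except (OSError, UnicodeError):
        # socket.gaierror is an OSError; this also covers other lookup
        # failures, so "NXDOMAIN" here really means "did not resolve".
        return "NXDOMAIN"


def lookup_lines(lines, resolver=socket.gethostbyname):
    """Turn "permtype domain" lines into "domain ip permtype" lines."""
    results = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        permtype, domain = parts
        results.append(f"{domain} {resolve(domain, resolver)} {permtype}")
    return results
```

Lookups run one at a time here, which is simple but slow, much like the 30 minutes our run took.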

Reverse DNS Lookup

Initially we were going to run a Reverse DNS lookup against the IPs to see what Pointer records they had; however, we thought a Whois lookup would be more reliable. In any event, here are the steps we took to run a Reverse DNS lookup on all the domains and permutations.

Next we wrote a Python script to grab the pointer record that was returned from each permutation. It included the domain, IP, permutation type, the pointer record and a True or False statement depending on if the original hostname was seen in the hostname we grabbed.

We ran the script with another bash one-liner:

for file in *; do python rdnslookup.py "$file"; done

This gives us another 100 files in our directory, suffixed with _rdns.

https://i.imgur.com/zGDQJMf.png

Inside each file we can see the resulting Pointer record and the True or False string.

https://i.imgur.com/hpTms00.png
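The Pointer record check can be sketched roughly as below. Again, this is not the actual rdnslookup.py. The "NOPTR" placeholder and the choice to compare only the first label of the original domain (so amazon.co.uk matches a hostname under amazon.com) are assumptions on our part, and `resolver` is injectable purely for offline testing.

```python
import socket


def pointer_record(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for an IP, or "NOPTR" if there is none."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except (socket.herror, socket.gaierror):
        return "NOPTR"  # hypothetical placeholder value


def is_match(original_domain, hostname):
    """True if the original domain's first label appears in the hostname."""
    label = original_domain.split(".")[0].lower()
    return label in hostname.lower()
```

A substring check like this is crude, which is one reason we treat the results as unvetted.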

Moving on to the Whois lookup...

Whois Lookup

The last part we need to code before we can analyze the data is the Whois lookup on all the IP Addresses we grabbed from the host lookup step.

With this part, we just want to grab the description field of the Whois info, which should tell us the company that owns that IP Address. Between the Whois and Reverse DNS lookups, we should be able to determine if the owner of the IP Address matches the permutated domain.
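As a rough illustration of pulling that field out, the sketch below shells out to the system `whois` command and returns the first owner-like field it finds. The list of keys (`descr`, `OrgName`, `org-name`, `owner`) is an assumption that covers the common registry formats, and the function names are ours; this is not the script we actually ran.

```python
import subprocess

# Field names differ between registries; this list is an assumption.
OWNER_KEYS = ("descr", "orgname", "org-name", "owner")


def whois_text(ip):
    """Run the system whois client and return its raw output."""
    result = subprocess.run(["whois", ip], capture_output=True, text=True, check=False)
    return result.stdout


def owner_from_whois(text):
    """Return the first owner-like field value, or None if nothing matches."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in OWNER_KEYS and value.strip():
            return value.strip()
    return None
```

Calling `owner_from_whois(whois_text(ip))` for each resolved IP gives the owner string to compare against the original domain.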

Now that we have our final data from two different sources (Whois and Reverse DNS), we can run some statistics on it to answer the questions we asked earlier in the post. First things first though, we'll need to get Splunk set up to ingest the data.

Splunk Setup

In order for Splunk to recognize the fields, we'll configure the props.conf file in /opt/splunk/etc/system/local/ with the following settings:

[phishing]
REPORT-phishing = REPORT-phishing

[whois]
REPORT-whois = REPORT-whois

Next, we edit the transforms.conf file in /opt/splunk/etc/system/local/ like so:

[REPORT-phishing]
DELIMS = " "
FIELDS = "domain","ip","perm_type","hostname","is_match"

[REPORT-whois]
DELIMS = " "
FIELDS = "domain","ip","perm_type","owner","is_match"

That's all that needs to be done in order to parse the events, which will now look something like this:

https://i.imgur.com/dqKiCa4.png
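If the delimiter-based extraction is unfamiliar, the idea is simply "split on spaces and name the pieces in order". Here is a small Python sketch of that idea, using the field names that the searches later in this post refer to (perm_type, is_match); it only illustrates the concept and is not how Splunk implements it.

```python
# Field order per sourcetype, mirroring the transforms.conf stanzas.
FIELDS = {
    "phishing": ["domain", "ip", "perm_type", "hostname", "is_match"],
    "whois": ["domain", "ip", "perm_type", "owner", "is_match"],
}


def parse_event(sourcetype, line):
    """Split a space-delimited event and label each value with its field name."""
    values = line.strip().split(" ")
    return dict(zip(FIELDS[sourcetype], values))
```

One consequence worth noting: any value that itself contains a space would shift the fields after it.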

Before We Begin

Before we get into the specific results between Whois and Reverse DNS, it'll help if we identify the different types of permutations, provided by Lenny Zeltser on his blog:

  • Bitsquatting, which anticipates a small portion of systems encountering hardware errors, resulting in the mutation of the resolved domain name by 1 bit. (e.g., xeltser.com).
  • Homoglyph, which replaces a letter in the domain name with letters that look similar (e.g., ze1tser.com).
  • Repetition, which repeats one of the letters in the domain name (e.g., zeltsser.com).
  • Transposition, which swaps two letters within the domain name (e.g., zelster.com).
  • Replacement, which replaces one of the letters in the domain name, perhaps with a letter in proximity of the original letter on the keyboard (e.g., zektser.com).
  • Omission, which removes one of the letters from the domain name (e.g., zelser.com).
  • Insertion, which inserts a letter into the domain name (e.g., zerltser.com).
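To make a few of these definitions concrete, here is a small Python sketch that generates Bitsquatting, Omission, Repetition and Transposition candidates. It is our own simplified illustration, not DNStwist's implementation, and it only permutes the part of the domain before the first dot.

```python
import string

# Characters allowed in a hostname label (lowercase only; DNS ignores case).
ALLOWED = set(string.ascii_lowercase + string.digits + "-")


def omission(name):
    return {name[:i] + name[i + 1:] for i in range(len(name))}


def repetition(name):
    return {name[:i] + name[i] + name[i:] for i in range(len(name))}


def transposition(name):
    return {
        name[:i] + name[i + 1] + name[i] + name[i + 2:]
        for i in range(len(name) - 1)
        if name[i] != name[i + 1]
    }


def bitsquatting(name):
    out = set()
    for i, ch in enumerate(name):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))  # flip exactly one bit
            if flipped in ALLOWED:
                out.add(name[:i] + flipped + name[i + 1:])
    return out


def permutations(domain):
    """Return sorted (type, permutated domain) pairs for a domain."""
    name, _, tld = domain.partition(".")
    kinds = {
        "Bitsquatting": bitsquatting,
        "Omission": omission,
        "Repetition": repetition,
        "Transposition": transposition,
    }
    return sorted(
        (kind, f"{candidate}.{tld}")
        for kind, generate in kinds.items()
        for candidate in generate(name)
    )
```

Running `permutations("zeltser.com")` produces the examples from the list above, such as xeltser.com, zelser.com, zeltsser.com and zelster.com.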

Whois Analysis

Ok, let's jump into the data! We'll start with the analysis of the whois data.

Below is a list of the top permutation types registered:

sourcetype=whois | top perm_type

Permutation Type Count
Replacement 2002
Insertion 1849
Bitsquatting 1347
Omission 454
Repetition 400
Transposition 347
Homoglyph 335
Concourse 313
Subdomain 193
Hyphenation 146

Alright, out of all the registered domains, how many permutated domains are potentially registered by the original domain owner?

sourcetype=whois is_match=true | stats count

Out of all the domains registered, according to our unvetted data, there are only 460 domains registered by the original domain owner.

Now, let's see the permutation type protected against the most.

Permutation Type Count
Insertion 146
Replacement 130
Bitsquatting 53
Repetition 41
Omission 35
Transposition 23
Homoglyph 19
Hyphenation 11
Subdomain 2

Out of all the Insertion permutated domains, let's identify the domains that are protected the most:

sourcetype=whois is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)"| top original_domain

Domain Count
amazon.com 29
microsoft.com 28
booking.com 26
amazon.co.uk 25
yahoo.com 15
amazon.in 9
netflix.com 7
wikipedia.org 2
yandex.ru 1
msn.com 1
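For readers without Splunk, the same search (filter on is_match and perm_type, pull the original domain out of the source path, then count) can be approximated in Python. This sketch assumes each event is a dict with `source`, `perm_type` and `is_match` keys and that source paths look like `/tmp/<domain>_...`, which is what the rex pattern above implies; the exact file suffix in the example data is made up.

```python
import re
from collections import Counter

# Same pattern as the rex above, in Python's named-group syntax.
SOURCE_RE = re.compile(r"/tmp/(?P<original_domain>[^_]+)")


def top_protected(events, perm_type=None, limit=10):
    """Count matching events per original domain, most common first."""
    counts = Counter()
    for event in events:
        if event["is_match"].lower() != "true":
            continue
        if perm_type and event["perm_type"] != perm_type:
            continue
        match = SOURCE_RE.search(event["source"])
        if match:
            counts[match.group("original_domain")] += 1
    return counts.most_common(limit)
```

Leaving `perm_type` unset gives the "regardless of permutation" variant of the search.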

Now, let's just do the most protected domains regardless of permutation:

sourcetype=whois is_match=true | rex field=source "\/tmp\/(?<original_domain>[^_]+)"| top original_domain

Domain Count
amazon.com 82
amazon.co.uk 63
microsoft.com 62
booking.com 55
amazon.in 54
yahoo.com 44
netflix.com 29
bing.com 17
apple.com 10
wikipedia.org 6

Looks like Amazon is the most concerned about someone ripping off their domain :)

Reverse DNS Analysis

Alright, let's move on to Reverse DNS Analysis.

Let's get the most common permutation types registered.

sourcetype=phishing | top perm_type

Permutation Type Count
Replacement 1163
Insertion 1025
Bitsquatting 796
Omission 236
Repetition 227
Homoglyph 203
Transposition 194
Subdomain 98
Hyphenation 95

Alright, out of all the registered domains, how many permutated domains are registered by the original domain owner?

sourcetype=phishing is_match=true | stats count

Out of all the domains registered, according to our unvetted data, there are only 381 domains registered by the original domain owner.

Now, let's see the permutation type protected against the most.

Permutation Type Count
Insertion 114
Replacement 108
Bitsquatting 47
Repetition 32
Omission 27
Transposition 25
Homoglyph 17
Hyphenation 10
Subdomain 1

Out of all the Insertion permutated domains, let's identify the domains that are protected the most:

sourcetype=phishing is_match=true perm_type="Insertion" | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain

Domain Count
amazon.com 29
booking.com 26
amazon.co.uk 25
yahoo.com 17
amazon.in 9
netflix.com 4
yandex.ru 1
wikipedia.org 1
msn.com 1
blogspot.com 1

Now, let's just do the most protected domains regardless of permutation:

sourcetype=phishing is_match=true | rex field=source "\/tmp\/(?<original_domain>[^_]+)" | top original_domain

Domain Count
amazon.com 82
amazon.co.uk 63
yahoo.com 56
booking.com 55
amazon.in 54
netflix.com 19
bing.com 12
google.de 8
google.com 7
wikipedia.org 5

Just like the Whois data, Amazon takes the cake for most permutated domains registered. It makes sense though, since that would be a great site to phish for credentials.

DDoS Protection Sites

We knew that we would have some incorrect data, since not every registered domain would point exactly to its owner. For example, wikipedia.com is owned by Wikimedia, so our name comparison wouldn't count it as true.

One of the big things we noticed is the number of domains resolving to prolexic.com, a DDoS protection service that companies use to protect their sites against DDoS attacks. We doubted that phishing domains and/or malicious actors would enlist the help of a DDoS protection service, since it probably costs a lot of money based on traffic. Based on this assumption, we are going to count prolexic.com hits as true and see what kind of results we get.

Let's rerun some of the original searches now...

First, how many permutated domains are protected?

sourcetype=phishing | eval ddos=if(searchmatch("hostname=*prolexic*"),"True","False") | search ddos="True" OR is_match="True" | stats count

We see that there are now 808 domains protected, instead of the original 381, big change!
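The logic of that search is small enough to restate in Python: an event counts as protected if its Pointer record mentions prolexic or if it was already a match. The sketch assumes events are dicts with `hostname` and `is_match` keys; the function names are ours.

```python
def is_protected(event):
    """True if the PTR hostname mentions prolexic or the event already matched."""
    ddos = "prolexic" in event["hostname"].lower()
    matched = event["is_match"].lower() == "true"
    return ddos or matched


def count_protected(events):
    """Count events that pass the relaxed 'protected' test."""
    return sum(1 for event in events if is_protected(event))
```

Because the two conditions are OR'd, an event that satisfies both is still counted once.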

Now we'll see what permutation types are protected against the most:

Permutation Type Count
Replacement 243
Insertion 211
Bitsquatting 139
Repetition 48
Transposition 45
Omission 44
Homoglyph 40
Hyphenation 19
Subdomain 19

Lastly, what domains are protected the most:

Domain Count
amazon.com 91
amazon.co.uk 66
booking.com 59
yahoo.com 58
amazon.in 56
pinterest.com 41
netflix.com 26
google.es 21
paypal.com 19
imdb.com 15

Final Thoughts

So, some interesting things came out of this research. First, the most common types of permutated domains that companies seem to register are replacement and insertion permutations (netflox.com or netfliix.com). We also discovered that many companies put their permutated domains behind a DDoS protection service: counting prolexic.com hits raised the number of protected domains from 381 to 808 (this isn't really a surprise, just interesting to note). Lastly, we see that amazon.com, booking.com and yahoo.com are the most protected against potential phishing attempts.

That last point isn't surprising either, but it's interesting to see which companies took the time and steps to register these other domains. Amazon and Yahoo! are definitely sites I would expect to see, Amazon more so than Yahoo; however, since Yahoo has been around for a while, it makes sense.

If you're interested in checking out the data we used, you can find it here. If you have any questions or comments about this info, please feel free to reach out to us on Twitter, @brian_warehime or @egd_io.

#osint
