Insights About Emails and Passwords
Hey guys, good morning. I've been parsing and indexing a lot of data leaks (email and password), more than 70 million lines, and I'd like to share some insights I had while doing it. They're just curiosities, but maybe someone will find these points useful. In short, I'm going to talk from a defensive point of view: things that gave me trouble even when the passwords leaked in plain text. Think of these as actions you can take to gain another level of safety, at least against most of the indexers in the wild.

How Parsing and Indexing Works

The first thing to keep in mind: once your email and password have leaked, how will they be processed? If you thought about regex, you are right. Data leaks by the millions of lines at a time, so it is virtually impossible to handle by hand. That said, the criteria used by those regexes can be bypassed. Let's talk about email addresses. Take a look:

This is what a parser expects to find. No tricks, just human-readable lines. However, you can use almost any symbol you can imagine. So, when it comes to safety, maybe you want to make things difficult. See these:

Strange? Actually not, I've found a lot like these. In order: the first example has two "@". Believe me, a lot of parsers will break in this situation. Their regexes will match the first part, ex@mple, then reach the second "@". Maybe they will decide that this is, in fact, a password; in that case, they will continue until they find a valid email. The second and third examples use pairs of "()" and "<>". The point here is that some databases wrap their values in these pairs too, so the leaked data will look more or less like this:


That said, the parser will probably try to remove these symbols. In doing so it will also remove the original ones, resulting in invalid addresses. The last example includes a lot of things. Most of those symbols will not be accepted when the email is created, because they can cause problems there too. But if a quotation mark, for example, is accepted as a valid symbol, you have broken pretty much every parser you can imagine. They will fail and skip your address while parsing a file.
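
The email cases above can be sketched with a deliberately strict pattern. This is a minimal sketch in Python, not the regex of any real indexer: the pattern NAIVE_LINE, the function parse_line and the sample lines are all made up for illustration, and it assumes one email:password pair per line.

```python
import re

# Hypothetical strict pattern of the kind a simple indexer might use:
# local part, exactly one "@", a domain, a ":" separator, then the password.
NAIVE_LINE = re.compile(
    r"^([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}):(.+)$"
)


def parse_line(line):
    """Return (email, password), or None when the line does not match."""
    match = NAIVE_LINE.match(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up sample lines, not taken from any leak.
samples = [
    "user@example.com:hunter2",     # plain, human-readable line
    "ex@mple@example.com:hunter2",  # two "@" signs
    "(user)@example.com:hunter2",   # parentheses in the address
]

for sample in samples:
    print(sample, "->", parse_line(sample))
```

With this pattern only the first sample is parsed; the other two return None, which is the "skip the line" behaviour described above. A more tolerant parser would behave differently, so treat this as one possible failure mode.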

The same applies to passwords. Including quotation marks, parentheses and the like can let your password pass through a parser almost invisibly. You can also use blank spaces. And, if you feel very angry, use non-UTF characters. Those will break the entire file at read time and will need special intervention. Those are the worst.
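
To illustrate the point about characters that break a file at read time, here is a small sketch. It assumes the reader expects UTF-8 (an assumption, since the encoding a given tool expects may differ); the sample bytes and the function names decode_strict and decode_lenient are hypothetical.

```python
# A made-up line whose password ends with byte 0xE9 ("é" in Latin-1).
# On its own, followed by a newline, that byte is not valid UTF-8.
raw = b"user@example.com:caf\xe9\n"


def decode_strict(data):
    """Decode as UTF-8 and return None if the bytes are invalid."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lenient(data):
    """Decode as UTF-8, replacing invalid bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


print(decode_strict(raw))         # strict decoding fails for this line
print(repr(decode_lenient(raw)))  # lenient decoding alters the password
```

Strict decoding fails outright, and the lenient fallback (the "special intervention") stores a password that no longer matches the original bytes.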

The Separators

Another thing you may try is to use common separators in your addresses and passwords. Every single file I've parsed used one of these:

, : ;

My final file uses the : separator, like this: email:password. And more than once I've found something like pass:word. It looks simple, but separators are not removed from the original sources because the parsers need them. And most of them are not prepared to find more separators than columns in a line. This will break the line or index an incomplete password or address, especially if the separators are used at the beginning or at the end of the value: ;pass,word;
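
The separator problem can be sketched as follows. This is only an illustration under the assumption of a two-column email/password file; the three functions are hypothetical and stand for three ways a parser might treat a line with extra separators.

```python
def split_exact(line, sep=":"):
    """Accept the line only if it has exactly two columns, else drop it."""
    parts = line.rstrip("\n").split(sep)
    return tuple(parts) if len(parts) == 2 else None


def split_truncating(line, sep=":"):
    """Keep the first two columns and silently discard the rest."""
    parts = line.rstrip("\n").split(sep)
    return (parts[0], parts[1]) if len(parts) >= 2 else None


def split_first(line, sep=":"):
    """Split on the first separator only, keeping the rest as the password."""
    email, found, password = line.rstrip("\n").partition(sep)
    return (email, password) if found else None


line = "user@example.com:pass:word"  # made-up sample line
print(split_exact(line))       # line is dropped
print(split_truncating(line))  # password is cut to "pass"
print(split_first(line))       # password is kept whole

# With ";" as the file separator, a password like ";pass,word;" is lost.
print(split_truncating("user@example.com;;pass,word;", sep=";"))
```

Only the third approach keeps the password intact, and it still fails when the extra separator sits inside the email address instead.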

Fun Fact About the Placeholders

To end our talk, let's look at placeholders. When an indexer finds an invalid line, it can take one of two actions: drop the line, or index the valid part and add a placeholder to replace the broken part. Some indexers avoid including placeholders because the database has more value if all its lines are valid. So, when they find a line with a placeholder, they drop it. That said, if your password looks like one of the values below, there is a chance that it will be ignored by an indexer:

" "

Please do not use weak passwords like those. This is just a fun fact I found.
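
The placeholder filter described in this section might look like the sketch below. The values in PLACEHOLDERS are hypothetical stand-ins chosen for illustration, not a list taken from any real indexer, and keep_record is a made-up name.

```python
# Hypothetical placeholder values; a real indexer may use a different set.
PLACEHOLDERS = {"", "null", "none", "n/a"}


def keep_record(email, password):
    """Return False when the password looks like a placeholder."""
    return password.strip().lower() not in PLACEHOLDERS


# Made-up records, not taken from any leak.
records = [
    ("user@example.com", "hunter2"),
    ("user@example.com", "NULL"),
    ("user@example.com", " "),
]

for email, password in records:
    print(repr(password), "->", "keep" if keep_record(email, password) else "drop")
```

A filter like this cannot tell a real password that happens to equal a placeholder from a broken line, which is why such a record may be dropped.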
That's really interesting, thanks for the post Corvo.
Especially the fact about placeholders.

And from the user's point of view, the minimum you can do is never use the same password twice and stay alert about leaks.
