Tom Hyndman

Tracelight

There were three of us: myself, another experienced programmer who was basically ready to quit and look for a different job, and a junior developer at their first job. We had just gone through a tumultuous, crazy time at the company. The entire prior dev team had been fired, a new product manager had already joined and left, and now we were just there wondering whether we were going to get fired or whether we should just start building something and see what happened.

The three of us met every day for a couple of hours and talked through the problems and what we could build and what we could solve.

We had one thing to work with. In the middle of all of this, one of the founders had worked out some kind of deal to sell these Quick Reports: a one-time process where we’d gather a bunch of data from OSINT sources, put together a report, and say, okay, here’s what we found on you. A one-time thing for several thousand dollars. We said, let’s take that, and we built it. It was pretty simple, but they were selling it.

It had limitations, though. It could time out if we wanted to do any kind of enrichment of the data. We could imagine right away a more complete assessment where we feed the results from one OSINT tool into another OSINT tool, and then maybe feed those back into one of the earlier tools, and this process could go on forever. That doesn’t really work when you’re running on AWS Step Functions, which runs these lambdas kind of sequentially and has a fixed endpoint. We had no way of knowing when to end our step function because we didn’t know when the reports were done. Either it just times out at wherever we happen to be, or we have to cut the report off logically when we know there’s still more we can learn.

We were stuck on this problem of how to know when the reports are done. Then I had a pretty simple idea. To be honest, we struggled to get there, and I struggled to convince them, but after a couple of days of talking it through we all agreed on it: conceptually give up on the idea of a report ever being done. Just say that we take a snapshot, and the report becomes an ongoing monitoring process. Okay, we’re going to monitor you for a week, then give you a snapshot that’s your initial assessment. Then we’ll check up again two weeks from now, a month from now. We’re watching you actively for indicators that pop up.

We had to re-architect everything. We spent a couple of weeks conceptualizing the architecture. The main solution was to build our OSINT analysis engine, which was really first and foremost an ingestion engine, with a plugin architecture. New OSINT sources came up all the time, because our analysts would say, here’s a list of eight more sources I want to use, and we could build plugins for all of them. We’d have a known format for writing a plugin: import our libraries, hit those API endpoints, and fit all of the data we got back into the known output format. A very fast, known process, one you can easily train someone very junior to follow. Which is what we did. Our junior person now just had to go through all the OSINT sources and write basically the same code 100 times for 100 different sources, in this very simple plugin format.
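To make the plugin idea concrete, here is a minimal sketch of what a format like that could look like. Python is used only for illustration, and every name in it (Finding, Plugin, PLUGINS, the plugin decorator, fake_dns) is hypothetical, not the actual Tracelight code. The lookup is a canned table standing in for a real API call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    """The single known output format every plugin has to produce (assumed shape)."""
    kind: str    # what was found, e.g. "domain" or "ip"
    value: str   # the finding itself
    source: str  # name of the plugin that produced it


@dataclass(frozen=True)
class Plugin:
    name: str
    consumes: str                        # the kind of input this plugin accepts
    run: Callable[[str], List[Finding]]  # wraps raw results into Findings


# Hypothetical registry the engine would read to discover plugins.
PLUGINS: Dict[str, Plugin] = {}


def plugin(name: str, consumes: str):
    """Register a fetch function as a plugin under the given name."""
    def register(fetch: Callable[[str], List[Tuple[str, str]]]):
        def run(value: str) -> List[Finding]:
            # Normalize raw (kind, value) pairs into the shared format.
            return [Finding(kind, found, name) for kind, found in fetch(value)]
        PLUGINS[name] = Plugin(name, consumes, run)
        return fetch
    return register


@plugin("fake_dns", consumes="domain")
def fake_dns(domain: str) -> List[Tuple[str, str]]:
    # Stand-in for hitting a real OSINT API; 192.0.2.10 is a documentation address.
    canned = {"example.com": ["192.0.2.10"]}
    return [("ip", ip) for ip in canned.get(domain, [])]


results = PLUGINS["fake_dns"].run("example.com")
```

With a shape like this, each new source is one small function plus a decorator line, which is roughly why the work could be handed to someone junior.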

The ingestion engine had to take all of that and deal with scheduling and dependencies, especially making sure we didn’t get into crazy recursive loops and dependency loops, and have a clean process for bringing in all of the OSINT data and making a profile out of it for that client. All of these different domain names, fuzz domain names, looking for bad actors and finding all of their assets, potential vulnerabilities, threat actors who’ve squatted on a similar domain, a million things. Looking at IP addresses, scanning servers, whatever. And that enriched data gets fed back into the system. Once we’ve got your domain, now we’re also going to check all these fuzz domains to make sure nobody’s squatting on them, and so on. This thing explodes out of control. So the engine mainly has to keep clean books so that the explosion never actually happens. It just becomes an enrichment process that unfolds over time with some good logical limitations on growth.
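The bookkeeping described above can be sketched roughly like this. It is an illustration under assumptions, not the engine we built: the names (enrich, resolve, reverse, max_depth) are hypothetical, the lookups are canned data, and a set of already-seen items plus a depth cap stand in for whatever loop protection and growth limits the real system used.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple

Item = Tuple[str, str]  # (kind, value), e.g. ("domain", "example.com")


def enrich(seeds: Iterable[Item],
           plugins: Dict[str, List[Callable[[str], List[Item]]]],
           max_depth: int = 2) -> Set[Item]:
    """Feed findings back into plugins without looping or growing forever."""
    seeds = list(seeds)
    seen: Set[Item] = set(seeds)              # every item is processed once
    queue = deque((item, 0) for item in seeds)
    while queue:
        (kind, value), depth = queue.popleft()
        if depth >= max_depth:                # logical limit on growth
            continue
        for fn in plugins.get(kind, []):
            for found in fn(value):
                if found not in seen:         # breaks dependency cycles
                    seen.add(found)
                    queue.append((found, depth + 1))
    return seen


# Two canned lookups that feed each other, so a naive loop would never end.
def resolve(domain: str) -> List[Item]:
    table = {"example.com": ["192.0.2.10"], "example.net": ["192.0.2.11"]}
    return [("ip", ip) for ip in table.get(domain, [])]


def reverse(ip: str) -> List[Item]:
    table = {"192.0.2.10": ["example.com", "example.net"],
             "192.0.2.11": ["example.net"]}
    return [("domain", d) for d in table.get(ip, [])]


plugins = {"domain": [resolve], "ip": [reverse]}
shallow = enrich([("domain", "example.com")], plugins, max_depth=2)
deep = enrich([("domain", "example.com")], plugins, max_depth=10)
```

Here the shallow run stops after two hops, while the deep run follows the cycle until nothing new turns up and then ends on its own. In a monitoring setup, a pass like this could be re-run on a schedule with whatever has been collected so far as the seeds.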

I was the one who got the project manager on board. And then I was the one who went into the meeting and pitched it to the CEO and COO — got C-suite approval to go ahead and build the project we had described and make that our product.

We had conceptualized it so well that it came together in a couple of days, and we had the first version running. We just continued building it out. Once it was running and built out, I shifted over to working on the user interface.

When the company was eventually acquired, the other guy, the back-end developer who had been hired right after me back in those earlier days, told me that Tracelight was one of the primary motivators for the acquisition. They wanted software that did just what it did.