The Security Collective

Episode #74: Train Hard, Fight Easy with Vaughan Shanks

Vaughan Shanks is a Co-Founder and the CEO of Cydarm Technologies, a role he has held since the company was founded in 2017. Prior to Cydarm, Vaughan worked as a software engineer in a range of Federal Government positions, working with organisations in Defence, intelligence, and law enforcement, in both public service and private sector capacities, in Australia and the USA. Vaughan first developed an appreciation for information security while working as a UNIX sysadmin in the late 1990s.

Listen as Vaughan shares his story. We talk security operating models, and the lessons from Y2K that can be applied to the cyber challenges we face today.

Links:

Vaughan LinkedIn

Cydarm LinkedIn

Cydarm website

Cydarm Twitter


Transcript

CP: Hello and welcome to The Security Collective podcast. I'm your host Claire Pales and today's guest is Vaughan Shanks. Vaughan Shanks is a co-founder and the CEO of Cydarm Technologies since the company was founded in 2017. Prior to Cydarm, Vaughan worked as a software engineer in a range of federal government positions, working with organisations in defence, intelligence and law enforcement in both the public service and a private sector capacity in Australia and the USA. Vaughan first developed an appreciation for information security while working as a UNIX sysadmin in the late 90s. Vaughan has a PhD in computer science from RMIT. Vaughan, it's great to have you on the podcast today.

VS: Thanks, Claire. Great to be here.

CP: So I'm keen to dive straight in. And part of the reason I wanted to invite you on the podcast today was that we share a common, some might say odd, passion for collaboration in cyber security. So tell me a little bit about how Cydarm came about and some of the features of it that promote collaboration at speed.

VS: Sure, so the story of Cydarm began for me maybe six or seven years ago, when I was watching a demonstration of a prototype cyber analysis platform. It had been built by some very smart people, and they were pulling in data from a variety of different sources, doing correlations, technical analysis, fusion, dicing, slicing. And, you know, at the end of the demo they showed, tada, we've uncovered a massive cyber incident with all the linkages to different assets and people, and, you know, any questions? And I said, well, yeah, I've got a question, what happens next? And the presenter looked at me and said, what do you mean? You just get on the phone and tell someone there's been an incident. I said, well, you know, I used to work in the government, and I know how large organisations operate, and that's sort of not the complete picture, more has to happen than that. And sometime later, I was exploring, you know, problems I could solve. And I started calling some former colleagues and saying, do you have a problem with coordination around security operations and incident response? And what I found is that no one is doing this very well. There are a lot of ad hoc systems put together, the flow of information is often long email threads or awkward conversations over bridge calls, that sort of thing, and the recording of information and decisions is very ad hoc. And so I set about building a platform to address that.

CP: The platform itself obviously solves a particular problem that you saw. How do you put the people around that, that are appropriate to use that platform and really get the most out of it?

VS: Yeah, the Security Operations Centre is your last line of defence in an organisation. And that's really where you're going to have to move quickly to resolve an incident, and increasingly involve people from outside the SOC. So as you escalate an incident, sometimes you need to bring in people from risk, compliance, legal, sometimes external people on retainer. And I guess the question I ask is, how do you make all these different parties collaborate? I mean, prior to the coronavirus, we all knew how a SOC works. You sit in a room together with a whole lot of screens facing forward and a couple of large monitors on the wall with dashboards. And you know, if you have an incident you need to discuss, you just swivel your chair and talk about it. And so the Security Operations Centre is really the last line of defence that an organisation has, after the protective controls have failed to detect an incoming threat. And once an incident is escalated, once you've determined that the threat needs to be brought to a larger audience, then you have a collaboration issue.

CP: Yeah. And I guess from your perspective, you're very focused on the security operations of an organisation. And I'm really interested to know, lots of my clients ask me, you know, what's the one thing that we should be doing to make sure that our controls are effective? And I mean, that's a big question. But you know, having the right people in place is one thing, and the right platform is another thing. But in your mind, if there was one piece of advice you could give the general community about cybersecurity controls, what's that one thing that they could do to make things better, or more efficient and effective?

VS: I think it's very important to have observability around your controls, and not just the obvious observability where the controls report to you themselves that we've blocked this many threats today. But what are they missing? The control is not going to tell you which attacks it missed, because if it could see them, it wouldn't have missed them. So this is where the Security Operations Centre comes in, which I might add is quite an expensive enterprise to run. You might as well get good value for money out of that, and actually collect the information from the SOC, use that to inform your understanding of the threat environment and the things that your controls are missing, and then use that data to practise evidence-based security. So really start looking at, what is the efficacy of our controls? What's the miss rate? Have we misconfigured something? Is there an entire control type that we're missing, that we've just never rolled out, and maybe it's time to invest in some new technology.
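To make the evidence-based idea concrete, here is a minimal sketch of how a SOC might estimate per-control miss rates from closed case records. The data model, field names, and numbers are invented for illustration; they are not Cydarm's schema or API.

```python
# Hypothetical sketch: estimate how often each control misses threats it was
# expected to catch, using closed SOC case records. All field names and data
# are illustrative only.
from collections import defaultdict

# Each closed case records which control actually detected the threat and
# which controls were expected to detect it (per the post-incident review).
cases = [
    {"id": "C-101", "detected_by": "EDR",     "expected": ["EDR", "email_gateway"]},
    {"id": "C-102", "detected_by": "analyst", "expected": ["EDR"]},
    {"id": "C-103", "detected_by": "proxy",   "expected": ["proxy", "EDR"]},
    {"id": "C-104", "detected_by": "EDR",     "expected": ["EDR"]},
    {"id": "C-105", "detected_by": "analyst", "expected": ["email_gateway"]},
]

expected_counts = defaultdict(int)  # cases a control was expected to catch
miss_counts = defaultdict(int)      # of those, cases it failed to catch

for case in cases:
    for control in case["expected"]:
        expected_counts[control] += 1
        if case["detected_by"] != control:
            miss_counts[control] += 1

for control in sorted(expected_counts):
    expected = expected_counts[control]
    missed = miss_counts[control]
    print(f"{control}: expected {expected}, missed {missed} ({missed / expected:.0%} miss rate)")
```

A high miss rate for one control, measured this way over real incidents, is the kind of signal that can justify re-tuning it, or investing in a control type that is absent altogether.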

CP: And do you think organisations can go through that process themselves? Or do you recommend third parties help them to assess the effectiveness of their controls and the visibility of the controls that they've got in place?

VS: As a vendor, I would say, don't trust the vendor that sold you the control! So you do need to independently measure how well the control is performing. For some larger organisations, you can do this using in-house analysts. If your security team is a bit thinner, then you might need to bring in some external help with those decisions.

CP: And when we met, we were talking about incident response, and you mentioned that every incident should be treated as something worth recording or noting. Did you learn that the hard way? And if you did, I guess how has that approach served you well, since then?

VS: So we had a saying when I was working in government, that good record keeping is good governance. And I think that holds true in the private sector as well, especially if you are in a regulated industry. You need to be tracking any sort of security threat that happens. I think it's not enough to just see an alert on a SIEM platform, for example, and dismiss it immediately and have no record that you did that. I think it is very important to know how many things you're dismissing without investigating. I think it's also important to have a record of, you know, any sort of incident or question that comes up, no matter how mild, even an inquiry from a user: I saw something funny on my laptop today, I just thought I'd flag it with security. And you say, don't worry about that, nothing to be concerned about. Having a record of that means you can at least prove later that you do look into everything, everything gets considered. And you can start to also measure the activities of your security team. So obviously an organisation that gets lucky and doesn't have any attacks against it is going to have very little to record in the way of escalated incidents. But no doubt the SOC team is staying very busy investigating non-incidents. And that in and of itself is an interesting metric. It's evidence that maybe the alert threshold is set too low, or the controls are not well tuned.

CP: You mentioned before that a SOC is an expensive part of a cybersecurity team to have in-house. I guess, is there a threshold or positioning or an operating model in which an organisation should choose to have their SOC in-house? Or do you think it's really dependent on the organisation?

VS: It depends on a number of factors, I think. Yeah, it depends on what sort of operational risk budget you have to put towards that. It depends on how much risk you're carrying, whether it's regulatory risk, operational risk, and, you know, whether you're willing to trust that to an outside organisation. And in some cases it makes perfect sense to have, say, a CISO or some kind of security lead, who then contracts the SOC out to an external MSSP. And that's often a more flexible option than retaining in-house staff. I suppose you need to remember, though, that you still own the risk. And whatever that team does for you, whatever they report to you, you need to be on top of that and able to respond. And at the end of the day, you will have to explain to your customers, to your investors, to your regulators, what's happening if there is an incident.

CP: So tell me about how Cydarm fits with that model, or where does Cydarm fit into a security operations model?

VS: Cydarm fits into security operations in a couple of ways. We enable the process to be guided. So we have an inbuilt workflow and playbooks, and the workflow that we recommend follows best practice as recommended by NIST. So we don't make this up, we just take the best information we can find, the things that are recommended to us by our flagship customers, and we bake that into the product. The second piece of that is the access control. The access control allows you to label data to a very fine-grained level. So we use a model that's not role-based access control, it goes a step further, it's attribute-based access control. Which is something that's widely known in government, certainly in the national security community, although not everyone knows it by that name, they just call it classification. So the data is labelled accordingly, and depending on the attributes that a person has when they log in to the system, they get to see different views of the same data. So they might see a different list of cases, they might see different data on the same case, because the attributes differ between one user and another.
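As a rough illustration of the attribute-based access control idea described here, the sketch below filters case data by the attributes a user holds. The labels, users, and case items are invented for the example; this is not Cydarm's actual data model.

```python
# Minimal attribute-based access control (ABAC) sketch: each item on a case
# carries a set of required attributes, and a user can read an item only if
# the attributes they hold cover that set. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class CaseItem:
    description: str
    required_attrs: frozenset  # attributes a user must hold to read this item

@dataclass
class User:
    name: str
    attrs: frozenset  # attributes granted to this user when they log in

case_items = [
    CaseItem("Phishing email reported by finance", frozenset({"analyst"})),
    CaseItem("Indicator linked to a sensitive investigation",
             frozenset({"analyst", "investigation_x"})),
    CaseItem("Legal advice on notification obligations", frozenset({"legal"})),
]

def visible_items(user, items):
    """Return only the items this user's attributes allow them to read."""
    return [item for item in items if item.required_attrs <= user.attrs]

tier1 = User("tier1_analyst", frozenset({"analyst"}))
lead = User("investigation_lead", frozenset({"analyst", "investigation_x"}))

for user in (tier1, lead):
    print(user.name, "sees:", [i.description for i in visible_items(user, case_items)])
```

Two users opening the same case therefore see different lists of items, which is the behaviour Vaughan describes; role-based access control, by contrast, would key visibility off a single role rather than a combination of attributes.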

CP: Do you think that there are ever times where that limits someone's ability to do their job?

VS: I think if it's misconfigured, that's entirely possible. We do have a feature that allows you to discover information that you can't read. So you can run a search, and the search will tell you, we have no results for you, but there were results that match your search, you should speak to your manager. And so in those instances, maybe it's kind of a flexible need-to-know, where, you know, knowing that we already have some information on this indicator is not a closely held secret, but we don't really want to share the context with you, at least not unless you stumble across it and now need to be read into this other, more sensitive investigation that we have.

CP: And so what would you say comes up time and time again in terms of feedback that's coming back from your clients? Like, what's the most valuable or their favourite part of the product? What do you find they're getting the most value out of?

VS: That's a really good question. What we find is it differs between customers, to be honest. But I think certainly with the managed security service providers we work with, and regulated entities, far and away the most valuable feature is the automated report generation. It saves hours and hours of painstaking work, cutting and pasting data into, you know, an Excel sheet or a Word doc.

CP: And so it sounds like the platform has elements of support for pre-incident and during an incident, and, I mean, having that type of automated report when you're doing a post-incident review would be incredibly valuable.

VS: Yeah, it definitely drives post-incident review, if you want to do a retrospective on an incident, learn from it, perhaps even craft a new playbook based on an incident that was handled well. It's also useful for regulation. I had a customer just the other day saying the regulator was asking them questions, and they had to generate half a dozen reports on particular incidents to respond to the questions they had. And it's just click, click, click, and here you go, here's what we did. And the regulator says, great, it looks like you guys are doing all the right things, and they move on.

CP: I want to finish up by talking about Y2K, which might seem like an odd topic. But you mentioned it to me when we caught up recently, and then more people have mentioned it to me lately. It must be just spinning around in people's minds at the moment, I have no idea why. But in your case, you were telling me about how Y2K was dining room conversation for you 20 years ago, and your mum did some work around planning for Y2K. And it's got me thinking recently, and I'm wondering what your thoughts are, how do you think the lessons from Y2K can be applied to the cyber challenges that we have today? And do you think that there's any type of relevance or alignment between the two, you know, our industry now and that point in time that everybody was sort of holding their breath for?

VS: Yeah, the Y2K thing was very interesting. Yeah, this was dinnertime conversation, because my mum at the time was an analyst programmer and was at a very large enterprise, managing their APAC operations. And she was on point globally for Y2K, because part of her remit included New Zealand and then Australia. And obviously, as the world turned and January 1 dawned, we all knew that Australia and New Zealand were going to be the first cab off the rank, and they were going to get hit first if this thing blew up. And I think it's never been completely acknowledged, the amount of work that went into preparing for Y2K, and the fact that it was kind of a fizzer, like nothing really happened, or very few things happened globally. A lot of people assumed that maybe there was never a problem. But it's actually due to the very hard work and careful forethought of a lot of IT leaders and, you know, countless staff who worked around the clock to make sure these bugs were remediated ahead of the event. So I think, first of all, kudos to those people of the earlier generation that they got that done. The parallels in cybersecurity are similar, in that you have this event that is devastating if it happens. So say a huge data breach, or ransomware of an entire organisation with a critical function in society, that could literally bring society to its knees, and, you know, we all have no energy or no food for weeks, possibly, if something like this happens. The people who understand this problem are often, you know, in a very niche industry, cybersecurity, very technically minded people or leaders in that industry, they get this. And they're doing everything in their power to mitigate it. And so when it doesn't happen, I suppose people could ask the question, why do we spend so much on cyber security, we've never actually had a cyber apocalypse. So, you know, the parallel is in that sense, there's a lot of unsung heroes who, you know, we've had some near misses, frankly, where things could have gone a lot worse than they did. And I think the people that enabled those catastrophes to be averted often don't get enough credit. And it's really on us in cybersecurity to be able to communicate what happened and to be able to communicate the risk. And for people to understand priors, and how taking mitigation actions, taking preventative actions, can stop an event from happening. We need the broader audience to understand the importance of these preventative actions and why they need to be resourced.

CP: Yeah. I mean, the prevention that went on back then was so thorough in most organisations. You know, there were whole teams of people dedicated to, what if we wake up tomorrow morning and there's no ability to access our computer networks? You know, how would we function with pen and paper? How long could we function? And these are all conversations that companies are having now, based on the potential for ransomware. You know, what if we had to shut our whole network down? There are companies out there in the last 12 months that have been in that position. Logistics companies, healthcare companies, reduced to pen and paper, unable to run their businesses. And yes, Y2K didn't eventuate, but the planning was done on the belief that their organisation was going to lose control. I just wonder, could we get organisations to do that level of pre-planning and practice around cyber security incidents? How do we get to that level of understanding that it can be that catastrophic, and that it's actually happening? Y2K didn't happen; cyber security incidents are actually happening.

VS: Yeah, I think there are enough illustrative examples out there. And we had, you know, a national-level logistics provider hit by ransomware twice, about three months apart, last year, and that should sound alarm bells. I was having a conversation with a paramedic a few months ago. And I said, how about that hospital thing the other week, I guess that would have affected you guys. And he said, that's still ongoing. And I was shocked, this was two weeks later. And I said, what do you mean? He said, when we turn up with a patient at the hospital, the doctors are using WhatsApp. Everything is being done on paper. That, to me, is shocking, that we live in a society, an allegedly advanced society where we're, you know, one of the OECD leaders in most things, we have this great healthcare system, but it's so brittle. And these things, you know, if they don't get the appropriate resourcing put against them, we could find ourselves with massive outages like this that can have sweeping effects, and, you know, eventually someone's going to get hurt.

CP: Yeah, I mean, two things on that. One is that healthcare, time and time again, is at the top of the list of organisations where there's concern around cybersecurity. The nature of the data that they have access to, and that they hold and that they're custodians of, and the types of systems needed to run hospitals and pathology services and those types of things. But on the other side of that, it just proves the requirement to practise your incident response plan. And I've talked about this on the podcast loads of times, but practising would show these organisations the decisions that need to be made, and the potential for systems not to be able to come back up and running because there might be dependencies in other places. And a lot of the incident response plans and crisis management plans have these sorts of chronologies of, you know, this is the time we tell the board, and this is the time all the systems are back up and running again. And, you know, that might be 48 hours. Well, sometimes it's not, sometimes you can't plug everything back in. Sometimes your third parties don't want to plug back into you after an incident, because there's concern about forensics and, you know, data cleansing and all sorts of activities that need to occur for you to be confident that you can fire things back up again. It doesn't always go according to the plan. So practising those plans and understanding, even in a perfect world, how long it would take to reboot or rebuild some of your systems, there's lots of organisations out there that haven't done that thinking.

VS: Yeah, I think it's definitely important to have a plan in place and to do tabletop exercises. And I think this will be mandated by the SOCI bill, which I think most of us can agree is common sense, but doing drills on a regular cadence where we simulate an event and stress test the organisation, and say, how would we respond? As you know, I did some time in the army back in the day, and we used to say, train hard, fight easy. And so if you put in the training, if you've got the well-rehearsed drill, then when things really get bad, you'll know exactly what to do. Everyone knows their place. They know what the lines of communication are, you can work more efficiently, and hopefully bring the systems back online sooner.

CP: Yeah, I've got to say that every company or board that I speak to about incidents, in the wake or in the days that pass after things are back up and running, will consistently say, we just wish we had been better prepared. And obviously, with an incident or a crisis, you can never completely predict what's going to happen. But as you say, even in war, in armies and those types of things where you practise over and over again, at the time it will never go exactly according to the plan. But at least if you have some rehearsal and some understanding of the process that you're about to go through, and the right leaders in place, then it puts you in a much stronger position for resilience and response and recovery, probably more to the point.

VS: Definitely. And I think, too, we should applaud those organisations that release a public report after a breach incident and give a blow-by-blow of how it happened, how they were exploited, and, you know, the ramifications. I think it is showing vulnerability at an organisational level, and it's brave to put that out there. And I think, you know, the logical conclusion of where this leads is something a bit like the ATSB. If we can mandate getting access to the results of an incident like that, and actually use it as a case study, have a public conversation about, you know, what steps we need to take to avoid incidents like this in future. Do regulations, processes, or best practices need to change to mitigate future events like this one? And, you know, I think as with rail accidents, as with air crashes, I think this will happen, I mean, they're already talking about this in the USA right now. So I think it's only a matter of time before it comes to Australia.

CP: There are a lot of synergies between safety and cybersecurity, and people have been talking about that for a long time. But I agree with you that now would be a good time to start to put some of those governance processes, reflections, examples, you know, out into the public arena so that organisations can learn from them, and the rising tide lifts all boats, as they say.

VS: Absolutely. And I'll say this being a little biased as an ex-government person, but, you know, by world standards we have a very effective bureaucracy in Australia. And, you know, I think, actually, if this was resourced properly, we could do a really good job on this. And it would be a massive public service, or a public benefit, I should say, to do something like this.

CP: Vaughan, thank you so much for your time today, I've really enjoyed the chat. As always, I've talked way longer than I originally planned. But I really want to thank you. People can check out Cydarm, we will put some details into the show notes. And thanks so much. And I'd love to have you back again to chat.

VS: Thanks Claire, great chat, really enjoyed it.