Cisco in the Cloud

Cisco has had a hard time adjusting to cloud-centric enterprises. It doesn't help that the cloud providers aren't too keen on third parties providing "core" services in their environments. Both AWS and Microsoft Azure have adopted the Microsoft model: popular third-party services are emulated and built into the core product. The resulting services lack customization, but they are often cheaper, better supported, and more scalable than the third-party services they replace.
Something AWS and Azure don't do well yet is what I'm calling mid-network filtering. Both cloud providers have adopted an endpoint-focused security model and built their security tools around locking down the endpoints. There's nothing inherently wrong with this model; in fact, if you have to choose one model, endpoint-adjacent network filtering is excellent. But in my experience, what gets lost in the cloud is flexibility: the ability to build the network and access controls in a way that makes sense for your business. Many network and virtualization vendors, including Cisco, have focused on providing the most flexible and extensible ways of building... whatever you want, however you want it! Because of that, their products can be confusing and sometimes esoteric. Try writing a single CoS policy that is valid on every Cisco switch line, from small business all the way to core enterprise; it's impossible. But I digress.
Cisco's attempt to cover one of the current gaps in AWS cloud computing is called the "Transit Network VPC." It's a pre-packaged VPC environment with all the accoutrements (subnets, route tables, NACLs, security groups, etc.), all built to serve one purpose: hosting two redundant Cisco CSR routers that act as the hub for the VPNs to every spoke VPC. Because of their position in the network, and because they're a full-fledged enterprise network platform, the CSRs can perform ingress and egress filtering (including zone-based firewall, or ZBFW) on every tunnel (VPN) interface to every VPC. That is worlds away from the endpoint-security-or-nothing approach AWS has implemented in the rest of the network.
Here's a link to the marketplace entry: Cisco Cloud Services Router (CSR) 1000v - Transit Network VPC - BYOL
For now, AWS supports this product and even hosts documentation on how to deploy it. As an aside, I don't think it can last: the CSR routers, even fully built out, can only push 4.5Gbps, and the VPN tunnels they terminate can only push 400Mbps of throughput due to VGW capacity limits. Native tooling, including non-EC2-based appliances, and native backhaul constructs like VPC peering, Direct Connect (DX) gateways, and transit gateways are going to push nearly every customer away from devices like this.
But let's talk about something these devices do well. To build VPNs from every VPC you control and want connected together, each VPC gets a CloudFormation (CF) stack that polls the hub and pushes the local VGW configuration to an S3 "config" bucket in the hub VPC where the CSRs live. A series of Lambda functions there are triggered by each upload and read the uploaded information. This connection info is transformed into Cisco IOS configuration and pushed to the CSRs automatically.
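To make the trigger side concrete, here is a minimal sketch assuming a standard S3 "object created" event notification. The handler is hypothetical, not Cisco's code, and the bucket and key names in the usage are placeholders. It only pulls the bucket and key of each uploaded config file out of the event, which is the first thing any Lambda in this chain has to do before it can fetch and translate the file.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs from an S3 event notification.

    S3 URL-encodes object keys in event payloads (a space arrives as '+'),
    so each key is decoded before use.
    """
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    # Hypothetical handler: the real configurator would download each object,
    # parse it, translate it into IOS commands, and push them to the CSRs.
    # This sketch stops at identifying which uploads triggered the run.
    objects = uploaded_objects(event)
    for bucket, key in objects:
        print(f"new config upload: s3://{bucket}/{key}")
    return objects
```

Decoding the key matters because the raw key in the event will not match the object name if it contains spaces or special characters.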
The real power here is that the Lambda doing this pushing is written in Python and can be extracted from the vanilla Lambda provided by Cisco. The file is called "cisco-configurator.py", and looks a little like this:
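Conceptually, the configurator turns each tunnel's connection info into an ordered list of IOS command strings. The following is a minimal sketch of that idea and is not Cisco's actual code. The function name, its parameters, and the IPsec profile name are hypothetical. It relies on one known detail: AWS assigns each VPN tunnel a /30 inside network from 169.254.0.0/16.

```python
import ipaddress


def tunnel_commands(tunnel_id, inside_cidr, local_inside_ip, outside_interface,
                    vgw_outside_ip, ipsec_profile):
    """Build an ordered list of IOS commands for one VPN tunnel interface.

    Hypothetical sketch: names and parameters are illustrative only.
    """
    net = ipaddress.ip_network(inside_cidr)  # e.g. the tunnel's /30
    if ipaddress.ip_address(local_inside_ip) not in net:
        raise ValueError(f"{local_inside_ip} is not inside {inside_cidr}")
    return [
        f"interface Tunnel{tunnel_id}",
        f" ip address {local_inside_ip} {net.netmask}",
        f" tunnel source {outside_interface}",
        f" tunnel destination {vgw_outside_ip}",
        " tunnel mode ipsec ipv4",
        f" tunnel protection ipsec profile {ipsec_profile}",
        " no shutdown",
        "exit",
    ]
```

Usage: `tunnel_commands(1, "169.254.10.0/30", "169.254.10.2", "GigabitEthernet1", "203.0.113.10", "ipsec-vpn-example")` returns the lines in the order they'd be sent to the router. Keeping the configuration as a plain list is what makes it easy to append extra commands per tunnel.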
These commands are executed in order on both CSRs, and it's easy enough to add a custom ACL to each tunnel automatically when it's built:
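Here is a minimal sketch of that addition. It assumes the per-tunnel configuration is an ordered list of IOS command strings. The helper, the ACL name, and the allowed networks are hypothetical. It converts each CIDR into the wildcard mask that extended ACLs use and applies the ACL inbound on the tunnel.

The ACL also permits the tunnel's own inside /30. Otherwise the ACL's implicit deny would also drop the BGP session running over the tunnel.

```python
import ipaddress


def acl_commands(acl_name, tunnel_id, tunnel_inside_cidr, allowed_sources):
    """Return IOS commands permitting only the given sources inbound on a tunnel.

    Hypothetical helper. Extended ACLs take wildcard (inverse) masks, which
    ipaddress exposes as `hostmask` (/16 -> 0.0.255.255). Anything not
    permitted falls through to the ACL's implicit deny.
    """
    cmds = [f"ip access-list extended {acl_name}"]
    # Keep the tunnel's link-local /30 reachable so BGP over the tunnel survives.
    for cidr in [tunnel_inside_cidr, *allowed_sources]:
        net = ipaddress.ip_network(cidr)
        cmds.append(f" permit ip {net.network_address} {net.hostmask} any")
    cmds += [
        "exit",
        f"interface Tunnel{tunnel_id}",
        f" ip access-group {acl_name} in",
        "exit",
    ]
    return cmds
```

Usage: `acl_commands("SPOKE-1-IN", 1, "169.254.10.0/30", ["10.1.0.0/16"])` yields an ACL with two permit lines followed by the interface binding. Append that to the tunnel's command list before the configuration is pushed.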
And remember that every VGW forwards all the networks it knows about to every BGP VPN peer, which can very, VERY quickly lead to routing conflicts and take your network down. To prevent this, how about we add an inbound prefix-list to the BGP peer that allows only the networks we'd like to receive, and no others.
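A minimal sketch of generating that filter, again assuming the configuration is built as a list of IOS command strings. The helper, the list name, the ASN, and the neighbor address are hypothetical. The optional VRF layout is also an assumption, so match it to however your configurator structures BGP.

```python
import ipaddress


def prefix_list_commands(list_name, allowed_prefixes, bgp_asn, neighbor_ip, vrf=None):
    """Return IOS commands that filter a BGP neighbor's inbound routes.

    Hypothetical helper: only the listed prefixes (exact match) are accepted;
    everything else hits the prefix-list's implicit deny.
    """
    cmds = []
    for seq, prefix in enumerate(allowed_prefixes, start=1):
        net = ipaddress.ip_network(prefix)  # raises ValueError on a malformed CIDR
        cmds.append(f"ip prefix-list {list_name} seq {seq * 10} permit {net}")
    cmds.append(f"router bgp {bgp_asn}")
    if vrf:
        cmds += [
            f" address-family ipv4 vrf {vrf}",
            f"  neighbor {neighbor_ip} prefix-list {list_name} in",
            " exit-address-family",
        ]
    else:
        cmds.append(f" neighbor {neighbor_ip} prefix-list {list_name} in")
    cmds.append("exit")
    return cmds
```

Sequence numbers step by 10 so entries can be inserted later without renumbering. On a session that is already established, the new inbound policy only takes effect after a soft reset of that neighbor.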
Also keep in mind that a VGW will only accept 100 routes from a BGP neighbor before shutting the neighbor down for a 5-minute hold-down timer. So let's allow no more than, say, 20 routes from any one neighbor into our BGP transit network before shutting that neighbor down. I'd rather shut down a single spoke than forward more than 100 routes to all the peers and take down the entire global network. You can do this with the BGP "maximum-prefix" neighbor command, like this:
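A minimal sketch of emitting that cap per neighbor. The helper, the ASN, the neighbor address, and the optional VRF layout are hypothetical. When a neighbor sends more than `limit` prefixes, IOS tears the session down. With `restart` set, IOS re-establishes the session after that many minutes. Without it, the session stays down until it is cleared by hand.

```python
def max_prefix_commands(bgp_asn, neighbor_ip, limit=20, restart_minutes=None, vrf=None):
    """Return IOS commands capping how many prefixes a BGP neighbor may send.

    Hypothetical helper; `limit` defaults to the 20 routes suggested above.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    stmt = f"neighbor {neighbor_ip} maximum-prefix {limit}"
    if restart_minutes is not None:
        stmt += f" restart {restart_minutes}"  # auto-recover after N minutes
    cmds = [f"router bgp {bgp_asn}"]
    if vrf:
        cmds += [f" address-family ipv4 vrf {vrf}", f"  {stmt}", " exit-address-family"]
    else:
        cmds.append(f" {stmt}")
    cmds.append("exit")
    return cmds
```

Usage: `max_prefix_commands(65000, "169.254.10.1", restart_minutes=5)` shuts a misbehaving spoke down and brings it back on its own five minutes later, rather than leaving it down until someone intervenes.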
Save all your work, and your updated lambda is good to go. Zip the entire folder structure into a .zip file. Be careful on Macs: the default right-click "Compress" option will ignore any files and folders whose names start with a period. That skips a required library, and the lambda will fail to run once it's uploaded.
Remember that .libs_cffi_backend folder and file! You need it!
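One way to sidestep the hidden-file problem is to build the archive with Python's standard `zipfile` module. This is a minimal sketch, and the folder and archive names in the `__main__` block are hypothetical. `os.walk` does not skip dot-files, so `.libs_cffi_backend` is archived like everything else. Paths are stored relative to the source folder so the handler module sits at the root of the zip.

```python
import os
import zipfile


def zip_lambda_folder(src_dir, zip_path):
    """Zip every file under src_dir, dot-files and dot-folders included.

    Returns the list of archived names so you can confirm that
    .libs_cffi_backend made it in.
    """
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                if os.path.abspath(full) == zip_abs:
                    continue  # don't add the archive to itself
                zf.write(full, os.path.relpath(full, src_dir))
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical paths: point these at your extracted configurator folder.
    for entry in zip_lambda_folder("cisco-configurator", "cisco-configurator.zip"):
        print(entry)
```

Scan the printed list for `.libs_cffi_backend/` entries before you upload.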
Now log in to your AWS Lambda console and open the "cisco configurator" lambda. Scroll down to "Function code" and click the upload button. Select your updated, zipped Cisco Configurator package and upload it.
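If you'd rather script the upload than click through the console, boto3's Lambda client exposes `update_function_code`. This is a minimal sketch. The function name and zip path are hypothetical, so use the name shown in your own console. The client is passed in rather than created inside the function, which keeps the function testable without AWS credentials.

```python
def upload_lambda_zip(lambda_client, function_name, zip_path):
    """Replace a Lambda function's code with a local .zip package.

    lambda_client is expected to be a boto3 Lambda client. Returns the
    SHA-256 hash Lambda reports for the newly uploaded code.
    """
    with open(zip_path, "rb") as f:
        package = f.read()
    response = lambda_client.update_function_code(
        FunctionName=function_name,
        ZipFile=package,
    )
    return response["CodeSha256"]


if __name__ == "__main__":
    import boto3  # needs credentials allowed to call lambda:UpdateFunctionCode

    # Hypothetical function name and path.
    print(upload_lambda_zip(boto3.client("lambda"), "cisco-configurator", "cisco-configurator.zip"))
```

Direct .zip uploads like this are capped at 50 MB zipped. Larger packages have to be staged in S3 first.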
Bam, you're done. Make sure to check the logs to verify the lambda is running properly. The most frequent problem I ran into was skipping the .libs_cffi_backend folder and library file, which made the lambda error out complaining that it couldn't find that exact file.
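To check the logs from a script, you can search the function's CloudWatch Logs group, which Lambda names `/aws/lambda/<function-name>`. This is a minimal sketch that assumes a boto3 CloudWatch Logs client is passed in. The function name and the default filter pattern (any event containing ERROR, Error, or Traceback) are illustrative assumptions, so adjust them to the errors you expect.

```python
import time


def recent_errors(logs_client, function_name, minutes=30,
                  pattern="?ERROR ?Error ?Traceback"):
    """Return recent log messages from a Lambda function that match `pattern`.

    Follows nextToken until CloudWatch Logs reports no more pages.
    """
    kwargs = {
        "logGroupName": f"/aws/lambda/{function_name}",
        "filterPattern": pattern,
        "startTime": int((time.time() - minutes * 60) * 1000),  # epoch ms
    }
    messages = []
    while True:
        resp = logs_client.filter_log_events(**kwargs)
        messages += [e["message"] for e in resp.get("events", [])]
        token = resp.get("nextToken")
        if not token:
            return messages
        kwargs["nextToken"] = token
```

Usage: `recent_errors(boto3.client("logs"), "cisco-configurator")` returns the matching lines from the last half hour. An empty list after a fresh upload is a good sign.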
If you have any issues after uploading the file to AWS, turn the logging level up to "DEBUG" and hit save. The monitoring logs will then show WAY more information and are very verbose about what's wrong.
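For reference, the usual way a Python Lambda honors a log-level setting you change and save in the console is to read an environment variable at invocation time. This is a minimal sketch of that pattern. The `LOG_LEVEL` variable name is a hypothetical assumption, and the Cisco configurator may expose its setting differently. In the Lambda Python runtime the root logger already has a handler, so setting its level is enough.

```python
import logging
import os

logger = logging.getLogger()  # root logger; Lambda attaches its handler here


def configure_logging():
    # Hypothetical: assumes the level lives in a LOG_LEVEL environment variable.
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on unknown values such as typos
    logger.setLevel(level)
    return level


def lambda_handler(event, context):
    configure_logging()
    logger.debug("full event: %s", event)  # only emitted at DEBUG
    logger.info("processing %d record(s)", len(event.get("Records", [])))
```

Reading the variable inside the handler, rather than once at import time, means a changed setting applies on the next invocation even in a warm container.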
And that's all. Your lambda will run automatically each time a spoke with the "poller" lambda updates its VGW tags for the VPN transit network. Turning the tag either on or off generates a new file upload to the config bucket, which triggers this lambda to run. For extra fun, turn on SSH debugging and watch your lambda get to work.
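Flipping that tag is easy to script with boto3's EC2 `create_tags` call. This is a minimal sketch. The VGW ID in the usage is a placeholder, and the default tag key is an assumption, so use whatever key and values your poller stack was configured to watch.

```python
def set_spoke_tag(ec2_client, vgw_id, enabled, tag_key="transitvpc:spoke"):
    """Turn a VGW's transit-network tag on or off.

    ec2_client is expected to be a boto3 EC2 client. create_tags overwrites
    an existing tag with the same key, so this works for both directions.
    """
    value = "true" if enabled else "false"
    ec2_client.create_tags(
        Resources=[vgw_id],
        Tags=[{"Key": tag_key, "Value": value}],
    )
    return value
```

Usage: `set_spoke_tag(boto3.client("ec2"), "vgw-0123abcd", False)` followed by `set_spoke_tag(..., True)` is a quick way to force a fresh upload and re-run the configurator for one spoke.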
Good luck out there,
Next up: How to bring all the Cisco lambda calls in-house, to avoid vendor source control of these files