Wednesday, April 17, 2019

Deploy Cisco ASAv in AWS for Infrastructure VPN with TONS of pictures

The AWS Virtual Private Gateway (VGW) is flexible, and integral to many of the AWS-native network backbone features exposed to users. However, it has some significant throughput drawbacks - in testing, we only see about 640Mbps, or about 80MBps. Beyond that, flows seem to be limited to 400Mbps each, so utilizing the full 640Mbps requires multi-threading on the part of the application. The documentation claims the VGWs are capable of pushing 1.25Gbps, but we have yet to see anything close to that in our day-to-day testing.

To be fair to AWS, they do offer a pretty significant asterisk on the 1.25Gbps throughput claim.

If you're paying for a 10Gbps Direct Connect (DX), you're probably not too happy about being limited to 6.4% of the speed you thought you were getting. Or even 12.5%, if you get closer to the theoretical 1.25Gbps speed Amazon claims the VGW can push.
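For anyone who wants to check the math, here's a quick sketch of where those percentages come from. The function name is just something I made up for illustration; the numbers are the ones quoted above.

```python
def percent_of_link(throughput_mbps, link_mbps):
    """Return what share of the link a given throughput uses, in percent."""
    return round(100.0 * throughput_mbps / link_mbps, 2)

DX_MBPS = 10_000  # a 10Gbps Direct Connect, expressed in Mbps

observed = percent_of_link(640, DX_MBPS)   # what we see through the VGW in testing
claimed = percent_of_link(1250, DX_MBPS)   # the documented VGW ceiling
megabytes_per_sec = 640 / 8                # bits to bytes

print(observed, claimed, megabytes_per_sec)
```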

The real answer here is to use a Private Virtual InterFace (PVIF). But PVIFs have a security drawback - they have no security at all. The AWS security model is based around endpoint security ONLY - Network ACLs (NACLs) and Security Groups (SGs) are all that you get. Other features are coming online over time, but there are many reasons network interconnect filtering might be required. Here are a few examples, by no means an exhaustive list:

  • You need high bandwidth, requiring a PVIF - no interconnect filtering for you
  • You're connecting to another cloud via a VPN - no interconnect filtering for you
  • You're connecting to a partner's AWS VPC, via VPN or VPC peering - no interconnect filtering for you
You're noticing the trend here. The AWS-native toolkit isn't great here, so partners have stepped up, Cisco among them. The Cisco ASAv is a virtual machine that emulates the feature-set of the eponymous Cisco ASA series. Right now the highest throughput supported in AWS (or any other cloud) is 2Gbps with an ASAv30. It's still only 20% of the 10Gbps DX your company is springing for, but it's closer, and the security gains from deploying it are oceans away from the nothing-nada-none interconnect security on the AWS-native options. 

The Cisco ASAv50, which would help us utilize 100% of the 10Gbps DX at our hypothetical company, isn't yet available on the public cloud. True to form, the AWS Marketplace entry for the ASAv shows 2Gbps throughput as the highest yet available. Fingers crossed it becomes available over time.

Now, enough prologue. Let's build this thing. First, go to the AWS Marketplace page for the ASAv and subscribe. There are two options:

  • Pay-as-you-go: You're charged in an ec2-like way, per minute. The reviews on this version aren't great, mostly from folks misunderstanding billing, or maybe from Cisco over-billing. Regardless, I don't recommend this one.
  • BYOL: Bring Your Own License. The ASAv doesn't actually REQUIRE a license. However, you'll be locked in at 100Kbps throughput until you purchase a license from a VAR, register it with Cisco, and get it installed - Cisco Smart licenses only.
Pick your poison, then hit "Continue to Subscribe". It'll take a few minutes, then you'll get an email that shows you are now subscribed. You DON'T need to continue to Configuration - it's actually easier if you don't. 

Now, let's get this really moving. Log into the AWS console --> ec2, and then click "Launch Instance" in the top right.

Search for ASAv, and find the BYOL device and hit "Select" next to it to start configuring the device.

Take note of the next screen, particularly the right side that shows rates for the different ec2 instance types. These costs vary by region (and possibly by contracted rate with AWS?), so yours might be slightly different.

Now compare the rates and sizings with the AWS Marketplace entry for the ASAv - it shows the suggested sizings for the different ASAv types and throughputs.

For me in this region, it's clear that the c4.xlarge is my best bet to get 2Gbps (ASAv30) with the cheapest rate. So let's do it. Hit "continue" on the ec2 deployment wizard, check the box and continue on to "Next: Configure Instance Details".

Here you get to decide where to put your ASAv device. Which is an interesting question, because this is a transitive device, not very common in cloud-land. The first interface built on an ASAv is called the "mgmt0/0" interface when you log into the ASA. Cisco recommends you use this as an actual management interface, but I like to use it as an internal interface, so let's put the device into a "private" subnet. Private doesn't mean much here - just that it is reachable from the inside of your network so you can route traffic through it.

Remember to also hit the "Enable termination protection" option to prevent someone from accidentally terminating this instance and deleting all your hard work.

Scroll down, and let's add our second interface to the ASAv. This one will be called "Gigabit0/0" locally on the ASAv. I like to put this in the "public" zone, and use it for inbound traffic. That requires some very specific network items, such as:

  • This interface has to be in a subnet with a default route through an Internet GateWay (IGW)
  • The NACL on the subnet has to permit inbound traffic on VPN ports: 
    • udp/500: ISAKMP
    • udp/4500: IPsec NAT traversal (NAT-T), which carries ESP inside UDP - highly relevant in the AWS environment where all hosts have private IPs and are behind 1:1 NAT to their Elastic IPs (EIPs)
Let's add an interface to our ec2 instance and put it in the "public" subnet. 

Now, it seems like you're done with this page, but this is where you'll find the first gotcha of this deployment. When the Cisco ASAv comes up, its configuration will be blank, which prevents anyone from connecting to it over the network. There is no console to connect to in cloud-land, so we need a way to give the device its initial configuration to permit us to connect to it. Cisco has an example of what they call a "0-day Configuration" posted on their docs page. It'll get the job done, but let's modify it for our use-case. I'll highlight the information that I modified:

hostname NewASAvThanksKyler
int gi 0/0
 nameif outside
 security-level 0
 ip address dhcp setroute
 no shut
interface management0/0
! Remove the management-only line below if you want to use this as your internal interface
 management-only
 nameif inside
 security-level 100
 ip address dhcp
 no shut
same-security-traffic permit inter-interface
same-security-traffic permit intra-interface
crypto key generate rsa modulus 2048
ssh inside
ssh outside    <-- Set this one to "ssh 0 0 outside" if you want to connect to it from the internet - say, if you aren't privately connected to this VPC. 
ssh timeout 30
ssh version 2
username admin password SuperSecretPassword privilege 15   <-- Set your own password
username admin attributes
service-type admin

Customize it however you like, then expand the "Advanced Details" panel at the bottom of the screen and paste in the configuration you just created. Leave the radio button on "As text". Then click "Next: Add Storage".

Click through the storage panel with no changes, which brings us to tags. Tags can be used for all sorts of organizational and billing purposes. Here, we'll just give our hosts a name. Either click "Add a Name tag" or just create a new tag with Key of "Name" and call it whatever you want. It makes sense to have the AWS name match the Cisco configuration name, something we didn't do here.

The next panel asks us for a SG to assign to our host. Every host in AWS has a SG that controls the ingress and egress of traffic to it. The default permits inbound tcp/22 (SSH) from the internet, which is a terrible, bad, awful idea. The host will be protected by an SSH key, but why permit any schmoe on the internet to connect to it? Update the rules shown below. You CAN open the udp/500 and udp/4500 rules to every source address, which would permit anyone on the internet to try and establish a VPN to your host, but I don't recommend it. No reason to expose your devices more than you need to, even when the device has "Security" in the name.
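If you'd rather script the rules than click them in, here's a rough sketch of what they look like as data. The dict layout follows the `IpPermissions` shape that the EC2 API uses for security group ingress rules; the function name and the CIDRs are placeholders I made up, so swap in your own peer and admin ranges.

```python
def vpn_ingress_rules(peer_cidr, admin_cidr):
    """Build ingress rules for IPsec (udp/500, udp/4500) and SSH, scoped to known sources."""
    def rule(protocol, port, cidr, description):
        return {
            "IpProtocol": protocol,
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        }

    return [
        rule("udp", 500, peer_cidr, "ISAKMP from VPN peer"),
        rule("udp", 4500, peer_cidr, "IPsec NAT-T from VPN peer"),
        rule("tcp", 22, admin_cidr, "SSH from admin network"),
    ]

# Hypothetical values: a single partner VPN endpoint and an internal admin range.
rules = vpn_ingress_rules("198.51.100.10/32", "10.0.0.0/8")
for r in rules:
    print(r["IpProtocol"], r["FromPort"], r["IpRanges"][0]["CidrIp"])
```

With boto3 you would hand a list like this to `authorize_security_group_ingress` as `IpPermissions`, along with your own `GroupId`.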

Now you can review. Make sure everything looks like you expect. Then hit Launch! And we're off to the races! Oh wait, we're not. We need to select an SSH key to control access to the box. Either select an existing SSH key or build a new one. In either case, make sure you have the private key, then check the box and hit "Launch Instances".

The ASAv will build. It takes about 10 minutes to build and come up to where you can reach it, but that time will fly - there are a few more steps we have to take before this device is reachable. Let's add an EIP - a public Amazon-owned IP so this device can reach (and be reachable from) the internet. Before we go and grab the EIP, we need to learn where we're going to put it. Usually you'd associate an EIP with an ec2 instance, which is simple with an instance that has a single interface. However, we have a few, and could potentially have many more. So we need to learn exactly which Elastic Network Interface (ENI) we want to attach it to. In the ec2 panel, click on the interface, then go to Actions --> Networking --> Manage IP Addresses. We don't care about the actual IPs yet - what we're interested in is the ENI of eth1, the SECOND interface on this device. Remember, the second interface is gi0/0, and the one we set in the "outside" zone in our Cisco config. Copy the ENI-XXX to your clipboard or write down the last few characters.
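The same lookup can be done in code. Here's a minimal sketch that digs the eth1 ENI out of a response shaped like the EC2 `DescribeInstances` output (`Reservations` -> `Instances` -> `NetworkInterfaces`, each with an `Attachment.DeviceIndex`). The function name and the sample IDs are made up for illustration.

```python
def eni_for_device_index(describe_instances_response, device_index):
    """Return the ENI ID attached at the given device index (0 = eth0, 1 = eth1)."""
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            for eni in instance.get("NetworkInterfaces", []):
                if eni["Attachment"]["DeviceIndex"] == device_index:
                    return eni["NetworkInterfaceId"]
    raise LookupError(f"no interface at device index {device_index}")

# Hypothetical response, trimmed down to the fields this sketch reads.
sample = {
    "Reservations": [{
        "Instances": [{
            "NetworkInterfaces": [
                {"NetworkInterfaceId": "eni-0aaa", "Attachment": {"DeviceIndex": 0}},
                {"NetworkInterfaceId": "eni-0bbb", "Attachment": {"DeviceIndex": 1}},
            ],
        }],
    }],
}

outside_eni = eni_for_device_index(sample, 1)  # eth1, i.e. gi0/0 "outside"
print(outside_eni)
```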

Now let's go get the real EIP. Go to the ec2 panel, then on the left side, under "Network & Security", click on Elastic IPs.

In the top left, click "Allocate new address", then "Amazon pool," then click Allocate to see your new IP. I'd recommend writing the IP down, especially if you have a bunch of other EIPs, so you don't get it mixed up. Click close, then click on the EIP in your list, Actions at the top, then Associate. Bam, your device is now on the internet.

AWS ec2 instances have a built-in security measure to prevent hosts from becoming transit devices and siphoning data from your network. That security measure is called the source/destination check, and makes sure that only traffic destined TO or sourced FROM an IP the host owns passes to or from the host. Which is exactly the rule we need to break for this ASAv to act as a transit device, so let's turn it off. On the ec2 panel, under Network & Security, click on Network Interfaces.

As an aside, if you need to find an IP in your org, regardless of what service is consuming it, that service has to create an ENI to slurp up traffic. So if you want to find out if an IP belongs to RDS, Batch, Directory Services, or a plain-Jane ec2 instance, look for the ENI as a shortcut.

An easy way to find both ENIs assigned to our host is to search for the SG name that we created. Bam, found. We only need to disable this check on the "inside" interface of the ASAv, which will be the first interface we assign. You can find it by looking for "Primary Network Interface" in the description.

Click on the Primary network interface and click Actions --> Change Source/Dest. Check. Then turn it off and hit save.
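For the scripting-inclined, the same toggle is a single EC2 API call, `modify_network_interface_attribute`, with `SourceDestCheck` set to false. Here's a sketch that takes the client as a parameter, so it runs here against a stand-in object instead of real AWS. The `RecordingClient` class and the ENI ID are made up for the demo.

```python
class RecordingClient:
    """Stand-in for an EC2 client so this sketch runs without AWS access."""
    def __init__(self):
        self.calls = []

    def modify_network_interface_attribute(self, **kwargs):
        self.calls.append(kwargs)
        return {}

def disable_source_dest_check(ec2_client, eni_id):
    """Turn off the source/destination check on one ENI."""
    ec2_client.modify_network_interface_attribute(
        NetworkInterfaceId=eni_id,
        SourceDestCheck={"Value": False},
    )

fake = RecordingClient()
disable_source_dest_check(fake, "eni-0123456789abcdef0")  # hypothetical ENI ID
print(fake.calls)
```

Against a real account you'd pass in `boto3.client("ec2")` and the ENI of the Primary network interface.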

Once the host boots, ssh to it using the SSH key you selected when building the host and you are in! If you get an error message from a mac or linux client when attempting to use the SSH key that the key is inaccessible even though you know it is, update the permissions to be more specific:
computer:~ kyler$ chmod 400 Downloads/KylerASAv.pem 

Now that the ASA is built, reachable, and all interfaces are in the right subnets, verify your internet access. If you can reach out, the ASAv is done. Now you just need to build some VPNs!

The one major next step is to update any routing tables in your VPC to point at the internal interface (ENI) of the ASAv so it can carry the traffic across. Find the ENI of the Primary network Interface just as above, then go to the VPC panel in the AWS web console. Click on the route table you want to edit, then on "Routes," then on "Edit Routes". Add a new route to whatever the partner's network is that you want to route towards, then click in the "Target" field. You'll need to select "Network Interface," then you can paste in the entire ENI-XXX string where you want to send this traffic. Hit "Save routes" and your changes are live - you're sending your internal traffic to the internal ENI of the ASAv device.
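Here's a small sketch of that route as data, in case you have a pile of route tables to update. The keys match the EC2 `CreateRoute` parameters (`RouteTableId`, `DestinationCidrBlock`, `NetworkInterfaceId`); the function name, IDs, and partner network are placeholders.

```python
import ipaddress

def build_route_kwargs(route_table_id, partner_cidr, asav_inside_eni):
    """Validate the partner CIDR and build the arguments for a route via the ASAv ENI."""
    # ip_network raises ValueError on a malformed CIDR or one with host bits set.
    network = ipaddress.ip_network(partner_cidr)
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": str(network),
        "NetworkInterfaceId": asav_inside_eni,
    }

# Hypothetical IDs and partner network.
route = build_route_kwargs("rtb-0123456789abcdef0", "192.168.50.0/24", "eni-0123456789abcdef0")
print(route)
```

With boto3 this would be `ec2.create_route(**route)`, once per route table that needs to reach the partner network.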

Good luck out there! 

Stitching Clouds - Azure to AWS Cisco CSRs behind IGW (static NAT)

The Microsoft Azure cloud has made some tremendous strides forward in the past few years. Despite entering the market years after the market leader (AWS) had established a dominance, they have quickly built their market share. The playbook was classic Microsoft - court enterprises, and leverage the overwhelming dominance of the Windows operating system with packaged and discounted OS licensing costs.

Because of this and more, Azure now holds a strong second place in the cloud environment market. Most infrastructure abstractions copy the AWS model, but use different names. Cisco has provided a solid mapping of names between the services.

Most enterprises begin with a single cloud, and quickly realize that each cloud has its own benefits and potentially unique features, and almost all enterprises are now what the industry calls "multi-cloud". That means linking the networks of these clouds together so all services can communicate.

A good place to start is Azure, since it will help us build our Cisco CSR IOS configuration once built. There are a half-dozen steps to building the VNG (Virtual Network Gateway, the parallel for the AWS VGW - Virtual Private Gateway), which Azure's documentation covers very well, so I won't rehash it. This doc will walk you through creating:

  • Gateway Network (where Azure-built public-facing services, like the VNG, will live)
  • Virtual Network Gateway (VNG), the device which terminates infrastructure VPNs
  • Local Network Gateways, the reference object which contains public IPs and other context information for your public VPN endpoints, similar to "Customer Gateways" in AWS-land
  • Connections, finally, the real VPN between your Azure tenant and the non-Azure VPN endpoint
Now, the docs from $MSFT are excellent, but there are a few low-hanging fruit and gotchas, and I'll talk through them here. For the VNG, here's a few gotchas: 
  • HA - High Availability. If you're an enterprise, you want HA. However, the VNG doesn't launch with HA enabled by default, like it would in AWS. So let's go into the VNG and add HA. Flip "Active-active mode" to "Enabled". 
    • NOTE: When you save after updating the active-active mode, this will cause the VNG to be torn down and non-functional for about 20-30 minutes while it is being rebuilt in HA-mode. All your Local Network Gateways and Connections will endure, but the configuration of each will be affected, so it's a good idea to do this before moving on to the other items. 
  • Set the BGP ASN - it has to be a private (non-registered) ASN, but can otherwise be any value. Make sure it doesn't overlap with your CSRs or other ASNs, or your BGP attribute routing will get complex.
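
A quick sanity check for the ASN choice, as a sketch. Non-registered ASNs are the private-use ranges from RFC 6996 (64512-65534 for 16-bit, 4200000000-4294967294 for 32-bit). The function names and the example ASN list are made up, and Azure reserves a handful of ASNs for itself, so check its docs before settling on one.

```python
PRIVATE_ASN_RANGES = ((64512, 65534), (4200000000, 4294967294))  # RFC 6996

def is_private_asn(asn):
    """True if the ASN falls in a private-use range."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)

def asn_is_usable(candidate, asns_in_use):
    """A candidate works if it is private and not already used elsewhere in the network."""
    return is_private_asn(candidate) and candidate not in set(asns_in_use)

# Hypothetical inventory of ASNs already assigned to CSRs and other gateways.
in_use = [64512, 64513]
print(asn_is_usable(64520, in_use), asn_is_usable(64512, in_use), asn_is_usable(13335, in_use))
```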

Go through and build all of these items for your CSR or CSRs. For the Local Network Gateway, make sure to add the BGP configuration and point at the public (Elastic IP) of your CSR. For the BGP peer IP address, this IP will be inside your VPN, so it will be a local loopback on your device. I created a local loopback on my CSR for this purpose. That interface and its IP are what you tell BGP to source your connection from when you build BGP across this tunnel.

Once all that configuration information is saved, switch to the "Connection" configuration item. At this point, it knows everything it should be doing, but there is a gotcha, where you need to enable BGP across this connection. It's a simple on/off, so flip it to "Enabled" and hit save. It'll take a minute to update. 

Now let's let Azure do the work for us. Navigate into your Connection to your CSR, click on "Overview" in the left column, and then click "Download Configuration." There's lots of configuration types, pick the one most relevant to you. For our Cisco CSRs, the one that is easiest for me to read and works is the IOS (ISR, ASR) template. 

My advice is to copy all the configuration out to a notepad and make sure you save a copy. Save as you go, we'll be making a few very important changes in order for this to work. 

Right at the top, there's a few items to check. First, make sure there are two public IPs listed. If there aren't, your HA mode isn't finished rebuilding or isn't enabled. I don't recommend any enterprise of any size move forward without HA. It's always worth the investment. 

!   > Public IP addresses:   
!     + Public IP 1:
!     + Public IP 2:

Second, make sure this connection is built with BGP enabled. If you don't see "True" here, or if the line is missing, go back and double-check your Connection, Local Network Gateway, and Virtual Network Gateway config - one of them will be missing the BGP=Enabled section or was rebuilding after a save when you downloaded config. 

!   > Azure virtual network
!     + Enable BGP:            True
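
If you download these configs often, both checks are easy to script. Here's a minimal sketch that scans the comment header for the two lines shown above. It assumes the header keeps the `+ Public IP N:` and `+ Enable BGP:` labels exactly as shown, and the function name and sample IPs are placeholders.

```python
import re

def check_azure_template(text):
    """Count populated 'Public IP N' lines and read the 'Enable BGP' flag."""
    public_ips = re.findall(r"^![ \t]*\+[ \t]*Public IP \d+:[ \t]*(\S+)", text, flags=re.MULTILINE)
    bgp = re.search(r"^![ \t]*\+[ \t]*Enable BGP:[ \t]*(\S+)", text, flags=re.MULTILINE)
    return {
        "public_ip_count": len(public_ips),
        "bgp_enabled": bgp is not None and bgp.group(1) == "True",
    }

# Hypothetical header with documentation-range IPs standing in for real ones.
SAMPLE = """\
!   > Public IP addresses:
!     + Public IP 1:           203.0.113.10
!     + Public IP 2:           203.0.113.11
!   > Azure virtual network
!     + Enable BGP:            True
"""

result = check_azure_template(SAMPLE)
print(result)
```

A count below 2 points at HA mode, and a false `bgp_enabled` points at one of the three objects missing its BGP setting.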

In the IKE section, make sure to update the local IP address to your PRIVATE IP address. Hosts in AWS aren't directly on the internet, they almost always use an EIP to do 1:1 NAT. 

crypto ikev2 policy AzureCSR1
  proposal AzureCSR1-proposal
  match address local

The IKEv2 profile has the same issue, matching on the public (EIP) address rather than the local one, and needs to be updated:

crypto ikev2 profile AzureCSR1-profile
  match address  local

The tunnels that are built require a slight modification - when HA is used, there are two destinations, requiring two tunnels. In Cisco-land, when two tunnels share a crypto profile, they require the "shared" keyword. I have no idea why this is, but Azure's auto-config builder misses this, and that means the CSR can only bring up one tunnel at a time... unless you make this change: 

int tun 90
 tunnel protection ipsec profile AzureCSR1-IPsecProfile shared
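
Since the downloaded template needs this fix on every tunnel, here's a sketch that patches the text in one pass. It assumes the tunnel lines look like the one above and only touches lines that don't already end in "shared". The function name is made up.

```python
import re

# Matches a tunnel protection line that has nothing after the profile name.
_TUNNEL_PROTECTION = re.compile(
    r"^([ \t]*tunnel protection ipsec profile \S+)[ \t]*$", re.MULTILINE
)

def add_shared_keyword(config):
    """Append 'shared' to every tunnel protection line that lacks it."""
    return _TUNNEL_PROTECTION.sub(r"\1 shared", config)

template = (
    "int tun 90\n"
    " tunnel protection ipsec profile AzureCSR1-IPsecProfile\n"
    "int tun 91\n"
    " tunnel protection ipsec profile AzureCSR1-IPsecProfile\n"
)

patched = add_shared_keyword(template)
print(patched)
```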

Remember that all tunnels in Cisco's transitCSR product for AWS live in VRFs, and it's a good idea for these tunnels to also exist in VRFs. First, let's build the VRF. 

ip vrf AzureCSR1_VRF
 rd 64518:200
 route-target export 64518:0
 route-target import 64518:0

Then put the new loopback interface in the VRF

int loopback 90
 ip vrf forwarding AzureCSR1_VRF
 ip address
 no shut

Make sure to update the BGP config to source traffic from your new interface, and make sure to configure the BGP neighbor in the Azure connectivity VRF:

router bgp 64512
 address-family ipv4 vrf AzureCSR1_VRF
  neighbor remote-as 65555
  neighbor activate
  neighbor update-source loopback90

And there's one final oddity with Azure's provided config - they recommend you use an APIPA reserved address (169.254.X.X) for your tunnel, and a /32 to boot. Which means the router can't inherently understand which traffic to send over the tunnel, even though all the items are in place. The trick to kick-start it all is to add a static route over the tunnel towards the BGP neighbor so they can establish a neighborship and start routing. If you are using HA mode (again, highly recommended), make sure to send each BGP neighbor over the appropriate tunnel.

ip route vrf AzureCSR1_VRF tun 90
ip route vrf AzureCSR1_VRF tun 91
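
Here's a sketch of generating those routes, one per tunnel. I'm assuming a host route (255.255.255.255) to each Azure BGP peer, and the peer IPs below are invented for the example - use the BGP peer addresses from your own VNG.

```python
import ipaddress

def bgp_peer_routes(vrf, peers_by_tunnel):
    """Build one static host route per tunnel, pointing at that tunnel's BGP peer."""
    commands = []
    for tunnel_id, peer_ip in sorted(peers_by_tunnel.items()):
        ipaddress.ip_address(peer_ip)  # raises ValueError if the IP is malformed
        commands.append(f"ip route vrf {vrf} {peer_ip} 255.255.255.255 Tunnel{tunnel_id}")
    return commands

# Hypothetical mapping of tunnel number to the Azure BGP peer reached through it.
routes = bgp_peer_routes("AzureCSR1_VRF", {90: "10.90.0.4", 91: "10.90.0.5"})
print("\n".join(routes))
```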

And bam, you now have routing between your Cisco transit CSRs and an Azure tenant. There's no fancy lambda to configure connectivity that I'm aware of - but if you stumble across any Azure-focused Cisco automation here, please link in the comments and we can share with the community.

Thanks all. Good luck out there!

Tuesday, April 16, 2019

AWS: Check if ec2 keypair private / public key matches

Hey all,

To validate if the ec2 SSH public key value you see in your AWS console matches a local private key, you can run the following command:

MacComputer:~$ openssl pkcs8 -in ~/.ssh/kyler_key.pem -nocrypt -topk8 -outform DER | openssl sha1 -c

These two match! SSH auth will let me in.
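
If you want to do the comparison in a script, here's a sketch of the hashing and matching halves. It assumes you already have the DER-encoded PKCS#8 bytes (the output of the first openssl command above), and the function names are my own.

```python
import hashlib

def colon_sha1(der_bytes):
    """SHA-1 of the given bytes, formatted as colon-separated hex pairs."""
    digest = hashlib.sha1(der_bytes).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

def fingerprints_match(openssl_output, console_value):
    """Compare openssl's output line (with its 'name= ' prefix) to the console value."""
    local = openssl_output.split("=")[-1].strip().lower()
    return local == console_value.strip().lower()

# Demo on empty input, just to show the output format.
demo = colon_sha1(b"")
print(demo)
```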

Good luck out there!

Monday, April 15, 2019

Cisco TransitVPC in AWS - How to Use Lambda To Update CSR Config Automatically

Cisco in the Cloud

Cisco has had a hard time adjusting to cloud-centric enterprises. It doesn't help that these cloud providers aren't too keen on third parties providing "core" services in their environments. Both AWS and $MSFT Azure have taken the Microsoft model where popular third-party services are emulated and built into the core product, providing services that lack customization but are often cheaper, better supported, and scale better than these third party services.

Something that AWS and Azure don't do well yet is something I'm calling mid-network filtering. What I mean by that is that both cloud providers have taken to an endpoint-focused security model, and build their security tools around locking down the endpoints. There's nothing inherently wrong with this model - in fact, if you need to choose a model, endpoint-adjacent network filtering is excellent. But in my experience something that's lost with cloud is flexibility - the ability to build the network and access controls in a way that makes sense for your business. Many network and virtualization vendors, including Cisco, have focused on providing the most flexible and extensible methods of building... whatever you want, in whichever way you want it! Because of that, their products can be confusing and sometimes esoteric - try writing a universally valid CoS policy for Cisco switches across their different small biz to core enterprise switches - it's impossible. But I digress.

Cisco's foray into covering one of the current gaps in AWS cloud computing is called the "Transit Network VPC." It's a pre-packaged VPC environment with all the accoutrements, including subnets, routing tables, NACLs, security groups, etc. all built to serve one purpose - hosting 2x redundant Cisco CSR routers which operate as a hub for the VPNs to every spoke cloud VPC. Because of their positioning in the network, and because they're a full-fledged enterprise network platform, they're able to perform ingress and egress filtering (including ZBFW) on each and every tunnel (VPN) interface to every VPC. That is worlds away from the endpoint-security-or-nothing approach that AWS has implemented in the rest of the network.

Here's a link to the marketplace entry: Cisco Cloud Services Router (CSR) 1000v - Transit Network VPC - BYOL

For now, AWS supports this product and even hosts documentation for how to deploy it. As an aside, it can't last - the CSR routers, even fully built, can only push 4.5Gbps, and the VPN tunnels they're terminating can only push 400Mbps of throughput due to VGW capacity limitations. Native tooling, including non-ec2 based appliances, as well as native backhaul constructions like VPC peering, DX gateways, transit gateways, etc., are going to push nearly every customer away from devices like this.

But let's talk about something that these devices do well. In order to build the VPNs from every VPC you control and want to communicate together, each VPC gets a CloudFormation (CF) stack that polls the hub and pushes the local VGW configuration to an S3 "config" bucket in the hub VPC where the CSRs live. There are a series of lambdas there that are triggered by this upload, and read the information uploaded. This connection info is transformed into Cisco IOS configuration and pushed automatically via Lambda automation to the CSRs.

The real power here is that the Lambda doing this pushing is written in python, and is extractable from the vanilla Lambda provided by Cisco. The file is called "", and looks a little like this:

Standard python code - cryptic, and not very useful. However, scroll down and you'll see some human-readable code showing what commands are being executed:

These commands are executed in order on both CSRs, and it's easy enough to add a custom ACL to each tunnel automatically when it's built:
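
As a rough illustration (not Cisco's actual code), here's a sketch that assumes the Lambda builds its configuration as a list of IOS command strings. The function name and ACL name are hypothetical, and the ACL itself would need to exist on the CSRs already.

```python
def tunnel_acl_commands(tunnel_id, acl_name):
    """IOS commands that attach an inbound ACL to one tunnel interface."""
    return [
        f"interface Tunnel{tunnel_id}",
        f" ip access-group {acl_name} in",
    ]

# Hypothetical tunnel number and ACL name.
acl_commands = tunnel_acl_commands(100, "SPOKE_INBOUND")
print("\n".join(acl_commands))
```

You'd append lines like these to the command list for each tunnel before it gets pushed to the routers.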

And remember that every VGW forwards all the networks it knows about to every BGP VPN peer, which can very VERY quickly lead to routing conflicts and shut your network down. To prevent this, how about we add an inbound prefix-list to the BGP peer to only allow the networks we'd like to receive, and no others.

And keep in mind that VGWs will only accept 100 routes from a BGP neighbor before shutting down the neighbor for a 5-minute hold-down timer. So let's prevent more than, say, 20 routes from coming into our BGP transit network before shutting the neighbor down. I'd rather shut down a single spoke than forward more than 100 routes to all the peers and shut down the entire global network. You can do this with the "maximum-prefix" neighbor command, like this:
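
Here's a sketch of both guardrails together, again assuming the commands are built as a list of strings. This is my own illustration rather than the code from Cisco's Lambda. The function name, ASN, VRF, neighbor IP, and prefix-list name are all placeholders, and the prefix-list has to be defined separately.

```python
def bgp_guardrail_commands(local_asn, vrf, neighbor_ip, prefix_list, max_prefixes=20):
    """IOS commands that filter inbound routes and cap how many a neighbor may send."""
    return [
        f"router bgp {local_asn}",
        f" address-family ipv4 vrf {vrf}",
        f"  neighbor {neighbor_ip} prefix-list {prefix_list} in",
        f"  neighbor {neighbor_ip} maximum-prefix {max_prefixes}",
    ]

# Hypothetical values for one spoke.
bgp_commands = bgp_guardrail_commands(64512, "vpn0", "169.254.10.1", "SPOKE_ALLOWED")
print("\n".join(bgp_commands))
```

Without the `warning-only` option, going over the `maximum-prefix` limit takes that one neighbor down, which is the behavior we want here.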

Save all your work and your updated lambda is now good to go. Zip up the entire folder structure into a .zip file - be careful on macs, their default right click "compress" option will ignore any files and folders with a leading period, which will skip a required library and cause your lambda to fail to run when it's uploaded.

Remember that .libs_cffi_backend folder and file! You need it!
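
One way to sidestep the Finder problem is to build the zip in code. This sketch uses only the Python standard library; `os.walk` doesn't skip dot-folders, so hidden files come along. The function name and the demo file names are made up, and the zip is written outside the source folder so it doesn't end up including itself.

```python
import os
import tempfile
import zipfile

def zip_tree(src_dir, zip_path):
    """Zip every file under src_dir, hidden ones included, with paths relative to src_dir."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                zf.write(full_path, arcname)
                archived.append(arcname)
    return sorted(archived)

# Demo with a throwaway folder containing a dot-folder, like the Lambda package has.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
    os.makedirs(os.path.join(src, ".libs_cffi_backend"))
    for rel in ("handler.py", os.path.join(".libs_cffi_backend", "example.so")):
        with open(os.path.join(src, rel), "w") as fh:
            fh.write("placeholder")
    archived = zip_tree(src, os.path.join(out, "lambda.zip"))
    with zipfile.ZipFile(os.path.join(out, "lambda.zip")) as zf:
        names_in_zip = sorted(zf.namelist())

print(archived)
```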

Now log into your AWS Lambda console and open up the "cisco configurator" lambda. Scroll down to "Function code" and click the upload button. Find your updated and zipped up Cisco Configurator lambda and click upload.

Bam, you're done. Make sure to check out the logs to verify the lambda is properly running. The most frequent problem I ran into was skipping the .libs_cffi_backend folder and library file, which caused it to error out saying it couldn't find that exact file.

If you have any issues after uploading the file to AWS, turn up the logging to "DEBUG" and hit save. The monitoring logs will generate WAY more information, and are very verbose about what's wrong.

And that's all - your lambda will run automatically each time a new spoke VPN with the "poller" lambda updates its VGW tags for the VPN transit network - either turning it off or on will generate a new log file upload, which will trip this lambda to run. For extra fun, turn on SSH debugging and watch your lambda get to work.

Good luck out there,

Next up: How to bring all the Cisco lambda calls in-house, to avoid vendor source control of these files