Technically Speaking, We Know How to Keep a Secret

Our technical blog series showcases some of the more compelling problems and deep tech our world class software engineering team wrestles with as they build a fast, powerful and scalable actuarial platform that is used by life insurers globally. In this post, Infrastructure Lead Jonathan Wierenga outlines some of the methods our development team uses to ensure customer data is secure.

Keeping customer data secure is one of the most important considerations when developing software for the cloud. At Montoux, we take security seriously for a range of reasons, but the most important is that customers trust us with their sensitive financial information. As a security-conscious company, we strive to follow industry-standard engineering practices in all of our development and operational work.

One of the key pieces of our infrastructure that facilitates secure engineering practices is HashiCorp Vault. Vault is a powerful tool that provides secure, controlled, programmatic access to secrets such as access keys, credentials, certificates, and other sensitive data. We adopted Vault at Montoux because we needed a secret management system as well as Public Key Infrastructure capabilities, which we use for mutual TLS authentication between users and our servers. We also wanted a service that integrates well with AWS and with Terraform (which we use to manage our infrastructure), and we are keen adopters of open source software.

Securing secrets in practice

A recent project provided a simple illustration of how Vault can be used to securely store and retrieve a secret. But first, a little background. We use Splunk to create monitoring visualisations, usage reporting, and security alerts for the applications on our platform. The data ingested by Splunk is captured from logs on our application servers and fed by universal forwarders running on those servers. However, some functionality within our platform has been refactored into AWS Lambda functions, and the log data from these functions was not being ingested into Splunk. As we scale and shift more of our platform to a serverless architecture, ensuring we can still capture these logs is important for ongoing monitoring and analysis.

This project involved creating a mechanism to capture Lambda logs written to AWS Cloudwatch log streams and index them in Splunk. The remainder of this post walks through a brief example and shows how Vault enables secure handling of the secrets involved in the solution.

Example: Splunk ingestion using the HTTP Event Collector

Splunk offers a number of methods for ingesting application data; one of these is the HTTP Event Collector (HEC). The HEC input can be used to transport application events directly to Splunk, without the use of a Splunk forwarder - perfect for serverless environments! We settled on a simple architecture that looks like the following:

In this architecture, any log stream that we wish to capture and filter on is configured to send events to a log-forwarder lambda. This lambda will transform those events into a HEC-compatible format and stream them directly into Splunk.
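The wiring between a log group and the forwarder lambda can be expressed in Terraform, which we already use to manage our infrastructure. The resource and function names below are hypothetical, and this is a sketch of the general shape rather than our actual configuration:

```hcl
# Allow CloudWatch Logs to invoke the forwarder lambda.
resource "aws_lambda_permission" "allow_cloudwatch_logs" {
  statement_id  = "AllowExecutionFromCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.log_forwarder.function_name
  principal     = "logs.amazonaws.com"
}

# Subscribe the forwarder to a log group we want indexed in Splunk.
resource "aws_cloudwatch_log_subscription_filter" "to_splunk" {
  name            = "splunk-forwarder"
  log_group_name  = "/aws/lambda/some-application-function"
  filter_pattern  = "" # an empty pattern forwards every event
  destination_arn = aws_lambda_function.log_forwarder.arn
  depends_on      = [aws_lambda_permission.allow_cloudwatch_logs]
}
```

The `aws_lambda_permission` resource matters: without it, CloudWatch Logs has no authority to invoke the function, and the subscription filter silently delivers nothing.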

To set up HEC on your Splunk server, you can follow these steps. At the end, you will have generated something like this:

We can now create a lambda that receives Cloudwatch log events and forwards them to Splunk. The lambda will act as a subscriber to a Cloudwatch log stream. As a minimal example, with no failure handling or retry mechanism, the lambda code can be written in Python as follows:

import base64
import gzip
import json
import os

import requests


def handler(event, _context):
    hec_token = os.environ["HEC_TOKEN"]
    splunk_url = os.environ["SPLUNK_ADDR"]
    queue = []
    # Cloudwatch logs come through as Base 64 encoded, GZIP'd data.
    compressed_data = base64.b64decode(event["awslogs"]["data"])
    data = json.loads(gzip.decompress(compressed_data))
    for log in data["logEvents"]:
        queue.append(json.dumps({
            "event": log["message"],
            # HEC metadata keys sit alongside "event"; HEC expects epoch
            # seconds, while Cloudwatch timestamps are in milliseconds.
            "time": log["timestamp"] / 1000,
            "sourcetype": "httpevent"
        }))
    request_headers = {
        "Authorization": f"Splunk {hec_token}",
        "Content-Type": "application/x-www-form-urlencoded"
    }
    r = requests.post(f"{splunk_url}/services/collector/event/1.0",
                      data="".join(queue),
                      headers=request_headers)
    return r.json()
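The decode step can be exercised locally with a synthetic payload, without talking to Splunk or AWS. The helper names below are ours, introduced for illustration; they simply mirror the decoding logic inside the handler:

```python
import base64
import gzip
import json


def decode_log_events(event):
    """Decode the base64-encoded, gzip-compressed payload Cloudwatch delivers."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))["logEvents"]


def make_payload(log_events):
    """Build a synthetic Cloudwatch subscription event for local testing."""
    body = json.dumps({"logEvents": log_events}).encode()
    return {"awslogs": {"data": base64.b64encode(gzip.compress(body)).decode()}}


fake = make_payload([{"message": "hello", "timestamp": 1600000000000}])
events = decode_log_events(fake)
print(events[0]["message"])  # hello
```

Round-tripping a fabricated event like this is a cheap way to catch encoding mistakes before deploying the lambda.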

However, in this minimal example, the token required to communicate with Splunk is stored as an environment variable. We can do better by storing and accessing it through Vault!

Using Vault to manage generic secrets involves three parts:
1. Defining a policy that grants access to a particular secret path,
2. Defining a role to which policies can be attached,
3. Managing the secrets themselves.

In the following example, we use Vault’s key/value store to write the token we created earlier to the path `secret/splunk-hec`, and then read it back. This can be run as an administrator:

> vault kv put secret/splunk-hec http-token=f3b320d5-e6fa-4fa5-beb4-4dfd0f43289e
Success! Data written to: secret/splunk-hec
> vault read secret/splunk-hec
Key                 Value
---                 -----
refresh_interval    768h
http-token          f3b320d5-e6fa-4fa5-beb4-4dfd0f43289e

To allow our lambda to read from this path, we can use Vault’s AWS Auth method, which allows you to create a Vault role that is tied to an AWS IAM role or user identity.

In this example, our lambda will run with IAM role `lambda-cloudwatch-forwarder-role`. So once the AWS Auth method is enabled in Vault, we can create a policy and an AWS Auth role for that IAM role as follows:

> cat splunk-hec.hcl
path "secret/splunk-hec" {
  capabilities = ["read"]
}
> vault policy write splunk-hec-read splunk-hec.hcl
Success! Uploaded policy: splunk-hec-read
> vault write auth/aws/role/lambda-cloudwatch-forwarder-role auth_type=iam \
    bound_iam_principal_arn=arn:aws:iam::xxxxxxxxxxxxx:role/lambda-cloudwatch-forwarder-role \
    policies=splunk-hec-read

The AWS Auth role provides convenient plumbing: when a Vault client authenticates using the AWS Auth method, Vault matches the caller’s IAM role or user identity to the appropriate Vault role. In practical terms, when our lambda authenticates with Vault using its IAM role `lambda-cloudwatch-forwarder-role`, the policy `splunk-hec-read` is applied to that session.

The lambda can then be updated to use Vault’s hvac Python client library, which allows the function to authenticate and read the stored secret using the AWS credentials from its session:

import os

import boto3
import hvac


def handler(event, _context):
    session = boto3.Session()
    credentials = session.get_credentials()
    vault = hvac.Client(url=os.environ["VAULT_ADDR"])
    vault.auth_aws_iam(credentials.access_key,
                       credentials.secret_key,
                       credentials.token)
    hec_token = vault.read("secret/splunk-hec")["data"]["http-token"]
    # etc

Voila! The token can now be read dynamically, without being hard-coded or injected via the infrastructure. Obviously you don’t want to print the token value in the logs - that would defeat the whole purpose.
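One refinement worth noting (not part of the lambda above) is caching the secret across warm invocations, so Vault is not contacted for every batch of log events. A minimal sketch, assuming a `fetch_token` callable that performs the Vault read shown earlier:

```python
import time

# A module-level cache survives across warm lambda invocations.
_cache = {"token": None, "expires": 0.0}
TOKEN_TTL_SECONDS = 300  # re-read from Vault at most every five minutes


def get_hec_token(fetch_token, now=None):
    """Return a cached token, refreshing it via fetch_token() once the TTL lapses."""
    now = time.time() if now is None else now
    if _cache["token"] is None or now >= _cache["expires"]:
        _cache["token"] = fetch_token()
        _cache["expires"] = now + TOKEN_TTL_SECONDS
    return _cache["token"]


calls = []
fetch = lambda: calls.append(1) or "secret-token"
get_hec_token(fetch, now=0)   # first call hits Vault
get_hec_token(fetch, now=10)  # served from the cache
print(len(calls))  # 1
```

A short TTL keeps the rotation benefit: a rotated secret is picked up within minutes, while most invocations skip the network round trip entirely.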

And there are other benefits to using Vault in this manner:

- The policy can be revoked at any time.
- The secret can be rotated at any time, without any change to the lambda configuration.
- Other identities (such as developers) can be given access to the same secret path through their own policies.

Hopefully this simple example has shown the benefits of using Vault for secret management.

At Montoux we are continuing to investigate other uses of Vault, including One Time Passwords (OTP) for SSH access to avoid the key rotation and revocation dilemma, as well as dynamically generated AWS access credentials.
