Centralized Inventory of Managed VMs in AWS

Sunil Kumar Mohanty
Fortum Technology Blog
5 min read · Apr 9, 2024

In any organization, there is a constant need to manage and operate VMs (virtual machines) centrally. Motivations include standardization, patch management, better cost management through reservations or savings plans, ensuring compliance, and listing instances that have certain applications installed.

I would personally prefer to transition away from VMs towards managed services or serverless architectures. That way we abstract away the infrastructure management and lower our operational overhead. However, despite our best efforts and good intentions, neither we nor the majority of organizations can completely avoid having at least some virtual machines to manage.

The predicament arises from legacy applications, managed services not supporting some use cases (e.g. TimescaleDB), ecosystem compatibility (integration with certain enterprise tooling), regulatory constraints, licensing considerations, and more. So the natural first step is to ensure we have an accurate, centralized repository of all VM configurations.

The AWS built-in service for configuration management is AWS Config.

AWS Config’s multi-account and multi-region data aggregation does help with the discoverability of VMs¹. However, its scope is limited to centralized configuration management: it looks at EC2 instances from the outside and sees their configuration and compliance posture. For example, it helps answer questions such as the following (one of them is sketched as a stored query after the list):

  • All EC2 instances in the organization
  • Publicly accessible EC2 instances
  • Instances with specific tags
  • Non-compliant instances
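
Config advanced queries use a SQL-like syntax and can be saved as stored queries. Below is a minimal Terraform sketch (the query text is illustrative, not from the original setup); run it against an aggregator for the organization-wide view.

resource "aws_config_stored_query" "running_instances" {
  name       = "running-ec2-instances"
  expression = <<-EOT
    SELECT accountId, awsRegion, resourceId, configuration.instanceType
    WHERE resourceType = 'AWS::EC2::Instance'
      AND configuration.state.name = 'running'
  EOT
}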

However, Config does not provide any information about an instance from the inside. For that, Systems Manager Inventory comes to the rescue.

Systems Manager Inventory — single account view

Systems Manager Inventory provides a wealth of metadata and configuration information about managed instances². The information collected by Inventory includes³:

  • Instance information: InstanceId, OS details, CPU details, etc.
  • Installed applications: applications installed through yum, apt, and the like
  • Network configuration: IP addresses, subnets
  • AWS components: details about installed AWS components such as the SSM Agent
  • Windows registry keys
  • Files: information about the filesystem
  • Custom inventory: custom metadata such as business name and contact details (a sample file follows this list)
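
For custom inventory, one option is to drop a small JSON file into the SSM Agent's custom inventory directory on the node (on Linux, typically /var/lib/amazon/ssm/<instance-id>/inventory/custom/). The type name must carry the Custom: prefix; the attribute names below are hypothetical:

{
  "SchemaVersion": "1.0",
  "TypeName": "Custom:OwnerInformation",
  "Content": {
    "BusinessName": "example-business-unit",
    "Contact": "platform-team@example.com"
  }
}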

Inventory works with any managed node, i.e., any machine that has the SSM Agent installed and can communicate with AWS Systems Manager. A managed node can therefore also be a VM from another cloud provider, an IoT device, or an on-premises server⁴.
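
Non-EC2 machines are registered through a hybrid activation; the resulting activation code and ID are used when installing the SSM Agent on the node. Below is a minimal Terraform sketch (the role name and registration limit are illustrative, not from the original setup):

resource "aws_iam_role" "hybrid_nodes" {
  name = "ssm-hybrid-activation"

  # Allow the Systems Manager service to assume this role on behalf of the nodes
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ssm.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "hybrid_nodes" {
  role       = aws_iam_role.hybrid_nodes.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_ssm_activation" "hybrid_nodes" {
  name               = "on-prem-servers"
  iam_role           = aws_iam_role.hybrid_nodes.id
  registration_limit = 10
  depends_on         = [aws_iam_role_policy_attachment.hybrid_nodes]
}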

Inventory for a managed node is collected by running the AWS-GatherSoftwareInventory SSM document (provided by AWS in all accounts) against the node. The State Manager feature of Systems Manager can then be used to associate this document with all instances and run it on a schedule.

A Terraform implementation is shown below. It collects inventory from all EC2 instances in an account once a day.

resource "aws_ssm_association" "this" {
name = "AWS-GatherSoftwareInventory"
targets {
key = "InstanceIds"
values = ["*"]
}
schedule_expression = "rate(1 day)"
}
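
Note that the targets block also accepts tag-based keys (e.g. key = "tag:Environment") if only a subset of instances should be inventoried.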

The solution above gathers inventory information but is limited to an individual account. To meet my ultimate goal of aggregating the data at the organization level, i.e. across multiple accounts and regions, enter Resource Data Sync.

Resource Data Sync for Inventory — AWS Organization view

Resource Data Sync is a powerful Systems Manager feature that collects inventory information from multiple AWS accounts and regions into an S3 bucket. It stores the raw data in JSON format, which can then be queried using Athena and integrated with QuickSight for visualization.

Resource Data Sync needs to be set up in every account and every region for this to work. The easiest way to implement that is with CloudFormation StackSets. Below is an example template for the stack:

AWSTemplateFormatVersion: 2010-09-09
Description: CloudFormation template to create resources for Systems Manager Inventory
Parameters:
  ResourceBucketName:
    Type: String
    Description: Name of the S3 bucket where the inventory data will be stored

Resources:
  ResourceDataSync:
    Type: AWS::SSM::ResourceDataSync
    Properties:
      SyncName: centralized-resource-data-sync
      SyncType: SyncToDestination
      S3Destination:
        BucketName: !Ref ResourceBucketName
        BucketRegion: !Ref 'AWS::Region'
        SyncFormat: JsonSerDe

  InventoryCollection:
    Type: AWS::SSM::Association
    Properties:
      AssociationName: software-inventory
      Name: AWS-GatherSoftwareInventory
      ScheduleExpression: "rate(1 day)"
      Targets:
        - Key: InstanceIds
          Values:
            - "*"

Next is the integration with Amazon Athena. The data exported by Resource Data Sync is in JSON format and is organized in a structure that lets Glue create partitions automatically. At the root of the bucket there is one folder per inventory type (AWS:Application, AWS:InstanceInformation, and so on), and each of these is further partitioned by account, region, and resource type. Below is an example of the detailed structure:

s3://example-inventory-bucket/AWS:Application/accountid=012345678912/region=us-west-1/resourcetype=ManagedInstanceInventory/

The code below is an example implementation of a Glue crawler using Terraform:

resource "aws_glue_catalog_database" "this" {
name = "ssminventory"
}

resource "aws_glue_crawler" "this" {
database_name = aws_glue_catalog_database.this.name
name = "inventory"
role = aws_iam_role.crawler_role.arn

s3_target {
exclusions = [
"**/test.json",
]
path = "s3://${var.resource_data_sync_bucket_id}"
}

configuration = jsonencode(
{
CrawlerOutput = {
Partitions = {
AddOrUpdateBehavior = "InheritFromTable"
}
}
CreatePartitionIndex = true
Version = 1
}
)

recrawl_policy {
recrawl_behavior = "CRAWL_EVERYTHING"
}

schema_change_policy {
delete_behavior = "LOG"
update_behavior = "LOG"
}

schedule = "rate(1 day)"
}

resource "aws_iam_role" "crawler_role" {

name = "inventory-crawler-role"
path = "/inventory/"
description = "Policy used by glue crawler to access resource data sync bucket"
assume_role_policy = data.aws_iam_policy_document.crawler_role_assume_role_policy.json
}

data "aws_iam_policy_document" "crawler_role_assume_role_policy" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["glue.amazonaws.com"]
}
}
}


data "aws_iam_policy_document" "glue_access_resource_data_sync" {
statement {
sid = "s3"
actions = [
"s3:GetObject",
"s3:PutObject"
]
resources = ["${var.resource_data_sync_bucket_arn}/*"]
effect = "Allow"
}
}

resource "aws_iam_role_policy" "glue_access_resource_data_sync" {
name = "access-resource-data-sync-bucket"
role = aws_iam_role.crawler_role.id
policy = data.aws_iam_policy_document.glue_access_resource_data_sync.json
}

resource "aws_iam_role_policy_attachment" "glue_service_role" {
role = aws_iam_role.crawler_role.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
}

Please note that Resource Data Sync automatically adds a test.json file to the above folders, which should be excluded from the scope of the crawler (hence the exclusions block above).

Unfortunately, the crawler above posed a problem for me while crawling the AWS:Application folder. The crawler creates a partition column called resourcetype because of the resourcetype=ManagedInstanceInventory folder, and the JSON files also contain a resourcetype field. As a result, the catalog table was created with a duplicate resourcetype column, leading to an error when querying from Athena. To address the issue, I created the aws_application table manually using Terraform and disabled Glue from updating the table schema, as sketched below.
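
Below is a trimmed sketch of such a table definition. The data columns follow the AWS:Application inventory schema (only a few are shown here); the important detail is that resourcetype appears only as a partition key and is deliberately left out of the data columns:

resource "aws_glue_catalog_table" "aws_application" {
  name          = "aws_application"
  database_name = aws_glue_catalog_database.this.name
  table_type    = "EXTERNAL_TABLE"

  parameters = {
    classification = "json"
  }

  # Partition columns created by the folder structure
  partition_keys {
    name = "accountid"
    type = "string"
  }
  partition_keys {
    name = "region"
    type = "string"
  }
  # Declared only here; the identically named JSON field is omitted below
  partition_keys {
    name = "resourcetype"
    type = "string"
  }

  storage_descriptor {
    location      = "s3://${var.resource_data_sync_bucket_id}/AWS:Application/"
    input_format  = "org.apache.hadoop.mapred.TextInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"

    ser_de_info {
      serialization_library = "org.openx.data.jsonserde.JsonSerDe"
    }

    columns {
      name = "name"
      type = "string"
    }
    columns {
      name = "version"
      type = "string"
    }
    columns {
      name = "resourceid"
      type = "string"
    }
  }
}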

Finally, I was able to query my complete managed VM inventory 🚀
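
With that in place, questions like "which instances have a given package installed, and at what version" become a single query. Below is a sketch saved as an Athena named query; the package name is only an example, and the lowercase column names assume Glue's crawled schema:

resource "aws_athena_named_query" "package_versions" {
  name     = "instances-with-openssl"
  database = aws_glue_catalog_database.this.name
  query    = <<-EOT
    SELECT accountid, region, resourceid, name, version
    FROM aws_application
    WHERE name = 'openssl'
    ORDER BY accountid, region
  EOT
}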
