Skip to content

History: HMA 1.0 (AWS HMA) -> HMA 2.0 (Open Media Match):

In March of 2024, we archived the original version of HMA, and replaced it with a full rewrite, which is what you'll find here. Below you can find our reasoning for doing so.

Predecessor project: Hasher-Matcher-Actioner 1.0

HMA 1.0 was an earlier effort to build an offering based on open source components that could be picked up and deployed in a turn-key fashion for the purposes of hashing and matching content against databases of known harmful material at scale. However, we got feedback from industry that the lack of flexibility and hard dependencies on Amazon Web Services for critical components prevented several potential users from being able to deploy it. Additionally, the high number of dependencies needed to provide the robust functionality have become a maintenance burden over time.

With Open Media Match (HMA 2.0), we propose to build a new, entirely open source offering of composable components. We will drop the UI element and focus entirely on the backend API, we will eschew as many dependencies as possible, including those to specific cloud environments like AWS in favor of a containerized solution, and aim to be as infrastructure-agnostic as possible.

Compare and contrast to HMA

HMA 1.0 (AWS Deployed)HMA 2.0 (Docker image only)
Heavy AWS dependencies: VPC, DynamoDB, Lambda, API Gateway, ...OSS dependencies only
Opinionated: only supported by included Terraform scriptsRuns in any Docker based infra
Service capabilities provided by AWS LambdaService capabilities provided by composable and repackageable components
AWS only, needs its own S3 buckets and DynamoDB tablesIntegrates anywhere Docker can be used, or under self-managed Python/Flask
Includes a GUI and APIAPI only
Scales automaticallyDoesn't auto-scale but can be auto-scaled

Motivations

Simpler setup and Time-to-Evaluation

While HMA 1.0 in theory was only a single terraform command to stand up an instance, in practice, getting the AWS infrastructure in place, and potentially jumping through many hoops to make it play nicely with your existing AWS infrastructure prevented evaluating the functionality to see if it was worth spending the time to complete an integration. We have found that many users want a way to evaluate the capabilities as fast as possible, which the Terraform and AWS dependencies are at odds with.

With HMA 2.0, we will specifically focus on this evaluation step, potentially allowing users to evaluate on a single machine - even a developer laptop.

Reduce barriers to adoption

The ThreatExchange team originally chose the AWS dependency as part of scaling targets (4,000 QPS hash lookups), for an initial industry partner that could use AWS.

Since release, while some industry and NGO partners have been able to deploy and use AWS HMA, other industry partners have given feedback that the hard AWS dependency is not reconcilable with their business and technical constraints on what third-party software they can run, and where they can run it.

With HMA 2.0, we are limiting the number of dependencies and removing reliance on cloud-specific infrastructure.

Facilitate external integrations

Hasher-Matcher-Actioner includes a GUI tool for interacting with the matching service and managing the hash database, but it's a standalone offering tightly coupled to HMA, and doesn't facilitate integration with either a client's internally-built tools or with an open source harmful-content management tool.

Improve the Python-ThreatExchange Interfaces

Python-ThreatExchange was envisioned as a compatibility layer for trust and safety teams, academics, and open source contributors to make sharing improvements easier. However, to deploy and use these systems are either too tightly bound with HMA (AWS) or the CLI, and so make selective and compositional use of the library difficult. The design of HMA 2.0 will aim to standardize the core HMA steps (Fetching, Storing, Indexing, Hashing, Matching, Recording review results) in a way that makes them easier to partially adopt.

See #1247 for more detail.

Released under the BSD License.