Greenmask Blog

Automating Test Data Management with Greenmask and OpenEverest

Mon, 09 Mar 2026 00:00:00 GMT

Introduction

This article shows how Greenmask and OpenEverest can be combined to build a cloud-native Test Data Management (TDM) workflow. Greenmask anonymizes production database dumps, while OpenEverest automates provisioning and lifecycle management of database clusters on Kubernetes. Together they enable teams to quickly spin up staging environments populated with safe, production-like data. This approach helps developers validate integrations faster while maintaining data privacy and compliance.

Greenmask's Core Mission

From the very beginning, the core idea behind Greenmask has been to give users a convenient way to create test data for development and testing environments.

The Greenmask CLI performs this task well and provides a wide range of functionality, enabling teams to implement different approaches to Test Data Management (TDM). However, much of the surrounding automation has traditionally remained the user's responsibility: scheduling jobs, taking regular dumps, delivering and configuring staging datasets, performing semantic analysis of data, and maintaining transformation configurations were typically handled by the teams adopting the tool.

Expanding Greenmask into a Platform

Over the past year, the Greenmask team has focused on improving the platform's extensibility. This effort resulted in a new internal framework and MySQL support, introduced as part of the services layer. The goal of Greenmask v1 is to provide a versatile foundation that simplifies adding support for new DBMSs and extending Greenmask with new features.

This step allowed us to move further toward building a broader platform around Greenmask.

Our goal is to support Dynamic Staging Environments, making it easier to provision realistic testing environments with production-like data. That is why we started building a cloud-native, API-first Greenmask platform.

At the same time, the Greenmask CLI will remain fully available, allowing users to continue using it as a standalone tool or as part of the larger platform.

Why OpenEverest Is a Natural Fit

Provisioning databases and managing them throughout their lifecycle is a complex challenge. This is exactly the problem the OpenEverest team has been solving. OpenEverest is the first open-source platform for automated database provisioning and lifecycle management. It supports multiple database technologies and can be deployed on any Kubernetes infrastructure — whether in the cloud or on-premises.

OpenEverest is evolving toward a modular architecture, where databases, storage systems, and other technologies are implemented as plugins. In the near future, we expect to see support for technologies such as ClickHouse, Vitess, DocumentDB, and Valkey, along with integrations with Prometheus and other ecosystem tools.

Toward a Seamless TDM Integration

Because of this strong alignment, we are starting work on a Test Data Management solution that integrates seamlessly with the OpenEverest ecosystem. Our goal is to make Greenmask a first-class provisioning method inside OpenEverest, allowing teams to spin up staging databases populated with anonymized, production-like data as easily as selecting an option during cluster creation.

At the same time, we want to deliver value to users today. That's why we prepared a collaborative blog post with Sergey Pronin (founder of Solanica.io):

👉 Anonymizing Data with Greenmask and OpenEverest

How the Greenmask and OpenEverest Flow Works

OpenEverest manages production and staging databases, while Greenmask anonymizes production data to safely populate staging environments.

The article demonstrates how Greenmask can already be used to implement Test Data Management workflows within the OpenEverest ecosystem.
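To make the division of responsibilities concrete, here is a toy Python model of the flow. It does not call any OpenEverest or Greenmask API: the functions `dump`, `anonymize` and `provision_staging` are hypothetical stand-ins for taking a production dump, applying Greenmask transformers, and restoring the result into a staging cluster provisioned by OpenEverest. Databases are modeled as in-memory dictionaries purely for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory model: table name -> list of rows.
Row = Dict[str, object]
Database = Dict[str, List[Row]]
Rules = Dict[str, Dict[str, Callable[[object], object]]]  # table -> column -> transformer


def dump(production: Database) -> Database:
    """Stand-in for a production dump: copies every row so production stays untouched."""
    return {table: [dict(row) for row in rows] for table, rows in production.items()}


def anonymize(snapshot: Database, rules: Rules) -> Database:
    """Stand-in for Greenmask: apply per-column transformers, pass other columns through."""
    result: Database = {}
    for table, rows in snapshot.items():
        column_rules = rules.get(table, {})
        result[table] = [
            {col: column_rules[col](val) if col in column_rules else val for col, val in row.items()}
            for row in rows
        ]
    return result


def provision_staging(data: Database) -> Database:
    """Stand-in for restoring into a staging cluster created by OpenEverest."""
    return {table: list(rows) for table, rows in data.items()}


production: Database = {"users": [{"id": 1, "email": "alice@example.com"}]}
rules: Rules = {"users": {"email": lambda _: "user@example.invalid"}}
staging = provision_staging(anonymize(dump(production), rules))
# staging["users"] == [{"id": 1, "email": "user@example.invalid"}]; production is unchanged
```

Placing the anonymization step between the dump and the restore means raw production values never reach the staging side, which is the point of putting Greenmask between the two databases.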

Why TDM Matters for AI-Driven Development

In our view, Test Data Management capabilities are becoming increasingly important in the context of the rapidly growing adoption of AI in software development. The faster a developer — or an AI agent — can spin up a complete test environment composed of multiple services and databases, and roll it back when needed, the faster hypotheses and integrations can be validated.

Accelerating validation directly accelerates development. Automated, safe access to realistic datasets will become a critical component of this workflow.

A Step Toward Dynamic Staging Environments

This collaboration demonstrates how combining database lifecycle automation with data anonymization and transformation enables teams to safely work with realistic production data in development environments.

We believe that integrating Greenmask with OpenEverest is a natural step toward building a fully automated and secure Dynamic Staging Environment (DSE) workflow for modern cloud-native infrastructure.

Greenmask: The Ultimate Solution for Synthetic Data and Privacy

Tue, 28 Jan 2025 00:00:00 GMT

As discussed in Database Anonymization: The Basics, data anonymization is inherently complex and is typically approached in one of two ways, or a combination of both: transforming existing data (anonymization) and synthetic data generation. Both require a tool equipped with essential features such as data transformation, database schema dumping, and database subsetting. In this article, we will explore some key features of Greenmask and highlight the use cases where they can be applied effectively.

Greenmask is an open-source core utility designed as an extensible tool built on top of vendor-specific dump utilities, such as pg_dump for PostgreSQL. One of the primary goals set by the Greenmask engineering team is to maintain reliability comparable to that of vendor utilities. Instead of independently generating database schema dumps (e.g., CREATE TABLE statements), Greenmask delegates this task to the vendor utilities. This approach avoids the challenges of maintaining compatibility with all major database versions, whether it's MySQL, PostgreSQL, or others.

Consider a scenario where a major database release changes the table definition syntax. Supporting such changes independently would require continuous updates. By relying on vendor utilities, which are inherently reliable for schema dumping, Greenmask can focus exclusively on data dumping and anonymization and deliver the best possible results in that area.
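The division of labor can be sketched as follows for PostgreSQL. This is an illustration, not Greenmask's actual code: the helper names are hypothetical, and the data side assumes a per-table `COPY ... TO STDOUT` stream that a transformer could process row by row. Only the schema part is handed to `pg_dump`, through its standard `--schema-only` option.

```python
from typing import List, Tuple


def schema_dump_command(dsn: str, out_file: str) -> List[str]:
    """Schema objects (CREATE TABLE, indexes, constraints) are left to the vendor tool."""
    return ["pg_dump", "--schema-only", f"--dbname={dsn}", f"--file={out_file}"]


def data_dump_statements(tables: List[str]) -> List[str]:
    """Data is read table by table so each row can be transformed before it is written."""
    return [f"COPY {table} TO STDOUT" for table in tables]


def plan_dump(dsn: str, tables: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical planner: one vendor command for the schema, one stream per table."""
    return schema_dump_command(dsn, "schema.sql"), data_dump_statements(tables)


schema_cmd, data_stmts = plan_dump("postgresql://localhost/app", ["public.users", "public.orders"])
# schema_cmd could be run with subprocess.run(schema_cmd, check=True) on a host with pg_dump installed.
```

When a new major version changes DDL syntax, only `pg_dump` has to know about it; the data path stays the same.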

Greenmask is extensible and offers a variety of features, but there are a few key ones we want to highlight.

Database subset

Greenmask allows you to define subset conditions for filtering data during the dump process. This feature is particularly useful when you need to extract only a specific part of the database, such as a single table or a group of tables. It automatically ensures data consistency by including all related data from other tables necessary to maintain the integrity of the subset. Greenmask is also capable of handling circular references in database schemas, even in complex cases where multiple cycles exist within a strongly connected component.
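The consistency guarantee can be illustrated with a small Python sketch. It is not Greenmask's implementation; it only shows the underlying idea under simplifying assumptions: tables are in-memory dictionaries keyed by an `id` primary key, foreign keys are listed as hypothetical `(child table, child column, parent table)` tuples, and parent rows are pulled in by following references from child to parent until nothing new is found. The visited check is what keeps the walk finite when the schema contains cycles.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set, Tuple

Rows = Dict[int, dict]                 # primary key "id" -> row
ForeignKey = Tuple[str, str, str]      # (child table, child column, parent table)


def subset(
    data: Dict[str, Rows],
    fks: List[ForeignKey],
    root: str,
    condition: Callable[[dict], bool],
) -> Dict[str, Set[int]]:
    """Select root rows matching `condition`, then add every parent row they
    reference, directly or transitively, so that no foreign key dangles."""
    selected: Dict[str, Set[int]] = {table: set() for table in data}
    queue: Deque[Tuple[str, int]] = deque()
    for pk, row in data[root].items():
        if condition(row):
            selected[root].add(pk)
            queue.append((root, pk))
    while queue:
        table, pk = queue.popleft()
        row = data[table][pk]
        for child, column, parent in fks:
            if child != table or row.get(column) is None:
                continue
            parent_pk = row[column]
            # Skipping already-selected rows also makes the walk terminate on cycles.
            if parent_pk not in selected[parent]:
                selected[parent].add(parent_pk)
                queue.append((parent, parent_pk))
    return selected


# users.manager_id is a self-reference, so users 1 and 2 form a cycle.
data = {
    "users": {
        1: {"id": 1, "manager_id": 2},
        2: {"id": 2, "manager_id": 1},
        3: {"id": 3, "manager_id": None},
    },
    "orders": {10: {"id": 10, "user_id": 1}, 11: {"id": 11, "user_id": 3}},
}
fks = [("orders", "user_id", "users"), ("users", "manager_id", "users")]
result = subset(data, fks, "orders", lambda row: row["id"] == 10)
# result == {"users": {1, 2}, "orders": {10}}
```

Order 11 and user 3 are left out because nothing selected references them, while user 2 is included only because user 1 points to it as a manager.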

Deterministic transformers

Deterministic transformers use hash functions to ensure consistent output for the same input, providing reliability and repeatability. Most transformers support both random and hash-based engines, offering flexibility to suit a wide range of use cases.
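A minimal sketch of the two engines, assuming a toy transformer that replaces a first name with one from a fixed list. The hash engine derives the output from a keyed SHA-256 (HMAC) of the input, so the same value always maps to the same replacement; the random engine ignores the input. The function names, the name list, and the secret `salt` are hypothetical and do not reflect Greenmask's internal hashing scheme.

```python
import hashlib
import hmac
import random

FIRST_NAMES = ["Alex", "Bailey", "Casey", "Drew", "Emery", "Finley"]  # hypothetical dictionary


def hash_engine_name(value: str, salt: bytes) -> str:
    """Deterministic: the same (salt, value) pair always yields the same name."""
    digest = hmac.new(salt, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


def random_engine_name(rng: random.Random) -> str:
    """Non-deterministic: the output depends only on the RNG state, not on the input."""
    return rng.choice(FIRST_NAMES)


salt = b"keep-this-secret"
# "Alice" gets the same replacement in every table and every dump made with this salt.
alice_masked = hash_engine_name("Alice", salt)
```

Keeping the salt secret matters: without a key, anyone could hash a list of candidate names and reverse the mapping.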

Dynamic parameters

Most transformers support dynamic parameters, enabling them to adapt based on table column values. This feature is particularly useful for managing dependencies between columns and ensuring constraints are handled effectively.
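As an illustration, consider an `updated_at` column that must never precede `created_at`. Below is a Python sketch assuming a hypothetical `random_date_after` transformer whose lower bound is read from another column of the same row at transformation time instead of being fixed in the configuration.

```python
import random
from datetime import datetime, timedelta


def random_date_after(row: dict, column: str, min_from: str, max_days: int, rng: random.Random) -> dict:
    """Return a copy of `row` where `column` is a random timestamp in
    [row[min_from], row[min_from] + max_days]. The lower bound comes from another
    column of the same row, which is what makes the parameter dynamic."""
    lower: datetime = row[min_from]
    offset = timedelta(seconds=rng.randint(0, max_days * 86_400))
    return {**row, column: lower + offset}


row = {"created_at": datetime(2024, 1, 1), "updated_at": datetime(2020, 6, 1)}
masked = random_date_after(row, "updated_at", "created_at", max_days=30, rng=random.Random(42))
```

With a static lower bound, the generated `updated_at` could land before `created_at` and violate a CHECK constraint or the application's own invariants.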

Transformation validation and easy maintenance

Greenmask provides validation warnings, data transformation diffs, and schema diffs during configuration, enabling effective monitoring and maintenance of transformations. The schema diff feature is particularly useful for preventing data leakage when the schema changes. We understand that software and data do not exist in a vacuum—they continuously evolve throughout the software lifecycle. To address this, Greenmask is not just a tool but a comprehensive process that allows you to validate and review changes before applying them in untrusted or testing environments.
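The schema diff check can be reduced to a set comparison. The sketch below is an assumption about how such a check might look, not Greenmask's diff output format: schemas are hypothetical `table -> set of columns` maps, and any column that appeared since the last reviewed schema and is neither transformed nor explicitly approved is reported as a potential leak.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]  # table -> column names


def uncovered_columns(previous: Schema, current: Schema, reviewed: Schema) -> List[str]:
    """List columns added since `previous` that are not in `reviewed`
    (reviewed = transformed, or explicitly approved as non-sensitive)."""
    findings: List[str] = []
    for table, columns in sorted(current.items()):
        added = columns - previous.get(table, set())
        for column in sorted(added - reviewed.get(table, set())):
            findings.append(f"{table}.{column}")
    return findings


previous = {"users": {"id", "email"}}
current = {"users": {"id", "email", "phone"}, "audit_log": {"id", "ip_address"}}
reviewed = {"users": {"email"}}
warnings = uncovered_columns(previous, current, reviewed)
# warnings == ["audit_log.id", "audit_log.ip_address", "users.phone"]
```

Running such a check before every dump turns a silent schema change into an explicit review item.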

Transformation inheritance

Greenmask supports transformation inheritance for partitioned tables and tables with foreign keys. You can define a transformation once and apply it to all related tables that reference it. If your tables do not have foreign keys, you can define virtual ones to achieve the same functionality.
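Inheritance along foreign keys can be sketched as a fixed-point propagation over the key graph. The rule format, the transformer name `hash_int`, and the `(child table, child column, parent table, parent column)` tuples are hypothetical; a virtual foreign key would simply be one more tuple in the list. For joins to survive, the inherited transformer must be deterministic, so that parent and child keys receive the same replacement values.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]                 # (table, column)
ForeignKey = Tuple[str, str, str, str]   # (child table, child column, parent table, parent column)


def inherit_transformers(rules: Dict[Column, str], fks: List[ForeignKey]) -> Dict[Column, str]:
    """Copy the transformer defined on a referenced column to every column that
    references it, repeating until no new assignment appears."""
    result = dict(rules)
    changed = True
    while changed:
        changed = False
        for child, child_col, parent, parent_col in fks:
            source, target = (parent, parent_col), (child, child_col)
            if source in result and target not in result:
                result[target] = result[source]
                changed = True
    return result


rules = {("users", "id"): "hash_int"}
fks = [
    ("profiles", "user_id", "users", "id"),
    ("avatars", "profile_user_id", "profiles", "user_id"),  # e.g. a virtual foreign key
]
inherited = inherit_transformers(rules, fks)
```

The rule is written once on `users.id`; `profiles.user_id` and `avatars.profile_user_id` receive it transitively.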

Database type safety

Greenmask ensures data integrity by validating data and using the database driver for encoding and decoding, which preserves accurate data formats. Services and utilities that change data without validation can produce errors during restoration, such as a timestamp mistakenly inserted into an integer field. Greenmask avoids such issues because its transformers use the database driver to encode and decode data, ensuring reliable, on-the-fly transformations.
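The round-trip check can be sketched in Python, with plain `int` and `datetime.fromisoformat` acting as hypothetical stand-ins for the database driver's decoders. The point is the order of operations: decode the raw value into a typed value, transform it, encode it, and verify that the encoded result still decodes for the column's type before writing it out.

```python
from datetime import datetime
from typing import Callable, Dict

# Hypothetical per-type decoders standing in for the database driver.
DECODERS: Dict[str, Callable[[str], object]] = {
    "integer": int,
    "timestamp": datetime.fromisoformat,
    "text": str,
}


def transform_value(raw: str, column_type: str, transformer: Callable[[object], object]) -> str:
    """Decode, transform, encode, then re-decode to validate against the column type."""
    decode = DECODERS[column_type]
    new_value = transformer(decode(raw))
    encoded = new_value.isoformat() if isinstance(new_value, datetime) else str(new_value)
    decode(encoded)  # raises ValueError if the output no longer fits the column type
    return encoded


ok = transform_value("2024-01-01T10:30:00", "timestamp", lambda ts: ts.replace(minute=0))
# ok == "2024-01-01T10:00:00"
# transform_value("41", "integer", lambda v: datetime(2024, 1, 1)) raises ValueError
```

Failing at dump time is the cheap option; the alternative is discovering the mismatch only when the restore into staging breaks.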

Conclusion

There are many additional features that can be applied to various use cases. You can explore them in detail in our comprehensive documentation. Greenmask is an excellent choice if you're looking for a unified tool that not only covers nearly every technical aspect but also provides a clear process for maintaining database anonymization and generating synthetic data. Don't hesitate to test your innovative ideas using our playground, which can be easily deployed locally with Docker Compose.

Database Anonymization: The Basics

Fri, 17 Jan 2025 00:00:00 GMT

The strategy of modern software development focuses on delivering high-quality products as quickly and efficiently as possible. Achieving this goal requires a well-structured development process supported by high-quality data. Often, data must be shared with third-party vendors, such as outsourcing companies. Many organizations maintain separate staging environments for testing, development, and pre-production. However, the closer these environments mirror production, the greater the risk of data breaches due to the increased sensitivity of the data involved.

Introduction

To generate high-quality data for testing and development while minimizing the risk of data breaches, organizations often use anonymized data or synthetic data generation. Anonymized data is a transformed version of the original data that retains its usability for testing, development, and analysis. Synthetic data, on the other hand, is generated independently of the original records and is often used for AI training and other purposes.
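The difference can be shown with two small Python functions over a hypothetical `users` record. `anonymize` starts from a real record and replaces only the identifying field (here with a keyed hash, an assumption made for illustration), while `synthesize` produces records without reading any original data.

```python
import hashlib
import hmac
import random
from typing import Dict, List

Record = Dict[str, object]


def anonymize(record: Record, key: bytes) -> Record:
    """Derived from an original record: structure and non-sensitive fields are kept."""
    token = hmac.new(key, str(record["email"]).encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return {**record, "email": f"{token}@example.invalid"}


def synthesize(n: int, rng: random.Random) -> List[Record]:
    """Generated from scratch: no original record is read."""
    return [
        {"id": i, "email": f"user{i}@example.invalid", "balance": rng.randint(0, 10_000)}
        for i in range(1, n + 1)
    ]


original = {"id": 7, "email": "alice@example.com", "balance": 120}
masked = anonymize(original, key=b"secret")
generated = synthesize(3, random.Random(0))
```

The anonymized record keeps the real balance and id, which preserves realistic distributions; the synthetic records carry no link to real customers at all.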

Real-world examples

Let's explore some real-world examples where anonymized and synthetic data prove to be both useful and beneficial:

Outsourcing service

One of the critical challenges in the software development industry is the ability to deliver high-quality products on time. This is especially true for companies that outsource their software development projects, because outsourcing often means sharing sensitive data with third-party vendors. To mitigate the risk of data breaches, companies often deploy numerous barriers to control the actions of outsourcers. Jump hosts are one of the most common barriers used to control access to sensitive data. However, this approach can be cumbersome and time-consuming, leading to delays in project delivery.

To address this challenge, companies often optimize their development approach by setting up a staging environment that meets all regulatory requirements. A staging environment that closely resembles production allows the development team to work with anonymized data, reducing the risk of data breaches while maintaining operational efficiency. This approach enables companies to streamline their development process, shorten project delivery times, and improve the overall quality of their products.

The benefits of using a prepared staging environment include:

Optimized Task Allocation: Minimize the dependency on client-authorized personnel for specific tasks, enabling a more flexible and efficient team structure.
Lower Resource Reservation: Reduce the need to reserve authorized personnel by allowing non-authorized team members to handle appropriate tasks.
Reduced Rework: Decrease the likelihood of rework by enabling testing on data that closely resembles real-world scenarios.
Efficient Scaling: Unlock resources through tools like Greenmask, supporting project scaling without requiring additional financial investments.

This approach can be beneficial for both outsourcing companies and the organizations that use their services. For outsourcing providers, it enables smoother collaboration with clients and reduces delays caused by restricted access. For companies leveraging outsourcing services, it minimizes risks, ensures secure data handling, and enhances the efficiency of outsourced projects.

How can we organize a staging environment that closely resembles production while minimizing the risk of data breaches? The answer lies in anonymizing sensitive data or generating synthetic data.

Fintech company

FinTech companies often face a unique challenge when addressing fraud detection using production or production-like data. For instance, analysts may need to identify patterns within the data but are restricted from accessing Personally Identifiable Information (PII) while still fulfilling their responsibilities.

When working with real, non-anonymized, and uncontrolled data, the time required for approvals and access increases significantly. Direct access to such data almost always requires frequent approvals and carries the risk of data breaches.

Another case arises when insufficient or incomplete test data fails to reveal bugs during development and debugging, and those bugs can later lead to misuse. This is where a well-organized staging environment becomes essential.

In such scenarios, anonymized data becomes a viable solution. By applying transformations to the data, it can be securely shared with employees to complete their tasks without compromising sensitive information. A tool that automates and facilitates this transformation process can significantly enhance efficiency and security, and Greenmask effectively addresses these challenges.

Conclusion

There are numerous examples where anonymized and synthetic data prove invaluable for organizations. Establishing a well-organized staging environment, coupled with the right tools to support the entire software development lifecycle, is critical for safeguarding sensitive data while maintaining development efficiency. Virtually every aspect of the software development industry can benefit from properly structured staging environments and improved data accessibility.
