Data Generator
When developing data-intensive applications, we often need realistic datasets for testing: datasets that resemble the data as it appears in production. But finding enough real data, or manually creating data of sufficient volume and variety, is hard.
The DataGenerator library uses annotated Apache Avro Schemas to help you generate random and yet realistic datasets, supporting JSON, Avro and YAML output formats.
The Avro Schemas can be annotated with Data Faker and Spring Expression Language (SpEL) expressions to adapt the generated content to any particular use-case or data model.
It also allows configuring dependencies between the fields of a single schema or of different schemas.
Quick Start
Add the data-generator dependency to your project:
<dependency>
    <groupId>com.logaritex.data</groupId>
    <artifactId>data-generator</artifactId>
    <version>0.0.3</version>
</dependency>
Create an Avro Schema (here saved as user.yaml) annotated with Data Faker and/or SpEL expressions that hint at the desired field content:
namespace: io.simple.clicksteram
type: record
name: User
fields:
  - name: id
    type: string
    doc: "#{id_number.valid}" # (1)
  - name: sendAt
    type:
      type: long
      logicalType: timestamp-millis
    doc: "[[T(System).currentTimeMillis()]]" # (2)
  - name: fullName
    type: string
    doc: "#{name.fullName}"
  - name: email
    type: string
    doc: "#{internet.emailAddress}"
  - name: age
    type: int
    doc: "#{number.number_between '8','80'}"
- (1) Generate realistic random IDs using Faker's IdNumber provider.
- (2) Generate a timestamp (now), using Spring Expression Language (SpEL) to call the Java static method java.lang.System.currentTimeMillis().
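The schema above uses two delimiter styles in the doc field: `#{...}` for Data Faker expressions and `[[...]]` for SpEL expressions. The following is a minimal sketch of how the two styles could be told apart, using only the JDK. The class `DocExpressionKinds`, the enum `Kind` and the method `classify` are hypothetical names invented for this illustration. They are not part of the data-generator API, and the library's own parsing may work differently.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the data-generator API.
final class DocExpressionKinds {

    enum Kind { FAKER, SPEL, PLAIN }

    // Matches a Faker-style directive such as #{name.fullName}.
    private static final Pattern FAKER = Pattern.compile("#\\{[^}]+\\}");

    // Matches a SpEL-style block such as [[T(System).currentTimeMillis()]].
    private static final Pattern SPEL = Pattern.compile("\\[\\[.+?\\]\\]");

    // Classifies a doc annotation by the delimiters it contains.
    // Assumption of this sketch: a SpEL block takes precedence when both styles appear.
    static Kind classify(String doc) {
        if (doc == null) {
            return Kind.PLAIN;
        }
        if (SPEL.matcher(doc).find()) {
            return Kind.SPEL;
        }
        if (FAKER.matcher(doc).find()) {
            return Kind.FAKER;
        }
        return Kind.PLAIN;
    }

    public static void main(String[] args) {
        System.out.println(classify("#{id_number.valid}"));                // FAKER
        System.out.println(classify("[[T(System).currentTimeMillis()]]")); // SPEL
        System.out.println(classify("A plain field description"));         // PLAIN
    }
}
```

Checking the SpEL delimiters first is an arbitrary choice made for this sketch; it only matters for doc strings that mix both styles.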
Run the DataGenerator with the user.yaml schema to generate a few data instances:
Iterator<GenericData.Record> iterator =
        new DataGenerator(
                DataUtil.uriToSchema("file:/user.yaml"), // (1)
                3) // (2)
                .iterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next());
}
- (1) Initialize the generator with the user.yaml schema.
- (2) Number of instances to generate.
The result should look like this:
{
  "id": "263-73-3809",
  "sendAt": 1645529931141,
  "fullName": "Mohammed Goldner V",
  "email": "joeann.glover@hotmail.com",
  "age": 78
},
{
  "id": "360-46-4449",
  "sendAt": 1645529931181,
  "fullName": "Ms. Winston Gutmann",
  "email": "louanne.kunze@yahoo.com",
  "age": 13
}
See the usage section for step-by-step guidelines.
Features
- Datasets are generated from and validated against well-formed Apache Avro Schemas.
- Annotate schema fields with Data Faker and Spring Expression Language (SpEL) expressions.
- Inter-field dependency - field values in a record can derive from or depend on each other.
- Instance uniqueness - enforce instance uniqueness based on a selected dataset record field.
- Shared dataset values - field values can be shared between different datasets.
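To make the inter-field dependency and instance uniqueness features more concrete, here is a minimal plain-Java sketch of the two ideas applied to records held as maps. It does not use or reproduce the library's implementation. The class `FeatureSketch`, its methods, and the `example.com` domain are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; none of these names come from the data-generator API.
final class FeatureSketch {

    // Inter-field dependency: derive the email from the already generated fullName.
    static Map<String, Object> withDerivedEmail(Map<String, Object> record) {
        Map<String, Object> copy = new LinkedHashMap<>(record);
        String fullName = String.valueOf(record.get("fullName"));
        String local = fullName.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", ".")   // collapse spaces and punctuation into dots
                .replaceAll("^\\.+|\\.+$", "");  // drop leading and trailing dots
        copy.put("email", local + "@example.com");
        return copy;
    }

    // Instance uniqueness: keep only the first record seen for each value of keyField.
    static List<Map<String, Object>> uniqueBy(String keyField, List<Map<String, Object>> records) {
        Set<Object> seen = new HashSet<>();
        List<Map<String, Object>> unique = new ArrayList<>();
        for (Map<String, Object> record : records) {
            if (seen.add(record.get(keyField))) {
                unique.add(record);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", "263-73-3809");
        first.put("fullName", "Mohammed Goldner V");

        Map<String, Object> duplicateId = new LinkedHashMap<>();
        duplicateId.put("id", "263-73-3809");
        duplicateId.put("fullName", "Ms. Winston Gutmann");

        // Only the first record survives, because both share the same id.
        for (Map<String, Object> record : uniqueBy("id", List.of(first, duplicateId))) {
            System.out.println(withDerivedEmail(record));
        }
    }
}
```

In the library itself these behaviors are configured through schema annotations rather than hand-written code; the sketch only shows what the resulting data looks like when one field is derived from another and duplicates on a key field are rejected.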
For full documentation visit logaritex/data-generator.