
Data Generator

When developing data-intensive applications, we often need realistic test datasets that resemble production data. However, finding enough real data, or manually creating data with sufficient volume and variety, is hard.

The DataGenerator library uses annotated Apache Avro Schemas to generate random yet realistic datasets in JSON, Avro, and YAML output formats.


The Avro Schemas can be annotated with Data Faker and Spring Expression Language (SpEL) expressions to adapt the generated content to a particular use case or data model.

The library also lets you configure dependencies between fields, either within a single Schema or across different Schemas.
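To illustrate what a dependency between fields means, the sketch below derives an `email` value from an already generated `fullName`, so the two fields stay consistent within one record. It is plain Java with hypothetical names (`FieldDependencySketch`, `generateUser`) and a hard-coded name list standing in for Data Faker; it does not show how DataGenerator itself configures dependencies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;

public class FieldDependencySketch {

    // Hypothetical stand-in for a Data Faker name provider.
    private static final List<String> NAMES =
            List.of("Joeann Glover", "Louanne Kunze", "Winston Gutmann");

    // Builds one record in which "email" depends on the already generated "fullName".
    static Map<String, Object> generateUser(Random random) {
        Map<String, Object> user = new LinkedHashMap<>();
        String fullName = NAMES.get(random.nextInt(NAMES.size()));
        user.put("fullName", fullName);
        // Dependent field: derived from fullName instead of being generated independently.
        String localPart = fullName.toLowerCase(Locale.ROOT).replace(' ', '.');
        user.put("email", localPart + "@example.com");
        return user;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed for reproducible output
        for (int i = 0; i < 3; i++) {
            System.out.println(generateUser(random));
        }
    }
}
```

Generating the independent field first and computing the dependent one from it is the simplest way to keep related values coherent; the library's own mechanism is described in its full documentation.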

Quick Start

Add the data-generator dependency to your project:

<dependency>
  <groupId>com.logaritex.data</groupId>
  <artifactId>data-generator</artifactId>
  <version>0.0.3</version>
</dependency>

Create an Avro Schema annotated with Data Faker and/or SpEL expressions that hint at the desired content of each field:

user.yaml
namespace: io.simple.clickstream
type: record
name: User
fields:
  - name: id
    type: string
    doc: "#{id_number.valid}"   # (1)
  - name: sendAt
    type:
      type: long
      logicalType: timestamp-millis
    doc: "[[T(System).currentTimeMillis()]]" # (2)
  - name: fullName
    type: string
    doc: "#{name.fullName}"
  - name: email
    type: string
    doc: "#{internet.emailAddress}"
  - name: age
    type: int
    doc: "#{number.number_between '8','80'}"
1. Generate realistic random IDs using the Data Faker IdNumber provider.
2. Generate the current timestamp, using a Spring Expression Language (SpEL) expression to call the static Java method java.lang.System.currentTimeMillis().
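In the schema above, the two annotation styles are told apart by their delimiters: `#{...}` marks a Data Faker expression and `[[...]]` marks a SpEL expression. The sketch below models only that convention in plain Java; the `DocExpressionSketch` class, its `ExpressionKind` enum, and the `classify` method are hypothetical and not part of the DataGenerator API, which may parse `doc` values differently.

```java
public class DocExpressionSketch {

    enum ExpressionKind { FAKER, SPEL, NONE }

    // Hypothetical classifier for the doc annotations used in user.yaml.
    static ExpressionKind classify(String doc) {
        if (doc == null) {
            return ExpressionKind.NONE;
        }
        String trimmed = doc.trim();
        if (trimmed.startsWith("[[") && trimmed.endsWith("]]")) {
            return ExpressionKind.SPEL;   // e.g. [[T(System).currentTimeMillis()]]
        }
        if (trimmed.startsWith("#{") && trimmed.endsWith("}")) {
            return ExpressionKind.FAKER;  // e.g. #{name.fullName}
        }
        return ExpressionKind.NONE;       // ordinary documentation text
    }

    public static void main(String[] args) {
        String[] docs = {
            "#{id_number.valid}",
            "[[T(System).currentTimeMillis()]]",
            "#{number.number_between '8','80'}",
            "plain field description"
        };
        for (String doc : docs) {
            System.out.println(classify(doc) + " <- " + doc);
        }
    }
}
```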

Run the DataGenerator with the user.yaml schema to generate a few data instances:

Iterator<GenericData.Record> iterator = 
    new DataGenerator(
        DataUtil.uriToSchema("file:/user.yaml"), // (1)
        3) // (2)
    .iterator();

while (iterator.hasNext()) {
    System.out.println(iterator.next());
}
1. Initialize the generator with the user.yaml schema.
2. Number of instances to generate.
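The second constructor argument caps how many instances the iterator returns, which is why the `while (iterator.hasNext())` loop terminates. As a rough model of that contract (not the library's actual implementation), the hypothetical `BoundedGenerator` below wraps any `Supplier` and stops after a fixed count.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical model of an Iterable that yields a fixed number of generated values.
public class BoundedGenerator<T> implements Iterable<T> {

    private final Supplier<T> supplier;
    private final int count;

    public BoundedGenerator(Supplier<T> supplier, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("count must be >= 0");
        }
        this.supplier = supplier;
        this.count = count;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int produced = 0;

            @Override
            public boolean hasNext() {
                return produced < count; // stop once the requested number is reached
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                produced++;
                return supplier.get(); // generate a fresh value on every call
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Long> it = new BoundedGenerator<>(System::currentTimeMillis, 3).iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Because values are produced lazily on each `next()` call, nothing is generated up front; whether DataGenerator behaves the same way is an assumption here.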

The output should look similar to this (two of the three generated records shown):

{ 
  "id": "263-73-3809", 
  "sendAt": 1645529931141, 
  "fullName": "Mohammed Goldner V", 
  "email": "joeann.glover@hotmail.com", 
  "age": 78
},
{ 
  "id": "360-46-4449", 
  "sendAt": 1645529931181, 
  "fullName": "Ms. Winston Gutmann", 
  "email": "louanne.kunze@yahoo.com", 
  "age": 13
}
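The `sendAt` values use the Avro `timestamp-millis` logical type, that is, milliseconds since the Unix epoch (1970-01-01T00:00:00Z). A quick way to read them is `java.time.Instant`; the class name `SendAtDecoder` is only for illustration.

```java
import java.time.Instant;

public class SendAtDecoder {

    // Converts an Avro timestamp-millis value into a UTC Instant.
    static Instant decode(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    public static void main(String[] args) {
        long[] sendAtValues = {1645529931141L, 1645529931181L}; // taken from the sample output
        for (long value : sendAtValues) {
            // prints e.g. "1645529931141 -> 2022-02-22T11:38:51.141Z"
            System.out.println(value + " -> " + decode(value));
        }
    }
}
```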

See the Usage section for step-by-step guidelines.

Features

For the full documentation, visit logaritex/data-generator.
