Introduction¶
Declarative Stream Mapping(DSM) is a stream deserializer library that makes parsing of XML and JSON easy. DSM allows you to make custom parsing, filtering, transforming, aggregating, grouping on any JSON or XML document at stream time(read only once). DSM uses yaml or json for configuration definitions
If you parsing a complex, huge file and want to have high performance and low memory usage then DSM is for you.
Simple Example¶
Lets Parse below simple JSON and XML file with DSM
File contents are taken from Swagger Petstore example. Slightly changed.
Source file
JSON
{
"id": 1,
"name": "Van Kedisi",
"status": "sold",
"createDate": "01/24/2019",
"category": {"id": 1,"name": "Cats"},
"tags": [
{"id": 1,"name": "Cute"},
{"id": 2,"name": "Popular"}
],
"photoUrls": ["url1","url2" ]
}
XML
<?xml version="1.0" encoding="UTF-8" ?>
<Pet id="1">
<name>Van Kedisi</name>
<status>sold</status>
<createDate>01/24/2019</createDate>
<category>
<id>1</id>
<name>Cats</name>
</category>
<tags>
<tag>
<id>1</id>
<name>Cute</name>
</tag>
<tag>
<id>2</id>
<name>Popular</name>
</tag>
</tags>
<photoUrls>
<photoUrl>url1</photoUrl>
<photoUrl>url2</photoUrl>
</photoUrls>
</Pet>
Those are rules that we want to apply during parsing.
- exclude “photoUrls” tag.
- read only “name” field of “tags” tag.
- read only “name” field of “category” tag.
- add new the “isPopular” field that it’s value is true, if “tag.name” has “Popular” value
DSM config file
[YAML]
params:
dateFormat: MM/dd/yyyy
result:
type:object
path: /
xml:
path: /Pet
filter: $self.data.status=='sold'
fields:
id:
dataType: int
xml:
attribute: true
name: string
status: status
createDate: date
category:
path: category/name
isPopular:
default: $self.data.tags.contains("Popular")
tags:
type:array
path: tags/name |tags/tag/name # this is a regex expression. works for both JSON and XML
Class to deserialize
[JAVA]
public class Pet {
private int id;
private String name;
private boolean isPopular;
private String status;
private String category;
private Date createDate;
private List<String> tags;
// getter/setter
}
Read Data
DSMBuilder builder = new DSMBuilder("dsm-config-file.yaml");
DSM dsm = builder.setType(DSMBuilder.XML).create();
Pet pet = dsm.toObject(new File("path/to/xmlFile.xml"),Pet.class); // read data from xml file
dsm = builder.setType(DSMBuilder.JSON).create();
pet = dsm.toObject(new File("path/to/jsonFile.json"),Pet.class); // read data from json file
Features¶
- Work for both XML and JSON
- Custom stream parsing
- Filtering by value on any field with very low cognitive complexity
- Flexible value transformation.
- Default value assignment
- Custom function calling during parsing
- **Powerful Scripting**(Apache JEXL, Groovy, Javascript and other jsr223 implementations are supported)
- Multiple inheritance between DSM config file (DSM file can extends to another config file)
- Reusable fragments support
- Very short learning curve
- Memory and CPU efficient
- Partial data extraction from JSON or XML
- String manipulation with expression
Installation¶
Maven
Jackson
<dependency>
<groupId>com.github.mfatihercik</groupId>
<artifactId>dsm</artifactId>
<version>1.0.4</version>
</dependency>
Gradle
Jackson
compile ('com.github.mfatihercik:dsm:1.0.4')
Sample Config File¶
Detailed documentation and all option is here.
This config file contains some possible option and their short description.
[header.yaml]
params:
dateFormat: MM/dd/yyyy # define date format for "date" data type
transformations:
SOLD_STATUS: # value transformation for "isAvailable" property
map:
sold: false
pending: false
available: true
DEFAULT: false
SOLD_STATUS_SKIP:
$ref: $transformations.SOLD_STATUS # extends to "SOLD_STATUS" transformation.
map:
DEFAULT: exclude # exclude default value
onlyIfExist: # make transformation only source value exist in transformation map other wise return as it is
functions:
insertPet: com.example.InsertPet # declare a function to declare at Parsing Element.
fragments: # create reusable fragment
category:
type:object
fields:
id: int
name: string
type: string
[main.yaml]
$extends: header.yaml # extends to header.yaml config.
result:
type:array # result is an array
path: / | /Pets/Pet # start reading form beginning for json. path is a regex. we can define both for xml and json same time. or we can declare for xml in XML field.
xml:
path: /Pets/Pet # start reading from /Pets/Pet for xml
filter: $self.data.isAvailable # filter by "isAvailable" property. "self" key word refers to current Node. self.parent refers to parent Node. self.data refers to current node data
function: insertPet # call "insertPet" function for every element of "result" array
fields:
name: string # read name as string.
id:
dataType: int # read id as int
xml:
attribute: true # id is an attribute on /Pets/Pet tag.
createDate: date # use dateFormat in params then convert string to date
isAvailable:
path: status # read isAvailable as string from "status" tag
dataType: boolean
transformationCode: SOLD_STATUS # user "SOLD_STATUS" transformation to map from "status" to "isAvailable"
category:
$ref: $fragments.category # extends to "fragment.category"
fields:
type: exclude # exclude "type" field from "category" fragment
name:
default: 'Animal' #set default value to 'Animal' if "category/name" tag not exist in source document
isPopular:
default: $self.data.tags.contains("Popular") # set default value of "isPopular" property
tags:
type:array
path: tags/name
filter: $value.length>15 # filter by length of value.
xml:
path: tags/tag/name