DataZen Documentation
DataZen User Guide

APPLY TX

Overview

The APPLY TX operation transforms a specific column containing a, XML, JSON, or CSV document and replaces the current pipeline data with the transformed data set. It is equivalent to the APPLY TX option available through the SELECT HTTP and SELECT DRIVE command, but can be applied anywhere in an ETL pipeline regardless of the original source of the data.

XML and JSON

When processing an XML or JSON document, the value specified in the TX command is interpreted as a modified JSON or XML Path, with support for inner-document JOIN operation. See the Document Path documentation for further details.

CSV

When processing a CSV document, the value specified in the TX command is parsed as a space-delimited set of key-value pairs separated by a column: key:value. Keys are not case-sensitive. The following keys and values are supported:

KeyValuesComments
headers0 or 1Indicates that a header line is present in the payload
skipan integer >= 0Skips N number of rows from the top of the content before parsing data
delimiterOne or more charactersMultiple characters may be provided when parsing content with more than one delimiter; use " " for space
quotedA single character, or 1 or 0A character to use as surrounding quotes; if 1 is provided, a double-quote is assumed; 0 means not quoted
trim0 or 1If 1, the content will be trimmed
fieldsa list of integersTo parse delimited files, provide a comma-separated list of integers without spaces (ex: 5,15,25 or "5, 15, 25")
flatten0 or 1When 1, flattens the content if line feeds are detected
commentscomma-separated list of stringsUsed to ignore a line when the characters are found at the start of a line (ex: --,#)

These additional formatting options are only available during write operations: bytes and date.

For example, this is a valid transformation that converts a flat file into rows and columns:

APPLY TX 'headers:1 delimiter:, quoted:1 trim:1' ON 'payload';

Syntax

Transforms an XML, JSON, CSV document from an existing field into rows and columns based on the path provided for each row found in the data pipeline. The resulting transformation becomes the new pipeline data set.

APPLY TX 
	'...'	-- the JSON/XML path or CSV options
ON  '...'	
{ WITH 
	RAWCOL '...'
}
;

PATH

The Json or XML Path that represents the node(s) to transform; this path supports the Document Path notation specified in the DataZen documentation; for CSV data, the file processing options to use (ex: headers:1 trim:1)

ON

The column name or column index to inspect containing the JSON or XML document

ROWCOL

When specified, represents the column name to add with the raw content for each row

Example 1


SELECT * FROM HTTP [Rss Feed] (GET /econi.xml);

-- Apply a JSON transformation on the payload column and add the original 
-- document used to create each row
APPLY TX (//item) ON 'payload' WITH RAWCOL '_raw';


Example 2


-- Call an ADP Endpoint returning demographics data
SELECT * FROM HTTP [ADP] (GET /worker-demographics?limit=100&select=workers/person/legalName);

-- Apply a document transformation on the payload column and add the original 
-- document used to create each row; 
-- The transformation joins multiple nodes within the document 
APPLY TX (workers || person || legalAddress) ON 'payload' WITH RAWCOL '_raw';