APPLY TX

Overview

The APPLY TX operation transforms a specific column containing a, XML, JSON, or CSV document and replaces the current pipeline data with the transformed data set. It is equivalent to the APPLY TX option available through the SELECT HTTP and SELECT DRIVE command, but can be applied anywhere in an ETL pipeline regardless of the original source of the data.

XML and JSON

When processing an XML or JSON document, the value specified in the TX command is interpreted as a modified JSON or XML Path, with support for inner-document JOIN operation. See the Document Path documentation for further details.

CSV

When processing a CSV document, the value specified in the TX command is parsed as a space-delimited set of key-value pairs separated by a column: key:value. Keys are not case-sensitive. The following keys and values are supported:

Key	Values	Comments
headers	0 or 1	Indicates that a header line is present in the payload
skip	an integer >= 0	Skips N number of rows from the top of the content before parsing data
delimiter	One or more characters	Multiple characters may be provided when parsing content with more than one delimiter; use " " for space
quoted	A single character, or 1 or 0	A character to use as surrounding quotes; if 1 is provided, a double-quote is assumed; 0 means not quoted
trim	0 or 1	If 1, the content will be trimmed
fields	a list of integers	To parse delimited files, provide a comma-separated list of integers without spaces (ex: 5,15,25 or "5, 15, 25")
flatten	0 or 1	When 1, flattens the content if line feeds are detected
comments	comma-separated list of strings	Used to ignore a line when the characters are found at the start of a line (ex: --,#)

These additional formatting options are only available during write operations: bytes and date.

For example, this is a valid transformation that converts a flat file into rows and columns:

APPLY TX 'headers:1 delimiter:, quoted:1 trim:1' ON 'payload';

Syntax

Transforms an XML, JSON, CSV document from an existing field into rows and columns based on the path provided for each row found in the data pipeline. The resulting transformation becomes the new pipeline data set.

APPLY TX 
	'...'	-- the JSON/XML path or CSV options
ON  '...'	
{ WITH 
	RAWCOL '...'
}
;

PATH	The Json or XML Path that represents the node(s) to transform; this path supports the Document Path notation specified in the DataZen documentation; for CSV data, the file processing options to use (ex: headers:1 trim:1)
ON	The column name or column index to inspect containing the JSON or XML document
ROWCOL	When specified, represents the column name to add with the raw content for each row

Example 1


SELECT * FROM HTTP [Rss Feed] (GET /econi.xml);

-- Apply a JSON transformation on the payload column and add the original 
-- document used to create each row
APPLY TX (//item) ON 'payload' WITH RAWCOL '_raw';

Example 2


-- Call an ADP Endpoint returning demographics data
SELECT * FROM HTTP [ADP] (GET /worker-demographics?limit=100&select=workers/person/legalName);

-- Apply a document transformation on the payload column and add the original 
-- document used to create each row; 
-- The transformation joins multiple nodes within the document 
APPLY TX (workers || person || legalAddress) ON 'payload' WITH RAWCOL '_raw';