Tips on how to use HTTP Paging Settings
Overview
Configuring HTTP Paging can be tricky depending on the source system as there are not standards established. Some APIs will use different paging options depending on the operation used or the version of the API, while others may provide incomplete information. The following documentation provides a summary of all available paging options typically available and how to use them with SQL CDC.
Implementing HTTP Paging in DataZen allows data teams to implement a key performance technique using an inner APPLY PIPELINE operation; when performed as part of a SELECT operation, the inner APPLY PIPELINE command operates for each page of data, allowing teams to optimize memory resources for complex results.
There are two broad categories of paging implementations:
- Using URL Query Parameters: These implementations are a bit easier because you can quickly see the impact of changing paging values in the debug output
- Using a Payload with a POST operation: This approach involves modifying the payload progressively to fetch the next batch of data
Further more, the following four paging strategies are available with SQL CDC:
- Link: This approach involves finding a URL link in the response payload to use as the next request, as-is
- Offset: This approach involves keeping a count of records returned so far and use a DataZen paging marker to send the next starting count
- Page: This approach involves keeping the current page index along with a count of records returned so far and use a DataZen paging marker to send the next starting count or page index as needed
- Token: Also called a bookmark strategy, this approach involves finding a value, token, or reference number in the response payload to be used by the next request
The best way to know for sure which paging strategy to use is by inspecting a raw HTTP response first, and leverage the
MAX_DEBUG_PAGES setting to validate the paging options are working.
Paging Markers Matrix
Once you know which paging strategy to use, implementing the strategy varies depending on whether the paging implementation uses a Query Parameter or an HTTP Payload. You can always use the various paging markers below, but in a few cases these markers can be omitted as DataZen will have enough information to implement the paging automatically. Use the following matrix to determine which paging markers to use depending on the scenario:
PAGING |
Query Parameter | HTTP Payload |
|---|---|---|
| Link | PAGING_PARAM: ignoredPAGING_PATH: required
The Json or XML Path to the next link to use; if the path is not found, its value is
empty, or the value is the same as the last one, paging will exitExample: Zendesk
SELECT * FROM HTTP [Enzo-ZenDesk]
(GET /incremental/tickets.json?start_time=#toepoch(#dateadd(#now(), -45d)))
WITH
PAGING 'link' PAGING_PATH 'next_page'
DEBUG_MAX_PAGES 2
APPLY TX 'tickets'
APPLY PIPELINE (
APPLY FILTER ([status] = 'solved' OR [status] = 'closed' );
)
;
|
N/A |
| Offset | PAGING_PARAM: optional The Query Parameter to add/update in the URL query;
you can leave this option empty and instead specify use @recordindex (1-based) or @recordcount (0-based) as a query parameter
to use the desired record count explicitly.
default: @recordcount is used by default if not specified explicitly
PAGING_PATH: requiredThe Json or XML Path used to count the number of records returned; for JSON, this
is usually the path to an array of objects, so the actual number of items returned can be counted; this setting is required.do not provide a path to a "count" property; this path should be the actual node that contains the data Example: M5 Automatically adds the $skip query parameter using the @recordcount value as calculated by the number of records retrieved in items
SELECT * FROM HTTP [M5] (GET /assets?$filter=active eq true&$top=10)
WITH
PAGING 'offset' PAGING_PARAM '$skip' PAGING_PATH 'items'
DEBUG_MAX_PAGES 2
APPLY TX (items);
Example: Quickbooks Online Updates the @recordindex value Query Parameter using the number of records retrieved in path QueryResponse.Invoice
SELECT * FROM HTTP [QuickbooksOnline]
(GET /query?query=select * from invoice startposition @recordindex MAXRESULTS 10&minorversion=65)
WITH
PAGING 'offset' PAGING_PATH 'QueryResponse.Invoice'
DEBUG_MAX_PAGES 2
APPLY TX (QueryResponse.Invoice);
Example: Greenmile Updates the @recordcount value Query Parameter using the number of records retrieved in path $
SELECT * FROM HTTP [Greenmile]
(POST /Route/restrictions?criteria={ "filters": [], "firstResult": @recordcount })
WITH
PAGING 'offset' PAGING_PATH '$'
DEBUG_MAX_PAGES 2
CONTENT_TYPE 'application/json'
PAYLOAD ({
"attr": "date",
"gte": "#dateadd(#now(), -7d, o)"
})
APPLY TX ($);
|
PAGING_PARAM: do not use Use @recordindex (1-based) or @recordcount (0-based) in the HTTP payload
PAGING_PATH: requiredThe Json or XML Path used to count the number of records returned; for JSON, this
is usually the path to an array of objects, so the actual number of items returned can be counted; this setting is required
do not provide a path to a "count" property; this path should be the actual node that contains the data Example: Google Analytics Updates the @recordcount value in the HTTP Body using the number of records retrieved in path $.rows
SELECT * FROM HTTP [google-analytics]
(POST https://analyticsdata.googleapis.com/v1beta/properties/356081512:runReport)
WITH
PAGING 'offset' PAGING_PATH '$.rows'
DEBUG_MAX_PAGES 3
PAYLOAD (
{
"dateRanges": [{ "startDate": "#dateadd(#utcnow(), -7d, yyyy-MM-dd)", "endDate": "#utcnow(yyyy-MM-dd)" }],
"dimensions": [{ "name": "country" }, { "name": "region" },{ "name": "city" }],
"metrics": [{ "name": "activeUsers" }],
"offset": "@recordcount"
}
)
APPLY TX '..dimensionValues' RAWCOL '_raw'
; |
| Page | PAGING_PARAM: optional
The Query Parameter to add/update in the URL query representing the paging count;
you can leave this option empty and instead specify use @pagingindex (1-based), @pagingcount (0-based) in the query
parameter to use the desired paging count explicitly
default: @pageindex is used by default if not specified explicitly
PAGING_PATH: ignoredExample: Service Titan Automatically adds the page query parameter using the @pagingindex value
SELECT * FROM HTTP [ST]
(GET https://api-integration.servicetitan.io/inventory/v2/tenant/1019050407/trucks?pageSize=10)
WITH
PAGING 'page' PAGING_PARAM 'page'
DEBUG_MAX_PAGES 2
APPLY TX 'data'
;
Example: Expensya Uses @pagingcount as specified in the HTTP Query
SELECT * FROM HTTP [Expensya]
(GET /api/v2/expenses?dateFilterType=4&startDate=2026-01-01T00:00:00.000&page=@pagingcount)
WITH
PAGING 'page'
;
|
PAGING_PARAM: do not use Use @pagingindex (1-based) or @pagingcount (0-based) in the payloadPAGING_PATH: ignoredExample Uses @pagingindex as specified in the HTTP Body
SELECT * FROM HTTP [http]
(POST /api/v2/)
WITH
PAYLOAD ({
"next": "@pagingindex"
}
)
PAGING 'page'
;
|
| Token | PAGING_PARAM: optional
The Query Parameter to add/update in the URL query representing the token to add;
if you leave this option empty you must specify @pagingmarker in the query
parameter to use the token explicitlyPAGING_PATH: required
The Json or XML Path to the token value to use; this path should point to a JValue for a Json payloadExample: Samsara Finds the next token under $.pagination.endCursor and adds the query parameter after with its value
SELECT * FROM HTTP [Samsara]
(GET /fleet/reports/drivers/fuel-energy?startDate=2026-01-01T00:00:00Z&endDate=2026-01-07T00:00:00Z)
WITH
PAGING 'token' PAGING_PARAM 'after' PAGING_PATH '$.pagination.endCursor'
DEBUG_MAX_PAGES 2
APPLY TX 'data.driverReports || d_=driver'
;
|
PAGING_PARAM: do not use
Use @pagingmarker in the payloadPAGING_PATH: ignoredExample: Intacct Finds the next token under XPath //data/resultId and use it conditionally using an @iif() operation;
Intacct returns a paging marker upon making the first call, then expects a different payload with the paging marker
on subsequent calls; the @iif() operation return true if the first parameter is not empty
SELECT * FROM HTTP [Intacct] (POST https://api.intacct.com/ia/xml/xmlgw.phtml)
WITH
PAGING 'token' PAGING_PATH '//data/resultId'
CONTENT_TYPE 'application/xml'
DEBUG_MAX_PAGES 2
PAYLOAD (<?xml version="1.0" encoding="UTF-8"?>
<request>
<operation>
<content>
<function controlid="123456">
@iif( @pagingmarker , "<readMore>
<resultId>@pagingmarker</resultId>
/<readMore>", "<readByQuery>
<object>PODOCUMENTENTRY</object>
<fields>LINE_NO,RECORDNO,DOCHDRNO,DOCHDRID,DOCPARID,PODOCUMENT.PONUMBER,VENDORID</fields>
<query>WHENMODIFIED > '08/20/2025 00:00:00'</query>
<pagesize>2000</pagesize>
<docparid>Receiver</docparid>
<readByQuery>")
</function>
</content>
</operation>
</request>
)
APPLY TX 'podocumententry'
;
|
