
Using ArcGIS Data Pipelines
How to get started and some common use-cases in ArcGIS Data Pipelines
What is ArcGIS Data Pipelines?
ArcGIS Data Pipelines enables you to connect to and read vector or tabular data from a variety of data stores, prepare it, and then write it out as a feature layer available in ArcGIS. It can be used to reproduce data preparation workflows, and to automate and schedule them to run at regular intervals.
It is a no-code, graphical user interface made up of three elements: inputs, tools, and outputs. Inputs can come from files in ArcGIS, cloud storage, databases, or ArcGIS Online feature layers. To process the data, ArcGIS Data Pipelines provides five tool categories:
- Clean: remove unnecessary fields, modify or fill values.
- Construct: create fields derived from existing fields using Calculate field.
- Format: change the formats of fields, change projections.
- Integrate: joins and relates.
- Output: write the resulting dataset out as a feature layer.
For more information on the tool categories and what they do, click here.
When using ArcGIS Data Pipelines, be aware that opening the editor and staying connected consumes credits, as does running the pipeline. For more details on credit usage, click here.
Getting started
The first step in using ArcGIS Data Pipelines is to bring in your datasets. As mentioned, these can come from files, public URLs, cloud storage, databases, or ArcGIS Online feature layers.
File
Files uploaded to ArcGIS Online can be used within Data Pipelines via the File input.
From here, select where you want the file to come from; in this example, Browse existing opens your ArcGIS Online content.
Once you've found the file you want to use, select Add.
Public URL
To bring in a dataset from a public URL, for example from Data.Govt.nz, the first step is to source the URL.
0:18 This is found by clicking the arrow next to the Downloads button and right-clicking to copy the CSV or JSON URL.
Note: the URL shown at the top of the Schools Directory page has a date attached to it, and it will change when the dataset is updated. This URL is not recommended if you want the dataset to stay up to date.
1:00 Once the data has successfully been loaded into Data Pipelines, click the preview button to look at the data table.
Note that public URLs must be either fully encoded or fully decoded, as Data Pipelines does not currently support partially encoded URLs.
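If you are unsure whether a URL is partially encoded, you can normalise it yourself before pasting it in. A minimal Python sketch using only the standard library (the example URL is made up):

```python
from urllib.parse import quote, unquote

def fully_encode(url: str) -> str:
    """Decode any existing percent-escapes, then re-encode the whole
    URL once, so the result is never partially encoded."""
    decoded = unquote(url)
    # Keep characters that are structurally part of the URL itself.
    return quote(decoded, safe=":/?&=;,")

# A partially encoded URL: one space is escaped, the other is not.
url = "https://example.org/schools directory/list%20a.csv"
print(fully_encode(url))
```

Running `fully_encode` twice gives the same result, which is a quick check that a URL is safe to paste into Data Pipelines.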
Filter Data option: API Query
Data.Govt.nz provides a Data API pop-up that makes it easy to filter the data before bringing it into Data Pipelines.
In this example the API query filters the results to records containing the word Christchurch, reducing the data pulled into Data Pipelines to just Christchurch schools. The link is shown below.
https://catalogue.data.govt.nz/api/3/action/datastore_search?resource_id=4b292323-9fcc-41f8-814b-3c7&q=christchurch
From here you can copy the link that most closely matches the query you want to produce, and then edit it to fit your use case.
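Data.Govt.nz runs on CKAN, so the filtered request above follows CKAN's datastore_search pattern: a resource_id plus a free-text q parameter. A small sketch for building such a URL; the resource ID below is a placeholder, not a real dataset:

```python
from urllib.parse import urlencode

BASE = "https://catalogue.data.govt.nz/api/3/action/datastore_search"

def datastore_search_url(resource_id: str, q: str, limit: int = 100) -> str:
    """Build a CKAN datastore_search URL filtered by free text."""
    params = urlencode({"resource_id": resource_id, "q": q, "limit": limit})
    return f"{BASE}?{params}"

# Placeholder resource ID; substitute the one from the Data API pop-up.
print(datastore_search_url("your-resource-id", "christchurch"))
```

Because `urlencode` escapes each value, the resulting URL is always fully encoded, which satisfies the encoding requirement noted earlier.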
LINZ
Another source of public data is the LINZ Data Service, which can be used either as a data source or to enrich existing data.
To use this source, edit the link below with the right layer number; it will pull a dataset from the LINZ Data Service into Data Pipelines.
https://data.linz.govt.nz/services;key=YourAPIKey/wfs?service=wfs&request=getfeature&outputformat=csv&typename=layer-TheLayerNumber&srsName=EPSG:2193
If you want to use a URL from another Koordinates platform, such as Stats NZ, LRIS from Landcare Research, or the MfE Data Service, edit the URL further, making sure it calls the right data service.
For example, the link may look like this for a dataset from the Stats NZ data finder:
https://datafinder.stats.govt.nz/services;key=YourAPIKey/wfs?service=wfs&request=getfeature&outputformat=csv&typename=layer-TheLayerNumber&srsName=EPSG:2193
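Since the Koordinates platforms share the same WFS URL pattern, only the host changes between services. A small Python sketch that assembles the request URL from a host, API key, and layer number (the layer numbers and key below are placeholders):

```python
from urllib.parse import urlencode

def koordinates_wfs_csv(host: str, api_key: str, layer_id: int,
                        srs: str = "EPSG:2193") -> str:
    """Build a WFS GetFeature request that returns a layer as CSV,
    following the URL pattern shown above."""
    params = urlencode({
        "service": "wfs",
        "request": "getfeature",
        "outputformat": "csv",
        "typename": f"layer-{layer_id}",
        "srsName": srs,
    }, safe=":")  # keep the colon in EPSG:2193 readable
    return f"https://{host}/services;key={api_key}/wfs?{params}"

# Same builder, different Koordinates hosts (placeholder layer numbers):
print(koordinates_wfs_csv("data.linz.govt.nz", "YourAPIKey", 12345))
print(koordinates_wfs_csv("datafinder.stats.govt.nz", "YourAPIKey", 67890))
```

Swapping the host string is all that is needed to target LINZ, Stats NZ, LRIS, or the MfE Data Service.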
Amazon S3 Buckets
Another input into Data Pipelines is Cloud storage like Amazon S3.
0:03 To start, you need to add a connection to a data store. You need an access key, a secret key, and a bucket name to access your data. (This demo displays a fake access key; make sure you use a working one.)
0:40 This creates an item in your ArcGIS Online content, so make sure its name and details are correct.
It is important to note that if you are copying a data link, highlight it and then copy, so that you get the file name rather than the URL.
0:45 Next, make sure the dataset path points to the file containing the dataset you want to access, and check that the file format is correct.
Preview your dataset to make sure everything has loaded, then proceed with your pipeline.
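One way to double-check that you copied the file path rather than the page URL is to split the path into its bucket and key. A small standard-library sketch; the bucket and file names are illustrative:

```python
from urllib.parse import urlparse

def split_s3_path(path: str) -> tuple:
    """Split an s3://bucket/key path into (bucket, key).
    Rejects web URLs, which point at a page rather than the file."""
    parsed = urlparse(path)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// path, got scheme {parsed.scheme!r}")
    return parsed.netloc, parsed.path.lstrip("/")

print(split_s3_path("s3://my-bucket/datasets/schools.csv"))
```

If the split raises an error, you most likely copied a browser URL instead of the object path.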
Processing Data
To start with, the following tutorial shows how to pull data from an outside source and upload it to ArcGIS Online. This workflow can be adapted to fit your specific needs, and with only a small change can be used to update an existing layer rather than create a new one.
0:07 As shown in the Getting started section, you can filter the data using the API query, but this can also be done within Data Pipelines, using the Filter tool and a query expression.
0:40 After checking that the dataset has been queried correctly, you can now create geometry. For this example, we will be creating point geometries.
Here you must set the Geometry type to Points and the Geometry format to XYZ, and make sure you select the right x and y coordinate fields. Finally, you can name the geometry field; here it is called "Shapes". If you don't define a name yourself, it defaults to "Geometry".
1:15 Next, use the Project geometry tool to choose the spatial reference, in this example NZGD 2000 New Zealand Transverse Mercator.
After each step you can click preview to ensure that the features are showing as they should.
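Conceptually, the create-geometry step derives a point field from the x and y columns of each row. A rough Python sketch of the same idea, using made-up field names and GeoJSON-style point dicts:

```python
def make_points(rows, x_field="Longitude", y_field="Latitude",
                geometry_field="Shapes"):
    """Mimic the Create geometry step: derive a point geometry field
    from x/y columns. All field names here are illustrative."""
    out = []
    for row in rows:
        row = dict(row)  # leave the input untouched
        row[geometry_field] = {
            "type": "Point",
            "coordinates": [float(row[x_field]), float(row[y_field])],
        }
        out.append(row)
    return out

schools = [{"Name": "Example School", "Longitude": "172.64", "Latitude": "-43.53"}]
print(make_points(schools)[0]["Shapes"])
```

As in the tool, the x and y columns must be identified explicitly, and the geometry lands in a named field.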
1:40 Finally, to create a feature layer output, select the Outputs category and then the Feature layer tool. Here you can choose whether to Create, Replace, or Add and update a feature layer.
Make sure it is being created in the appropriate folder, then select Run.
The new layer can be found in your content ready for use.
Schedule Your Tasks
A really useful capability of ArcGIS Data Pipelines is task scheduling. Here you can create a task, such as updating a layer whenever the source is updated, whether that's once an hour or once a month.
To do this, click the Schedule button on the side panel and create a task. Select the start date and how often the task should repeat.
If you want to replace a dataset each time the source is updated, make sure the output settings are configured accordingly.
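A repeating schedule is just a start time plus a fixed interval. A small sketch that lists the upcoming run times for a weekly task (the dates are illustrative):

```python
from datetime import datetime, timedelta

def run_times(start: datetime, every: timedelta, count: int) -> list:
    """List the first `count` run times of a schedule that starts at
    `start` and repeats at a fixed interval."""
    return [start + i * every for i in range(count)]

# A task that first runs 1 Jan 2024 at 06:00 and repeats weekly:
for t in run_times(datetime(2024, 1, 1, 6, 0), timedelta(weeks=1), 3):
    print(t.isoformat())
```

Laying the run times out like this can help you sanity-check an interval before committing to it, since each scheduled run consumes credits.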
Possible Use Cases
Enrich your layer using a feature layer in ArcGIS Online.
0:00 Bring in the feature layer you want to enrich by selecting Feature layer in the Inputs pane. This could also come from an outside data source.
0:10 Repeat this to bring in the second feature layer.
0:22 Under the Integrate category, add a join into the workspace. Click and drag to connect the layers to the tool.
0:30 Next you need to configure the tool.
In this case change the Join operation to Join one to one.
The summary field is Usual_Resident_population.
The relationship is an attribute relationship, and the matching fields are Statistical_Area_2_Description and SA2_name.
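The join configured above behaves like a one-to-one attribute lookup: each statistical area picks up one population value where the matching fields are equal. A rough Python sketch of that behaviour, with made-up sample values:

```python
def join_one_to_one(target, join_rows, target_field, join_field, summary_field):
    """Mimic a one-to-one attribute join: attach one summary value to
    each target row where the matching fields are equal."""
    lookup = {row[join_field]: row[summary_field] for row in join_rows}
    joined = []
    for row in target:
        row = dict(row)
        row[summary_field] = lookup.get(row[target_field])  # None if no match
        joined.append(row)
    return joined

# Illustrative sample rows; the population figure is invented.
areas = [{"Statistical_Area_2_Description": "Riccarton"}]
census = [{"SA2_name": "Riccarton", "Usual_Resident_population": 8500}]
print(join_one_to_one(areas, census,
                      "Statistical_Area_2_Description", "SA2_name",
                      "Usual_Resident_population"))
```

Rows without a match keep an empty summary value, which is worth watching for when you preview the joined output.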
1:13 Finally, in the Outputs pane, add the Feature layer tool and connect it to the Join tool.
The Output method should be Create; you then choose the name and the ArcGIS Online location where the layer will be created.
1:30 After previewing the output, click Run to create your layer. It will then appear in your ArcGIS Online content.
Resources
Data Pipelines documentation
Credit Usage Documentation
Free Encoder/Decoder
Data sources
Data.Govt.nz - https://www.data.govt.nz/
Koordinates Platform
LINZ Data Service - https://data.linz.govt.nz/
MfE Data Service - https://data.mfe.govt.nz/
Stats NZ - https://datafinder.stats.govt.nz/
LRIS - https://lris.scinfo.org.nz/