Finicity

Introduction

The goal of this pipeline is to make the Finicity dataset available in Google BigQuery, where it can be queried through the SQL interface for business use and to drive downstream business processes.

Scenarios

The Finicity dataset is a single datafile in Avro format, sourced from an Amazon S3 bucket on a predetermined cadence. The current requirement is that this file be copied to a Google Cloud Storage bucket, then imported into BigQuery and exposed as a native BigQuery table, so that the data can be queried through SQL to drive downstream business processes.
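
The Avro object container format begins every file with a fixed header, so a cheap pre-flight check can reject a staged file that is not Avro at all (for example, a CSV export or an empty object) before the import job runs. This check is not part of the pipeline described here; it is a minimal sketch that relies only on the Avro specification's rule that a container file starts with the bytes `Obj` followed by the version byte `1`. The function names `looks_like_avro` and `is_avro_container` and the script name `check_avro.py` are hypothetical.

```python
# Per the Avro specification, an object container file starts with the
# three ASCII bytes "Obj" followed by the format version byte 1.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro(header: bytes) -> bool:
    """Return True if `header` begins with the Avro container magic bytes."""
    return header[: len(AVRO_MAGIC)] == AVRO_MAGIC


def is_avro_container(path: str) -> bool:
    """Read only the first four bytes of a local file and check the magic."""
    with open(path, "rb") as f:
        return looks_like_avro(f.read(len(AVRO_MAGIC)))


if __name__ == "__main__":
    import sys

    # Usage: python check_avro.py <file> [<file> ...]
    for p in sys.argv[1:]:
        print(p, "avro" if is_avro_container(p) else "not avro")
```

A passing check only confirms the header; it does not validate the schema or detect a file that was cut off partway through.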

This pipeline enables three scenarios:

  • Initial load: The datafiles are loaded for the first time.
  • Incremental load: Subsequent datafiles are loaded after the initial load.
  • Historical load / backfill: Existing data is cleaned up and the full history is reloaded from the beginning.
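
One way to express the three scenarios is in terms of BigQuery's load write dispositions. The mapping below is an assumption for illustration; this page does not say how each scenario is configured. The `Scenario` enum, the `plan_loads` function, and the assumption that datafile names sort chronologically are hypothetical; only the disposition names `WRITE_EMPTY`, `WRITE_APPEND`, and `WRITE_TRUNCATE` come from BigQuery.

```python
from enum import Enum


class Scenario(Enum):
    INITIAL = "initial"
    INCREMENTAL = "incremental"
    BACKFILL = "backfill"


# BigQuery load-job write dispositions.
WRITE_EMPTY = "WRITE_EMPTY"        # fail if the destination table already has data
WRITE_APPEND = "WRITE_APPEND"      # append rows to the existing table
WRITE_TRUNCATE = "WRITE_TRUNCATE"  # overwrite the table's existing data


def plan_loads(scenario: Scenario, datafiles: list[str]) -> list[tuple[str, str]]:
    """Pair each datafile with a write disposition for the given scenario.

    Assumes (hypothetically) that datafile names sort chronologically,
    e.g. date-stamped names, so sorting yields load order.
    """
    plan = []
    for i, name in enumerate(sorted(datafiles)):
        if i > 0 or scenario is Scenario.INCREMENTAL:
            disposition = WRITE_APPEND
        elif scenario is Scenario.INITIAL:
            disposition = WRITE_EMPTY  # guard against loading into a non-empty table
        else:
            disposition = WRITE_TRUNCATE  # backfill: clear existing data once, then append
        plan.append((name, disposition))
    return plan
```

Under this mapping, an accidental second "initial" load fails instead of silently duplicating rows, and a backfill truncates the table once before appending the rest of the history.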

Pipelines

The pipeline consists of two Google Data Transfer jobs. The first job transfers the datafile from the source Amazon S3 bucket to the destination Google Cloud Storage bucket. The second job picks up the datafile from the Google Cloud Storage bucket, imports it into Google BigQuery, and exposes it as a native table.

After the datafile is imported, the copies in both the Amazon S3 bucket and the Google Cloud Storage bucket are retained for later archival and audit purposes. After the initial import into BigQuery, each subsequent import only appends to the existing table; the pipeline does not de-duplicate data or purge older entries.

Per the current requirements, both jobs are time-driven: they run on a schedule and are not triggered by events such as the source file becoming available or the first job completing.
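
The two-job flow and its append-only behavior can be modeled in a few lines. The sketch below uses in-memory dictionaries and a list as hypothetical stand-ins for the S3 bucket, the Cloud Storage bucket, and the BigQuery table; it does not call any Google or AWS API. It illustrates three documented properties: both file copies are retained, re-importing the same file appends its rows again because nothing is de-duplicated, and, because the jobs are time-driven, an import that runs before the transfer has delivered the file has nothing to load.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """In-memory stand-ins for the S3 bucket, the GCS bucket, and the BigQuery table."""
    s3: dict[str, list[dict]] = field(default_factory=dict)   # object key -> records
    gcs: dict[str, list[dict]] = field(default_factory=dict)  # object key -> records
    table: list[dict] = field(default_factory=list)


def transfer_job(state: PipelineState, key: str) -> None:
    """Job 1: copy the datafile from S3 to GCS. The S3 object is left in place."""
    state.gcs[key] = list(state.s3[key])


def import_job(state: PipelineState, key: str) -> int:
    """Job 2: append the GCS datafile's records to the table; return rows appended.

    The GCS object is left in place, and no de-duplication or purging is done.
    """
    if key not in state.gcs:
        # A time-driven import can fire before the transfer has delivered the file.
        raise FileNotFoundError(f"{key} not found in the GCS bucket")
    records = state.gcs[key]
    state.table.extend(records)
    return len(records)


if __name__ == "__main__":
    state = PipelineState(s3={"day1.avro": [{"id": 1}, {"id": 2}]})
    transfer_job(state, "day1.avro")
    import_job(state, "day1.avro")
    import_job(state, "day1.avro")  # a repeated run appends the same rows again
    print(len(state.table))  # 4
```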

Schedule

Initially, the data import process is expected to run once a day on a predetermined, time-based schedule. The pipeline can be reconfigured to run on a different schedule as needs change.
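
Because the two jobs are time-driven rather than event-triggered, the import job has to be scheduled late enough that the transfer job has normally finished. A minimal sketch of a daily schedule with a fixed stagger follows; the function names, the 02:00 start time, and the one-hour gap are hypothetical, and in practice the schedule would be configured on the transfer jobs themselves rather than computed in application code.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, at: time) -> datetime:
    """Return the first datetime at or after `now` whose wall-clock time is `at`."""
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate < now:
        candidate += timedelta(days=1)
    return candidate


def staggered_schedule(
    now: datetime, transfer_at: time, import_delay: timedelta
) -> tuple[datetime, datetime]:
    """Return (transfer run, import run) for the next daily cycle.

    The import is offset by a fixed delay because it is not triggered by the
    transfer's completion; the delay should exceed the transfer's usual duration.
    """
    transfer_run = next_daily_run(now, transfer_at)
    return transfer_run, transfer_run + import_delay


if __name__ == "__main__":
    # Hypothetical values: transfer at 02:00, import one hour later.
    t, i = staggered_schedule(datetime(2024, 11, 5, 3, 0), time(2, 0), timedelta(hours=1))
    print(t, i)  # 2024-11-06 02:00:00 2024-11-06 03:00:00
```

Moving to a different cadence would change how the next cycle is computed, but a fixed stagger between the two jobs is still needed as long as they remain time-driven.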

Last updated Tue Nov 5 2024