DIGITAL - Conversations (Banno)

Introduction

The goal of this pipeline is to make the Conversations dataset available in Google BigQuery, where it can be queried through the SQL interface for business reporting and to drive downstream business processes.

Pipelines

The Banno Conversations pipeline uses Dataflow jobs to push Conversations data directly from Kafka into the BigQuery dataset and its tables. The pipeline consists of:

  • The original data source is Kafka; specifically, records in our conversations.events.v4c topic.
  • The Dataflow job is implemented with the Apache Beam Java API, which we use from Scala. The job fetches new records from the Kafka topic, transforms them into BigQuery rows, and then writes those rows to BigQuery using Google's Java API. A sketch of this shape follows the list.
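
The sketch below illustrates that shape in Scala with the Beam Java API: read from the Kafka topic, map each record to a BigQuery TableRow, and append the rows to a table. The bootstrap servers, table spec, string deserializers, and the trivial record-to-row mapping are placeholders for illustration only; the real job's transform logic and configuration for conversations.events.v4c records are more involved.

```scala
import com.google.api.services.bigquery.model.TableRow
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.{CreateDisposition, WriteDisposition}
import org.apache.beam.sdk.io.kafka.KafkaIO
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.sdk.transforms.{MapElements, SimpleFunction}
import org.apache.beam.sdk.values.KV
import org.apache.kafka.common.serialization.StringDeserializer

object ConversationsToBigQuery {
  def main(args: Array[String]): Unit = {
    val options  = PipelineOptionsFactory.fromArgs(args: _*).withValidation().create()
    val pipeline = Pipeline.create(options)

    pipeline
      // Read new records from the Kafka topic (broker address is a placeholder).
      .apply("ReadKafka",
        KafkaIO.read[String, String]()
          .withBootstrapServers("kafka:9092")
          .withTopic("conversations.events.v4c")
          .withKeyDeserializer(classOf[StringDeserializer])
          .withValueDeserializer(classOf[StringDeserializer])
          .withoutMetadata())
      // Transform each Kafka record into a BigQuery row (trivial mapping for illustration).
      .apply("ToTableRow",
        MapElements.via(new SimpleFunction[KV[String, String], TableRow]() {
          override def apply(rec: KV[String, String]): TableRow =
            new TableRow().set("event_json", rec.getValue)
        }))
      // Append the rows to the BigQuery table (placeholder table spec).
      .apply("WriteBigQuery",
        BigQueryIO.writeTableRows()
          .to("my-project:conversations.events")
          .withCreateDisposition(CreateDisposition.CREATE_NEVER)
          .withWriteDisposition(WriteDisposition.WRITE_APPEND))

    pipeline.run()
  }
}
```

Run on Dataflow by passing the usual runner options (for example --runner=DataflowRunner plus project and region); Beam treats the Kafka source as unbounded, so the job runs continuously and streams rows into BigQuery as new Conversations events arrive.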
