Laying Data Pipelines In The Cloud

In the decades I’ve been working across multiple business domains and sectors, I, like you, have had to handle data of many kinds, shapes, sizes, frequencies and quality.

The simple truth for me when dealing with data operations is that, in essence, the key steps are always broadly these:

  1. you begin with something, some input
  2. you need an output, of some kind
  3. you transform your input to your output

And in a data pipeline, you repeat these steps, with each output becoming the next input, until you reach your final consumers.
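
To make the triad concrete, here is a minimal sketch in plain Python of chaining input, transform and output steps. Every name in it (run_pipeline, drop_incomplete, to_pence, the sample rows) is hypothetical and chosen purely for illustration; it is not tied to any Azure service.

```python
from functools import reduce
from typing import Callable

# A stage takes some input rows and returns output rows.
# Hypothetical type alias, for illustration only.
Stage = Callable[[list[dict]], list[dict]]


def run_pipeline(data: list[dict], stages: list[Stage]) -> list[dict]:
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda current, stage: stage(current), stages, data)


def drop_incomplete(rows: list[dict]) -> list[dict]:
    """Transform: discard rows that have no amount."""
    return [row for row in rows if row.get("amount") is not None]


def to_pence(rows: list[dict]) -> list[dict]:
    """Transform: add the amount as a whole number of pence."""
    return [{**row, "amount_pence": round(row["amount"] * 100)} for row in rows]


# Input: made-up sample data.
raw = [
    {"id": 1, "amount": 9.99},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 0.5},
]

# Output: what the final consumer receives.
result = run_pipeline(raw, [drop_incomplete, to_pence])
print(result)
```

Each stage only knows about its own input and output, so stages can be added, removed or reordered without touching the others.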

Other processes around the periphery of every data pipeline, like monitoring and logging, are clearly important (if not essential). Over the next several weeks, though, I’m going to be discussing the how-to of engineering and laying down data pipelines using cloud services, with a particular focus on, but not limited to:

  • my basic triad of data processing
  • getting to grips with data of many kinds, shapes, sizes, frequencies and quality
  • the Azure Data Platform

To give you a flavour of what’s coming soon, I’ll be talking about, amongst other things:

  • Azure Functions
  • Azure Data Factory
  • Azure Databricks
  • Azure Synapse Analytics, in particular Synapse Analytics Studio

I’ll run through sample scenarios and use cases for each of these, and provide supporting materials on my GitHub.
Watch out for the first of these starting next month.

Until then, stay safe, everyone!

About the author