Excel Data Pipelines

A Functional Piece

Last time on Laying Data Pipelines in the Cloud, I talked about some options for ingesting Excel data in Azure.

I wanted to round off that discussion briefly with a bit more of a spotlight on Azure Functions, and why in particular this Azure service is such a great introduction for those wanting to understand how to engage with the world of data engineering with microservices.

What are Microservices in Azure?

To paraphrase Microsoft’s definition, microservices architecture enables teams across an organisation to build functionality using common design patterns, to create highly scalable applications, using loosely coupled services.

What does this mean for data pipelines in the cloud?

In the context of the Excel data pipelines scenario last time, I outlined these solutions:

  • Power Automate/Logic Apps
    • An Excel spreadsheet can be converted into CSV using the Create CSV Table action (Data Operations category), but this only works when the spreadsheet's data is stored as an Excel Table
  • Azure Functions
    • PowerShell or Python code can easily convert an Excel spreadsheet to CSV format. You can deploy such code to Azure Functions to create serverless apps that can be automated via Azure Data Factory (ADF).
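To make the Azure Functions option a little more concrete, here is a minimal Python sketch of the conversion step on its own, before any Azure wiring is added. It assumes the third-party openpyxl package is installed, and the function names rows_to_csv and excel_to_csv are hypothetical names chosen for illustration, not part of any Azure or Microsoft API.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise an iterable of row tuples to CSV text; None cells become empty."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def excel_to_csv(xlsx_bytes, sheet_name=None):
    """Convert one worksheet of an .xlsx file (passed in as bytes) to CSV text."""
    # Assumption: openpyxl is available (pip install openpyxl).
    # It is imported here so rows_to_csv can be used without it.
    from openpyxl import load_workbook

    workbook = load_workbook(io.BytesIO(xlsx_bytes), read_only=True, data_only=True)
    try:
        # Use the named sheet if one was given, otherwise the active sheet.
        sheet = workbook[sheet_name] if sheet_name else workbook.active
        return rows_to_csv(sheet.iter_rows(values_only=True))
    finally:
        workbook.close()
```

Unlike the Create CSV Table action, this approach reads the worksheet cells directly, so the data does not need to be stored as an Excel Table. Working on bytes rather than file paths is a deliberate choice: a serverless function typically receives the file content from a request or a storage binding rather than from a local disk.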

Azure Functions

In addition to PowerShell and Python scripts, Azure Functions can execute Node.js and C# code in serverless applications (i.e. without the headache of complex virtual machine configuration and optimisation). Deployment is straightforward, either by point and click in the Azure Portal or with easy-to-configure Azure Resource Manager (ARM) templates from your favourite IDE (Visual Studio Code etc.).

Azure Functions are a first-class service in the microservices space, operating as Platform-as-a-Service (PaaS) with huge integration capability across the whole Azure ecosystem, not just the Data Platform.

In data engineering terms, Azure Functions offer all the flexibility of Script Tasks in SSIS, with two big advantages. They are scalable and performant enough to handle the variable velocities of big/streaming data, and they are decoupled enough to connect to many Azure services using well-defined API patterns.

Power Automate/Logic Apps

The role of Power Automate and Logic Apps in a microservices architecture is one of orchestration as well as service development (and indeed each is regarded as an enterprise platform in its own right). In particular, they enable a predominantly no-code approach to integrating microservices like Azure Functions with other services in Azure.
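Under the hood, an orchestrator integrates with an HTTP-triggered Azure Function by making an ordinary HTTP call. As a rough sketch of what that call looks like in code, the Python below builds (but does not send) such a request using only the standard library. The app name, function name, payload and key are all hypothetical placeholders, and the sketch assumes the function is secured with a function key passed in the x-functions-key header.

```python
import json
import urllib.request


def build_function_request(app_name, function_name, payload, function_key):
    """Build a POST request for an HTTP-triggered Azure Function."""
    # Default URL pattern for an HTTP trigger; a custom route would differ.
    url = f"https://{app_name}.azurewebsites.net/api/{function_name}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: function-level authorisation with a function key.
            "x-functions-key": function_key,
        },
    )


# Hypothetical usage; all values are placeholders.
request = build_function_request(
    "my-function-app", "excel-to-csv", {"blob": "sales.xlsx"}, "<function-key>"
)
# To actually send it: urllib.request.urlopen(request)
```

The no-code tools hide this plumbing behind a connector, but the contract is the same: a URL, a key and a JSON payload. That well-defined boundary is what keeps the services loosely coupled.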

Wrap up of Data Pipelines

One of the many things that I believe has helped the massive growth of Azure in recent years, especially in the Data Platform, is its inclusivity of Open Source. The one-time Microsoft proprietary approach to all software development is now “keep doing what you’re doing, AND we will help you do it even better in Azure!” (the bonus here for organisations is the ability to include great SLAs, for additional cost of course).

In my next post, I’ll run through using Azure Functions to build a serverless Python app that prepares data for further consumption in a data pipeline in the cloud!

Further information

Would you like to know more about how cloud services can help improve your organisational data strategy? Then get in touch with risual at https://www.risual.com/contact/

Watch out for the next post, and as ever, stay COVID safe!

About the author