Microsoft have recently announced updates to their debugging tools for HDInsight. Many businesses currently use Microsoft Cosmos, which constantly runs millions of jobs, and that has made scaling and managing jobs a huge challenge. To combat this, Microsoft have announced early access to the Apache Spark Debugging Toolset for HDInsight, available for Spark 2.3 clusters or higher. The default Spark history server user experience has been enhanced in HDInsight with valuable information on Spark jobs, including Job Graphs and Data Flows. These new features assist developers with job data management, data sampling, job monitoring and job diagnosis. Some of the enhancements include:
The graph tab shows an insightful visualisation of the current jobs. The interface supports debugging experiences such as playback and a heatmap ranked by job stage progress, data read and data written, for both the Spark application and individual jobs. The Spark job graph displays the job execution details, including input and output across the stages. If jobs have already completed, the graph allows the Spark developer to play back the job progress with written details. It is also possible to look into Spark job diagnostics around performance, data and execution time.
The toolset also allows users to perform tasks such as outputting a data view, searching, downloading, previewing, copying and many more. It's also possible to download part of the data as a sample and then run it through the debugging process. The Table Operations section additionally lets users view the Hive metadata and investigate table operations at each stage to gain even more insight.
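To picture what a heatmap "ranked by progress, read and written" means for job stages, here is a minimal Python sketch. It is not the toolset's implementation: the stage records are invented, their field names are only modelled on the stage data exposed by the Spark monitoring REST API, and `progress` and `rank_stages` are hypothetical helper names.

```python
from typing import Dict, List

# Hypothetical stage records. Field names are modelled on the Spark
# monitoring REST API; the values are invented for illustration.
SAMPLE_STAGES: List[Dict[str, int]] = [
    {"stageId": 0, "numTasks": 8, "numCompleteTasks": 8, "inputBytes": 4096, "outputBytes": 0},
    {"stageId": 1, "numTasks": 8, "numCompleteTasks": 2, "inputBytes": 1024, "outputBytes": 512},
    {"stageId": 2, "numTasks": 4, "numCompleteTasks": 0, "inputBytes": 0, "outputBytes": 2048},
]


def progress(stage: Dict[str, int]) -> float:
    """Fraction of a stage's tasks that have completed (0.0 if it has no tasks)."""
    total = stage["numTasks"]
    return stage["numCompleteTasks"] / total if total else 0.0


def rank_stages(stages: List[Dict[str, int]], metric: str) -> List[Dict[str, int]]:
    """Return stages sorted from highest to lowest on the chosen metric.

    `metric` is either "progress" or the name of a numeric field such as
    "inputBytes" (data read) or "outputBytes" (data written).
    """
    if metric == "progress":
        return sorted(stages, key=progress, reverse=True)
    return sorted(stages, key=lambda s: s[metric], reverse=True)


if __name__ == "__main__":
    for metric in ("progress", "inputBytes", "outputBytes"):
        order = [s["stageId"] for s in rank_stages(SAMPLE_STAGES, metric)]
        print(metric, order)
```

Ranking the same stages on a different metric changes which one comes out on top, which is the kind of comparison a heatmap makes visible at a glance.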
Future Planned Features
- Critical path analysis for Spark application and job
- Spark job diagnosis
- Data Skew and Time Skew Analysis
- Executor Usage Analysis
- Debugging on failed job