Vision

Motivation

For 1JS to meet the OKR of 99% pipeline reliability, we first need to be able to monitor pipelines (particularly PR pipelines) for failures and categorize those failures into two buckets: expected and unexpected.

Expected failures are errors that occur for legitimate reasons and can be anticipated before the pipeline is run. These failures are deterministic, reproducible, and ideally can be fixed by the code contributors. Test failures, compilation errors such as missing imports or declarations, and merge conflicts are all examples of expected failures (in the right context). Conversely, unexpected failures are errors that occur during pipeline runs and cannot be anticipated. They are often not reproducible and can involve infrastructure issues that are not easily addressed by the code contributors. Network errors, build tool bugs, and other uncategorized random failures are examples of unexpected failures.

Distinguishing expected from unexpected failures is difficult but not impossible. It can be as simple as boiling each failure down to one or a few messages, grouping those messages into buckets, and then categorizing each bucket as expected or unexpected. This only works if each failure carries sufficient data about itself, and that is the limitation we currently face due to how the pipelines are constructed. The vast majority of failures seen in Midgard and OOUI surface as the indistinct "Bash exited with code '1'". Such an error message does nothing to explain how or why the pipeline run failed, and the only way to find out today is to open the logs and investigate manually. That white-glove treatment is highly effective for a handful of cases, but as 1JS grows it is clear that an automated or semi-automated process is needed, and such a process needs better messaging and failure data before it can do anything.

Approach

Some of the worst offenders of this bad messaging occur within bash scripts in the pipeline steps. Bash is an extremely powerful multitool, so swapping it out for something else is essentially out of the question. What we want is something that wraps bash, ideally with an identical (or extended) API in the pipeline definition YML, but which can capture failures more effectively and present that information to a downstream system for further processing. The approach that best fits these requirements is an ADO pipeline task extension capable of running bash scripts, processing their output, and detecting errors more reliably.

Vision

The ADO pipeline task extension that comes to save the day? Vision. It is a task that can be swapped in for bash in any pipeline YML definition, so switching over is easy. It provides the capability to run bash, process its output and, crucially, give more information about failures.

How Does it Work?

Vision is an ADO extension written to be executed in a Node.js runtime. It makes use of Node's built-in child_process capabilities to execute bash and process its output.

Setup
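The Setup, Execute and Process Output steps are not spelled out above, so what follows under each is a minimal sketch rather than the extension's actual implementation. For setup, the sketch assumes the task reads its inputs with the azure-pipelines-task-lib package commonly used by ADO task extensions; the input names (script, workingDirectory, failOnStderr) are illustrative assumptions, not the extension's confirmed contract.

```typescript
// Hypothetical setup step: read the task's inputs.
// Input names here are assumptions, not the Vision extension's actual contract.
import tl = require("azure-pipelines-task-lib/task");

interface VisionInputs {
  script: string;            // inline bash script to run
  workingDirectory: string;  // directory to run it in
  failOnStderr: boolean;     // treat any stderr output as a failure
}

export function readInputs(): VisionInputs {
  return {
    script: tl.getInput("script", /* required */ true) ?? "",
    workingDirectory: tl.getInput("workingDirectory", false) ?? process.cwd(),
    failOnStderr: tl.getBoolInput("failOnStderr", false),
  };
}
```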
Execute
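The document states that Vision uses Node's built-in child_process to execute bash, so the execute step might look roughly like the sketch below. The function name and the choice of spawn (to stream output as it arrives) are assumptions.

```typescript
// Sketch of the execute step: run the script through bash via Node's
// child_process, echoing output to the pipeline log while also capturing it.
import { spawn } from "child_process";

export interface BashResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

export function runBash(script: string, cwd: string): Promise<BashResult> {
  return new Promise((resolve, reject) => {
    const child = spawn("bash", ["-c", script], { cwd });
    let stdout = "";
    let stderr = "";

    child.stdout.on("data", (chunk) => {
      stdout += chunk.toString();
      process.stdout.write(chunk); // keep the live pipeline log intact
    });
    child.stderr.on("data", (chunk) => {
      stderr += chunk.toString();
      process.stderr.write(chunk);
    });

    child.on("error", reject); // e.g. bash itself could not be started
    child.on("close", (code) => {
      resolve({ exitCode: code ?? -1, stdout, stderr });
    });
  });
}
```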
Process Output
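Below is an illustrative sketch of how the process-output step could turn captured output into more informative failure data, in line with the goal of replacing "Bash exited with code '1'" with categorized messages. The regex patterns, category names, and use of ADO logging commands are assumptions for illustration, not the extension's actual rule set.

```typescript
// Illustrative process-output step: scan the captured output for recognizable
// error patterns and attach them to the task result, instead of surfacing
// only the bare exit code.
import tl = require("azure-pipelines-task-lib/task");

// Shape of the captured bash run (matches the execute sketch above).
interface BashResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Example patterns only; a real rule set would be maintained downstream.
const KNOWN_PATTERNS: { pattern: RegExp; category: string }[] = [
  { pattern: /error TS\d+/, category: "compilation" },
  { pattern: /Tests:\s+\d+ failed/, category: "test-failure" },
  { pattern: /(ETIMEDOUT|ECONNRESET|ENOTFOUND)/, category: "network" },
];

export function reportResult(result: BashResult): void {
  if (result.exitCode === 0) {
    tl.setResult(tl.TaskResult.Succeeded, "Script completed successfully");
    return;
  }

  const combined = result.stdout + "\n" + result.stderr;
  const matches = KNOWN_PATTERNS.filter((p) => p.pattern.test(combined));

  for (const match of matches) {
    // ##vso[task.logissue] surfaces the message in the pipeline summary
    // instead of leaving it buried in the raw log.
    console.log(`##vso[task.logissue type=error]Detected ${match.category} failure`);
  }

  const summary =
    matches.length > 0
      ? `Script failed (${matches.map((m) => m.category).join(", ")})`
      : `Script failed with exit code ${result.exitCode} (uncategorized)`;
  tl.setResult(tl.TaskResult.Failed, summary);
}
```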
Usage

See the Vision extension on the ADO Marketplace for more details.

Example
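The concrete example is not included above, so here is an illustrative before/after snippet showing how a plain bash step might be swapped for the Vision task. The task reference (Vision@1) and input names are placeholders and may not match the published extension.

```yaml
# Before: a plain bash step, whose only failure signal is the exit code.
steps:
  - bash: |
      yarn install
      yarn build
    displayName: Build

# After (illustrative): the same script run through the Vision task.
# Task name and input names are placeholders, not the published contract.
steps:
  - task: Vision@1
    displayName: Build
    inputs:
      script: |
        yarn install
        yarn build
```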
Inputs