OverOps' Error Analysis screen provides a powerful mechanism for getting to the root cause of errors and exceptions in production and staging environments. The screen is divided into a number of sections, each providing information about the error, which together create a complete picture of its cause and impact.
The main components in this screen include:
The Error Analysis screen
This pane provides you with important details relating to the impact of this error on your application(s). It is divided into two areas: the analytics pane and chart.
The analytics pane shows you the exact type of error, when it began, how many times it has occurred, and out of how many calls. The analytics chart shows you the volume of the error over the course of the selected timeframe (e.g. last hour, day, week). You can also filter the chart to show the volume in specific applications or servers.
You can click the JVM or Server labels to see the volume of the error on that specific machine or application. You can also hover over the occurrences label to see the number of times this error has occurred, and out of how many calls into the method containing it.
You can use the button in the top right of the screen to open and select this error in the context of OverOps' main dashboard. You can click "Go to snapshot" on any point in the graph to jump and see the code and variable state at that moment in time.
The analytics panel on the left and the chart on the right showing the volume of the error.
The call stack pane shows you the chain of methods within the JVM leading to the error. The topmost method denotes the last method in non-3rd-party code within your application leading to the error. If a method is marked with an icon, it means variable state has been captured for it by the JVM micro-agent.
If an exception is caught and then re-thrown once or multiple times within the context of a thread, you can see the error analysis for each of these exceptions using the Related Errors dropdown (this dropdown is only shown if such related exceptions exist).
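To make the catch-and-re-throw scenario concrete, here is a minimal sketch of a three-layer chain in which the same failure is thrown three times as different exception types. The class and method names (`RethrowDemo`, `readConfig`, `loadSettings`, `startService`) and the file path are hypothetical and chosen only for illustration. Each throw is a separate exception within the same thread, which is the kind of chain the Related Errors dropdown is described as linking together.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: all names here are illustrative only.
class RethrowDemo {

    // Lowest layer: the original failure.
    static void readConfig(String path) throws IOException {
        throw new IOException("cannot read " + path);
    }

    // Middle layer: catches the IOException and re-throws it wrapped.
    static void loadSettings() {
        try {
            readConfig("/etc/app.conf");
        } catch (IOException e) {
            throw new IllegalStateException("settings unavailable", e);
        }
    }

    // Top layer: catches again and re-throws a third exception type.
    static void startService() {
        try {
            loadSettings();
        } catch (IllegalStateException e) {
            throw new RuntimeException("service failed to start", e);
        }
    }

    // Walks the cause chain, outermost exception first.
    static List<String> causeChain(Throwable t) {
        List<String> chain = new ArrayList<>();
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            chain.add(cur.getClass().getSimpleName() + ": " + cur.getMessage());
        }
        return chain;
    }

    public static void main(String[] args) {
        try {
            startService();
        } catch (RuntimeException e) {
            causeChain(e).forEach(System.out::println);
        }
    }
}
```

Passing the caught exception as the cause when re-throwing, as done here, keeps the original failure reachable through `getCause()`.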
At the bottom of the stack you can see the machine name and the JVM thread name for the thread in which this error occurred. 3rd party code is hidden by default and can be expanded by toggling "Show 3rd party methods" at the bottom of the stack. You can also use the "COPY STACK" button to copy the full stack to the clipboard.
The call stack pane
By default, the source code view shows you a decompiled Java version of the bytecode that was executing within the JVM at the moment of the error. You can hover over any highlighted variable to see its value and jump to its full contents within the variable grid.
The line in which the error occurred will also be highlighted as depicted below. Above the code pane you can see the full error message and the time in which it occurred.
You can also easily configure OverOps to use your own source code instead of decompiling it from the JVM.
You can search for any variable name or value in the source code or variable grids using the search box. Click here to learn more about variable search.
The source code and variable state pane
The variable state grid shows the value of the variables and objects accessible from the method. Objects can be explored up to five levels deep into the heap. Click the "..." ellipsis button next to every object to see its entire contents as a JSON object and copy it to the clipboard.
The variable grid contains all local variables and parameters (including "this" in non-static methods). The first method also contains thread-local variables defined for this thread as well as Logback, SLF4J and Log4J Mapped Diagnostic Context (MDC) values. These MDC objects are often too large for the full set of data to be available in the log. The micro-agent, however, is able to capture and record the entire object.
In some use cases, such as asynchronous message passing, these MDC objects contain a key-value map of recorded requests, initial servlet information, and much more. These can be seen in any OverOps snapshot, and provide better visibility into the source of a bad request. This extended visibility is very helpful, since tracing a bad request back to its source in an asynchronous environment is a known challenge.
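The following sketch illustrates why per-thread context such as MDC matters in asynchronous code. In SLF4J and Logback, MDC is a key-value map managed per thread. To stay runnable without a logging library, the sketch uses a plain `ThreadLocal` map as a stand-in rather than the real MDC API. `RequestContext`, `AsyncContextDemo` and the `requestId` key are hypothetical names, and nothing here describes how the micro-agent itself captures these values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a logging framework's MDC: a per-thread key-value map.
class RequestContext {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static Map<String, String> copy() { return new HashMap<>(CTX.get()); }
    static void restore(Map<String, String> snapshot) { CTX.set(new HashMap<>(snapshot)); }
    static void clear() { CTX.remove(); }
}

class AsyncContextDemo {
    // Runs a task on a worker thread and returns the "requestId" that thread sees.
    static String requestIdSeenByWorker(boolean propagate) {
        RequestContext.put("requestId", "req-42");
        Map<String, String> snapshot = RequestContext.copy();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = pool.submit(() -> {
                if (propagate) {
                    RequestContext.restore(snapshot); // hand the caller's context to the worker
                }
                try {
                    return RequestContext.get("requestId");
                } finally {
                    RequestContext.clear(); // avoid leaking context into a reused thread
                }
            });
            return result.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
            RequestContext.clear();
        }
    }

    public static void main(String[] args) {
        System.out.println(requestIdSeenByWorker(false)); // worker has no context
        System.out.println(requestIdSeenByWorker(true));  // worker sees the caller's context
    }
}
```

Without explicit propagation the worker thread starts with an empty map, so the request identifier is lost at the thread boundary. This is the tracing gap that recorded context values help close.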
Which variables are collected and at what depth - how many variables to collect, the number of items to collect from a collection, the length of string to capture, etc. - is determined by the micro-agent, which uses an adaptive machine learning algorithm to gather the most relevant variables within an allocated timeframe. Click here to learn more about object and variable state.
(1) The variable grid displaying the variable state within the current method as well as thread-local variables.
(2) The JSON representation of an object, available through the "..." ellipsis button.
The Log View shows the last 250 log statements leading up to the error. As these statements are collected directly from JVM memory you can see any DEBUG, TRACE or INFO statements regardless of whether or not they were logged to file.
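As a rough illustration of keeping only the most recent statements in memory, independent of what is written to file, here is a minimal bounded-buffer sketch. It is not OverOps' implementation. `RecentLogBuffer` and `RecentLogDemo` are hypothetical names, and the capacity of 250 simply mirrors the figure quoted above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: keeps the most recent N log statements in memory.
class RecentLogBuffer {
    private final int capacity;
    private final Deque<String> entries = new ArrayDeque<>();

    RecentLogBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Records a statement at any level, evicting the oldest one when full.
    synchronized void record(String level, String message) {
        if (entries.size() == capacity) {
            entries.removeFirst();
        }
        entries.addLast(level + " " + message);
    }

    // Returns a copy of the buffered statements, oldest first.
    synchronized List<String> snapshot() {
        return new ArrayList<>(entries);
    }
}

class RecentLogDemo {
    public static void main(String[] args) {
        RecentLogBuffer buffer = new RecentLogBuffer(250);
        for (int i = 1; i <= 300; i++) {
            buffer.record(i % 2 == 0 ? "DEBUG" : "INFO", "step " + i);
        }
        List<String> last = buffer.snapshot();
        System.out.println(last.size());               // number of retained statements
        System.out.println(last.get(0));               // oldest retained statement
        System.out.println(last.get(last.size() - 1)); // newest statement
    }
}
```

Because the buffer records every statement regardless of level, DEBUG and TRACE output remains available at the moment of failure even when the file appender filters it out.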
You can reach this view by clicking the button to switch between code and log view.
Click here to learn more about the Log View.
The Log View pane showing the last 250 log statements leading to this error.
The actions toolbar provides you with a set of capabilities to share, mark and search through the contents of this error analysis.
- Send to JIRA - enables you to create a new JIRA issue for this error, linking directly to the source, stack state and statistics behind it. Click here to learn more about JIRA integration.
- Hide - marks this error with "Hide", which means it will no longer appear in the dashboard event list and chart. Furthermore, the micro-agents will no longer capture error analysis snapshots for it. It will appear under the "Archive" label in the dashboard, where it can be unhidden. Click here to learn more about hiding errors.
- Resolve - marks this error as "Resolved", which means that you have fixed it. The error is then removed from the dashboard's event list and chart. However, should this error occur after a new code deployment, it will be tagged as "resurfaced", it will return to the event list and chart, and you will receive an email notification. Click here to learn more about resolving errors.
- Label - adds a label to this error. Labels are a great tool for classifying and tagging errors, with tags such as "Critical" or "Low" to assign priority, "John" or "AQ" to assign responsibility, or "V1 RC2" to denote a version. Learn more about creating and assigning labels here.
- Edit Note - enables you to attach a note to this error and share it with your teammates by tagging them. They will be alerted via email. Click here to learn more about sharing with teammates.