Editor’s Note: This post was originally published on Jul. 7, 2021, and updated on Nov. 30, 2022 to ensure all content and links are up to date and accurate.
The first step most mobile teams take to improve their user experience is investing in a crash reporting or performance monitoring tool.
While these tools may help mobile teams identify about 80% of issues, the remaining 20% or so often go unsolved, either because teams aren’t aware that the issues exist or because they don’t have the data to solve them.
As a result, your users may be complaining that there are errors with your app, even though your monitoring and crash reporting tools say that everything is working properly.
In addition, the mobile engineers don’t know where to start looking for errors in order to fix them as they don’t have the data to illuminate what’s going wrong.
For example, one of the largest e-commerce platforms in the United States had a crash affecting 1% of users on every release. However, they didn’t have the technical data to solve it. While their previous monitoring system could show them there was a failure, it couldn’t provide visibility into what was causing the failure to happen.
So how can mobile teams solve this issue?
A mobile data platform that provides observability is the answer to this dilemma. It provides the data that mobile engineers need to view all errors and performance issues and actually solve them.
It does this by collecting all data from every session and linking it together into a timeline that provides useful context around the error or performance issue. This leads to faster fix times, more accurate performance metrics, and better error prioritization.
For example, the e-commerce platform above switched to Embrace (a mobile-first, mobile-only data platform that offers observability) and discovered that the culprit of the crash was a failing third-party network call in a previous session.
Traditional error tracking tools can provide logs and breadcrumbs alongside stack traces. However, they cannot collect 100% of the data from 100% of user sessions. They cannot connect foreground and background sessions into unified user experiences so mobile engineers can immediately recreate the technical truth behind complete user journeys.
Instead of just tracking known issue types as a crash reporting and performance monitoring tool would, a mobile data platform with superior observability tooling can collect comprehensive data to give you complete visibility and the ability to solve issues in record time.
With each passing day, data platforms expand the ability of businesses to make better decisions through high-fidelity data. Datadog has enabled observability for cloud infrastructure, so businesses can efficiently monitor their distributed systems. Snowflake has built a powerful data infrastructure that allows companies to store, transform, and query big data of all types. The next entrant is Embrace, whose mobile data platform provides world-class observability and actionable insights for the entire mobile organization.
Why Is Observability so Important for Mobile?
It’s essential for companies with a business model that relies heavily on their mobile offerings to deliver a flawless user experience every time. Therefore, a crash reporting or performance monitoring tool that only covers about 80% of issues with limited actionable insights is not an acceptable solution.
For example, COVID-19 forced many hotels to invest in mobile experiences and one of those innovations is keyless entry. So instead of unlocking your room or entering the building with a key, you can open it with a code on the mobile app.
However, if guests rely on the app to enter their room, it must be 100% flawless. Otherwise, you risk losing potentially loyal customers.
Healthcare apps, such as baby monitors or glucose monitors, also depend on flawless performance. If a healthcare app fails, it could result in a life-threatening situation.
For example, one of our customers offers baby monitoring mobile solutions. If the baby’s vitals drop, it sends a notification to the parents. However, if it malfunctions (even just once), it could compromise the ability of parents to take action in an emergency.
If a bug is found in the app, the company must fix it as soon as possible to avoid the above scenarios. The problem is that most traditional performance monitoring and crash reporting tools don’t show actionable information on how to solve the issue – they just say that the issue exists.
This means the mobile engineers are wasting valuable time as they scramble to uncover what actually caused the issue.
When customers ask why observability is important to mobile development, this is one of the prime examples we point to. With a mobile data platform that offers observability, the mobile engineers can quickly piece together what happened before the error or crash and solve it in record time.
In addition, mobile has a much wider variety of variables than web. For example, you can't predict a user's connectivity, OS, device, etc. Therefore, relying on crash reporting and performance monitoring features built by brands that focus on web likely won’t provide the in-depth nuance that mobile engineers require.
With the right observability tooling, the sheer data collected enables you to identify any issue regardless of its complexity.
Why Is Traditional Error Monitoring an Incomplete Solution for Mobile?
Traditional error monitoring lacks the ability to provide complete visibility into mobile’s toughest issues. This matters because it leads to a lack of:
- Impact Clarity
- Actionable Data
If you can’t see the full magnitude of an error, it’s difficult to triage accurately. Most performance and crash reporting tools fail to offer impact clarity because they report crashes based on a sample of data rather than the actual number of users impacted.
Therefore, while you might know that you have a crash, a data sample may say that it affects 0.05% of users when it really affects 2% of total users, or vice versa.
On the other hand, a mobile data platform with observability accounts for every user and can give you an exact number of affected users.
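To illustrate why a sampled figure can drift from the real one, here is a minimal Python sketch. It assumes simple random sampling of users and uses the normal approximation to a binomial proportion. The function names and the session records are hypothetical and do not represent any specific tool's API.

```python
import math

def exact_affected_share(sessions):
    """Share of distinct users with at least one crashed session.

    `sessions` is an iterable of (user_id, crashed) pairs covering
    every session, which is the assumption behind an exact count.
    """
    users, affected = set(), set()
    for user_id, crashed in sessions:
        users.add(user_id)
        if crashed:
            affected.add(user_id)
    return len(affected) / len(users) if users else 0.0

def sampled_share_range(true_share, sample_size, z=1.96):
    """Approximate 95% range of the share a random sample would report."""
    std_err = math.sqrt(true_share * (1 - true_share) / sample_size)
    low = max(0.0, true_share - z * std_err)
    high = min(1.0, true_share + z * std_err)
    return low, high

# Hypothetical data: four users, one of whom hit the crash.
sessions = [("a", False), ("a", True), ("b", False), ("c", False), ("d", False)]
exact = exact_affected_share(sessions)

# A crash that truly affects 2% of users, estimated from a 500-user sample.
low, high = sampled_share_range(0.02, 500)
print(f"exact share: {exact:.0%}, sampled range: {low:.2%} to {high:.2%}")
```

Under these assumptions, a crash that truly affects 2% of users could be reported as anything from roughly 0.8% to 3.2% in a 500-user sample, and rarer crashes are proportionally harder to pin down.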
Traditional crash reporting and performance monitoring software allows you to see that an error or crash exists, but won't help you solve it. For example, most tools tell you what your out of memory (OOM) percentage is, but the engineer still doesn’t know where the OOM occurred.
However, a mobile data platform with observability tells you your exact OOM percentage score and the exact screens where it occurred. Therefore, the engineer knows exactly where to go to solve it.
Most mobile performance tools/error tracking tools show you that an error occurred, though only if you've already created a log for that error or it is a crash.
However, most errors that occur don't result in a crash, and mobile issues are particularly difficult to predict due to the complexity of mobile, so relying on logging alone is a poor approach.
Therefore, you still don't have complete visibility into performance.
This is where mobile data platforms that offer observability differ. Rather than depending on logs or crashes to alert you of issues, you have meaningful data to show you every session with an error or bug. Specifically, we provide:
- Session Replay: Access high-fidelity data from 100% of user sessions. You no longer have to manually reproduce issues, and engineers can spend more time building features.
- Extensive Metric Tracking: Compare performance and stability metrics by release, feature, etc. Set up dashboards to track the KPIs that matter to your business.
- Proactive Alerting: Get notified of failures that impact the user experience. Your backend system's health will not fully reflect the device-side reality, so we alert you on every failure type, including bad code that your team wrote, bad code another team wrote (e.g. third-party SDK), and server-side outages.
With more robust and meaningful mobile data available in a mobile-focused solution, you'll be able to significantly reduce your resolution time.
When Facebook had an outage last year that crashed many iOS apps, Embrace was able to alert affected mobile teams immediately. They could remotely disable the Facebook SDK and restore service to users, mitigating the backlash of negative app store reviews and social media complaints that otherwise would be leveled at the mobile app.
For a different example, one of our customers is a Fortune 500 e-commerce company with a mobile app that customers rely on for ordering and curbside pickup. On Black Friday, they had a server outage that was causing purchases to fail. Embrace alerted them immediately, and they were able to quickly resolve the issue within a few hours, saving thousands of purchases.
With most performance monitoring tools that lack real-time data, it would have taken at least 24 hours to discover this issue (for example, Firebase data isn’t available until roughly 24 hours later). With the data provided by Embrace, however, they were able to examine affected sessions and quickly respond to the server outage.
What Actionable Data Does Observability Provide?
Most crash reporting and performance monitoring tools will show you known crashes that occur and errors you have already set up logs for. More advanced ones may offer a basic percentage for things like OOMs.
However, observability takes it to a whole new level. Specifically, here are a few examples of the visibility it can provide for your mobile team.
Crashes
If the app crashes on the user, they will probably leave and never come back. For some companies, such as those offering healthcare apps, poor performance can destroy the business.
While most crash reporting tools offer the number of crashes that occurred, mobile data platforms with observability show exactly how many people were affected by the crash (not just a sampled percentage). From there, you can even drill down to individual sessions, which provide the necessary context to fix it.
Out of Memory
An Out of Memory (OOM) error looks and feels like a crash to the user. Essentially, the app stops functioning, so it's as poor an experience for the user as a crash. Unfortunately, because crashes are more commonly reported than OOM errors, many apps have excellent crash scores but less-than-stellar OOM scores.
So if your app is 99.9% crash-free, yet only 97% OOM-free, your users still experience your app as if it is only 96.9% crash-free.
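The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes a session is counted as either a crash or an OOM, never both.

```python
def failure_free_share(crash_free, oom_free):
    """Share of sessions that end in neither a crash nor an OOM.

    Assumes crash sessions and OOM sessions do not overlap, so the
    two failure rates can simply be added together.
    """
    crash_rate = 1.0 - crash_free
    oom_rate = 1.0 - oom_free
    return 1.0 - (crash_rate + oom_rate)

combined = failure_free_share(0.999, 0.97)
print(f"{combined:.1%}")
```

The 0.1% of sessions lost to crashes and the 3% lost to OOMs add up to 3.1% of sessions that end abruptly, which leaves 96.9%.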
Media-heavy mobile apps like e-commerce or social media tend to struggle with OOMs the most as they load many images and videos that exceed memory limits.
Some traditional crash reporting tools provide an OOM percentage (though this is also often based on a sample of data), but only an observability platform will tell you where it happened (such as the exact screen) and how it happened (the individual session details).
For example, one of our customers is an e-commerce platform that originally had roughly 97% OOM-free sessions. However, by working down the list of specific pages with OOM issues presented in the Embrace dashboard, they were able to lift it to 99.6%.
ANRs
ANR (Application Not Responding) is an error that manifests as a frozen screen to the user.
There are two main results of an ANR, and we categorize them as either ANR Exit or ANR Completed:
- An ANR Exit means the user force quit the app after seeing a dialog prompt indicating the app was not responding.
- An ANR Completed is much more encompassing as it records any ANR that occurred during the session (regardless of whether the user left).
Ideally, companies should aim for more than 99.2% of sessions being ANR-free.
Most traditional performance monitoring tools show you ANR Exits, but not ANR Completed. Therefore, the mobile team has no insight into ANRs that eventually resolve. Until enough users complain about slowness and stutters to warrant an investigation, mobile teams will have no reporting on this issue. This lack of visibility could easily result in churn and lost revenue for months (if not years) before the mobile team can pinpoint the issue.
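To make the difference between the two categories concrete, here is a minimal Python sketch that computes both rates from the same sessions. The `Session` record and its field names are hypothetical, chosen only for illustration, and do not reflect how any particular SDK stores ANR data.

```python
from dataclasses import dataclass

@dataclass
class Session:
    session_id: str
    anr_count: int         # ANRs observed at any point in the session
    quit_during_anr: bool  # user force quit while the app was unresponsive

def anr_free_rates(sessions):
    """Return the share of sessions free of ANR Exits and of any ANR."""
    total = len(sessions)
    if total == 0:
        return {"anr_exit_free": 1.0, "anr_completed_free": 1.0}
    exits = sum(1 for s in sessions if s.quit_during_anr)
    any_anr = sum(1 for s in sessions if s.anr_count > 0)
    return {
        "anr_exit_free": 1 - exits / total,
        "anr_completed_free": 1 - any_anr / total,
    }

# Hypothetical sessions: one ANR that resolved, one that ended in a force quit.
sessions = [
    Session("s1", anr_count=0, quit_during_anr=False),
    Session("s2", anr_count=2, quit_during_anr=False),
    Session("s3", anr_count=1, quit_during_anr=True),
    Session("s4", anr_count=0, quit_during_anr=False),
]
rates = anr_free_rates(sessions)
print(rates)
```

In this made-up data, a tool that only counts ANR Exits would report 75% of sessions as ANR-free, while counting every ANR shows that only 50% were.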
Fortunately, mobile data platforms with observability (like Embrace) can show you more than just your overall average ANR score and the final stack trace.
We capture stack traces throughout an entire ANR interval to see what code your app was executing and how that evolved during the duration of the ANR. This allows you to better pinpoint the root cause of the ANR.
User Terminations
A user termination is when the user double-taps the homescreen of their mobile device and then swipes up to close the app. There are many reasons why someone might terminate (e.g. they got cold feet purchasing something, they were clearing their phone), but one of them is because they were having a really bad technical experience (e.g. frozen screen). Therefore, it's important to keep an eye on this metric and look into it if any spikes occur.
Most crash reporting and performance monitoring tools don't show user terminations at all. However, we show you the percentage of sessions that have a user termination. So if they spike after a new release, you can dive in and fix them.
We also provide insights to point you towards problematic fragments/views/screens within the mobile app. You’ll see which of these areas have high numbers of user terminations relative to the frequency of total visits. That way, you can investigate sessions where the user force quit the app to uncover the preceding performance issues.
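The idea of weighing terminations against total visits can be sketched as follows. This is a minimal Python illustration with hypothetical screen names and counts; it is not the way any specific dashboard computes the figure.

```python
def termination_hotspots(visits, terminations, min_visits=1):
    """Rank screens by user terminations per visit, highest first.

    `visits` and `terminations` map a screen name to a count.
    Screens with fewer than `min_visits` visits are skipped.
    """
    rates = {}
    for screen, visit_count in visits.items():
        if visit_count < min_visits:
            continue
        rates[screen] = terminations.get(screen, 0) / visit_count
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical counts per screen.
visits = {"Home": 1000, "Checkout": 200, "Search": 500}
terminations = {"Home": 20, "Checkout": 30, "Search": 5}

ranked = termination_hotspots(visits, terminations)
for screen, rate in ranked:
    print(f"{screen}: {rate:.1%}")
```

Here the Home screen has the second-highest raw termination count, yet Checkout stands out once the counts are normalized by visits, which is the screen an engineer would investigate first.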
Network Monitoring
Network monitoring is important because if network calls fail, it means your app isn't functioning properly for the user. For example, if purchase calls are failing, people can't make purchases in your app.
Some performance monitoring tools offer network monitoring, but they frequently do not provide actionable insights to address shortcomings.
However, with Embrace, you can:
- Track how long calls take to complete
- See error rates for these calls (e.g. how many 500s you have)
- Set custom alerts for specific metrics
- Measure connection errors (e.g. device-side errors such as the device losing network connectivity)
- Inspect every network call’s timing and outcome within the corresponding user session
This data can help you uncover all manner of network-related issues, such as calls firing out of order, third-party calls blocking first-party calls, connection errors leading to app abandons, server outages leading to crashes, and more.
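As a rough illustration of the metrics listed above, the following Python sketch summarizes call duration, server error rate, and connection error rate per endpoint, then applies a simple alert threshold. The `NetworkCall` record, the endpoint paths, and the 10% threshold are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass
class NetworkCall:
    path: str
    duration_ms: float
    status: Optional[int]  # None models a device-side connection error

def summarize_calls(calls):
    """Group calls by path and compute timing and error metrics."""
    grouped = defaultdict(list)
    for call in calls:
        grouped[call.path].append(call)

    summary = {}
    for path, group in grouped.items():
        total = len(group)
        server_errors = sum(
            1 for c in group if c.status is not None and c.status >= 500
        )
        connection_errors = sum(1 for c in group if c.status is None)
        summary[path] = {
            "median_ms": median(c.duration_ms for c in group),
            "server_error_rate": server_errors / total,
            "connection_error_rate": connection_errors / total,
        }
    return summary

# Hypothetical calls captured across user sessions.
calls = [
    NetworkCall("/purchase", 120, 200),
    NetworkCall("/purchase", 900, 500),
    NetworkCall("/purchase", 300, None),
    NetworkCall("/purchase", 150, 200),
    NetworkCall("/catalog", 80, 200),
]
summary = summarize_calls(calls)

# A custom alert: flag endpoints whose server error rate exceeds 10%.
alerts = [p for p, m in summary.items() if m["server_error_rate"] > 0.10]
print(summary, alerts)
```

Keeping connection errors separate from server errors matters because the first points at the device or network, while the second points at the backend.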
Error Logs
If you predict that an error will occur somewhere in your app and you want to track it, you can create a log for it. Then, if an error does occur, you'll be notified.
For example, most people have logs set up for registration and checkout pages.
Almost all performance monitoring tools offer error logging, though it's not a great idea to rely entirely on logs as many issues aren't predictable in mobile due to its complexity (e.g. different connectivities, app versions, OS versions, devices, etc.).
So instead of relying on adding logs to produce all visibility, Embrace offers tons of visibility out of the box (such as ANRs, network calls, etc.) before you ever add any logs.
In addition, observability adds another layer, which is the ability to filter logs to see their impact. For example, if you have a log set up on the checkout page, you can see which users, devices, OSs, and more are affected.
This is valuable because instead of just seeing an error message that it failed, mobile engineers can see the context in which it failed so that they can quickly identify and fix the problem.
App Performance
App performance metrics are used to measure the timing and outcome of key areas within your app (e.g. startup time, add to cart, purchase, register). This is essential as slow performance will frustrate your users and cause them to leave.
Most performance monitoring tools only show you the duration and failure percentage. They won't allow you to dive into individual sessions to see what's causing the issue or see when users abandon a key moment.
With Embrace, you can set your own KPIs in addition to our out-of-the-box KPIs. We also allow you to take any regression (e.g. users abandoning a key flow, traces that exceed a certain duration) and filter down to the specific segments you want to investigate.
For example, if you notice an increase in median startup time, you can dive into specific sessions with slow startup times and uncover the cause. Or, you can search for sessions in the new version that have a startup duration above a certain threshold.
Therefore, you can start your analysis broadly with aggregate metrics and then drill down to data from the affected sessions to find clues to solve the issue.
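The broad-to-narrow workflow described above might look like this in Python. The session fields, version numbers, and the 3000 ms threshold are hypothetical and serve only to show the shape of the analysis.

```python
from statistics import median

def median_startup_by_version(sessions):
    """Aggregate view: median startup time per app version."""
    by_version = {}
    for session in sessions:
        by_version.setdefault(session["app_version"], []).append(
            session["startup_ms"]
        )
    return {version: median(times) for version, times in by_version.items()}

def slow_startups(sessions, app_version, threshold_ms):
    """Drill-down view: sessions in one version above a startup threshold."""
    return [
        session
        for session in sessions
        if session["app_version"] == app_version
        and session["startup_ms"] > threshold_ms
    ]

# Hypothetical session records.
sessions = [
    {"id": "s1", "app_version": "2.0", "startup_ms": 1200},
    {"id": "s2", "app_version": "2.0", "startup_ms": 1500},
    {"id": "s3", "app_version": "2.1", "startup_ms": 1400},
    {"id": "s4", "app_version": "2.1", "startup_ms": 4200},
    {"id": "s5", "app_version": "2.1", "startup_ms": 3900},
]

medians = median_startup_by_version(sessions)
suspects = slow_startups(sessions, app_version="2.1", threshold_ms=3000)
print(medians, [s["id"] for s in suspects])
```

The aggregate shows the regression in the newer version, and the filtered list gives the individual sessions to open and inspect.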
If you're not sure what your target should be, we also offer benchmarks for those KPIs. For example, the average acceptable startup time is approximately 1-3 seconds.
This way, you don't have to guess and can compare your percentages to real data.
How Can Observability Drive Larger Business Decisions?
Most performance monitoring tools prioritize crashes based on frequency. However, because observability gives you deeper insight into each individual user session, you can understand which issues affect your revenue the most and prioritize solving them first, further spurring your business growth and retaining key users.
Essentially, observability is the key to unlocking the full value of your mobile data.
Above, you saw how observability can surface revenue-impacting issues (crashes, ANRs, OOMs, etc.) and provide the data needed to solve them.
However, that isn't the only way it drives revenue. With high-fidelity data, you can run better models to see where lifetime value (LTV) is most affected by issues and prioritize accordingly.
Instead of just relying on the frequency of an issue to inform your team of mobile errors, you can drill down to business impact. For example, you can answer questions like:
- How are engagement and conversions affected by new features in different regions?
- Which underperforming features lead to more abandons?
- How do slowdowns in key areas affect session duration and retention?
- Which sources of churn lead to the most revenue loss?
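As a simple illustration of prioritizing by business impact rather than frequency, the Python sketch below ranks the same issues both ways. The issue names, session counts, and per-session loss estimates are invented for the example, and multiplying affected sessions by an estimated loss is an assumed, deliberately naive impact model.

```python
def estimated_revenue_impact(issue):
    """Naive impact model: affected sessions times estimated loss per session."""
    return issue["affected_sessions"] * issue["est_loss_per_session"]

def rank_issues(issues, key):
    """Return issue names ordered from highest to lowest by `key`."""
    return [issue["name"] for issue in sorted(issues, key=key, reverse=True)]

# Hypothetical issues with made-up numbers.
issues = [
    {"name": "Settings screen crash", "affected_sessions": 5000, "est_loss_per_session": 0.0},
    {"name": "Checkout network failure", "affected_sessions": 800, "est_loss_per_session": 12.5},
    {"name": "Search ANR", "affected_sessions": 2000, "est_loss_per_session": 1.0},
]

by_frequency = rank_issues(issues, key=lambda issue: issue["affected_sessions"])
by_revenue = rank_issues(issues, key=estimated_revenue_impact)
print(by_frequency)
print(by_revenue)
```

With these numbers, the most frequent issue lands last once revenue is considered, while the least frequent one moves to the top of the list.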
While crash reporting and performance monitoring tools were a great start to mobile monitoring, they fall short of providing the depth of information mobile engineers require to solve critical errors, mitigate risk, and align with business goals.