Measuring the health and performance of mobile applications has never been more important. With ever-increasing competition, mobile companies cannot rely on feature differentiation alone to retain users. As users invest more of their time into mobile offerings, they’re demanding flawless experiences every time.
Businesses that want to excel in the modern mobile landscape are thus investing in solutions that provide higher levels of visibility into performance. There are many options to choose from with various strengths and weaknesses, falling into the categories of:
- Mobile Application Performance Monitoring
- Mobile Real User Monitoring
- Mobile Observability
In this post, we’ll cover what these solutions offer, their strengths and weaknesses, and how they provide much-needed visibility to mobile teams.
Mobile Application Performance Monitoring
Mobile application performance monitoring (APM) is the most basic monitoring necessary for a team to know the health of their application. This monitoring system provides key insights and information about an application’s performance and usage patterns.
Performance monitoring is different from error tracking, which only reports errors and stability issues in your mobile app. An easy way to visualize the difference: error tracking is like acute pain, the signal that tells you something in your body has gone terribly wrong. Performance monitoring, on the other hand, is like gauging how healthy a person is by timing how long they can run on a treadmill. You can compare their time to a benchmark and see that they’re doing worse than average, but you won’t know exactly what is causing them to do poorly.
The primary strength of mobile APM is to help the mobile team understand the performance of the app before the errors start piling up. Knowing how the app is currently performing can help the team focus on the metrics that matter and give them actionable KPIs to improve upon.
The following are some common metrics that mobile APMs collect:
- Startup time
- Network call duration and payload size
- Network errors
- Timing of custom traces (e.g. add to cart, purchase)
Mobile APMs can point out regressions in aggregate metrics over time. Teams can set up and test the timing of key interactions within the app so that when it goes into production, they know the expected performance. Any significant deviations probably warrant an investigation.
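As a rough illustration of what "a regression in an aggregate metric" can mean, the sketch below compares the 95th-percentile startup time of a new release against a baseline. The function names, the 20% tolerance, and the choice of p95 are all assumptions made for this example; they do not describe how any particular APM product works.

```python
from statistics import quantiles


def p95(samples_ms):
    """Return the 95th percentile of a list of durations in milliseconds."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    return quantiles(samples_ms, n=20)[-1]


def is_regression(baseline_ms, current_ms, tolerance=0.20):
    """Flag a regression when the current p95 exceeds the baseline p95
    by more than `tolerance` (a hypothetical 20% threshold by default)."""
    return p95(current_ms) > p95(baseline_ms) * (1 + tolerance)


# Hypothetical startup times (ms) for the previous and the current release.
baseline = [1000 + i for i in range(100)]
current = [1500 + i for i in range(100)]
print(is_regression(baseline, current))
```

A check like this says that something got slower. It does not say which users were affected or what they did next.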
The expression “where there’s smoke, there’s fire” comes to mind. Mobile APMs can indicate when an issue may have surfaced so mobile teams know when to dive in.
Unfortunately, identifying that failures are happening within a mobile application is not the same as providing the context needed to solve them. Mobile APM’s biggest weakness is an overreliance on metrics and logs.
For example, if the mobile team discovers that a key endpoint suddenly has a large increase in response time, the ultimate impact on the user could be trivial or huge. Without individual session context, such as whether users force quit the app when they hit the slow call, the mobile team is forced to guess from a single metric whether the regression deserves immediate attention.
Let’s go through a few scenarios when a network response time suddenly balloons.
It happens when users are attempting a purchase.
- Users might be upset, but ultimately complete the purchase.
- Or users could cancel the purchase and force quit the app, choosing to use a competitor’s offering instead.
It happens when users are uploading a heavy video.
- The call results in a connection error when it takes so long that it times out. Users, unfazed, try the upload again and again until it’s successful.
- Or the slow call results in a crash. Users are upset at having to go through the upload process again and decide against it, and app engagement nosedives.
It happens when users launch the app.
- Users hate the long load time but patiently wait for the app to become interactive.
- Or users just background the app and uninstall it.
Because the metric isn’t tied to individual user journeys, mobile teams cannot effectively prioritize issues without guesswork. Tracking indicators instead of actual user experiences will always have this visibility gap.
So what’s the next step up from mobile APM? It’s a solution that gives the mobile team insight into how users are interacting with the application itself. For that, we have…
Mobile Real User Monitoring
Mobile real user monitoring (RUM), also known as end-user experience monitoring (EUM), is a monitoring system which provides key insights and information about an end user’s usage patterns and experience with the application. It’s an extension of performance monitoring that captures and analyzes transactions at the individual user level. Thus, it’s designed to gauge the underlying user experience, including key metrics like load time and transaction paths, and it’s a logical expansion of mobile APM’s capabilities.
Let’s expand on our analogy about how to monitor a person’s health:
- In mobile APM, we determine how healthy a person is by timing how long they can run on a treadmill. We can get a general sense of their health this way, but we won’t be able to tell exactly why they are underperforming.
- In mobile RUM, we instead get a list of every action they take and how long it takes to complete them. For example, we know if they went into an ice cream store and spent 30 minutes eating ice cream. Thus, we have more insight into how the person’s actions ultimately affect their health.
Mobile teams will want to implement mobile RUM when the user’s experience with the app is starting to take priority over simple performance tracking. Remember: just because your app is “healthy” and doesn’t crash or freeze doesn’t mean your user is enjoying your app!
Mobile RUM is frequently added as a mobile counterpart to existing backend monitoring tools like AppDynamics, Dynatrace, and Datadog. Teams can thus extend their existing coverage to include device-side visibility. The goal is to track transactions from the device through the entire backend to discover shortcomings.
Here’s what a good mobile RUM solution tracks:
- Session-level context: breadcrumbs of user actions (e.g. taps, scrolls) and screens visited
- Device-level information: device, OS, version, region, etc.
- Network-level information: network calls, network quality, etc.
Mobile teams can be notified of a spike in a given event (e.g. network call, trace) and investigate a sample of sessions to gain context.
Mobile RUM is typically an add-on feature built out from a backend monitoring solution. While it gives mobile teams basic insight into user actions within a given session, mobile RUM tools don’t provide the true session replay necessary for debugging individual issues.
Mobile has so many variables that every single device is effectively unique. When you factor in differences between the OS, app version, region, connectivity, and device state, user experience data has incredibly high cardinality. Thus, the key to uncovering the root cause of issues is diving into high-fidelity data from the individual affected session. Instead of relying on broad coverage like breadcrumbs and screens, mobile teams need the ability to immediately reproduce the full technical and behavioral details of any session.
Let’s go through a few issue types that a mobile RUM solution would not provide visibility into.
Issues spanning multiple sessions
In mobile, users jump in and out of apps all the time. A unified user experience could span several foreground sessions. With monitoring solutions that are not built to collect and stitch together nearby sessions, mobile teams will not be able to pinpoint the root cause when the failure originates in a previous session.
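To make "stitching together nearby sessions" concrete, here is a minimal sketch that groups each user's sessions into a single journey whenever the gap between them is short. The `Session` type, the `stitch_sessions` function, and the 5-minute gap are hypothetical choices for illustration, not a description of any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user_id: str
    start: float  # seconds since some reference time
    end: float


def stitch_sessions(sessions, max_gap_s=300):
    """Group each user's sessions into journeys. A new journey starts when the
    gap since that user's previous session exceeds max_gap_s (assumed 5 min)."""
    journeys = []
    open_journey = {}  # user_id -> journey list currently being extended
    for s in sorted(sessions, key=lambda s: (s.user_id, s.start)):
        journey = open_journey.get(s.user_id)
        if journey and s.start - journey[-1].end <= max_gap_s:
            journey.append(s)
        else:
            journey = [s]
            journeys.append(journey)
            open_journey[s.user_id] = journey
    return journeys


# Hypothetical data: u1 backgrounds the app briefly, then returns much later.
sessions = [
    Session("u1", 0, 60),
    Session("u1", 120, 200),
    Session("u1", 1000, 1100),
    Session("u2", 10, 20),
]
journeys = stitch_sessions(sessions)
print([len(j) for j in journeys])
```

With journeys in hand, a crash in one session can be examined alongside what happened in the session immediately before it.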
One of the largest mobile e-commerce apps in the world had a crash that affected 1% of users every release. It existed for years since the app’s first submission to the app store. The mobile team knew the impact on the business but could not discover what the cause was. The problem stemmed from a failing third-party network call in a previous session. It took a mobile-first platform like Embrace to provide the visibility needed to solve it.
Exceeding resource limits
Mobile apps crash when they exceed system resource limits. Having a complete replay of the technical details of a session includes knowing exactly when there is CPU pegging, a low memory warning, low battery, etc. Knowing what leads up to these bad device states can both pinpoint optimization opportunities and prevent app failures.
For example, e-commerce and social media apps frequently load large amounts of photos and videos, leading to out of memory (OOM) exceptions that kill the app. OOMs tend to occur an order of magnitude more frequently than traditional crashes. If 1% of sessions end in a crash and another 3% end in an OOM, an app that reports as 99% crash-free is really only about 96% failure-free from the user’s point of view.
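The arithmetic behind that 96% figure can be sketched as follows. The function name and the two overlap assumptions (crashes and OOMs never hit the same session, or they occur independently) are illustrative. The text does not say which applies, and both give roughly the same answer here.

```python
def effective_crash_free_rate(crash_free, oom_free, overlap="disjoint"):
    """Combine a crash-free rate and an OOM-free rate into one rate.

    overlap="disjoint":    a session is hit by a crash or an OOM, never both.
    overlap="independent": crashes and OOMs occur independently of each other.
    """
    if overlap == "disjoint":
        return 1 - ((1 - crash_free) + (1 - oom_free))
    if overlap == "independent":
        return crash_free * oom_free
    raise ValueError(f"unknown overlap assumption: {overlap}")


disjoint = effective_crash_free_rate(0.99, 0.97)
independent = effective_crash_free_rate(0.99, 0.97, overlap="independent")
print(round(disjoint, 4), round(independent, 4))
```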
A mobile RUM solution offers no context into these issues. If the network calls that download photos all complete quickly and successfully, there’s no indication of a problem. The mobile team needs insight into the device state at all times during a session to spot these types of failures.
Unclassified crash types
There are several crashes that are difficult for traditional monitoring solutions to classify, including Watchdog terminations, Auto Layout exceptions, and CollectionView crashes. When users complain about such a crash, the mobile team either doesn’t get a stack trace or doesn’t get a useful one. Mobile RUM solutions do not provide this mobile-focused depth in their coverage.
Application Not Responding (ANR) is a type of error on Android that predominantly occurs when the main thread of the application is blocked for at least 5 seconds, at which point the user is prompted to terminate the app. However, a freeze does not have to last 5 seconds to impact the user experience. Freezes and stutters happen anytime the main thread is blocked, and these lead to users backgrounding and force quitting the app.
Mobile RUM can track slow network calls, but frozen screens can happen in many ways beyond a single slow network call. If multiple SDKs are initializing at startup, the network calls they fire could resolve quickly yet lead to congestion when processing the data on the device. Likewise, storing or transforming data can be CPU-intensive and result in a frozen screen.
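One way to reason about freezes shorter than the ANR threshold is to look at gaps between main-thread "heartbeats". The sketch below is a simplified, offline illustration: it assumes a hypothetical list of timestamps recorded each time the main thread ran a periodic task, and a hypothetical 1-second cutoff for what counts as a freeze. Real SDKs detect blocked threads on the device itself, which this example does not attempt to reproduce.

```python
ANR_THRESHOLD_MS = 5000  # the 5-second ANR threshold described in the text


def find_freezes(heartbeats_ms, min_freeze_ms=1000):
    """Return every gap between consecutive heartbeats of at least
    min_freeze_ms, marking those long enough to trigger an ANR prompt."""
    freezes = []
    for prev, cur in zip(heartbeats_ms, heartbeats_ms[1:]):
        gap = cur - prev
        if gap >= min_freeze_ms:
            freezes.append({
                "start_ms": prev,
                "duration_ms": gap,
                "is_anr": gap >= ANR_THRESHOLD_MS,
            })
    return freezes


# Hypothetical heartbeat timestamps (ms): one 1.5 s stall and one 6 s stall.
beats = [0, 100, 200, 1700, 1800, 7800, 7900]
freezes = find_freezes(beats)
for f in freezes:
    print(f)
```

The shorter stall here would never show up as an ANR, yet it is long enough for a user to notice.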
Users force quit mobile apps for many reasons. It could be that they are just clearing out their running apps, or it could be a sign of frustration with the app (e.g. slowness, too many ads, frozen screen). A mobile RUM solution cannot provide insights about a spike in user terminations because it’s not built to monitor entire user experiences. In troubleshooting these types of issues, where there’s an absence of information, having complete data from every session is crucial. That way, the mobile team can go from an indicator metric to immediately filtering affected sessions across attributes to spot patterns that lead to causes.
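Going "from an indicator metric to filtering affected sessions across attributes" might look something like the sketch below, which computes the force-quit rate per value of a chosen attribute. The session dictionaries, the `force_quit` flag, and the attribute names are hypothetical stand-ins for whatever session data a team actually collects.

```python
from collections import defaultdict


def force_quit_rate_by(sessions, attribute):
    """Return {attribute value: share of sessions that ended in a force quit}."""
    totals = defaultdict(int)
    quits = defaultdict(int)
    for s in sessions:
        key = s[attribute]
        totals[key] += 1
        if s["force_quit"]:
            quits[key] += 1
    return {key: quits[key] / totals[key] for key in totals}


# Hypothetical sessions: force quits cluster on one app version.
sessions = [
    {"app_version": "3.2.0", "os": "Android 14", "force_quit": True},
    {"app_version": "3.2.0", "os": "Android 13", "force_quit": True},
    {"app_version": "3.1.9", "os": "Android 14", "force_quit": False},
    {"app_version": "3.1.9", "os": "Android 13", "force_quit": True},
]
rates = force_quit_rate_by(sessions, "app_version")
print(rates)
```

Running the same grouping over several attributes (OS, device, region) is a simple way to see where a spike concentrates, but it only works if data from every affected session is available.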
Similar to the weakness of mobile APM, mobile RUM can alert teams to issues but doesn’t provide the context necessary to solve them. Mobile teams want a data platform that uncovers both known and unknown issues. Instead of monitoring for specific elements, they want a solution that will show every user-impacting issue, its impact, and provide the context needed to solve it. To that end, the next step up from mobile APM and mobile RUM is…
Mobile Observability
Observability is a design concept defined as the ability of a system to enable identifying its internal states by analyzing its external outputs. In layman’s terms, you can look at an observable system from the outside and know exactly what’s happening inside. An observable application leverages design and instrumentation to provide insights that enhance monitoring and logging data.
Let’s revisit our analogy about how to monitor a person’s health one more time:
- In mobile APM, we determine how healthy a person is by timing how long they can run on a treadmill.
- In mobile RUM, we instead get a list of every action they take and how long it takes to complete them.
- In mobile observability, we have a camera following the person around, cataloguing where they are and what they’re doing at any given time. In addition, we have a machine hooked up to them that continually sends us their body’s vitals so we know how they react physiologically to a given situation.
In other words, observability is a measure of a system’s ability to enable teams to diagnose what’s happening inside without the need for guesswork and process of elimination. When observability is integrated into an application, the mobile team can easily gauge its internals and navigate to the root cause of issues faster.
As businesses increasingly turn to mobile as a primary revenue mechanism, the ability to make decisions with high-fidelity data and insights becomes a key competitive advantage. To move fast, it’s crucial to have full visibility into your mobile applications:
Performance and stability
- Are users abandoning the app because the startup is really slow?
- Is the app freezing when users try to make a purchase?
- Are users dropping out of key funnels?
- Whether it’s a crash, slow startup, or failing endpoint, how does the issue affect revenue and churn?
- When a regression happens, how does it affect engagement (e.g. session length, feature use)?
- How does in-app advertising impact the user experience?
- Are users spending more time and money in Feature A as opposed to Feature B?
- Do features have more adoption with specific user segments (e.g. device, OS, region)?
- Does a new feature not perform as well as other parts of the app (e.g. uses too many system resources, runs slow)?
- Are mobile engineers the first to know when something’s wrong?
- When a third-party SDK crashes your app, do you know before the company makes an announcement?
- Can you control precisely when your team is notified for a given issue?
If your current mobile tooling cannot help you answer these questions, then you need to switch to a more powerful data platform.
Embrace Is Built to Power Your Mobile Business
Embrace is the only mobile data platform that provides observability, debugging, and proactive alerting for mobile teams. We are a comprehensive solution that fully reproduces every user experience from every single session. Your team gets the data it needs to proactively identify, prioritize, and solve any issue that’s costing you users or revenue.
Embrace collects 100% of the data from 100% of user sessions and makes it available to the entire mobile team.
- Engineering can replay any session to pinpoint the root cause of any issue — even one that doesn’t result in a crash or error log.
- Product can check feature adoption and run experiments to make roadmap decisions based on which parts of the app are getting the biggest return.
- Data Science can run LTV models with complete data and uncover where churn is impacting revenue the most.
- QA can easily test new app versions and send the associated session data to engineers without the need to create manual bug reports.
- CS can triage user complaints by looking up individual user sessions to see if the issue was with the code, the user, or the network.
A platform that offers observability comes with all the benefits of mobile APM and mobile RUM plus so much more, including:
- 100% of the technical and behavioral data from 100% of sessions
- Timing and outcome of every network call
- Full user journey (e.g. views/screens/activities, breadcrumbs, and webviews)
- Full user actions (e.g. taps, swipes, scrolls, button presses)
- Automatic crash classification and deduplication
- Advanced ANR detection and solving by capturing stack traces throughout every frozen interval
- Device state (e.g. CPU, memory, battery)
- Error logging with filtering by key-value pairs
- Timing and abandonment tracking for custom traces
For teams that lack the visibility needed to drive business decisions, observability is the answer. Embrace will surface any issue before it has an outsized impact on your users. Your team gets alerted to any problem so they are the first to know. They have the data to allocate resources efficiently where they will produce the most value.
In recent years, the mobile tooling landscape has improved by leaps and bounds. Traditional error and performance monitoring solutions are no longer robust enough to serve the growing needs of mobile-first and mobile-focused companies.
If your team is using mobile APM and mobile RUM solutions and still struggles to uncover and solve the issues that are costing you users and revenue, consider the benefits that a mobile data and observability platform like Embrace can provide.