On May 19, 2019 at approximately 8:20pm PST, Braze, a leader in mobile attribution, ‘went down’ across all of its customers. When a major vendor ‘goes down’, network calls to it from the mobile app result in an error. Typically, the vendor notifies customers immediately and gives an estimated time to resolve the issue. In this instance, however, Braze’s site self-reported no downtime when we checked it.
Although total outages are infrequent, almost every app depends on an attribution vendor, so we compared four of the largest attribution vendors by error rate to help app developers decide which one to rely on. The four vendors are Braze (formerly AppBoy), AppsFlyer, Branch (aka Tune), and Kochava.
Effects of a Mobile Attribution Vendor “Going Down”
The effects of network calls misfiring from the mobile app are two-fold: one affects the mobile app’s core analytics for understanding its users and paying its install sources, and the other has the potential to freeze the app.
First - Amiss Attribution
When a user installs an app, the attribution vendor’s SDK calls the vendor’s servers to determine the originating source of the install. When this call misfires, the install may not be recorded. The source (for example Facebook, Twitter, Google, or AppLovin) will not be credited with the install and may not get paid. The install may even be incorrectly attributed to a different vendor or to an organic source, like the Apple or Google app stores. Analytics for understanding effective CPIs (eCPIs), LTVs, and retention will then be incorrect.
Second - App Freezes
In some cases, the developer may make the app reliant on the call. For example, if the user clicked on a display ad for Nike LeBron shoes and then installed the app, an in-app promotion that should have been displayed to that user will not be shown. Even worse, the app might freeze because the code assumes that the attribution vendor never goes down.
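One way to avoid this kind of freeze is to run the vendor call off the main thread, cap how long the app waits for it, and fall back to a default when it fails. The following is a minimal sketch of that idea, not any vendor’s actual SDK: `get_attribution`, `ORGANIC_FALLBACK`, and `slow_vendor_call` are hypothetical names, and the vendor call is simulated with a sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Hypothetical default used whenever the vendor cannot be reached.
ORGANIC_FALLBACK = {"source": "unknown", "promo": None}

_executor = ThreadPoolExecutor(max_workers=2)


def get_attribution(fetch, timeout_s=2.0):
    """Run the vendor call in a worker thread and wait at most timeout_s."""
    future = _executor.submit(fetch)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Vendor is too slow: stop waiting and continue with the default.
        future.cancel()
        return ORGANIC_FALLBACK
    except Exception:
        # Vendor returned an error or the network failed: same default.
        return ORGANIC_FALLBACK


def slow_vendor_call():
    """Simulates an unresponsive vendor (assumption for illustration)."""
    time.sleep(0.5)
    return {"source": "display_ad", "promo": "shoe_promo"}


# The caller gets the fallback after 0.05 s instead of hanging.
print(get_attribution(slow_vendor_call, timeout_s=0.05))
```

With this shape, a missing promotion degrades the experience for one user, but the app itself keeps running.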
Attribution Vendor Comparison
We compared the most implemented attribution vendors by their network call error rates. Error rates are a good proxy for the stability of their platform, especially in terms of how their integration affects the stability of the app and the quality of the app’s core tracking metrics.
The primary network errors tracked are 4xx and 5xx. These errors are returned by a network call that does not complete correctly, and each code tells you the reason why. At the highest level, 4xx’s generally represent client-side request or access issues, the most common being a 404, which indicates that the endpoint most likely does not exist. 5xx’s represent more serious server-side issues and are generally pretty rare. If a high rate of 5xx’s shows up, it generally means there was a server outage.
While each vendor had relatively low error rates in absolute terms, small percentages can have a large effect on the app, payments sent to vendors, and the LTVs of cohorts by channel. Consider a crash: the impact is high even when the crash-free session percentage is 99.9%.
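The split between the two error classes can be expressed as a small helper. This is an illustrative sketch only; the function name `classify_status` and the label `"non_error"` are assumptions made for the example.

```python
def classify_status(status_code: int) -> str:
    """Bucket an HTTP status code into the two error classes tracked here."""
    if 400 <= status_code <= 499:
        return "4xx"  # client-side problem, e.g. 404 Not Found
    if 500 <= status_code <= 599:
        return "5xx"  # server-side problem, e.g. 503 Service Unavailable
    return "non_error"  # 1xx, 2xx, and 3xx responses


for code in (200, 404, 503):
    print(code, classify_status(code))
```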

Taking a deeper dive, each vendor’s share of errors compared with its share of total calls is as follows:

The worst-performing domain by error type is Braze, whose share of errors exceeds its share of total calls by roughly 22 percentage points.
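One plausible reading of “over-indexing” is a vendor’s share of all errors minus its share of all calls, in percentage points. The sketch below works through that arithmetic with made-up counts for three placeholder vendors; none of these numbers come from the study.

```python
def over_index_points(calls_by_vendor, errors_by_vendor):
    """Error share minus call share, in percentage points, per vendor."""
    total_calls = sum(calls_by_vendor.values())
    total_errors = sum(errors_by_vendor.values())
    result = {}
    for vendor, calls in calls_by_vendor.items():
        call_share = 100.0 * calls / total_calls
        error_share = 100.0 * errors_by_vendor.get(vendor, 0) / total_errors
        result[vendor] = error_share - call_share
    return result


# Hypothetical counts, for illustration only.
calls = {"vendor_a": 500, "vendor_b": 300, "vendor_c": 200}
errors = {"vendor_a": 6, "vendor_b": 3, "vendor_c": 1}

for vendor, points in over_index_points(calls, errors).items():
    print(f"{vendor}: {points:+.1f} points")
```

A positive value means the vendor accounts for more of the errors than its call volume alone would predict; a negative value means fewer.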
We separated AppsFlyer from the sample given that 99.9% of its total errors were an intentional 4xx rather than an inadvertent 5xx. Always sending a 4xx is considered poor practice. Even so, some third-party SDKs use 4xx intentionally to keep their own server costs lower.
The next step is to understand whether these errors are actually due to server-side issues or are the result of an unintentional device error, which can mostly be explained by the ratio of 5xx to 4xx.

If we adjust our findings to compare each vendor’s share of 5xx errors with its share of total calls, the adjusted results are as follows:

By the adjusted results, Braze shows no change, given that 99.9% of its errors are 5xx’s, while Branch actually improves, given that a material share of its errors are 4xx’s rather than 5xx’s.
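The adjustment amounts to repeating the share comparison with 5xx counts only, so that a vendor whose errors are mostly 4xx is no longer penalized for them. The sketch below uses invented counts for two placeholder vendors to show the mechanics; the helper names are assumptions and the numbers are not from the study.

```python
def shares(counts):
    """Convert raw counts into percentages of the total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def over_index(calls, errors):
    """Error share minus call share, in percentage points."""
    call_share, error_share = shares(calls), shares(errors)
    return {name: error_share[name] - call_share[name] for name in calls}


# Hypothetical counts, for illustration only.
calls = {"vendor_a": 500, "vendor_b": 500}
errors_4xx = {"vendor_a": 0, "vendor_b": 5}
errors_5xx = {"vendor_a": 5, "vendor_b": 5}
all_errors = {v: errors_4xx[v] + errors_5xx[v] for v in calls}

raw = over_index(calls, all_errors)       # counts 4xx and 5xx
adjusted = over_index(calls, errors_5xx)  # counts 5xx only

for v in calls:
    print(f"{v}: raw {raw[v]:+.1f}, adjusted {adjusted[v]:+.1f}")
```

Here vendor_b looks worse than vendor_a on raw errors, but the two are level once only server-side errors are counted.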
Conclusion
Perhaps no vendor is more important to understand in terms of stability than attribution vendors. When making a key decision that impacts understanding of your users, key marketing purchase decisions, and downstream analytics, percentages matter. Without proper attribution, teams will operate on incomplete or erroneous data.
Think about an LTV that is off by pennies…
These pennies are shaved off of a bid price…
This means fewer users acquired.
Having an attribution vendor with minimal error rates will give teams the most accurate insights into their users and best converting channels.
Editor’s note about the methodology for this research: Embrace collects network data for close to 100 mobile apps. The company aggregated the network calls across all apps and did counts of error rates as well as counts of successful network calls. Along with this, Embrace collects different statistics around these calls, including duration and size.
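The counting described in this note could look roughly like the sketch below. It is an assumption about the general approach, not Embrace’s actual pipeline: the record layout `(app_id, domain, status_code)`, the function name, and the example domains are all hypothetical, and every non-4xx/5xx response is treated as a success for simplicity.

```python
from collections import defaultdict


def aggregate_by_domain(records):
    """Count successes, 4xx's, and 5xx's per domain across all apps."""
    stats = defaultdict(lambda: {"success": 0, "4xx": 0, "5xx": 0})
    for _app_id, domain, status in records:
        if 400 <= status <= 499:
            stats[domain]["4xx"] += 1
        elif 500 <= status <= 599:
            stats[domain]["5xx"] += 1
        else:
            stats[domain]["success"] += 1

    summary = {}
    for domain, s in stats.items():
        total = s["success"] + s["4xx"] + s["5xx"]
        error_rate = 100.0 * (s["4xx"] + s["5xx"]) / total
        summary[domain] = {**s, "total": total, "error_rate_pct": error_rate}
    return summary


# Hypothetical network-call records from three apps.
records = [
    ("app_1", "api.vendor-a.example", 200),
    ("app_1", "api.vendor-a.example", 503),
    ("app_2", "api.vendor-a.example", 200),
    ("app_2", "api.vendor-b.example", 404),
    ("app_2", "api.vendor-b.example", 200),
    ("app_3", "api.vendor-a.example", 200),
]

for domain, row in aggregate_by_domain(records).items():
    print(domain, row)
```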
This article also appeared on Developer Tech News on July 10, 2019.