# Quantifying the Accuracy of Mobility Insights from Cellular Network Data

In the Mobility Insights team at Swisscom, we believe that combining mobile network signalling data with powerful machine learning algorithms will improve urban life and make it more sustainable. Our mission is to help our customers, mainly mobility and transportation specialists, by providing a minute-by-minute picture of collective mobility in Switzerland. As a result, our platform empowers our customers with a data-driven approach to decision making.

Decisions based on data are only as good as the indicators behind them, so it is very important to quantify how accurate our indicators are. In this article, we describe our continuous efforts to collect ground-truth data and measure the accuracy of the mobility indicators we offer.

# From mobile-network data to mobility indicators

The Mobility Insights big-data platform processes **anonymised** network events (more than 2 million per second) and produces more than 10 million anonymous trips per day. These trips are trajectories that we describe in terms of time, space and mode of transport. Because we focus on collective mobility, we aggregate these trips into dynamic mobility indicators that describe, minute by minute, the mobility pulse of Switzerland. For example, we are able to quantify, for a given minute of the day, the number of highway and train trips that go through a given area. We are also able to build the associated distribution of origins and destinations. All results we share are k-anonymised [4] to minimise the risk of re-identification.

In this article, we focus on quantifying the accuracy of two central mobility indicators:

**Origin-Destination (OD) matrix [3]**: describes the travel demand between geographical areas. The element (i, j) of this matrix typically represents the volume of trips between area i and area j. We show in the figure below an example of an OD matrix.

**Main mode of transport (MMoT)**: we associate each trip with its main mode of transport.
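To make the definition of an OD matrix concrete, here is a minimal sketch that counts trips between areas. The trip format (a pair of origin and destination area labels) and the function name `build_od_matrix` are assumptions made for illustration; they are not the platform's actual data model.

```python
def build_od_matrix(trips, areas):
    """Count trips between each ordered pair of areas.

    trips: iterable of (origin_area, destination_area) pairs (hypothetical format).
    areas: list of area identifiers that fixes the row and column order.
    Returns a list of lists where od[i][j] is the number of trips
    from areas[i] to areas[j].
    """
    index = {area: k for k, area in enumerate(areas)}
    od = [[0] * len(areas) for _ in areas]
    for origin, destination in trips:
        od[index[origin]][index[destination]] += 1
    return od


# Illustrative data only: three areas and five trips.
areas = ["A", "B", "C"]
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("A", "A")]
od = build_od_matrix(trips, areas)
# od == [[1, 2, 0], [0, 0, 1], [1, 0, 0]]
```

Rows correspond to origins and columns to destinations, so `od[0][1]` is the number of trips from area A to area B.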

As we offer these mobility indicators to the users of our platform, we need to provide an estimate of the accuracy we are able to achieve.

# Collecting ground-truth data

Benchmarking machine-learning algorithms requires data that associate samples with their ground truth (the actual label). This is challenging because our machine learning task is very specific and no public datasets are available. We therefore decided to collect the data ourselves: we developed an application, **open only to Swisscom employees with an explicit opt-in**, that provides personalised mobility reports describing the daily trips performed by the user as well as the associated CO2 footprint. The user can also provide feedback: they can rate the reconstructed trip and correct the origin, the destination and the detected mode of transport of each trip.

**Evaluation**

To evaluate the accuracy of our indicators, we compute two metrics:

**Median positioning error**: the median distance, over all end-points of all trips, between our estimate and the actual end-point as specified by the user.

**MMoT classification accuracy**: the proportion of samples for which we provided the correct main mode of transport.

We estimate the accuracy of our indicators over more than 6000 trips, which amounts to more than 12000 end-points on which we measure the median positioning error.
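The two metrics can be sketched as follows. The article does not state how the distance between two end-points is measured, so the haversine (great-circle) distance used here is an assumption, as are the function names and the `(latitude, longitude)` input format.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def median_positioning_error(estimated, actual):
    """Median distance between estimated and user-corrected end-points.

    Both arguments are equally long sequences of (lat, lon) tuples.
    """
    return median(haversine_m(*e, *a) for e, a in zip(estimated, actual))


def classification_accuracy(predicted, actual):
    """Proportion of trips whose predicted mode matches the user-provided mode."""
    pairs = list(zip(predicted, actual))
    return sum(p == a for p, a in pairs) / len(pairs)


# Illustrative values only, not real measurements.
estimated = [(46.0010, 7.0), (46.0020, 7.0), (46.0005, 7.0)]
actual = [(46.0, 7.0), (46.0, 7.0), (46.0, 7.0)]
error_m = median_positioning_error(estimated, actual)

accuracy = classification_accuracy(
    ["train", "car", "train", "bus"],
    ["train", "car", "car", "bus"],
)
# accuracy == 0.75
```

The median is used rather than the mean so that a few badly located end-points do not dominate the reported error.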

## The median positioning error is 132 meters. The main mode of transport classification accuracy is 90%.

For the median positioning error, our result improves on state-of-the-art positioning methods. For example, Ilias Leontiadis et al. [1] propose a method for positioning SIM (Subscriber Identity Module) cards using mobile network signalling data. Our probabilistic positioning method (explained below) is **42% more accurate**.

We are therefore able to build accurate OD matrices whose geographical scale is on the order of **hundreds of meters**. We currently offer our customers OD matrices at the level of postal codes [2], a geographical scale that is compatible with the positioning accuracy we achieve. Because we also detect the correct mode of transport for 90% of the trips, we can accurately decompose these matrices into matrices associated with each MMoT: for example, we can produce an OD matrix of train trips for a given hour of the day.

# Probabilistic positioning from mobile network signalling data

We now give a high-level explanation of the probabilistic positioning method we have created. To position a SIM card, we associate with each observation *(time, cell)* a probability distribution that represents our uncertainty about the actual location. We work in polar coordinates, where the location is defined by two random variables: the radius *R*, which is the distance from the origin (the cell site), and the angle Θ, measured with respect to the cell azimuth. In our model, we assume that the random variables *R* (radius) and Θ (angle) are independent given the cell *c* and the observed signal delay δ:

*p(R = r, Θ = θ|c, δ) = p(r|c, δ) × p(θ|c, δ) = p(r|δ) × p(θ|c)*.
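As an illustration of this polar parameterisation, the sketch below converts a radius and an angle into an offset from the cell site. It assumes that the azimuth is given in degrees clockwise from north (a common antenna convention), that Θ is also measured clockwise in degrees, and that a local flat-earth approximation is acceptable. None of these conventions are stated in the article, and the function name is hypothetical.

```python
import math


def polar_to_local_offset(radius_m, theta_deg, azimuth_deg):
    """Convert (R, Theta) relative to a cell into east/north offsets in metres.

    radius_m: distance from the cell site.
    theta_deg: angle relative to the cell azimuth (assumed clockwise, degrees).
    azimuth_deg: cell azimuth (assumed clockwise from north, degrees).
    """
    bearing = math.radians((azimuth_deg + theta_deg) % 360.0)
    east = radius_m * math.sin(bearing)
    north = radius_m * math.cos(bearing)
    return east, north


# A device 100 m away, exactly on the main direction of a cell pointing east.
east, north = polar_to_local_offset(100.0, 0.0, 90.0)
```

With these conventions, a cell pointing east (azimuth 90) and an angle of 0 gives an offset of about 100 m to the east and 0 m to the north.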

The signal delay is an estimate of the time delay between the cell and the device, which is available via 3G and 4G protocols. However, it is sparse because measurements are not performed continuously. This implies that in most situations we need to marginalise over the possible values of the signal delay for a given cell in order to obtain the radius distribution *p(R|c)*. To do so, we build for each cell an empirical distribution of signal delay values, which is learnt from all observations.
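The marginalisation described above amounts to *p(r|c) = Σ_δ p(r|δ) × p(δ|c)*. The sketch below illustrates it under several assumptions: the signal delay takes discrete values, *p(r|δ)* is Gaussian, and the parameter values are invented for the example. The function names are hypothetical.

```python
import math
from collections import Counter


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def empirical_delay_distribution(observed_delays):
    """Estimate p(delta | c) as relative frequencies of the delays seen on one cell."""
    counts = Counter(observed_delays)
    total = sum(counts.values())
    return {delay: n / total for delay, n in counts.items()}


def radius_density_given_cell(r, delay_probs, radius_params):
    """p(r | c) = sum over delta of p(r | delta) * p(delta | c)."""
    return sum(
        prob * gaussian_pdf(r, *radius_params[delay])
        for delay, prob in delay_probs.items()
    )


# Invented numbers for illustration: delays observed on one cell, and
# (mean, std) of the radius in metres for each delay value.
observed_delays = [1, 1, 2, 2, 2, 3]
radius_params = {1: (300.0, 100.0), 2: (800.0, 150.0), 3: (1400.0, 200.0)}

delay_probs = empirical_delay_distribution(observed_delays)
density_at_800m = radius_density_given_cell(800.0, delay_probs, radius_params)
```

The result is a mixture of Gaussians whose weights are the empirical delay frequencies of the cell, so cells that mostly see short delays concentrate their probability mass close to the cell site.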

The angle distribution is a multinomial distribution that depends only on the cell azimuth and bandwidth. The radius is distributed as a Gaussian *N(μ(δ), σ(δ))* whose mean and variance are learnt from empirical observations using a maximum likelihood estimator [5].
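For a Gaussian, the maximum likelihood estimates are the sample mean and the variance computed with a divisor of n. A minimal sketch of fitting *μ(δ)* and *σ(δ)* per delay value follows. The input format (pairs of delay and known radius) and the function name are assumptions for illustration, and the numbers are invented.

```python
import math
from collections import defaultdict


def fit_radius_gaussians(samples):
    """Maximum likelihood mean and standard deviation of the radius per delay value.

    samples: iterable of (delay, radius_m) pairs for which the true distance
    to the cell site is known (hypothetical format).
    Returns a dict mapping delay to (mean, std).
    """
    by_delay = defaultdict(list)
    for delay, radius in samples:
        by_delay[delay].append(radius)

    params = {}
    for delay, radii in by_delay.items():
        mean = sum(radii) / len(radii)
        # The maximum likelihood variance divides by n, not n - 1.
        variance = sum((r - mean) ** 2 for r in radii) / len(radii)
        params[delay] = (mean, math.sqrt(variance))
    return params


# Invented observations: (delay value, distance to the cell site in metres).
samples = [(1, 250.0), (1, 350.0), (2, 700.0), (2, 800.0), (2, 900.0)]
params = fit_radius_gaussians(samples)
# params[1] == (300.0, 50.0)
```

Dividing by n gives a slightly biased variance estimate on small samples; with many observations per delay value the difference from the n - 1 version is negligible.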

# Conclusions

In this article, we described our continuous efforts to quantify the accuracy of the mobility insights we offer and how we collect ground-truth data through an application. The results of our benchmarking showed that our machine learning algorithms produce accurate time-dependent OD matrices for different modes of transport. This empowers our customers, mainly mobility and transportation specialists, with a data-driven and therefore more informed approach to decision making.

# References

[1] Ilias Leontiadis et al., "From Cells to Streets: Estimating Mobile Paths with Cellular-Side Data", 10th ACM International Conference on Emerging Networking Experiments and Technologies (CoNEXT 2014).

[2] Postal codes in Switzerland and Liechtenstein, Wikipedia: https://en.wikipedia.org/wiki/Postal_codes_in_Switzerland_and_Liechtenstein

[3] Trip distribution, Wikipedia: https://en.wikipedia.org/wiki/Trip_distribution

[4] K-anonymity, Wikipedia: https://en.wikipedia.org/wiki/K-anonymity

[5] Maximum likelihood estimation, Wikipedia: https://en.wikipedia.org/wiki/Maximum_likelihood_estimation