Quantifying the Accuracy of Mobility Insights from Cellular Network Data

From mobile-network data to mobility indicators

The mobility insights big-data platform processes anonymised network events (more than 2 million per second) and produces more than 10 million anonymous trips per day. These trips are trajectories that we describe in terms of time, space and mode of transport. As we focus on collective mobility, we aggregate these trips into dynamic mobility indicators that describe, minute by minute, the mobility pulse of Switzerland. For example, we are able to quantify, for a given minute of the day, the number of highway and train trips that go through a given area. We are also able to build the associated distribution of origins and destinations. All results we share are k-anonymised [4] to minimise the risk of re-identification.

Two indicators are of particular interest:

  • Origin-Destination (OD) matrix [3]: describes the travel demand between geographical areas. The element (i, j) of this matrix typically represents the volume of trips from area i to area j. We show in the figure below an example of an OD matrix.
  • Main mode of transport (MMoT): we associate each trip with its main mode of transport.

We show an OD matrix that represents the daily trips between Swiss cantons. The colour of each square encodes the daily count of trips between the associated cantons (blue is low and yellow is high). For example, the square at row ZH and column BE reflects the volume of trips from Zurich to Bern. The diagonal of the matrix shows that most trips are intra-cantonal. Furthermore, we apply k-anonymity and report trips only if their count is larger than k=20.
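
As a rough illustration of how an OD matrix with a k-anonymity threshold could be assembled, here is a minimal Python sketch. The trip format (a pair of canton codes), the function name `build_od_matrix` and the toy data are assumptions made for this example; this is not the platform's actual pipeline.

```python
from collections import Counter

K = 20  # anonymity threshold quoted in the text


def build_od_matrix(trips, k=K):
    """Count trips per (origin, destination) pair and suppress small cells.

    `trips` is an iterable of (origin_area, destination_area) tuples,
    a hypothetical simplified trip format used only for this sketch.
    """
    counts = Counter(trips)
    # Report a cell only if its count is strictly larger than k.
    return {pair: n for pair, n in counts.items() if n > k}


# Toy data (made up): 50 trips inside ZH, 21 from ZH to BE, 20 from BE to GE.
trips = [("ZH", "ZH")] * 50 + [("ZH", "BE")] * 21 + [("BE", "GE")] * 20
od = build_od_matrix(trips)
```

In this toy run the BE to GE cell is dropped because its count is exactly 20, which is not larger than k.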

Collecting ground-truth data

Benchmarking machine-learning algorithms requires data that associate samples with ground truth (the actual label). This is challenging because the machine-learning task at hand is very specific and no public datasets are available. We therefore decided to collect the data ourselves: we developed an application, open only to Swisscom employees with an explicit opt-in, that provides personalised mobility reports describing the daily trips performed by the user as well as the associated CO2 footprint. At the same time, the user is able to provide feedback: they can rate the reconstructed trip and correct the origin and destination of each trip as well as the detected mode of transport.

We display the trip to the user and ask for feedback: the user can specify the mode(s) of transport used, associate trips with semantics (commute, leisure, nature, etc.), and obtain an estimate of their CO2 footprint.

Evaluation

To evaluate the accuracy of our indicators, we compute the following two metrics:

  • Median positioning error: the median distance, over all end-points of all trips, between our estimate and the actual end-point as specified by the user.
  • MMoT classification accuracy: the proportion of samples for which we provided the correct main mode of transport.

The application allows the user to provide the actual origin and destination of a trip. This is the feedback we use to measure the median positioning accuracy.
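
The two metrics above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes end-points are given as (latitude, longitude) pairs in degrees and uses the haversine great-circle distance, which may differ from the distance used in the actual evaluation. All function names are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def median_positioning_error(estimated, actual):
    """Median distance between estimated and user-provided end-points."""
    errors = [haversine_m(*e, *a) for e, a in zip(estimated, actual)]
    return median(errors)


def mmot_accuracy(predicted, actual):
    """Proportion of samples whose predicted main mode matches the label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For instance, `mmot_accuracy(["train", "car", "train", "car"], ["train", "car", "car", "car"])` evaluates to 0.75, since three of the four toy labels match.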

The median positioning error is 132 meters. The main mode of transport classification accuracy is 90%.

For the median positioning error, we are more accurate than state-of-the-art positioning methods. For example, Ilias Leontiadis et al. [1] propose a method for positioning SIM (Subscriber Identity Module) cards using mobile-network signalling data. Our probabilistic positioning method (explained below) is 42% more accurate.

We plot the empirical distribution of the positioning error. The median error is 132 meters and 80% of the samples have a positioning error lower than 350 meters.
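
Statements of this kind can be read directly off the empirical distribution of the per-sample errors. The sketch below shows one way to do so; the helper names and the error values are made up for illustration and are not the measured data.

```python
import math


def fraction_below(errors_m, threshold_m):
    """Empirical CDF at a threshold: share of errors strictly below it."""
    return sum(e < threshold_m for e in errors_m) / len(errors_m)


def empirical_quantile(errors_m, q):
    """Smallest observed error e such that at least a fraction q of samples are <= e."""
    ordered = sorted(errors_m)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]


# Toy error values in meters (made up for this example).
errors = [400, 50, 150, 100, 200]
share = fraction_below(errors, 175)
middle = empirical_quantile(errors, 0.5)
```

With these toy values, three of the five errors fall below 175 meters and the 0.5-quantile is 150 meters.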

Probabilistic positioning from mobile network signalling data

We give a high-level explanation of the probabilistic positioning method we have created. In order to position a SIM card, we associate with each observation (time, cell) a probability distribution that represents our uncertainty about the actual location. We choose to work in polar coordinates, where the location is defined by two random variables: the radius R, which is the distance from the origin (cell site), and the angle Θ, measured with respect to the cell azimuth. In our model, we assume that the random variables R (radius) and Θ (angle) are independent given the cell c and the observed signal delay δ.
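
Written out, this conditional-independence assumption means that the joint density of the location factorises into a radial term and an angular term. The notation below (p for a conditional density, C for the cell and Δ for the signal delay as random variables) is introduced here for illustration only; the specific form of each factor is not given in this article.

```latex
% Notation (assumed): p denotes a conditional probability density,
% C the serving cell and \Delta the observed signal delay.
p_{R,\Theta \mid C,\Delta}(r, \theta \mid c, \delta)
  = p_{R \mid C,\Delta}(r \mid c, \delta)\,
    p_{\Theta \mid C,\Delta}(\theta \mid c, \delta)
```

One practical consequence is that the radial and angular distributions can be modelled separately for each pair (c, δ).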

Conclusions

In this article, we described our continuous efforts to quantify the accuracy of the mobility insights we offer and how we collect ground-truth data through an application. The results of our benchmarking show that our machine-learning algorithms produce accurate time-dependent OD matrices for different modes of transport. This gives our customers, mainly mobility and transportation specialists, a data-driven and therefore more informed approach to decision making.

References

[1] From Cells to Streets: Estimating Mobile Paths with Cellular-Side Data, Ilias Leontiadis et al., published in the 10th ACM International Conference on Emerging Networking Experiments and Technologies (CoNEXT 2014).
