Through the Bus Drivers’ Eyes

Linking Operational Data and Drivers’ Perspectives to Network Monitoring and Planning

📅 29/09/2025

👥 Gonçalo Matos, Filipe Moura, Rosa Félix

Topics

  • EDA on events
  • EDA on messages
  • How do they relate?
  • Discussion
  • Next steps

EDA on events

📌 32 633 525 events

🚏 5 745 268 stop services

🚍 179 091 trips monitored (65.1% of planned service)

📅 Between 01/05 and 31/05/2025 (31 days)

Descriptive stats

🕐 Average Reliability Buffer Index (RBI)¹ of 0.294 (against 0.146 planned)²

Examples of routes with high RBI²
Route | Mean duration, min (planned) | P95 duration, min (planned) | RBI (planned) | Difference
712: Santa Apolónia > Marquês Pombal | 24.5 (21.0) | 35.0 (21.0) | 0.426 (0.0) | +0.426
750: Estação Benfica > Algés | 25.9 (25.7) | 37.0 (27.0) | 0.425 (0.048) | +0.376
10B: Campo Cebolas (Circular) | 21.7 (25.0) | 30.0 (25.0) | 0.384 (0.0) | +0.384
773: Alcântara > Rato | 25.6 (26.5) | 36.0 (29.0) | 0.408 (0.094) | +0.314
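The RBI values in the table are consistent with this definition: the gap between the 95th-percentile and the mean trip duration, divided by the mean. For example, the planned values for route 773 give (29.0 − 26.5) / 26.5 ≈ 0.094. The slides do not restate the formula, so the Python sketch below is written under that assumption. It also assumes R-style (type 7) percentile interpolation. The function names are hypothetical.

```python
import statistics
from typing import Sequence


def reliability_buffer_index(durations: Sequence[float]) -> float:
    """Assumed definition: RBI = (P95 duration - mean duration) / mean duration."""
    if len(durations) < 2:
        raise ValueError("need at least two trip durations")
    mean = statistics.fmean(durations)
    # quantiles(n=100) returns the 99 percentile cut points; index 94 is P95.
    # method="inclusive" interpolates like R's default (type 7) quantile.
    p95 = statistics.quantiles(durations, n=100, method="inclusive")[94]
    return (p95 - mean) / mean


def rbi_from_summary(mean: float, p95: float) -> float:
    """Same (assumed) index when only the mean and P95 are known, as in the table."""
    return (p95 - mean) / mean


# Planned values for route 773 (mean 26.5, P95 29.0)
print(round(rbi_from_summary(26.5, 29.0), 3))
# A perfectly regular schedule has no buffer
print(reliability_buffer_index([21.0, 21.0, 21.0, 21.0]))
```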

Descriptive stats: RBI

🕐 Greatest impact on business days…

… and during peak hours

Descriptive stats

🚌🚌 Bus bunching¹ occurs on 7.5% of stop services

Stops with highest bus bunching index
Stop | Route | Services bunched (%)
Belém (Museu Coches) | 751 | 25.8
R. Alecrim | 758 | 25.6
Altinho (MAAT) | 751 | 25.6
Portas Benfica | 758 | 25.5
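These slides do not restate the bunching criterion, so the following is an illustration only. The sketch flags a stop service as bunched when its headway to the previous bus of the same route, at the same stop, falls below a fraction of the scheduled headway. The per-stop percentage in the table would then be the share of flagged services. The 25% threshold, the single scheduled headway and the function names are all hypothetical, not the study's parameters.

```python
from collections import defaultdict

# Hypothetical threshold: a bus arriving less than 25% of the scheduled
# headway after the previous bus (same route, same stop) counts as bunched.
BUNCHING_FRACTION = 0.25


def bunched_flags(arrivals_min, scheduled_headway_min, fraction=BUNCHING_FRACTION):
    """Return one boolean per arrival (in time order) marking bunched services."""
    times = sorted(arrivals_min)
    flags = [False]  # the first bus has no predecessor to bunch with
    for prev, curr in zip(times, times[1:]):
        flags.append(curr - prev < fraction * scheduled_headway_min)
    return flags


def bunching_share_by_stop(services, scheduled_headway_min):
    """services: iterable of (stop, route, arrival_minute); returns % bunched per (stop, route)."""
    groups = defaultdict(list)
    for stop, route, t in services:
        groups[(stop, route)].append(t)
    shares = {}
    for key, times in groups.items():
        flags = bunched_flags(times, scheduled_headway_min)
        shares[key] = 100.0 * sum(flags) / len(flags)
    return shares


# Hypothetical arrivals (minutes after midnight) at one stop of route 751
demo = [("Belém (Museu Coches)", "751", t) for t in (480, 490, 491, 505)]
print(bunching_share_by_stop(demo, scheduled_headway_min=10))
```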

Descriptive stats: bus bunching

🕐 Greatest impact on business days…

… and during peak hours

Descriptive stats: bus bunching

🛣️ Do bus lanes have an influence?

At stops served by bus lanes (14.6%), 8.38% of services are bunched, slightly above the 7.5% network-wide share

Routes that run on bus lanes do not appear to experience less bunching; if anything, the data suggest the opposite, although the association is weak (R² = 0.13, r = 0.36, p < 0.001)
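The two reported statistics are internally consistent. For a simple linear regression, R² equals the square of Pearson's r, and 0.36² ≈ 0.13. The sketch below computes both from paired route-level observations. The variables (share of each route on bus lanes vs. share of bunched services) and the values are hypothetical. The p-value, which needs the t distribution, is not computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ols_r_squared(xs, ys):
    """R² of the least-squares line y = a + b*x, i.e. 1 - residual SS / total SS."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical routes: share of length on bus lanes (%) vs. bunched services (%)
lane_share = [0, 10, 25, 40, 55, 70]
bunched = [6.0, 8.5, 6.5, 9.0, 7.5, 10.0]
r = pearson_r(lane_share, bunched)
print(round(r, 3), round(r ** 2, 3), round(ols_r_squared(lane_share, bunched), 3))
```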

Descriptive stats

✂️ On average, 443 services are shortened per day (4.9% of planned service)



Routes with highest shortening rates
Route | Shortened services (%)
723: Desterro - Algés | 19.5
727: Roma-Areeiro - Restelo | 16.9
751: Campolide - Linda-a-Velha | 13.6
759: Restauradores - Oriente | 11.8
738: Estrada Luz - Alto Sto. Amaro | 11.4

Descriptive stats: shortenings

🕐 Greatest impact midweek and on Saturdays…

… and during peak hours

Descriptive stats: shortenings

📏 Does route length influence the number of shortenings?

Longer routes tend to have more shortenings: there is a statistically significant positive exponential relationship between the two variables

R² = 0.29, r = 0.54, p < 0.001
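One common way to fit an exponential relationship y = a·e^(b·x) is ordinary least squares on ln y, because ln y = ln a + b·x is linear in x. The slides do not say which method produced R² = 0.29. The sketch below assumes the log-linear approach and uses hypothetical route lengths and shortening counts. It only works for strictly positive counts, so routes with zero shortenings would have to be dropped or offset.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y); returns (a, b, r_squared_log)."""
    if any(y <= 0 for y in ys):
        raise ValueError("log-linear fit needs strictly positive y values")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mx, mlog = sum(xs) / n, sum(logs) / n
    b = sum((x - mx) * (ly - mlog) for x, ly in zip(xs, logs)) / sum((x - mx) ** 2 for x in xs)
    ln_a = mlog - b * mx
    ss_res = sum((ly - (ln_a + b * x)) ** 2 for x, ly in zip(xs, logs))
    ss_tot = sum((ly - mlog) ** 2 for ly in logs)
    # R² here is measured on the log scale, which is what this fit optimizes
    return math.exp(ln_a), b, 1 - ss_res / ss_tot


# Hypothetical routes: length (km) vs. shortenings in the month
length_km = [5.2, 7.8, 9.1, 11.5, 14.0, 16.3]
shortenings = [12, 20, 18, 35, 60, 71]
a, b, r2 = fit_exponential(length_km, shortenings)
print(f"shortenings ≈ {a:.2f} * exp({b:.3f} * length), R² (log scale) = {r2:.2f}")
```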

EDA on messages

💬 270 395 messages

🚍 101 bus routes

📅 Between 20/01 and 31/05/2025 (132 days)

Descriptive stats

👥 Sent by drivers (61.4%) and by controllers (38.6%)

🗺️ On average, 24.83 messages per route per day (SD = 27.69)

👨‍✈️ Drivers

On average, 2.78 messages per day (SD = 2.35)

Mostly related to procedures (67.51%)

👮‍♀️ Controllers

Mainly free text messages (71.8%)

Problem: information is lost in controllers’ free-text messages, which are not assigned to any message category

Free text analysis

Solution: apply text mining to the free-text messages (Valença, Moura, and Morais de Sá 2023)

Procedure:

  1. Data cleaning

    1.1. Lemmatization [Rinker (2017); extended through the authors’ observation]

     Standardize words to their base or dictionary form (e.g., Siga > seguir, OBG > obrigado)

    1.2. Stop words [stopwords-iso (2016), Boothe (2023); extended with stop names]

     Filter out words with little semantic value (e.g., à, pelo, seu, por)

  2. Word frequency analysis

  3. Topic modelling, using Latent Dirichlet Allocation (LDA)
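Steps 1.1 and 1.2 can be sketched as a dictionary-based lemmatizer followed by a stop-word filter. The study used the lemmar dictionary (Rinker 2017) and the stopwords-iso Portuguese list, both extended by hand. The tiny dictionaries below are hypothetical stand-ins built only from the examples on this slide.

```python
import re

# Hypothetical excerpts; the real dictionaries are far larger.
LEMMAS = {
    "siga": "seguir",
    "obg": "obrigado",  # context-specific lemma added by observation
}
STOP_WORDS = {"à", "pelo", "seu", "por"}


def clean_message(text: str) -> list[str]:
    """Lowercase, tokenize, lemmatize (step 1.1), then drop stop words (step 1.2)."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, keeps accented letters
    lemmas = [LEMMAS.get(tok, tok) for tok in tokens]
    return [lem for lem in lemmas if lem not in STOP_WORDS]


print(clean_message("Siga pelo terminal. OBG"))
```

Because lemmatization runs first, a stop word that the dictionary maps to a wrong lemma (as with até > atar) escapes the filter. This is one reason the word frequency check is useful.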

Word frequency analysis

Useful for identifying context-specific lemmas and stop words, and for correcting wrong lemmas

1st iteration

“obg” is a missing lemma (should map to obrigado)

“atar” is an invalid lemma (até was lemmatized to atar)

“pedir” is an invalid lemma (pedido was lemmatized to pedir)

11th iteration
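Each iteration of the word frequency analysis comes down to counting tokens after cleaning and inspecting the most frequent ones. This is how raw forms such as obg and wrong lemmas such as atar surface. Below is a minimal sketch with Python's `collections.Counter`. The cleaned messages and the `top_terms` name are hypothetical.

```python
from collections import Counter


def top_terms(cleaned_messages, n=10):
    """Count tokens across all cleaned messages and return the n most frequent."""
    counts = Counter(tok for msg in cleaned_messages for tok in msg)
    return counts.most_common(n)


# Hypothetical output of a first cleaning pass: "obg" is not yet mapped to a
# lemma, and "até" was wrongly lemmatized to "atar"
iteration_1 = [
    ["avançar", "obg"],
    ["terminal", "fonia", "obg"],
    ["aguardar", "atar", "terminal"],
    ["obg"],
]
print(top_terms(iteration_1, n=3))
```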

Topic modelling

Using Latent Dirichlet Allocation
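The slides do not say which LDA implementation was used. The sketch below is one standard way to fit the model: collapsed Gibbs sampling, written in plain Python so the mechanics are visible. The hyperparameters (alpha, beta), the iteration count and the sample messages are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists (already cleaned). Returns (vocab, phi), where
    phi[k][v] is the estimated probability of word vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]            # tokens of topic k in doc d
    n_kw = [[0] * n_vocab for _ in range(n_topics)]  # count of word v in topic k
    n_k = [0] * n_topics                             # tokens assigned to topic k
    z = []                                           # topic of every token

    # Random initial topic assignment
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional: p(z = t | rest) ∝ (n_dt + α)(n_tv + β) / (n_t + Vβ)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + n_vocab * beta) for v in range(n_vocab)]
        for k in topics
    ]
    return vocab, phi


def top_words(vocab, phi, n=3):
    """Most probable words in each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


# Hypothetical cleaned controller messages
messages = [
    ["avançar", "obrigado"],
    ["avançar", "possível", "obrigado"],
    ["terminal", "fonia", "aguardar"],
    ["terminal", "fonia"],
]
vocab, phi = lda_gibbs(messages, n_topics=2, n_iter=100)
print(top_words(vocab, phi))
```

In practice, each resulting topic's top words are inspected manually and mapped to an existing or new message category.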

Free text analysis outcomes

  • 7 clusters matched already typified categories

  • 7 new clusters were identified (extending the 53 current categories)

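ID | Category | English translation
429 | ASSIM QUE POSSÍVEL,AVANCE.OBG. | As soon as possible, proceed. Thanks.
430 | NO TERMINAL,PEÇA FONIA. OBG. | At the terminal, request a voice call. Thanks.
431 | URGENTE. | Urgent.
432 | SE POSSÍVEL,CONTACTE CCT. OBG. | If possible, contact the CCT. Thanks.
433 | IMOBILIZAR NO LOCAL. OBRIGADO. | Stop the vehicle where it is. Thank you.
434 | MENSAGEM RECEBIDA. OBRIGADO. | Message received. Thank you.
435 | INTRODUZA OS DADOS NA CONSOLA. | Enter the data on the console.
436 | DÊ FIM OU INÍCIO DE VIAGEM.OBG | Log the end or start of the trip. Thanks.
437 | DÊ FIM DE SERVIÇO. OBRIGADO. | Log the end of service. Thank you.
438 | CONTACTAR RAPIDAMENTE A CCT. | Contact the CCT quickly.
439 | RÁDIO AVARIADO,CONTACTAR CCT. | Radio out of order, contact the CCT.
440 | VÁ RESERVADO À RENDIÇÃO. OBG. | Go to the driver relief showing “Reserved”. Thanks.
441 | NO TERMINAL,AGUARDE PELA FT. | At the terminal, wait for the FT.
442 | NO TERMINAL,AGUARDE PELO PS. | At the terminal, wait for the PS.
443 | AUSÊNCIA DA CHAPA DA FRENTE. | Bus ahead is missing.
444 | CONTROLAR A MARCHA. OBRIGADO. | Control your driving pace. Thank you.
445 | ATENÇÃO AO HORÁRIO. OBRIGADO. | Mind the schedule. Thank you.
446 | PISO ESCORREGADIO - CUIDADO! | Slippery road surface, caution!
447 | SE ENCONTROU ACHADO,CONTACTE. | If you found a lost item, get in touch.
448 | NO TERMINAL,VISTORIE EXTERIOR | At the terminal, inspect the exterior.
449 | A/C,SÓ QUANDO ESSENCIAL. OBG. | A/C only when essential. Thanks.
450 | USE A VENTILAÇÃO. OBRIGADO. | Use the ventilation. Thank you.
451 | CAMINHO LIVRE | Road clear
452 | Saída da Estação | Departure from the depot
453 | Início de Viagem | Start of trip
454 | Fim de Viagem | End of trip
455 | Início da Viagem Recolha | Start of return-to-depot trip
456 | Recolha à Estação | Return to depot
457 | Rendição | Driver relief
458 | Pedido de Fonia | Voice call request
459 | Partida Avançada | Early departure
460 | PASSAGEIRO SEM TÍTULO | Passenger without a ticket
461 | CARTEIRISTAS | Pickpockets
462 | AUSÊNCIA MOMENTÂNEA | Momentary absence
463 | FIM DE AUSÊNCIA MOMENTANEA | End of momentary absence
464 | FALSO ALARME | False alarm
465 | MENSAGEM RECEBIDA | Message received
466 | COMPLETO | Full
467 | ACIDENTE ENTRE TERCEIROS | Accident between third parties
468 | INTERRUPCAO | Interruption
469 | CAMINHO LIVRE | Road clear
470 | FALTA DE RENDICAO | Missing driver relief
471 | AVARIA SEM IMOBILIZACAO | Breakdown without immobilization
472 | AVARIA COM IMOBILIZACAO | Breakdown with immobilization
473 | ACIDENTE | Accident
474 | BICICLETA | Bicycle
475 | VANDALISMO NO INTERIOR | Vandalism on board
476 | PEDIDO DE POLICIA | Police request
477 | CONGESTIONAMENTO | Congestion
478 | COMBOIO DE CARREIRA | Bus convoy (bunching)
479 | CONSOLA OU RÁDIO AVARIADO | Console or radio out of order
480 | CADEIRA DE RODAS | Wheelchair
481 | TEXTO LIVRE | Free text
901 | PEDIDO DE ESTADO | Status request
902 | BANDEIRAS | Flags
903 | SAUDAÇÃO CONTROLADOR | Controller greeting
904 | ENCURTAMENTO PERCURSO | Route shortening
905 | TRANSBORDO PASSAGEIROS | Passenger transfer
906 | ULTRAPASSAR CHAPA FRENTE | Overtake the bus ahead
907 | RADAR | Radar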

Free text analysis outcomes

Before

75 008 uncategorized

71.8% of controllers’ messages

27.8% of total

After

10 486 uncategorized

10% of controllers’ messages

3.9% of total

Spatial analysis of messages

Kernel density maps (Wickham 2016)

🚌💨 Heatmap of “overtake the bus ahead” (Overcome) messages (by controllers)
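The kernel density maps were drawn with ggplot2 (Wickham 2016). The underlying idea is a two-dimensional kernel density estimate over message locations. The Python sketch below evaluates an isotropic Gaussian KDE on a regular grid. It assumes coordinates have already been projected to metres (not raw latitude/longitude). The bandwidth and sample points are hypothetical.

```python
import math


def gaussian_kde_2d(points, grid_x, grid_y, bandwidth):
    """Isotropic Gaussian kernel density estimate evaluated on a regular grid.

    points: (x, y) message locations in projected metres.
    Returns density[j][i] for grid point (grid_x[i], grid_y[j]), in 1/m².
    """
    n = len(points)
    h2 = bandwidth ** 2
    norm = 1.0 / (n * 2.0 * math.pi * h2)  # makes the surface integrate to 1
    density = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            s = sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * h2))
                for px, py in points
            )
            row.append(norm * s)
        density.append(row)
    return density


# Hypothetical "overtake" message locations (metres), clustered near (100, 100)
pts = [(95, 102), (101, 98), (104, 105), (400, 380)]
grid = list(range(0, 501, 50))
dens = gaussian_kde_2d(pts, grid, grid, bandwidth=40.0)
peak = max((d, x, y) for y, row in zip(grid, dens) for x, d in zip(grid, row))
print("densest cell at x=%d, y=%d" % (peak[1], peak[2]))
```

The bandwidth controls how smooth the heatmap looks: too small and every message is its own hotspot, too large and distinct hotspots merge.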

Relating messages and events

Considering messages and events for May 2025

Message | N | Departure delay, min: mean (SD) / median (IQR) | Travel time difference, min: mean (SD) / median (IQR) | Arrival delay, min: mean (SD) / median (IQR)
🙆 Full | 1939 | 0.31 (2.72) / 0.00 (0.00, 1.00) | 3.65 (5.08) / 4.00 (0.00, 7.00) | 3.96 (5.14) / 4.00 (0.00, 8.00)
🎌 Flags | 967 | 0.37 (4.49) / 0.00 (-3.00, 3.00) | 6.22 (4.78) / 7.00 (3.00, 9.00) | 6.58 (5.24) / 8.00 (3.00, 11.00)
⏩ Proceed | 850 | -0.06 (2.90) / 0.00 (-1.00, 1.00) | 2.17 (5.40) / 2.00 (-1.00, 6.00) | 2.11 (5.31) / 2.00 (-2.00, 6.00)
✂️ Shortening | 670 | -0.53 (3.63) / 0.00 (-3.00, 2.00) | 6.23 (4.63) / 6.50 (3.00, 9.00) | 5.70 (5.00) / 7.00 (2.00, 10.00)
🚗 Traffic | 645 | 0.33 (2.38) / 0.00 (0.00, 1.00) | 4.76 (4.92) / 5.00 (1.00, 8.00) | 5.09 (4.85) / 6.00 (2.00, 9.00)
🚌💨 Overcome | 58 | 0.84 (2.50) / 0.00 (0.00, 2.00) | -0.40 (5.68) / -2.00 (-4.00, 2.75) | 0.45 (5.58) / 0.00 (-3.75, 4.75)
💥 Accident | 48 | 0.50 (2.57) / 0.00 (0.00, 1.00) | 4.08 (5.29) / 5.00 (-0.25, 8.00) | 4.58 (5.31) / 5.00 (1.00, 9.00)
⏱️ Control driving | 10 | -2.10 (3.03) / 0.00 (-4.25, 0.00) | -3.30 (6.98) / -4.00 (-8.25, -1.25) | -5.40 (5.80) / -6.00 (-9.75, -3.50)
Schedule warning | 0 | n/a | n/a | n/a
🔒 Reserved | 2 | -2.00 (2.83) / -2.00 (-3.00, -1.00) | 4.00 (2.83) / 4.00 (3.00, 5.00) | 2.00 (0.00) / 2.00 (2.00, 2.00)
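The slides do not detail how messages were linked to trips. One plausible approach, sketched here under that assumption, has two steps. First, attach each message to the trip operated by the same vehicle whose start and end times bracket the message timestamp. Then summarise each delay metric per message category as mean (SD) and median (Q1, Q3). The sketch uses the sample SD and `method="inclusive"` quartiles (R's default type 7). These conventions are consistent with, e.g., the two-trip Reserved row above. The field and function names are hypothetical.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Trip:
    vehicle_id: str           # hypothetical field names
    start: datetime
    end: datetime
    arrival_delay_min: float  # actual minus planned arrival, in minutes


@dataclass
class Message:
    vehicle_id: str
    sent_at: datetime
    category: str  # e.g. "Traffic", "Shortening"


def link_messages(messages, trips):
    """Pair each message with the same-vehicle trip that was running when it was sent."""
    trips_by_vehicle = defaultdict(list)
    for trip in trips:
        trips_by_vehicle[trip.vehicle_id].append(trip)
    pairs = []
    for msg in messages:
        for trip in trips_by_vehicle.get(msg.vehicle_id, []):
            if trip.start <= msg.sent_at <= trip.end:
                pairs.append((msg.category, trip))
                break  # messages outside every trip window are dropped
    return pairs


def summarise(pairs):
    """Per category: (N, mean, SD, median, Q1, Q3) of the arrival delay."""
    values = defaultdict(list)
    for category, trip in pairs:
        values[category].append(trip.arrival_delay_min)
    table = {}
    for category, xs in values.items():
        if len(xs) >= 2:
            q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
            sd = statistics.stdev(xs)
        else:
            q1 = q3 = xs[0]
            sd = float("nan")  # undefined for a single trip
        table[category] = (len(xs), statistics.fmean(xs), sd, statistics.median(xs), q1, q3)
    return table


day = datetime(2025, 5, 12)
trips = [
    Trip("bus42", day.replace(hour=8), day.replace(hour=8, minute=40), 6.0),
    Trip("bus42", day.replace(hour=9), day.replace(hour=9, minute=35), 2.0),
]
msgs = [
    Message("bus42", day.replace(hour=8, minute=15), "Traffic"),
    Message("bus42", day.replace(hour=9, minute=10), "Traffic"),
]
print(summarise(link_messages(msgs, trips)))
```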

Relating messages and events

Considering messages and events for the 01/01 to 31/05 period (without outliers)

Message | N | Departure delay, min: mean (SD) / median (IQR) | Travel time difference, min: mean (SD) / median (IQR) | Arrival delay, min: mean (SD) / median (IQR)
🙆 Full | 6621 | 0.35 (2.56) / 0.00 (0.00, 1.00) | 3.38 (5.15) / 4.00 (0.00, 7.00) | 3.72 (5.21) / 4.00 (0.00, 8.00)
🎌 Flags | 2937 | 0.38 (4.45) / 0.00 (-3.00, 3.00) | 5.76 (4.84) / 6.00 (3.00, 9.00) | 6.14 (5.47) / 8.00 (3.00, 11.00)
⏩ Proceed | 2606 | -0.06 (3.00) / 0.00 (-1.00, 1.00) | 2.10 (5.36) / 2.00 (-1.00, 6.00) | 2.05 (5.43) / 2.00 (-2.00, 6.00)
✂️ Shortening | 2043 | -0.31 (3.69) / 0.00 (-2.00, 2.00) | 5.85 (4.81) / 6.00 (3.00, 9.00) | 5.54 (5.18) / 6.00 (2.00, 10.00)
🚗 Traffic | 1782 | 0.36 (2.39) / 0.00 (0.00, 1.00) | 4.55 (4.90) / 5.00 (2.00, 8.00) | 4.91 (4.96) / 5.50 (2.00, 9.00)
🚌💨 Overcome | 173 | 0.41 (2.42) / 0.00 (0.00, 1.00) | -0.31 (6.05) / -1.00 (-4.00, 4.00) | 0.10 (5.98) / 0.00 (-4.00, 4.00)
💥 Accident | 167 | 0.50 (2.31) / 0.00 (0.00, 1.00) | 3.54 (5.21) / 4.00 (-0.50, 8.00) | 4.04 (5.36) / 5.00 (0.00, 8.00)
Schedule warning | 61 | -0.43 (5.02) / 0.00 (-4.00, 3.00) | 3.43 (4.79) / 3.00 (0.00, 6.00) | 3.00 (5.33) / 3.00 (0.00, 7.00)
⏱️ Control driving | 52 | -0.71 (1.85) / 0.00 (-1.00, 0.00) | -4.25 (5.85) / -5.00 (-9.00, -2.00) | -4.96 (5.44) / -5.00 (-9.00, -3.00)
🔒 Reserved | 4 | 0.50 (4.12) / 0.00 (-1.00, 1.50) | 4.00 (2.31) / 4.00 (2.00, 6.00) | 4.50 (5.00) / 2.00 (2.00, 4.50)

Relating messages and events

Considering messages and events for the 01/01 to 31/05 period (with outliers)

Message | N | Departure delay, min: mean (SD) / median (IQR) | Travel time difference, min: mean (SD) / median (IQR) | Arrival delay, min: mean (SD) / median (IQR)
🙆 Full | 10066 | 1.96 (45.44) / 0.00 (0.00, 2.00) | 12.82 (44.09) / 7.00 (2.00, 13.00) | 14.78 (62.13) / 7.00 (2.00, 14.00)
🎌 Flags | 6788 | 1.87 (37.75) / 1.00 (0.00, 5.00) | 18.82 (64.55) / 10.00 (6.00, 15.00) | 20.68 (73.58) / 13.00 (8.00, 18.00)
✂️ Shortening | 5885 | -0.30 (34.49) / 0.00 (-2.00, 3.00) | 17.63 (61.65) / 10.00 (5.00, 15.00) | 17.33 (68.51) / 11.00 (4.00, 17.00)
⏩ Proceed | 3850 | 0.10 (55.41) / 0.00 (-1.00, 2.00) | 21.21 (91.42) / 4.00 (0.00, 11.00) | 21.31 (100.99) / 4.00 (0.00, 12.00)
🚗 Traffic | 3147 | 1.67 (50.76) / 0.00 (0.00, 2.00) | 16.40 (44.58) / 9.00 (4.00, 16.00) | 18.06 (67.94) / 10.00 (4.00, 18.00)
💥 Accident | 271 | 13.61 (94.48) / 0.00 (0.00, 2.00) | 12.22 (30.90) / 8.00 (2.00, 14.00) | 25.83 (103.92) / 8.00 (2.00, 16.50)
🚌💨 Overcome | 217 | 0.56 (2.38) / 0.00 (0.00, 1.00) | 4.07 (45.11) / 0.00 (-4.00, 6.00) | 4.64 (45.16) / 1.00 (-3.00, 6.00)
Schedule warning | 106 | 0.21 (5.87) / 0.00 (-3.00, 4.00) | 27.04 (48.70) / 6.50 (2.00, 16.75) | 27.25 (48.63) / 8.50 (2.00, 21.75)
⏱️ Control driving | 72 | -0.69 (2.74) / 0.00 (-1.00, 0.00) | -1.43 (10.54) / -4.00 (-9.00, 2.50) | -2.12 (10.62) / -5.00 (-9.00, 1.75)
🔒 Reserved | 29 | 9.41 (48.72) / 0.00 (-1.00, 4.00) | 26.76 (52.72) / 13.00 (6.00, 18.00) | 36.17 (72.78) / 15.00 (4.00, 20.00)
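The gap between the two 01/01 to 31/05 tables shows how heavily a few extreme trips weigh on means and standard deviations. For Full, for example, the mean arrival delay drops from 14.78 to 3.72 and the SD from 62.13 to 5.21. The outlier rule is not stated in the slides. One common choice, assumed here purely for illustration, is Tukey's fences, which drop values more than 1.5 × IQR beyond the quartiles. The `tukey_filter` name and the data are hypothetical.

```python
import statistics


def tukey_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical arrival delays (minutes), including one extreme record
delays = [0, 2, 4, 5, 7, 9, 240]
kept = tukey_filter(delays)
print(statistics.fmean(delays), statistics.fmean(kept))
```

Medians and quartiles barely move when such records are removed, which is why the median-based readings in the discussion are the more robust ones.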

Discussion (EDA)

  • Messages can improve resource allocation
  • Messages complement events

Messages can improve resource allocation

🙆 Heatmap of “bus full” messages (by drivers)

Useful for network and schedule planning

🚧 Heatmap of “road blocked” messages (by drivers)

Useful for allocating traffic controllers

🚨 Heatmap of ticket fraud messages (by drivers)

Useful for allocating fare inspectors

Messages complement events

Drivers’ inputs can complement the data-driven analysis

🚌🚌 Bus bunching example

Share of bunched services per stop (data-driven)

Heatmap of bus bunching messages (by drivers)

Messages complement events

Relating messages to trips can help identify patterns and relationships

  • Although trips depart on time (median departure delays of 0), arrivals tend to be late when drivers report 🚗 traffic (Mdn = 6.00), 💥 accidents (Mdn = 5.00) or 🙆 crowding (Mdn = 4.00);
Message | N | Departure delay, min: mean (SD) / median (IQR) | Travel time difference, min: mean (SD) / median (IQR) | Arrival delay, min: mean (SD) / median (IQR)
🙆 Full | 1939 | 0.31 (2.72) / 0.00 (0.00, 1.00) | 3.65 (5.08) / 4.00 (0.00, 7.00) | 3.96 (5.14) / 4.00 (0.00, 8.00)
🚗 Traffic | 645 | 0.33 (2.38) / 0.00 (0.00, 1.00) | 4.76 (4.92) / 5.00 (1.00, 8.00) | 5.09 (4.85) / 6.00 (2.00, 9.00)
💥 Accident | 48 | 0.50 (2.57) / 0.00 (0.00, 1.00) | 4.08 (5.29) / 5.00 (-0.25, 8.00) | 4.58 (5.31) / 5.00 (1.00, 9.00)

  • Delayed arrivals are also associated with ✂️ shortenings (Mdn = 7.00), an operational response used to mitigate their impact on subsequent services;
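Message | N | Departure delay, min: mean (SD) / median (IQR) | Travel time difference, min: mean (SD) / median (IQR) | Arrival delay, min: mean (SD) / median (IQR)
✂️ Shortening | 670 | -0.53 (3.63) / 0.00 (-3.00, 2.00) | 6.23 (4.63) / 6.50 (3.00, 9.00) | 5.70 (5.00) / 7.00 (2.00, 10.00)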

  • Instructions to 🚌💨 overtake the bus ahead (Overcome) appear to be an effective way to restore schedule regularity, given the negative median travel time difference (Mdn = -2.00) and the near-zero median arrival delay (Mdn = 0.00);
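Message | N | Departure delay, min: mean (SD) / median (IQR) | Travel time difference, min: mean (SD) / median (IQR) | Arrival delay, min: mean (SD) / median (IQR)
🚌💨 Overcome | 58 | 0.84 (2.50) / 0.00 (0.00, 2.00) | -0.40 (5.68) / -2.00 (-4.00, 2.75) | 0.45 (5.58) / 0.00 (-3.75, 4.75)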

  • Instructions to ⏩ proceed appear to be only weakly related to disruptions, given the relatively small median travel time difference (Mdn = 2.00) and arrival delay (Mdn = 2.00);
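Message | N | Departure delay, min: mean (SD) / median (IQR) | Travel time difference, min: mean (SD) / median (IQR) | Arrival delay, min: mean (SD) / median (IQR)
⏩ Proceed | 850 | -0.06 (2.90) / 0.00 (-1.00, 1.00) | 2.17 (5.40) / 2.00 (-1.00, 6.00) | 2.11 (5.31) / 2.00 (-2.00, 6.00)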

  • Messages asking drivers to ⏱️ control their driving appear to have little effect, as these trips still show shorter durations (Mdn = -4.00) and earlier arrivals (Mdn = -6.00).
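Message | N | Departure delay, min: mean (SD) / median (IQR) | Travel time difference, min: mean (SD) / median (IQR) | Arrival delay, min: mean (SD) / median (IQR)
⏱️ Control driving | 10 | -2.10 (3.03) / 0.00 (-4.25, 0.00) | -3.30 (6.98) / -4.00 (-8.25, -1.25) | -5.40 (5.80) / -6.00 (-9.75, -3.50)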

Next steps

  • Execute EDA on messages and events
  • Write methodology and results of EDA
  • Write discussion relating quantitative and qualitative analysis
  • Improve case study description
  • Improve literature review
  • Write conclusion
  • Write introduction
  • Write abstract
  • Submit to GET 2026?

References

Boothe, Andy. 2023. “sigpwned/popular-names-by-country-dataset.” https://github.com/sigpwned/popular-names-by-country-dataset.
Hu, Wen Xun, and Amer Shalaby. 2017. “Use of Automated Vehicle Location Data for Route- and Segment-Level Analyses of Bus Route Reliability and Speed.” Transportation Research Record 2649 (1): 9–19. https://doi.org/10.3141/2649-02.
Nguyen, Kien, Jingyun Yang, Yijun Lin, Jianfa Lin, Yao-Yi Chiang, and Cyrus Shahabi. 2018. “Los Angeles Metro Bus Data Analysis Using GPS Trajectory and Schedule Data (Demo Paper).” In Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 560–63. SIGSPATIAL ’18. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3274895.3274911.
Rinker, Tyler W. 2017. lemmar: Dictionary Based Lemmatization. Buffalo, New York. http://github.com/trinker/lemmar.
stopwords-iso. 2016. “stopwords-iso/stopwords-pt.” Stopwords ISO. https://github.com/stopwords-iso/stopwords-pt.
Valença, Gabriel, Filipe Moura, and Ana Morais de Sá. 2023. “How Can We Develop Road Space Allocation Solutions for Smart Cities Using Emerging Information Technologies? A Review Using Text Mining.” International Journal of Information Management Data Insights 3 (1): 100150. https://doi.org/10.1016/j.jjimei.2022.100150.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.