Big Data in Transportation and Urban Sectors
(See publications in this area here)
My interest in Big Data analytics in the mobility and urban/city space is primarily in the areas of knowledge discovery, prediction and forecasting, and in the development of information quality, uncertainty and reliability metrics for improved management and decision-support. I am also interested in the questions of understanding and predicting future technology impacts, barriers to and variations in technology adoption, and social and organizational learning and preparedness in adopting ICT-based mobility strategies. Equally challenging is understanding the long-term consequences of living in a ubiquitous information environment. Socially just and inclusive travel strategies of the future may require the use of innovative technologies to address the needs of marginalized communities.
My research in this area has been funded by several sources such as:
- National Science Foundation: Big Data and Urban Informatics: e-Infrastructure for Social Science Research on Sustainable Urban Systems (2012-2014) – Principal Investigator (PI) until Jan 2013 when PI-ship was transferred due to move overseas to the UK;
- Institute for Policy and Civic Engagement: Urban Informatics, Big Data and Community Engagement: A Study of Enabling Organizations (2012-2013) – PI;
- Illinois Department of Transportation: Next Generation Intermodal Passenger Transportation System (2011-2013) – PI;
- US Department of Transportation’s Federal Highway Administration: Locational Privacy: Legal and Policy Options (2008-2010) – PI;
- NAVTEQ LLC (currently Nokia Location and Commerce): Design of Special Events, Weather and Construction Delay Component of a Dynamic Traveler Information System (2007-2008) – PI;
- National Science Foundation: IGERT: Integrative Graduate Program in Computational Transportation Science (2005-2010) – Co-PI;
- US Department of Transportation’s Federal Highway Administration: System-Wide Information for Transportation Assessment and Research (2005-2008) – PI;
- US Department of Transportation’s Bureau of Transportation Statistics: Development, Testing and Evaluation of Intelligent Databases for Motor Carrier Safety (2000-2001) – PI;
- City of Chicago Department of Transportation: Evaluation of RT-TRACS Adaptive Signal Control (2000-2001) – PI;
- National Science Foundation (NSF), through the National Institute of Statistical Sciences (NISS): Transportation Infrastructure and Network Modeling (1996-2000) – Researcher;
- National Science Foundation (through the National Institute of Statistical Sciences) (NSF DMS # 9313013): Measurement, Modeling and Prediction for Infrastructural Systems (1993-1999) – Researcher;
- National Science Foundation: Fellows for Cross-Disciplinary Research in Statistics (1993-1999) – Researcher;
- US Department of Transportation’s Federal Highway Administration (through the Illinois Department of Transportation): Advanced Driver and Vehicle Advisory Navigation Concept (ADVANCE) (1992-1999) – Researcher.
Mining, Discovering and Forecasting Mobility Patterns
The transportation sector, with instrumented cars, highways, transit vehicles, control systems and pedestrians or bicyclists using smartphones and other mobile devices, is a major contributor to Big Data in cities. However, sensors measure only current conditions (demand, use intensity and so on). Telling people when they will arrive somewhere, which way they should drive or walk in order to reach their destination close to their desired time, or how early they should leave so as to be on time is fundamentally a prediction problem. This is because congestion conditions continue to change over time, and unexpected factors such as road incidents or bad weather may make travel conditions atypical.
One line of my work in this area deals with forecasting future travel times on highway segments for the purpose of computing shortest paths for drivers desiring to go from point A to point B through congested road networks. Other applications include identification of short-term (future) traffic hot-spots and bottlenecks in roads, spatio-temporal knowledge discovery of where “freeway breakdowns” occur (i.e., when and where traffic demand exceeds network capacity so that traffic slows down and queues start to form), the duration of delay conditions and so on, for more efficient travel and dynamic resource management of transportation networks. This research stream has enabled us to look at forecasting and prediction models; data mining and knowledge discovery of interesting transportation patterns; approaches for information fusion, updating and change detection; quantification of information quality and uncertainty; and designs and simulation models for impact evaluation.
Research questions we have examined are:
Travel Time/Speed Prediction and Knowledge Discovery for Traffic Management: What methods can be used to predict sensor-based travel times/speeds over a short time horizon (say, 5 minutes) into the future? We have looked at problems of freeway breakdowns, variability in speeds given different weather and incident conditions, and other aspects of discovering patterns in transportation systems. My work has mostly dealt with data from road-based sensors and GPS data aggregated over time windows of observation (say, 5 minutes). Our methods include time-series forecasting and statistical learning and, more recently, machine learning including Random Forests and Support Vector Regression.
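As a minimal illustration of the short-horizon prediction problem, the sketch below fits a first-order autoregressive model to a series of 5-minute speed aggregates and forecasts the next interval. It is a textbook baseline and not one of the models used in the work described here; the function names and the sample speeds are hypothetical.

```python
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of x[t+1] = a + b * x[t]; returns (a, b)."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        # Constant history: fall back to forecasting the observed level.
        return my, 0.0
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b


def forecast_next(series):
    """One-step-ahead forecast from the last observed value."""
    a, b = fit_ar1(series)
    return a + b * series[-1]


# Hypothetical 5-minute average speeds (mph) on one road segment.
speeds_mph = [62.0, 60.0, 57.0, 52.0, 48.0, 45.0]
print(round(forecast_next(speeds_mph), 1))
```

A baseline of this kind is mainly useful as a point of comparison for the statistical learning and machine learning methods mentioned above; the series needs at least three observations for the fit to be defined.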
Measurement Errors: What types of measurement errors are likely to arise from sensing, how detrimental can they be, and are there model-based approaches to address them? Focusing on missing values and models of missing data, we were able to show that data imputation, while very important for avoiding case deletion, loss of efficiency and so on, can be challenging in real-world transportation databases. We also found that unless multivariate missingness rates are very high, simple approaches may not be too far off from what more complicated simulation-based approaches can give. I have also looked at the overall problem of data quality in User-Generated Content, particularly in opportunistic sensing systems, although this work is very preliminary.
Sample Sizes and Other Aspects of Information Uncertainty: What sample sizes of instrumented mobile device measurements are essential for meaningful forecasts, in terms of information reliability? Unless there are infrastructure-based sensors in place at a particular point, the number of “live” measurements of traffic patterns would be affected by overall technology adoption/deployment rates, the spatial distribution of mobile sensing devices over the transport network at a point in time, people not “opting into” sensing systems due to locational privacy concerns, unwillingness to participate in sensing for other reasons such as lack of incentives, and so on. However, it turns out that there is an empirical “limit” to how much information you can gain from increasing sample size. Therefore it may be possible to have high-quality results from fairly low adoption rates.
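To make the contrast between case deletion and a "simple approach" concrete, here is a small sketch comparing listwise deletion with column-mean imputation. Mean imputation is used only as an example of a simple strategy; it is not necessarily one of the strategies evaluated in the work described here. Missing values are represented by `None`, each column is assumed to have at least one observed value, and the function names and data are hypothetical.

```python
from statistics import mean


def listwise_delete(rows):
    """Keep only rows with no missing values (case deletion)."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(rows):
    """Replace each missing value with the mean of its column's observed values."""
    n_cols = len(rows[0])
    col_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)
    ]
    return [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Hypothetical records: (speed in mph, volume in vehicles per 5 minutes).
records = [[55.0, 120.0], [None, 140.0], [45.0, None], [50.0, 160.0]]
print(len(listwise_delete(records)), "complete cases of", len(records))
print(mean_impute(records))
```

Case deletion discards half of this toy dataset, which is the loss of efficiency that imputation is meant to avoid.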
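One simple way such a limit can arise is when reports from different devices on the same segment are correlated. The sketch below uses the standard formula for the standard error of a mean of n equally correlated observations; the equal-correlation model and the numbers are illustrative assumptions, not the derivation used in the work described here.

```python
import math


def se_of_mean(sigma, n, rho=0.0):
    """Standard error of the mean of n reports with common standard
    deviation sigma and common pairwise correlation rho (assumed model)."""
    return sigma * math.sqrt((1 + (n - 1) * rho) / n)


# Hypothetical values: sigma = 10 mph, pairwise correlation 0.25.
for n in (1, 5, 20, 100, 1000):
    print(n, round(se_of_mean(10.0, n), 2), round(se_of_mean(10.0, n, rho=0.25), 2))
```

With independent reports the error keeps shrinking as 1/sqrt(n), but with positive correlation it levels off near sigma * sqrt(rho), so additional devices add little information beyond a modest sample size.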
Profiles of Transport Networks: Following the above line of questions, we explored the idea that “real-time” traffic sensing levels necessary for accurate, network-wide travel time and speed information may not be available at desired levels of accuracy everywhere and at all times. Traffic patterns exhibit a surprisingly high degree of repeatability: seasonality, special events, and diurnal patterns (morning and evening rush hour and so on) and even short-lived periodicities such as those due to traffic signals. With the massive amounts of sensor readings (combination of infrastructure-based sensor data and data from mobile devices) collected at the same location and over multiple years, we can build databases with average traffic behavior and “unusual” traffic behavior relating to different events and conditions, for the same time-of-day, day-of-week, season, special event condition and so on. These can be used in place of live measurements when there is no one at a location with a sensor, fixed sensors are unavailable or malfunctioning and so on. But how well do these estimates reflect that which is experienced by a random driver going through the road segment? Pretty well, actually. Ultimately, we called these archived-data driven estimates “profiles” of network segments.
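The idea of a profile can be sketched as a lookup table of historical averages keyed by day of week and time of day, used whenever no live measurement is available. This is a simplified illustration under assumed names and data; the profiles described above also condition on season, special events and other conditions.

```python
from collections import defaultdict
from statistics import mean


def build_profile(records):
    """records: iterable of (day_of_week, time_bin, speed) for one segment.
    Returns {(day_of_week, time_bin): mean archived speed}."""
    buckets = defaultdict(list)
    for day, time_bin, speed in records:
        buckets[(day, time_bin)].append(speed)
    return {key: mean(values) for key, values in buckets.items()}


def estimate_speed(profile, day, time_bin, live_speed=None):
    """Prefer a live reading; otherwise fall back to the profile.
    Returns None if neither is available."""
    if live_speed is not None:
        return live_speed
    return profile.get((day, time_bin))


# Hypothetical archive: time_bin 96 is the 5-minute bin starting at 08:00.
archive = [("Mon", 96, 30.0), ("Mon", 96, 40.0), ("Tue", 96, 50.0)]
profile = build_profile(archive)
print(estimate_speed(profile, "Mon", 96))                   # no live data
print(estimate_speed(profile, "Mon", 96, live_speed=22.0))  # live data wins
```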
Change Detection and Information Updating: One line of work has been on multi-sensor fusion and change detection. We attempted to detect changes in the above “profiles” in an autonomous way, i.e., to determine when current patterns of traffic are sufficiently different from the aforementioned (very-large sample) profiles so as to warrant changing the profile. We used a likelihood ratio test to continually check whether incoming data feeds change the underlying distribution of the profiles. Moreover, we adopted simple Bayesian rules to update information distributions, where necessary.
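A minimal sketch of both steps, assuming Gaussian speeds with a known standard deviation (an assumption made here for illustration; the distributions actually used are not stated above). For a test of the profile mean, minus twice the log likelihood ratio reduces to n * (sample mean - profile mean)^2 / sigma^2, which is compared with a chi-square critical value with one degree of freedom. The update step is the standard conjugate normal-normal rule. All names and numbers are hypothetical.

```python
from statistics import mean

CHI2_1DF_95 = 3.841  # 95th percentile of chi-square with 1 degree of freedom


def lr_statistic(new_obs, profile_mean, sigma):
    """-2 log likelihood ratio for H0: mean == profile_mean (known sigma)."""
    n = len(new_obs)
    return n * (mean(new_obs) - profile_mean) ** 2 / sigma ** 2


def profile_changed(new_obs, profile_mean, sigma, critical=CHI2_1DF_95):
    """True if the new readings are inconsistent with the stored profile."""
    return lr_statistic(new_obs, profile_mean, sigma) > critical


def bayes_update(prior_mean, prior_var, new_obs, sigma):
    """Normal prior on the mean, normal likelihood with known sigma.
    Returns (posterior mean, posterior variance)."""
    n = len(new_obs)
    post_precision = 1.0 / prior_var + n / sigma ** 2
    post_mean = (prior_mean / prior_var + sum(new_obs) / sigma ** 2) / post_precision
    return post_mean, 1.0 / post_precision


readings = [50.0, 50.0, 50.0, 50.0]  # hypothetical new speeds (mph)
if profile_changed(readings, profile_mean=55.0, sigma=5.0):
    print(bayes_update(55.0, 25.0, readings, sigma=5.0))
```

Running the test on every incoming batch without adjusting the threshold inflates the false-alarm rate, so a production version would need a sequential or multiplicity-aware threshold.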
Selecting Cases for Model Development and Design of Experiments: We have all heard that the volume, velocity and variety of Big Data will only increase. But for many model development purposes, we do not necessarily need all the data; a well-selected subset will be sufficient in some cases. There is also the matter of selecting a training set of data and a test set. However, when data are correlated over time and space, as in the case of transportation network data, randomly selecting data points may not be useful for developing and testing models that address spatio-temporal dynamics. We have done some early work as a part of our research on predicting speeds under inclement weather, where we assess the impact of inaccuracy in weather forecasts using a sampling approach that preserves correlations among observations at multiple time points. This line of research clearly needs more work and better-developed design for improved experimentation.
Effect of Information Certainty in Multiple Sectors in a Smart and Connected City Context: There is a lot of excitement about connecting different sectors of cities (transportation, energy, utility, water management) under the banner of “Smart and Connected Cities”. This is clearly important. But the fact is that all these sectors have dynamic behaviors with their own internal forecast errors and so on, and by connecting them, we may have a situation where there is propagation of errors, magnified uncertainty and so on. In our simulations of weather-responsive traffic management, we found that in bad weather conditions (with high levels of precipitation), in order to make accurate speed forecasts, it is better not to have weather data (precipitation forecasts) at all than to have forecasts with inaccuracy levels above a certain threshold. I am interested in exploring information uncertainties as they propagate across sectors when we connect them together, and their methodological and management implications.
Evaluation Simulations and Tools: There are many specialized models and tools to evaluate the impacts of ICT-based technology in transportation, such as those which allow evaluation of the effects on traffic patterns, traveler behavior and so on. In early work, I used a Stochastic User Equilibrium model to evaluate the effect of information uncertainty on individuals’ route choices and to test the effectiveness of car navigation systems. Later work focused on a design and analysis of computer experiments framework with the use of a traffic microsimulation model, CORSIM, to evaluate the effects of multiple real-time traffic control strategies.
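One common way to keep temporal correlation intact when splitting data is to assign whole contiguous blocks of time, rather than individual observations, to the training and test sets. The sketch below shows that idea for a single time-ordered series; it is an illustrative stand-in and not necessarily the sampling approach used in the weather study mentioned above. The function name, block length and seed are hypothetical, and at least one observation is assumed.

```python
import random


def block_train_test_split(n_obs, block_len, test_fraction, seed=0):
    """Split indices 0..n_obs-1 into train/test sets by whole contiguous blocks."""
    blocks = [
        list(range(start, min(start + block_len, n_obs)))
        for start in range(0, n_obs, block_len)
    ]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_test = max(1, round(test_fraction * len(blocks)))
    test_ids = set(rng.sample(range(len(blocks)), n_test))
    train = [i for b, blk in enumerate(blocks) if b not in test_ids for i in blk]
    test = [i for b, blk in enumerate(blocks) if b in test_ids for i in blk]
    return train, test


# Hypothetical: 24 five-minute observations, blocks of 6 (30 minutes each).
train_idx, test_idx = block_train_test_split(24, block_len=6, test_fraction=0.25)
print(test_idx)
```

Because each test block is a run of consecutive intervals, a model can be evaluated on how well it tracks the evolution of conditions within the block, which a purely random split would obscure.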
Social, Organizational and Behavioral Questions in Mobility Intelligence
Habits, Attitudes and Behavioral Aspects of Sustainable Mobility Use: In order for Big Data to facilitate sustainable travel behavior, it is not enough to have information on travel options and mobility choices; there must also be ways to address people’s habits relating to the use of un-sustainable modes of transportation such as solo driving, negative attitudes that people have to using public or shared modes of transportation and so on. We have been looking at this issue primarily by survey-based approaches to understanding habits, attitudes and behavioral intentions within the framework of the Theory of Planned Behavior and the positive psychological effect that travel information can have on sustainable travel outcomes such as the choice of public transportation as the mode of travel. This line of work has implications for recommender systems and persuasive technologies.
Technology Adoption Trends: I have examined this question in various ways. In a recent study, using a massive individual-level longitudinal (annual) dataset spanning 3 cohorts of Americans starting from the mid-1960s (consisting of over 32,000 individuals), I modeled, using econometric techniques, how car ownership among young Americans has changed over a 40-year period, from the mid-1960s to the mid-2000s, after controlling for sociodemographics, personal life, locational, and labor market conditions. More recently, I have examined how travelers’ use of public transportation changes differentially over time as a result of real-time transit arrival information. We make the point that this is because the use of ICT has to be learned and experiences shared, and factors such as peer influences, acceptance, use levels and feedback build up over time. Additionally, the technology itself may improve over time in terms of usability, availability and information quality. This may partly explain why ridership effects in response to real-time travel information may grow over time and not happen all at once or at the same rate over time.
Spatially-Targeted Market Segmentation for Location-Based Services: Adoption rates are also likely to vary over different parts of a city. Our recent work involved creating an index of “digital savviness” to predict where use of public transport ICT is likely to be higher. Using this and other indices, we used clustering methods to divide the City of Chicago into several areas with markets for potentially different types of Location-Based Services (LBS); for example, sub-markets with digitally-savvy residents versus residents who are more likely to be experiencing the digital divide.
Consumer Awareness and Policy Studies of Locational Privacy: The use of social media and location-based information has become so ubiquitous and necessary that consumers often do not take the time to fully understand what is happening to the information they generate; and even if they try to understand, issues relating to data privacy and information security are so complex and sometimes hidden that they cannot or do not follow through. While Privacy Enhancing Technologies (PETs) and privacy-by-design are active areas of research, we have been focusing on the policy and consumer awareness side. We have evaluated privacy policies given by information service providers using content analysis and found that the overall average reading grade of the privacy policies studied is roughly that of a sophomore to a junior in college, while the average reading level of U.S. adults is between the 8th and 9th grades! Further, there is a lack of consistent guidance related to all aspects of data privacy in the mobile environment, including notification/awareness, choice/consent, access/participation, integrity/security, and enforcement/redress. Our most recent work focuses on the locational privacy issue from an economic perspective; using econometric methods on primary (stated preference) survey data, we attempt to estimate privacy-utility-risk trade-offs of mobile service users with respect to locational privacy.
Digital Citizenship and Mobile Digital Divide: Although I have not done much in this area as yet, I speculated on these topics in my forthcoming book with Glenn Geers, Transportation and Information: Trends in Technology and Policy, and would like to follow up with empirical studies. Digital citizenship has been defined as expected behaviors with regard to technology use and, in my opinion, includes (i) etiquette relating to device use in public places, distracted driving, and behaviors in the online world and participatory sensing systems; and (ii) digital literacy relating to information access and use, comprehension of implications of use, skills for protecting safety and security, valuation of benefits versus risks, and comprehension of cyberspace norms. The mobile digital divide is an interesting concept: whereas the digital divide is generally understood to be about inequality of access to information technology and the Internet, there are already great differences in the transportation services available across locations, socio-demographic groups and lifecycle stages. Therefore, for the digital mobile environment to have geographical and social inequality consequences, lack of access to and availability of digital technologies has to magnify these already-present transportation inequality effects. Hence, identifying the net inequalities attributable to the mobile digital divide promises to be challenging.
Role of Digital Civic Entrepreneurs or “Digital Infomediaries” in Pushing Out Big Data/Open Data to Citizens: This study examines the role of “digital infomediaries” or enabling organizations in our increasingly ubiquitous information society. Enabling organizations can be of two types: technology/ICT companies, or community data organizations where the leadership is composed of ICT-oriented social entrepreneurs (whom we call “digital civic entrepreneurs”). ICT-oriented technology companies have played, through software, web services, Web 2.0, social media and smart city solutions, a highly visible role in community engagement, ways to use and manipulate information, formation of social networks and ways to connect with others in the community. Another group of enabling organizations, community data organizations, have openness of information, engagement and participation as explicit goals. Such organizations typically attempt to directly involve citizens in experiencing, engaging and collaborating in the city. They may very well consist of one or two persons with technology skills who are passionate about specific aspects of cities. They are likely to be capitalizing on recent open data policies, Application Programming Interfaces (APIs) being used by governmental and other resources to tap into data, open source technology and social media/Web 2.0 technologies. More details on this ongoing project may be found here.
End Note: People, travelers and drivers have a lot of very good local knowledge about travel and traffic conditions, which can be difficult for analysts to beat. So while it is true that mobility intelligence derived from Big Data analytics can give us a new understanding of mobility behaviors, interpreting the results in terms of traffic and transportation dynamics, traveler behavior and why things happen the way they do would be critical for knowledge discovery and application development. Finding robust solutions to the social, behavioral, legal and organizational questions involved will be critical for Big Data analytics to have meaningful impact towards sustainable transportation outcomes. Cross-fertilization of knowledge between the computing disciplines and the “mobility disciplines” (transportation planning, engineering, economics, geography, management etc.) would be key for these reasons (see the Transportation Research Board’s Joint Subcommittee on Computational Transportation and Society).
Selected Publications of Interest
- Thakuriah, P. and G. Geers (forthcoming). Transportation and Information: Trends in Technology and Policy. Springer, New York.
- Thakuriah, P. and N. Tilahun (forthcoming). Incorporating Weather Information into Real-Time Speed Estimates: Comparison of Alternative Models. In Journal of Transportation Engineering. doi: http://dx.doi.org/10.1061/(ASCE)TE.1943-5436.0000506
- Tang, L. and P. Thakuriah (2012). Ridership Effects of CTA Bus Tracker System. In Transportation Research – Part C: Emerging Technologies, Vol. 22, pp. 146-161.
- Thakuriah, P., L. Tang and W. Vassilakis (2012). An Assessment of Temporal and Spatial Effects of Bus Arrival Time Information and Implications for Spatially Targeted Location-Based Services. Proc. 2012 Transportation Research Board Annual Conference.
- Thakuriah, P., G. Geers and S. Liang (eds.) (2011). Proceedings of 4th ACM SIGSPATIAL International Workshop on Computational Transportation Science. Held in conjunction with ACM SIGSPATIAL GIS 2011. ACM Press, New York.
- Tang, L. and P. Thakuriah (2011). Will the Psychological Effects of Real-time Transit Information Systems Lead to Ridership Gain? In Transportation Research Record, Journal of the Transportation Research Board, No. 2216, pp. 67-74.
- Thakuriah, P., L. Tang and W. Vassilakis (2011). Spatio-Temporal Effects of Real-Time Bus Arrival Time Information. In Proc. of Association of Computing Machinery (ACM) SIGSPATIAL GIS 2011 International Workshop on Computational Transportation Science, pp. 6-11.
- Cottrill, C. and P. Thakuriah (2011). Protecting Location Privacy. Policy Evaluation. In Transportation Research Record, Journal of the Transportation Research Board, No. 2215, pp. 67-74.
- Thakuriah, P. and N. Tilahun (2010). Using Real-Time Weather Information in Traveler Information Systems and Location-Based Services: A Statistical Learning Application Under Alternative Experimental Conditions. In Proc. ACM SIGSPATIAL GIS 2010.
- Thakuriah, P. (2010). Evaluation of Alternative Data Imputation Strategies: A Case Study of Motor Carrier Safety Data. In Transportation Letters: The International Journal of Transportation Research, Vol. 2, Issue 3, pp. 199-216.
- Lee, J. and P. Thakuriah (2004). Probabilistic Linkage of Commercial Motor Vehicle and Carrier Data. In the Journal of Transportation Research Forum, Vol. 43, No. 2, Fall, pp. 37-52.
- Sacks, J., N. Rouphail, B. Park and P. Thakuriah. (2002). Statistically-Based Validation of Computer Models in Traffic Operations and Management. In Journal of Transportation and Statistics, Vol. 5, Issue 1, pp. 1-15.
- Thakuriah, P., A. Sen and A. Karr. (1999). Probe-Based Surveillance for Travel Time Information in ITS. In Behavioral and Network Impacts of Driver Information Systems. Edited by Richard Emmerink and Peter Nijkamp. Ashgate Publishing Ltd, England, pp. 393-425.
- Sen, A., P. Thakuriah, X. Zhu and A. Karr. (1999). Variances of Link Travel Time Estimates: Implications for Optimal Routes. In International Transactions in Operational Research, the Journal of the International Federation of Operations Research Societies, Vol. 6, pp. 75-87.
- Sen, A., S. Soot, P. Thakuriah, H. Condie. (1998). Estimation of Static Travel Times in a Dynamic Route Guidance System – II. In Mathematical and Computer Modelling, Vol. 27, No. 9-11, pp. 67-85.
- Sen, A., P. Thakuriah, X. Zhu and A. Karr. (1997). Frequency of Probe Reports and Variance of Travel Time Estimates. In Journal of Transportation Engineering, Vol. 123, No. 4, July/August, pp. 290-297.
- Thakuriah, P. and A. Sen (1996). Quality of Information given by Advanced Traveler Information Systems. In Transportation Research Part C: Emerging Technologies, Vol. 4, No. 5, pp. 249-266.
- Sen, A. and P. Thakuriah (1995). Estimation of Static Travel Times in a Dynamic Route Guidance System. In Mathematical and Computer Modelling, Vol. 22, No. 4-7, pp. 83-101.
(See other publications in this area here)