Abstract: With the advent of pervasive digital technologies in our daily lives, we leave a growing number of digital traces, including cellphone data, health records, public transportation trajectories, credit card transactions, and connected cars' communications. Each dataset, although anonymized, tells a story about a different aspect of human life and can be leveraged by businesses along specific dimensions, such as optimizing mobility systems, predicting economic growth, and forecasting customer purchases. However, these efforts are confined to the limits of each individual dataset. Because the identifying information is anonymized, there is no common field or variable through which information can flow between datasets so that they can be fused and merged. Yet fusing data at the individual level from different sources can provide valuable new insights into human behavior. This helps to offer more effective and useful products and services that benefit both businesses and customers. Moreover, data fusion enables detailed impact assessments and cross-sell analyses. In this talk, I introduce a novel paradigm for combining multiple anonymized datasets through pattern recognition and statistical learning techniques. This data fusion technique is based on a fundamental concept: although people’s identities are fully anonymized, the environment they interact with is not. This makes it possible to generate new meta-information from anonymized individual trajectories and to let information from multiple sources complement and enrich each other without compromising people’s privacy. These linked datasets establish a collective knowledge platform that helps to build solutions and make informed decisions. Finally, I touch upon the serious privacy concerns that can arise in several different contexts. What if one of the datasets contains identifiable information?
This may allow other anonymized datasets to be re-identified and cause a privacy breach. This form of de-anonymization not only challenges current anonymization techniques and policies, which rely on single-dataset information, but also warns of the unpredictable consequences of publishing de-identified data. This issue calls for the development of new security and privacy policies as well as new ways of interacting with data that guarantee privacy.
Bio: Behrooz Hashemian, Ph.D., is a researcher and chief data officer at the Senseable City Lab at the Massachusetts Institute of Technology (MIT). He investigates innovative applications of big data analytics and artificial intelligence in smart cities, finance, and healthcare. He is a data scientist with expertise in developing predictive analytics strategies, machine learning solutions, and data-driven platforms for informed decision making. His work aims to bridge the gap between academic research and industrial deployment of big data analytics and artificial intelligence. Dr. Hashemian leads an unprecedented project on anonymized data fusion, which provides multidimensional insight into urban activities and customer behaviors from multiple sources.