Smart city big data analytics: An advanced review

With the increasing role of ICT in enabling and supporting smart cities, the demand for big data analytics solutions is increasing. Various artificial intelligence, data mining, machine learning and statistical analysis‐based solutions have been successfully applied in thematic domains like climate science, energy management, transport, air quality management and weather pattern analysis. In this paper, we present a systematic review of the literature on smart city big data analytics. We have searched a number of different repositories using specific keywords and followed a structured data mining methodology for selecting material for the review. We have also performed a technological and thematic analysis of the shortlisted literature, identified various data mining/machine learning techniques and presented the results. Based on this analysis we also present a classification model that studies four aspects of research in this domain. These include data models, computing models, security and privacy aspects and major market drivers in the smart cities domain. Moreover, we present a gap analysis and identify future directions for research. For the thematic analysis we identified the themes smart city governance, economy, environment, transport and energy. We present the major challenges in these themes, the major research work done in the field of data analytics to address these challenges and future research directions.


| INTRODUCTION
Various statistical, machine learning and data mining techniques have proved to be revolutionary with respect to Smart City Data Analytics. Deep learning models are one of the most cutting-edge technologies that are introduced for various applications for smart city services (Mohammadi, Al-Fuqaha, & Oh, 2018;Mohammadi, Al-Fuqaha, Sorour, & Guizani, 2018;Zaouali, Rekik, & Bouallegue, 2018). Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short Term Memory (LSTM), Autoencoders (AEs), Generative Adversarial Networks (GANs), and Deep Belief Network (DBNs) are widely known Deep Learning models that are currently being investigated for various Smart City Applications including Energy consumption prediction, behavior detection, security threat identification, traffic sign detection, face recognition and human activity recognition (Mohammadi, Al-Fuqaha, Sorour, & Guizani, 2018). Similarly learning from web data is crucial for many applications in smart cities and several deep learning approaches have been proposed recently based on deep probabilistic framework (Niu, Tang, Veeraraghavan, & Sabharwal, 2018) to learn from web data.
According to the Deloitte Consulting Group (van Dijk & Teuben, 2015): "A city is smart when investments in (i) human and social capital, (ii) traditional infrastructure and (iii) disruptive technologies fuel sustainable economic growth and a high quality of life, with a wise management of natural resources, through participatory governance." Therefore, the key elements of a smart city, according to Deloitte, are:
• its human and social capital,
• its traditional infrastructure, and
• disruptive technologies such as Artificial Intelligence, machine learning, data mining, augmented reality, distributed ledger technologies, IoTs, etc.
Investment in the nexus of these three elements leads to certain benefits:
• sustainable economic growth,
• high quality of life,
• judicious use and management of natural resources resulting in a reduced ecological footprint, and
• prediction of future scenarios for sustainable cities.
The nature of these key elements and benefits and their complex inter-relationships is such that smart solutions are multisectoral by necessity. These span domains such as smart mobility, smart safety, smart energy, smart water and waste, smart health, smart education and smart government. One of the major transformations that a smart city brings is in the sharing of data and information between these various domains. Traditionally these areas have acted as information silos, with policy-making in each being restricted to the confines of the respective area. In smart cities these barriers are expected to be broken, so that information flows freely between domains and policy-making becomes an integrated and holistic process. The resultant ICT solutions are thus multi-faceted and complex by nature and rely heavily on AI, data mining, statistical and machine learning methods to generate new information and knowledge from multi-disciplinary data sets. A more detailed list of the goals, challenges, and smart domains is shown in Table 1. This multi-faceted and challenging nature of smart cities provides potent opportunities for the data mining and knowledge management communities to test existing models and algorithms as well as design new ones. Table 2 shows the various disruptive technologies used in the context of smart cities. These disruptive technologies are transforming the very nature of business-as-usual in cities by informing evidence-based decision-making (MGI, 2013). For example, by providing new methods and paradigms for communication, the Internet of Everything is transforming domains from transport and energy management to waste management (enerquire, 2017; Treadwell, 2017).
Blockchain, on the other hand, is disrupting how global economies function and do business securely and transparently. However, since in this paper we are focusing on big data analytics in smart cities, we will cover only that in subsequent sections. In this paper we present an overview of how big data analytics can help enable smart cities. Our specific focus is on smart city challenges that can be addressed by big data analytics and in this regard we also review past and future trends. This will provide a holistic multi-disciplinary view of challenges faced by the smart cities communities where the data mining and knowledge management communities can contribute. The rest of this paper is organized as follows. We begin by introducing our research methodology, including inclusion and exclusion criteria, search engines, data mining techniques, etc. We then present an introduction of big data technologies followed by how big data technologies are used in smart cities. This includes a description of the tools and techniques used as well as a description of various thematic areas in smart cities. We then conclude our review.

TABLE 1 Smart city goals, challenges and domains (van Dijk & Teuben, 2015). Columns: Goals, Challenges, Domains.

| RESEARCH METHODOLOGY
It is important for a structured literature review to follow a consistent, reproducible methodology. There is a wealth of literature available on subjects such as data analytics, smart cities, big data and big data analytics. As this review focuses on smart city data analytics, we also restricted our search to this subject. We also adopted a data mining-based approach for sifting through and analyzing the literature (Haneem, Kama, Ali, & Selamat, 2017). We queried a number of repositories such as ACM, arXiv, DBLP, Google Scholar, IEEExplore, Inspire, JSTOR, ScienceDirect and Scopus. Search terms included smart city big data analytics and smart city analytics. We searched these repositories using the terms provided, and initially selected 100 most recent papers based on their titles and abstracts. We further filtered these papers by going through their abstracts, introductions and conclusions and short listed about 66 papers. We then created a wordcloud based on the abstracts of these 66 selected papers, shown in Figure 1. Since words like smart, analytics, big, data and cities were part of our search terms, it is no surprise that they occur quite frequently in the abstracts of the filtered papers. Therefore, we can safely ignore them for the purposes of our analysis. Of the remaining frequently-occurring words, economic, governance, energy, etc. are significant. We arranged such significant words into a taxonomy by folding them into common themes, shown in Figure 2. The leaf nodes depict terms identified by the wordcloud while the parent nodes depict the common themes we have identified in our analysis. For example, the words congestion, vehicle, and traffic all fall under the common theme of Transport. We also chose to study different aspects of those themes by using the other frequent keywords like data and computing.
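The frequency analysis behind the wordcloud and the folding of terms into themes can be illustrated with a minimal sketch. It assumes a toy set of abstracts, a small illustrative stop-word list, and a hypothetical term-to-theme mapping (`THEMES`); none of these are the actual inputs or tooling used for Figures 1 and 2.

```python
import re
from collections import Counter

# Search terms are excluded because they dominate the abstracts by construction.
SEARCH_TERMS = {"smart", "city", "cities", "big", "data", "analytics"}
# Tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "are", "on", "with", "we"}

# Hypothetical mapping from leaf terms (wordcloud words) to parent themes.
THEMES = {
    "congestion": "Transport",
    "vehicle": "Transport",
    "traffic": "Transport",
    "energy": "Energy",
    "governance": "Governance",
    "economic": "Economy",
}

def term_frequencies(abstracts):
    """Count words across abstracts, skipping search terms and stop words."""
    counts = Counter()
    for text in abstracts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in SEARCH_TERMS and word not in STOP_WORDS:
                counts[word] += 1
    return counts

def fold_into_themes(counts):
    """Aggregate leaf-term counts under their parent theme."""
    themes = Counter()
    for word, n in counts.items():
        theme = THEMES.get(word)
        if theme is not None:
            themes[theme] += n
    return themes

# Toy abstracts standing in for the shortlisted papers.
abstracts = [
    "Big data analytics for traffic congestion in smart cities",
    "Vehicle traffic and energy data in the smart city",
]
counts = term_frequencies(abstracts)
themes = fold_into_themes(counts)
```

In this sketch the words congestion, vehicle and traffic all contribute to the Transport theme, mirroring the leaf-to-parent relationship in the taxonomy.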
Our analysis confirms what was suggested by Vinod Kumar (2015) and Chatterjee and Kar (2015) as the six most important themes of smart cities: smart economy, smart governance, smart mobility, smart environment, smart living and smart people. This work differs from Chauhan, Agarwal, and Kar (2016) in that we mainly focus on the current challenges in the above-mentioned five themes and on the contributions of big data analytics to solving them. Existing work focuses more on the big data characteristics of various themes. The specific aspects were also chosen on the basis of their relevance to big data analytics; for example, both the data and computing models directly influence how the analytics are applied and how effective they are. Privacy is an important concern in any contemporary software system, and the recent focus on privacy legislation has made it even more important. Similarly, market drivers can act as an indication of the widespread adoption of the various technologies and allow us to ascertain their popularity. In the second phase we searched the above repositories again with keywords specific to the identified themes, for example, city governance data analytics and city governance and data analysis for the governance theme, and Smart City Transportation and Smart City Transportation and Big Data Analytics for the transport theme, and again shortlisted a number of papers based on their titles and abstracts. Table 3 shows the search terms and repositories queried for each theme. After this we critically reviewed the papers for our analysis. We then studied the literature from a number of aspects, for example: What are the key trends currently in vogue in smart cities data analytics? Who are the major drivers (commercial, public, government, etc.) behind these trends? What is the major research done in this area within the past 5 to 10 years? What are the future challenges and directions?
To identify the latest trends we also reviewed different reports by various organizations on smart cities. The results are chronicled in the following sections.

| BIG DATA ANALYTICS
"Big data analytics is the process of examining large and varied data sets-, that is, big data-to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more-informed business decisions" (Rouse, 2017). In recent years big data has emerged as an important disruptive technology that has revolutionized how businesses operate. Recently Visa used big data analytics to reduce the time for processing 73 billion transactions from 1 month to 13 minutes (John Walker, 2014). Similarly big data analytics has brought about significant impact in many other domains such as climate change analysis, weather pattern analysis, real estate markets, etc. (Barkham, Bokhari, & Saiz, 2018;Faghmous & Vipin, 2014;Ismail, Majid, Zain, & Bakar, 2016).
Big data is characterized by the four Vs: Volume, Variety, Velocity and Veracity (Mohanty, 2015). Given these characteristics, processing and analyzing big data in a reasonable amount of time is generally beyond the capabilities of a single computer system. This is why big data analytics necessitates the use of distributed processing techniques. In recent years several paradigms have emerged that harness the power of distributed computing. These include cluster computing (Ganelin, Orhian, Sasaki, & York, 2016) and cloud computing (Khan, Anjum, Soomro, & Tahir, 2015) as well as fog and edge computing (Simmhan, 2018). These paradigms are proving to be instrumental in the success of big data analytics. We will cover different examples of these in subsequent sections. From a more technical point of view, big data analytics consists of a number of techniques that are used to analyze the data. These include, but are not limited to, association rule learning, classification tree analysis, genetic algorithms, machine learning, regression analysis, sentiment analysis, social network analysis and deep learning (Stephenson, 2013). As with many other domains, big data analytics has also been applied in smart cities.

| BIG DATA ANALYTICS IN SMART CITIES
In the past few years big data analytics has emerged as an important disruptive technology in enabling the "smartness" of smart cities (Bassoo, Ramnarain-Seetohul, Hurbungs, Fowdur, & Beeharry, 2018). Other disruptive technologies such as Internet of Things (IoT) generate big data that needs to be processed and analyzed in short amounts of time to enable smart city governments to react to emerging patterns and trends in an efficient manner. The importance of big data analytics using AI, data mining, machine learning or statistical methods in smart cities cannot be stressed enough. Its benefits cut across any number of domains such as transportation planning, urban planning, smart buildings, weather prediction and analysis, etc. We conducted an extensive literature review of the various ways in which big data analytics are being applied to smart cities. Thematic and technological meta-analyses of this literature revealed a number of cross-cutting concerns that are common across the various themes we identified. We subsequently developed a classification model from these concerns, shown in Figure 2.
In the field of smart cities, big data analytics is the key technology for enabling evidence-based decision making, developing and providing new public services, and allowing citizens to co-create new public services (Dustdar, Nastic, & Scekic, 2016, etc.). Big data can also be used to monitor government initiatives (Malik, Sam, Hussain, & Abuarqoub, 2018) as well as for the provision of various user-related public services (Zhang, Zhang, Zhang, Shi, & Zhong, 2016). Monitoring and predicting air quality is another example of how big data can be used in smart cities, on which there is a wealth of literature (GSMA, 2018; Zheng et al., 2015, etc.). Traditional machine learning approaches such as Support Vector Machines (SVMs), decision trees and nearest neighbor classifiers are still widely used for the analysis of big data. For large datasets, Projection-SVM is a recent distributed implementation of kernel support vector machines using subspace partitioning (Singh & Mohan, 2018). Nguyen, Le, Nguyen, Phung, and Webb (2018) have investigated Bayesian kernel machines using Stein Variational Gradient Descent for big data analysis, resulting in an efficient and fast learning method. The Bayesian approach is quite popular for some applications of smart city services. For example, a Bayesian approach to residential property valuation based on the built environment and house characteristics was recently proposed by Liu et al. (2018). The proposed model is able to predict the main features of a submarket in terms of house attribute similarity, location proximity, and substitutability. In another study, Preda, Oprea, Bâra, and Belciu (Velicanu) (2018) use SVMs to forecast photovoltaic (PV) output for efficient renewable energy systems in a big data analytics context. SVM and linear regression are investigated as machine learning approaches in this study, and a significant improvement in performance is reported when the machine learning process is introduced.
Iyengar, Lee, Irwin, Shenoy, and Weil (2018) use Bayesian inference, a data-driven approach, to identify the least energy-efficient buildings in a city with a large population. Bayesian inference is used to estimate the distributions of various parameters of a building. A case study from a city containing more than 10,000 buildings shows that more than half of the buildings required energy improvement measures. It has also been argued that most of the big data generated is unlabeled. Thus, semi-supervised learning is important for the proper utilization of unlabeled data. A three-level learning framework has been proposed that matches the hierarchical description of big data generated by smart cities.
In this section we first present the various tools and techniques that are used for different aspects in smart city projects. These are followed by the results of the thematic analysis.

| Tools and techniques
A number of different approaches, tools and techniques have been used to perform big data analytics in smart cities. We look at these from four different perspectives:
• tools and techniques used for data modeling, storage, transfer and management,
• tools and techniques used for computing, that is, processing and analyzing the data,
• tools and techniques used for managing security and privacy, for example, authentication and authorization, encryption, anonymization, etc., and
• major market drivers such as commercial vendors, research community, public sector, etc. that are supporting the discussed approaches.
Each of these is discussed subsequently.

| Data models
Relational databases are still the most commonly used storage mechanism for smart city applications (Eirinaki et al., 2018; Motta, You, Sacco, & Ma, 2014; Peters-Anders, Khan, Loibl, Augustin, & Breinbauer, 2017). A number of tools and techniques are used to store, transfer and manage big data in smart cities. This is especially challenging because of the nature of big data discussed previously. Since it is generally large in volume, the databases used to store it must necessarily be distributed to be able to leverage multiple data stores. For this reason, data storage solutions such as HDFS (Hu et al., 2018; Pei, Li, & Tong, 2018) and Amazon EC2 (Lu et al., 2011) are commonly used. Due to its frequent updates (velocity), big data is also often, by nature, time series data, and thus time series databases such as InfluxDB are also found in the literature (Tahat, Aburub, Al-Zyoude, & Talhi, 2018). As big data also often comes from many sources and thus in a variety of formats, traditional relational databases such as MySQL, Oracle, etc. that impose strict structural restrictions on the data are unsuitable. This paves the way for more unstructured databases such as MongoDB (Pei et al., 2018; Tahat et al., 2018).

| Computing models
As with data models, the nature of big data also necessitates different computing models. Moreover, since smart cities are designed to provide public information services to citizens, this also necessitates unique computing models. Cloud, fog and edge computing are important and emerging technologies in this regard (Bibri, 2018). On the backend, distributed data processing technologies such as Hadoop (Ghaemi, Alimohammadi, & Farnaghi, 2018; Hu et al., 2018; Lu et al., 2011; Pei et al., 2018; Wang & Eick, 2017) and Microsoft Azure are popular (Tahat et al., 2018). Our survey also found that machine learning is by far the most common method of performing big data analytics, for example, Anisetti et al. (2018), Ghaemi et al. (2018), Lu et al. (2011) and Pei et al. (2018), to name just a few. Machine learning techniques such as Artificial Neural Networks (ANNs), clustering and SVMs have been shown to be very successful for performing the sorts of analytics that are required (Pei et al., 2018; Tahat et al., 2018; Zheng, Liu, & Hsieh, 2013).

| Security and privacy
There is a lot of work on smart city security, and mostly standard security and privacy approaches have been applied (Khan, Pervez, & Abbasi, 2017). This important area of smart cities is made even more important by the recent spotlight on security and privacy in the IT domain overall. The introduction of the General Data Protection Regulation (GDPR) is one result of such a focus. As a result, research in this area is ramping up. Anisetti et al. (2018) propose a system that uses model-based definitions of public policies to ensure compliance with the GDPR.
[...] environment, public agencies such as health, mobility, environment, security and socio-economic activities for analysis and decision making. In this respect, smart governance uses ICT tools for data collection, processing and analysis in order to better understand the state of the city, improve communication, enhance public service provisioning and promote transparency by introducing evidence-based policies (Javed, Khan, & McClatchey, 2018). Since the focus of this survey paper is data analysis, we will only cover data- and technology-related aspects. The PwC report highlights the importance of Data Driven Cities (PWC) for a variety of smart city applications across the world and recommends: a technology development and introduction strategy for the long-term sustainability of data-based cities; a horizontally (i.e., across departments) and vertically (i.e., thematic and level of governance) integrated information platform for consistent and compatible data processing; and measurable indicators of technology in city management such as financial gains, efficiency in governance processes, etc. Similarly, Barns (2018) discusses the role of urban data platforms or dashboards in supporting city governance. 
She argues that data platforms contribute as a key element for the development of new governance models for smart cities; enabling various city stakeholders to test and discuss potentials and pitfalls of data-driven initiatives in smart governance.
This part of the survey attempts to review literature, specifically, focusing on urban governance and data analytics. For this section, about 160 relevant articles were shortlisted based on their title and keywords. Most of these shortlisted papers either focused on general data analytics in smart cities or talked about a specific theme such as energy or mobility or health or emergency response or urban planning, etc. After reviewing abstracts of these articles, about 20 papers were considered suitable for governance theme review.
According to the literature review, ICT is an essential component in smart city governance for gaining the intelligence that provides the basis for evidence-based decision-making. At the core of ICT-based solutions are three main components: (i) data collection through sensors, Internet-of-Things, smart phones, remote sensing (e.g., satellite or in-situ) and city databases; (ii) data processing and/or pre-processing (e.g., filtering, data quality and format translations); (iii) data analysis using machine learning, data mining and other statistical algorithms to generate new knowledge in various cross-thematic applications such as mobility (Docherty, Marsden, & Anable, 2018; Peters-Anders et al., 2017; Rathore et al., 2018). Table 4 presents coverage of topics in the selected literature.
The transformation from traditional governance to ICT-enabled smart governance (Bolívar, 2015; Meijer & Bolívar, 2016) is gaining momentum, with open data and other data platforms providing useful contributions (Barns, 2018; Pérez-González & Díaz-Díaz, 2015). On the one hand, the increasing volume of big urban data provides greater opportunities for new forms of city governance; on the other hand, there is no systematic and coherent data system to manage unstructured, semi-structured and structured heterogeneous urban data in order to perform data analysis (Kourtit & Nijkamp, 2018). Barns (2018) and Kourtit and Nijkamp (2018) discuss the role of urban data platforms or dashboards in supporting city governance. These data platforms contribute as key elements for the development of new governance models for smart cities and as forums in which various city stakeholders can test and discuss the potentials and pitfalls of data-driven initiatives in smart governance (Barns, 2018). Similarly, different dimensions of the word "smartness" (e.g., technology, citizen engagement, openness, evidence-based decision making, creativity & innovation, sustainability, integration, effectiveness and efficiency) have been presented for understanding and developing smart city governments. We agree that technological advances should be equally met by advances in government management and policy for smart governance. This is also supported by Meijer and Bolívar (2016), who argue that smart city governance is not just a technological issue; rather, it is a complex process of institutional change with more appealing socio-technical governance focusing on both economic gains and other public values. Others refer to this approach as "collective smart", which is a necessary value-driven context of cyber-human smart cities. Likewise, Kourtit and Nijkamp (2018) argue that smart city management will fail without human or cognitive capital or in the absence of innovativeness and creativity. 
Hence, cities of the future need both technical and socio-economic dimensions to be considered in urban policy making.
Among the more technology-savvy literature, a 3D GIS and cloud-based government affairs service platform has been proposed that enables various city stakeholders to use city data (acquisition and management), analyze it (Hadoop/MapReduce, data mining and statistical methods) and create new services through a multi-tier cloud-based architecture. Dustdar et al. (2016) emphasize the need for a cloud/edge-based public utility service ecosystem where consumers, producers and brokers can manage privacy-protected distributed micro-transactions (e.g., blockchain). Babar and Arif (2017) propose a smart city architecture for big data analytics with three generic layers: data acquisition and aggregation, data computation and processing, and application-specific decision making. This layered architecture is shared with others such as Zhang et al. (2016), Khan et al. (2015) and Rathore et al. (2016). Our survey indicates that data pre-processing is a common and essential step for converting raw data into a format more appropriate for analysis and decision making. For data preparation and analysis, different techniques or approaches are used, for example: pre-processing or data normalization (the min-max approach); data aggregation for analytics (for example, the Hadoop model); the Kalman filter for speeding up processing by separating out data noise; and Round Robin and Least Slack Time for scheduling and load balancing. The Hadoop or MapReduce-based model is useful for static data, but the growth of real-time data and the need to make near real-time decisions favor platforms such as Spark and Storm. Such platforms can support governance of individual themes such as energy or multi-disciplinary applications (Rathore et al., 2016).
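Two of the pre-processing steps mentioned above, min-max normalization and Kalman filtering of noisy readings, can be sketched as follows. This is a minimal illustration under simplifying assumptions (a scalar sensor stream, a constant-value state model, and hypothetical noise variances); it is not the implementation used in any of the surveyed platforms.

```python
def min_max_normalize(values):
    """Rescale values linearly into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def kalman_smooth(measurements, process_var=1e-3, measurement_var=1.0):
    """Scalar Kalman filter assuming the underlying value is roughly constant.

    process_var and measurement_var are assumed noise variances, chosen
    here only for illustration.
    """
    estimate = measurements[0]
    error_var = 1.0
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        error_var += process_var
        # Update: blend the prediction with the new measurement.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed

readings = [10.0, 12.0, 9.0, 11.0, 10.0]   # hypothetical sensor readings
normalized = min_max_normalize(readings)
smoothed = kalman_smooth(readings)
```

Normalization puts heterogeneous sensor streams on a common scale before aggregation, while the filter damps measurement noise so that downstream analytics operate on a steadier signal.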
For good city governance, monitoring and real-time decision making can play an important role as highlighted by Malik et al. (2018) where raw data is transformed in JSON and RDF for semantic inferencing and querying. For instance, Eirinaki et al. (2018) use a multi-tier architecture (Angular JS, NodeJS, Apache Mahout, MySQL with interfaces with Socrata Open Data Platform) in Amazon EC2 cloud to handle online building permit applications. This not only facilitates citizens to easily communicate with complex planning permission processes but it also performs cross-departmental data collection and data analytics to enable city planners get the likelihood of building constructions in specific parts of a city. The monitoring is also important for policy making in cities where policy provenance through data capturing and processing can be used for policy analytics .
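The transformation of raw JSON records into RDF for semantic querying can be illustrated with a minimal sketch that emits N-Triples lines. The namespace `http://example.org/city/`, the record fields and the function name are hypothetical, and all values are emitted as plain string literals for simplicity; this is not the pipeline described by Malik et al. (2018).

```python
import json

BASE = "http://example.org/city/"  # hypothetical namespace, for illustration only

def json_to_ntriples(record_json, subject_key="id"):
    """Turn one flat JSON object into a list of N-Triples statements."""
    record = json.loads(record_json)
    subject = f"<{BASE}{record[subject_key]}>"
    triples = []
    for key, value in record.items():
        if key == subject_key:
            continue
        predicate = f"<{BASE}{key}>"
        # json.dumps yields a double-quoted, escaped string usable as a literal.
        literal = json.dumps(str(value))
        triples.append(f"{subject} {predicate} {literal} .")
    return triples

triples = json_to_ntriples('{"id": "sensor1", "pm25": 12}')
```

Each key-value pair becomes one subject-predicate-object statement, which is the form a triple store needs for inferencing and querying.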
Citizen centered data analytics can provide necessary intelligence to city authority in provisioning of user related public services. Zhang et al. (2016) introduced a multi-layered framework for citizen centered urban governance and decision making. Their proposed multi-layered platform consists of three layers: data merging, that is, combining citizen-centered data with other urban sources; knowledge discovery, that is, citizen profile building through citizen participation, statistical and machine learning methods; and, decision making, that is, using ontology, data mining, Bayesian Net techniques. However, citizen centered data challenges such as managing heterogenous structured, unstructured or semi-structured citizen data requires various techniques such as: cleaning, splitting, translating, merging, sorting, and validating data. Aligning user profiles with urban data and analyzing it using data mining and machine learning techniques, the platform is able to deliver user-centered public services resulting in effective urban governance. Likewise Aguilera et al. (2017) also build citizen centric apps by exploiting open data in IES Cities platform. In IES platform, citizens are considered as prosumers that is consume and produce data resulting in better provisioning of public services to citizens and decision making. The platform uses heterogenous urban data in formats CSV, RDF, JSON, relational and provides toolset to enable developers to develop new urban apps. Motta et al. (2014) present City Feed-a crowdsourcing system that covers transactional and analytical modules to measure the maturity of crowd-based city governance. The platform uses an intelligence engine to perform collective analysis on the crowdsource data based on well-defined dimensions. 
The above literature shows that it is becoming increasingly important in open governance to engage with citizens and take their opinions into account in urban planning and decision making. One example is the smarticipate platform, where citizens can vote and comment on an existing proposal, create alternative proposals and get processed rule-based feedback. Although the existing literature emphasizes the use of urban big data analytics and the alignment between technology innovation, the human dimension (that is, the socio-techno perspective) and city management processes, evaluations in real city environments highlight a lack of integrated and/or cross-departmental data analytics for smart governance. For example, Chatfield and Reddick (2018) investigated the 311 on-demand services (i.e., non-emergency police and government services) of a large US city and the impact of big data analytics. Their investigation reveals the use of big data analytics by different departments, but its usage on a wider city scale was not clearly evident. As a result, the authors proposed a process-based approach as an institutional mechanism for embedding the use of big data analytics in critical governance processes to co-create public value and to create a culture of analytics-driven governance in public authorities. This can also establish smart and connected communities that contribute to data-driven governance. In this respect, Sun et al. (2016) argue that smart and connected communities (SCC) need to remember the past (preservation), live in the present (livability) and plan for the future (sustainability). Therefore, ubiquitous IoT devices, smart sensors, cloud computing and big data analytics can play an important role in promoting SCC. The authors present a layered platform, "TreSight", for data sensing, context building, event handling and processing, big data analytics for gaining insights, secure identity management (OAuth) and the provision of context-aware services, for example, a tourism and cultural heritage mobile app. 
Pan et al. (2016) explore impact of big data in smart city intelligence for infrastructure support, urban governance, public services and economic and industrial development in China. Authors present an overview of urban big data with multidimensional hierarchy, integrity and correlation which require systems to adopt targeted methods for data lifecycle, that is, data acquisition (e.g., filtering), pre-processing (cleaning, filling, normalization, quality check, etc.), processing and analyzing (i.e., linear analysis, non-linear analysis, factor analysis, sequential analysis, variable curve analysis, bivariate statistics, linear regression, etc.), categorization and inter-relation analysis (i.e., support vector machine, Naive Bayes, random forest, logistic regression, etc.), uncovering patterns, rules and knowledge (i.e., AI neural network, genetic algorithms, cross-media algorithm), and value generation (interactive and visual information) with automated routines. This suggests that data driven smart city analytics and governance have common steps where numerous techniques or approaches can be applied as part of the system architecture.
The above literature review clearly indicates that data analytics is becoming common in smart city governance. Most of the frameworks or platforms have common functionalities such as data collection, preparation, processing, analysis and application-specific decision making. Citizen-centric data and integrated (i.e., cross-departmental) data analytics for urban governance are challenging: they require data preparation, coordination and alignment between technology and city processes, and political willingness in order to achieve operational feasibility for data-driven smart city governance. Most of the literature covers different technologies, for example, IoT, sensors, big data, cloud, edge, fog, semantics and data management; however, not all of these articles cover the security and privacy issues of data-driven urban governance.

| Environment
An important pillar of smart cities is the smart environment (Chatterjee & Kar, 2015; Vinod Kumar, 2015), an umbrella term for a number of different areas. As mentioned previously, one of the challenges that smart cities face is to reduce their carbon footprint to mitigate climate change (van Dijk & Teuben, 2015). This can be done in a number of different ways, for example, by switching to a renewable source of energy such as solar, by reducing air pollution and thus improving air quality, and even by motivating people to switch to alternative modes of transport, resulting in fewer cars on the road. For this reason, the theme of smart environment can cover a number of different topics. We focus specifically on tools and techniques that involve the use of big data analytics. After searching the relevant literature and analyzing abstracts and years of publication, we shortlisted 10 papers for detailed review.
The recent emergence of IoT has resulted in a deluge of data in many fields, including smart cities. Air quality monitoring in particular is an area that has benefited greatly from IoT (Molka-Danielsen, Engelseth, Olesnanikova, Sarafin, & Zalman, 2017; Tahat et al., 2018; Wang & Eick, 2017). The presence of a large number of sensors that acquire readings every few minutes generates a significant amount of data that needs to be stored, processed and analyzed efficiently and in near real time. This is why big data analytics has proven to be a popular choice for processing and analyzing such data. For example, GSMA (2018) has recently produced a report citing the advantages of using big data analytics coupled with the IoT to monitor air quality data, as well as the opportunities available. Hu et al. (2018) propose ClimateSpark, an in-memory data analytics framework for processing climate-related big data. ClimateSpark is based on Apache Spark (Ganelin et al., 2016) and includes ClimateRDD, a data model built on the Resilient Distributed Dataset (RDD) format that represents climate data in a way that allows more efficient processing. ClimateRDD chunks the spatiotemporal climate data both temporally and spatially into logical units. This allows for better parallel I/O and provides advanced data querying capabilities. As ClimateSpark uses Spark for its core data processing, all processing is performed in memory, which also greatly reduces processing time.
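The idea of chunking a spatiotemporal array into logical units can be sketched as follows. This is an illustrative assumption about how such chunking might work, not ClimateSpark's or ClimateRDD's actual API; the function names and the (time, lat, lon) index layout are hypothetical.

```python
from itertools import product


def chunk_ranges(size, chunk):
    """Split range(size) into consecutive half-open (start, stop) pairs of at most `chunk` items."""
    return [(start, min(start + chunk, size)) for start in range(0, size, chunk)]


def spatiotemporal_chunks(shape, chunk_shape):
    """Return one ((t0, t1), (y0, y1), (x0, x1)) index block per logical chunk.

    `shape` and `chunk_shape` are (time, lat, lon) tuples of grid sizes.
    """
    axes = [chunk_ranges(n, c) for n, c in zip(shape, chunk_shape)]
    return list(product(*axes))


def chunks_for_query(chunks, t_range, lat_range, lon_range):
    """Select only the chunks that overlap a half-open query box on every axis."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    query = (t_range, lat_range, lon_range)
    return [c for c in chunks if all(overlaps(axis, q) for axis, q in zip(c, query))]
```

Because each chunk is independent, chunks could be read by parallel workers, and a query only needs to touch the chunks that overlap its time window and bounding box.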
Another framework, which targets geo-spatial data mining in order to detect changes in the ozone concentration in various geographic locations, was presented by Wang and Eick (2017). The framework includes pre-processing of geo-spatial data, a density-based data mining algorithm and a visualization algorithm. It also introduces a change pattern discovery technique to detect dynamic change patterns in geo-spatial data, particularly changes in the ozone concentration. A post-processing analysis technique is also presented that automatically identifies interesting spatial clusters output by the data mining algorithm. Pei et al. (2018) propose the ATWDP framework for analyzing solar cell welding data. It offers a number of services such as early warning, real-time monitoring and knowledge discovery. The framework has three layers: the enterprise layer, the sensor layer and the batch layer. In the sensor layer, data is collected from solar cells using sensors embedded in them. The enterprise layer interfaces with the users and is also used for data acquisition. The batch layer processes the collected data using MapReduce combined with an SVM-DS (Cortes & Vapnik, 1995). Tahat et al. (2018) present a cloud-based big data analytics framework to analyze environmental data. Data is collected through sensors and sent to a central cloud platform for storage. A MongoDB database stores the user authentication information, while an InfluxDB database stores the time series monitoring data. The Orange data mining framework is used to analyze the data. The framework also has an associated iOS mobile app to access and visualize the data. Molka-Danielsen et al. (2017) present two case studies to investigate how big data analytics can be used along with wireless sensor networks to monitor air quality in workplace environments. The data is collected from wireless sensors and stored on a MicroSD card for offline processing.
Even though the aim of Molka-Danielsen et al. (2017) was to study how big data analytics can be used to monitor air quality, the nature of their data is such that it cannot be considered big data. Lu et al. (2011) propose a framework to perform distributed big data analytics on climate data using RapidMiner (Arimond, Kofler, & Shafait, 2010). The authors also implement a data conversion tool to convert NetCDF (Rew & Davis, 1990) climate data into the RapidMiner native format. Zheng et al. (2013) present U-Air, a co-learning framework that uses two separate classifiers to predict the air quality of a particular geographic location. The first is a spatial classifier that takes into account spatially-related features such as road networks, distance to an existing station, points of interest, etc. The second is a temporal classifier that takes into account temporally-related features such as humidity, temperature and traffic flow. Le (2018) adopts a slightly different approach, using a Convolutional Neural Network to process "image-like" air pollution concentration maps and classify the pollution according to hazard level in real time. In addition, a combination of a neural network and a Long Short-Term Memory network is used to take into account historical time series air pollution data and current local factors such as temperature, humidity and wind direction in order to predict future air pollution. Zhu, Cai, Yang, and Zhou (2018) present yet another method for predicting air pollution concentrations. They use a regularized machine learning approach to predict hourly air pollution levels based on historical meteorological data as well as current air pollution data.
A major concern in contemporary analytics-based systems is the privacy and security of user data. With the implementation of the GDPR, this has become even more important. Anisetti et al. (2018) propose a semi-automated privacy-aware framework to ensure compliance with the GDPR. The authors propose a policy definition process that produces a machine-readable policy using deontic logic. Modeling policies in this way allows them to generate machine-executable workflows that in turn produce evidence for decision making using big data analytics. The deontic model also incorporates privacy requirements, which allows the system to check automatically whether the defined policy conforms to GDPR requirements.

| Energy
Energy plays a very important role in improving the living environment of smart cities through a cleaner environment, energy recycling, lower energy consumption, lower emissions (and hence less pollution) and lower energy bills (TRANSFORM Project Consortium, 2015). With the recent advancements in IoT and cloud technologies, energy systems are being made to work more intelligently based on the data collected from their operations. According to Report Linker, smart energy systems will grow at a CAGR of 14.91% in terms of revenue by 2020 and will include artificially intelligent devices that work without human intervention (Clare, 2016). This part of the paper focuses on energy-related data analytics for achieving the above-mentioned objectives.
An architecture for energy data analytics is proposed by Malik et al. (2017) to make data easily reproducible and accessible for broader purposes such as usability, transformation and stream support. The proposed architecture has three main components: data collection, data storage and data distribution. For data collection, a schema is proposed that stores data from different sensor types in a single table; it consists of the type and units of the value measured, the geo-coordinates of the measurement location, the frequency of measurement and a time stamp. For data distribution, a publisher-subscriber architecture is proposed to support both batch and stream data processing. For data storage, a polyglot persistence architecture is recommended, in which multiple database technologies are chosen according to the requirements of the application or its components.
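The single-table schema and the publisher-subscriber distribution described above can be sketched as follows. This is a minimal in-memory illustration, not the implementation of Malik et al. (2017); the class names, field names and the in-process `Broker` are assumptions, and a real deployment would use a message broker and persistent storage.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Measurement:
    """One row of the single table shared by all sensor types (field names are hypothetical)."""
    sensor_type: str     # type of value measured, e.g. "power"
    unit: str            # unit of the value, e.g. "kW"
    value: float
    latitude: float      # geo-coordinates of the measurement location
    longitude: float
    frequency_s: int     # measurement interval in seconds
    timestamp: datetime


class Broker:
    """Toy publisher-subscriber hub serving both stream and batch consumers."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.table = []  # stands in for the single storage table

    def subscribe(self, sensor_type, callback):
        """Register a stream consumer for one sensor type."""
        self._subscribers[sensor_type].append(callback)

    def publish(self, measurement):
        """Store the row, then push it to stream subscribers of its type."""
        self.table.append(measurement)
        for callback in self._subscribers.get(measurement.sensor_type, []):
            callback(measurement)

    def batch(self, sensor_type):
        """Batch access: return all stored rows of one sensor type."""
        return [m for m in self.table if m.sensor_type == sensor_type]
```

A stream consumer registers a callback with `subscribe`, while a batch job calls `batch` later on the same stored rows, which is one simple way to serve both processing styles from one schema.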
The imbalance between increasing energy demand and dwindling energy supply has intensified over time. Horban (2016) presents a solution based on big data and data mining technology to increase energy efficiency. Power failures are common in developing countries, but in recent years they have become more frequent in developed countries as well. Ji, Wei, and Poor (2017) use data analytics to address failures in energy systems and their resilience, especially failures caused by natural events such as severe weather. Data about failures, recoveries and weather variables is collected and analyzed to determine the resilience level of infrastructure and services. The system also provides a modeling technique that guides the data analytics, for example, by indicating which measurements to use and which quantities to estimate. It tries to answer questions related to outage management and failure prediction from location-aware data that is collected automatically with high accuracy. To support the study, data analytics was performed on a data set generated at a national level across the USA to quantify the resilience of the energy infrastructure and services under severe weather conditions.
Sharing energy through a substantial enhancement of operational efficiency, rather than relying on establishing new energy networks, has become of the utmost importance in enabling governments to meet their expenses in the energy sector. F. Li et al. (2018) discuss the sharing of energy systems in two major categories: customer flexibility and network flexibility. They encourage flexible energy sharing through Peer-to-Peer (P2P) energy trading in local energy markets supported by Shared Network Access (SNA). SNA aims to accommodate flexible demands in a cost-effective manner. The paper presents big data analytics as a key enabler that informs users and market participants and enables them to share energy based on established prices and matching flexibility in demand.
The literature review on energy shows that big data and analytics are major enablers of insights that can help solve the issues of energy systems, for example, achieving lower consumption and lower emissions in order to make the environment cleaner, and lowering energy bills through sharing or behavioral change.

| Economy
A healthy economy is a critical factor for cities to function properly and provide services to citizens. Economic forecasting aims to predict the future state of the economy on the basis of certain values, such as high and low prices of goods and services (inflation), employment, etc. Precise measurements of the economic requirements of populations strongly influence researchers and governments. These measurements guide governments in allocating limited resources and provide a foundation for making and improving decisions about human livelihoods worldwide. The quality and quantity of economic data have improved considerably in many developing countries but are still insufficient in some. Effort is needed to identify and engage with poor areas. Economic forecasts are mainly used by businesses and governments to help decide their budgets, multi-year plans and strategy for the upcoming year. Stock market specialists use forecasts to help estimate the valuation of a company and its stock. In the age of big data, such data can be used to forecast economic aspects by producing meaningful knowledge about the future. Based on current data and information, machine learning algorithms can be applied to achieve better forecasts. Athey (2018) discusses the impact of machine learning and its approaches on economic policy and compares traditional approaches with machine learning. Future trends and the contribution of machine learning to economics are also discussed in that paper. The author also discusses state-of-the-art research in econometrics and statistics and contrasts it with traditional approaches. Some problems related to decision making are explained using different computational approaches.
The following examples are discussed by Athey (2018):
Hip Replacement Surgery: A prediction model is very helpful in making decisions regarding a person's health. It is difficult for poor people to arrange money for their treatment. If one can estimate what will happen in the future, decisions can be made accordingly. In the hip replacement example, prediction is used to decide, based on health condition, whether an older person should undergo surgery.
Health Code Violation: Health inspectors use health code violations to evaluate the performance of restaurants and to rate them according to their food quality, health and safety. In this example, prediction models are used to predict health code violations in restaurants.
Nyman, Ormerod, Smith, and Tuckett (2014) describe a new approach to economic forecasting based on the availability of "Big Data". The main aim is to develop an algorithm that analyzes text data in different forms, for example, news feeds, emails and office or legal documents, in order to extract time series of sentiment that might help forecast aspects of the economy. Deep learning and machine learning methods are widely used to predict poverty in many countries, including countries in Africa (Blumenstock, 2016; Gebru et al., 2017; Jean et al., 2016). These approaches produce better results for predicting and identifying poverty and wealth. Satellite images contain a lot of information, but the images are unstructured, and the biggest challenge is to extract meaningful information from them. Transfer learning is used by Jean et al. (2016) to overcome these challenges. Although this method is noisy, it is able to obtain a proxy and train a deep learning model for poverty. The transfer learning approach consists of three steps. First, a convolutional neural network (CNN) (LeCun, Bengio, & Hinton, 2015) is pretrained on ImageNet, a large dataset of labeled images, so that the model learns features such as edges and corners. Second, the pretrained CNN is fine-tuned on a new task. Third, the CNN is used as a feature extractor and its features are combined with survey data to produce poverty estimates. The most important market driver in this case study is poverty prediction using socio-economic data in African regions, especially Uganda, Nigeria, Tanzania and Malawi. As it is difficult to scale up conventional data collection approaches, an alternative is to use novel sources of "passively collected data" such as social media, mobile phone networks and satellites.
The economic characteristics of populations influence a great deal of research, and accurate measurement of such data can shape critical government decisions. Livelihoods can be improved by allocating resources optimally and by tracking progress.
Regression techniques can also be used to predict the future income of a person from his/her current financial statistics and the expected growth rate in that person's financial value. There are different methodologies for computing a person's income from current financial statistics. The most common way of predicting a person's future value is to model and analyze earnings dynamics in relation to consumption dynamics and to the uncertainty and change in the person's environment. There is a lot of heterogeneity in these models, but they produce good results. These dynamics can be analyzed using different regression techniques.
Druedah and Munk-Nielsen (2018) provide a linked regression tree based algorithm to analyze income and liability dynamics. The analysis is performed on data on Danish males aged 30 to 60. The attributes used in the analysis are income, assets and debts. Income is the yearly earnings of the person; assets are housing equity and financial assets, including bank account balances, stocks, bonds and funds; and debts are amounts the person has to pay to any authority, including bank loans. The Linked Regression Tree (LRT) algorithm is used to estimate the mean and the variance of the level of log income over a life cycle of 30 years, from age 30 to age 60. The LRT algorithm converts the data into income groups and thereby turns the regression problem into a classification problem. The income of a person at any future age depends on his current income and the growth rate of that income. A Markov model (a probability-based model) is used to predict the income group of a person at any age, with the current income group as the current state and the growth in income as the transition. One additional finding, beyond the prediction itself, is that a person spends 10% of his income on welfare activities. The authors used different measures, such as the mean squared error of group-specific income by age, and five-fold cross-validation.
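The Markov step can be illustrated with a minimal sketch of a first-order Markov chain over income groups. This is a generic illustration, not the LRT algorithm itself; the function names, the integer group labels and the rule for groups with no observed transitions are assumptions.

```python
def estimate_transitions(sequences, n_groups):
    """Estimate a row-stochastic transition matrix from yearly income-group sequences.

    Each sequence lists one person's income group (0 .. n_groups - 1) per year.
    """
    counts = [[0] * n_groups for _ in range(n_groups)]
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Assumption for illustration: with no observed transitions, stay in the group.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_groups)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def predict_distribution(matrix, start_group, years):
    """Propagate a person's group distribution `years` steps into the future."""
    n = len(matrix)
    dist = [0.0] * n
    dist[start_group] = 1.0
    for _ in range(years):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist
```

The current income group plays the role of the state, and each yearly step applies the estimated transition probabilities, so the result is a probability for each income group at the target age rather than a single point estimate.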
Machine learning resolves tasks empirically instead of constituting a hard-wired rule that maps an input x to a predicted value y. This essential quality can be applied in the field of econometrics, where parameter estimation is key to estimating the relationship between y and x. However, machine learning is more concerned with finding predicted values than parameter values. Mullainathan and Spiess (2017) discuss how machine learning can be tuned to become an integral part of econometrics and which pitfalls to avoid in overcoming the inherent problem of overfitting. Their research used 10,000 records of owner-occupied units from the 2011 metropolitan sample of the American Housing Survey. It compared multiple methods, such as regression trees and LASSO, with simple procedural estimation functions such as least squares to find out why machine learning is a more potent tool in econometrics. Although random forests perform much better than ordinary least squares in prediction on out-of-sample data, they also have a tendency to overfit. This resonates with most machine learning approaches, in which high dimensionality can be both a bane and a boon: a large set of features offers flexibility, but it can overfit the sample data, which then translates to poor performance out of sample. This is where regularization becomes essential, that is, applying methods that tune the algorithm so as to increase its predictive power on unseen data. This empirical tuning includes procedures such as cross-validation, in which candidate regularization parameters are evaluated on subsamples and the parameter with the best average performance is selected (Angrist & Pischke, 2008). This approach is used in LASSO, the method most used by economists, which combines a quadratic loss function, the class of linear functions and a regularizer given by the sum of the absolute values of the coefficients (Bekker, 1994).
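For reference, the LASSO estimator described above (quadratic loss, linear prediction functions and an absolute-value penalty on the coefficients) can be written in its standard form. The symbols n (number of observations), k (number of features) and λ (regularization parameter, typically chosen by cross-validation) are notation introduced here for illustration.

```latex
\hat{\beta}^{\mathrm{LASSO}}
  = \arg\min_{\beta \in \mathbb{R}^{k}}
    \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^{2}
    + \lambda \sum_{j=1}^{k} \lvert \beta_j \rvert ,
  \qquad \lambda \ge 0 .
```

Setting λ = 0 recovers ordinary least squares, while larger values of λ shrink coefficients toward zero and set some of them exactly to zero.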
In conclusion, the strength of machine learning lies not in deducing rules but in inducing them. Thus, instead of rigidly applying functions uniformly to all types of data, the data is allowed to inform the decision. Nevertheless, machine learning works in conjunction with theory-driven approaches and helps in making more informed decisions about which variables or functions to select in order to best make predictions on unseen data.
Smart cities and the sharing economy are current topics of prototype development and research, in which different issues are discussed, such as the current trend of urbanization and the economic situation that people have been facing for many years. Gori, Parcu, and Stasi (2015) discuss the sharing economy and smart cities, their common features, similarities and differences. The authors suggest some government policies for the welfare of the masses. Smart cities, as defined by the European Commission, are systems in which people use power, energy and resources to improve their living standards, and in which smart means of communication and the use of information help to achieve this goal. A smart city emerges when government participates in improving living standards by investing in people, transport, ICT, communication infrastructure and sustainable economic growth while managing natural resources. Both the smart city and the sharing economy depend on improved utilization of resources, which can be money, assets, time or services. Data protection and security are also important concerns. Camerer (2017) expresses the idea of connecting machine learning with behavioral economics. The experiments are based on evaluating bargaining and risky choice. The main theme is how artificial intelligence (AI) can be used to assemble predictions of preferences regarding unfamiliar products.
In essence, AI and machine learning can readily transform the economic structure and thus minimize manipulation. Moreover, ML and AI will bring a focus on policy problems, new research areas will emerge and processes will change due to automation. ML will automate routine tasks such as data analysis, so that economists will be able to benefit from the credibility of empirical work.

| Transport
As the world changes, cities grow and the urban population rises, the need for transport of goods and people is increasing. Hence we need more mobility, but in a smarter way. As a significant component of the modern economy, transportation constitutes 6 to 12% of Gross Domestic Product (GDP) in many developed countries. It also has a great impact on people's lives: an estimated 10 to 15% of household expenditure goes on transportation, and people spend around 8% of their daily time traveling. Although transportation has greatly improved people's lives, a few costly problems remain unsolved, including congestion, traffic accidents, air pollution and climate change due to the fuel consumption and particulate emissions of vehicles. Continuous efforts have been made in the past to solve these problems, but with little significant success. Recently, transportation has become a hot topic in the Internet of Things (IoT) area and is considered part of the solution to these problems (Gottbehüt, 2016).
Based on IoT technology, companies across various travel and transportation industry sectors, including railways, the aerospace industry, airlines, freight logistics and others, have started capturing data from almost all kinds of systems or events in public or private clouds. With advances in cloud technology for storing data and sophisticated data analysis techniques for extracting insights from data at high speed and accuracy, organizations are making better-informed decisions, which was not achievable before (Wedgwood & Howard, 2015). In this way, data itself has become a strategic and competitive asset, and data analytics tools and technologies are contributing to solving the problems of congestion, accidents and climate change.
This part of the survey presents the advances in data analytics technology, the issues and the available solutions. From the literature search, the following papers were selected for presentation in this survey, mainly on the basis of their relevance to the topic.
As data is of key importance for performing analytics, Shukla and Champaneria (2017) present different ways to collect data in order to provide better transportation management. For counting the number of vehicles entering or exiting a road, different technologies are presented, including pneumatic tubes, inductive loops and video-based vehicle detection. Pneumatic tubes are installed on the surface of the road and count vehicles through the burst of compressed air that reaches an air switch when a vehicle passes over the tube. Inductive loops count passing vehicles by detecting the change in the magnetic field caused by each vehicle. In addition, modern video-based technology is now in use: intelligent software identifies the different objects in the video, separates vehicles from the various background objects and counts them. The paper also discusses collecting GPS-based data from mobile phones or from a tracker fitted in the car to obtain information about vehicle speed or traffic density. Different types of sensors are also presented in the paper: Nondispersive Infrared (NDIR) sensors to measure CO2, fog sensors to measure visibility, ultrasonic sensors to detect the presence of a vehicle in a specific lane, and magnetometer sensors for vehicle classification, detection and speed estimation.
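As an illustration of how vehicle speed could be derived from GPS-based data, the following minimal sketch computes speeds between consecutive position fixes using the haversine great-circle distance. It is not taken from Shukla and Champaneria (2017); the function names, the (timestamp, latitude, longitude) tuple layout and the spherical-Earth approximation are assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def speeds_kmh(fixes):
    """Speeds in km/h between consecutive (timestamp_s, lat, lon) fixes sorted by time.

    Pairs with a non-positive time difference are skipped.
    """
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue
        speeds.append(haversine_m(lat0, lon0, lat1, lon1) / dt * 3.6)
    return speeds
```

Averaging such speeds over many vehicles on the same road segment would give one possible proxy for traffic conditions on that segment.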
To reduce congestion and road accidents, Abberley, Gould, Crockett, and Cheng (2017) present a model based on big data analytics that uses an ontology to capture the semantics of road accidents and congestion. The paper uses quantitative big data to provide qualitative information about the traffic ontology using machine learning techniques. The model has four steps. In step 1, a conceptual model of congestion is formulated in order to develop an ontology for congestion that considers the impact of road accidents. In step 2, the dimensions of congestion caused by accidents, such as journey time and traffic volume, are identified. In step 3, the big data sources relevant to the identified dimensions are identified so that data can be collected for analysis. Finally, in step 4, the identified dimensions and the data from the identified sources are analyzed to find patterns that can lead to better decision making to reduce congestion. Clustering, a machine learning technique, is used to analyze different traffic and congestion situations on different weekdays and at different times of day to show the impact of school and office travel.
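The clustering step can be illustrated with a small k-means sketch that groups journey-time observations into congestion levels. The paper does not specify this algorithm or these data; one-dimensional k-means, the deterministic initialization and the example journey times are assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's k-means on scalars with deterministic, evenly spread initial centres.

    Returns (centres, labels), where labels[i] is the cluster index of values[i].
    """
    ordered = sorted(values)
    # Initial centres: evenly spaced order statistics of the data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return centres, labels


# Hypothetical journey times (minutes) for (weekday, hour) slots on one route.
slots = [("Mon", 8), ("Mon", 12), ("Mon", 17), ("Sat", 8), ("Sat", 12), ("Sat", 17)]
journey_minutes = [30, 12, 28, 10, 11, 13]
centres, labels = kmeans_1d(journey_minutes, k=2)
congested_slots = [slot for slot, label in zip(slots, labels) if label == 1]
```

With these made-up numbers, the weekday morning and evening slots fall into the slower cluster, which is the kind of school and office travel pattern the text refers to.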
Another important objective of smart transportation is a better climate, and electric vehicles are seen as an important component in achieving this objective. Li, Kisacikoglu, Liu, Singh, and Erol-Kantarci (2017) present a survey of data analytics techniques based on big data for reducing emissions for a greener smart city, with a focus on smart grids and electric vehicles. Different Hadoop-based and web-based data analytics techniques are presented. Battery consumption and charging meters are managed using Hadoop with the R statistical package and stream processing using Hadoop Pig scripts, respectively. For web-based solutions, decision tree algorithms (J48 and M5) are used on the Weka platform to analyze data collected from the NY Independent System Operator (NYISO).
The literature has also discussed systems for managing traffic as a whole. For example, Sharif et al. (2017) present a low-cost Smart Traffic System (STS) that deploys traffic updates instantly, while Rizwan, Suresh, and Babu (2016) propose a Smart Traffic Management System (STMS) to solve traffic management problems in India, especially congestion. The low-cost STMS aims to provide a better service through traffic indicators that are updated instantly depending on the traffic conditions. It suggests deploying sensors in the middle of the road every 500 m on highways to collect different types of data from vehicles. The big data collected is analyzed to determine traffic density and to predict future traffic conditions, considering the number of vehicles entering the road and the capacity of the road. Information on traffic conditions and possible alternative routes is sent to drivers through a mobile phone application developed for the proposed system. Drivers can then take alternative routes to avoid congested areas.
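The density-based prediction described above can be sketched as a simple occupancy forecast for one road segment. This is a hypothetical illustration, not the STMS algorithm of Rizwan et al. (2016); the linear inflow/outflow model, the 10-minute horizon, the 0.8 threshold and all function names are assumptions.

```python
def density_ratio(vehicles_on_segment, capacity):
    """Occupancy of a road segment as a fraction of its vehicle capacity."""
    return vehicles_on_segment / capacity


def forecast_occupancy(current, entering_per_min, leaving_per_min, minutes):
    """Project the vehicle count assuming constant inflow and outflow rates."""
    projected = current + (entering_per_min - leaving_per_min) * minutes
    return max(0, projected)


def advise(current, entering_per_min, leaving_per_min, capacity,
           minutes=10, threshold=0.8):
    """Return (advice, projected_ratio) for drivers approaching the segment."""
    projected = forecast_occupancy(current, entering_per_min, leaving_per_min, minutes)
    ratio = density_ratio(projected, capacity)
    advice = "use alternative route" if ratio >= threshold else "stay on route"
    return advice, ratio
```

For example, `advise(60, 10, 6, 120)` projects 100 vehicles on a segment with capacity 120 and therefore recommends an alternative route under the assumed threshold.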
The overall literature review shows that data analytics techniques based on machine learning, big data and artificial intelligence in general are making progress on the long-standing issues of traffic management. With this progress, it appears that congestion, road accidents and traffic management problems will be solved to a greater extent in future smart cities using IoT and data analytics technologies.

| DISCUSSION AND CONCLUSIONS
Babar & Arif, 2017; Rathore et al., 2016)
Environment
Data Models
• Aggregating and storing data from sensors
• Data may be missing or inaccurate
• Data is often of a spatial and temporal nature, increasing its complexity
• Develop models to manage the complex spatiotemporal nature of the data
• Store data in distributed and time series databases
• Develop models that optimize parallel I/O for faster data access

Computing Models
• Data is voluminous and needs to be processed in near real time
• The spatiotemporal nature of data needs to be taken into account during processing
• New, multi-faceted computing models need to be developed to process spatiotemporal data (Ghaemi et al., 2018; Le, 2018, etc.)
• Distributed data processing mechanisms need to be used

Security and privacy
• Data processing and handling needs to be compliant with privacy-related regulations such as GDPR
• Introduce methodology, tools and techniques to ensure compliance

| Governance
For smart governance, city administrators, planners, and policy and decision makers can benefit from evidence-based collective information (for example, performance indicators, public value, financial gains and citizen-centric public services), on the basis of which they can plan effective policies and perform impact assessments. However, most applications are built for vertical thematic silos due to a lack of inter-departmental collaboration, data standardization and data sharing. This suggests that there is a need for a big data strategy for smart cities in which privacy-preserved open data, crowd-sourced or citizen-centered data and data from different public agencies and/or departments can be securely gathered, harmonized or integrated and processed for collective information intelligence to support timely decision making. Disruptive technologies including IoT, sensors, 2D/3D GIS, Cloud, Edge, Fog, Hyper ledgers and Human-Computer Interaction (for example, interactive web and AR/VR), distributed computing models such as service orientation, map-reduce, real-time stream processing and context-awareness, and processing models using existing and new algorithms in spatial analysis, machine learning, AI and data mining are key enablers for smart governance. However, smart governance solutions can only be beneficial if there is willingness in city administrations to transform their processes and adopt these solutions. Therefore, there is a need for close collaboration between various stakeholders, for example, domain experts, business organizations, citizens and IT experts, to design, develop and adopt smart governance services.

• Data Reproducibility (M. Malik et al., 2017)
• Efficient data comparison from different sources
• Big Data Analytics and Data Mining (Horban, 2016)
• Achieving Resilience by Data Analytics
• Big Data Analytics for Flexible Energy Sharing (F. Li et al., 2018)
• Defined Schema building (M. Malik et al., 2017)
• Data Processing Framework in different layers (Horban, 2016)
Computing Models
(Rizwan et al., 2016)
• Semantics from various sources and for various types of data (Shukla & Champaneria, 2017)
• Ontology Definitions
• Machine Learning (Shukla & Champaneria, 2017)
Computing Models
• Interpretation of Data collected on various angles and from different sources (Shukla & Champaneria, 2017)
• Qualitative information generation from quantitative data interpretation
• Sensors based infrastructure to analyze different types of data and process (Shukla & Champaneria, 2017)
• Clustering and Machine learning techniques (Abberley et al., 2017)
Safety and Privacy

| Environment
Smart environments are enabled by the use of IoT-based sensor networks. For example, sensors for pollution, wind, temperature, humidity, etc. can be deployed across a city to get a real-time overview of the weather situation as well as to predict weather changes. Such monitoring systems typically collect data on an hourly basis and analyze it to produce an overview. Using IoT in this way produces a large amount of data that needs to be processed in near real-time, roughly every hour. Machine learning is by far the most common technique for performing these kinds of analyses. A number of other challenges also arise from these sorts of monitoring systems, for example, how to represent the data so that analyses can be performed on it efficiently. Environment monitoring systems also typically suffer from poor-quality data, and big data analysis techniques can help to preprocess such data before further analysis. Future challenges in smart environments include exploring the use of deep learning-based techniques to make the aforementioned monitoring systems more efficient and effective.
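The hourly overview and the preprocessing of poor-quality readings described above can be illustrated with a minimal sketch. It is not taken from any of the reviewed systems: the function name `hourly_means`, the plausibility range `VALID_RANGE` and the sample readings are all hypothetical, and a simple range filter stands in for more elaborate data-quality techniques.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical plausibility range for an air-temperature sensor (degrees Celsius).
VALID_RANGE = (-50.0, 60.0)


def hourly_means(readings, valid_range=VALID_RANGE):
    """Group (timestamp, value) readings by hour and average the plausible ones.

    Readings that are missing (None) or outside valid_range are dropped
    before averaging, as a crude stand-in for data-quality preprocessing.
    """
    low, high = valid_range
    buckets = defaultdict(list)
    for timestamp, value in readings:
        if value is None or not (low <= value <= high):
            continue  # discard missing or implausible readings
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Illustrative readings only; not real measurements.
readings = [
    (datetime(2018, 6, 1, 9, 5), 18.0),
    (datetime(2018, 6, 1, 9, 35), 20.0),
    (datetime(2018, 6, 1, 9, 50), 999.0),  # sensor glitch, out of range
    (datetime(2018, 6, 1, 10, 10), None),  # dropped transmission
    (datetime(2018, 6, 1, 10, 40), 21.5),
]
summary = hourly_means(readings)
```

In a deployed system the range check would likely be replaced by sensor-specific calibration or learned anomaly detection; the sketch only shows where such a step sits relative to the hourly aggregation.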

| Energy
The smart energy sector has also been a major focus of research in IoT and smart cities. Smart energy systems are expected to match household energy supply to actual requirements, and the two-way flow of energy between consumers and the grid is expected to save a considerable amount of power. Smart meters have also been a vital component of research in the smart energy domain. They support efficient energy use and save consumers money by measuring the two-way flow of energy. They also ease the work of meter readers by sending billing information directly to the billing server. However, more research should focus on the standardization of data models and analytics so that users and businesses can provide energy at established prices in a more flexible way. In addition, industries producing products for smart grids and related sectors need to place more emphasis on developing standards for the collection, storage and transmission of data.
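As a rough illustration of how a smart meter might account for the two-way flow of energy, the sketch below nets imported against exported energy over a billing period. The function name `net_energy_bill`, the interval data and both tariffs are hypothetical; actual tariff structures vary by provider and are not specified in the reviewed literature.

```python
def net_energy_bill(intervals, import_price, export_price):
    """Return (net_kwh, cost) for a list of (imported_kwh, exported_kwh) intervals.

    net_kwh is positive when the household drew more energy than it fed back.
    cost charges imports at import_price and credits exports at export_price.
    """
    imported = sum(imp for imp, _ in intervals)
    exported = sum(exp for _, exp in intervals)
    cost = imported * import_price - exported * export_price
    return imported - exported, cost


# Hypothetical meter intervals: (kWh drawn from the grid, kWh fed into the grid).
intervals = [(1.5, 0.0), (0.5, 1.0), (0.0, 2.0)]

# Hypothetical tariffs in currency units per kWh.
net_kwh, cost = net_energy_bill(intervals, import_price=0.25, export_price=0.125)
```

Here the household is a net exporter of energy but still pays a small amount, because exported energy is credited at a lower rate than imported energy is charged in this assumed tariff.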

| Economy
Correct estimates of inflation, spending, employment, etc. are some of the most important aims of economic forecasting (Nyman et al., 2014). The economic recession of 2008/09 was not predicted by researchers despite the availability of data (Nyman & Ormerod, 2017). One of the contributing factors is the incorrect approximation of the household income process (Druedah & Munk-Nielsen, 2018), the estimation of which has a long history in economics. Another challenge is how artificial intelligence can interact with behavioral economics. Behavioral economics is defined as the analysis of natural limits on computation, willpower and self-interest, and the effects of those limits on economic analysis (Camerer, 2017). It is vital to study how AI technology used in business can both exploit and overcome human limits.
Big data provides increasing opportunities to gain knowledge that might be useful for the above tasks. Current economic trends produce data with more volume, variety and velocity than ever before. The main challenge is to extract useful, meaningful information from this data. In recent years, many machine learning techniques have been proposed to accomplish that task. Supervised and unsupervised machine learning techniques can be quite useful here, since they can be used to create variables that are then used in standard econometric analyses. The curse of dimensionality is another challenge in analyzing economic data. It becomes acute when the number of samples is smaller than the number of variables. Dimensionality reduction and feature selection techniques are quite useful for addressing this problem. For emerging economies, big data analytics can also be useful to support global manufacturing and supply chain innovation by improving human decision-making and data transparency (Tan, Ji, Lim, & Tseng, 2017).
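To make the feature selection idea concrete, the following minimal sketch ranks variables by the absolute value of their Pearson correlation with a target and keeps the top k. This correlation filter is chosen here purely for illustration and is not a method prescribed by the reviewed literature; the function names `pearson` and `select_top_k` and the toy data (four samples, six variables) are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_top_k(rows, target, k):
    """Return the sorted indices of the k features most correlated with target.

    rows is a list of samples, each a list of feature values.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, target)), j))
    # Highest absolute correlation first; ties broken by feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:k])


# Toy data: 4 samples but 6 variables, i.e. fewer samples than variables.
rows = [
    [2, 5, 3, 4, 1, 7],
    [4, 1, 3, 3, 3, 2],
    [6, 4, 3, 2, 2, 9],
    [8, 2, 3, 1, 2, 1],
]
target = [1, 2, 3, 4]
selected = select_top_k(rows, target, k=2)
```

With so few samples, high correlations can arise by chance, so a filter like this would normally be combined with cross-validation or domain knowledge before the selected variables enter an econometric model.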

| Transportation
Transportation has been a major focus of researchers in smart city studies. Autonomous vehicles are revolutionizing the industry and, according to different sources, are expected to be available within the next 2-7 years. Data analytics is helping the aerospace, airline, rail and automotive industries to solve long-standing problems. As discussed in the Transport section of this paper, many projects address the problems of congestion, car parking and pollution. The literature review shows that most products in the transport area are commercial and that each follows its own structure and approach to storing and analyzing data. There is a need to develop standards for data processing, especially for data collection and storage structures. Standardization bodies, researchers and industry should therefore work together on these standards to accelerate the development of more products from more vendors in this domain.

5.6 | Current and future trends
Gartner (n.d.) provides a methodology to assess current and future technology trends. The key stages for a technology identified by this methodology are:
Peak of Inflated Expectations: early stages where the hype of a technology is at its peak and there is a lot of publicity surrounding it.
Trough of Disillusionment: experiments and implementations begin to fail and interest in the technology wanes.
Slope of Enlightenment: understanding of how a technology can benefit enterprises and systems improves and concrete applications begin to appear.
Plateau of Productivity: a technology is adopted in the mainstream and becomes broadly applicable in the market.
Technologies important for enabling smart cities, such as deep learning, machine learning and blockchains, are currently at the peak of inflated expectations (Panetta, 2018). In the coming years the AI technologies are expected to enter their trough of disillusionment. This will entail reduced interest in these technologies, followed by a deeper understanding of their potential and finally robust and wider adoption. Blockchain is still a relatively new phenomenon, so it is not expected to mature for another 5 to 10 years. Technologies like edge computing, IoT, etc. are still in the initial stages of their hype cycle and so are not expected to reach maturity for another 10 years or so. This indicates that these are the technologies of the future with the greatest potential for investment, research and innovation. These trends are also borne out by our survey, where research involving IoT, edge computing, augmented reality, etc. is only just emerging. In addition, technologies such as Virtual Assistants, Digital Twin, and Edge and Fog AI are new trends that have emerged in the past year and that are important for smart city applications. AI, data mining and machine learning are the core methods that enable all these technologies, which have the potential to reshape the smart city landscape and open up new avenues for research and innovation in the coming years.

| SUMMARY
In this paper, we have discussed the challenges faced by smart cities and the key role data mining, machine learning and statistical methods can play in enabling intelligent solutions for different applications. We present a structured data mining-based literature review methodology, which we use to study the role of big data analytics in smart cities and to present the major challenges and the data mining and machine learning methods applied. Some of the major points discussed are:
1. Smart cities are a fertile area where big data analytics has a significant role to play.
2. Several data mining, machine learning and statistical methods are commonly applied in smart city big data analytics. These include Support Vector Machines, Decision Trees, Nearest Neighbor, Bayesian inference, Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short Term Memory (LSTM), Autoencoders (AEs), Generative Adversarial Networks (GANs), Deep Belief Networks (DBNs), SVM-DS and LRT.
3. The advent of IoT has further increased the need for big data analytics techniques, as an unprecedented volume of data is being generated and collected.
4. Big data is applicable to many different areas of smart cities such as governance, environment, economy, transportation, etc.
5. Big data analytics can support governance by enabling the mining of citizen preferences and opinions collected from a number of sources such as social media, polls, online comments, etc. In addition, cross-thematic and cross-departmental data analysis is essential for smart governance. Smart governance needs to consider socio-technical aspects and needs proper mechanisms to introduce data-driven strategies in city processes.
6. Big data analytics can also support evidence-based decision-making by integrating multi-sectoral data and helping to extract information from it.
7. In the field of smart environments, big data analytics can help to collect, manage and analyze big data from a number of different sources such as environmental sensors and historical climate and mobility data, to predict future environmental scenarios and to visualize existing situations.
8. In smart transportation, big data analytics can help to address traffic congestion by finding alternate routes earlier, to reduce pollution through electric vehicles, to manage parking and to provide timely medical services after accidents. Location-based data and vehicle data should be analyzed with advanced analytical techniques to solve traffic problems.
9. In the smart economy, big data can help to manage and analyze massive amounts of historical economic data and to predict future economic scenarios. In this regard, economic forecasting is a major application.
10. For smart energy, big data analytics can support people living in smart cities by providing a cleaner environment, recycling of energy, lower energy consumption and emissions to reduce pollution, and lower energy bills for homes and businesses. Consumers will also generate energy and contribute to better energy management.
11. From a technological standpoint, a number of big data technologies have proven to be very useful for supporting smart cities. For example, cloud infrastructure such as Amazon EC2 and distributed file systems such as HDFS help to store and manage the massive amounts of smart city data. Distributed data processing frameworks such as Hadoop and cloud platforms such as Microsoft Azure help to process the smart city data. NoSQL databases like MongoDB and time series databases like InfluxDB lend themselves to storing the collected and processed smart city data.
From what we have been able to discern so far, newer technologies like deep learning, digital twins, virtual assistants and Edge AI are still in their early stages and offer significant opportunities for future research and development. For example, deep learning could allow smart city representatives to analyze already collected data at a deeper level and to improve the efficiency of solutions that have already been developed. Virtual assistants can apply AI, data mining and machine learning methods to make cross-thematic city data more usable and enable city stakeholders to query public services in an efficient way. Less resource-intensive algorithms on edge nodes would also be highly desirable in smart city applications. Security, privacy and trust are significant issues going forward and must be taken into account by any new data analytics solution. For example, data assimilation must not violate the privacy of individuals, and transparency of algorithms can increase trust in the new information.