Pixel based visual data mining of geo-spatial data

ScienceDirect

Volume 28, Issue 3, June 2004, Pages 327-344


Abstract

In many application domains, data is collected and referenced by its geo-spatial location. Spatial data mining, or the discovery of interesting patterns in such databases, is an important capability in the development of database systems. A noteworthy trend is the increasing size of data sets in common use, such as records of business transactions, environmental data and census demographics. These data sets often contain millions of records, or even far more. This situation creates new challenges in coping with scale.

For data mining of large data sets to be effective, it is also important to include humans in the data exploration process and combine their flexibility, creativity, and general knowledge with the enormous storage capacity and computational power of today's computers. Visual data mining applies human visual perception to the exploration of large data sets. Presenting data in an interactive, graphical form often fosters new insights, encouraging the formation and validation of new hypotheses, which leads to better problem-solving and deeper domain knowledge. In this paper we give a short overview of visual data mining techniques, especially for analyzing geo-spatial data. We provide examples of effective visualizations of geo-spatial data in important application areas such as consumer analysis and census demographics.

Introduction

Progress in technology allows computer systems to store and exchange data sets that were, until recently, considered extraordinarily vast. Nowadays, almost all transactions of everyday life, such as purchases made with credit cards, web pages visited, or telephone calls made, are recorded by computers. This data is collected for its potential value in providing a competitive advantage to its holders. Government agencies also provide a wealth of statistical information that can be applied to problems in public health and safety, and combined with proprietary data to increase its value.

Data mining is the extraction of interesting patterns or models from observed data. Finding valuable details that reveal fine structures hidden in these already large and ever-growing data sets is difficult. With current data management systems, it is only possible to view very small portions of such data directly. Without a practical way to explore the full volume of data that was collected for its potential value, the data becomes useless and databases become ‘data dumps’.

A positive trend is that visual feedback plays an increasing role in data mining. Presenting data in an interactive, graphical form often fosters new insights and encourages the formation and validation of new hypotheses, leading to better problem-solving and deeper domain knowledge. Typically, a data analyst first specifies some parameters to restrict the search space, then runs a data mining algorithm to extract potentially interesting patterns, and examines the results graphically. For data mining to be effective, it is important to include humans in the data exploration process, combining their flexibility, creativity, and domain knowledge with the storage capacity and computational power of current computer systems. Visual data exploration thus aims to involve humans closely in data exploration, applying their perceptual abilities to the problem. Visual data mining techniques have proven to be essential in exploratory data analysis, and have high potential for the discovery of interesting patterns in very large databases.

There are many ways to approach data mining problems, including creating statistical models, clustering, and finding association rules, but in practice, when data with geographic attributes is involved, it is often important to find relationships involving location. Consider, for example: credit card purchase transactions that include the addresses of both the place of purchase and the purchaser; telephone records that include addresses or cell-phone base-station locations; satellite remote-sensing data; census and other government statistics with addresses or other geographic indexes for residents; or records of property ownership based on physical locations. Discovering spatial patterns is often crucial for understanding these data sets. Spatial data mining is the branch of data mining that deals with this problem.

Section snippets

Visual data mining

Visual data exploration often follows a three-step process, which Shneiderman called the Information Seeking Mantra [1]: overview first, zoom and filter, then details-on-demand. In other words, in the exploratory data analysis (EDA) of a data set, an analyst first obtains an overview. This may reveal potentially interesting patterns or certain subsets of the data that deserve further investigation. The analyst then focuses on one or more of these, inspecting the details of the data.
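The three steps of the mantra can be illustrated with a minimal Python sketch over a list of geo-referenced records. It is not taken from the paper: the `Record` type, its `value` attribute, the one-degree grid used for the overview, and the function names `overview`, `zoom_and_filter` and `details_on_demand` are hypothetical choices made only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical geo-referenced record: a location plus one attribute."""
    lon: float
    lat: float
    value: float  # e.g. household income; purely illustrative


def overview(records: list[Record], cell_deg: float = 1.0) -> Counter:
    """Overview first: count records per coarse lon/lat grid cell."""
    return Counter((int(r.lon // cell_deg), int(r.lat // cell_deg)) for r in records)


def zoom_and_filter(records: list[Record], lon_min: float, lon_max: float,
                    lat_min: float, lat_max: float,
                    min_value: float = float("-inf")) -> list[Record]:
    """Zoom and filter: keep records inside a bounding box that pass a value threshold."""
    return [r for r in records
            if lon_min <= r.lon <= lon_max
            and lat_min <= r.lat <= lat_max
            and r.value >= min_value]


def details_on_demand(records: list[Record], lon: float, lat: float,
                      radius: float) -> list[Record]:
    """Details-on-demand: return the raw records within `radius` of a selected location."""
    return [r for r in records
            if (r.lon - lon) ** 2 + (r.lat - lat) ** 2 <= radius ** 2]
```

In an interactive tool each step would drive a view; here each step simply narrows the set of records the analyst looks at, from aggregated counts to a filtered subset to individual records.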

Spatial data mining

Spatial data describes objects or phenomena with specific real-world locations. Large spatial data sets occur naturally when accumulating many samples or readings of phenomena in the real world while moving through two dimensions in space. Spatial data mining methods can be applied to understand spatial phenomena and to discover relationships between spatial and non-spatial data.

A very common approach to analyzing geo-spatial data has been to apply standard statistical analysis methods.

PixelMaps

This paper describes PixelMaps, a new way of displaying dense point sets on maps, which combines clustering and visualization. PixelMaps are novel in several ways: First, they provide a new tool for exploratory data analysis with large point sets on maps, and thus augment the flexibility, creativity, and domain knowledge of human data analysts. Second, they combine advanced clustering algorithms with pixel-oriented visualization, and thus exploit the computational and graphics capabilities of

PixelMap algorithm—an efficient implementation

In this section, we present PixelMap, an efficient algorithm that combines the advantages of grid files and quadtrees into a new data structure. This data structure approximates the kernel density functions to enable the placement of data points at unique positions on the output map, as previously described. The combination supports recursive partitioning of Euclidean 2D space, with automatic smoothing depending on the (x, y) density, and an array-based 3D density estimation.
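The recursive, density-adaptive partitioning of 2D space mentioned above can be illustrated with a plain point-region quadtree. This is a minimal sketch under stated assumptions, not the PixelMap data structure itself: it omits the grid-file component, the kernel density approximation and the smoothing, and the `Cell` type and the `capacity` and `max_depth` parameters are hypothetical.

```python
from dataclasses import dataclass, field

Point = tuple[float, float]


@dataclass
class Cell:
    """Hypothetical square cell with lower-left corner (x0, y0) and side length `size`."""
    x0: float
    y0: float
    size: float
    points: list[Point] = field(default_factory=list)
    children: list["Cell"] = field(default_factory=list)


def build(cell: Cell, capacity: int = 4, max_depth: int = 8, depth: int = 0) -> Cell:
    """Recursively split a cell into four quadrants while it holds too many points."""
    if len(cell.points) <= capacity or depth >= max_depth:
        return cell  # sparse enough (or depth limit hit): keep as a leaf
    half = cell.size / 2
    # Child index = 2 * iy + ix, where ix/iy say whether a point is in the right/upper half.
    quads = [Cell(cell.x0 + dx * half, cell.y0 + dy * half, half)
             for dy in (0, 1) for dx in (0, 1)]
    for x, y in cell.points:
        ix = 1 if x >= cell.x0 + half else 0
        iy = 1 if y >= cell.y0 + half else 0
        quads[2 * iy + ix].points.append((x, y))
    cell.points = []  # points now live in the children
    cell.children = [build(q, capacity, max_depth, depth + 1) for q in quads]
    return cell


def leaves(cell: Cell) -> list[Cell]:
    """Collect all leaf cells of the tree."""
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]
```

Dense regions end up covered by many small cells and sparse regions by a few large ones; `max_depth` guards against endless splitting when many points share the same coordinates.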

Evaluation of the defined optimization goals

In this section, to evaluate and judge the quality of the computed PixelMaps, we compare our PixelMap approach with a genetic multi-objective optimization algorithm, using absolute and relative position preservation (constraints 1 and 2) and clustering effectiveness (constraint 3) as defined in Section 4.2. We implemented a genetic multi-objective optimization algorithm for generating PixelMaps that likewise attempts to optimize the goals presented in Section 4.2 (for further
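The paper's precise definitions of absolute and relative position preservation are given in Section 4.2, which is not part of this preview. As a rough illustration only, the sketch below uses two hypothetical stand-in measures: the mean Euclidean displacement of each point from its original position, and the fraction of point pairs whose left/right and above/below relationship survives placement. Neither should be read as the authors' constraints 1 and 2.

```python
import math
from itertools import combinations

Point = tuple[float, float]


def absolute_error(original: list[Point], placed: list[Point]) -> float:
    """Hypothetical stand-in: mean Euclidean displacement caused by placement."""
    return sum(math.dist(p, q) for p, q in zip(original, placed)) / len(original)


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def relative_preservation(original: list[Point], placed: list[Point]) -> float:
    """Hypothetical stand-in: share of point pairs whose x- and y-order survive.

    An axis on which two original points coincide imposes no constraint,
    because coincident points have to be separated anyway.
    """
    pairs = list(combinations(range(len(original)), 2))
    if not pairs:
        return 1.0
    kept = 0
    for i, j in pairs:
        ok = True
        for axis in (0, 1):
            d0 = original[j][axis] - original[i][axis]
            d1 = placed[j][axis] - placed[i][axis]
            if d0 != 0 and _sign(d0) != _sign(d1):
                ok = False  # the pair's order flipped (or collapsed) on this axis
        kept += ok
    return kept / len(pairs)
```

Lower `absolute_error` and higher `relative_preservation` are better; a layout that only shifts points slightly without reordering them scores well on both.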

Visual evaluation and applications

Formal effectiveness measures, such as the absolute and relative position preservation and clustering errors considered above, are of limited value if they do not correspond to useful visualizations. In this section, we provide a visual comparison of the PixelMap technique with traditional approaches, which generally confirms the mathematical criteria measured above.

Our first comparison (see Fig. 14) is a map of the United States showing the U.S. Year 2000 Median Household Income Data. The left

Conclusions

We presented PixelMap, a novel pixel-based visual data mining technique that combines kernel-density-based clustering with visualization, with an efficient approximation for displaying large spatially referenced data sets. PixelMap avoids the problem of losing information because of overplotting data points. More precisely, it assigns each data point to a unique pixel in the 2D display space, and tries to achieve a good trade-off between spatial locality (absolute and relative position
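The core idea of giving every data point its own pixel can be illustrated with a deliberately simple greedy baseline. It is an assumption-laden sketch, not PixelMap: it ignores kernel-density clustering and the trade-off described above, and simply moves each colliding point to a free pixel on the smallest square ring (Chebyshev radius) around its intended pixel that still has one. The helper names `to_pixel` and `place_unique` are hypothetical.

```python
Point = tuple[float, float]
Pixel = tuple[int, int]
Bounds = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def to_pixel(p: Point, bounds: Bounds, w: int, h: int) -> Pixel:
    """Scale a data coordinate into an integer pixel inside a w x h canvas."""
    x0, y0, x1, y1 = bounds
    px = int((p[0] - x0) / (x1 - x0) * w)
    py = int((p[1] - y0) / (y1 - y0) * h)
    # Clamp so points on (or slightly beyond) the bounds stay on screen.
    return max(0, min(w - 1, px)), max(0, min(h - 1, py))


def place_unique(points: list[Point], bounds: Bounds, w: int, h: int) -> list[Pixel]:
    """Greedily assign every point a distinct pixel (hypothetical baseline)."""
    if len(points) > w * h:
        raise ValueError("more points than pixels")
    taken: set[Pixel] = set()
    result: list[Pixel] = []
    for p in points:
        tx, ty = to_pixel(p, bounds, w, h)
        r = 0
        while True:
            # Free in-canvas pixels on the square ring at Chebyshev distance r.
            ring = [(tx + dx, ty + dy)
                    for dx in range(-r, r + 1)
                    for dy in range(-r, r + 1)
                    if max(abs(dx), abs(dy)) == r
                    and 0 <= tx + dx < w and 0 <= ty + dy < h
                    and (tx + dx, ty + dy) not in taken]
            if ring:
                # Among pixels on the same ring, prefer the Euclidean-closest one.
                best = min(ring, key=lambda q: (q[0] - tx) ** 2 + (q[1] - ty) ** 2)
                break
            r += 1
        taken.add(best)
        result.append(best)
    return result
```

Without the collision step, all five points in `place_unique([(0.5, 0.5)] * 5, (0, 0, 1, 1), 4, 4)` would land on pixel (2, 2) and four of them would be hidden by overplotting; with it, they occupy (2, 2) and four neighbouring pixels.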

Acknowledgements

We thank Carmen Sanz Merino and Hartmut Ziegler for their great support. We also thank Dave Belanger and Mike Wish for encouraging this investigation.


References (27)

  • Shneiderman B. The eyes have it: a task by data type taxonomy for information visualizations. In: Proceedings of the...
  • Card S, et al. Readings in information visualization. 1999.
  • Schumann H, et al. Visualisierung: Grundlagen und allgemeine Methoden. 2000.
  • Spence B. Information visualization, UK: Pearson Education Higher Education Publishers;...
  • Ware C. Information visualization: perception for design. 2000.
  • Geisler G. Making information more accessible: a survey of information visualization applications and techniques,...
  • Fotheringham AS, et al. Spatial analysis and GIS. 1994.
  • Han J, et al. Data mining: concepts and techniques. 2001.
  • Hand DJ, et al. Principles of data mining. 2001.
  • Koperski K, Adhikary J, Han J. Spatial data mining: progress and challenges. In: Research issues on data mining and...
  • Koperski K, Han J, Adhikary J. Mining knowledge in geographical data. Communications of the ACM...
  • Keim DA, North SC, Panse C, Sips M. Pixelmaps: a new visual data mining approach for analyzing large spatial data sets....
  • Advizor Solutions Inc., Visual Insight In3D, http://www.advizorsolutions.com/, February...
Daniel A. Keim received the Ph.D. degree in Computer Science from the University of Munich in 1994. He is working in the area of information visualization and data mining. In the field of information visualization, he developed several novel techniques which use visualization technology to explore large databases. He has published on information visualization and data mining; he has given tutorials on related issues at several large conferences including Visualization, SIGMOD, VLDB, and KDD; he was program co-chair of the IEEE Information Visualization Symposia in 1999 and 2000 and General Symposium Chair in 2003; he was Program Co-chair of the ACM SIGKDD conference in 2002; and he is an Editor of the IEEE Transactions on Visualization and Computer Graphics, IEEE Knowledge and Data Engineering and the Palgrave Information Visualization Journal. He has been an Assistant Professor in the Computer Science Department of the University of Munich, an Associate Professor in the Computer Science Department of the Martin-Luther-University Halle, and he is a full Professor in the Computer Science Department of the University of Constance.

Stephen C. North received the Ph.D. degree in Computer Science from Princeton University in 1986. He is head of Information Visualization Research at AT&T Labs, a group that studies novel interactive displays and high performance graphics for network visualization in the AT&T Infolab. His background is in software visualization, applied computational geometry, and the design of reusable software. He is one of the authors of graphviz, a widely used collection of open source programs for drawing and interacting with graph layouts. His other current technical interests include dynamic and large-scale graph layout, and spatial data transformation. He is a senior member of the IEEE and a member of the ACM.

Christian Panse received the Master's Degree from the Martin-Luther-University Halle-Wittenberg, Germany, in 2001. He is currently pursuing the Ph.D. degree in the Data Mining and Visualization Group at the University of Constance, Germany. His research interests include visual data mining on large spatial data and cartogram drawing. He is a member of the IEEE Computer Society.

Mike Sips received the Master's Degree from the Martin-Luther-University Halle-Wittenberg, Germany, in 2001. He is currently pursuing the Ph.D. degree in the Data Mining and Visualization Group at the University of Constance, Germany. His research interests include visual data mining on large spatial data and spatial data transformation as well as information visualization and advanced visual interfaces. He is a member of the IEEE Computer Society, ACM, and GI (the German Society for Informatics).


    Copyright © 2004 Elsevier Ltd. All rights reserved.
