Hadoop GIS: a high performance spatial data warehousing system over mapreduce

Authors:
Ablimit Aji, Department of Mathematics and Computer Science, Emory University
Fusheng Wang, Department of Biomedical Informatics, Emory University
Hoang Vo, Department of Mathematics and Computer Science, Emory University
Rubao Lee, Department of Computer Science and Engineering, The Ohio State University
Qiaoling Liu, Department of Mathematics and Computer Science, Emory University
Xiaodong Zhang, Department of Computer Science and Engineering, The Ohio State University
Joel Saltz, Department of Biomedical Informatics, Emory University

Proceedings of the VLDB Endowment, Volume 6, Issue 11, pp. 1009–1020. https://doi.org/10.14778/2536222.2536227

Published: 01 August 2013

Abstract

Support for high performance queries on large volumes of spatial data is becoming increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, the development of high resolution imaging technologies, and contributions from a large number of community users. There are two major challenges in managing and querying massive spatial data: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS, a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, a customizable spatial query engine (RESQUE), implicit parallel spatial query execution on MapReduce, and effective methods for amending query results by handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries within an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS in query response and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMS and that it outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries, and as an integrated software package in Hive.
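
The partition-then-amend idea in the abstract can be illustrated with a small single-process sketch. It is not the Hadoop-GIS or RESQUE implementation: it assumes a fixed square grid, axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)`, and a nested-loop join inside each tile, and the names `tiles_for`, `intersects`, and `partitioned_join` are hypothetical. Objects that cross tile boundaries are assigned to every tile they touch, so the same matching pair can be found in several tiles; the final set removes those duplicates.

```python
from collections import defaultdict
from itertools import product


def tiles_for(rect, tile_size):
    """Return the (col, row) ids of every grid tile the rectangle overlaps."""
    xmin, ymin, xmax, ymax = rect
    cols = range(int(xmin // tile_size), int(xmax // tile_size) + 1)
    rows = range(int(ymin // tile_size), int(ymax // tile_size) + 1)
    return list(product(cols, rows))


def intersects(a, b):
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partitioned_join(left, right, tile_size):
    """Intersection join of two rectangle lists via grid partitioning."""
    # "Map" step: assign each object to all tiles it touches, so boundary
    # objects are replicated into several tiles.
    buckets = defaultdict(lambda: ([], []))
    for i, rect in enumerate(left):
        for tile in tiles_for(rect, tile_size):
            buckets[tile][0].append(i)
    for j, rect in enumerate(right):
        for tile in tiles_for(rect, tile_size):
            buckets[tile][1].append(j)

    # "Reduce" step: each tile is joined independently of the others.
    results = set()
    for left_ids, right_ids in buckets.values():
        for i in left_ids:
            for j in right_ids:
                if intersects(left[i], right[j]):
                    # The set discards pairs already found in another tile.
                    results.add((i, j))
    return sorted(results)


left = [(0, 0, 3, 3), (9, 9, 12, 12)]
right = [(2, 2, 11, 11), (20, 20, 21, 21)]
print(partitioned_join(left, right, tile_size=10))
```

In this toy input, the second left rectangle and the first right rectangle both straddle four tiles, so their match is discovered four times and reported once. A real deployment would run the per-tile joins as separate MapReduce tasks and would typically use a local spatial index rather than a nested loop.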

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
  2. Information systems applications
    1. Spatial-temporal systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.

Published in

Proceedings of the VLDB Endowment, Volume 6, Issue 11
August 2013
237 pages
ISSN: 2150-8097
Publisher: VLDB Endowment

Article Metrics
- Total citations: 142
- Total downloads: 2,118
- Downloads (last 12 months): 67
- Downloads (last 6 weeks): 6