Research Overview
Distinguished Professor Emeritus · Computer and Information Science
Curriculum Vitae (PDF) Google ScholarIn his research, Jeff Vitter exploits the rich interdependence between computing theory and practice. Beginning with his thesis on coalesced hashing, a widely used search method, Dr. Vitter has made many contributions to the design and analysis of algorithms, using mathematical analysis and asymptotics to derive precise estimates for resource requirements.
You can electronically download several of Dr. Vitter’s more recent publications via his online publication library.
External Memory Algorithms
Focuses upon reducing the I/O communication bottleneck between fast internal memory and slow external storage (such as disk), which is important for a variety of database and other data-intensive applications. His 2008 book, Algorithms and Data Structures for External Memory, is a reference for the field. His approach for utilizing parallel independent disks, using the notion of read/write duality, has led to state-of-the-art sorting methods. He has contributed to algorithm engineering via the TPIE system (Transparent Parallel I/O programming Environment) developed by a former student.
Compressed Data Structures
The goal is to operate directly upon compressed representations of data and still achieve fast response time. Dr. Vitter co-developed the wavelet tree data structure (not to confuse with wavelets discussed two paragraphs below), which is an elegant structure for coding sequences from a multicharacter alphabet; it is a key component in modern indexing and compression. Until this century, fast data structures for text indexing (such as suffix trees and suffix arrays) required much more space than the data being indexed! Based upon a recursive decomposition of the suffix array, Dr. Vitter and colleagues invented the compressed suffix array — the first fast index provably shown to require only linear space, and then later the first ever whose size per character was proven to be asymptotically equal to the higher-order entropy of the text (i.e., with constant of proportionality 1). The index can reconstruct the original text in a random access manner, and thus the original text can be discarded. The net effect is that the text can be completely replaced by an index structure that is fast to query and has small size.
Data Compression
Dr. Vitter has done fundamental work on data compression for text, images, and video. He is noted for his analytical bent. A provably efficient algorithm for adaptive Huffman coding bears his name. With a former student, Dr. Vitter developed and analyzed practical methods for arithmetic coding. They invented the FELICS algorithm for lossless image compression, later implemented in hardware as part of NASA’s Mars Reconnaissance Orbiter. It introduced a low-cost prediction framework that influenced algorithms adopted into the Lossless JPEG standard. In video compression, Dr. Vitter and group proposed the paradigm of minimizing the combined measure of rate plus distortion to significantly improve motion estimation coding; rate-distortion optimization has since been incorporated into the H.264/MPEG-4 AVC standard’s reference encoder, used widely in the computing and communications industry.
Database
Dr. Vitter and collaborators were the first in the database and systems communities to apply wavelets and compression techniques as key tools for summarizing, analyzing, approximating, and predicting data. Wavelets have since become widely used in database optimization, data warehousing, data analytics, data streams, and data mining. For his work on wavelets, he and coauthor received the 2009 ACM SIGMOD Test of Time award, which recognizes the SIGMOD paper from 10 years earlier that had the most subsequent impact in research, products, and methodology. Dr. Vitter co-developed novel machine learning and prediction mechanisms based upon data compression, using the principle that the more compressible a sequence is, the more predictable it is. His universal prediction algorithms for online prefetching are provably asymptotically optimal (i.e., with constant of proportionality 1). They predict as well as special-purpose methods tuned to the characteristics of the sequence. His learning work includes algorithms for prefetching, caching, data streams, database query optimization, data mining, and power management in mobile devices.
Research Funding
For a complete list of Dr. Vitter's funded research see his curriculum vitæ. The following is his most recent funding awards.
National Natural Science Foundation of China · "High-Order Entropy-Compressed Indexes and Retrieval for Pan-Genome" · Xidian University · PI: H. Huo, Co-PI: J. S. Vitter · 540,000 RMB
National Science Foundation · "Pattern Matching for Massive Data Sets" · University of Kansas (subcontracted from Louisiana State University) · Co-PIs: R. T. Shah, J. S. Vitter · $235,773 (part of $500,000 collaborative grant)
National Science Foundation · "Performance Models and Systems Optimization for Disk-Bound Applications" · Purdue University · Co-PIs: M. S. Thottethodi, J. S. Vitter, R. T. Shah, T. N. Vijaykumar, V. S. Pai · $889,788
National Science Foundation · "Entropy-Compressed Data Structures" · Purdue University · $255,000
Army Research Office · "External Memory Algorithms: Dealing with MASSIVE Data" · Purdue University · $107,529
IBM · "Dynamic Optimization in Databases and Information Systems" · Purdue University · $40,000
Army Research Office · "External Memory Algorithms: Dealing with MASSIVE Data" · Duke University and Purdue University · $90,985
Microsoft Research · "Interactive Research/Teaching Classroom" · Duke University · Co-PIs: J. S. Vitter, R. Lucic, L. Arge, O. L. Astrachan, J. S. Chase, C. S. Ellis, D. Ramm, S. Rodger, A. Vahdat · $1,191,470
National Science Foundation · "Algorithms for Active Storage" · Duke University · Co-PIs: J. S. Vitter, J. S. Chase · $252,000
National Science Foundation · "External Memory Algorithms: Dealing with MASSIVE Data" · Duke University and Purdue University · $290,000