Apache Hive

Choose Index below for a list of all words and phrases defined in this glossary.

Apache Hive - Apache Hive is an open-source data warehouse system for querying and analyzing large datasets stored in Hadoop files. Hadoop is a framework for handling large datasets in a distributed computing environment.

Hive has three main functions: data summarization, query and analysis. It supports queries expressed in a language called HiveQL, which automatically translates SQL-like queries into MapReduce jobs executed on Hadoop. In addition, HiveQL supports custom MapReduce scripts to be plugged into queries. Hive also enables data serialization/deserialization and increases flexibility in schema design by including a system catalog called Hive-Metastore.

According to the Apache Hive wiki, "Hive is not designed for OLTP workloads and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of append-only data (like web logs)."

Hive supports text files (also called flat files), SequenceFiles (flat files consisting of binary key/value pairs) and RCFiles (Record Columnar Files which store columns of a table in a columnar database way.)

Related glossary terms: In-Memory Data Grid (IMDG), Apache HBase

[Category=Data Management ]

Source: WhatIs.com, 05 July 2013 09:13:37, http://whatis.techtarget.com/glossary/Data-and-Data-Management

These advertisers support this free service

Hive - A SQL-like query and data warehouse engine.

[Category=Big Data ]

Source: DataInformed, 31 October 2013 09:06:30, http://data-informed.com/glossary-of-big-data-terms/

Apache Hive

Apache Hive - definitions