Duration: 4 Days
Apache HBase is a distributed, scalable NoSQL database built on Apache Hadoop. HBase can store data in massive tables of billions of rows and millions of columns, and it gives many users and applications fast, random read/write access to that data in real time.
What You Will Learn
- Use cases for HBase, Hadoop, and RDBMSs
- Using the HBase shell to directly manipulate HBase tables
- Designing optimal HBase schemas for efficient data storage and retrieval
- Connecting to HBase using the Java API, and configuring and administering an HBase cluster
- Best practices for identifying and resolving performance bottlenecks
Audience
This course is appropriate for developers and administrators who intend to use HBase. Prior experience with databases and data modeling is helpful, but not required. Prior knowledge of Java is helpful. Prior knowledge of Hadoop is not required, but Cloudera Developer Training for Apache Hadoop provides an excellent foundation for this course.
Prerequisites
- Familiarity with Hadoop's architecture and APIs
- Experience writing basic applications
- Prior programming experience, preferably Java
- Experience with databases and data modeling is helpful, but it is not required
- Cloudera Developer Training for Apache Hadoop provides an excellent foundation for this course.
Course Outline
1. Introduction to Hadoop
- What Is Big Data?
- Hadoop
- Hadoop Components
2. Introduction to HBase
- What Is HBase?
- Why Use HBase?
- HBase and RDBMS
- The Give and Take of HBase
3. HBase Concepts
4. The HBase Administration API
- HBase Shell
- Creating Tables
- HBase Java API
- Administration Calls
5. Accessing Data with the HBase API
- API Usage
- Getting Data from the Shell, Java API, and Thrift API
- Adding and Updating Data in the Shell
- Deleting Data from the Shell, Java API, and Thrift API
- Adding and Updating Data with the API
- The Scan API
- Advanced API
- Working with Eclipse
6. HBase Architecture
- Cluster Components
- How HBase Scales
- HBase Write Paths
- HBase Read Paths
- Compactions and Splits
7. Installation and Configuration
- HBase Installation
- Hardware Considerations
- HBase Configuration
- MapReduce and HBase Clusters
- Replication and Disaster Recovery
8. Row Key Design in HBase
- From RDBMS to HBase Schema Design
- Application-Centric Design
- Row Key Design
9. Schema Design in HBase
- Column Families
- Schema Design Considerations
- Hotspotting
10. The HBase Ecosystem
- OpenTSDB
- Kiji
- HBase and Hive
Course Labs