PySpark - Sparkling Water - Module 6.2

Installing H2O on WSL Ubuntu 22.04

In this guide, I’ll walk you through the steps to install H2O on a WSL (Windows Subsystem for Linux) Ubuntu 22.04 environment. We assume you already have Java 11 and Jupyter Notebook installed and running. By the end, you’ll be able to access the H2O web UI directly from your Windows machine.

Prerequisites

  • WSL running Ubuntu 22.04.
  • Java 11 installed (required for H2O).
  • Jupyter Notebook already installed and running.

Step 1: Install H2O on Ubuntu

First, we need to install the H2O Python module, which will allow us to run H2O directly within our Jupyter Notebook and also access the H2O web interface. Follow these steps:

  1. Update your system packages:

    sudo apt update && sudo apt upgrade -y
    
  2. Install Python dependencies: Ensure pip is installed and up-to-date:

    sudo apt install python3-pip -y
    pip3 install --upgrade pip
    
  3. Install the H2O Python library: Use the following command to install the H2O Python library from PyPI:

    pip3 install h2o
    
  4. Verify the installation: To verify that H2O was successfully installed, open Python in your terminal and run the following commands:

    import h2o
    h2o.init()
    

    If the installation was successful, you should see a message indicating that H2O is running locally.
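If `h2o.init()` fails, it can help to first confirm that both prerequisites are visible to the Python interpreter you are using. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` is made up for this example. It only checks that a `java` executable is on the PATH and that the `h2o` package can be found. It does not verify the Java version.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether Java and the h2o package are visible to this Python."""
    return {
        # shutil.which returns the executable's path, or None if it is not on PATH
        "java": shutil.which("java") is not None,
        # find_spec returns None when the module cannot be located
        "h2o": importlib.util.find_spec("h2o") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `h2o` is reported as missing even though `pip3 install h2o` succeeded, a likely cause is that pip installed it for a different Python interpreter than the one running this check.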

Step 2: Configuring H2O Web UI for Remote Access

Now, let’s configure the H2O web interface to be accessible from your Windows machine. H2O’s web interface runs on a specific port (default is 54321), and we’ll expose this port so that you can access it via a browser on Windows.

  1. Start H2O: Open Python in your WSL terminal, and start H2O using:

    import h2o
    h2o.init(bind_to_localhost=False, port=54321)
    
    • bind_to_localhost=False: This ensures that H2O listens on all available interfaces.
    • port=54321: This specifies the port on which H2O will run.
  2. Check WSL Network Settings: To find the IP address of your WSL instance, run:

    hostname -I
    

    This will return the IP address (e.g., 172.22.66.1) that we will use to access H2O from your Windows browser.
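`hostname -I` can print more than one address (for example an IPv4 and an IPv6 address) separated by spaces. As a small sketch, the helpers below pick the first IPv4 address from that output and build the URL to open in the browser. The function names `first_ipv4` and `h2o_url` are hypothetical and exist only for this example.

```python
import ipaddress
import subprocess


def first_ipv4(hostname_output):
    """Return the first IPv4 address in the output of `hostname -I`, or None."""
    for token in hostname_output.split():
        try:
            if ipaddress.ip_address(token).version == 4:
                return token
        except ValueError:
            # Skip anything that is not a valid IP address
            continue
    return None


def h2o_url(ip, port=54321):
    """Build the URL of the H2O web UI for the given address and port."""
    return f"http://{ip}:{port}"


if __name__ == "__main__":
    # Run inside WSL, where the `hostname -I` command is available
    output = subprocess.run(
        ["hostname", "-I"], capture_output=True, text=True, check=True
    ).stdout
    ip = first_ipv4(output)
    print(h2o_url(ip) if ip else "No IPv4 address found")
```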

Step 3: Accessing H2O Web UI from Windows

With H2O running, you can now access the H2O web interface from your Windows machine.

  1. Open a browser on Windows (Chrome, Firefox, etc.).
  2. Enter the following URL in your browser’s address bar, replacing <WSL_IP> with the IP address you retrieved earlier:
    http://<WSL_IP>:54321
    

For example, if your WSL IP is 172.22.66.1, you would enter:

http://172.22.66.1:54321

You should now see the H2O web interface, where you can manage models, datasets, and much more. Take some time to familiarize yourself with it.
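If the page does not load, one way to narrow things down is to check whether anything is accepting TCP connections on the H2O port. The sketch below uses only the standard library `socket` module; `port_is_open` is a name chosen for this example, and the address in the usage section is the example IP from this guide, so replace it with your own. It only tells you whether the port is reachable, not whether the service behind it is H2O.

```python
import socket


def port_is_open(host, port=54321, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    host = "172.22.66.1"  # example address; use the output of `hostname -I`
    state = "reachable" if port_is_open(host) else "not reachable"
    print(f"{host}:54321 is {state}")
```

Running this inside WSL and then again from Windows can show whether H2O itself is not running or whether the port is simply not reachable from the Windows side.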

You might also want to explore H2O Wave for building AI web applications. If you are interested in building ML web applications, the Wave Git repository is a good place to start.

Step 4: Running H2O in Jupyter Notebooks

Now that H2O is installed and accessible via the web interface, you can also run it in your Jupyter Notebook.

  1. Launch Jupyter Notebook from your WSL terminal:

    jupyter notebook
    
  2. In a new notebook, use the following code to initialize H2O:

    import h2o
    h2o.init()

This will initialize H2O and you can start working with it within your Jupyter environment.
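As a quick smoke test in the notebook, you could build a tiny H2OFrame and print its summary. This is only a sketch: the function name `h2o_smoke_test` and the column names and values are made up for illustration, and it assumes H2O and Java are installed as described above.

```python
def h2o_smoke_test():
    """Start (or connect to) H2O, build a tiny frame, and return its dimensions."""
    # Imported here so the function can be defined even before h2o is installed
    import h2o

    h2o.init()  # starts a local cluster, or connects to one already running

    # Illustrative data: each dictionary key becomes a column
    frame = h2o.H2OFrame({"x": [1, 2, 3], "y": [2, 4, 6]})
    frame.describe()  # prints a per-column summary
    return frame.dim  # [number of rows, number of columns]


if __name__ == "__main__":
    print(h2o_smoke_test())
```

In a notebook cell, call `h2o_smoke_test()` directly. The frame should also be listed in the H2O web interface while the cluster is running.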
