
Introduction to Gear

Introduction to the usage of HyperAI Gear

Execution Model of Gear

A Gear, also referred to as a computing container, functions as a versatile computing unit capable of handling a wide array of tasks, including data preprocessing, machine learning model training, and inference on unlabeled data using existing models. A Gear comprises the following key elements:

  1. Basic hardware configuration, currently including four major elements: CPU, GPU, memory, and disk, specified through [compute type](/docs/getting-started/concepts#Computing Resource).
  2. Basic runtime environment, mainly selecting the desired deep learning framework and its supporting dependencies, specified through image. For specific dependency lists, refer to Runtime Environment.
  3. Required code and data, provided by binding data, binding the "working directory" of other container executions, or directly uploading code.

Each execution of a container allocates fresh storage and saves the data written during that execution, so executions under a container are independent of one another. Combined with tools such as custom parameters, this can, when used properly, greatly improve the reproducibility of machine learning experiments. Without a good understanding of these concepts, however, data may end up being copied back and forth between executions, which both slows down container startup and greatly increases unnecessary storage overhead.

Container Creation

Containers currently support two access methods: "Task" and "Workspace". The default working directory for both is the /output directory in the system (a soft link is also set at /hyperai/home, meaning /hyperai/home and /output are the same directory).

A container can create multiple "executions", each of which runs as an independent container with its own compute configuration and image. After each "execution" is closed, the contents of its working directory /hyperai/home are saved and can be viewed through the "Working Directory" tab on the page.

Note

When a container executes, its working directory is /hyperai/home, so references to bound data repositories (/hyperai/input/input0 through /hyperai/input/input4) must use absolute paths, while uploaded code can be executed using relative paths.

For example, when creating a "Task", suppose you upload a file named train.py that needs to read data from the /hyperai/input/input0 directory. During execution, that data must be referenced by its absolute path, /hyperai/input/input0/<filename>. The uploaded script itself, however, can be run with a relative path: python train.py rather than python /hyperai/home/train.py.
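The path convention above can be sketched in a short script. The file name `train.csv` below is purely illustrative; the `/hyperai/input/input0` and `/hyperai/home` paths are the platform conventions described above:

```python
import os

# Assumed HyperAI path conventions: bound data repositories appear under
# /hyperai/input/input0 .. input4 (read-only), while the working directory
# is /hyperai/home (equivalent to /output).
DATA_ROOT = "/hyperai/input/input0"

def data_path(filename):
    """Return the absolute path to a file in the bound data repository."""
    return os.path.join(DATA_ROOT, filename)

# Bound data must be referenced by absolute path:
train_csv = data_path("train.csv")
print(train_csv)  # /hyperai/input/input0/train.csv

# The uploaded script itself can be run with a relative path, because the
# container starts inside the working directory:
#   python train.py        # equivalent to: python /hyperai/home/train.py
```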

Data Binding

See Container Data Binding.

Workspace

Workspace is an interactive runtime environment built on JupyterLab. The first programming language it supports is Python, which has become the default working language for many data scientists. By accessing computing containers through Jupyter Workspace, you can use computing resources just as you would in any other environment.

Workspace supports two environments, Notebook and Lab, with Lab as the default. If you are not yet familiar with Jupyter Workspace, refer to its documentation. Rather than exhaustively covering Jupyter Workspace, this section highlights several features specific to using it under HyperAI.

For more information, see Jupyter Workspace.

Continue Execution

For containers whose access method is set to "Task", HyperAI provides a "Continue Execution" option to make it easy to create new "executions" based on execution history.

HyperAI will do the following for us:

  1. Bind the data repositories that were bound in the previous "execution" to the same locations
  2. Bind the same code as well
  3. Bind the "working directory" of the previous "execution" to the /hyperai/home directory

Note

In HyperAI, you can bind one computing container's "working directory" to a new container to achieve a "pipeline" effect; for example, the "working directory" of a "model training" execution can serve as input for a "model inference" task. Note, however, that this usage pattern copies the entire "working directory" of the previous execution into the new container, doubling the storage used. Therefore, if you don't need to write to the previous execution's "working directory", it is recommended to bind it to one of the "input0-4" directories instead, which links the data into the new container read-only without generating additional storage usage.

To delete a specific execution, go to the Execution History page, click the “···” at the bottom-right corner of that execution, and select “Delete This Execution”.

Scenarios Where Code is Modified After Selecting Continue Execution

"Continue Execution" is intended to facilitate users continuing previous training with unchanged code. Special attention is needed if code is updated in a "Continue Execution" scenario.

After clicking "Continue Execution", uploading new code may conflict with the code in the bound "working directory from the previous execution". For example, suppose the previous execution uploaded a file named main.py, which was saved to that execution's "working directory". If you now upload a modified file also named main.py, HyperAI will ignore the modification and keep the existing file.

Therefore, if you use "Continue Execution" and find that the executed code differs from what you expect, the uploaded code may have been shadowed by the working directory bound from the previous execution. To avoid this, modify the binding directory of the default "Working Directory from Last Execution".

How to Accelerate Container Startup

Saving a large number of files in the container's working directory (/hyperai/home or /output, which are equivalent) slows down container startup; copying many small files in particular can be very time-consuming. When the container starts and enters the data-copying phase, its execution status changes to "Synchronizing Data" and the corresponding synchronization speed is displayed.
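As a rough way to gauge whether the working directory will slow down synchronization, you can count the files it contains. This is an illustrative diagnostic, not a HyperAI feature; the 10,000-file threshold is an arbitrary example:

```python
import os

def count_files(root):
    """Count regular files under a directory tree."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total

if __name__ == "__main__":
    # /hyperai/home is the assumed working directory inside the container.
    n = count_files("/hyperai/home")
    if n > 10_000:  # arbitrary illustrative threshold
        print(f"{n} files in the working directory; startup sync may be slow")
```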

You can instead create separate "Data Repositories" for data or models and bind them to /hyperai/input/input0 through input4 via data binding, which avoids the copying process entirely. To learn how to create a new dataset version from a container's "Working Directory", see Container Working Directory - Create Working Directory as Dataset Version.

Set Notifications

Currently, HyperAI provides email notifications, which can be managed in the “Notification Settings” section on the left.

Using Task and Workspace Modes Together

Workspace mode is suitable for interactively running and modifying files, but its computational resource usage is inefficient: resources often sit idle while the user edits and debugs. Task mode executes Python code immediately after the container starts, making efficient use of computational resources, but iterating is cumbersome because the code must be re-uploaded with each change.

Therefore, a recommended workflow is to first create a Workspace with a low-cost compute type (CPU), verify that the code runs correctly, then close the resources and download the "Working Directory". Next, create a GPU container in Task mode, upload the downloaded code, and execute the script.

Converting .ipynb Files to .py Files

Select "File" - "Export Notebook As..." - "Export Notebook to Executable Script" to download the current .ipynb file as a .py file on your local machine. Then drag it into the file browser on the left side of the Jupyter workspace to upload it back into the container.

You can see that the code cells of the .ipynb file are now concatenated and saved in the .py file.
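Since an .ipynb file is plain JSON, the effect of this export can be approximated with a few lines of standard-library Python. This is a sketch of the idea, not the actual exporter Jupyter uses:

```python
import json

def notebook_to_script(ipynb_path, py_path):
    """Concatenate the code cells of a notebook into a single .py file."""
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # A cell's source may be a list of lines or a single string.
            src = cell["source"]
            chunks.append("".join(src) if isinstance(src, list) else src)
    with open(py_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(chunks) + "\n")
```

Markdown cells are skipped, which is why only the code snippets appear in the exported script.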

Make Container Public

Containers are created as "Private Containers" by default. In the container's Settings interface, you can make the container public. For security reasons, registered users can only view a public container's executions after those executions have been closed.

Container Termination

Containers can be terminated at any stage of execution, but note that terminating a container may result in some data not being synchronized successfully. Before terminating, confirm the integrity of the current data in the container's "Working Directory" tab.

Container Deletion

After a container finishes execution, it automatically releases the computing resources it occupied. However, files saved in the working directory typically persist for later use and continue to occupy your storage quota. If you are certain the container's data is no longer needed, you can delete the entire container in its "Settings" tab. Once the container is deleted, all storage resources it occupied are released.

Danger

This operation is very dangerous. Data deleted from the container cannot be recovered!