Introduction to Data Warehouse
Currently, there are two categories under Data Warehouse:
- Datasets: All data except model-related content can be placed here
- Models: Used to store model files, code used in conjunction with model files, etc.
Creating a Data Warehouse
The two types of data warehouses have separate entry points for creation.
Creating a Dataset
Click the "+" button to the right of the "Datasets" entry to create a dataset.
Creating a Model
Click the "+" button to the right of the "Models" entry to create a model.
Because both are data warehouses, the same name cannot be used under both "Models" and "Datasets".
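The shared-namespace rule can be pictured with a small sketch. Everything in it (`WarehouseRegistry`, `create`, `kind_of`) is a hypothetical model for illustration, not the platform's API, and it assumes names are compared exactly.

```python
class WarehouseRegistry:
    """Hypothetical in-memory model of one user's data warehouses."""

    VALID_KINDS = {"dataset", "model"}

    def __init__(self):
        # A single name -> kind mapping: datasets and models share one namespace.
        self._kinds = {}

    def create(self, name, kind):
        if kind not in self.VALID_KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        if name in self._kinds:
            # The name is taken, whether by a dataset or by a model.
            raise ValueError(f"{name!r} already exists as a {self._kinds[name]}")
        self._kinds[name] = kind

    def kind_of(self, name):
        return self._kinds[name]


registry = WarehouseRegistry()
registry.create("mnist", "dataset")
try:
    registry.create("mnist", "model")  # same name under the other category
    conflict = False
except ValueError:
    conflict = True
```

Keeping one mapping for both categories, rather than one per category, is what makes the second `create` call fail.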
Switching Data Warehouse Types
On the "Settings" page, you can switch the type of data warehouse:
Select the "Data Type" and switch it to either Model or Dataset.
Copying Between Data Warehouses
To make dataset management easier, you can create a data warehouse version not only from a working directory but also from a subdirectory of an existing data warehouse:
In a directory of a data warehouse version, click "Copy the Current Directory to Data Warehouse", then select the target and choose one of "Add to Existing Data Warehouse", "Create Dataset", or "Create Model".
- "Add to Existing Data Warehouse" will add the current data warehouse's subdirectory to the selected existing dataset.
- "Create Dataset" will create a new dataset version in the target dataset from the current data directory.
- "Create Model" will create a new model version in the target model from the current data directory.
While copying or creation is in progress, the new version has the status "Copying data". Once copying is complete, the version is marked as "Processed" and is ready for use.
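If you script around this workflow, the status transition suggests a simple wait loop. The following is a minimal sketch, assuming a caller-supplied `get_status` function that returns the status string shown in the UI. That function is hypothetical, since this document does not describe an API for reading the status.

```python
import time


def wait_until_processed(get_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll until the version reports "Processed"; raise TimeoutError otherwise.

    get_status is a hypothetical callable returning the current status string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status == "Processed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Usage with a stand-in status source that finishes on the third check.
_fake_statuses = iter(["Copying data", "Copying data", "Processed"])
result = wait_until_processed(lambda: next(_fake_statuses), poll_seconds=0.0)
```

The timeout keeps a stalled copy from blocking the caller indefinitely. Suitable values for `poll_seconds` and `timeout_seconds` depend on the amount of data being copied.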
Adding a README.md File to a Data Warehouse
Each data warehouse version can include a file named README.md that describes that version. The file is displayed on the version's page.
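The README.md can be generated together with the data before the directory becomes a version. This is a minimal sketch: the helper name `write_readme`, the title, and the description text are illustrative, and it assumes README.md sits at the top level of the version's directory, which this document does not state explicitly.

```python
from pathlib import Path
import tempfile


def write_readme(version_dir, title, description):
    """Write a README.md into version_dir and return its path."""
    readme = Path(version_dir) / "README.md"
    readme.write_text(f"# {title}\n\n{description}\n", encoding="utf-8")
    return readme


# Demonstration in a temporary directory standing in for the version's root.
with tempfile.TemporaryDirectory() as tmp:
    path = write_readme(tmp, "Example dataset", "Images resized to 224x224.")
    readme_name = path.name
    readme_text = path.read_text(encoding="utf-8")
```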
Making a Data Warehouse Public
Newly created data warehouses have "Personal Visibility" by default. On the data warehouse's "Settings" page, you can set the entire data warehouse as a "Public Resource". Once it is public, all registered users can access it through its URL.
The number of "Public Datasets" each person can create is limited. This limit can be viewed in "Quota Usage" - "Quota Limits" - "Public Datasets".