CR8TOR CLI Commands Reference

The CR8TOR CLI provides commands to manage the entire data access request (DAR) lifecycle, from project initiation to data publication.

Core Workflow Commands

Initiate Project

Initializes a new CR8 project using a specified cookiecutter template.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| template_path | str | The GitHub URL or relative path to the cr8-cookiecutter template. This is prompted from the user if not provided. | required |
| push_to_github | bool | Flag to indicate if the project should be pushed to GitHub. Defaults to False. | False |
| git_org | str | The target GitHub organization name. Required if push_to_github is True. | None |
| checkout | str | The branch, tag, or commit to checkout from the cookiecutter template. | None |
| project_name | str | The name of the project to be created. If provided, cookiecutter will skip the prompt for other values. | None |
| environment | str | The target environment (DEV, TEST, PROD). Defaults to "PROD". | 'PROD' |
| cr8tor_branch | str | For development and debugging. Specifies the GitHub cr8tor branch to be used in the orchestration layer. | None |
| runner_os | str | The target runner OS for GitHub Actions workflows (Windows, Linux). Defaults to "Windows". | 'Windows' |

This command performs the following actions:
- Generates a new project by applying the specified cookiecutter template.
- Adds a timestamp to the context used by the template.
- If push_to_github is True, creates a GitHub repository under the specified organization and pushes the generated project to GitHub using the personal access token (retrieved from os.getenv("GH_TOKEN")).
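
The validation and context-building steps can be sketched in isolation. The snippet below is a minimal illustration, not part of the cr8tor API: the helper name `build_extra_context` is hypothetical, and it raises `ValueError` where the command itself raises a Typer error. The context keys and the comparison rules follow the source listing for this command.

```python
from datetime import datetime

VALID_ENVIRONMENTS = ["DEV", "TEST", "PROD"]
VALID_RUNNER_OS = ["Windows", "Linux"]


def build_extra_context(template_path, environment="PROD",
                        runner_os="Windows", cr8tor_branch=None):
    """Hypothetical helper: validate inputs and build the cookiecutter context."""
    # The environment is upper-cased before the check, so "dev" is accepted.
    if environment.upper() not in VALID_ENVIRONMENTS:
        raise ValueError(f"Invalid environment. Choose from {VALID_ENVIRONMENTS}.")
    # The runner OS is compared as given, so the exact spelling matters.
    if runner_os not in VALID_RUNNER_OS:
        raise ValueError(f"Invalid runner OS. Choose from {VALID_RUNNER_OS}.")
    return {
        "__timestamp": datetime.now().isoformat(timespec="seconds"),
        "__cr8_cc_template": template_path,
        "environment": environment.upper(),
        "__github_cr8tor_branch": cr8tor_branch,
        "runner_os": runner_os,
    }


context = build_extra_context("path-to-local-cr8-cookiecutter-dir", environment="dev")
print(context["environment"])  # DEV
```

Because the runner OS check in the listing is an exact match, a value such as `linux` would be rejected even though the environment check tolerates lower case.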

Example usage

cr8tor initiate -t https://github.com/lsc-sde-crates/cr8-cookiecutter

cr8tor initiate -t path-to-local-cr8-cookiecutter-dir

cr8tor initiate -t path-to-local-cr8-cookiecutter-dir -n "my-project" -org "lsc-sde-crates" --push

cr8tor initiate -t path-to-local-cr8-cookiecutter-dir -n "my-project" -org "lsc-sde-crates" -ros "Linux" --push

Source code in src/cr8tor/cli/initiate.py
@app.command(name="initiate")
def initiate(
    template_path: Annotated[
        str,
        typer.Option(
            default="-t",
            help="GitHub URL or relative path to cr8-cookiecutter template",
            prompt=True,
        ),
    ],
    push_to_github: Annotated[
        bool,
        typer.Option(
            "--push/--no-push",
            help="Flag to indicate if the project should be pushed to GitHub",
        ),
    ] = False,
    git_org: Annotated[
        str,
        typer.Option(
            "-org",
            help="Target github organisation name",
            hide_input=True,
        ),
    ] = None,
    checkout: Annotated[
        str,
        typer.Option(
            "-chk",
            help="Branch, tag or commit to checkout from cookiecutter template",
        ),
    ] = None,
    project_name: Annotated[
        str,
        typer.Option(
            "-n",
            help="Name of the project to be created. This is optional and can be provided as an argument.",
        ),
    ] = None,
    environment: Annotated[
        str,
        typer.Option(
            "-e",
            help="Target environment. Default PROD. Must be one of the three options: DEV, TEST, PROD.",
            case_sensitive=False,
            show_choices=True,
        ),
    ] = "PROD",
    cr8tor_branch: Annotated[
        str,
        typer.Option(
            "-cb",
            help="For developing and debugging. Provide the github cr8tor branch that should be used in orchestration layer.",
        ),
    ] = None,
    runner_os: Annotated[
        str,
        typer.Option(
            "-ros",
            help="Target runner OS for GitHub Actions workflows. Must be one of: Windows, Linux.",
            case_sensitive=False,
            show_choices=True,
        ),
    ] = "Windows",
):
    """
    Initializes a new CR8 project using a specified cookiecutter template.

    Args:
        template_path (str): The GitHub URL or relative path to the cr8-cookiecutter template.
                             This is prompted from the user if not provided.
        push_to_github (bool): Flag to indicate if the project should be pushed to GitHub. Defaults to False.
        git_org (str, optional): The target GitHub organization name. Required if `push_to_github` is True.
        checkout (str, optional): The branch, tag, or commit to checkout from the cookiecutter template.
        project_name (str, optional): The name of the project to be created. If provided, cookiecutter will skip the prompt for other values.
        environment (str): The target environment (DEV, TEST, PROD). Defaults to "PROD".
        cr8tor_branch (str, optional): For development and debugging. Specifies the GitHub cr8tor branch to be used in the orchestration layer.
        runner_os (str): The target runner OS for GitHub Actions workflows (Windows, Linux). Defaults to "Windows".

    This command performs the following actions:
    - Generates a new project by applying the specified cookiecutter template.
    - Adds a timestamp to the context used by the template.
    - If `push_to_github` is True, creates a GitHub repository under the specified organization and pushes the generated project to GitHub using the personal access token (retrieved from `os.getenv("GH_TOKEN")`).

    Example usage:
        cr8tor initiate -t https://github.com/lsc-sde-crates/cr8-cookiecutter

        cr8tor initiate -t path-to-local-cr8-cookiecutter-dir

        cr8tor initiate -t path-to-local-cr8-cookiecutter-dir -n "my-project" -org "lsc-sde-crates" --push

        cr8tor initiate -t path-to-local-cr8-cookiecutter-dir -n "my-project" -org "lsc-sde-crates" -ros "Linux" --push
    """
    valid_environments = ["DEV", "TEST", "PROD"]
    if environment.upper() not in valid_environments:
        raise typer.BadParameter(
            f"Invalid environment. Choose from {valid_environments}."
        )

    valid_runner_os = ["Windows", "Linux"]
    if runner_os not in valid_runner_os:
        raise typer.BadParameter(f"Invalid runner OS. Choose from {valid_runner_os}.")

    extra_context = {
        "__timestamp": datetime.now().isoformat(timespec="seconds"),
        "__cr8_cc_template": template_path,
        "environment": environment.upper(),
        "__github_cr8tor_branch": cr8tor_branch,
        "runner_os": runner_os,
    }

    # Generate the project with cookiecutter
    if project_name is not None:
        extra_context.update({"project_name": project_name})
        extra_context.update({"github_organization": git_org})
        try:
            project_dir = cookiecutter(
                template_path,
                checkout=checkout,
                extra_context=extra_context,
                no_input=True,
            )
        except OutputDirExistsException as e:
            log.info("Project directory already exists. Skipping creation...")
            # Extract folder name from exception message
            folder_name = re.search(r'"(.*?)"', str(e)).group(1)
            project_dir = Path.cwd() / folder_name
    else:
        try:
            project_dir = cookiecutter(
                template_path, checkout=checkout, extra_context=extra_context
            )
        except FailedHookException as e:
            # Extract error message from the exception
            error_msg = str(e)
            if "VALIDATION_ERROR:" in error_msg:
                validation_error = error_msg.split("VALIDATION_ERROR:")[1].strip()
                print(f"Validation failed: {validation_error}")
            else:
                print(f"Hook failed: {error_msg}")
            sys.exit(1)
        except OutputDirExistsException as e:
            log.info("Project directory already exists. Skipping creation...")
            # Extract folder name from exception message
            folder_name = re.search(r'"(.*?)"', str(e)).group(1)
            project_dir = Path.cwd() / folder_name

    resources_dir = Path(project_dir).joinpath("resources")
    project_resource_path = resources_dir.joinpath("governance", "project.toml")
    project_dict = project_resources.read_resource_entity(
        project_resource_path, "project"
    )
    project_info = schemas.ProjectProps(**project_dict)

    if push_to_github and git_org:
        repo_name = project_info.reference

        gh_client = gh_rest_api_client.GHApiClient(git_org)

        # Create the repository and push the project to GitHub
        gh_rest_api_client.create_and_push_project(gh_client, project_dir, repo_name)

        # Check and create contributor teams
        gh_rest_api_client.check_and_create_teams(gh_client, repo_name)

        # Create repository environments for Signing Off experience
        gh_rest_api_client.create_github_environments(gh_client, repo_name)
Warning

The --push argument requires a fine-grained personal access token (PAT) generated in GitHub. It must be stored in the local environment variable GH_TOKEN. See the minimum PAT permissions defined here.
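
A pre-flight check for the token can catch a missing variable before any repository is created. The sketch below is an assumption-laden illustration: `require_gh_token` is a hypothetical helper, not a cr8tor function, and it only confirms that GH_TOKEN is present and non-empty, not that the token carries the required permissions.

```python
import os


def require_gh_token(env=None):
    """Hypothetical helper: return GH_TOKEN or raise if it is missing or empty."""
    # Accept an explicit mapping so the check can be exercised without real secrets.
    env = os.environ if env is None else env
    token = env.get("GH_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GH_TOKEN is not set; export a fine-grained PAT before using --push."
        )
    return token


# Example with an explicit mapping so the snippet runs without a real token.
print(require_gh_token({"GH_TOKEN": "example-token"}))
```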

Create Project RO-Crate

Generates the initial RO-Crate data crate within the target Cr8tor project from the specified metadata resources.

This command performs the following actions:
- Generates a UUID for the project.
- Builds an RO-Crate along with an RO-Crate knowledge graph.
- Packages the crate as a non-serialized BagIt Archive in the "bagit/" directory.
- If the dryrun option is provided, prints the crate details without writing to the "crate/" directory.
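
The UUID step can be illustrated on its own. In the source listing for this command, the identifier is taken from the PROJECT_UUID environment variable when it is set, generated otherwise, and written to the project entry only if no id is stored yet. The sketch below reproduces that behaviour with a hypothetical helper, `resolve_project_id`, which is not part of cr8tor.

```python
import os
import uuid


def resolve_project_id(project, env=None):
    """Hypothetical helper: keep an existing id, else use PROJECT_UUID or a new UUID."""
    env = os.environ if env is None else env
    # Prefer an externally supplied identifier; otherwise generate a random UUID.
    candidate = env.get("PROJECT_UUID", str(uuid.uuid4()))
    # setdefault leaves an id that is already stored untouched.
    project.setdefault("id", candidate)
    return project["id"]


project = {"name": "demo"}
first = resolve_project_id(project, env={})
second = resolve_project_id(project, env={"PROJECT_UUID": "ignored-on-rerun"})
print(first == second)  # True: the stored id wins on later calls
```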

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| agent | str | The agent label triggering the validation. Defaults to None. | None |
| resources_dir | Path | Directory containing resources to include in the RO-Crate. Defaults to "./resources". | './resources' |
| bagit_dir | Path | Bagit directory containing the RO-Crate data directory. Defaults to "./bagit". | './bagit' |
| config_file | Path | Location of the configuration TOML file. Defaults to "./config.toml". | './config.toml' |
| dryrun | bool | If True, prints the crate details without writing to the "crate/" directory. Defaults to False. | False |

Example usage

cr8tor create -a agent_label -i path-to-resources-dir -b path-to-bagit-dir -c path-to-config-file --dryrun

Source code in src/cr8tor/cli/create.py
@app.command(name="create")
def create(
    agent: Annotated[
        str,
        typer.Option(default="-a", help="The agent label triggering the validation."),
    ] = None,
    resources_dir: Annotated[
        Path,
        typer.Option(
            default="-i", help="Directory containing resources to include in RO-Crate."
        ),
    ] = "./resources",
    bagit_dir: Annotated[
        Path,
        typer.Option(
            default="-b", help="Bagit directory containing RO-Crate data directory"
        ),
    ] = "./bagit",
    config_file: Annotated[
        Path, typer.Option(default="-c", help="Location of configuration TOML file.")
    ] = "./config.toml",
    dryrun: Annotated[bool, typer.Option(default="--dryrun")] = False,
):
    """
    Generates the initial RO-Crate data crate within the target Cr8tor project from the specified metadata resources.

    This command performs the following actions:
    - Generates a UUID for the project.
    - Builds an RO-Crate along with an RO-Crate knowledge graph.
    - Packages the crate as a non-serialized BagIt Archive in the "bagit/" directory.
    - If the `dryrun` option is provided, prints the crate details without writing to the "crate/" directory.

    Args:
        agent (str): The agent label triggering the validation. Defaults to None.
        resources_dir (Path): Directory containing resources to include in the RO-Crate. Defaults to "./resources".
        bagit_dir (Path): Bagit directory containing the RO-Crate data directory. Defaults to "./bagit".
        config_file (Path): Location of the configuration TOML file. Defaults to "./config.toml".
        dryrun (bool): If True, prints the crate details without writing to the "crate/" directory. Defaults to False.

    Example usage:
        cr8tor create -a agent_label -i path-to-resources-dir -b path-to-bagit-dir -c path-to-config-file --dryrun
    """

    if agent is None:
        agent = os.getenv("AGENT_USER")

    exit_msg = "Create complete"
    exit_code = schemas.Cr8torReturnCode.SUCCESS

    create_start_dt = datetime.now()
    project_uuid: Annotated[
        str,
        "Project UUID is a unique auto-generated identifier on creation of the project",
    ] = os.getenv("PROJECT_UUID", str(uuid.uuid4()))

    if not resources_dir.exists():
        cli_utils.exit_command(
            schemas.Cr8torCommandType.CREATE,
            schemas.Cr8torReturnCode.ACTION_EXECUTION_ERROR,
            f"Missing resources directory at: {resources_dir}",
        )

    project_resource_path = resources_dir.joinpath("governance", "project.toml")
    governance = project_resources.read_resource(project_resource_path)

    if bagit_dir.exists():
        if "id" in governance["project"]:
            current_rocrate_graph = proj_graph.ROCrateGraph(bagit_dir)
            if current_rocrate_graph.is_project_action_complete(
                command_type=schemas.Cr8torCommandType.CREATE,
                action_type=schemas.RoCrateActionType.CREATE,
                project_id=governance["project"]["id"],
            ):
                cli_utils.exit_command(
                    schemas.Cr8torCommandType.CREATE,
                    schemas.Cr8torReturnCode.ACTION_EXECUTION_ERROR,
                    "Create command can only be run once on a project",
                )

    governance["project"].setdefault("id", project_uuid)
    governance["project"].setdefault(
        "project_start_time", create_start_dt.strftime("%Y%m%d_%H%M%S")
    )
    project_resources.update_resource_entity(
        project_resource_path, "project", governance["project"]
    )

    project_resources.create_resource_entity(project_resource_path, "actions", [])

    cli_utils.close_create_action_command(
        command_type=schemas.Cr8torCommandType.CREATE,
        start_time=create_start_dt,
        project_id=project_uuid,
        agent=agent,
        project_resource_path=project_resource_path,
        resources_dir=resources_dir,
        exit_msg=exit_msg,
        exit_code=exit_code,
        instrument=os.getenv("APP_NAME"),
        result=[{"@id": project_uuid}],
        dryrun=dryrun,
        config_file=config_file,
    )

Build RO-Crate Package

Builds the RO-Crate data crate for the target Cr8tor project using the specified metadata resources and configuration.

This command performs the following actions:
- Reads the configuration from the specified TOML file.
- Includes resources from the specified directory into the RO-Crate.
- If the dryrun option is provided, prints the crate details without writing to the "crate/" directory.
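
Before the resources are validated against their schemas, the source listing for this command checks that the mandatory top-level tables are present in project.toml and access.toml. The helper it calls is not shown on this page, so the sketch below uses a hypothetical stand-in, `find_missing_keys`, operating on plain dictionaries that represent already-parsed TOML. The messages are shortened for illustration.

```python
def find_missing_keys(resource, required_keys):
    """Hypothetical stand-in: list the messages for required keys that are absent."""
    return [message for key, message in required_keys.items() if key not in resource]


# A parsed governance resource that has not been through `create` yet.
governance = {"project": {"name": "demo"}, "requesting_agent": {}, "repository": {}}

governance_required = {
    "project": "'project' properties must be defined",
    "requesting_agent": "'requesting_agent' properties must be defined",
    "repository": "'repository' properties must be defined",
    "actions": "'actions' list property must be defined",
}

for problem in find_missing_keys(governance, governance_required):
    print(problem)
```

Collecting every missing key in one pass, rather than stopping at the first, is a design choice of this sketch; the actual helper may behave differently.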

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| resources_dir | Path | Directory containing resources to include in the RO-Crate. Defaults to "./resources". | './resources' |
| config_file | Path | Location of the configuration TOML file. Defaults to "./config.toml". | './config.toml' |
| dryrun | bool | If True, prints the crate details without writing to the "crate/" directory. Defaults to False. | False |

Example usage

cr8tor build -i path-to-resources-dir -c path-to-config-file --dryrun

Source code in src/cr8tor/cli/build.py
@app.command(name="build")
def build(
    resources_dir: Annotated[
        Path,
        typer.Option(
            default="-i", help="Directory containing resources to include in RO-Crate."
        ),
    ] = "./resources",
    config_file: Annotated[
        Path, typer.Option(default="-c", help="Location of configuration TOML file.")
    ] = "./config.toml",
    dryrun: Annotated[bool, typer.Option(default="--dryrun")] = False,
):
    """
    Builds the RO-Crate data crate for the target Cr8tor project using the specified metadata resources and configuration.

    This command performs the following actions:
    - Reads the configuration from the specified TOML file.
    - Includes resources from the specified directory into the RO-Crate.
    - If the `dryrun` option is provided, prints the crate details without writing to the "crate/" directory.

    Args:
        resources_dir (Path): Directory containing resources to include in the RO-Crate. Defaults to "./resources".
        config_file (Path): Location of the configuration TOML file. Defaults to "./config.toml".
        dryrun (bool): If True, prints the crate details without writing to the "crate/" directory. Defaults to False.

    Example usage:
        cr8tor build -i path-to-resources-dir -c path-to-config-file --dryrun
    """
    ###############################################################################
    # 1 Validate project build materials (i.e. resources/ & config.toml)
    ###############################################################################

    config = project_resources.read_resource(config_file)

    if not resources_dir.exists():
        raise DirectoryNotFoundError(resources_dir)

    project_resource_path = resources_dir.joinpath("governance", "project.toml")
    governance = project_resources.read_resource(project_resource_path)

    access_resource_path = resources_dir.joinpath("access", "access.toml")
    access = project_resources.read_resource(access_resource_path)

    ###############################################################################
    # 2 Check mandatory user-defined elements (i.e. gov, access) exists before
    #  pydantic model validation on fields
    ###############################################################################

    governance_required_keys = {
        "project": f"To build ro-crate 'project' properties must be defined in resource: {project_resource_path}",
        "requesting_agent": f"To build ro-crate 'requesting_agent' properties must be defined in resource: {project_resource_path}",
        "repository": f"To build ro-crate 'repository' properties must be defined in resource: {project_resource_path}",
        "actions": f"To build ro-crate 'actions'list property must be defined in resource: {project_resource_path}",
    }

    access_required_keys = {
        "source": f"To build ro-crate source connection info is needed in resource: {access_resource_path}",
        "credentials": f"To build ro-crate connection credentials info is needed in resource: {access_resource_path}",
    }

    check_required_keys(governance, governance_required_keys)
    check_required_keys(access, access_required_keys)

    ###############################################################################
    # 3 Create initial Ro-Crate & build contextual entities
    ###############################################################################

    crate = ROCrate(gen_preview=True)

    #
    # Load project info and init RC 'Project' entity
    #

    project_props = s.ProjectProps(**governance["project"])
    log.info(
        f"[cyan]Creating RO-Crate for[/cyan] - [bold magenta]{project_props.name}[/bold magenta]",
    )

    project_entity = m.ContextEntity(
        crate=crate,
        identifier=project_props.id,
        properties={
            "@type": "Project",
            "name": project_props.name,
            "identifier": project_props.reference,
        },
    )
    crate.add(project_entity)

    #
    # Load requesting agent info and init RC 'Person' entity
    #
    requesting_agent_props = s.AgentProps(**governance["requesting_agent"])
    person_entity = m.Person(
        crate,
        identifier=f"requesting-agent-{project_props.id}",
        properties={
            "name": requesting_agent_props.name,
            "affiliation": {"@id": f"requesting-agent-org-{project_props.id}"},
        },
    )

    aff_entity = m.ContextEntity(
        crate,
        identifier=f"requesting-agent-org-{project_props.id}",
        properties={
            "@type": "Organisation",
            "name": requesting_agent_props.affiliation.name,
            "url": str(requesting_agent_props.affiliation.url),
        },
    )

    crate.add(aff_entity)
    crate.add(person_entity)

    # Relation definition for ro-crate metadata file only (i.e. not stored or managed in the resources)
    project_entity["memberOf"] = [{"@id": person_entity.id}]

    #
    # Load project repository info and init RC 'SoftwareSourceCode' entity
    #
    repo_props = s.SoftwareSourceCodeProps(**governance["repository"])

    repo_entity = m.ContextEntity(
        crate=crate,
        identifier=f"repo-{project_props.id}",
        properties={
            "@type": "SoftwareSourceCode",
            "name": repo_props.name,
            "description": repo_props.description,
            "codeRepository": f"{repo_props.codeRepository}cr8-{project_props.id}",
        },
    )

    crate.add(repo_entity)
    crate.metadata["isBasedOn"] = {"@id": f"repo-{project_props.id}"}

    #
    # Load access info and init RC entities
    #

    # contract_props = s.DataAccessContract(
    #     source=s.DatabricksSourceConnection(**access["source"]),
    #     credentials=s.SourceAccessCredential(**access["credentials"]),
    #     project_name=governance["project"]["project_name"],
    #     project_start_time=governance["project"]["project_start_time"],
    #     destination_type=governance["project"]["destination"]["type"],
    #     destination_name=governance["project"]["destination"]["name"],
    #     destination_format=governance["project"]["destination"]["format"],
    #     metadata=None
    # )
    # TODO: Identify and init any RC contextual entities for describing data access

    ###############################################################################
    # 4 Build data entities
    ###############################################################################

    #
    # Governance resources
    #

    crate.add_file(
        source=project_resource_path,
        dest_path="governance/project.toml",
        properties={
            "name": project_props.name,
            "description": project_props.description,
        },
    )

    log.info(
        msg="[cyan]Validated and added file[/cyan] - [bold magenta]governance/project.toml[/bold magenta]",
    )

    #
    # Metadata resources
    #

    for f in resources_dir.joinpath("metadata").glob("dataset*.toml"):
        dataset_dict = project_resources.read_resource(f)
        dataset_props = s.DatasetMetadata(**dataset_dict)

        crate.add_file(
            source=f,
            dest_path=f"metadata/{f.name}",
            properties={
                "name": dataset_props.name,
                "description": dataset_props.description,
            },
        )

        hasparts = []

        if dataset_props.staging_path is not None:
            staging_entity = m.ContextEntity(
                crate=crate,
                identifier=f"{dataset_props.name}-staging",
                properties={
                    "@type": "Dataset",
                    "name": f"{dataset_props.name} (Staging)",
                    "url": f"{dataset_props.staging_path}",
                    "encodingFormat": "application/x-duckdb",  # TODO: add format from project metadata
                },
            )
            crate.add(staging_entity)
            hasparts.append({"@id": staging_entity.id})

        if dataset_props.publish_path is not None:
            publish_entity = m.ContextEntity(
                crate=crate,
                identifier=f"{dataset_props.name}-publish",
                properties={
                    "@type": "Dataset",
                    "name": f"{dataset_props.name} (Publish)",
                    "url": f"{dataset_props.publish_path}",
                    "encodingFormat": "application/x-duckdb",  # TODO: add format from project metadata
                },
            )
            crate.add(publish_entity)
            hasparts.append({"@id": publish_entity.id})

        data_ctx_entity = m.ContextEntity(
            crate=crate,
            identifier=f"{dataset_props.name}",
            properties={
                "@type": "Dataset",
                "name": f"{dataset_props.name}",
                "description": dataset_props.description,
                "hasPart": hasparts,
            },
        )

        crate.add(data_ctx_entity)

    #
    # Access resources
    #

    source_data = {}
    source_data["source"] = access["source"].copy()
    source_data["source"]["type"] = source_data["source"]["type"].lower()
    source_data["source"]["credentials"] = access["credentials"]
    source_data["extract_config"] = (
        access["extract_config"] if "extract_config" in access else None
    )
    access_source = s.SourceConnectionModel(**source_data)
    crate.add_file(
        source=access_resource_path,
        dest_path="access/access.toml",
        properties={"name": access_source.source.type},
    )

    log.info(
        msg="[cyan]Validated and added access descriptor file[/cyan] - [bold magenta]access/access.toml[/bold magenta]",
    )

    ###############################################################################
    # 5 Finalise Crate
    ###############################################################################
    crate.name = project_props.name
    crate.description = project_props.description
    crate.license = s.CrateMeta.License
    crate.publisher = m.ContextEntity(
        crate,
        identifier=s.CrateMeta.Publisher,
        properties={
            "@type": "Organisation",
            "name": "LSC SDE",
            "url": repo_props.codeRepository,
        },
    )
    crate.mainEntity = project_entity

    ###############################################################################
    # 6 Process and render all action entities
    ###############################################################################

    #
    # Check for actions
    #

    for action in governance["actions"]:
        if action["type"] == "CreateAction":
            action_props = s.CreateActionProps(**action)
        elif action["type"] == "AssessAction":
            action_props = s.AssessActionProps(**action)

        crate.add_action(
            instrument=action_props.instrument,
            identifier=action_props.id,
            result=[item.model_dump() for item in action_props.result],
            properties={
                "@type": action_props.type,
                "name": action_props.name,
                "startTime": action_props.start_time.isoformat(),
                "endTime": action_props.end_time.isoformat(),
                "actionStatus": action_props.action_status,
                "agent": action_props.agent,
            },
        )

    ###############################################################################
    # 7 Add Ro-crate meta to bagit directory structure
    ###############################################################################
    if not dryrun:
        bagit_dir = Path("./bagit")

        if bagit_dir.exists() and bagit_dir.is_dir():
            bag = bagit.Bag(str(bagit_dir))

            # Update bag info from config.toml; This does not modify the External-Identifier.
            # Delete and recreate the bag if the External-Identifier needs to be changed.
            bag.info.update(**config["bagit-info"])
            log.info("Loaded existing bag")
        else:
            bag = init_bag(
                project_id=project_props.id, bagit_dir=bagit_dir, config=config
            )

        crate.write(bagit_dir / "data")
        bag.save(manifests=True)

        n_payload_files = len(list(bag.payload_files()))
        log.info(
            f"[cyan]RO-Crate BagIt created at[/cyan] - [bold magenta]{bagit_dir} with {n_payload_files} files.[/bold magenta]",
        )
    else:
        log.warning(
            "[bold red]Dry run option set. Crate will not be written to disk.[/bold red]\n"
        )

    print_crate(crate=crate)

Validate Project Metadata

Validates the contents of a Bagit directory containing an RO-Crate data directory.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| agent | str | The agent label triggering the validation. Defaults to None. | None |
| bagit_dir | Path | The Bagit directory containing the RO-Crate data directory. Defaults to "./bagit". | './bagit' |
| resources_dir | Path | The directory containing resources to include in the RO-Crate. Defaults to "./resources". | './resources' |

This function performs the following:
- Validates the contents of the specified Bagit directory and its RO-Crate data directory.
- Validates access and governance metadata resources.
- Rebuilds the Bagit contents, including the RO-Crate metadata.
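
To give a sense of what validating a BagIt directory involves, the sketch below compares each payload file against the checksum recorded in manifest-sha256.txt, using only the standard library. It is a simplified illustration of the BagIt manifest format, not the cr8tor implementation, which may rely on a dedicated BagIt library. The function name `find_manifest_mismatches` and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def find_manifest_mismatches(bag_dir):
    """Return payload paths that are missing or differ from manifest-sha256.txt."""
    bag_dir = Path(bag_dir)
    manifest = bag_dir / "manifest-sha256.txt"
    mismatches = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Each manifest line is "<checksum> <path relative to the bag root>".
        expected, relative_path = line.split(maxsplit=1)
        payload = bag_dir / relative_path
        if not payload.is_file():
            mismatches.append(relative_path)
            continue
        actual = hashlib.sha256(payload.read_bytes()).hexdigest()
        if actual != expected.lower():
            mismatches.append(relative_path)
    return mismatches


# Usage, assuming a bag already exists at ./bagit:
#     problems = find_manifest_mismatches("./bagit")
#     print(problems or "All payload checksums match")
```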

Example usage

cr8tor validate -b <path-to-bagit-dir> -i <path-to-resources-dir>

Source code in src/cr8tor/cli/validate.py
@app.command(name="validate")
def validate(
    agent: Annotated[
        str,
        typer.Option(default="-a", help="The agent label triggering the validation."),
    ] = None,
    bagit_dir: Annotated[
        Path,
        typer.Option(
            default="-b", help="Bagit directory containing RO-Crate data directory"
        ),
    ] = "./bagit",
    resources_dir: Annotated[
        Path,
        typer.Option(
            default="-i", help="Directory containing resources to include in RO-Crate."
        ),
    ] = "./resources",
):
    """
    Validate the contents of a Bagit directory containing an RO-Crate data directory.

    Args:
        agent (str): The agent label triggering the validation. Defaults to None.
        bagit_dir (Path): The Bagit directory containing the RO-Crate data directory.
                          Defaults to "./bagit".
        resources_dir (Path): The directory containing resources to include in the RO-Crate.
                              Defaults to "./resources".

    This function performs the following:
    - Validates the contents of the specified Bagit directory and its RO-Crate data directory.
    - Validates access and governance metadata resources.
    - Rebuilds the Bagit contents, including the RO-Crate metadata.

    Example usage:
        cr8tor validate -b <path-to-bagit-dir> -i <path-to-resources-dir>
    """

    if agent is None:
        agent = os.getenv("AGENT_USER")

    exit_msg = "Validation complete"
    exit_code = schemas.Cr8torReturnCode.SUCCESS

    start_time = datetime.now()
    access_resource_path = resources_dir.joinpath("access", "access.toml")
    project_resource_path = resources_dir.joinpath("governance", "project.toml")
    project_dict = project_resources.read_resource_entity(
        project_resource_path, "project"
    )
    project_info = schemas.ProjectProps(**project_dict)

    current_rocrate_graph = proj_graph.ROCrateGraph(bagit_dir)
    if not current_rocrate_graph.is_project_action_complete(
        command_type=schemas.Cr8torCommandType.CREATE,
        action_type=schemas.RoCrateActionType.CREATE,
        project_id=project_info.id,
    ):
        cli_utils.close_assess_action_command(
            command_type=schemas.Cr8torCommandType.VALIDATE,
            start_time=start_time,
            project_id=project_info.id,
            agent=agent,
            project_resource_path=project_resource_path,
            resources_dir=resources_dir,
            exit_msg="The create command must be run on the target project before validation",
            exit_code=schemas.Cr8torReturnCode.ACTION_WORKFLOW_ERROR,
            instrument=os.getenv("METADATA_NAME"),
        )

    for dataset_meta_file in resources_dir.joinpath("metadata").glob("dataset_*.toml"):
        try:
            access = project_resources.read_resource(access_resource_path)
            dataset_meta = project_resources.read_resource(dataset_meta_file)
            source_data = {}
            source_data["source"] = access["source"].copy()
            source_data["source"]["type"] = source_data["source"]["type"].lower()
            source_data["source"]["credentials"] = access["credentials"]
            source_data["extract_config"] = (
                access["extract_config"] if "extract_config" in access else None
            )
            access_contract = schemas.DataContractValidateRequest(
                project_name=project_dict["project_name"],
                project_start_time=project_dict["project_start_time"],
                destination=project_dict["destination"],
                source=source_data["source"],
                extract_config=source_data.get("extract_config"),
                dataset=schemas.DatasetMetadata(**dataset_meta),
            )
            metadata = asyncio.run(api.validate_access(access_contract))
            validate_dataset_info = schemas.DatasetMetadata(**metadata)

        except Exception as e:
            cli_utils.close_assess_action_command(
                command_type=schemas.Cr8torCommandType.VALIDATE,
                start_time=start_time,
                project_id=project_info.id,
                agent=agent,
                project_resource_path=project_resource_path,
                resources_dir=resources_dir,
                exit_msg=f"{str(e)}",
                exit_code=schemas.Cr8torReturnCode.UNKNOWN_ERROR,
                instrument=os.getenv("METADATA_NAME"),
            )

        is_valid, err = verify_tables_metadata(
            validate_dataset_info.tables, access_contract.dataset.tables
        )
        if not is_valid:
            exit_msg = err
            exit_code = schemas.Cr8torReturnCode.VALIDATION_ERROR
            break

        merge_metadata_into_dataset(dataset_meta_file, validate_dataset_info)
    #
    # This assumes validate can be run multiple times on a project
    # Ensures previous run entities for this action are cleared in "actions" before
    # actions is updated with the new action entity
    #

    cli_utils.close_assess_action_command(
        command_type=schemas.Cr8torCommandType.VALIDATE,
        start_time=start_time,
        project_id=project_info.id,
        agent=agent,
        project_resource_path=project_resource_path,
        resources_dir=resources_dir,
        exit_msg=exit_msg,
        exit_code=exit_code,
        instrument=os.getenv("METADATA_NAME"),
        additional_type="Semantic Validation",
    )
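
The per-dataset loop above assembles the request's source section from `access.toml` before calling the validation API. The following minimal sketch reproduces only that assembly step with plain dictionaries. The `access` values are hypothetical, and the `source`, `credentials`, and `extract_config` table names are inferred from the keys the command reads rather than from a documented schema.

```python
from typing import Any


def build_source_payload(access: dict[str, Any]) -> dict[str, Any]:
    """Mirror the assembly step in `validate` using plain dictionaries."""
    # Shallow copy so the parsed resource itself is not mutated.
    source = access["source"].copy()
    # The command lower-cases the source type before building the request.
    source["type"] = source["type"].lower()
    source["credentials"] = access["credentials"]
    return {
        "source": source,
        # extract_config is optional; it is None when the table is absent.
        "extract_config": access.get("extract_config"),
    }


# Hypothetical parsed access.toml contents (illustrative values only).
access = {
    "source": {"type": "PostgreSQL", "host_url": "db.example.org"},
    "credentials": {"provider": "ExampleVault", "password_key": "db-password"},
}

payload = build_source_payload(access)
print(payload["source"]["type"])
print(payload["extract_config"])
```

Copying the `source` table first keeps the original resource untouched, so the same parsed file can be reused for each dataset in the loop.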

Approval Workflow Commands

Sign-Off Project

Logs sign-off metadata in the RO-Crate and verifies project sign-off in the approvals management platform (e.g., GitHub).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| agreement_url | str | URL to the project sign-off event (e.g., PR event in the project's GitHub history). | required |
| signing_entity | str | The entity that agreed to sign off the project request. | required |
| agent | str | The agent label triggering the validation. Defaults to None. | None |
| bagit_dir | Path | Path to the Bagit directory containing the RO-Crate data directory. Defaults to "./bagit". | './bagit' |
| resources_dir | Path | Path to the directory containing resources to include in the RO-Crate. Defaults to "./resources". | './resources' |

This command performs the following actions:

- Updates the project approvals metadata in the RO-Crate.
- Verifies the project sign-off in the approvals management platform.

Example usage

cr8tor sign-off -agreement <url_to_approved_policy> -signing-entity <entity_name> -a <agent_label> -b <bagit_dir> -i <resources_dir>

Source code in src/cr8tor/cli/sign_off.py
@app.command(name="sign-off")
def sign_off(
    agreement_url: Annotated[
        str,
        typer.Option(
            default="-agreement",
            help="URL to the project sign off event (i.e. PR event in project github history)",
        ),
    ],
    signing_entity: Annotated[
        str,
        typer.Option(
            default="-signing-entity",
            help="Entity that agreed to sign off the project request.",
        ),
    ],
    agent: Annotated[
        str,
        typer.Option(default="-a", help="The agent label triggering the validation."),
    ] = None,
    bagit_dir: Annotated[
        Path,
        typer.Option(
            default="-b", help="Bagit directory containing RO-Crate data directory"
        ),
    ] = "./bagit",
    resources_dir: Annotated[
        Path,
        typer.Option(
            default="-i", help="Directory containing resources to include in RO-Crate."
        ),
    ] = "./resources",
):
    """
    Logs sign-off metadata in the RO-Crate and verifies project sign-off in the approvals management platform (e.g., GitHub).

    Args:
        agreement_url (str): URL to the project sign-off event (e.g., PR event in the project's GitHub history).
        signing_entity (str): The entity that agreed to sign off the project request.
        agent (str): The agent label triggering the validation. Defaults to None.
        bagit_dir (Path): Path to the Bagit directory containing the RO-Crate data directory. Defaults to "./bagit".
        resources_dir (Path): Path to the directory containing resources to include in the RO-Crate. Defaults to "./resources".

    This command performs the following actions:
    - Updates the project approvals metadata in the RO-Crate.
    - Verifies the project sign-off in the approvals management platform.

    Example usage:
        cr8tor sign-off -agreement <url_to_approved_policy> -signing-entity <entity_name> -a <agent_label> -b <bagit_dir> -i <resources_dir>
    """

    if agent is None:
        agent = os.getenv("APP_NAME")

    start_time = datetime.now()
    project_resource_path = resources_dir.joinpath("governance", "project.toml")
    project_dict = project_resources.read_resource_entity(
        project_resource_path, "project"
    )
    project_info = s.ProjectProps(**project_dict)

    if not bagit_dir.exists():
        cli_utils.exit_command(
            s.Cr8torCommandType.SIGN_OFF,
            s.Cr8torReturnCode.ACTION_EXECUTION_ERROR,
            f"Missing bagit directory at: {bagit_dir}",
        )

    current_rocrate_graph = proj_graph.ROCrateGraph(bagit_dir)

    if not current_rocrate_graph.is_project_action_complete(
        command_type=s.Cr8torCommandType.VALIDATE,
        action_type=s.RoCrateActionType.ASSESS,
        project_id=project_info.id,
    ):
        cli_utils.close_assess_action_command(
            command_type=s.Cr8torCommandType.SIGN_OFF,
            start_time=start_time,
            project_id=project_info.id,
            agent=agent,
            project_resource_path=project_resource_path,
            resources_dir=resources_dir,
            exit_msg="The project must be validated before sign off / approval",
            exit_code=s.Cr8torReturnCode.ACTION_WORKFLOW_ERROR,
            instrument=f"{signing_entity}",
            additional_type="Sign off",
        )
    #
    # Should we verify that the approved PR URI exists here?
    #

    cli_utils.close_assess_action_command(
        command_type=s.Cr8torCommandType.SIGN_OFF,
        start_time=start_time,
        project_id=project_info.id,
        agent=agent,
        project_resource_path=project_resource_path,
        resources_dir=resources_dir,
        exit_msg="Sign off complete",
        exit_code=s.Cr8torReturnCode.SUCCESS,
        instrument=f"{signing_entity}",
        additional_type="Sign off",
        result=[{"@id": agreement_url}],
    )
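
When sign-off is driven from automation, it can help to assemble the command line programmatically. The sketch below only builds the argument list, using the option names from the docstring above. `build_sign_off_argv`, the example URL, and the entity name are hypothetical, and nothing is executed.

```python
from pathlib import Path
from typing import Optional


def build_sign_off_argv(
    agreement_url: str,
    signing_entity: str,
    agent: Optional[str] = None,
    bagit_dir: Path = Path("./bagit"),
    resources_dir: Path = Path("./resources"),
) -> list[str]:
    """Build the argument list for `cr8tor sign-off` without running it."""
    argv = [
        "cr8tor",
        "sign-off",
        "-agreement",
        agreement_url,
        "-signing-entity",
        signing_entity,
    ]
    # The agent option is omitted when unset; the command then falls back
    # to an environment variable, as shown in the source listing.
    if agent is not None:
        argv += ["-a", agent]
    argv += ["-b", str(bagit_dir), "-i", str(resources_dir)]
    return argv


# Hypothetical values for illustration only.
argv = build_sign_off_argv(
    "https://example.org/project/pull/1",
    "Example IG Lead",
    bagit_dir=Path("bagit"),
    resources_dir=Path("resources"),
)
print(argv)
```

The resulting list could be passed to `subprocess.run(argv, check=True)`. Keeping the arguments as a list avoids shell quoting problems with entity names that contain spaces.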

Disclosure Check

Logs disclosure metadata in the RO-Crate and verifies project disclosure in the approvals management platform (e.g., GitHub).

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| agreement_url | str | URL to the project disclosure event (e.g., PR event in the project's GitHub history). | required |
| signing_entity | str | The entity that completed the disclosure check. | required |
| agent | str | The agent label triggering the validation. Defaults to None. | None |
| bagit_dir | Path | Path to the Bagit directory containing the RO-Crate data directory. Defaults to "./bagit". | './bagit' |
| resources_dir | Path | Path to the directory containing resources to include in the RO-Crate. Defaults to "./resources". | './resources' |

This command performs the following actions:

- Updates the project approvals metadata in the RO-Crate.
- Verifies the project disclosure in the approvals management platform.

Example usage

cr8tor disclosure -agreement <url_to_disclosure_event> -signing-entity <entity_name> -a <agent_label> -b <bagit_dir> -i <resources_dir>

Source code in src/cr8tor/cli/disclosure.py
@app.command(name="disclosure")
def disclosure(
    agreement_url: Annotated[
        str,
        typer.Option(
            default="-agreement",
            help="URL to disclosure action (i.e. PR event in project github history)",
        ),
    ],
    signing_entity: Annotated[
        str,
        typer.Option(
            default="-signing-entity",
            help="Entity that completed disclosure check",
        ),
    ],
    agent: Annotated[
        str,
        typer.Option(default="-a", help="The agent label triggering the validation."),
    ] = None,
    bagit_dir: Annotated[
        Path,
        typer.Option(
            default="-b", help="Bagit directory containing RO-Crate data directory"
        ),
    ] = "./bagit",
    resources_dir: Annotated[
        Path,
        typer.Option(
            default="-i", help="Directory containing resources to include in RO-Crate."
        ),
    ] = "./resources",
):
    """
    Logs disclosure metadata in the RO-Crate and verifies project disclosure in the approvals management platform (e.g., GitHub).

    Args:
        agreement_url (str): URL to the project disclosure event (e.g., PR event in the project's GitHub history).
        signing_entity (str): The entity that completed the disclosure check.
        agent (str, optional): The agent label triggering the validation. Defaults to None.
        bagit_dir (Path): Path to the Bagit directory containing the RO-Crate data directory. Defaults to "./bagit".
        resources_dir (Path): Path to the directory containing resources to include in the RO-Crate. Defaults to "./resources".

    This command performs the following actions:
    - Updates the project approvals metadata in the RO-Crate.
    - Verifies the project disclosure in the approvals management platform.

    Example usage:
        cr8tor disclosure -agreement <url_to_disclosure_event> -signing-entity <entity_name> -a <agent_label> -b <bagit_dir> -i <resources_dir>
    """

    if agent is None:
        agent = os.getenv("AGENT_USER")

    start_time = datetime.now()
    project_resource_path = resources_dir.joinpath("governance", "project.toml")
    project_dict = project_resources.read_resource_entity(
        project_resource_path, "project"
    )
    project_info = s.ProjectProps(**project_dict)

    if not bagit_dir.exists():
        cli_utils.exit_command(
            s.Cr8torCommandType.DISCLOSURE_CHECK,
            s.Cr8torReturnCode.ACTION_EXECUTION_ERROR,
            f"Missing bagit directory at: {bagit_dir}",
        )

    current_rocrate_graph = proj_graph.ROCrateGraph(bagit_dir)
    if not current_rocrate_graph.is_project_action_complete(
        command_type=s.Cr8torCommandType.STAGE_TRANSFER,
        action_type=s.RoCrateActionType.CREATE,
        project_id=project_info.id,
    ):
        cli_utils.close_assess_action_command(
            command_type=s.Cr8torCommandType.DISCLOSURE_CHECK,
            start_time=start_time,
            project_id=project_info.id,
            agent=agent,
            project_resource_path=project_resource_path,
            resources_dir=resources_dir,
            exit_msg="The project data must be staged before disclosure checks can be completed.",
            exit_code=s.Cr8torReturnCode.ACTION_WORKFLOW_ERROR,
            instrument=f"{signing_entity}",
            additional_type="Disclosure Check",
        )

    #
    # Should we verify that the disclosure PR URI exists here?
    #

    cli_utils.close_assess_action_command(
        command_type=s.Cr8torCommandType.DISCLOSURE_CHECK,
        start_time=start_time,
        project_id=project_info.id,
        agent=agent,
        project_resource_path=project_resource_path,
        resources_dir=resources_dir,
        exit_msg="Disclosure checks complete",
        exit_code=s.Cr8torReturnCode.SUCCESS,
        instrument=f"{signing_entity}",
        additional_type="Disclosure Check",
        result=[{"@id": agreement_url}],
    )

Data Transfer Commands

Stage Data Transfer

Stages the data by transferring it from the specified source to the sink TRE.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| agent | str | The agent label triggering the validation. Defaults to None. | None |
| bagit_dir | Path | Path to the Bagit directory containing the RO-Crate data directory. Defaults to "./bagit". | './bagit' |
| resources_dir | Path | Path to the directory containing resources to include in the RO-Crate. Defaults to "./resources". | './resources' |

This function prepares the data transfer for the specified CR8 project by:

- Validating the current RO-Crate graph.
- Ensuring that all necessary resources are included.

Example usage

cr8tor stage-transfer -a agent_label -b path-to-bagit-dir -i path-to-resources-dir

Source code in src/cr8tor/cli/stage_transfer.py
@app.command(name="stage-transfer")
def stage_transfer(
    agent: Annotated[
        str,
        typer.Option(default="-a", help="The agent label triggering the validation."),
    ] = None,
    bagit_dir: Annotated[
        Path,
        typer.Option(
            default="-b", help="Bagit directory containing RO-Crate data directory"
        ),
    ] = "./bagit",
    resources_dir: Annotated[
        Path,
        typer.Option(
            default="-i", help="Directory containing resources to include in RO-Crate."
        ),
    ] = "./resources",
):
    """
    Stages the data by transferring it from the specified source to the sink TRE.

    Args:
        agent (str): The agent label triggering the validation. Defaults to None.
        bagit_dir (Path): Path to the Bagit directory containing the RO-Crate data directory.
                          Defaults to "./bagit".
        resources_dir (Path): Path to the directory containing resources to include in the RO-Crate.
                              Defaults to "./resources".

    This function prepares the data transfer for the specified CR8 project by:
    - Validating the current RO-Crate graph.
    - Ensuring that all necessary resources are included.

    Example usage:
        cr8tor stage-transfer -a agent_label -b path-to-bagit-dir -i path-to-resources-dir
    """

    if agent is None:
        agent = os.getenv("AGENT_USER")

    exit_msg = "Staging transfer complete"
    exit_code = schemas.Cr8torReturnCode.SUCCESS
    staging_results = []
    start_time = datetime.now()

    project_resource_path = resources_dir.joinpath("governance", "project.toml")
    access_resource_path = resources_dir.joinpath("access", "access.toml")
    project_info = project_resources.read_resource(project_resource_path)

    if not bagit_dir.exists():
        cli_utils.exit_command(
            schemas.Cr8torCommandType.STAGE_TRANSFER,
            schemas.Cr8torReturnCode.ACTION_EXECUTION_ERROR,
            f"Missing bagit directory at: {bagit_dir}",
        )

    current_rocrate_graph = proj_graph.ROCrateGraph(bagit_dir)

    if not current_rocrate_graph.is_project_action_complete(
        command_type=schemas.Cr8torCommandType.SIGN_OFF,
        action_type=schemas.RoCrateActionType.ASSESS,
        project_id=project_info["project"]["id"],
    ):
        cli_utils.close_create_action_command(
            command_type=schemas.Cr8torCommandType.STAGE_TRANSFER,
            start_time=start_time,
            project_id=project_info["project"]["id"],
            agent=agent,
            project_resource_path=project_resource_path,
            resources_dir=resources_dir,
            exit_msg="The data project must have sign-off before staging the data transfer",
            exit_code=schemas.Cr8torReturnCode.ACTION_WORKFLOW_ERROR,
            instrument=os.getenv("PUBLISH_NAME"),
        )

    for dataset_meta_file in resources_dir.joinpath("metadata").glob("dataset*.toml"):
        dataset_dict = project_resources.read_resource(dataset_meta_file)
        dataset_props = schemas.DatasetMetadata(**dataset_dict)

        try:
            access = project_resources.read_resource(access_resource_path)
            source_data = {}
            source_data["source"] = access["source"].copy()
            source_data["source"]["type"] = source_data["source"]["type"].lower()
            source_data["source"]["credentials"] = access["credentials"]
            source_data["extract_config"] = (
                access["extract_config"] if "extract_config" in access else None
            )
            access_contract = schemas.DataContractTransferRequest(
                project_name=project_info["project"]["project_name"],
                project_start_time=project_info["project"]["project_start_time"],
                destination=project_info["project"]["destination"],
                source=source_data["source"],
                dataset=dataset_props,
            )

            resp_dict = asyncio.run(api.stage_transfer(access_contract))
            resp_dict["destination_type"] = project_info["project"]["destination"][
                "type"
            ]
            validate_resp = schemas.StageTransferPayload(**resp_dict)

            # TODO: Handle multiple staging locations
            # TODO: Add error response handler for action error property

            if validate_resp.data_retrieved:
                staging_location_dict = validate_resp.data_retrieved[0].model_dump()
                staging_location_dict["@id"] = str(uuid.uuid4())

                staging_results.append(staging_location_dict)

                project_resources.create_resource_entity(
                    dataset_meta_file, "staging_path", staging_location_dict
                )

        except Exception as e:
            cli_utils.close_create_action_command(
                command_type=schemas.Cr8torCommandType.STAGE_TRANSFER,
                start_time=start_time,
                project_id=project_info["project"]["id"],
                agent=agent,
                project_resource_path=project_resource_path,
                resources_dir=resources_dir,
                exit_msg=f"{str(e)}",
                exit_code=schemas.Cr8torReturnCode.UNKNOWN_ERROR,
                instrument=os.getenv("PUBLISH_NAME"),
            )

    cli_utils.close_create_action_command(
        command_type=schemas.Cr8torCommandType.STAGE_TRANSFER,
        start_time=start_time,
        project_id=project_info["project"]["id"],
        agent=agent,
        project_resource_path=project_resource_path,
        resources_dir=resources_dir,
        exit_msg=exit_msg,
        exit_code=exit_code,
        instrument=os.getenv("PUBLISH_NAME"),
        result=staging_results,
    )
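
The listing above records only the first entry of `data_retrieved`, and its TODO notes that multiple staging locations are not yet handled. As a hedged illustration of the tagging step, the sketch below gives every location a fresh `@id`, the way the command does for the first one. It uses plain dictionaries in place of the `StageTransferPayload` model, and `tag_locations` and the `file_path` key are hypothetical names.

```python
import uuid
from typing import Any


def tag_locations(locations: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the locations, each with a unique '@id'."""
    tagged = []
    for location in locations:
        entry = dict(location)  # copy so the API response is left unchanged
        entry["@id"] = str(uuid.uuid4())
        tagged.append(entry)
    return tagged


# Hypothetical response entries for illustration only.
locations = [
    {"file_path": "staging/table_a.csv"},
    {"file_path": "staging/table_b.csv"},
]
staging_results = tag_locations(locations)
print(len(staging_results))
```

How several locations should then be written back to the dataset resource under `staging_path` is a separate design decision that this sketch does not address.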

Publish Data

Publishes the data by transferring it from staging to production storage, making it accessible to a TRE and/or authorised TRE workspace.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| agent | str | The agent label triggering the validation. Defaults to None. | None |
| bagit_dir | Path | Path to the Bagit directory containing the RO-Crate data directory. Defaults to "./bagit". | './bagit' |
| resources_dir | Path | Path to the directory containing resources to include in the RO-Crate. Defaults to "./resources". | './resources' |

This command performs the following actions:

- Transfers the staged data to production storage.
- Ensures the data is accessible to the TRE or authorised TRE workspace.

Example usage

cr8tor publish -a <agent_label> -b <path-to-bagit-dir> -i <path-to-resources-dir>

Source code in src/cr8tor/cli/publish.py
@app.command(name="publish")
def publish(
    agent: Annotated[
        str,
        typer.Option(default="-a", help="The agent label triggering the validation."),
    ] = None,
    bagit_dir: Annotated[
        Path,
        typer.Option(
            default="-b", help="Bagit directory containing RO-Crate data directory"
        ),
    ] = "./bagit",
    resources_dir: Annotated[
        Path,
        typer.Option(
            default="-i", help="Directory containing resources to include in RO-Crate."
        ),
    ] = "./resources",
):
    """
    Publishes the data by transferring it from staging to production storage, making it accessible to a TRE and/or authorised TRE workspace.

    Args:
        agent (str): The agent label triggering the validation. Defaults to None.
        bagit_dir (Path): Path to the Bagit directory containing the RO-Crate data directory. Defaults to "./bagit".
        resources_dir (Path): Path to the directory containing resources to include in the RO-Crate. Defaults to "./resources".

    This command performs the following actions:
    - Transfers the staged data to production storage.
    - Ensures the data is accessible to the TRE or authorised TRE workspace.

    Example usage:
        cr8tor publish -a <agent_label> -b <path-to-bagit-dir> -i <path-to-resources-dir>
    """
    if agent is None:
        agent = os.getenv("AGENT_USER")

    exit_msg = "Publish complete"
    exit_code = schemas.Cr8torReturnCode.SUCCESS
    publish_results = []
    start_time = datetime.now()

    project_resource_path = resources_dir.joinpath("governance", "project.toml")
    project_info = project_resources.read_resource(project_resource_path)

    if not bagit_dir.exists():
        cli_utils.exit_command(
            schemas.Cr8torCommandType.PUBLISH,
            schemas.Cr8torReturnCode.ACTION_EXECUTION_ERROR,
            f"Missing bagit directory at: {bagit_dir}",
        )

    current_rocrate_graph = proj_graph.ROCrateGraph(bagit_dir)
    if not current_rocrate_graph.is_project_action_complete(
        command_type=schemas.Cr8torCommandType.DISCLOSURE_CHECK,
        action_type=schemas.RoCrateActionType.ASSESS,
        project_id=project_info["project"]["id"],
    ):
        cli_utils.close_create_action_command(
            command_type=schemas.Cr8torCommandType.PUBLISH,
            start_time=start_time,
            project_id=project_info["project"]["id"],
            agent=agent,
            project_resource_path=project_resource_path,
            resources_dir=resources_dir,
            exit_msg="The data project must have disclosure completed before publishing",
            exit_code=schemas.Cr8torReturnCode.ACTION_WORKFLOW_ERROR,
            instrument=os.getenv("PUBLISH_NAME"),
        )

    dataset_meta_file = None

    # TODO: Discuss with Piotr whether the publish function should be called per dataset or per project?
    # Currently assumes 1 dataset file in metadata

    try:
        for f in resources_dir.joinpath("metadata").glob("dataset*.toml"):
            dataset_meta_file = f
            break

        publish_req = schemas.DataContractPublishRequest(
            project_name=project_info["project"]["project_name"],
            project_start_time=project_info["project"]["project_start_time"],
            destination=project_info["project"]["destination"],
        )

        resp_dict = asyncio.run(api.publish(publish_req))
        resp_dict["destination_type"] = project_info["project"]["destination"]["type"]
        validate_resp = schemas.PublishPayload(**resp_dict)
        if validate_resp.data_published:
            publish_location_dict = validate_resp.data_published[0].model_dump()
            publish_location_dict["@id"] = str(uuid.uuid4())

            publish_results.append(publish_location_dict)

            project_resources.create_resource_entity(
                dataset_meta_file, "publish_path", publish_location_dict
            )

    except Exception as e:
        cli_utils.close_create_action_command(
            command_type=schemas.Cr8torCommandType.PUBLISH,
            start_time=start_time,
            project_id=project_info["project"]["id"],
            agent=agent,
            project_resource_path=project_resource_path,
            resources_dir=resources_dir,
            exit_msg=f"{str(e)}",
            exit_code=schemas.Cr8torReturnCode.UNKNOWN_ERROR,
            instrument=os.getenv("PUBLISH_NAME"),
        )

    cli_utils.close_create_action_command(
        command_type=schemas.Cr8torCommandType.PUBLISH,
        start_time=start_time,
        project_id=project_info["project"]["id"],
        agent=agent,
        project_resource_path=project_resource_path,
        resources_dir=resources_dir,
        exit_msg=exit_msg,
        exit_code=exit_code,
        instrument=os.getenv("PUBLISH_NAME"),
        result=publish_results,
    )

Command Workflow

The CR8TOR commands follow a specific sequence in the data access workflow:

  1. initiate - Creates a new DAR project repository from a cookiecutter template
  2. create - Initializes the project with unique identifiers and basic metadata
  3. build - Builds the BagIt RO-Crate package containing project metadata
  4. validate - Validates data source connections and retrieves metadata
  5. sign-off - Records approval for the validated data request
  6. stage-transfer - Transfers data from source to staging storage
  7. disclosure - Records disclosure approval for staged data
  8. publish - Moves data from staging to production storage

Command Dependencies

Each command typically depends on the successful completion of previous commands in the workflow. The CLI validates these dependencies and will exit with an error if prerequisite steps are missing.
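
The prerequisite checks visible in the source listings on this page can be summarised as a small lookup table. The sketch below illustrates that ordering and is not cr8tor's implementation: the real CLI queries the RO-Crate graph via `is_project_action_complete`, whereas here a plain set of completed command names stands in for the graph. `PREREQUISITE` and `missing_prerequisite` are hypothetical names, and prerequisites for `initiate`, `create`, and `build` are omitted because they are not shown in these listings.

```python
from typing import Optional

# Command -> the command that must already be complete, as read from the
# `is_project_action_complete` checks in the listings on this page.
PREREQUISITE: dict[str, str] = {
    "validate": "create",
    "sign-off": "validate",
    "stage-transfer": "sign-off",
    "disclosure": "stage-transfer",
    "publish": "disclosure",
}


def missing_prerequisite(command: str, completed: set[str]) -> Optional[str]:
    """Return the unmet prerequisite for `command`, or None if it may run."""
    required = PREREQUISITE.get(command)
    if required is not None and required not in completed:
        return required
    return None


completed = {"create", "validate"}
print(missing_prerequisite("sign-off", completed))
print(missing_prerequisite("publish", completed))
```

Each command checks only its immediate predecessor, so a project that skipped an earlier step is caught at the first command whose direct prerequisite is absent.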