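When using the DALI library, all data processing tasks are implemented on top of a Pipeline. A Pipeline object is an instance of nvidia.dali.Pipeline or of a class derived from it.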

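A DALI Pipeline can be defined in one of the following ways:

[1] - Write a function that uses DALI's built-in operators and decorate it with pipeline_def();

[2] - Instantiate a Pipeline object directly, build the graph, and define the outputs with Pipeline.set_outputs();

[3] - Subclass Pipeline and override Pipeline.define_graph(). (This is the legacy way of defining DALI Pipelines.)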
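Approaches [2] and [3] can be sketched as follows. This is a minimal sketch rather than code taken from the DALI documentation: the helper names build_with_set_outputs and build_with_subclass, the ImagePipeline class, and the image_dir argument are hypothetical. DALI is imported inside each function so that the definitions load even on a machine without DALI installed, and building the pipelines is assumed to require a GPU because the decoder uses device="mixed".

```python
def build_with_set_outputs(image_dir, batch_size=4, num_threads=2, device_id=0):
    """Approach [2]: instantiate Pipeline directly and call set_outputs()."""
    # Lazy import: DALI is only required when the function is actually called.
    from nvidia.dali import Pipeline, fn

    pipe = Pipeline(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
    with pipe:  # operators created inside this block are added to `pipe`
        files, labels = fn.readers.file(file_root=image_dir)
        images = fn.decoders.image(files, device="mixed")
        pipe.set_outputs(images, labels)
    return pipe


def build_with_subclass(image_dir, batch_size=4, num_threads=2, device_id=0):
    """Approach [3] (legacy): subclass Pipeline and override define_graph()."""
    from nvidia.dali import Pipeline, fn

    class ImagePipeline(Pipeline):  # hypothetical class name
        def define_graph(self):
            files, labels = fn.readers.file(file_root=image_dir)
            images = fn.decoders.image(files, device="mixed")
            # The returned data nodes become the pipeline outputs.
            return images, labels

    return ImagePipeline(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
```

In both cases the returned object still has to be built with pipe.build() before pipe.run() can produce batches.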

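1. Data Processing Graphs

A DALI Pipeline is represented as a graph of operations, which contains two kinds of nodes:

[1] - Operators - created each time an operator is called

[2] - Data nodes (DataNode) - represent the inputs and outputs of operators.

For example: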

from nvidia.dali import pipeline_def, fn

# Create a pipeline whose processing graph is defined by the function below
@pipeline_def
def segm_pipeline():
    """
    Create a pipeline which reads images and masks, decodes the images and returns them.
    """
    # Using the same seed for both readers keeps images and masks in the same order
    img_files, labels = fn.readers.file(file_root="image_dir", seed=1)
    mask_files, _ = fn.readers.file(file_root="mask_dir", seed=1)
    # device="mixed": the decoder takes CPU input and produces GPU output
    images = fn.decoders.image(img_files, device="mixed")
    masks  = fn.decoders.image(mask_files, device="mixed")
    return images, masks, labels

# Instantiate the pipeline and build its graph
pipe = segm_pipeline(batch_size=4, num_threads=2, device_id=0)
pipe.build()

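The resulting graph: each of the two file readers feeds its file output into an image decoder, and the pipeline returns the decoded images, the decoded masks, and the labels from the first reader.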
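To make the two node types concrete, the graph of segm_pipeline can be modeled in plain Python. This is only a toy model for illustration and does not reflect how DALI represents graphs internally: the Operator and DataNode dataclasses, the operator labels, and the edges helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Toy stand-in for a data node: an input or output of an operator."""
    name: str
    device: str  # "cpu" or "gpu"


@dataclass(frozen=True)
class Operator:
    """Toy stand-in for one operator call in the graph."""
    name: str
    inputs: tuple
    outputs: tuple


# Data nodes of segm_pipeline; "mixed" decoders read CPU data and write GPU data.
img_files = DataNode("img_files", "cpu")
labels = DataNode("labels", "cpu")
mask_files = DataNode("mask_files", "cpu")
unused = DataNode("_", "cpu")
images = DataNode("images", "gpu")
masks = DataNode("masks", "gpu")

# One Operator entry per operator call in the pipeline function.
operators = [
    Operator("readers.file(image_dir)", (), (img_files, labels)),
    Operator("readers.file(mask_dir)", (), (mask_files, unused)),
    Operator("decoders.image#1", (img_files,), (images,)),
    Operator("decoders.image#2", (mask_files,), (masks,)),
]
pipeline_outputs = (images, masks, labels)


def edges(ops):
    """List (source, target) pairs: data node -> operator and operator -> data node."""
    result = []
    for op in ops:
        for node in op.inputs:
            result.append((node.name, op.name))
        for node in op.outputs:
            result.append((op.name, node.name))
    return result


graph_edges = edges(operators)
for source, target in graph_edges:
    print(f"{source} -> {target}")
```

Every edge connects an operator with a data node, never two operators or two data nodes directly, which is the structure the list above describes.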

2. nvidia.dali.Pipeline

class nvidia.dali.Pipeline(batch_size=-1,
                          num_threads=-1, 
                          device_id=-1, 
                          seed=-1, 
                          exec_pipelined=True, 
                          prefetch_queue_depth=2, 
                          exec_async=True, 
                          bytes_per_sample=0, 
                          set_affinity=False, 
                          max_streams=-1, 
                          default_cuda_stream_priority=0, 
                          *, 
                          enable_memory_stats=False, 
                          py_num_workers=1, 
                          py_start_method='fork')

For a detailed description of each parameter, see the nvidia.dali.Pipeline documentation.
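As a rough illustration of how these constructor parameters are supplied in practice, the sketch below passes some of them through pipeline_def, first as defaults in the decorator and then as overrides at call time. The helper name run_one_batch and the image_dir argument are hypothetical, and the sketch assumes that GPU 0 is available and that image_dir has a directory layout fn.readers.file can read.

```python
def run_one_batch(image_dir, seed=42, prefetch_queue_depth=2):
    """Build a small pipeline and return one batch of (images, labels)."""
    # Lazy import: DALI is only required when the function is actually called.
    from nvidia.dali import pipeline_def, fn

    # Keyword arguments given to the decorator become defaults for the
    # nvidia.dali.Pipeline constructor.
    @pipeline_def(batch_size=4, num_threads=2, device_id=0)
    def image_pipeline():
        files, labels = fn.readers.file(file_root=image_dir)
        images = fn.decoders.image(files, device="mixed")
        return images, labels

    # Constructor arguments can also be passed (or overridden) at call time.
    pipe = image_pipeline(seed=seed, prefetch_queue_depth=prefetch_queue_depth)
    pipe.build()
    # run() returns a tuple with one batch per pipeline output.
    return pipe.run()
```

Calling run_one_batch("image_dir") would then return the decoded images and their labels for a single batch of four samples.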

Last modification:May 9th, 2021 at 04:30 pm