Extending Renard
Creating new steps
Steps must usually implement at least four methods:
PipelineStep.__init__()
: is used to pass options at step init time. Options passed at step init time should be valid for a collection of texts, and not be text-specific.
PipelineStep.__call__()
: is called at pipeline run time.
PipelineStep.needs()
: declares the set of information this step needs from the pipeline state. Each returned string should be an attribute of PipelineState.
PipelineStep.production()
: declares the set of information produced by this step. As in PipelineStep.needs(), each returned string should be an attribute of PipelineState.
Here is an example of a basic tokenization step:
from typing import Dict, Any, Set
from renard.pipeline.core import PipelineStep

class BasicTokenizerStep(PipelineStep):

    def __init__(self):
        pass

    def __call__(self, text: str, **kwargs) -> Dict[str, Any]:
        return {"tokens": text.split(" ")}

    def needs(self) -> Set[str]:
        return {"text"}

    def production(self) -> Set[str]:
        return {"tokens"}
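To make the role of needs() and production() more concrete, here is a rough sketch of how a sequence of steps could be checked and run. It is not Renard's actual implementation: StepSketch is a stand-in base class so that the example runs without Renard installed, and TokenCounterStep, missing_needs(), run() and the "token_count" key are hypothetical names invented for this illustration.

```python
from typing import Any, Dict, List, Set


class StepSketch:
    """Stand-in for PipelineStep (assumption: not Renard's real class)."""

    def needs(self) -> Set[str]:
        return set()

    def production(self) -> Set[str]:
        return set()


class BasicTokenizerStep(StepSketch):
    def __call__(self, text: str, **kwargs) -> Dict[str, Any]:
        return {"tokens": text.split(" ")}

    def needs(self) -> Set[str]:
        return {"text"}

    def production(self) -> Set[str]:
        return {"tokens"}


class TokenCounterStep(StepSketch):
    """Hypothetical downstream step that consumes the produced tokens."""

    def __call__(self, tokens: List[str], **kwargs) -> Dict[str, Any]:
        return {"token_count": len(tokens)}

    def needs(self) -> Set[str]:
        return {"tokens"}

    def production(self) -> Set[str]:
        return {"token_count"}


def missing_needs(steps: List[StepSketch], initial: Set[str]) -> List[Set[str]]:
    """For each step, list the needs that no earlier step has produced."""
    available = set(initial)
    missing = []
    for step in steps:
        missing.append(step.needs() - available)
        available |= step.production()
    return missing


def run(steps: List[StepSketch], text: str) -> Dict[str, Any]:
    """Call each step with the current state and merge its output back in."""
    state: Dict[str, Any] = {"text": text}
    for step in steps:
        state.update(step(**state))
    return state
```

With this sketch, putting TokenCounterStep before BasicTokenizerStep makes missing_needs() report "tokens" as unavailable for the first step, which is the kind of ordering error that declaring needs and productions is meant to catch.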
Additionally, the following methods can be overridden:
PipelineStep.optional_needs()
: specifies optional dependencies the same way as PipelineStep.needs().
PipelineStep._pipeline_init_()
: is used for pipeline-wide arguments, such as language settings. This method is called by the pipeline at pipeline run time.
PipelineStep.supported_langs()
: declares the set of supported languages as a set of ISO 639-3 codes (or the special value "any"). Defaults to {"eng"}.
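As an illustration of the optional methods, the sketch below overrides optional_needs() and supported_langs() on a whitespace tokenizer. Several things here are assumptions: StepSketch is a stand-in base class rather than Renard's PipelineStep, supports() is a hypothetical helper, and "sentences" is assumed to arrive as a list of token lists. The step is declared language-agnostic purely for illustration.

```python
from typing import Any, Dict, List, Literal, Optional, Set, Union


class StepSketch:
    """Stand-in for PipelineStep, mirroring the defaults described above."""

    def optional_needs(self) -> Set[str]:
        return set()

    def supported_langs(self) -> Union[Set[str], Literal["any"]]:
        return {"eng"}


class WhitespaceTokenizerStep(StepSketch):
    def __call__(
        self,
        text: str,
        sentences: Optional[List[List[str]]] = None,
        **kwargs,
    ) -> Dict[str, Any]:
        # If an earlier step supplied sentences, reuse them instead of
        # splitting the raw text again (assumed shape: list of token lists).
        if sentences is not None:
            return {"tokens": [tok for sent in sentences for tok in sent]}
        return {"tokens": text.split()}

    def needs(self) -> Set[str]:
        return {"text"}

    def optional_needs(self) -> Set[str]:
        # Used when present in the pipeline state, but not required.
        return {"sentences"}

    def production(self) -> Set[str]:
        return {"tokens"}

    def supported_langs(self) -> Union[Set[str], Literal["any"]]:
        return "any"


def supports(step: StepSketch, lang: str) -> bool:
    """Hypothetical helper: does the step accept this ISO 639-3 code?"""
    langs = step.supported_langs()
    return langs == "any" or lang in langs
```

Because the optional need is an ordinary keyword argument with a None default, the step behaves the same whether or not an upstream step provides it.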