A transformer that converts HTML content to plain text.

Example

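// Import paths below follow the legacy "langchain" package layout and may differ in your version.
import { CheerioWebBaseLoader } from "langchain/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { HtmlToTextTransformer } from "langchain/document_transformers/html_to_text";

const loader = new CheerioWebBaseLoader("https://example.com/some-page");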
const docs = await loader.load();

const splitter = new RecursiveCharacterTextSplitter({
  // chunkSize is the splitter's maximum chunk length, in characters
  chunkSize: 1000,
});
const transformer = new HtmlToTextTransformer();

// The sequence of text splitting followed by HTML to text transformation
const sequence = splitter.pipe(transformer);

// Processing the loaded documents through the sequence
const newDocuments = await sequence.invoke(docs);

console.log(newDocuments);

Hierarchy

Constructors

Properties

options: HtmlToTextOptions = {}

Methods

batch

  • Default implementation of batch, which calls invoke N times. Subclasses should override this method if they can batch more efficiently.

    Parameters

    • inputs: Document<Record<string, any>>[][]

      Array of inputs to each batch call.

    • Optional options: Partial<BaseCallbackConfig> | Partial<BaseCallbackConfig>[]

      Either a single call options object to apply to each batch call or an array for each call.

    • Optional batchOptions: RunnableBatchOptions & {
          returnExceptions?: false;
      }

    Returns Promise<Document<Record<string, any>>[][]>

    An array of RunOutputs, or a mix of RunOutputs and errors if batchOptions.returnExceptions is set.

  • Parameters

    Returns Promise<(Error | Document<Record<string, any>>[])[]>

  • Parameters

    Returns Promise<(Error | Document<Record<string, any>>[])[]>
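The relationship between batch and invoke described above can be modeled without the library. The following is a minimal sketch, not LangChain's implementation: `Doc`, `invokeUpper`, and `batchSketch` are hypothetical names, and the `returnExceptions` flag only mirrors the batchOptions field listed above.

```typescript
// Simplified stand-in for the library's Document type (hypothetical).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical "invoke": transforms one list of documents and rejects empty
// input so that the error path can be demonstrated.
async function invokeUpper(docs: Doc[]): Promise<Doc[]> {
  if (docs.length === 0) {
    throw new Error("empty input");
  }
  return docs.map((d) => ({ ...d, pageContent: d.pageContent.toUpperCase() }));
}

// Sketch of a default batch: call invoke once per input. When returnExceptions
// is true, a failed call contributes its Error instead of rejecting the batch.
async function batchSketch(
  inputs: Doc[][],
  returnExceptions = false
): Promise<(Doc[] | Error)[]> {
  return Promise.all(
    inputs.map(async (input): Promise<Doc[] | Error> => {
      try {
        return await invokeUpper(input);
      } catch (e) {
        if (returnExceptions) {
          return e instanceof Error ? e : new Error(String(e));
        }
        throw e;
      }
    })
  );
}
```

With `returnExceptions` left at `false`, a single failing input rejects the whole batch, which matches the first overload's return type containing no `Error` entries.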

invoke

  • Method to invoke the document transformation. This method calls the transformDocuments method with the provided input.

    Parameters

    • input: Document<Record<string, any>>[]

      The input documents to be transformed.

    • _options: BaseCallbackConfig

      Optional configuration object to customize the behavior of callbacks.

    Returns Promise<Document<Record<string, any>>[]>

    A Promise that resolves to the transformed documents.

pipe

  • Create a new runnable sequence that runs each individual runnable in series, piping the output of one runnable into another runnable or runnable-like.

    Type Parameters

    • NewRunOutput

    Parameters

    • coerceable: RunnableLike<Document<Record<string, any>>[], NewRunOutput>

      A runnable, function, or object whose values are functions or runnables.

    Returns RunnableSequence<Document<Record<string, any>>[], Exclude<NewRunOutput, Error>>

    A new runnable sequence.
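The piping behavior described above amounts to function composition over asynchronous steps. The sketch below illustrates the idea with plain functions; `Step`, `pipeSketch`, `splitIntoChunks`, and `trimAll` are hypothetical names, and real runnables carry configuration and callbacks that this model leaves out.

```typescript
// A step is any async function from input to output (a hypothetical
// simplification of a runnable).
type Step<I, O> = (input: I) => Promise<O>;

// Sketch of pipe: run `first`, then feed its output into `second`.
function pipeSketch<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return async (input: A) => second(await first(input));
}

// Two hypothetical steps: split text into 5-character chunks, then trim each chunk.
const splitIntoChunks: Step<string, string[]> = async (text) => {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += 5) {
    chunks.push(text.slice(i, i + 5));
  }
  return chunks;
};
const trimAll: Step<string[], string[]> = async (chunks) =>
  chunks.map((c) => c.trim());

// The composed step accepts the first step's input and returns the second step's output.
const sequence = pipeSketch(splitIntoChunks, trimAll);
```

Because the second step's input type must match the first step's output type, a mismatched pair fails at compile time rather than at run time.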

stream

  • Stream output in chunks.

    Parameters

    Returns Promise<IterableReadableStream<Document<Record<string, any>>[]>>

    A readable stream that is also an iterable.

streamLog

  • Stream all output from a runnable, as reported to the callback system. This includes all inner runs of LLMs, Retrievers, Tools, etc. Output is streamed as Log objects, which include a list of jsonpatch ops that describe how the state of the run has changed in each step, and the final state of the run. The jsonpatch ops can be applied in order to construct state.

    Parameters

    • input: Document<Record<string, any>>[]
    • Optional options: Partial<BaseCallbackConfig>
    • Optional streamOptions: Omit<LogStreamCallbackHandlerInput, "autoClose">

    Returns AsyncGenerator<RunLogPatch, any, unknown>
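To make "applied in order to construct state" concrete, the sketch below folds a series of patch batches into a state object. It is an illustration under assumptions: `PatchOp`, `applyOp`, and `buildState` are hypothetical names, the initial state shape and the paths are invented for the example, and only the `add` and `replace` operations are handled (no JSON Pointer escaping, no array insertion by index).

```typescript
// Minimal JSON Patch operation shape; only "add" and "replace" are modeled.
interface PatchOp {
  op: "add" | "replace";
  path: string; // JSON Pointer, e.g. "/logs/step1" or "/streamed_output/-"
  value: unknown;
}

// Apply one operation in place. A trailing "-" segment appends to an array.
function applyOp(state: Record<string, unknown>, op: PatchOp): void {
  const segments = op.path.split("/").slice(1);
  const last = segments.pop();
  if (last === undefined) {
    throw new Error("whole-document patches are not handled in this sketch");
  }
  // Walk down to the parent container of the final segment.
  let target: unknown = state;
  for (const segment of segments) {
    target = (target as Record<string, unknown>)[segment];
  }
  if (Array.isArray(target) && last === "-") {
    target.push(op.value);
  } else {
    (target as Record<string, unknown>)[last] = op.value;
  }
}

// Fold every batch of ops, in arrival order, into one state object.
function buildState(patches: PatchOp[][]): Record<string, unknown> {
  // Illustrative initial state; the real run state has its own schema.
  const state: Record<string, unknown> = {
    logs: {},
    streamed_output: [],
    final_output: null,
  };
  for (const ops of patches) {
    for (const op of ops) {
      applyOp(state, op);
    }
  }
  return state;
}

const finalState = buildState([
  [{ op: "add", path: "/logs/step1", value: { name: "step1" } }],
  [{ op: "add", path: "/streamed_output/-", value: "chunk" }],
  [{ op: "replace", path: "/final_output", value: "done" }],
]);
console.log(JSON.stringify(finalState));
```

Order matters here: applying the same batches in a different order could write into a container that does not exist yet.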

transform

  • Default implementation of transform, which buffers input and then calls stream. Subclasses should override this method if they can start producing output while input is still being generated.

    Parameters

    Returns AsyncGenerator<Document<Record<string, any>>[], any, unknown>
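The buffer-then-stream behavior described above can be sketched with async generators. This is a simplified model that uses strings instead of document arrays; `streamSketch` and `transformSketch` are hypothetical names, not library functions.

```typescript
// Hypothetical stream: yields a single result for one complete input.
async function* streamSketch(input: string): AsyncGenerator<string> {
  yield input.toUpperCase();
}

// Sketch of a default transform: consume the whole input generator first,
// then make one stream call on the buffered value. No output is produced
// until the input is exhausted.
async function* transformSketch(
  generator: AsyncGenerator<string>
): AsyncGenerator<string> {
  let buffered = "";
  for await (const chunk of generator) {
    buffered += chunk;
  }
  yield* streamSketch(buffered);
}
```

A subclass that can emit output per chunk would yield inside the `for await` loop instead of after it, which is the override case the description mentions.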

transformDocuments

  • Transform a list of documents.

    Parameters

    • documents: Document<Record<string, any>>[]

      A sequence of documents to be transformed.

    Returns Promise<Document<Record<string, any>>[]>

    A list of transformed documents.
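The data flow of a document transformer can be sketched as a map over the input list, with invoke forwarding to transformDocuments. The class below is a hypothetical model, not the library's code: `Doc`, `stripTags`, and `HtmlToTextSketch` are invented names, and the regular expression is a naive stand-in for real HTML-to-text conversion.

```typescript
// Simplified stand-in for the library's Document type (hypothetical).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Naive stand-in for HTML-to-text conversion: replaces tags with spaces and
// collapses whitespace. Real HTML needs a proper parser; this only shows the flow.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

class HtmlToTextSketch {
  // Transform each document's content and copy its metadata unchanged.
  async transformDocuments(documents: Doc[]): Promise<Doc[]> {
    return documents.map((doc) => ({
      pageContent: stripTags(doc.pageContent),
      metadata: { ...doc.metadata },
    }));
  }

  // invoke forwards its input to transformDocuments.
  async invoke(input: Doc[]): Promise<Doc[]> {
    return this.transformDocuments(input);
  }
}
```

Returning new objects rather than mutating the inputs keeps the original documents usable by other steps.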

withListeners

  • Bind lifecycle listeners to a Runnable, returning a new Runnable. The Run object contains information about the run, including its id, type, input, output, error, startTime, endTime, and any tags or metadata added to the run.

    Parameters

    • params: {
          onEnd?: ((run) => void | Promise<void>);
          onError?: ((run) => void | Promise<void>);
          onStart?: ((run) => void | Promise<void>);
      }

      The object containing the callback functions.

      • Optional onEnd?: ((run) => void | Promise<void>)
          • (run): void | Promise<void>
          • Called after the runnable finishes running, with the Run object.

            Parameters

            Returns void | Promise<void>

      • Optional onError?: ((run) => void | Promise<void>)
          • (run): void | Promise<void>
          • Called if the runnable throws an error, with the Run object.

            Parameters

            Returns void | Promise<void>

      • Optional onStart?: ((run) => void | Promise<void>)
          • (run): void | Promise<void>
          • Called before the runnable starts running, with the Run object.

            Parameters

            Returns void | Promise<void>

    Returns Runnable<Document<Record<string, any>>[], Document<Record<string, any>>[], BaseCallbackConfig>
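The listener lifecycle described above can be modeled by wrapping an async function. The sketch below is an assumption-laden illustration: `RunSketch`, `Listeners`, and `withListenersSketch` are hypothetical names, and the run record holds only a few of the fields the real Run object carries.

```typescript
// Hypothetical, reduced run record; the real Run object has more fields.
interface RunSketch {
  input: string;
  output?: string;
  error?: string;
  startTime: number;
  endTime?: number;
}

interface Listeners {
  onStart?: (run: RunSketch) => void | Promise<void>;
  onEnd?: (run: RunSketch) => void | Promise<void>;
  onError?: (run: RunSketch) => void | Promise<void>;
}

// Wrap an async function so the listeners fire around each call. A new
// function is returned; the original is left untouched.
function withListenersSketch(
  fn: (input: string) => Promise<string>,
  listeners: Listeners
): (input: string) => Promise<string> {
  return async (input: string): Promise<string> => {
    const run: RunSketch = { input, startTime: Date.now() };
    await listeners.onStart?.(run);
    let output: string;
    try {
      output = await fn(input);
    } catch (e) {
      // Record the failure, notify onError, then rethrow to the caller.
      run.error = String(e);
      run.endTime = Date.now();
      await listeners.onError?.(run);
      throw e;
    }
    run.output = output;
    run.endTime = Date.now();
    await listeners.onEnd?.(run);
    return output;
  };
}
```

In this model onEnd and onError are mutually exclusive for a given call, and the error still propagates to the caller after onError has run.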
