Integrate with Spark NLP¶
Comet integrates with Spark NLP.
Spark NLP is a state-of-the-art Natural Language Processing (NLP) library built on top of Apache Spark. It provides simple, performant and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment.
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.logging.comet import CometLogger
spark = sparknlp.start()
logger = CometLogger()
## Your training code
Log automatically¶
Spark NLP ships with a dedicated CometLogger that can automatically track the following experiment data:
- Model Metrics
Comet can automatically monitor model training metrics from your PySpark pipelines.
Configuring the CometLogger for Spark NLP¶
Find more information about the CometLogger in the Spark NLP documentation.
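As a minimal sketch of one way to keep this configuration in a single place, the helper below collects settings for the logger. It assumes that `CometLogger` accepts `workspace` and `project_name` keyword arguments, which should be checked against the Spark NLP documentation, and it uses the standard Comet environment variables `COMET_WORKSPACE` and `COMET_PROJECT_NAME` as fallbacks. The helper name `comet_logger_kwargs` is hypothetical and not part of either library.

```python
import os


def comet_logger_kwargs(workspace=None, project_name=None, env=None):
    """Collect keyword arguments for CometLogger (hypothetical helper).

    Explicit arguments take priority; otherwise the standard Comet
    environment variables are used. Unset values are omitted so that
    the logger falls back to its own defaults.
    """
    env = os.environ if env is None else env
    candidates = {
        "workspace": workspace or env.get("COMET_WORKSPACE"),
        "project_name": project_name or env.get("COMET_PROJECT_NAME"),
    }
    # Drop settings that were not provided anywhere
    return {name: value for name, value in candidates.items() if value}


# Usage (assumes sparknlp and comet_ml are installed, and that
# CometLogger accepts these keyword arguments):
# from sparknlp.logging.comet import CometLogger
# logger = CometLogger(**comet_logger_kwargs(project_name="sparknlp-demo"))
```

The API key is left out on purpose: Comet usually reads it from `COMET_API_KEY` or its own configuration file, so it does not need to appear in code.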
End-to-end Example¶
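import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.logging.comet import CometLogger

# Start a Spark session with Spark NLP available
spark = sparknlp.start()

# Directory where the classifier writes its training logs
OUTPUT_LOG_PATH = "./run"

# Create the Comet logger for this run
logger = CometLogger()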
# Convert the raw text column into Spark NLP documents
document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Compute sentence embeddings with a pretrained encoder
embds = UniversalSentenceEncoder.pretrained() \
    .setInputCols("document") \
    .setOutputCol("sentence_embeddings")

# Multi-label classifier; output logs must be enabled so Comet can read them
multiClassifier = MultiClassifierDLApproach() \
    .setInputCols("sentence_embeddings") \
    .setOutputCol("category") \
    .setLabelColumn("labels") \
    .setBatchSize(128) \
    .setLr(1e-3) \
    .setThreshold(0.5) \
    .setShufflePerEpoch(False) \
    .setEnableOutputLogs(True) \
    .setOutputLogsPath(OUTPUT_LOG_PATH) \
    .setMaxEpochs(1)

# Start monitoring the log directory before training begins
logger.monitor(logdir=OUTPUT_LOG_PATH, model=multiClassifier)
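# Tiny in-memory training set: one text and a list of labels per row
trainDataset = spark.createDataFrame(
    [("Nice.", ["positive"]), ("That's bad.", ["negative"])],
    schema=["text", "labels"],
)

# Train the pipeline
pipeline = Pipeline(stages=[document, embds, multiClassifier])
pipeline_model = pipeline.fit(trainDataset)

# Log the parameters of the fitted pipeline stages, then close the experiment
logger.log_pipeline_parameters(pipeline_model)
logger.end()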
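Evaluation results computed after training can be logged in a similar way, before `logger.end()` is called. The sketch below flattens a nested metrics dictionary (for example, the kind returned by scikit-learn's `classification_report(..., output_dict=True)`) and hands it to the logger. It assumes the logger exposes a `log_metrics` method that takes a dictionary of numeric values; the helper names `flatten_metrics` and `log_evaluation` are hypothetical.

```python
def flatten_metrics(report, sep="_"):
    """Flatten a nested metrics dict into {name: float}.

    Non-numeric values (strings, None, booleans) are skipped.
    """
    flat = {}
    for key, value in report.items():
        if isinstance(value, dict):
            # Prefix nested metric names with the parent key
            for sub_key, sub_value in flatten_metrics(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[str(key)] = float(value)
    return flat


def log_evaluation(logger, report):
    """Send flattened metrics to any object with a log_metrics(dict) method.

    Assumption: the CometLogger instance provides log_metrics.
    """
    metrics = flatten_metrics(report)
    logger.log_metrics(metrics)
    return metrics


# Usage with the logger from the example above (before logger.end()):
# log_evaluation(logger, {"positive": {"precision": 1.0}, "accuracy": 0.5})
```

Because the helper only relies on a `log_metrics` method, it can be exercised with a stand-in object in unit tests without a Comet account.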
Try it out!¶
Try our example for using Comet with Spark NLP.
Jan. 17, 2025