Apache Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools, including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
PySpark
DataFrame.
See a usage example.
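A minimal sketch of loading documents from a PySpark DataFrame. It assumes this entry refers to the `PySparkDataFrameLoader` class in `langchain_community`, which the text here does not name, and that `pyspark` and `langchain-community` are installed with their dependencies. The helper name `load_dataframe_as_documents` and its parameters are hypothetical and only serve the illustration.

```python
def load_dataframe_as_documents(rows, columns, content_column):
    """Build a PySpark DataFrame from rows and load it as LangChain documents.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Imported lazily so this module can be imported without Spark installed.
    from pyspark.sql import SparkSession
    from langchain_community.document_loaders import PySparkDataFrameLoader

    # Reuse the active Spark session, or start a local one.
    spark = SparkSession.builder.getOrCreate()

    # `columns` is a list of column names; Spark infers the column types.
    df = spark.createDataFrame(rows, schema=columns)

    # `content_column` supplies the document text for each row.
    loader = PySparkDataFrameLoader(spark, df, page_content_column=content_column)
    return loader.load()
```

Calling `load_dataframe_as_documents([("Team A", 1), ("Team B", 2)], ["team", "score"], "team")` should return one document per row, with the `team` value as the page content and the remaining columns as metadata.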
Spark SQL
.
See a usage example.
InfoSparkSQLTool: tool for getting metadata about Spark SQL tables
ListSparkSQLTool: tool for getting table names
QueryCheckerTool: tool that uses an LLM to check whether a query is correct
QuerySparkSQLTool: tool for querying Spark SQL
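The tools listed above that do not need an LLM could be wired up roughly as follows. This is a sketch, assuming the tool classes live in `langchain_community.tools.spark_sql.tool` and wrap the `SparkSQL` utility from `langchain_community.utilities.spark_sql`. The helper name `build_spark_sql_tools` and the dictionary keys are hypothetical. `QueryCheckerTool` is left out because it also needs a language model, and this page does not say which one to use.

```python
def build_spark_sql_tools(spark_session=None, schema=None):
    """Return the Spark SQL tools that work without an LLM.

    Hypothetical helper: the name and the dictionary keys are illustrative only.
    """
    # Imported lazily so this module can be imported without Spark installed.
    from langchain_community.utilities.spark_sql import SparkSQL
    from langchain_community.tools.spark_sql.tool import (
        InfoSparkSQLTool,
        ListSparkSQLTool,
        QuerySparkSQLTool,
    )

    # With spark_session=None, SparkSQL gets or creates a session itself.
    # `schema` selects the Spark database whose tables are exposed.
    db = SparkSQL(spark_session=spark_session, schema=schema)

    return {
        "list": ListSparkSQLTool(db=db),    # table names
        "info": InfoSparkSQLTool(db=db),    # metadata for the named tables
        "query": QuerySparkSQLTool(db=db),  # runs a SQL statement
    }
```

A typical sequence would be to call `tools["list"].run("")` to discover table names, pass a comma-separated list of those names to `tools["info"].run(...)`, and then send a SQL string to `tools["query"].run(...)`. The exact output format depends on the installed library version.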