Glue 4.0 upgrades its engines to Apache Spark 3.3.0 and Python 3.10. Both bring performance improvements and bug fixes, and Spark 3.3.0 adds features such as improved error messages and row-level runtime filtering. New engine plugins in Glue 4.0 support the Cloud Shuffle Service for Spark, the Ray compute framework, and Adaptive Query Execution. The release also adds support for Pandas, the Python data analysis and manipulation library, along with native support for the Delta Lake, Apache Iceberg, and Apache Hudi data formats. Glue 4.0 also includes the Parquet vectorized reader, which supports additional encodings and data types.
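As a rough illustration of how the data format support surfaces in practice, the sketch below builds the keyword arguments for a Glue 4.0 Spark job definition that loads one or more data lake frameworks through the `--datalake-formats` job parameter. It assumes boto3's Glue `create_job` call as the eventual consumer. The helper `build_glue4_job_request`, the job name, role ARN, and S3 script path are all hypothetical placeholders. The sketch only builds the request dictionary, so it runs without AWS credentials.

```python
"""Sketch: build a create_job request for an AWS Glue 4.0 Spark job that
enables native data lake format support. The helper name, job name, role ARN,
and script path are hypothetical placeholders."""

# Data lake frameworks accepted by the --datalake-formats job parameter
SUPPORTED_DATALAKE_FORMATS = {"delta", "iceberg", "hudi"}


def build_glue4_job_request(name, role_arn, script_location, formats):
    """Return keyword arguments suitable for boto3's Glue client create_job()."""
    unknown = set(formats) - SUPPORTED_DATALAKE_FORMATS
    if unknown:
        raise ValueError(f"unsupported data lake format(s): {sorted(unknown)}")
    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "4.0",  # Spark 3.3.0 / Python 3.10 runtime
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Comma-separated list of frameworks whose JARs Glue should load
            "--datalake-formats": ",".join(sorted(formats)),
        },
    }


if __name__ == "__main__":
    request = build_glue4_job_request(
        name="example-iceberg-job",  # hypothetical
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",  # hypothetical
        script_location="s3://example-bucket/scripts/job.py",  # hypothetical
        formats=["iceberg"],
    )
    print(request["DefaultArguments"])
    # To actually create the job (requires boto3 and AWS credentials):
    # import boto3
    # boto3.client("glue").create_job(**request)
```

Validating the format names locally is a design choice of this sketch, not something Glue requires. It simply catches typos before the request reaches AWS.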