In one scenario, a query that joined a Delta table with one or more other
Delta tables took a long time to run.
The cause was the large number of small Parquet files that had been created
for those tables.
The Auto Optimize feature addresses this: it avoids writing large numbers of
small files and compacts the small files that are written.
For a Delta table, set the table properties
delta.autoOptimize.optimizeWrite = true and
delta.autoOptimize.autoCompact = true in the CREATE TABLE command.
With optimizeWrite, fewer and larger files are written in the first place; with autoCompact, the small files that remain are compacted after a write completes.
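
As a rough sketch of how these two properties might be set at table creation, the helper below only builds the CREATE TABLE statement as a string. The function name, the table name sales_events, and its columns are hypothetical and used purely for illustration; only the two delta.autoOptimize properties come from the text above.

```python
def create_table_with_auto_optimize(table_name, columns):
    """Build a CREATE TABLE statement for a Delta table with Auto Optimize on.

    `columns` maps column names to SQL types. The helper itself is
    illustrative; only the two table properties come from the post.
    """
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE {table_name} ({column_list}) "
        "USING DELTA "
        "TBLPROPERTIES ("
        "delta.autoOptimize.optimizeWrite = true, "
        "delta.autoOptimize.autoCompact = true)"
    )


# Hypothetical table and columns, for illustration only.
ddl = create_table_with_auto_optimize(
    "sales_events", {"id": "BIGINT", "amount": "DOUBLE"}
)
print(ddl)
```

In a Databricks notebook, the resulting string could then be passed to spark.sql(ddl), assuming a SparkSession named spark is available there.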
Reference:
https://docs.microsoft.com/en-us/azure/databricks/delta/optimizations/auto-optimize