How to write a Spark dataframe to Excel
Install xlsxwriter, convert the frame to Pandas, and write it as Excel. Due to limitations in the local file API (i.e. no support for random writes), you'll have to write the file to temporary storage first, before moving it to somewhere useful (like a storage mount point). When moving the temp file, it looks like dbutils.fs.cp isn't able to find it, but shutil.copyfile can. 🤷🏻‍♂️
%pip install xlsxwriter

from shutil import copyfile

# Read with Spark, then convert to Pandas so to_excel is available
dfs = spark.read.parquet('dbfs:/mnt/in/myveryimportantdata.parquet')
df = dfs.toPandas()

# Write to the driver's local disk first (the mount can't handle random writes),
# then copy the finished file over to the mount point
df.to_excel('/local_disk0/tmp/data.xlsx', engine='xlsxwriter')
copyfile('/local_disk0/tmp/data.xlsx', '/dbfs/mnt/out/data.xlsx')
– via XlsxWriter GitHub and Databricks Docs
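For what it's worth, the dbutils.fs.cp hiccup is likely a path-scheme issue rather than a missing file: dbutils.fs treats bare paths as dbfs:/ paths, so local files need an explicit file: prefix. A minimal, untested sketch using the same paths as above:

# Assumption: the temp file only becomes visible to dbutils with the file: scheme,
# because bare paths default to dbfs:/
dbutils.fs.cp('file:/local_disk0/tmp/data.xlsx', 'dbfs:/mnt/out/data.xlsx')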
How to enable change data feed on a mount point (instead of on a table)
That's assuming your mount point supports Delta tables, i.e. it points to an ADLS Gen2 account or something similar.
-- Enable the change data feed on the path-based Delta table
ALTER TABLE delta.`/mnt/your/path` SET TBLPROPERTIES (delta.enableChangeDataFeed = true)

-- Check that the property took, and watch the ALTER show up in the history
SHOW TBLPROPERTIES delta.`/mnt/your/path`
DESCRIBE HISTORY delta.`/mnt/your/path`
– via blood, sweat, tears, and this link
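Once the property is set, the feed can be read back with Delta's readChangeFeed reader option. A minimal sketch in Python; the startingVersion of 0 is just an example, pick the version or timestamp you actually need:

# Read every change since table version 0
changes = (spark.read.format('delta')
           .option('readChangeFeed', 'true')
           .option('startingVersion', 0)
           .load('/mnt/your/path'))
# Each row carries _change_type, _commit_version, and _commit_timestamp columns
display(changes)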