Table of Contents

How to write a Spark dataframe to Excel

Install xlsxwriter, convert the frame to Pandas, write it as Excel. Due to limitations in the local file API (i.e. not supporting random writes), you’ll have to write the file to temporary storage first, before moving it to somewhere useful (like a storage mount point). When moving the temp file, it looks like dbutils.fs.cp isn’t able to find it, but shutil.copyfile can. 🤷🏻‍♂️

%pip install xlsxwriter

from shutil import copyfile

dfs ='dbfs:/mnt/in/myveryimportantdata.parquet')

df = dfs.toPandas()
df.to_excel('/local_disk0/tmp/data.xlsx', engine='xlsxwriter')

copyfile('/local_disk0/tmp/data.xlsx', '/dbfs/mnt/out/data.xlsx')

– via XlsxWriter Github and Databricks Docs

How to enable change data feed on a mount point (instead of on a table)

That’s assuming your mount point supports delta tables, i.e. it points to an ADLS gen 2 or something similar.

ALTER TABLE delta.`/mnt/your/path` SET TBLPROPERTIES  (delta.enableChangeDataFeed=true)

SHOW TBLPROPERTIES delta.`/mnt/your/path` 

DESCRIBE HISTORY delta.`/mnt/your/path`

–via blood, sweat, tears, and this link