Table of Contents


How to write a Spark dataframe to Excel

Install xlsxwriter, convert the frame to Pandas, write it as Excel. Due to limitations in the local file API (i.e. not supporting random writes), you’ll have to write the file to temporary storage first, before moving it to somewhere useful (like a storage mount point). When moving the temp file, it looks like dbutils.fs.cp isn’t able to find it, but shutil.copyfile can. 🤷🏻‍♂️

%pip install xlsxwriter

from shutil import copyfile

dfs = spark.read.parquet('dbfs:/mnt/in/myveryimportantdata.parquet')

df = dfs.toPandas()
df.to_excel('/local_disk0/tmp/data.xlsx', engine='xlsxwriter')

copyfile('/local_disk0/tmp/data.xlsx', '/dbfs/mnt/out/data.xlsx')

– via XlsxWriter Github and Databricks Docs