Reading a file into a dataframe using PySpark in Databricks

The very first thing we need is a file. For this demo, our sample file is a comma-separated (CSV) file with 3 columns and 3 rows.
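The screenshot of the file isn't reproduced here, but a file of that shape could look like the following (the column names and values are invented purely for illustration):

id,name,city
1,John,New York
2,Jane,London
3,Sam,Paris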

  • So now that we have the file, the first step is to upload it to Databricks.
  • Get the cluster running.
  • Go to the Data tab.
  • Click on "Create Table".
  • Drag the file to the specified upload area.
  • At this point, the file is uploaded to DBFS storage. After this, we can work with the file.
  • Copy the location of the file. (Here: /FileStore/tables/SampleData-1.csv)
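A quick way to confirm the upload is to list the directory from a notebook cell using dbutils, which is available in Databricks notebooks:

# List the files under /FileStore/tables/ to confirm the upload landed in DBFS
display(dbutils.fs.ls("/FileStore/tables/"))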
  • Now, let's get back to the notebook to work with the file. 
  • Creating a dataframe from the file
dataframe_var = spark.read.option("header", "true").csv('/FileStore/tables/SampleData-1.csv')
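With only the header option set, every column is read as a string. If you also want Spark to guess the column types, one common variant is to add the inferSchema option:

# Same read, but let Spark sample the file and infer column types
dataframe_var = (spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv('/FileStore/tables/SampleData-1.csv'))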
  • Display the dataframe  
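In a Databricks notebook, the built-in display() function renders the dataframe as an interactive table (dataframe_var.show() would print it as plain text instead):

display(dataframe_var)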

  • We can see that the file data is successfully displayed in the dataframe.
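To double-check how the columns were read, you can also print the schema; without inferSchema, every column will show up as string:

dataframe_var.printSchema()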


We will take a look at how to load files into tables in the next blog post.

